In simple async ASCII communication at baseband there is excess bandwidth, so the receiver generally uses a 16x clock: it looks for the leading edge of a start bit, then assumes the best time to sample the data, once per bit, is in the middle, reached by counting 8 intervals of the 16x clock (and 16 intervals for each bit after that). This is how async data is synced, one byte at a time, from the leading 1-to-0 edge of the start bit.
If the stop bit is not a 1, it is called a framing error; the receiver then returns to idle and waits for the next character's start bit, i.e. the next 1-to-0 transition.
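To make the 16x sampling concrete, here is a minimal sketch of that receiver state machine in C. The `rx_pin()` and `wait_tick()` hooks are hypothetical stand-ins for the line level and the 16x clock; a real UART does all of this in hardware.

```c
#include <stdint.h>
#include <stdbool.h>

extern bool rx_pin(void);     /* hypothetical hook: current line level, 1 = idle */
extern void wait_tick(void);  /* hypothetical hook: block until the next 16x-clock tick */

/* Returns true and writes the byte to *out on a good frame; false on a
 * framing error (or a noise spike mistaken for a start bit). */
bool uart_rx_byte(uint8_t *out)
{
    while (rx_pin())                 /* idle: wait for the 1-to-0 leading edge */
        wait_tick();

    for (int t = 0; t < 8; t++)      /* 8 ticks of the 16x clock = middle of start bit */
        wait_tick();
    if (rx_pin())
        return false;                /* line went back to 1: noise, not a start bit */

    uint8_t byte = 0;
    for (int bit = 0; bit < 8; bit++) {
        for (int t = 0; t < 16; t++) /* 16 more ticks = centre of the next data bit */
            wait_tick();
        byte >>= 1;
        if (rx_pin())
            byte |= 0x80;            /* async serial sends the LSB first */
    }

    for (int t = 0; t < 16; t++)     /* centre of the stop bit */
        wait_tick();
    if (!rx_pin())
        return false;                /* stop bit is not 1: framing error */

    *out = byte;
    return true;
}
```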
If the channel bandwidth is not at least twice the bit rate (for a simple receiver like this, with no equalization), the probability of error, or bit error rate (BER), rises rapidly. Shannon's law, C = B·log2(1 + S/N), bounds the tradeoff between signal-to-noise ratio (SNR), bandwidth (BW), and the bit rate achievable at low error rates.
There are methods to improve effective SNR by over-sampling and averaging, which costs bandwidth; but if the ambient noise is broadband, the wider bandwidth demanded of the channel and receiver lets in more noise, SNR drops, and BER degrades. In an ideal channel the SNR-vs-BER curve plots as a nearly straight "waterfall" line on log scales.
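As one concrete instance of that curve (assuming coherent BPSK over an ideal AWGN channel, a modulation choice not fixed above), the textbook result is BER = ½·erfc(√(Eb/N0)). A short C program prints the waterfall:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Coherent BPSK over an ideal AWGN channel: BER = 0.5 * erfc(sqrt(Eb/N0)). */
    for (double ebn0_db = 0.0; ebn0_db <= 12.0; ebn0_db += 2.0) {
        double ebn0 = pow(10.0, ebn0_db / 10.0);   /* dB -> linear power ratio */
        printf("Eb/N0 = %4.1f dB   BER = %.2e\n", ebn0_db, 0.5 * erfc(sqrt(ebn0)));
    }
    return 0;
}
```

(Compile with `-lm`.) Each 2 dB step drops the BER by roughly an order of magnitude in the waterfall region, which is why the curve looks nearly straight on log axes.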
Synchronous channels allow the start/stop bits to be deleted, saving 20% of the time (2 of every 10 bits), but extra hardware and sync words are needed so the receiver can lock for long data packets, using every bit transition to reduce clock phase error, with both ends designed to expect a known rate. A minimal sync-word search is sketched below.
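One common way to lock is to shift recovered bits through a register and compare against the known sync word; a sketch in C, where the 16-bit pattern `0xF628` is just an illustrative value, not from any particular standard:

```c
#include <stdint.h>
#include <stdbool.h>

#define SYNC_WORD 0xF628u  /* illustrative 16-bit sync pattern */

/* Shift in one recovered bit at a time; returns true when the sync word lines up,
 * which resolves the bit-alignment of everything that follows. */
bool sync_search(uint16_t *shift_reg, int bit)
{
    *shift_reg = (uint16_t)((*shift_reg << 1) | (unsigned)(bit & 1));
    return *shift_reg == SYNC_WORD;
}
```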
Modulating the baseband ASCII data uses more bandwidth but gives better timing-sync error reduction. An example is bi-phase (Manchester) coding, where the clock is XORed with the data; it is used on many wired and wireless channels and is insensitive to data patterns, since every bit cell is guaranteed a transition. If bandwidth is expensive, as in mobile phones, the opposite trade is made: a larger number of bits is assigned to each analog value of amplitude and phase rather than simple binary, so your 12 bytes can be sent in the same BW channel with compression rates that increase with the SNR available; 256 levels, for example, carry 8 bits per symbol.
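Bi-phase really is just that XOR. A minimal C sketch, expanding each data bit into two half-bit chips (chip = data XOR clock; the opposite polarity convention is equally common):

```c
#include <stdint.h>
#include <stddef.h>

/* Expand one byte, MSB first, into 16 half-bit chips: chip = data XOR clock,
 * with clock = 0 in the first half of each bit cell and 1 in the second. */
size_t manchester_encode(uint8_t byte, uint8_t chips[16])
{
    size_t n = 0;
    for (int i = 7; i >= 0; i--) {
        int bit = (byte >> i) & 1;
        chips[n++] = (uint8_t)(bit ^ 0);  /* first half-bit */
        chips[n++] = (uint8_t)(bit ^ 1);  /* second half-bit: guaranteed mid-cell transition */
    }
    return n;
}
```

The guaranteed mid-cell transition is what the receiver's clock recovery feeds on, regardless of the data pattern; the price is twice the chip rate, i.e. more bandwidth.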
So, to answer your question about how many samples per bit: it depends on the channel bandwidth (BW), the receiver BW and detector type (integrate-and-dump or centre-sample), the transmitter BW, and the distortion of group delay vs. frequency if there is band-limiting in any part of the path or special filtering added.
In the simplest case with excess bandwidth, like all UARTs: data is sampled with a 16x clock to find the start bit, then only centre-sampled once per bit into a shift register, and this repeats for the next character. 16-byte UART FIFOs (e.g. the 16550) allow consecutive bytes without loss; otherwise some latency or delay between words is expected so the receiver does not miss any data sent. Thus 1 start bit (always 0), 8 data bits, and 1 stop bit (always 1) means 10 bits x 12 bytes x 1 µs per bit = 120 µs for your message. Beyond 16 bytes, the delay depends on the serial port's handshaking for flow control, or on special in-band characters called XON/XOFF.
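That arithmetic as a general helper, assuming the common 8N1 framing (10 bits per byte):

```c
#include <stdio.h>

/* Time to send n bytes as 8N1 async frames: start + 8 data + stop = 10 bits/byte. */
static double frame_time_us(unsigned n_bytes, double bit_rate_hz)
{
    return 10.0 * n_bytes * 1e6 / bit_rate_hz;
}

int main(void)
{
    printf("%.0f us\n", frame_time_us(12, 1e6));  /* 12 bytes at 1 Mbit/s -> 120 us */
    return 0;
}
```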
1 Mbit/s is pretty fast, so you can't really use an ordinary LED to transmit the data: it does not have the required BW because of the LED's junction capacitance, unless special low-capacitance IR diodes are used, such as those found in the IrDA specs.
Keep in mind that logic-level ASCII serial ports on a µC, when converted to RS-232, use different voltage levels and also inverted polarity (at both ends) by definition.
I have no doubt this answer will raise more questions.