1) flatulent is right. The differential amplifier at the bottom has to be biased to work as a differential amplifier, so its MOS devices need to be in saturation.
2) The top transistors work as switches that reverse the polarity of the signal coming from the diff. amp. Because the LO signal turns them fully on and off, you don't want to bias them differentially. The LO will determine which set is on, not a DC bias voltage. You therefore don't want them to have any Vgs-Vth.
The common-mode voltage will have to be such that the bottom transistors are not driven into the linear region.
3) Assuming linearity (IP3) is not a prime concern, you need to leave enough headroom for your output signal to swing. If the circuit needs to handle an RF signal that is going to fully switch the diff. amp, then:
Peakdrop = Rl * Isink
Vdd = (RFpeak - Vth) + Peakdrop + delta
Vdd ~= (Vgs - Vth) + Peakdrop + delta
If the commutator's (M3 to M6) resistance when fully on is much smaller than Rl, then the extra delta you add is small (e.g. 100 mV).
Again, this all assumes linearity is not a big issue. If it is, then you start worrying about the IP3 contributions and you need to provide a larger Vds.
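To put numbers on the headroom estimate in point 3, here is a minimal Python sketch of the Vdd ~= (Vgs-Vth) + Peakdrop + delta rule of thumb. The function name, argument names and the component values are hypothetical, picked only for illustration, and it assumes linearity is not a concern, as above.

```python
def min_vdd(rl_ohms, isink_amps, vov_volts, delta_volts=0.1):
    """Rough supply estimate: Vdd ~= (Vgs - Vth) + Peakdrop + delta.

    rl_ohms     -- load resistor Rl
    isink_amps  -- tail current Isink, fully steered through one load
    vov_volts   -- overdrive (Vgs - Vth) of the bottom diff. pair
    delta_volts -- extra margin for the commutator's on-resistance drop
    """
    peak_drop = rl_ohms * isink_amps  # Peakdrop = Rl * Isink
    return vov_volts + peak_drop + delta_volts


# Illustrative values only (assumptions, not taken from the circuit discussed):
# Rl = 500 ohm, Isink = 2 mA, Vgs - Vth = 0.3 V, delta = 100 mV
vdd = min_vdd(rl_ohms=500.0, isink_amps=2e-3, vov_volts=0.3, delta_volts=0.1)
print(f"Vdd should be at least about {vdd:.2f} V")
```

With these made-up numbers the load drop (1 V) dominates the budget, so Rl and Isink are the first things to look at if the supply is tight.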
Greg