I agree with the methodology that Opamp used, but I'm not sure the calculations are right:
99.9% settling -> 0.001 (0.1%) error remaining -> ln(0.001) = -6.9. This means you need at least 6.9 time constants to settle within 20 µs (using 8 for margin is okay), so one time constant must be less than 2.5 µs.
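A quick check of the settling budget, assuming a single-pole (first-order) step response where the remaining error is exp(-t/tau), which is the model the numbers above rely on:

```python
import math

# Settling spec: 99.9 % settling within 20 us
settle_error = 1e-3      # 0.1 % of the step left over
t_settle = 20e-6         # seconds

# Single-pole step response: error(t) = exp(-t / tau)
n_tau_min = -math.log(settle_error)   # minimum number of time constants (~6.9)
n_tau_design = 8                      # rounded up for margin
tau_max = t_settle / n_tau_design     # largest allowed time constant (2.5 us)

print(f"minimum time constants: {n_tau_min:.2f}")
print(f"tau must be <= {tau_max * 1e6:.2f} us")
```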
GBW >= 1/(2*pi*tau) = 64 kHz (not 400 kHz!)
GBW = gm/(2*pi*Cc). If you assume that the output of the amplifier is the dominant pole (a single gain stage), then Cc is the 20 pF output capacitance. In this case, gm = 8 µS.
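The GBW and gm numbers can be reproduced as below. This sketch assumes the amplifier is used as a unity-gain follower (so the closed-loop bandwidth equals GBW and tau = 1/(2*pi*GBW)) and that the 20 pF load sets the dominant pole:

```python
import math

tau_max = 2.5e-6     # from the 8-time-constant settling budget, seconds
c_load = 20e-12      # output capacitance acting as Cc (dominant-pole assumption)

# Unity-gain follower (assumed): closed-loop bandwidth = GBW, so tau = 1 / (2*pi*GBW)
gbw_min = 1 / (2 * math.pi * tau_max)      # ~63.7 kHz

# Single-stage OTA: GBW = gm / (2*pi*C), solved for gm
gm_min = 2 * math.pi * gbw_min * c_load    # equals C / tau = 8 uS

print(f"GBW >= {gbw_min / 1e3:.1f} kHz")
print(f"gm  >= {gm_min * 1e6:.2f} uS")
```

Note that the 2*pi terms cancel, so the requirement reduces to gm >= C/tau.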
If you assume that the 2 µA is split into 6 equal parts (2x PMOS input devices, 2x NMOS input devices, 2x output legs), each part gets about 300 nA. In this case, the gm/I needed is about 27 /V, which is very difficult, even in subthreshold. Perhaps this can be done with some transconductance boosting.
However, if the NMOS pair's current is diverted to the PMOS input pair when the NMOS devices can no longer stay active because the input voltage is too low, and the opposite is done when the PMOS devices can no longer stay active because the input voltage is too high, 600 nA becomes available for transconductance. Now gm/I is more like 13 /V, which can be done if the input devices operate in subthreshold.
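To see why 27 /V is "very difficult" while 13 /V is workable, both cases can be compared against the subthreshold ceiling gm/ID = 1/(n*UT). The slope factor n = 1.3 below is an assumed typical value (it is process dependent), not something from the numbers above:

```python
gm_min = 8e-6      # required gm from the GBW calculation, siemens

# Subthreshold ceiling gm/ID = 1 / (n * UT)
UT = 0.0259        # thermal voltage kT/q near 300 K, volts
N_SLOPE = 1.3      # assumed slope factor (roughly 1.2 to 1.5 in practice)
gm_id_ceiling = 1 / (N_SLOPE * UT)    # ~29.7 1/V

# Current available to the active input devices in each scheme
scenarios = {
    "split6": 300e-9,    # 2 uA / 6, rounded down to 300 nA
    "steered": 600e-9,   # NMOS/PMOS pair current steering
}

gm_id_needed = {name: gm_min / i for name, i in scenarios.items()}

for name, need in gm_id_needed.items():
    print(f"{name}: need gm/I = {need:.1f} 1/V "
          f"({need / gm_id_ceiling:.0%} of the assumed subthreshold ceiling)")
```

With these assumptions the 6-way split needs about 90% of the theoretical ceiling, while current steering needs under half of it.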
Output slew-rate limitation would still make a 20 µs settling time difficult without an output stage. If all of the current is split 3 ways (one input NMOS device, one input PMOS device, and one output leg), only about 600 nA (2 µA / 3 ≈ 670 nA) is left for the output. Slewing 5 V (rail to rail) into 20 pF at that current takes t = C·ΔV/I ≈ 150 to 170 µs.
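The slew numbers follow from t = C·ΔV/I with a constant output current, assuming the full 20 pF load and a 5 V step:

```python
c_load = 20e-12    # output load, farads
v_step = 5.0       # rail-to-rail step, volts
t_budget = 20e-6   # settling-time budget, seconds

def slew_time(i_out, c=c_load, dv=v_step):
    """Time to slew dv across c with a constant current i_out (t = C*dV/I)."""
    return c * dv / i_out

t_600n = slew_time(600e-9)     # rounded 3-way split
t_667n = slew_time(2e-6 / 3)   # exact 3-way split of 2 uA

# Output current needed to slew 5 V in the whole 20 us budget
i_min = c_load * v_step / t_budget

print(f"600 nA: {t_600n * 1e6:.0f} us, 667 nA: {t_667n * 1e6:.0f} us")
print(f"current needed to slew 5 V in 20 us: {i_min * 1e6:.1f} uA")
```

Even spending the entire 2 µA on the output would not be enough to slew the full 5 V within 20 µs, which is why an output stage that can deliver more than its quiescent current matters here.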
Here are some search keywords to help make things work:
To get high gm per current: "subthreshold MOS"
To boost gm to get better output results: "transconductance boosting"
To obtain rail-to-rail input range: "Complementary differential pair"