Well, my way of thinking is that you changed the bias current in the output stage, i.e. the pass transistor. That is its quiescent bias, the current it carries when the load current is 0, and you actually dropped it by 2x. I haven't done the math, but this should affect the speed: the output stage is part of the loop, and a lower bias current means a lower gm in the pass transistor, which changes the loop dynamics.
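To put a rough number on it, here is a minimal sketch of how gm scales when the bias current is halved. It assumes a simple square-law device in strong inversion, gm = sqrt(2 * k * I_D). The values of `K` and `I_BIAS` are hypothetical and not taken from your circuit, and a real pass transistor near weak inversion would scale closer to linearly with current.

```python
import math

def gm_square_law(i_d, k):
    """Small-signal transconductance of a square-law MOSFET: gm = sqrt(2*k*I_D)."""
    return math.sqrt(2.0 * k * i_d)

# Hypothetical values, chosen only for illustration.
K = 10e-3        # mu*Cox*W/L in A/V^2 (assumed)
I_BIAS = 100e-6  # original quiescent current in A (assumed)

gm_before = gm_square_law(I_BIAS, K)
gm_after = gm_square_law(I_BIAS / 2.0, K)  # bias current dropped by 2x
ratio = gm_after / gm_before

print(f"gm before: {gm_before * 1e3:.3f} mS")
print(f"gm after:  {gm_after * 1e3:.3f} mS")
print(f"ratio:     {ratio:.3f}")
```

Under the square-law assumption the ratio is 1/sqrt(2), so gm drops by about 30% regardless of the values picked for `K` and `I_BIAS`.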
Yes, if you can drive your load with just an OTA, that's perfectly fine. You are right that adding a bigger cap will reduce the variations and make the OTA more stable, since in a single-stage OTA the load cap sets the dominant pole. Whether an OTA alone is enough depends on how much load current you have, how much headroom you have and so on.
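A quick sketch of why the bigger cap helps, assuming a single-stage OTA whose dominant pole is at the output and which has one non-dominant pole. It uses the usual approximations GBW = gm / (2*pi*C_L) and PM = 90 deg - atan(GBW / f_p2). The values of `GM` and `F_P2` and the two cap sizes are hypothetical, and the estimate gets rough when GBW is close to f_p2.

```python
import math

def gbw_hz(gm, c_load):
    """Gain-bandwidth product of a load-compensated OTA: gm / (2*pi*C_L)."""
    return gm / (2.0 * math.pi * c_load)

def phase_margin_deg(gbw, f_p2):
    """Approximate phase margin with one non-dominant pole at f_p2."""
    return 90.0 - math.degrees(math.atan(gbw / f_p2))

# Hypothetical values, chosen only for illustration.
GM = 1e-3     # OTA transconductance in S (assumed)
F_P2 = 100e6  # non-dominant pole in Hz (assumed)

pm_small_cap = phase_margin_deg(gbw_hz(GM, 1e-12), F_P2)   # C_L = 1 pF
pm_large_cap = phase_margin_deg(gbw_hz(GM, 10e-12), F_P2)  # C_L = 10 pF

print(f"PM with 1 pF:  {pm_small_cap:.1f} deg")
print(f"PM with 10 pF: {pm_large_cap:.1f} deg")
```

The larger cap pulls GBW down and away from the non-dominant pole, so the phase margin goes up, at the cost of bandwidth.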