To first order, the offset voltage comes from two mismatch sources: 1) threshold-voltage mismatch of the differential pair and 2) current mismatch in the load of the diff-pair.
As long as the MOSFETs have no halo (pocket) implants, the first component is bias independent, and its standard deviation is A_VTH/sqrt(AREA), with AREA = W·L (Pelgrom's law).
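The area law can be sketched numerically as follows. This assumes a hypothetical matching coefficient A_VTH = 4 mV·µm, since the real value is foundry-specific. The function name sigma_dvth is illustrative only. The loop shows that quadrupling the gate area halves the threshold mismatch.

```python
import math


def sigma_dvth(a_vth_mv_um: float, w_um: float, l_um: float) -> float:
    """Standard deviation of the threshold-voltage difference of a pair, in mV.

    Pelgrom area law: sigma(dVth) = A_VTH / sqrt(W * L).
    a_vth_mv_um is the process matching coefficient in mV*um (assumed value below).
    """
    return a_vth_mv_um / math.sqrt(w_um * l_um)


# Hypothetical A_VTH = 4 mV*um; each step quadruples W*L.
for w, l in [(2.0, 0.5), (4.0, 1.0), (8.0, 2.0)]:
    print(f"W={w} um, L={l} um -> sigma(dVth) = {sigma_dvth(4.0, w, l):.2f} mV")
```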
The second one is referred to the input through the input-device transconductance as Delta_Id/gm_in. For a given drain current, gm is largest in weak inversion, where gm ≈ Id/(n·Vt). That is, gm/Id ≈ 30 V⁻¹ (about 30 mS per mA of Id) for n ≈ 1.3 at room temperature, so biasing the diff-pair in weak inversion minimizes the influence of the current mismatch. There is also an additional term, related to the current-factor mismatch Δβ/β (oxide thickness and mobility) and to the inversion level. It equals (Id/gm)·Δβ/β, which for strongly inverted MOSFETs becomes (Vov/2)·Δβ/β, i.e. proportional to the overdrive voltage times the µ·Cox mismatch.
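The three terms can be compared for weak and strong inversion with a rough sketch. It makes several assumptions. The terms are independent and add as a root sum of squares. n = 1.3 and T = 300 K. Strong inversion follows the square law (gm/Id = 2/Vov). The load mismatch is given as a relative value σ(ΔI/I), so Delta_Id/gm_in = (ΔI/I)·(Id/gm). All mismatch numbers are hypothetical. The function names are illustrative, and the sketch ignores the fact that the load's own mismatch also depends on its bias.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def gm_over_id_weak(n: float = 1.3, temp_k: float = 300.0) -> float:
    """gm/Id in weak inversion, 1/(n*Vt), in 1/V (n is an assumed slope factor)."""
    vt = K_B * temp_k / Q_E
    return 1.0 / (n * vt)


def gm_over_id_strong(v_ov: float) -> float:
    """gm/Id in strong inversion from the square-law model, 2/Vov, in 1/V."""
    return 2.0 / v_ov


def sigma_vos(sigma_vth_in: float, sigma_di_rel: float,
              sigma_beta_rel: float, gm_id: float) -> float:
    """Input-referred offset sigma in volts, as an RSS of independent terms.

    sigma_vth_in:   sigma of dVth of the input pair (V), bias independent
    sigma_di_rel:   relative load-current mismatch sigma(dI/I)
    sigma_beta_rel: relative current-factor mismatch sigma(dbeta/beta) of the pair
    gm_id:          gm/Id of the input devices (1/V)
    """
    id_over_gm = 1.0 / gm_id
    load_term = sigma_di_rel * id_over_gm    # Delta_Id/gm_in = (dI/I) * (Id/gm)
    beta_term = sigma_beta_rel * id_over_gm  # (Id/gm) * dbeta/beta
    return math.sqrt(sigma_vth_in**2 + load_term**2 + beta_term**2)


# Hypothetical values: 2 mV Vth mismatch, 1 % load mismatch, 0.5 % beta mismatch.
weak = sigma_vos(2e-3, 0.01, 0.005, gm_over_id_weak())
strong = sigma_vos(2e-3, 0.01, 0.005, gm_over_id_strong(0.2))  # Vov = 0.2 V
print(f"weak inversion:   {weak * 1e3:.2f} mV")
print(f"strong inversion: {strong * 1e3:.2f} mV")
```

Only the bias-dependent load and β terms shrink as gm/Id grows. The input-pair threshold term stays fixed by device area.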
So we can conclude that biasing the diff-pair close to weak inversion is a good idea for offset minimization (and for noise as well), but it has to be traded off against other parameters such as bandwidth.