deltaVt= sigmaVt/sqrt(W.L)
sigmaVt is a constant (at a given temperature) supplied by the fab; it depends on the process (etching, device type, etc.).
On the other hand, the term 2.deltaVt/(Vgs-Vt) should in fact be (work through the equations yourself)
gm.deltaVt/Id = gm.sigmaVt/(Id.sqrt(W.L))
Consider that gm=dId/dVgs, so dId/dVt=-gm. Then
dId/Id = -gm.dVt/Id
Up to you to continue.
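As a quick numeric sanity check of the gm.deltaVt/Id expression, here is a minimal Python sketch. The function name and the example numbers (a 5 mV.um matching constant, W = 10 um, L = 1 um, gm/Id = 10 1/V) are illustrative assumptions, not fab data.

```python
import math

def current_mismatch(gm, Id, sigma_vt, W, L):
    """Relative drain-current mismatch caused by Vt mismatch.

    Implements gm.deltaVt/Id with deltaVt = sigmaVt/sqrt(W.L).
    sigma_vt is in V.um, W and L in um, gm in S, Id in A.
    """
    delta_vt = sigma_vt / math.sqrt(W * L)  # Vt mismatch in volts
    return gm * delta_vt / Id

# Assumed example values, for illustration only
rel_err = current_mismatch(gm=100e-6, Id=10e-6, sigma_vt=5e-3, W=10.0, L=1.0)
print(f"relative current mismatch = {100 * rel_err:.2f} %")
```

Note that the result depends on gm/Id and on the area W.L, and the overdrive Vgs-Vt does not appear explicitly.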
For a fixed L (which determines the maximum operating frequency) and a fixed Id (you design for a given current), gm/(Id.sqrt(W.L)) increases as Vgs-Vt increases. All books make a mistake here. Go to SPICE, choose a model that describes the transistor behavior continuously from weak to strong inversion (EKV, for instance), and plot gm/(Id.sqrt(W.L)) for a fixed L and Id (you must change W).
Don't believe books without thinking for yourself. All the traditional books were built on the equation
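If you want a rough idea of the trend before opening the simulator, the sketch below sweeps the inversion level at fixed Id and L. It is not a SPICE run: it assumes the simple long-channel EKV interpolation Id = I0.(W/L).ln^2(1+exp((Vgs-Vt)/(2.n.Ut))), and the values of n, u.Cox, Id and L are assumed for illustration.

```python
import math

UT = 0.0259                  # thermal voltage near 300 K, in V
N = 1.3                      # slope factor (assumed)
U_COX = 200e-6               # u.Cox in A/V^2 (assumed)
I0 = 2 * N * U_COX * UT**2   # specific current per square, in A

def sweep(Id=10e-6, L=1e-6, ics=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """For each inversion coefficient IC, size W to keep Id fixed and
    return (IC, Vgs-Vt, W, gm/Id, gm/(Id.sqrt(W.L))). Lengths in metres."""
    rows = []
    for ic in ics:
        s = math.sqrt(ic)
        W = Id * L / (I0 * ic)                       # W needed for this IC
        vov = 2 * N * UT * math.log(math.expm1(s))   # Vgs-Vt from the interpolation
        gm_over_id = (1 - math.exp(-s)) / (N * UT * s)
        metric = gm_over_id / math.sqrt(W * L)
        rows.append((ic, vov, W, gm_over_id, metric))
    return rows

for ic, vov, W, gm_id, metric in sweep():
    print(f"IC={ic:7.2f}  Vgs-Vt={vov * 1e3:8.1f} mV  W={W * 1e6:9.3f} um  "
          f"gm/Id={gm_id:6.2f} 1/V  gm/(Id.sqrt(W.L))={metric:.3e} 1/(V.m)")
```

Under these assumptions gm/Id falls as Vgs-Vt rises, but W shrinks faster, so gm/(Id.sqrt(W.L)) rises and then levels off in strong inversion.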
Id=0.5.µ.Cox.(W/L)(Vgs-Vt)^2
This equation is only valid in strong inversion. Take a look at Tsividis's book on MOS modeling to see what I'm talking about.
You must also know that terms depending on "sigmas" represent random mismatch, while terms including "lambdas" represent systematic mismatch. Do you have access to Sansen's book on analog design? It is worth a look just to see what is random and what is systematic. That book also makes the deltaVt/(Vgs-Vt) mistake. Terms depending on lambdas can be reduced by using cascodes so that the Vds of both transistors in a current mirror are equal. Then there is no current error due to unequal bias conditions.
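To put a number on the systematic (lambda) term, here is a small sketch assuming the usual first-order channel-length-modulation factor (1 + lambda.Vds) on the drain current of each mirror transistor. The function name and the values of lambda and Vds are hypothetical.

```python
def mirror_systematic_error(lam, vds_in, vds_out):
    """Relative current error of a 1:1 mirror caused only by unequal Vds,
    assuming Id is proportional to (1 + lam*Vds) in both transistors."""
    return (1 + lam * vds_out) / (1 + lam * vds_in) - 1

# Simple mirror: the output Vds differs from the diode-connected input Vds
simple = mirror_systematic_error(lam=0.1, vds_in=0.6, vds_out=1.5)
# Cascoded mirror, idealized: both Vds values are forced equal
cascoded = mirror_systematic_error(lam=0.1, vds_in=0.6, vds_out=0.6)

print(f"simple mirror error   = {100 * simple:.2f} %")
print(f"cascoded mirror error = {100 * cascoded:.2f} %")
```

A real cascode only makes the two Vds values nearly equal, so the residual error is small but not exactly zero as in this idealized case.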
Developing the equations may be tedious but, believe me, you'll impress your advisor.
Tell me if you need further help.