Quote: "I still argue that folding of standard transistor might result in a slight increase in the delay because of the extra parasitic caps and res associated with the routing between different diffusion gaps. However this delay might be compensated by the reduction in the gate resistance since you are reducing the gate size. "
Actually, this depends on how the folding is done and on which parasitic capacitance or resistance affects the output switching speed. If folding adds parasitic capacitance to, say, the source, but reduces the output capacitance because adjacent fingers share a drain diffusion, then you may find that it actually helps switching speed.
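To put rough numbers on the drain-sharing point, here is a minimal sketch that counts diffusion strips for a folded transistor and estimates drain and source diffusion area. It assumes an idealized layout: source and drain strips alternate, the source takes the outer strips where possible, and every strip has the same diffusion length. The function names and the example dimensions are made up for illustration, and real layouts (different end vs. shared diffusion lengths, perimeter capacitance, routing) will deviate from this.

```python
def diffusion_strips(fingers: int) -> tuple[int, int]:
    """Return (drain_strips, source_strips) for a transistor with `fingers` gates.

    Assumption: strips alternate S/D/S/..., with the source on the outside
    whenever the strip count is odd.
    """
    if fingers < 1:
        raise ValueError("fingers must be >= 1")
    total = fingers + 1          # N gates separate N + 1 diffusion strips
    drain = total // 2           # drain gets the smaller share
    source = total - drain
    return drain, source


def diffusion_areas(total_width: float, fingers: int, diff_length: float) -> tuple[float, float]:
    """Return (drain_area, source_area), same units squared as the inputs.

    Assumption: every strip has the same length `diff_length`, and each
    finger is total_width / fingers wide.
    """
    finger_width = total_width / fingers
    drain, source = diffusion_strips(fingers)
    return drain * finger_width * diff_length, source * finger_width * diff_length


if __name__ == "__main__":
    # Hypothetical device: W = 10 um total, 0.5 um diffusion length.
    for n in (1, 2, 4):
        drain_area, source_area = diffusion_areas(10.0, n, 0.5)
        print(f"fingers={n}: drain={drain_area} um^2, source={source_area} um^2")
```

Under these assumptions, going from one finger to two halves the drain area (5.0 to 2.5 um^2) while the source area stays at 5.0 um^2, so the output node sees less junction capacitance. Whether that outweighs the extra routing parasitics depends on the actual layout.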
Added after 1 minute:
Also, folding doesn't change the gate size: the total gate width and length stay the same, the width is just split across the fingers.