Hi batman!
M1-M3 do indeed form a differential pair, and M4-M13 form a folded cascode. The structure M10,M12 is a self-biased cascode (gates connected together). That structure, together with M11,M13 and M15,M16, forms the outputs of a current mirror whose input branch looks the same. That input branch generates Vbias. Because all of these branches are copies from the same mirror, the current through M14 tracks the currents in the two folded-cascode paths quite well.
M14 therefore carries a constant current and acts as a voltage level shifter between the drains of M7,M9 and the gate of M18. Since the drains of M7,M9 are connected to the gate of M17, there is a constant voltage difference between the gates of M17 and M18. The circuit would also work if you omitted M14-M16 and connected the gates of both M17 and M18 to the drains of M7,M9, but then a huge current would flow from Vdd to Vss through M17,M18 at the operating point. M14-M16 are there to reduce that operating-point current. Only when the drains of M7,M9 move down (or up) does M18 (or M17) start to deliver much more current than at the operating point. This is therefore a class-AB output buffer.
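If it helps to see numbers, here is a rough Python sketch of why the level shift matters. It is not derived from your schematic: it assumes a generic complementary output pair (an NMOS pulling to Vss and a PMOS pulling to Vdd; which of M17/M18 is which depends on your schematic), an ideal square-law model, matched devices, and made-up values for the supply, the threshold voltage, the gain factor k and the shift voltage. All function and variable names are mine.

```python
def drain_current(v_ov, k):
    """Square-law saturation current; zero when the overdrive is negative."""
    return 0.5 * k * max(v_ov, 0.0) ** 2


def stage_currents(v_n, vdd, vss, v_shift, vt, k):
    """Return (i_nmos, i_pmos) for an NMOS gate at v_n and a PMOS gate
    held a constant v_shift above it (the job of the level shifter)."""
    i_nmos = drain_current(v_n - vss - vt, k)
    i_pmos = drain_current(vdd - (v_n + v_shift) - vt, k)
    return i_nmos, i_pmos


def quiescent_point(vdd, vss, v_shift, vt, k):
    """Gate voltage and current at which both (matched) devices carry
    the same current, i.e. no current flows into the load."""
    v_n = (vdd + vss - v_shift) / 2.0
    i_q, _ = stage_currents(v_n, vdd, vss, v_shift, vt, k)
    return v_n, i_q


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    VDD, VSS, VT, K = 3.3, 0.0, 0.7, 2e-3

    for v_shift in (0.0, 1.5):
        v_n, i_q = quiescent_point(VDD, VSS, v_shift, VT, K)
        print(f"shift = {v_shift:.1f} V: quiescent current = {i_q * 1e6:.1f} uA")

    # Pull the driving node 0.4 V below its quiescent value (shift = 1.5 V).
    v_q, _ = quiescent_point(VDD, VSS, 1.5, VT, K)
    i_n, i_p = stage_currents(v_q - 0.4, VDD, VSS, 1.5, VT, K)
    print(f"node moved down: NMOS = {i_n * 1e6:.1f} uA, PMOS = {i_p * 1e6:.1f} uA")
```

With these made-up numbers the quiescent current drops from roughly 0.9 mA with no shift (both gates tied together) to 40 uA with a 1.5 V shift, while moving the driving node down by 0.4 V turns the NMOS off and lets the PMOS deliver several times the quiescent current. That is the class-AB behaviour in a nutshell: small standing current, large current on demand.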
I hope this helps!
Slainte!
H.