OK, here is a bandgap reference built from 1N4148 diodes, to explain the principle better.
We use diode D10 as the voltage that decreases with temperature, and we use resistor R2 to multiply the PTAT (proportional to absolute temperature) current until the falling and rising voltages have equal and opposite slopes. Look at page 4, and you will see that the diode voltage falls with temperature while the voltage across the resistor rises with temperature. R2 has been scaled so these two slopes cancel, giving the temperature-independent bandgap output.
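As a rough worked example of the slope matching, assume the Q5 mirror copies the PTAT current 1:1, so the voltage across R2 is (R2/R1) times the PTAT voltage. Here R1 is a hypothetical name for the resistor that sets the PTAT current. That PTAT voltage rises at n*(k/q)*ln(8), about 0.179 mV/K for ideal diodes (n = 1). The diode tempco of -2 mV/K used below is an assumed typical figure, not a measured 1N4148 value, so treat the result as a ballpark ratio only.

```python
import math

K_OVER_Q = 1.380649e-23 / 1.602176634e-19  # Boltzmann constant / electron charge, V/K


def required_gain(dvbe_dt: float, area_ratio: float = 8.0, ideality: float = 1.0) -> float:
    """Return the R2/R1 ratio that makes the PTAT slope cancel the diode slope.

    dvbe_dt    -- assumed diode tempco in V/K (negative, e.g. -2e-3)
    area_ratio -- number of paralleled diodes (8 in this circuit)
    ideality   -- diode ideality factor n (1.0 = ideal diode; an assumption)
    """
    # Slope of the voltage across R1: n * (k/q) * ln(N), in V/K
    dvptat_dt = ideality * K_OVER_Q * math.log(area_ratio)
    # The voltage across R2 is (R2/R1) times the voltage across R1, so pick
    # the ratio that makes its slope equal and opposite to the diode's slope
    return -dvbe_dt / dvptat_dt


gain = required_gain(-2e-3)  # assumed -2 mV/K diode tempco
print(f"R2/R1 = {gain:.2f}")
```

With these assumptions the ratio lands a little above 11; a larger ideality factor shrinks it in proportion.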
You can see on page 2 that Vbg is not perfectly flat, but it's close enough to use as a precision reference voltage, especially compared to how much the diode voltage alone changes!
Now I will explain how to make the PTAT current. I use only 1N4148 diodes, but I have to add a few transistors to regulate some things. First, Q1/Q2 acts as a belt (like for your pants) to keep the voltage across the multi-diode branch (the 8 diodes plus their resistor) equal to the voltage across the single diode. When you parallel 8 diodes and push the same current through them, the voltage across them is smaller than across a single diode. We add the resistor, and now that voltage difference, given by (kT/q)*ln(8), is pushed onto the resistor. This voltage difference, called deltaVBE, is very well defined for anything made in silicon; this guy is really the magic one.
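Here is a small sketch of the deltaVBE formula, assuming ideal diodes (ideality n = 1) and both branches carrying the same current, so each of the 8 paralleled diodes carries 1/8 of it. Real 1N4148 parts run with n somewhat above 1, which scales deltaVBE up by that factor; the `ideality` parameter is there to account for it.

```python
import math

K_OVER_Q = 1.380649e-23 / 1.602176634e-19  # k/q in V/K (about 86.17 uV/K)


def delta_vbe(temp_c: float, area_ratio: float = 8.0, ideality: float = 1.0) -> float:
    """Voltage difference between one diode and N paralleled diodes.

    Both branches carry the same current, so each paralleled diode runs at
    1/N of it. Ideal-diode formula: n * (kT/q) * ln(N); n = 1 is an assumption.
    """
    temp_k = temp_c + 273.15
    return ideality * K_OVER_Q * temp_k * math.log(area_ratio)


for t in (-20, 0, 25, 50, 100):
    print(f"{t:>4} C: deltaVBE = {delta_vbe(t) * 1e3:.2f} mV")

# The slope is the same at every temperature: (k/q) * ln(8), about 0.179 mV/K
slope = delta_vbe(26) - delta_vbe(25)
print(f"slope = {slope * 1e3:.3f} mV/K")
```

The output grows in a straight line with absolute temperature, which is exactly the "nice, well-behaved" tempco the next paragraph relies on.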
Now deltaVBE always has a nice, well-behaved, positive tempco. We can use it to compensate for the less well-behaved diode tempco. Because the diode side is the unpredictable one, every time you build one of these the single diode D10 will mismatch a little, and R2 may have to be somewhat more or less than 3.25k to reach 1.073 V, the magic voltage for the 1N4148.
The mirrors Q3/Q4 complete the PTAT loop. They force the current in the Q3 branch to equal the current in the Q4 branch, and the Q1/Q2 pair has already locked the voltages, so the current in every branch equals the PTAT voltage (deltaVBE) divided by the resistor. Therefore the current in all branches is PTAT. A plot of the PTAT current is shown on the last page. Now all we have to do is choose a resistor such that this current times the resistor rises with temperature exactly as fast as the diode voltage falls, and we are done!
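Since the loop forces I = deltaVBE / R, the branch current should scale with absolute temperature. A minimal sketch, assuming ideal diodes, a resistor with zero tempco, and a hypothetical R1 of 1 kOhm (R1 is just a label for the PTAT resistor here; use the value from your own schematic):

```python
import math

K_OVER_Q = 1.380649e-23 / 1.602176634e-19  # k/q in V/K


def ptat_current(temp_c: float, r1_ohms: float, area_ratio: float = 8.0) -> float:
    """Branch current set by the PTAT loop: I = deltaVBE / R1.

    Assumes ideal diodes and a resistor whose value does not drift with temperature.
    """
    temp_k = temp_c + 273.15
    delta_vbe = K_OVER_Q * temp_k * math.log(area_ratio)
    return delta_vbe / r1_ohms


R1 = 1000.0  # hypothetical value, ohms
for t in (0, 25, 50, 100):
    i = ptat_current(t, R1)
    # I / T stays constant: the current is proportional to absolute temperature
    print(f"{t:>4} C: I = {i * 1e6:6.2f} uA, I/T = {i / (t + 273.15) * 1e9:.2f} nA/K")
```

The constant I/T column is what "PTAT" means in practice: the current goes to zero only at absolute zero, not at 0 C.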
We add one more mirror (Q5), which again sources the same PTAT current found in Q3/Q4, and we put the gain resistor R2 plus a single diode (D10) in its path. Now the output voltage is Vout = Vbe(T) + IPTAT(T)*R2.
Vbe decreases with T, the PTAT current increases with T, and R2 scales the PTAT voltage to match, giving a very flat reference voltage made out of a whole bunch of moving targets. Pretty cool, eh?
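The output equation can be tried out numerically. This sketch assumes a straight-line diode model (0.600 V at 25 C, -2 mV/K), ideal diodes in the PTAT cell, a 1:1 Q5 mirror, and a hypothetical R1 of 1 kOhm; none of these numbers come from the schematic, so the toy Vout will not land on the 1.073 V quoted for the real 1N4148 circuit. It sweeps R2 at 10% low, balanced, and 10% high to show which way the output tilts. The straight-line model makes the balanced case perfectly flat; the slight bow visible on page 2 comes from curvature in the real diode voltage, which this model leaves out.

```python
import math

K_OVER_Q = 1.380649e-23 / 1.602176634e-19  # k/q in V/K
T0_K = 298.15      # reference temperature, 25 C
VBE0 = 0.600       # assumed D10 voltage at T0, volts
VBE_TC = -2.0e-3   # assumed D10 tempco, V/K (straight-line model)
AREA_RATIO = 8.0   # 8 paralleled diodes vs. 1
R1 = 1000.0        # hypothetical PTAT resistor, ohms


def vbe(temp_k: float) -> float:
    """Straight-line model of the D10 voltage; a real diode also has slight curvature."""
    return VBE0 + VBE_TC * (temp_k - T0_K)


def i_ptat(temp_k: float) -> float:
    """PTAT current set by the loop: deltaVBE / R1 (ideal diodes)."""
    return K_OVER_Q * temp_k * math.log(AREA_RATIO) / R1


def vout(temp_k: float, r2: float) -> float:
    """Q5 branch output: Vout = Vbe(T) + I_PTAT(T) * R2."""
    return vbe(temp_k) + i_ptat(temp_k) * r2


# R2 that makes d(I_PTAT * R2)/dT exactly cancel dVbe/dT in this model
r2_balanced = -VBE_TC * R1 / (K_OVER_Q * math.log(AREA_RATIO))

for label, r2 in (("R2 10% low", 0.9 * r2_balanced),
                  ("R2 balanced", r2_balanced),
                  ("R2 10% high", 1.1 * r2_balanced)):
    cold, hot = vout(253.15, r2), vout(373.15, r2)  # -20 C and 100 C
    print(f"{label:12s} R2 = {r2:7.0f} ohm: "
          f"Vout(-20 C) = {cold:.4f} V, Vout(100 C) = {hot:.4f} V")
```

A low R2 leaves the output falling with temperature, a high R2 makes it rise, and only the balanced value holds still while Vbe itself swings by about 0.24 V over the same range.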
Speaking of cool, you should be able to stick this circuit in the freezer for an hour (keep it running; a 9 V battery will run this for months) and then measure the output: it should be very close to the room-temperature value. But measure the voltage at the top of D10 and it will have changed by 20%!
Same thing for the oven: don't melt your circuit or explode your battery, but you should be able to get this circuit up to 100°F (about 38°C) in an oven with no problem (a heat gun may be better) and again measure little to no temperature drift. If you do see drift, use the following strategy.
If Vout is low at room temperature and higher when hot, the R2*IPTAT term is overcompensating, so decrease R2. If Vout drops with increasing temperature, increase R2. Even professional ICs need to be tested like this to find the "magic voltage". Then every IC with this bandgap can have R2 "trimmed" to reach that same magic voltage, and we know the temperature response will be well centered.
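The trim rule can be turned into a rough calculator. This sketch assumes the straight-line model Vout(T) = Vbe(T) + (R2/R1)*n*(k/q)*T*ln(8), resistors with negligible tempco, and a 1:1 Q5 mirror; R1 is a hypothetical name for the PTAT resistor and `trimmed_r2` is a hypothetical helper, not a standard formula from the circuit. It takes Vout measured at two temperatures and returns the R2 that would flatten the slope between them. It only corrects the linear slope, so expect to iterate once or twice on a real board.

```python
import math

K_OVER_Q = 1.380649e-23 / 1.602176634e-19  # k/q in V/K


def trimmed_r2(r2: float, r1: float, t_cold_c: float, v_cold: float,
               t_hot_c: float, v_hot: float,
               area_ratio: float = 8.0, ideality: float = 1.0) -> float:
    """Suggest a new R2 from two Vout measurements (straight-line model only).

    r2, r1          -- current gain resistor and (hypothetically named) PTAT resistor, ohms
    t_*_c, v_*      -- measurement temperatures in C and the Vout read at each, volts
    ideality        -- diode ideality factor n (1.0 = ideal; an assumption)
    """
    measured_slope = (v_hot - v_cold) / (t_hot_c - t_cold_c)  # V/K
    # How much the Vout slope changes per ohm of R2: n * (k/q) * ln(N) / R1
    slope_per_ohm = ideality * K_OVER_Q * math.log(area_ratio) / r1
    # Rising output (positive slope) means R2 is too big, so reduce it, and vice versa
    return r2 - measured_slope / slope_per_ohm


# Made-up readings for illustration only: R2 = 3.25k, hypothetical R1 = 300 ohm,
# freezer at -18 C reads 1.0700 V, warm box at 38 C reads 1.0760 V.
print(f"suggested R2 = {trimmed_r2(3250, 300, -18, 1.0700, 38, 1.0760):.0f} ohm")
```

The sign of the correction matches the rule of thumb in the text: an output that rises with temperature pulls R2 down, and one that falls pushes R2 up.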