1. When you quantize a signal, you're basically rounding it to the nearest value you have a representation for. For example, a 4-bit ADC operating between 0 V and 1 V can represent the values 0 : 1/(2^4 - 1) : 1 (that is, 0 to 1 in steps of 1/15 V), where 1/(2^4 - 1) is one LSB. Rounding a signal value, say 0.214, to the nearest value you can represent, 0.2, introduces an error (0.014 in this case). If you quantize 1000 different signal values, then in probability terms the error will be approximately uniformly distributed between +/- 0.5 LSB.
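To make the rounding concrete, here is a minimal Python sketch of the 4-bit, 0 V to 1 V example. The names `quantize`, `LSB` and `LEVELS` are just illustrative, and the 1000 test values are random numbers I picked, not anything from a real ADC.

```python
import random

BITS = 4
LEVELS = 2 ** BITS          # 16 output codes (0..15)
LSB = 1.0 / (LEVELS - 1)    # step size for a 0 V to 1 V range, i.e. 1/15 V

def quantize(v):
    """Round v (in volts) to the nearest representable level."""
    code = round(v / LSB)
    code = max(0, min(LEVELS - 1, code))  # clip to the valid code range
    return code * LSB

# The example from the text: 0.214 V rounds to 0.2 V (code 3).
err_example = 0.214 - quantize(0.214)

# Quantize 1000 random values (an assumed test input) and look at the error.
random.seed(0)
errors = [v - quantize(v) for v in (random.random() for _ in range(1000))]
max_abs = max(abs(e) for e in errors)
rms = (sum(e * e for e in errors) / len(errors)) ** 0.5

print(f"LSB = {LSB:.4f} V, example error = {err_example:.3f} V")
print(f"max |error| = {max_abs / LSB:.3f} LSB, rms error = {rms / LSB:.3f} LSB")
```

The largest error should never exceed 0.5 LSB, and for a uniform distribution the rms error should land close to 1/sqrt(12), roughly 0.29 LSB.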
In the frequency domain, these errors look like noise which is spread uniformly across the spectrum. For a 4-bit ADC, in the output spectrum you would see your signal and also some noise somewhere around 24 dB down (4 bits at roughly 6 dB/bit). Noise shaping involves shaping this noise so that less of it appears at low frequency (where your signal is) and more of it appears at high frequency, so that, for example, your noise floor will be -80 dB in the signal band but -10 dB at high frequency. That's not a big deal, as you can use a low-pass filter at the output of your ADC, with a cutoff above the highest signal frequency but below the point at which the noise starts getting too high, to bring the high-frequency noise down to -80 dB as well. So in this example, noise shaping has achieved about 13-bit performance (80 dB at 6 dB/bit) from a 4-bit ADC.
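The dB-to-bits arithmetic above is just the 6 dB/bit rule of thumb applied in both directions. A tiny sketch, where the function names are made up for illustration:

```python
DB_PER_BIT = 6.0  # rule of thumb from the text (more precisely about 6.02 dB)

def noise_floor_db(bits):
    """Approximate noise level relative to full scale for a given bit count."""
    return -DB_PER_BIT * bits

def effective_bits(noise_db):
    """Bit count that would give the same noise level without shaping."""
    return abs(noise_db) / DB_PER_BIT

print(noise_floor_db(4))               # 4-bit converter: -24.0 dB
print(round(effective_bits(-80.0), 1)) # -80 dB in-band floor: 13.3 bits
```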
Have a look at this Intersil application note: **broken link removed**, especially figure 7C.
2. Noise shaping uses a high-pass filter to attenuate the quantization noise in the signal band and move it to high frequency (the filter acts on the quantization noise, not on the signal itself). First-order noise shaping uses a first-order high-pass filter.
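One common way to get a first-order high-pass on the noise is error feedback: subtract the previous sample's quantization error before quantizing the next one, so the output is x[n] + e[n] - e[n-1]. The sketch below assumes that structure (the answer above doesn't say which circuit is used), a slow sine as test input, and a crude moving average standing in for the output low-pass filter. All names are illustrative.

```python
import math

BITS = 4
LEVELS = 2 ** BITS
LSB = 1.0 / (LEVELS - 1)

def quantize(v):
    code = max(0, min(LEVELS - 1, round(v / LSB)))
    return code * LSB

def plain(x):
    """Ordinary quantization, no noise shaping."""
    return [quantize(s) for s in x]

def first_order_shaped(x):
    """Error feedback: output = x[n] + e[n] - e[n-1] (first-order high-pass on e)."""
    out, prev_err = [], 0.0
    for s in x:
        v = s - prev_err      # subtract the previous quantization error
        y = quantize(v)
        prev_err = y - v      # error made on this sample
        out.append(y)
    return out

def moving_average(x, m):
    """Crude low-pass filter: average of the last m samples."""
    return [sum(x[i - m + 1:i + 1]) / m for i in range(m - 1, len(x))]

def rms_diff(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

N, M = 4000, 32  # assumed: 4000 samples, 32-tap averaging filter
x = [0.5 + 0.4 * math.sin(2 * math.pi * n / 2000) for n in range(N)]

ref = moving_average(x, M)
plain_rms = rms_diff(moving_average(plain(x), M), ref)
shaped_rms = rms_diff(moving_average(first_order_shaped(x), M), ref)

print(f"in-band error, plain : {plain_rms / LSB:.4f} LSB rms")
print(f"in-band error, shaped: {shaped_rms / LSB:.4f} LSB rms")
```

After the low-pass, the shaped version should show a much smaller error than plain quantization, because the averaged error telescopes to (e[n] - e[n-M])/M.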
3. Second-, third-, or nth-order noise shaping uses a second-, third-, or nth-order high-pass filter.
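To see what the order buys you, here is a small sketch that evaluates the textbook nth-order noise transfer function (1 - z^-1)^n at a low and a high frequency. That particular filter form is my assumption (the answer only says "nth-order high-pass"), and `ntf_gain_db` is a made-up helper name.

```python
import cmath
import math

def ntf_gain_db(order, f_over_fs):
    """Gain in dB of (1 - z^-1)^order at normalized frequency f/fs."""
    z_inv = cmath.exp(-2j * math.pi * f_over_fs)
    gain = abs(1 - z_inv) ** order
    return 20 * math.log10(gain)

for order in (1, 2, 3):
    low = ntf_gain_db(order, 0.005)   # well inside the signal band
    high = ntf_gain_db(order, 0.5)    # at half the sample rate
    print(f"order {order}: {low:7.1f} dB at f/fs = 0.005, {high:5.1f} dB at f/fs = 0.5")
```

Each extra order pushes the low-frequency noise down further (the dB attenuation scales with the order) at the cost of more noise gain near half the sample rate, which the output low-pass filter then has to remove.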
4. No.