The quick explanation is that multiplying in the FFT domain is a cyclic convolution in the time domain. For example, people sometimes try to filter data by taking an N-point FFT, adjusting coefficients, and then doing the inverse FFT. This would be nice if it always worked. But even a simple problem, like modeling a one-sample delay, is impossible this way. The FFT takes in N samples and outputs N samples, while true (linear) convolution takes in N and outputs N+k (k being the FIR filter's memory, i.e. its length minus one).
The cyclic convolution of a unit delay places the last sample at the front of the output (e.g., FFT, apply "delay" coefficients, then inverse FFT). This means FFT + EQ + iFFT lets you model a filter/channel only when the last samples in the time domain smear across the first ones, i.e. when the convolution wraps around.
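To make the wrap-around concrete, here is a minimal NumPy sketch of the unit-delay case. The block size N = 8 and the test signal are arbitrary choices for illustration, not anything from a real system.

```python
import numpy as np

N = 8
x = np.arange(1.0, N + 1)        # illustrative input: 1, 2, ..., 8
h = np.array([0.0, 1.0])         # unit delay written as a 2-tap FIR (memory k = 1)

# True (linear) convolution: N samples in, N + k samples out.
linear = np.convolve(x, h)

# FFT -> multiply by the "delay" coefficients -> inverse FFT.
H = np.fft.fft(h, N)             # zero-padded to N points
cyclic = np.fft.ifft(np.fft.fft(x) * H).real

print(len(linear), len(cyclic))  # linear is one sample longer than cyclic
print(np.round(linear, 6))       # delayed input, last sample spills past N
print(np.round(cyclic, 6))       # last sample wrapped to the front instead
```

The FFT route gives the same result as `np.roll(x, 1)`: the sample that linear convolution pushes out past the end of the block shows up at the front.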
So basically, the CP sets this up. You copy the last k samples of the symbol to the front, output N+k samples, and pass them through the channel. The channel convolves, so there is energy in N+k+k samples*. Samples k to N+k are ideal though. The first k transmitted samples (the CP, which are copies of the last original samples, N-k to N) have smeared into the start of this window, just as if the end of the symbol had wrapped around. This looks exactly like the cyclic convolution that the FFT is so good with. (The first k received samples have whatever the last symbol was smeared into them, and samples N+k to N+2k smear into the next symbol.)
* assuming N symbol samples + a k-sample cyclic prefix + k samples of channel spread
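A rough way to check the CP argument numerically is sketched below. Everything in it is an assumption made for illustration: the sizes N = 16 and k = 3, the channel taps, the random symbols, and the helper name `add_cp`. The channel has memory k (so k+1 taps), and the CP is exactly k samples long.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 16, 3
h = np.array([1.0, 0.5, 0.25, 0.125])   # hypothetical channel, memory k = 3

prev_sym = rng.standard_normal(N)        # symbol sent just before ours
sym = rng.standard_normal(N)             # the symbol we care about

def add_cp(s, k):
    """Prepend a copy of the last k samples (the cyclic prefix)."""
    return np.concatenate([s[-k:], s])

# Two back-to-back blocks of N + k samples through a linear channel.
tx = np.concatenate([add_cp(prev_sym, k), add_cp(sym, k)])
rx = np.convolve(tx, h)

# Skip the previous block and our own CP, then keep N samples.
start = (N + k) + k
window = rx[start:start + N]

# What a cyclic convolution of the bare symbol with the channel would give.
H = np.fft.fft(h, N)
cyclic = np.fft.ifft(np.fft.fft(sym) * H).real
print(np.allclose(window, cyclic))       # the window matches the cyclic model

# FFT + EQ + iFFT: one complex division per bin undoes the channel.
recovered = np.fft.ifft(np.fft.fft(window) / H).real
print(np.allclose(recovered, sym))
```

The smear from `prev_sym` lands only in the discarded CP samples, which is why the kept window is untouched by the previous symbol. The per-bin division works here because these particular taps have no spectral nulls; a real channel may not be so kind.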