Typically you convert to an IF so that you can use very sharp, fixed-frequency filters to knock down interfering signals that sit close to your carrier in frequency. That way, the receiver can demodulate just the signal on the RF carrier you choose, and not be drowned out by the signal from a nearby carrier. This is called selectivity... how well your receiver can "hear" the desired signal and reject the undesired ones. If you went from RF down to baseband directly, you'd need absurdly steep filters at RF, and they would have to retune for every station you wanted to receive, which is impractical to build.
The baseband signal is the "information" you want to send/receive. The RF is just a carrier... the way that carrier is modulated contains the information. Take a terrestrial radio station. The carrier frequency may be 88.1 MHz, but the music is Frequency Modulated (hence why it's called FM) onto it. If you looked at 88.1 MHz on a spectrum analyzer, you'd see that the signal occupies roughly a couple hundred kilohertz around the carrier... that width is set by the audio being sent and by how far the modulation swings the carrier (the deviation), so it is wider than the audio bandwidth itself. You mix the whole signal from RF to IF (the modulation remains on the IF signal), you run it through a steep filter (like a quartz crystal filter) to attenuate all the other signals coming in from the antenna, then mix the IF down to 0 Hz and demodulate, and what you are left with is just the modulation signal (the audio).
Example:
RF = 88.1 MHz
LO1 = 58.1 MHz (first downmix)
LO2 = 30 MHz (second downmix)
IF = 30 MHz
RF - LO1 = IF (88.1 MHz - 58.1 MHz = 30 MHz)
IF - LO2 = Baseband (30 MHz - 30 MHz = 0 Hz)
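
The arithmetic above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the mixer is treated as ideal and only its difference product is kept (a real mixer also produces the sum), and the 88.3 MHz neighbor, the 101.5 MHz station, and the helper names `downconvert` and `lo1_for_station` are made up for the example.

```python
def downconvert(freq_hz, lo_hz):
    """Difference frequency out of an ideal mixer (the sum product is ignored)."""
    return abs(freq_hz - lo_hz)


def lo1_for_station(rf_hz, if_hz):
    """First LO needed to land a station on the fixed IF (LO below the RF)."""
    return rf_hz - if_hz


# Values from the example above, in Hz (integers keep the arithmetic exact).
RF_HZ = 88_100_000
LO1_HZ = 58_100_000
LO2_HZ = 30_000_000

# Assumption: a hypothetical interfering station 200 kHz above the wanted one.
NEIGHBOR_RF_HZ = 88_300_000

if_hz = downconvert(RF_HZ, LO1_HZ)                    # wanted signal at the IF
baseband_hz = downconvert(if_hz, LO2_HZ)              # wanted signal at baseband
neighbor_if_hz = downconvert(NEIGHBOR_RF_HZ, LO1_HZ)  # interferer at the IF

# The interferer keeps its 200 kHz offset after the first mix, so a sharp
# filter centered on the fixed IF can reject it.
neighbor_offset_hz = neighbor_if_hz - if_hz

# Tuning to another (hypothetical) station only moves LO1; the IF filter stays put.
other_lo1_hz = lo1_for_station(101_500_000, if_hz)

print(f"IF: {if_hz / 1e6} MHz")
print(f"Baseband: {baseband_hz} Hz")
print(f"Neighbor at IF: {neighbor_if_hz / 1e6} MHz")
print(f"LO1 for 101.5 MHz: {other_lo1_hz / 1e6} MHz")
```

The point to notice is that the wanted station always ends up at the same IF no matter which RF you tune to, so the steep filter only ever has to be built for one frequency.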