I will try to explain it as I understood it.
First, let's look at the inputs of a general circuit. You will find that each one consists of three parts: AC (the signal you need to amplify) + DC (the biasing voltage) + noise.
Second: the DC is common to both sides.
Third: any noise affects both sides (v1, v2) by the same amount, so the circuit sees it as a common input (this is reflected in the common mode, CM: both inputs move up and down together). The common-mode signal is whatever is the same on both sides. We need to reject this part as much as we can, because we don't want the noise to appear at the output. You will understand it better from the analysis.
As for the differential mode: it represents the signals where one side goes up when the other goes down. This is our desired signal, so we amplify it as much as we can.
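
To make the split concrete, here is a small sketch in Python. It assumes the usual textbook definitions v_cm = (v1 + v2) / 2 and v_d = v1 - v2, which are not stated above. The bias, noise, signal, and gain values (`A_d`, `A_cm`) are made up purely for illustration, and the function name `decompose` is hypothetical.

```python
def decompose(v1, v2):
    """Split two input voltages into common-mode and differential-mode parts."""
    v_cm = (v1 + v2) / 2   # the part that is the same on both sides
    v_d = v1 - v2          # the part that is opposite on the two sides
    return v_cm, v_d

# Hypothetical values, chosen only for illustration
bias = 2.5      # DC bias in volts, same on both inputs
noise = 0.1     # noise in volts, picked up equally by both inputs
signal = 0.01   # wanted signal in volts, applied as +signal/2 and -signal/2

v1 = bias + noise + signal / 2
v2 = bias + noise - signal / 2

v_cm, v_d = decompose(v1, v2)

# Assumed gains: large differential gain, small common-mode gain
A_d = 100.0
A_cm = 0.01
v_out = A_d * v_d + A_cm * v_cm

print("common mode:", v_cm)
print("differential mode:", v_d)
print("output:", v_out)
```

With these numbers, the bias and the noise end up entirely in the common-mode part, and only the wanted signal ends up in the differential part. The smaller `A_cm` is compared to `A_d`, the less the bias and noise show up at the output.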