The purpose of CTS (clock tree synthesis) is to build a buffer tree that delivers the clock to every flop. The tree has to be physically implemented on the chip; otherwise, how would your flops receive a clock and function?
One doubt regarding the macro spacing formula:
(width + spacing x number of pins / vertical routing layers) + spacing. Does width mean the core width or the macro width? The same question applies to spacing. And which pins are meant, the I/O pins or the macro pins?
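To make the arithmetic concrete, here is a minimal sketch of one possible reading of the formula. It assumes, and this is only an assumption, that width and spacing are the routing wire width and wire-to-wire spacing, that the pins are the macro pins facing the channel, and that the grouping is (width + spacing) x pins / layers + spacing. The function name, parameter names, and numbers are hypothetical and do not come from any tool or library.

```python
def macro_channel_spacing(wire_width, wire_spacing, num_pins, num_vertical_layers):
    """Estimate the channel to leave next to a macro (same units as the inputs).

    Assumed reading of the formula (not confirmed by the thread):
        ((width + spacing) * pins / vertical_layers) + spacing
    """
    if num_vertical_layers <= 0:
        raise ValueError("need at least one vertical routing layer")
    # Each routed pin needs one track: a wire width plus one spacing.
    track_pitch = wire_width + wire_spacing
    # The pins are shared across the available vertical routing layers.
    tracks_per_layer = num_pins / num_vertical_layers
    # One extra spacing closes off the last track against the neighbor.
    return track_pitch * tracks_per_layer + wire_spacing


# Hypothetical numbers: 0.5 um width, 0.5 um spacing, 40 pins, 2 vertical layers.
print(macro_channel_spacing(0.5, 0.5, 40, 2))  # 20.5
```

If width and spacing turn out to mean something else, only the inputs change; the arithmetic stays the same.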