Clocks and reset signals are dealt with in the back end. Only in the back end do you really know how to balance them and what insertion delay is reasonable for each clock.
First, after synthesis in Synopsys, inspect the structure of the gated clock tree. Then, in SE or Apollo, set proper clock tree generation constraints according to that structure. Arriving at the insertion delay takes several iterations.
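To make the iteration concrete, here is a rough Python sketch of the check you might run on each pass: take the insertion delay reported for every clock sink, then work out the longest delay, the skew, and how much extra delay each sink would need to match the slowest one. The function name, the sink names, and the numbers are all made up for illustration; this is not the output format of SE, Apollo, or any Synopsys tool.

```python
def clock_tree_summary(sink_delays_ns):
    """Summarize a clock tree from per-sink insertion delays (in ns).

    sink_delays_ns: dict mapping a sink name to its insertion delay.
    Returns (longest delay, skew, padding needed per sink).
    """
    longest = max(sink_delays_ns.values())
    shortest = min(sink_delays_ns.values())
    # Skew is the spread between the slowest and fastest sink.
    skew = round(longest - shortest, 3)
    # Padding is the extra delay each sink needs to match the slowest one.
    padding = {name: round(longest - delay, 3)
               for name, delay in sink_delays_ns.items()}
    return longest, skew, padding


# Hypothetical numbers, as if read from a clock tree report by hand.
delays = {"ff_a": 1.2, "ff_b": 0.9, "ff_c": 1.5}
longest, skew, padding = clock_tree_summary(delays)
print(longest, skew, padding)
```

If the skew is still too large after a pass, you tighten the constraints and run clock tree generation again.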
Wishing you luck.
Another option is to insert lockup latches for delay insertion between flops in different clock domains when you put them in one scan chain. If latency needs to be balanced for some functional reason, we again need delay insertion.
The first option (forcing balanced clock latency in the clock tree synthesis tool) is better, of course.
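For the latch option above, a simplified Python sketch of where the latches would go: walk the scan chain in order and mark every point where two neighbouring flops are in different clock domains. The flop and clock names are hypothetical, and a real DFT tool also looks at clock edges and the direction of the skew, so treat this only as an illustration of the idea.

```python
def latch_insert_points(scan_chain):
    """Find the domain crossings in a scan chain.

    scan_chain: list of (flop_name, clock_domain) tuples in scan order.
    Returns the indices i where a latch would sit between
    scan_chain[i] and scan_chain[i + 1].
    """
    return [i for i in range(len(scan_chain) - 1)
            if scan_chain[i][1] != scan_chain[i + 1][1]]


# Hypothetical chain mixing two clock domains.
chain = [("f0", "clkA"), ("f1", "clkA"), ("f2", "clkB"),
         ("f3", "clkB"), ("f4", "clkA")]
print(latch_insert_points(chain))  # [1, 3]
```

Grouping flops of the same domain together in the chain keeps the number of crossings, and so the number of latches, small.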
When you do synthesis, you can set dont_touch on the clock and reset signals. Then do the gated clock synthesis during P&R, using Apollo or another tool. If the clock runs between macros, you must define synchronous pins on the clock pins. The tool will then add delay cells (of the type you have defined).