Think of a clock gate as a simple AND gate, with an enable gating the clock. The point is to stop unnecessary toggling on the clock pins of flops. Even when a flop's output doesn't toggle, its internal circuitry still switches on every clock edge and dissipates power. Gating the clock with an enable saves that power.
Here's the catch: if the enable is asynchronous to the clock and changes during the clock's active phase, you can end up with a clipped clock pulse, which affects the duty cycle. This can lead to timing violations on the flop and downstream logic. If the clipping happens very close to the active edge of the clock, there might even be a minimum pulse width violation.
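To make the clipping concrete, here is a minimal simulation sketch of the naive AND gate with no latch. The module name, the 10 ns clock period and the enable timing are all made up for illustration.

```systemverilog
`timescale 1ns/1ps
// Hypothetical testbench: naive AND gating with an enable that is not aligned to clk
module tb_naive_gate;
  logic clk = 0;
  logic en  = 0;
  wire  gated_clk = clk & en;   // no latch: en reaches the AND gate directly

  always #5 clk = ~clk;         // 10 ns period, clk high during 5-10, 15-20, 25-30 ns

  initial begin
    #8  en = 1;                 // t=8: rises while clk is high, so the first pulse is only 8-10 ns
    #19 en = 0;                 // t=27: falls while clk is high, so the 25-30 ns pulse is cut at 27 ns
    #13 $finish;
  end

  initial $monitor("%0t clk=%b en=%b gated_clk=%b", $time, clk, en, gated_clk);
endmodule
```

Only the 15-20 ns pulse comes through at full width. The other two are 2 ns slivers, which is exactly the kind of pulse that can violate the flop's minimum pulse width.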
To prevent these violations, it's best to sync the enable signal to the clock it is gating. This is done with a latch that is transparent only during the inactive phase of the clock. For example, to gate the clock to its low state, use an active-low latch to sync the enable, and gate the clock with the sync'ed version, as in the code below:
Code:
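// Active-low latch: transparent while clk is low, holds en while clk is high
always_latch
  if (!clk) en_syn <= en;
// en_syn can only change while clk is low, so gated_clk is never clipped
assign gated_clk = clk & en_syn;
always_ff @(posedge gated_clk) ....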
Many ASIC vendors supply clock-gate cells, which are internally a combination of the latch and the gate described above. These cells can be instantiated in the design or, better yet, inserted during synthesis (provided the RTL is coded to meet the tool's requirements for clock-gate insertion).
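For the instantiation route, a rough behavioral stand-in for such a cell might look like the sketch below. The module and port names (clk_gate_model, gated_bank, u_cg) are hypothetical. Real library cells have vendor-specific names and pins, and often an extra test-enable input that is left out here.

```systemverilog
// Hypothetical behavioral model of a latch-based clock-gate cell
module clk_gate_model (
  input  logic clk,
  input  logic en,
  output logic gated_clk
);
  logic en_lat;

  // Transparent while clk is low; holds en for the whole high phase
  always_latch
    if (!clk) en_lat <= en;

  assign gated_clk = clk & en_lat;
endmodule

// Example use: one gate instance feeding a small bank of flops
module gated_bank (
  input  logic       clk,
  input  logic       en,
  input  logic [7:0] d,
  output logic [7:0] q
);
  logic gclk;

  clk_gate_model u_cg (.clk(clk), .en(en), .gated_clk(gclk));

  always_ff @(posedge gclk) q <= d;
endmodule
```

In a real flow you would swap clk_gate_model for the vendor's cell, usually behind a wrapper so the RTL stays portable between libraries.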
The clock_gating_setup_time is essentially the setup requirement of the latch. It is available for STA delay annotation in the .db libraries read by PT (PrimeTime).
It's worth mentioning that clock gating doesn't buy much on individual flops. But imagine writing a 64-bit register based on an enable. The clock power dissipated by all 64 flops can be cut substantially by using a single clock-gate cell common to all of them, which adds up to a very significant power saving. On a side note, clock gating also saves area in this scenario, because it gets rid of the mux'ed feedback path on each flop that would otherwise be needed to hold the register value when the enable is low.
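The 64-bit case can be sketched two ways. The module names reg64_mux and reg64_gated are hypothetical. Style A is the plain enable coding, which synthesis tools typically recognize as a candidate for automatic clock-gate insertion, though the exact requirements depend on the tool. Style B is the hand-gated equivalent using the latch and AND gate from earlier.

```systemverilog
// Style A: enable handled in the data path. Without clock gating this maps to
// 64 flops, each with a 2:1 feedback mux that recirculates q when en is low.
module reg64_mux (
  input  logic        clk,
  input  logic        en,
  input  logic [63:0] d,
  output logic [63:0] q
);
  always_ff @(posedge clk)
    if (en) q <= d;
endmodule

// Style B: same register behavior with one shared clock gate and no feedback muxes
module reg64_gated (
  input  logic        clk,
  input  logic        en,
  input  logic [63:0] d,
  output logic [63:0] q
);
  logic en_syn, gated_clk;

  // Active-low latch syncs the enable to clk
  always_latch
    if (!clk) en_syn <= en;

  assign gated_clk = clk & en_syn;

  // Flops only see a clock edge in cycles where en was high
  always_ff @(posedge gated_clk) q <= d;
endmodule
```

In Style B the 64 clock pins stay quiet whenever en is low, and the cost is one latch and one AND gate shared across the whole register.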