The data width of the GPU/AI L1/L2 cache is very large (256 bytes), so my invention is not directly applicable to it.
I have now invented a new SRAM macro partition that reduces the power consumption of the GPU/AI L1/L2 cache to 15%~30%.
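As a rough arithmetic sketch of the two figures above: the 256-byte data width and the 15%~30% range come from the text, while the function names and the normalized baseline are hypothetical. The sketch assumes "to 15%~30%" means the cache then consumes 15% to 30% of its original power; the text does not state this explicitly.

```python
# Values taken from the text.
DATA_WIDTH_BYTES = 256
BITS_PER_BYTE = 8


def data_width_bits(width_bytes=DATA_WIDTH_BYTES):
    """Return the cache data width in bits (256 bytes -> 2048 bits)."""
    return width_bytes * BITS_PER_BYTE


def remaining_power(baseline, low=0.15, high=0.30):
    """Return the (min, max) power left after the new partition.

    ASSUMPTION: "to 15%~30%" is read as the remaining fraction of the
    baseline power, not as the amount saved.
    """
    return (baseline * low, baseline * high)


def saved_power(baseline, low=0.15, high=0.30):
    """Return the (min, max) power saved under the same assumption."""
    rem_low, rem_high = remaining_power(baseline, low, high)
    return (baseline - rem_high, baseline - rem_low)


if __name__ == "__main__":
    # Hypothetical baseline of 100 arbitrary power units.
    print(data_width_bits())
    print(remaining_power(100.0))
    print(saved_power(100.0))
```

Under this reading, the saving would be 70% to 85% of the baseline; if the range instead denotes the saving itself, the two functions swap roles.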
Fig. 10 explains the statement "Referring to Fig. 2, Fig. 5, and Fig. 6, the bit line outputs are arranged in the same way (just at S0, S4, S8, S12 TOP), so no additional routing power is needed."
Because all bit line outputs of each small part go to S0, S4, S8, S12 TOP, no mux is needed and no routing power is wasted.
Note: The arrangement in Fig. 6 is marked for a small part. All bit line outputs are shown in Fig. 10.