Keep the number of embedded RAM instances as small as you can.
Remember that each RAM instance carries its own BIST-related overhead, so 1 x 4KB takes less area than 2 x 2KB, and so on.
It is true that a smaller memory is faster, but speed is generally not the issue.
Only if the embedded RAM size causes routing problems in the backend should you consider dividing the RAM into more pieces.
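To make the area argument concrete, here is a rough sketch in Python. It assumes a simple model where every RAM instance pays a fixed overhead (BIST logic, decoder, periphery) on top of its bit-cell array. The function name and both constants are made up for illustration and are not taken from any foundry data; real figures come from the memory generator.

```python
def total_area(total_bytes, n_instances,
               area_per_byte=1.0, overhead_per_instance=500.0):
    """Total area in arbitrary units for a memory split into n_instances.

    area_per_byte and overhead_per_instance are hypothetical values,
    chosen only to show the trend.
    """
    if n_instances < 1 or total_bytes % n_instances != 0:
        raise ValueError("total_bytes must divide evenly into n_instances")
    cell_area = total_bytes * area_per_byte          # same no matter how it is split
    overhead = n_instances * overhead_per_instance   # grows with every extra instance
    return cell_area + overhead


one_4kb = total_area(4096, 1)
two_2kb = total_area(4096, 2)
four_1kb = total_area(4096, 4)
print(one_4kb, two_2kb, four_1kb)
```

Under this model the bit-cell area never changes, so every additional instance only adds overhead.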
If area is your primary concern, use a single 4KB block directly. You then don't pay extra area for a separate decoder in each smaller block. If power is your design target, multiple smaller blocks (e.g. 4 x 1KB) are better, because only 1/4 of the circuit is active during an access. Compared with a single 4KB block, up to roughly 3/4 of the access power is saved, provided the unselected blocks stay disabled.
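The 3/4 figure is the ideal case. As a rough sketch, assume the dynamic access power scales with the size of the block being accessed, while any leakage is paid by all blocks all the time. The `leakage_fraction` parameter is a hypothetical knob for illustration, not a measured value.

```python
def relative_power(n_blocks, leakage_fraction=0.0):
    """Power of an n_blocks split relative to one big block (= 1.0).

    leakage_fraction is the assumed share of the single-block power that
    is leakage and therefore not reduced by splitting (hypothetical).
    """
    if n_blocks < 1 or not 0.0 <= leakage_fraction <= 1.0:
        raise ValueError("invalid arguments")
    dynamic = (1.0 - leakage_fraction) / n_blocks  # only one block is accessed
    return dynamic + leakage_fraction              # leakage is unchanged


ideal_saving = 1.0 - relative_power(4)                       # no leakage: 0.75
leaky_saving = 1.0 - relative_power(4, leakage_fraction=0.2)
print(ideal_saving, leaky_saving)
```

With no leakage the model reproduces the 3/4 saving. Any leakage share, or extra power spent on block-select logic, reduces it, which is why the numbers from the actual RAM generator should decide.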
You can ask the foundry for a RAM generator. They may have it in their design kit. With the generator, you will know the exact size, operating speed and estimated power consumption.