Implementing a full-chip design is broadly similar to block-level implementation, but it demands a deeper understanding of the design architecture and a clear view of many issues up front. A full chip can be implemented either flat or hierarchically. If the die size is less than 10 mm², we may prefer a flat implementation. Flat implementation has several advantages over hierarchical implementation. It can save area, because no additional channels need to be left between partitions. It can save power, because over-constraining the interface paths in a hierarchical flow may make the tool use higher-drive-strength buffers than required. It also avoids the difficulty of closing a critical path whose logic is spread across multiple blocks, where we might have to open several blocks to fix the timing. In short, a hierarchical implementation may waste area, power, and timing. Hierarchical designs have their own benefits, however: they let us attack the problem by divide and conquer, because blocks can be distributed among engineers, closed with good QoR, and integrated back into the top level. If multiple instances of the same block are used in the design, a hierarchical flow can also save run time and reduce the CPU memory footprint.
When partitioning the design, we basically follow a min-cut approach: partition the logic so that the number of pins crossing between partitions is as small as possible. Functions such as the CPU, GPU, security processor, and peripherals like USB, as well as separate voltage islands and high-speed interfaces like DDR and SATA, should be partitioned into separate blocks. If a small piece of logic is repeated many times, it is recommended to design it as a separate block. You might also need to plan feedthrough ports so that some paths can reach their destination (sink) by a shorter route instead of detouring. Always try to minimize the channels as much as possible by abutting blocks.
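To make the min-cut idea concrete, here is a minimal Kernighan-Lin bipartitioning sketch. Kernighan-Lin is one classic min-cut heuristic and is used here only as an illustration, not as the algorithm any particular tool uses. The sketch assumes the netlist has already been reduced to a graph of two-pin connections with unit weights (real partitioners work on weighted hypergraphs), and the cell names in the usage example are hypothetical.

```python
from itertools import product


def cut_size(edges, part_a):
    """Number of connections whose endpoints lie in different partitions."""
    return sum((u in part_a) != (v in part_a) for u, v in edges)


def _d_value(adj, node, own, other):
    """External minus internal connections of a node (Kernighan-Lin D value)."""
    return len(adj[node] & other) - len(adj[node] & own)


def kernighan_lin(nodes, edges, max_passes=10):
    """Balanced two-way min-cut heuristic with unit edge weights."""
    nodes = sorted(nodes)  # sorted for a deterministic starting split
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    half = len(nodes) // 2
    a, b = set(nodes[:half]), set(nodes[half:])

    for _ in range(max_passes):
        work_a, work_b = set(a), set(b)
        locked, swaps, gains = set(), [], []
        for _ in range(min(len(a), len(b))):
            best = None
            for x, y in product(sorted(work_a - locked), sorted(work_b - locked)):
                gain = (_d_value(adj, x, work_a, work_b)
                        + _d_value(adj, y, work_b, work_a)
                        - 2 * (y in adj[x]))
                if best is None or gain > best[0]:
                    best = (gain, x, y)
            gain, x, y = best
            # Tentatively swap the best pair and lock both nodes for this pass.
            work_a.remove(x); work_b.add(x)
            work_b.remove(y); work_a.add(y)
            locked.update((x, y))
            swaps.append((x, y))
            gains.append(gain)

        # Keep only the prefix of swaps with the largest positive total gain.
        best_k, best_total, running = 0, 0, 0
        for k, gain in enumerate(gains, start=1):
            running += gain
            if running > best_total:
                best_k, best_total = k, running
        if best_k == 0:
            break  # no improving prefix: local optimum reached
        for x, y in swaps[:best_k]:
            a.remove(x); b.add(x)
            b.remove(y); a.add(y)
    return a, b


if __name__ == "__main__":
    # Hypothetical netlist: two 4-cell clusters joined by one net (c7-c8).
    cluster_x = ["c1", "c3", "c5", "c7"]
    cluster_y = ["c2", "c4", "c6", "c8"]
    edges = [(u, v) for grp in (cluster_x, cluster_y)
             for i, u in enumerate(grp) for v in grp[i + 1:]]
    edges.append(("c7", "c8"))
    nodes = cluster_x + cluster_y
    print("initial cut:", cut_size(edges, set(sorted(nodes)[:4])))
    a, b = kernighan_lin(nodes, edges)
    print(sorted(a), sorted(b), "cut =", cut_size(edges, a))
```

Starting from the name-sorted split (cut of 8), the swaps regroup the two clusters so that only the single bridging net crosses the boundary, which is the "as few crossing pins as possible" goal described above.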
In one of our designs, to contain DDR-related timing problems, we created a dedicated DDR block. The DDR logic needs skew balancing and interconnect balancing, and doing this at the top level could cost a couple of days of turnaround time (TAT). So we moved the DDR logic, excluding the DDR IOs, into one block. This confined all of the above problems within that block, and meeting all the constraints became the block designer's responsibility. Once the block was closed, we abutted it directly to the IOs and planned only direct connections from the DDR block's IO ports to the IO ring, so no channel routing was needed.
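The skew-balancing goal inside the DDR block can be pictured as a simple check on arrival times within each byte lane: skew is the latest arrival minus the earliest arrival. This sketch assumes per-bit arrival times (in ps) have already been reported by the STA tool; the lane and bit names, the arrival values, and the 20 ps limit are all hypothetical and only illustrate the arithmetic.

```python
def lane_skew(arrivals_ps):
    """Skew of one group of signals: latest arrival minus earliest arrival."""
    return max(arrivals_ps.values()) - min(arrivals_ps.values())


def check_lanes(lanes, limit_ps):
    """Return {lane: skew_ps} for every lane whose skew exceeds limit_ps."""
    violations = {}
    for lane, arrivals in lanes.items():
        skew = lane_skew(arrivals)
        if skew > limit_ps:
            violations[lane] = skew
    return violations


if __name__ == "__main__":
    # Hypothetical arrival times (ps) for two byte lanes and their strobes.
    lanes = {
        "byte0": {"dq0": 412, "dq1": 405, "dq2": 418, "dqs0": 410},
        "byte1": {"dq8": 398, "dq9": 441, "dq10": 402, "dqs1": 400},
    }
    print(check_lanes(lanes, limit_ps=20))
```

With these made-up numbers, byte0 has 13 ps of skew and passes, while byte1 has 43 ps (441 - 398) and is flagged; keeping such fixes inside the block is what saves the top-level turnaround time.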
Always try to keep timing-critical logic within a single block.
Timing budgeting: for interface paths, allocate 30% of the clock period to the logic inside the block and 50% to the external logic, and keep the remaining 20% as margin. Relax the constraints whenever required.
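The 30/50/20 split is simple arithmetic on the clock period, and it can be turned into an SDC input-delay value. This is a minimal sketch under one stated assumption: everything not budgeted to the block (external logic plus margin) is modelled as input delay on the port. The port and clock names are hypothetical.

```python
def io_budget(period_ns, block_frac=0.30, external_frac=0.50, margin_frac=0.20):
    """Split a clock period into block, external, and margin portions (ns)."""
    if abs(block_frac + external_frac + margin_frac - 1.0) > 1e-9:
        raise ValueError("fractions must add up to 1.0")
    return {
        "block": round(period_ns * block_frac, 3),
        "external": round(period_ns * external_frac, 3),
        "margin": round(period_ns * margin_frac, 3),
    }


def input_delay_cmd(port, clock, period_ns, block_frac=0.30):
    """Build an SDC set_input_delay command for one input port.

    Assumption: the external budget plus the margin (1 - block_frac of the
    period) is given to the outside world as input delay, so only block_frac
    of the period is left for the path inside the block.
    """
    delay = round(period_ns * (1.0 - block_frac), 3)
    return f"set_input_delay -clock {clock} {delay} [get_ports {port}]"


if __name__ == "__main__":
    print(io_budget(2.0))
    print(input_delay_cmd("din", "clk", 2.0))
```

For a 2 ns clock this gives 0.6 ns for the block, 1.0 ns external, and 0.4 ns margin, and an input delay of 1.4 ns. Relaxing the constraint, as the text suggests, amounts to raising `block_frac` for the paths that need it.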
For STA, you can extract the SPEF and netlist for each block, and all of these runs can proceed in parallel. You can then read in the netlists and SPEFs, link them, and run flat STA with hierarchical SPEFs. Alternatively, you can do a flat extraction and run flat STA. Or you can extract ILMs (interface logic models) for the blocks and run hierarchical STA. Constraints need to be merged carefully; otherwise you might see invalid paths.
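When block constraints are merged for top-level STA, one thing worth checking is that the same clock is not defined differently in different block SDCs. This sketch assumes the clock definitions have already been parsed into a simple name to (period, waveform) mapping per block. The block and clock names are hypothetical, and a real merge must also reconcile exceptions, generated clocks, and IO delays.

```python
from collections import defaultdict


def merge_clock_defs(block_clocks):
    """Merge per-block clock definitions and report inconsistent ones.

    block_clocks: {block: {clock: (period_ns, (rise_ns, fall_ns))}}
    Returns (merged, conflicts):
      merged    maps clock -> definition for consistently defined clocks;
      conflicts maps clock -> {definition: [blocks]} for the rest.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for block, clocks in block_clocks.items():
        for clock, definition in clocks.items():
            seen[clock][definition].append(block)

    merged, conflicts = {}, {}
    for clock, defs in seen.items():
        if len(defs) == 1:
            merged[clock] = next(iter(defs))
        else:
            conflicts[clock] = {d: sorted(blks) for d, blks in defs.items()}
    return merged, conflicts


if __name__ == "__main__":
    blocks = {
        "cpu_blk": {"core_clk": (1.0, (0.0, 0.5)), "apb_clk": (10.0, (0.0, 5.0))},
        "ddr_blk": {"core_clk": (1.0, (0.0, 0.5)), "ddr_clk": (1.25, (0.0, 0.625))},
        "usb_blk": {"apb_clk": (8.0, (0.0, 4.0))},  # period differs from cpu_blk
    }
    merged, conflicts = merge_clock_defs(blocks)
    print("merged:", merged)
    print("conflicts:", conflicts)
```

Here `core_clk` and `ddr_clk` merge cleanly, while `apb_clk` is reported because two blocks disagree on its period; left unresolved, that kind of mismatch is one way invalid paths show up in the merged run.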
Plan the channels between the blocks properly. Always try to add some spare and feedthrough ports, and spread those ports evenly on all sides. Plan the clock pins of all the blocks by sketching a rough clock tree on paper. If you have multiple voltage domains, place the power pads for each domain accordingly. Isolate crosstalk problems by adding shielding or blockages around the blocks. Minimize crosstalk on the clock tree as much as possible, because crosstalk on the common clock path cannot be removed and can impact timing closure. Add guard rings around high-speed circuitry such as PLLs and analog blocks. As far as possible, use the same tool versions for all the blocks. Prepare a list of don't-buffer and analog nets that need special care. You might also need to place glitch suppressors or similar logic manually using placement guides. Maintain a don't-size cell list.
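Spreading spare and feedthrough ports evenly on all sides can be sketched as placing pins at equal intervals along the block perimeter. This minimal sketch assumes a rectangular block with its lower-left corner at the origin and dimensions in microns; it ignores pin layers, track snapping, and keep-out regions that a real pin-placement step would honor. The port names are hypothetical.

```python
def spread_pins(width, height, names):
    """Place pins at equal spacing along the perimeter of a width x height block.

    Walks counter-clockwise from the lower-left corner: bottom, right, top,
    left. Returns {name: (side, x, y)}.
    """
    if not names:
        return {}
    perimeter = 2 * (width + height)
    step = perimeter / len(names)
    placed = {}
    for i, name in enumerate(names):
        d = step * (i + 0.5)  # start half a step in so each pin is centred in its slot
        if d < width:
            placed[name] = ("bottom", d, 0.0)
        elif d < width + height:
            placed[name] = ("right", float(width), d - width)
        elif d < 2 * width + height:
            placed[name] = ("top", 2 * width + height - d, float(height))
        else:
            placed[name] = ("left", 0.0, perimeter - d)
    return placed


if __name__ == "__main__":
    ports = [f"ft_spare_{i}" for i in range(8)]
    for name, (side, x, y) in spread_pins(100, 100, ports).items():
        print(f"{name:12s} {side:6s} x={x:6.1f} y={y:6.1f}")
```

On a 100 x 100 block, eight ports land two per side, which is the even spread the guideline asks for; a non-square block gets proportionally more pins on its longer sides.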