Being a newbie to timing analysis, I am neither doing a tape-out here nor selling this design to anyone.
I would appreciate technical explanations if possible.
I just tested the .bit file in bare-metal (after merging the ELF/binary) and I can see that R/W to the registers of peripheral IPs can be performed by the uP core.
Being a newbie or not has no relationship to "tape-out" or "selling" a design.
Static timing analysis is done to check whether the design will functionally work at the best-case (best P, high V, low T) and worst-case (worst P, low V, high T) corners.
Suppose you violate setup time by 1 ns and your clock period is 30 ns. At room temp with an average die (process), you're highly unlikely to see any kind of issue: the violation is only about 3% of the period, and there are probably a lot of cells and routing between source and destination.
If instead you violate setup time by 1 ns but your clock is 400 MHz (2.5 ns period), it's very likely that you will have intermittent setup violations (possibly even at room temp) that cause functional failures due to metastability. In this case your path delay stack-up requires 3.5 ns to work across PVT, which is 40% more than the 2.5 ns you have to work with.
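To put numbers on the two cases above, here is a quick sketch of the arithmetic in Python. The function name `setup_overshoot` is made up for illustration; the only inputs are the clock period and the size of the setup violation quoted above.

```python
def setup_overshoot(period_ns, violation_ns):
    """Return (time the path actually needs in ns, violation as a fraction of the period)."""
    required_ns = period_ns + violation_ns
    return required_ns, violation_ns / period_ns


# The two cases from the post: the same 1 ns violation on a slow and a fast clock.
for label, period_ns in [("30 ns clock", 30.0), ("400 MHz clock", 2.5)]:
    required_ns, fraction = setup_overshoot(period_ns, 1.0)
    print(f"{label}: path needs {required_ns:.1f} ns, {fraction:.1%} over the period")
```

The same 1 ns is roughly a 3% overshoot on the 30 ns clock but a 40% overshoot on the 2.5 ns clock, which is why the second case is the one that bites.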
What you should be doing is looking at why there are timing violations, which is usually because of a poor understanding of pipeline design and "how much you can do with a LUT with x inputs". Using a design with "bad" timing isn't the way to verify the functionality of a design, as you'll never know for certain whether errors in functionality are really due to functional problems or timing problems. Even if you are a hobbyist, you should fix timing violations.
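As a rough feel for the "how much you can do with a LUT" point, the sketch below counts how many LUT levels a simple N-input reduction (a wide AND/OR or an equality compare, say) needs when built as a tree of k-input LUTs. Everything here is an illustrative assumption: `lut_levels` is a made-up helper, real synthesis may use carry chains or wide muxes instead, and the `DELAY_PER_LEVEL_NS` figure is a placeholder, not a datasheet number. Take the real per-level logic and routing delays from your own timing report.

```python
def lut_levels(num_inputs, lut_size):
    """Depth of a tree of lut_size-input LUTs that reduces num_inputs signals to one."""
    if lut_size < 2:
        raise ValueError("lut_size must be at least 2")
    levels, signals = 0, num_inputs
    while signals > 1:
        signals = -(-signals // lut_size)  # ceiling division: LUTs needed at this level
        levels += 1
    return levels


# Placeholder delay per LUT level (logic + routing). Hypothetical value only.
DELAY_PER_LEVEL_NS = 0.5
PERIOD_NS = 2.5  # the 400 MHz clock from the example above

for width in (6, 32, 64, 256):
    levels = lut_levels(width, lut_size=6)
    estimate_ns = levels * DELAY_PER_LEVEL_NS
    verdict = "fits" if estimate_ns <= PERIOD_NS else "needs a pipeline stage"
    print(f"{width:>3}-input reduction: {levels} LUT level(s), ~{estimate_ns:.1f} ns, {verdict}")
```

The point is not the exact numbers but the habit: count the logic levels between two registers, compare that against the clock period, and add a pipeline register when the stack-up does not fit.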
Post a couple of worst-case paths from the reports, showing the data and clock paths, along with the relevant constraint(s) you have applied to the clock used.