ILIA KALISTRU
Newbie level 6
Hi colleagues,
I have a problem meeting timing on my FPGA-to-FPGA source-synchronous DDR data bus.
There are 2 FPGAs on my board and I need to pass data in one direction over a parallel bus. I need it to run at 666.666 Mbit/s per lane (32 lanes total), so I use DDR with a 333.333 MHz clock. My setup is source synchronous; the TX FPGA is a big Virtex UltraScale+ (speed grade -3). The clock is differential, the data is single-ended. The interface is edge aligned. The bits are scattered across 2 HP banks. I am trying to make it a static-capture interface (without the SelectIO hardware edge tracking, de-emphasis and so on): data runs through an ODDR (which is an OSERDES internally), then to an OBUF, then to the pin. The clock (from an MMCM) also goes through an ODDR with its data inputs hardwired to "10" to match the data delay, then is forwarded to a differential buffer, then to the pins.
For constraints, I use create_generated_clock on the positive port of the forwarded (output) clock, and I require the data transitions to fall within ±0.4 ns of either the positive or the negative edge of the generated clock. With a UI of 1.666 ns, I cannot afford more uncertainty than half a UI on the TX FPGA, as I also need some margin on the RX FPGA. For simplicity I assume that the PCB traces are perfectly matched (for now at least).
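The budget arithmetic above can be sketched in a few lines of Python. This is only a rough check under the figures quoted in the post (per-lane bit rate and the ±0.4 ns TX window); the helper names `unit_interval_ns` and `rx_margin_ns` are made up for illustration. Note that 666.666 Mbit/s gives a UI of about 1.5 ns, while the 1.666 ns figure corresponds to 600 Mbit/s, i.e. DDR on a 300 MHz clock.

```python
# Rough timing budget for one DDR lane; all input figures come from the post.

def unit_interval_ns(bit_rate_mbps: float) -> float:
    """Length of one bit time (UI) in ns for a per-lane rate in Mbit/s."""
    return 1000.0 / bit_rate_mbps


def rx_margin_ns(ui_ns: float, tx_skew_ns: float) -> float:
    """Part of the UI left for the receiver when the TX data may
    transition anywhere within +/- tx_skew_ns of the clock edge."""
    return ui_ns - 2.0 * tx_skew_ns


ui_333 = unit_interval_ns(666.666)  # DDR on a 333.333 MHz clock
ui_300 = unit_interval_ns(600.0)    # DDR on a 300 MHz clock

for name, ui in (("333.333 MHz", ui_333), ("300 MHz", ui_300)):
    left = rx_margin_ns(ui, 0.4)
    print(f"{name}: UI = {ui:.3f} ns, left for RX = {left:.3f} ns")
```

With a ±0.4 ns window the TX side uses 0.8 ns, which is slightly more than half of a 1.5 ns UI and slightly less than half of a 1.666 ns UI.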
The timing report does make sense to me, except that Vivado cannot meet these requirements because of a very large difference in clock propagation times through the MMCM and OSERDES components. The design fails with 170 ps of negative slack in both setup and hold analysis on the path to the pins. I understand that for MAX analysis (setup) the timing analyzer takes the longest possible propagation time for the launching clock edge and the shortest possible for the capturing clock edge, but 400 ps of difference in propagation time through exactly the same MMCM instance for 2 consecutive clock edges seems too much to me. The same goes for the 200 ps of difference for the OSERDES. Other parts of the path are also skewed like that, though not as spectacularly. But it adds up. The Clocking Wizard for the MMCM core promises <100 ps of jitter for the clock.
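To illustrate how pairing longest and shortest delays eats the window, here is a conceptual Python sketch. It is not a model of Vivado's analysis (it ignores any pessimism removal the tool may apply), and the absolute stage delays are invented for illustration; only the 400 ps and 200 ps spreads for the MMCM and OSERDES are taken from the report described above. The function name `skew_bounds` is hypothetical.

```python
# Each stage is (min_delay_ns, max_delay_ns). Absolute values are
# hypothetical; only the MMCM (0.4 ns) and OSERDES (0.2 ns) spreads
# follow the figures quoted in the post.
MMCM = (1.0, 1.4)
OSERDES = (0.4, 0.6)
OBUF = (1.0, 1.1)      # assumed output buffer spread

data_path = [MMCM, OSERDES, OBUF]
clock_path = [MMCM, OSERDES, OBUF]  # forwarded clock mirrors the data path


def skew_bounds(data_stages, clk_stages):
    """Return (earliest, latest) data arrival relative to the forwarded
    clock, pairing slowest data with fastest clock and vice versa."""
    latest = sum(mx for _, mx in data_stages) - sum(mn for mn, _ in clk_stages)
    earliest = sum(mn for mn, _ in data_stages) - sum(mx for _, mx in clk_stages)
    return earliest, latest


earliest, latest = skew_bounds(data_path, clock_path)
window = 0.4  # allowed +/- skew from the constraints, in ns
print(f"data vs clock skew: {earliest:+.2f} ns .. {latest:+.2f} ns")
print(f"fits +/-{window} ns window: {-window <= earliest and latest <= window}")
```

Even though both paths pass through identical stages, counting each stage's full min/max spread on both sides makes the worst-case skew equal to the sum of the spreads, which is how a few hundred picoseconds per component can exceed a ±0.4 ns window.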
I also tried:
- A center-aligned interface
- A 300 MHz clock, because it is nicer to generate 300 MHz from 100 MHz
- Defining the constraints through a virtual clock instead of a generated one
- separate forwarded clock with manual adjusting of its phase to better align with
Maybe someone can share their ideas about:
- Why is there 400 ps of propagation time difference between 2 consecutive clock edges when the jitter is 100 ps?
- What can I do about it?
- Am I asking for too much from a static interface on the FPGA?