
How to constrain a source-synchronous FPGA input?

weetabixharry

I have a source-synchronous input to my FPGA, coming from an external chip whose datasheet gives the following information:

fmax = 300 MHz (single data rate)
tsetup = 0.4 ns
thold = 0.5 ns

Using an oscilloscope, it is clear that the data is "center-aligned" (such that the data values are stable near the rising edge of the clock). Therefore, my interpretation of the datasheet is that the data is guaranteed to be stable for 0.9 ns per clock cycle (from 0.4 ns before the rising edge of the clock, until 0.5 ns after).
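To make that reading concrete, here is a tiny sketch of the window arithmetic in the same Tcl style as the constraints below (the variable name valid_window is just for illustration):
Code:
# Stable data window implied by the transmitter's datasheet (times in ns)
set tsu 0.4                           ;# data stable this long before the rising clock edge
set th  0.5                           ;# data stable this long after the rising clock edge
set valid_window [expr {$tsu + $th}]  ;# 0.9 ns of guaranteed-stable data per cycle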

I don't think my question is really vendor-specific, but I am targeting an Intel Cyclone 10 GX (part 10CX085). I have stripped my whole design down to two flip-flops (pink blocks below) and I still struggle to meet timing, even if I heavily relax the timing constraints:

[Attached image 1640537543323.png: block diagram with the two input flip-flops (pink) and the IOPLL (green)]


By closely following Intel's free training course, I understand that for best timing performance I should instantiate a PLL (specifically, an IOPLL - green block above) and my timing constraints should be defined like this:
Code:
# Define input clock parameters
set period 3.333
set tsu 0.4
set th 0.5

# Define input delays
set half_period [expr $period/2]
set in_max_dly [expr $half_period - $tsu]
set in_min_dly [expr $th - $half_period]
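# With period = 3.333 ns, these evaluate to in_max_dly ≈ 1.267 ns and
# in_min_dly ≈ -1.167 ns; a negative -min value is legal in SDC and here
# reflects the half-period offset between the virtual launch clock and the
# shifted capture clock.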

# Create virtual launch clock
create_clock -name virtual_clock -period $period

# Create physical base clock (phase shifted by 180 degrees) on FPGA pin
create_clock -name Clk -period $period -waveform "$half_period $period" [get_ports Clk]

# Create generated clocks on the PLL outputs
derive_pll_clocks

# Set input delay constraints
set_input_delay -clock [get_clocks virtual_clock] -max $in_max_dly [get_ports InData*]
set_input_delay -clock [get_clocks virtual_clock] -min $in_min_dly [get_ports InData*]

This failed timing catastrophically, so I tried relaxing to a 10 MHz clock with an extremely generous thold = 25 ns:
Code:
# Define input clock parameters
set period 100
set tsu 0.4
set th 25
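# With these relaxed numbers, half_period = 50 ns, so in_max_dly = 49.6 ns
# and in_min_dly = -25 ns.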

It surprised me that this also failed timing. Since the data is stable for >25 ns and I have told the tools when that stable time will be (relative to the phase of the 10 MHz clock), I thought maybe the tools would be able to adjust delays (and/or PLL phase) to ensure the data is sampled near the middle of the "stable" time.

However, if I manually adjust the phase of the PLL (to basically anything sufficiently larger than zero) then timing is of course met comfortably:
[Attached image 1640538468426.png: timing report showing constraints met after manually phase-shifting the PLL]
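For reference, a phase-shifted capture clock can also be declared explicitly in SDC rather than only in the IOPLL parameter editor. This is only a hedged sketch; the instance path and the 90-degree value are placeholders, and derive_pll_clocks would normally create the generated clock from the IP settings automatically:
Code:
# Hypothetical example of an explicitly phase-shifted PLL output clock
create_generated_clock -name capture_clk \
    -source [get_ports Clk] \
    -phase 90 \
    [get_pins {iopll_0|outclk0}]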


This leaves me very confused about the following:
  • Are my requirements (300 MHz, 0.5 ns, 0.4 ns) feasible with this device?
    • If so, then how? (Timing constraints, PLL configuration, etc.)
    • If not, then what performance is possible? (I can't see this information anywhere in the FPGA documentation).
Many thanks in advance for any help.
--- Updated ---

The Quartus project is attached...
 

Attachments

  • 1640538242260.png (4.2 KB)
  • quartus_project.zip (8.7 KB)

I think that 0.4 ns is a pretty demanding requirement (it doesn't matter even if you set your clock to 60 Hz), especially for a lower-end device like a Cyclone.
 
weetabixharry said:
(original post quoted in full)
Based on my old experience, you need to enter the set_input_delay -max value as "clock period - tSU" and the -min value as "tH", ignoring the difference between clock and data arrival delays at the FPGA.
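For clarity, a minimal sketch of that convention, reusing the variable names from the first post and assuming the port clock Clk is created without the 180-degree waveform shift:
Code:
# "Latch edge" convention: reference both delays to the physical port clock
create_clock -name Clk -period $period [get_ports Clk]
set_input_delay -clock [get_clocks Clk] -max [expr {$period - $tsu}] [get_ports InData*]
set_input_delay -clock [get_clocks Clk] -min $th [get_ports InData*]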
 

(quoting the previous reply)
If I remember correctly, that's the way the Xilinx documentation recommends doing it. The Intel/Altera documentation instead recommends defining a virtual clock to represent the launch clock in the external device. If the data is center-aligned, that virtual clock is 180 degrees out of phase with the physical clock used to latch the data inside the FPGA, so you end up with a difference of clock_period/2 between the two methods.
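As a quick numeric check of that period/2 offset (a sketch only, using the numbers from the first post):
Code:
set period      3.333
set half_period [expr {$period / 2.0}]
set tsu         0.4
set th          0.5
# "Latch edge" convention (physical clock):
set max_a [expr {$period - $tsu}]        ;# 2.933 ns
set min_a $th                            ;# 0.500 ns
# Virtual-clock convention (capture clock shifted 180 degrees):
set max_b [expr {$half_period - $tsu}]   ;# 1.267 ns
set min_b [expr {$th - $half_period}]    ;# -1.167 ns
# max_a - max_b = min_a - min_b = half_period: the two methods describe the
# same arrival window, referenced to clocks half a period apart.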
 

(quoting the previous reply)
Both Xilinx and Altera moved to SDC-based timing analysis. I used Altera before and, yes, there are plenty of messy options for input/output delays because chip makers specify their timing in different ways. You can use a virtual clock, or, if the chip maker defines tSU/tH, it can be translated as per my suggestion above. Visually, it is understood as follows:

With respect to the latch edge, the max delay is [period - tSU] while the min delay is [tH].

[Attached image 1640559700407.png: timing diagram of the max/min input delay window relative to the latch edge]
 

Although the 10CX should be able to sample the serial data with fixed clock timing, I wonder why you don't use the Cyclone 10 GX DPA (dynamic phase alignment) feature, which adjusts the sampling window to compensate for internal and external clock-to-data skew?
 
(quoting the previous reply)

Thank you for this idea, FvM. This is my first time working in low-level detail with an Intel/Altera device. I saw DPA mentioned in the context of SERDES, but I assumed a serialization factor of 1 would not be possible. I will read more about DPA now, but would you be able to help me with 2 quick questions?

  1. When you said a 10CX should be able to sample the serial data with fixed clock timing, did you mean in general, or even with my specific requirements (300 MHz, 0.5 ns, 0.4 ns)?
  2. The SERDES data rates look very impressive. Can the SERDES IP cores be used to support higher source-synchronous clock frequencies, even if no serialization/deserialization is required? If so, then I really hope my clock and data pins are connected to a SERDES core...
 

I have a source-synchronous input to my FPGA, coming from an external chip whose datasheet gives the following information:

fmax = 300 MHz (single data rate)
tsetup = 0.4 ns
thold = 0.5 ns
I'm not sure you are specifying your FPGA input constraints correctly; you seem to be applying the external chip's setup and hold values to your FPGA inputs.

The output delays of the external chip, along with the skew between the output clock and data and the trace delays, should be used to generate the input delay constraints for the FPGA.

A set_input_delay constraint is the delay incurred outside the FPGA before the signal gets to an FPGA pin, i.e., you are telling the tools how long after a clock edge you expect the signal to arrive at the FPGA's pins.
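As an illustration of that formulation (clock-to-output plus board skew), here is a hedged sketch with placeholder numbers, since the external chip's datasheet gives neither tCO nor trace delays directly:
Code:
# All four values below are hypothetical placeholders
set tco_max   2.9    ;# latest the external chip drives valid data after its clock edge
set tco_min   0.5    ;# earliest its data can change after that edge
set board_max 0.2    ;# worst-case extra data-trace delay relative to the clock trace
set board_min 0.1    ;# best-case extra data-trace delay relative to the clock trace
set_input_delay -clock [get_clocks virtual_clock] -max [expr {$tco_max + $board_max}] [get_ports InData*]
set_input_delay -clock [get_clocks virtual_clock] -min [expr {$tco_min + $board_min}] [get_ports InData*]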
 
@ads-ee I was also very confused by this, but as far as I can see, the terminology is just horribly (and commonly) abused.

In this context, tsetup and thold are apparently not setup and hold times as we all know and understand them. In fact, they are the durations before and after the clock edge that the transmitting (external) device guarantees the data value to be stable. In other words, they are timing guarantees associated with the transmitting flip-flop and not timing requirements of the receiving flip-flop.

I can hardly think of a more effective way to cause unnecessary confusion than to redefine widely accepted terminology in a subtly different new way. But that seems to be where we are.

It is certainly possible I have misunderstood something, but I have really tried to follow Intel's examples very closely. Specifically, "Setup and Hold Method" on page 46 of AN433 and Section 3.12 of OCSS1000.
 

(quoting the previous reply)
Yes, that is true about the terminology, and that is why I explained my view on mapping the chip's tSU/tH to the FPGA's perspective and indicated it in the diagram. I think the reason is that chip makers don't have to follow SDC definitions.
It is odd that the chip did not give tCO, which would be more meaningful, but I guess we can deduce tCO from their tSU/tH, since the data transition window lies between their tH and tSU (based on my previous work in Quartus and Vivado).
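A short sketch of that deduction, using the numbers from the first post and referencing everything to the clock edge as seen at the FPGA pin:
Code:
set period  3.333
set tsu     0.4
set th      0.5
# Data is stable from tsu before to th after the received clock edge, so
# relative to the preceding edge it becomes valid no later than (period - tsu)
# and cannot change again until at least th after the edge:
set tco_max [expr {$period - $tsu}]   ;# ~2.933 ns
set tco_min $th                       ;# 0.5 ns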
 
