Seeking Help with VHDL Timing Issue in Arithmetic Operation

Status
Not open for further replies.

chandra1502

Hello,

I am currently working on a VHDL design that involves arithmetic operations with certain signals, but I'm facing timing issues that I need assistance with. I've provided a simplified version of the relevant code below.
Code:
when scale_calc =>
    reg_intermediate_sub <= resize(max_input - min_input, 32); --cc0
    reg_intermediate_div <= to_signed(to_integer(signed(reg_intermediate_sub) / to_signed(255,32)), 32);
    if unsigned(scale_cnt) < 2 then
        scale_cnt <= scale_cnt + 1;
    else
        scale <= std_logic_vector(reg_intermediate_div);
        state <= zero_point_calc;
        scale_cnt <= "000";
    end if;
when zero_point_calc =>
    reg_mult_input <= to_signed(to_integer(signed(min_input) / signed(scale)), 32);
    if unsigned(scale_cnt) < 2 then
        scale_cnt <= scale_cnt + 1;
        reg_mult <= not reg_mult_input(15 downto 0);
    else
        state <= zero_point_done;
        scale_cnt <= "000";
        q_min <= to_signed(-128, 16);
    end if;

when zero_point_done =>
    reg := reg_mult;
    zero_cnt <= zero_cnt + 1;
    if unsigned(zero_cnt) = 2 then
        sum <= ('0' & q_min) + ('0' & reg) + "1"; -- timing error shows if I use the reg/reg_mult signal here
    elsif unsigned(zero_cnt) = 3 then
        state <= out_data;
        zero_point <= std_logic_vector(sum(7 downto 0));
    end if;
when out_data => ...

In the above code, I'm performing arithmetic operations involving signals like reg_mult, q_min, and sum. When I use signals like reg_mult or reg in the sum calculation, I get timing violations in implementation. Interestingly, if I replace these signals with an arbitrary constant such as to_signed(-27,16), the design meets timing without any issue.

I've tried pipelining, using registered signals, and analyzing timing reports, but I'm still struggling with timing optimizations.

I'd greatly appreciate any insights, suggestions, or solutions from experienced designers who may have encountered similar situations. If you have any advice on how to tackle such timing issues in VHDL designs or can offer guidance on my specific case, please share your thoughts.

Thank you in advance for your help!
 

You don't mention the intended clock speed or the FPGA family. But unless the clock speed is very low (e.g. a few MHz), it's not possible to implement a 32-bit divider without many pipeline cycles.

If the arithmetic operation is not performed continuously, consider a sequential divider (1 cycle per result bit).
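
To illustrate what a sequential divider could look like, here is a minimal sketch of an unsigned restoring divider that produces one quotient bit per clock. The entity name seq_div, its ports and the generic W are made up for this example and do not come from the attached code. Signed operands would need sign handling around it, and a zero divisor is not trapped (the quotient simply comes out as all ones).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: unsigned restoring divider, one quotient bit per clock.
entity seq_div is
    generic (W : positive := 32);
    port (
        clk      : in  std_logic;
        start    : in  std_logic;                -- pulse for one clock to begin
        dividend : in  unsigned(W-1 downto 0);
        divisor  : in  unsigned(W-1 downto 0);
        quotient : out unsigned(W-1 downto 0);
        busy     : out std_logic;
        done     : out std_logic                 -- high for one clock when quotient is valid
    );
end entity;

architecture rtl of seq_div is
    signal rem_r  : unsigned(W downto 0)   := (others => '0'); -- partial remainder plus one guard bit
    signal quo_r  : unsigned(W-1 downto 0) := (others => '0'); -- dividend shifts out, quotient shifts in
    signal div_r  : unsigned(W-1 downto 0) := (others => '0'); -- latched divisor
    signal cnt    : integer range 0 to W   := 0;               -- iterations still to run
    signal busy_r : std_logic := '0';
begin
    process (clk)
        variable shifted : unsigned(W downto 0);
    begin
        if rising_edge(clk) then
            done <= '0';
            if busy_r = '0' then
                if start = '1' then
                    -- latch the operands so the inputs may change while we work
                    rem_r  <= (others => '0');
                    quo_r  <= dividend;
                    div_r  <= divisor;
                    cnt    <= W;
                    busy_r <= '1';
                end if;
            else
                -- bring the next dividend bit (MSB first) into the partial remainder
                shifted := rem_r(W-1 downto 0) & quo_r(W-1);
                if shifted >= resize(div_r, W+1) then
                    rem_r <= shifted - resize(div_r, W+1);
                    quo_r <= quo_r(W-2 downto 0) & '1';
                else
                    rem_r <= shifted;
                    quo_r <= quo_r(W-2 downto 0) & '0';
                end if;
                if cnt = 1 then
                    -- last of the W iterations
                    busy_r <= '0';
                    done   <= '1';
                end if;
                cnt <= cnt - 1;
            end if;
        end if;
    end process;

    quotient <= quo_r;
    busy     <= busy_r;
end architecture;
```

The longest combinational path here is one (W+1)-bit compare and subtract, which should be far easier to fit into a clock period than a fully combinational 32-bit divide. A state machine would pulse start, then stay in its state until done is asserted.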
 

I am implementing the code on a ZYBO Z7-20 FPGA board with a clock period of 20 ns. The timing report details are below. I am getting a setup timing violation.
Code:
Name                Path 1
Slack               -47.093ns
Source              co_effs/quan/reg_mult_input_reg[0]__0/C   (rising edge-triggered cell FDRE clocked by sys_clk_pin  {rise@0.000ns fall@10.000ns period=20.000ns})
Destination         co_effs/quan/reg_mult_input_reg[12]/D   (rising edge-triggered cell FDRE clocked by sys_clk_pin  {rise@0.000ns fall@10.000ns period=20.000ns})
Path Group          sys_clk_pin
Path Type           Setup (Max at Slow Process Corner)
Requirement         20.000ns (sys_clk_pin rise@20.000ns - sys_clk_pin rise@0.000ns)
Data Path Delay     66.844ns (logic 35.559ns (53.197%)  route 31.285ns (46.803%))
Logic Levels        175  (CARRY4=145 LUT1=2 LUT2=25 LUT4=1 LUT5=1 LUT6=1)
Clock Path Skew     -0.244ns
Clock Uncertainty   0.035ns

Is there any way I can wait for the signal to be calculated and read the output afterwards? I have attached the complete code below for better understanding.
 

Attachments

  • quantize_int32_8.txt
    12.3 KB · Views: 109

It's obvious you have an ENORMOUS 66 ns delay. Find the source of that.
 

Hi,

"ZYBO Z7-20 FPGA" is not an FPGA family. It is a development board.
You need to know the FPGA family. Maybe it's a "Xilinx Zynq-7000".
Is there a hardware divider inside?

The 66 ns is, I guess, in a combinatorial process (the divider?).
* So either you need to add (at least 4) pipeline stages of roughly equal delay ... I guess that's hard to do.
* Or you create a delayed "data_ready" signal ... and constrain the data path with the corresponding value.

Klaus
 

Yes, it's a development board, centered around the Xilinx Zynq-7000 family.

The timing issue occurs because I am using a 32-bit divider. I also checked the timing with a divider function that uses a 'for loop' to compute the quotient, but that also causes timing problems in implementation.

I was thinking of looping for 5-10 clock cycles in the same state of the state machine and then moving to the next one, so that the data has had time to settle.
Code:
if rising_edge(clk) then
.....
when fsm1 =>
    cnt <= cnt + 1;
    if cnt = 0 then
        reg_mult_input <= to_signed(to_integer(signed(min_input) / signed(scale)), 32); --cc0
    elsif cnt = 10 then
        state <= fsm2;
        reg_mult <= reg_mult_input;
    end if;
when fsm2 =>
    -- access reg_mult for further steps

But I'm not sure that will work. I have already spent a lot of time trying different methods to fix this.

Only the division part is causing problems. Any solution for this issue would be really helpful, thanks.
 

1. As others have mentioned, a pipelined division might solve the timing problem. If I were you, I would have posted the error message that Xilinx reports for the timing violation.
Have you considered using a Xilinx IP core to perform the division for you?

2. Another thing that I did not like in your code:
sum <= ('0' & q_min) + ('0' & reg) + "1"; ----timing error shows is i use reg/reg_mult signal here
You are mixing signals and variables! Is there a good reason for that? It is best avoided.
 

It seems you just want to quantise 32 bits to 8 bits.
If so, you have gone too far down the hard track. You can simply truncate and choose a rounding method, with fewer lines of code and no need for division.
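
As a rough illustration of truncation with rounding, the sketch below drops a number of low bits, rounds half up and saturates to 8 bits. It assumes the scale can be restricted to a power of two, which may not hold for the original design. The package quant_pkg and the function quantise_32_to_8 are hypothetical names used only for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package quant_pkg is
    -- Hypothetical helper: divide by 2**shift with round-half-up, then saturate to 8 bits.
    function quantise_32_to_8 (x     : signed(31 downto 0);
                               shift : natural range 1 to 24) return signed;
end package;

package body quant_pkg is
    function quantise_32_to_8 (x     : signed(31 downto 0);
                               shift : natural range 1 to 24) return signed is
        variable rounded : signed(32 downto 0);
        variable scaled  : signed(32 downto 0);
    begin
        -- add half an output LSB; the extra guard bit prevents overflow
        rounded := resize(x, 33) + shift_left(to_signed(1, 33), shift - 1);
        -- arithmetic right shift keeps the sign
        scaled := shift_right(rounded, shift);
        -- clamp to the signed 8-bit range instead of wrapping
        if scaled > 127 then
            return to_signed(127, 8);
        elsif scaled < -128 then
            return to_signed(-128, 8);
        else
            return resize(scaled, 8);
        end if;
    end function;
end package body;
```

With a constant shift this reduces to an adder, some wiring and two comparisons, so it should fit comfortably in a 20 ns clock period. A call would look like q8 <= quantise_32_to_8(value, 8); where q8 is a signed(7 downto 0) signal.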
 

I agree that you should first check whether division can be avoided.

If not, you have clarified that you don't need a continuous division operation. A parallel, optionally pipelined divider is therefore not needed. The advantage of a parallel divider is that it can be inferred directly from behavioral HDL code; its resource usage is high, however, e.g. > 1k logic elements for a 32/32-bit divider. You also need to provide pipelining or a multi-cycle result delay in your code.

The code in post #6 doesn't work as shown, but it will with a small modification, as sketched below.
You'll then assign a multicycle timing constraint to register reg_mult.
Code:
    if cnt = 0 then
        min_input_s <= min_input;
        scale_s <= scale;
    elsif cnt=10 then
        State <= fsm2;
        reg_mult <= mult_result;
    end if;
-- perform continuously
    mult_result <= to_signed(to_integer(signed(min_input_s) / signed(scale_s)), 32);

But as previously mentioned, you can considerably reduce resource usage if you switch to a sequential divider.
If the division only applies a scaling factor, you may also calculate the inverse of the scaling factor once and use a much faster multiply.
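
For the constant divide by 255 in the scale_calc state, the inverse can even be precomputed. The following is a sketch under the assumption that the dividend is non-negative and handled as a 32-bit unsigned value. The entity div255 and its port names are invented for this example. The constant is ceil(2**39 / 255) = 0x80808081, and taking bits 63 downto 39 of the 64-bit product gives floor(num / 255) over the whole 32-bit unsigned range.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: divide an unsigned 32-bit value by 255 with a multiply.
entity div255 is
    port (
        clk : in  std_logic;
        num : in  unsigned(31 downto 0);
        q   : out unsigned(31 downto 0)   -- floor(num / 255), valid two clocks after num
    );
end entity;

architecture rtl of div255 is
    -- ceil(2**39 / 255); written as a bit-string literal because the value
    -- does not fit into the 32-bit integer range
    constant RECIP : unsigned(31 downto 0) := x"80808081";
    signal prod : unsigned(63 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            prod <= num * RECIP;                      -- stage 1: 32x32 multiply
            q    <= resize(prod(63 downto 39), 32);   -- stage 2: shift right by 39
        end if;
    end process;
end architecture;
```

For a divisor that is only known at run time, such as scale, the inverse itself still has to be computed once (for instance with a sequential divider), after which every further scaling step is a multiply.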
 
