mrflibble
Argh! I've been using clock enables for datapaths that are allowed to take multiple cycles to operate. Basically, I duplicate the logic and run each copy at half the clock rate, so the combined throughput is still one result per clock.
Now in principle things are working, but two things are not going exactly the way I would like.
1 - Specifying timing constraints for multi-cycle paths is totally annoying.
2 - The fanout of the clock enables is still below the MAX_FANOUT limit, so no register duplication occurs. But the clock enable has to reach flip-flops that are spread out far enough that the paths to some of the more distant ones do not meet timing.
Now, I can put a MAX_FANOUT attribute on the clock enable flip-flop so that synthesis should start duplicating it. The fun part is that when I do this, timing actually gets worse. I have also read that results can get unpredictable with a MAX_FANOUT value of 30 or less. So for one particular clock enable with a fanout of 63, I thought I'd override the default and put MAX_FANOUT = 40 on it. As I said, it did change things, only for the worse.
Since this clock enable was for a 2-cycle path, it was nothing more than a toggle flip-flop, something like this:
Code:
(* MAX_FANOUT = 40 *) reg ce = 1'b0;
always @(posedge clk) begin
    ce <= ~ce; // toggle: ce is high on every other clock
end
And then for the 2-cycle operation:
Code:
always @(posedge clk) begin
    if (ce) begin
        // stuff
    end
end
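A minimal sketch of the "duplicate the logic and run it at half speed" scheme, assuming one operand arrives every clock and the slow operation needs two clocks. All names here (`interleaved_mcp`, `slow_op`, `din`, `dout`, `W`) are hypothetical, and `slow_op` is just a stand-in for "// stuff". Copy A loads when `ce` is high and copy B when it is low, so each copy's `*_in -> *_out` path gets two clock periods while the pair together still produces one result per clock.

```verilog
// Hypothetical sketch: two copies of a 2-cycle operation, enabled on
// alternate clock cycles, so the pair accepts one operand per clock.
module interleaved_mcp #(
    parameter W = 16                // data width (assumed)
) (
    input  wire         clk,
    input  wire [W-1:0] din,        // one new operand every clock
    output reg  [W-1:0] dout        // one result every clock, 3-clock latency
);
    // Divide-by-2 clock enable, same toggle flop as above.
    reg ce = 1'b0;
    always @(posedge clk)
        ce <= ~ce;

    // Placeholder for the slow logic; assumed to need 2 clock periods.
    function [W-1:0] slow_op(input [W-1:0] x);
        slow_op = x * x + x;
    endfunction

    reg [W-1:0] a_in = {W{1'b0}}, a_out = {W{1'b0}};  // copy A
    reg [W-1:0] b_in = {W{1'b0}}, b_out = {W{1'b0}};  // copy B

    always @(posedge clk) begin
        if (ce) begin
            // a_in -> a_out is the multi-cycle path: both registers load
            // only when ce is high, so a_out samples a_in 2 clocks later.
            a_in  <= din;
            a_out <= slow_op(a_in);
        end else begin
            // copy B does the same on the opposite phase
            b_in  <= din;
            b_out <= slow_op(b_in);
        end
    end

    // Output mux: take whichever copy updated on the previous edge.
    // The a_out/b_out -> dout path is an ordinary single-cycle path.
    initial dout = {W{1'b0}};
    always @(posedge clk)
        dout <= ce ? b_out : a_out;
endmodule
```

Note that only the `a_in -> a_out` and `b_in -> b_out` paths are allowed two cycles; the output mux path is a normal one-cycle path, which is exactly why the multi-cycle constraint has to be scoped carefully.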
Like I said, that works well, right up until the point where the Q output of that "ce" flip-flop has to travel too far to reach the CE pins of the various flip-flops.
Now, the fix for that is relatively simple: make a couple of local copies by hand at roughly the right locations. That does work, but it is also tedious and annoying. This is precisely the sort of thing I would expect a tool to do for me. No doubt I am doing something wrong, but I am not sure what. Maybe I should be setting MAX_FANOUT to some magic number? Mmmh, haven't tried 42 yet... Or maybe I should enable/disable the magic synthesis option? Does anyone have any idea how to go about this?
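The by-hand local copies can be written as a vector of identical toggle flops, one bit per region of the design. This is a sketch under assumptions: `ce_replicated`, `N` and `ce_local` are hypothetical names, and the `EQUIVALENT_REGISTER_REMOVAL` attribute is the XST constraint meant to stop synthesis from merging identical registers back into one; whether it is honored on a vector, and its exact spelling for your ISE version, should be checked in the XST user guide. Putting each copy near its loads is not shown and would still need placement constraints (e.g. LOC or AREA_GROUP) or floorplanning.

```verilog
// Hypothetical sketch: N identical copies of the divide-by-2 clock enable,
// so that no single CE net has to span the whole design.
module ce_replicated #(
    parameter N = 4                 // number of local copies (assumed)
) (
    input  wire         clk,
    output wire [N-1:0] ce_local    // ce_local[i] drives the CE pins in region i
);
    // XST attribute (assumed applicable here): keep the copies separate
    // instead of merging equivalent registers into a single flop.
    (* EQUIVALENT_REGISTER_REMOVAL = "NO" *)
    reg [N-1:0] ce_r = {N{1'b0}};

    // All bits start at 0 and toggle together, so every copy is identical.
    always @(posedge clk)
        ce_r <= ~ce_r;

    assign ce_local = ce_r;
endmodule
```

Logic in region `i` would then use `if (ce_local[i])` instead of `if (ce)`.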
As for timing constraints... I have tried to set up constraints by putting the clock enable net in a TNM_NET, and then using that TNM_NET for both the FROM and the TO. So:
Code:
TIMESPEC "TS_GCLK" = PERIOD "GCLK" 2.666 ns; # global clock
NET "*/something/ce" TNM_NET = "TNM_MY_CE";
TIMESPEC TS_MCP_CE = FROM "TNM_MY_CE" TO "TNM_MY_CE" TS_GCLK*2; # allow 2 cycles
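For reference, the relative timespec `TS_GCLK*2` simply scales the base period, so the paths it covers get twice the budget of everything else on GCLK:

```latex
\begin{aligned}
T_{\mathrm{GCLK}} &= 2.666\ \mathrm{ns}
  \quad\Rightarrow\quad f_{\mathrm{GCLK}} = \frac{1}{2.666\ \mathrm{ns}} \approx 375\ \mathrm{MHz} \\
T_{\mathrm{TS\_MCP\_CE}} &= 2 \times T_{\mathrm{GCLK}} = 2 \times 2.666\ \mathrm{ns} = 5.332\ \mathrm{ns}
\end{aligned}
```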
Using a NET for the "TNM_MY_CE" TNM_NET makes the tools trace from that net to the first synchronous element they encounter, i.e. the flip-flop whose CE pin it connects to. That sort of works, but it has some side effects that prevent this method from being really useful. So far the best method I have found is to just make TNM_NETs for the source and destination INSTances, and write timespecs for those. So something like:
Code:
INST "*/module_A/source_ff" TNM_NET = "TNM_FROM_HERE";
INST "*/module_B/dest_ff" TNM_NET = "TNM_TO_THERE";
TIMESPEC TS_MCP_WORKS_BUT_IS_TEDIOUS = FROM "TNM_FROM_HERE" TO "TNM_TO_THERE" TS_GCLK*2; # allow 2 cycles
This does work and has no side effects, but it is a pain to maintain. Ideally I would like something that uses the clock enable to define the TIMESPEC, but I am not sure how to go about that...
Thanks in advance for any ideas/hints/tips/reading material that you can think of!