Synthesized code is Way too big for chip. How can I reduce?

Status
Not open for further replies.

IckyT2012

I have a fully functional program now that simulates correctly. Unfortunately it is 3 times too large for my device. It's using around 29,000 logic elements, but the Cyclone II only has room for 8,200. I was processing 18- to 30-bit vectors with extensive multiplication and division. By shaving bits off and sacrificing accuracy, I have been able to get that number under 10,000, but it is still too large and the code is less reliable.
Wikipedia says:
"It is relatively easy for an inexperienced developer to produce code that simulates successfully but that cannot be synthesized into a real device, or is too large to be practical. One particular pitfall is the accidental production of transparent latches rather than D-type flip-flops as storage elements."

How do I determine if this is my problem and how do I avoid this pitfall?
 

IckyT2012,
Your synthesis tool should warn you about latches. They are undesirable in digital devices.
To avoid them, be sure everything is part of a clocked process.

Sckoarn
 

Most of the logic in my project is from one block which does some heavy calculations asynchronously. The results are then passed to the next block for use in a state machine. Are you suggesting that I would be better off synchronizing this calculation block?
 

Look at which parts of the design can use time-sharing of resources. For example, a CPU can re-use the same multiplication/addition logic for several programs. In HDL, each operator infers logic. As a result, many one-time or occasional-use portions of the design end up requiring as much logic as the main design. You may need to clock the logic and do part of the operation in each cycle.
 
IckyT2012,
I am not sure. If the calculation is huge, you may have no choice: you may need a bigger device. If the calculation is not time critical, it may benefit from synchronization. If your design created a bunch of latches that could have been implemented in BRAMs, that could cause the size to be way more than it has to be. Synchronization could enable you to use BRAMs in some way and help implement some of the logic in RAMs.

Sckoarn
 

An example would be this:
Suppose you want to calculate the RMS value of an array of signals. Then, instead of doing it in one clock cycle or doing it asynchronously, do it in steps:
1) Find the sum of the signals in the first clock cycle.
2) Find the mean in the second clock cycle.
3) Find the sum of squares in the third clock cycle.
4) Find the RMS value in the fourth clock cycle.

This type of design facilitates the re-use of resources in the FPGA. You should try something like this in your design.
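
A rough VHDL sketch of the same idea, taken one step further: rather than one step per clock, it takes one sample per clock, so a single adder and a single multiplier are re-used for the whole array. The entity name, port names and widths are made up for illustration, and the mean and square-root steps are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical accumulator: names and widths are assumptions, not from the thread.
entity sum_sq_acc is
  generic (
    W     : positive := 8;    -- sample width
    ACC_W : positive := 24    -- accumulator width, should be at least 2*W
  );
  port (
    clk    : in  std_logic;
    clear  : in  std_logic;                    -- synchronous clear before a new array
    valid  : in  std_logic;                    -- sample is valid in this cycle
    sample : in  unsigned(W-1 downto 0);
    sum    : out unsigned(ACC_W-1 downto 0);   -- running sum of samples
    sum_sq : out unsigned(ACC_W-1 downto 0)    -- running sum of squares
  );
end entity sum_sq_acc;

architecture rtl of sum_sq_acc is
  signal sum_r, sum_sq_r : unsigned(ACC_W-1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if clear = '1' then
        sum_r    <= (others => '0');
        sum_sq_r <= (others => '0');
      elsif valid = '1' then
        -- one addition and one multiplication per clock, shared by all samples
        sum_r    <= sum_r + resize(sample, ACC_W);
        sum_sq_r <= sum_sq_r + resize(sample * sample, ACC_W);
      end if;
    end if;
  end process;

  sum    <= sum_r;
  sum_sq <= sum_sq_r;
end architecture rtl;
```

The cost is latency: an array of N samples needs N clocks instead of one, which is only acceptable if the calculation is not time critical.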
 
Basic answers:

1. FPGAs give very poor performance for asynchronous arithmetic. You're normally better served doing it all synchronously and reducing the arithmetic workload per clock. This will probably use far fewer resources and fix the throughput.
2. Choose an appropriate device in the first place.
3. Avoid divisions - especially asynchronously. I get the feeling your FMax is terrible (probably less than 1-2MHz).
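
As an illustration of point 3, one common way to avoid a combinational divider is a restoring divider that produces one quotient bit per clock. The sketch below is a generic textbook version, not anything taken from the original poster's design; the entity and port names are hypothetical, W is assumed to be at least 2, and the divisor is assumed to be non-zero.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sequential divider: W clocks per division, one subtractor.
entity seq_div is
  generic (W : positive := 16);                -- operand width, assumed >= 2
  port (
    clk       : in  std_logic;
    start     : in  std_logic;                 -- pulse high for one clock to load operands
    dividend  : in  unsigned(W-1 downto 0);
    divisor   : in  unsigned(W-1 downto 0);    -- assumed non-zero
    quotient  : out unsigned(W-1 downto 0);
    remainder : out unsigned(W-1 downto 0);
    done      : out std_logic                  -- high whenever the divider is idle
  );
end entity seq_div;

architecture rtl of seq_div is
  signal quo_r : unsigned(W-1 downto 0) := (others => '0');
  signal rem_r : unsigned(W downto 0)   := (others => '0');  -- one guard bit
  signal dvs_r : unsigned(W-1 downto 0) := (others => '0');
  signal count : natural range 0 to W   := 0;
begin
  process (clk)
    variable rem_v : unsigned(W downto 0);
  begin
    if rising_edge(clk) then
      if start = '1' then
        quo_r <= dividend;            -- dividend is shifted out MSB first
        dvs_r <= divisor;
        rem_r <= (others => '0');
        count <= W;
      elsif count /= 0 then
        -- shift the next dividend bit into the partial remainder
        rem_v := rem_r(W-1 downto 0) & quo_r(W-1);
        if rem_v >= resize(dvs_r, W+1) then
          rem_v := rem_v - resize(dvs_r, W+1);
          quo_r <= quo_r(W-2 downto 0) & '1';   -- quotient bit is 1
        else
          quo_r <= quo_r(W-2 downto 0) & '0';   -- quotient bit is 0
        end if;
        rem_r <= rem_v;
        count <= count - 1;
      end if;
    end if;
  end process;

  quotient  <= quo_r;
  remainder <= rem_r(W-1 downto 0);
  done      <= '1' when count = 0 else '0';
end architecture rtl;
```

The trade-off is W clock cycles per result in exchange for a single W+1 bit subtractor and comparator, which is usually far smaller than what a `/` operator infers.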
 
I broke my computations into 8 states. My total resource usage went from 9238 to 9256, so that didn't save any space. An article I read said that removing the else from if-else statements would remove transparent latches. So
if x=1 then y=1
else y=0
...becomes:
y=0
if x=1 then y=1
I've applied this to most cases, except when the statement looks like the following:
if reset=1 then state=00
else if clk'event and clk=1 then
if state=0 then state=1
if state=1 then state=0
If I apply the above method here, I receive an error that signals are changing outside the clock edge. Can I use this same method to remove transparent latches from clocked events, or do I need to settle for the current code?
 

I think you must have read that wrong. You remove transparent latches by adding an else to all if statements that are not synchronous. Without an else, the logic has to remember values between states. With an else, the output is always a combination of the inputs. And it does not count if you add an else and don't assign all outputs in it (e.g. just put null).
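
To make the rule concrete, here is a minimal VHDL sketch with made-up entity and signal names. The first process leaves its output unassigned when en = '0', so a latch is expected. The other two assign the output on every path, once with an else and once with a default assignment before the if, and both should synthesize to plain combinational logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity; all names are illustrative only.
entity latch_demo is
  port (
    en      : in  std_logic;
    d       : in  std_logic;
    y_latch : out std_logic;  -- expected to infer a latch
    y_else  : out std_logic;  -- combinational, uses else
    y_deflt : out std_logic   -- combinational, uses a default assignment
  );
end entity latch_demo;

architecture rtl of latch_demo is
begin
  -- No else: when en = '0' the output must hold its old value, i.e. a latch.
  p_latch : process (en, d)
  begin
    if en = '1' then
      y_latch <= d;
    end if;
  end process p_latch;

  -- Every branch assigns the output: no storage needed.
  p_else : process (en, d)
  begin
    if en = '1' then
      y_else <= d;
    else
      y_else <= '0';
    end if;
  end process p_else;

  -- Default first, then override: the output is also assigned on every path.
  p_default : process (en, d)
  begin
    y_deflt <= '0';
    if en = '1' then
      y_deflt <= d;
    end if;
  end process p_default;
end architecture rtl;
```

What matters in an unclocked process is that every output gets a value on every path through the process, whichever of the two styles is used.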

The error you are getting is because you have included the clock somewhere in the state decode, so it is building clocked registers, but you're trying to assign values on non-clock edges. But you're not tackling your main problem, which is your asynchronous arithmetic. That is what is making your design too big and makes it unlikely to work; even if it does work, it won't work very efficiently.
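
For the clocked case, a sketch of the usual template looks like the following. It is a simplified one-bit version of the toggling state from the question, with a hypothetical entity name. Nothing is assigned outside the reset branch or the clock-edge branch, and no else or default is needed for the register itself, because a flip-flop is supposed to hold its value between edges.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-bit toggling state machine with asynchronous reset.
entity toggle_fsm is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;   -- asynchronous, active high
    state : out std_logic
  );
end entity toggle_fsm;

architecture rtl of toggle_fsm is
  signal state_r : std_logic := '0';
begin
  p_state : process (clk, reset)
  begin
    -- Do not put a default assignment here: it would drive state_r
    -- outside the clock edge, which is what the synthesis error is about.
    if reset = '1' then
      state_r <= '0';
    elsif rising_edge(clk) then
      if state_r = '0' then
        state_r <= '1';
      else
        state_r <= '0';
      end if;
    end if;
  end process p_state;

  state <= state_r;
end architecture rtl;
```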
 
Just a few notes:

you can guide the tool to focus on reducing resource usage
[if possible] with the following settings:
Code:
Settings ->

  Analysis & Synthesis ->
    Optimization Technique : Area
    More Settings... ->
      Auto Resource Sharing   ON
      Remove Duplicate Reg    ON
      Remove Redundant Logic  ON

  Compilation Process Settings ->
    Optimize for fitting : check both options
you can find out whether you have unwanted latches by grepping the map report
for the strings "inferring latch" and/or "inferred latch"
[btw, a latch itself is not bad if you use it intentionally and
you can explain why you use it...]

synchronizing the design will not reduce area by itself: you will get
the same number of combinational gates plus registers.
what you can gain is the possibility of re-using one module/block
rather than having two or more identical blocks in parallel
[e.g. instead of having 2 multipliers doing a*b and c*d,
you can have one with a mux in front of it; not always
applicable, of course]
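
a minimal sketch of the one-multiplier-plus-mux idea, assuming 16-bit unsigned operands and a select signal driven by some controller that is not shown [entity and port names are invented for the example]:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical shared multiplier: one multiplier serves both products.
entity shared_mult is
  port (
    clk        : in  std_logic;
    sel        : in  std_logic;               -- '0': compute a*b, '1': compute c*d
    a, b, c, d : in  unsigned(15 downto 0);
    p_ab       : out unsigned(31 downto 0);   -- last registered a*b
    p_cd       : out unsigned(31 downto 0)    -- last registered c*d
  );
end entity shared_mult;

architecture rtl of shared_mult is
  signal op1, op2 : unsigned(15 downto 0);
  signal prod     : unsigned(31 downto 0);
begin
  -- Muxes in front of the single multiplier
  op1  <= a when sel = '0' else c;
  op2  <= b when sel = '0' else d;
  prod <= op1 * op2;

  -- Route the product to the result register selected in this cycle
  process (clk)
  begin
    if rising_edge(clk) then
      if sel = '0' then
        p_ab <= prod;
      else
        p_cd <= prod;
      end if;
    end if;
  end process;
end architecture rtl;
```

each product is now refreshed only every other clock at best, so this only pays off when the two results are not needed in the same cycle.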

your design does not use memory. in fact, all gates in an fpga
are look-up tables, i.e. small memories, so you can quite effectively
use the internal fpga ram as a 'logic generator', for decoders etc.
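
a small sketch of the 'logic generator' idea: a 256-entry table of squares written as a constant array with a registered read. the entity name and table contents are only an example, and whether it actually lands in block RAM rather than logic cells depends on the tool and its settings, so check the fitter report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical look-up ROM: data = addr * addr, available one clock after addr.
entity square_rom is
  port (
    clk  : in  std_logic;
    addr : in  unsigned(7 downto 0);
    data : out unsigned(15 downto 0)
  );
end entity square_rom;

architecture rtl of square_rom is
  type rom_t is array (0 to 255) of unsigned(15 downto 0);

  -- Fill the table at elaboration time: entry i holds i*i (max 65025 fits in 16 bits).
  function init_rom return rom_t is
    variable r : rom_t;
  begin
    for i in rom_t'range loop
      r(i) := to_unsigned(i * i, 16);
    end loop;
    return r;
  end function init_rom;

  constant ROM : rom_t := init_rom;
begin
  -- Registered read, the style synthesis tools usually expect for memory inference
  process (clk)
  begin
    if rising_edge(clk) then
      data <= ROM(to_integer(addr));
    end if;
  end process;
end architecture rtl;
```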
----
have fun
 