FFT Implementation on Spartan 3E xc3s100e-5vq100

Status
Not open for further replies.

Abhinav Mishra

Newbie level 5
Joined
Jan 17, 2014
Hello Everyone,

I am trying to implement a 64-point FFT on a Spartan 3E xc3s100e-5vq100, but when I compile my code I get an over-utilization error. I have tried many ways to optimize the code but could not solve it.

Is it possible to implement it on this hardware?

Can anyone tell me what the minimum hardware requirements are?
 

I can't give really specific help, as I am not familiar with the Spartan 3. However, if you look at Table 3.1 of this: [link], a 64-point FFT can be achieved with 264 multipliers and 1032 adders (or 196 multipliers and 1280 adders).

Those numbers seem a bit high to me, so there is possibly a way to do better than that. Perhaps this [link] is relevant (a 64-point FFT implemented on a Spartan 3E).
 

Hi Abhinav,

There are no minimum hardware requirements generally, unless you specify a particular speed that this 64-point FFT must run at.

It is definitely possible to implement in hardware, but you will need to serialize it, i.e. reuse a small number of butterfly units over many clock cycles instead of instantiating every multiplier and adder in parallel. Otherwise you'll get resource usage like weetabixharry mentioned. Doing this by hand is awful and I strongly recommend against it.

I would get started with this Versatile FFT. It claims to be synthesizable on the xc3s500e with reasonable resource usage, using four of the 18x18 multipliers. You have four multipliers in your FPGA, so it will be a tight fit, but it could work if the FPGA is to provide the FFT and little to no other functionality...
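To make the serialization idea concrete, here is a minimal Python sketch (a software model, not HDL, and not the Versatile FFT core's actual architecture). It runs a radix-2 decimation-in-time FFT through a single butterfly, one at a time, and counts how often that butterfly is used. The function name `fft_serial` is made up for illustration.

```python
import cmath
import math


def fft_serial(x):
    """Radix-2 DIT FFT that processes one butterfly at a time.

    Models a serialized design in which a single butterfly unit is
    reused for every operation. Returns (spectrum, butterfly_count).
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = [complex(v) for v in x]

    # Bit-reversal permutation of the input order.
    bits = n.bit_length() - 1
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        if i < j:
            a[i], a[j] = a[j], a[i]

    butterflies = 0
    size = 2
    while size <= n:                      # log2(n) stages
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * math.pi * k / size)  # twiddle factor
                t = w * a[start + k + half]              # one complex multiply
                u = a[start + k]
                a[start + k] = u + t
                a[start + k + half] = u - t
                butterflies += 1
        size *= 2
    return a, butterflies


if __name__ == "__main__":
    _, count = fft_serial([1.0] * 64)
    print("butterflies for 64 points:", count)
```

For 64 points this is (64/2) x log2(64) = 192 butterflies. Assuming a straightforward complex multiply built from four real multiplies, the four 18x18 multipliers could in principle feed one butterfly per clock, so roughly 192 clocks per transform plus pipeline and memory overhead. That is a rough estimate under those assumptions, not a figure from any particular core.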

Good luck!
 
Hello Everyone,

Thank you for your suggestions. I tried the links mentioned above. My code now synthesizes, but there are still MAP errors. How can I fix them?

Do I really have to use FPGA Editor? I have never used it, and as the design is heavy I don't really want to pick and place by hand. So, is there another way around this?

 

Use SmartXplorer. It runs various seeds and map/par settings to try to find a solution that routes.
 

Hi Abhinav Mishra,

Can you be more specific about which approach you are taking?

If you are having mapping errors I would say you might want to look at rearranging stuff at the HDL level, as the autorouter for FPGAs is very good, and if it can't find a solution you may be out of luck for your current configuration.

Perhaps you can give more context about how you are using this FPGA? Where does the input come from, and where does the output go? What are the control signals (reset, enable, start, busy, etc.) connected to?
 

If you are having mapping errors I would say you might want to look at rearranging stuff at the HDL level, as the autorouter for FPGAs is very good, and if it can't find a solution you may be out of luck for your current configuration.
Not in my experience with dense designs, which this one may be, as the device the OP is using is rather small for the application. Also, the message from route suggests the design is probably packing CLBs with unrelated logic, which map will do when the part is nearly full.

Abhinav Mishra, you should probably post the utilization reports for the design and the settings used for XST, map, and par. The command lines used should be in the respective report files.
 

Quite possibly, I haven't gone down this path before. In any case, it is important that the inputs, outputs and control signals are routed out to the pads or connected to an internal controller. There is going to be very little room left for any logic inside this device and there is no point trying to squeeze in this FFT if you can't provide the IO and control lines to it...
 

I'm going by the OP's error, which was from route (par), so they have the I/O to fit the design; otherwise it wouldn't have made it to par (it would have failed map). So the problem appears to be bad placement, which I've also seen when a design reaches 80%+ utilization.

Once Abhinav Mishra posts the reports, a determination can be made of how to proceed. I still suggest they try SmartXplorer; if most of the strategies have 0 timing scores, then the original map placement was probably non-optimal.
 

Hello friends,

This is what I got from SmartXplorer. I am still learning how to use it. So, what should I do next?
 

All of the runs failed par, not a good sign. All of them mapped, so there is enough "logic" in the device.

Are the I/Os you are using locked down? Where are the I/Os placed? Don't just give us a list of the pin locations... I'd prefer to know which banks are used and which I/Os are assigned to them. You may have a really bad pin placement, which is aggravating the routing issue.

You still need to post the resource utilization reports from map, and you need to include the par report. You're probably running into a problem with LUT packing that is killing the router. You should also tell us what XST options are set. You may want to force area to be the primary goal of the synthesis and use the highest effort level.

Regards
 

Mmmh, what do you mean by LUT packing killing the router? I can think of a few things, but those are not a 100% match for how you describe it...
 

When map starts packing LUTs because the part is really full, unrelated logic gets stuffed into CLBs, resulting in increased congestion, which is what the OP reported on their original run. That doesn't literally "kill the router", but it causes the router to fail, which is what I was referring to.
 

Check. Which was one out of the set of things I was thinking you could mean. Hell, sometimes even packing it really full when the logic IS related is bad enough. At one time I thought I'd be real clever and pack things close and snug. You know, to keep routing delays to a minimum. Ooopsie, now I ran out of 1-step routes. As in, you have 1-, 2- and 4-step routes, and you had better balance a bit between those. So it was actually more optimal to have slightly more relaxed constraints, so it would actually route. I forgot what the actual error message was, but it boiled down to "sorry, ran out of possible routes". And of course those *bleep* marketing people at Xilinx thought it'd be cool to have routing congestion info available for the Virtex family, but not for Spartan. Because extracting & displaying the exact same info for Spartan is soooo much harder. Yeah, that's it.
 

Par report
=================================================
 

Seems to be a fairly readable warning...


Congestion due to packing too much into CLB. Please do not pack so much and try again. Thank you for your cooperation citizen.
 

Hardware Utilization Report
=======================================

========================================

- - - Updated - - -

Can you please elaborate on how to avoid over-packing?

 

According to the utilization report the design isn't over packed, in fact the design isn't even packing unrelated logic together, and has enough resources (LUT, memory, mults) to fit. I'm somewhat surprised that smartxplorer didn't have at least 1 run that completed route.

Some questions you still haven't answered...

Q1) How did you select the I/O placement? (see post #11)

Q2) What settings are you using for XST?

Q3) What are the map settings?


I would also open the design in PlanAhead (if you have it) or in FPGA Editor and see where the congestion problem is located. I've played tricks like marking regions of SLICEs as off-limits, which forces the logic to spread out so it has more routing resources available.

regards.
 

You can get precisely this if you use RLOCs to stuff it too close together. That way it's starved for routing resources while the device is still far from full. Indeed, best check it in PlanAhead. Although for the non-expensive devices I have not been able to find an actually useful metric to display for checking routing congestion, other than my usual "gee, that looks a bit crowded" eyeballing approach. That usually works as well, but sometimes an overlay that just shows it would be nice. So if you know one that works for, let's say, Spartan-6, please tell me.
 

Your problem is here:

Number of Slices 4386 out of 4656 94%

You need spare slices for routing, and ISE cannot find a way to route your logic with only the remaining 6% of slices. By the way, you are using an XC3S500E (which has 4656 slices), not a 100E.

You need to use fewer slices. Ways to accomplish this include removing some registers, using fewer pipeline stages, using less distributed memory (maybe using additional BRAM instead of distributed RAM), or making better use of your flip-flops (using the enable input when possible, and using the flip-flop reset instead of assigning zero in logic).
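As a quick sanity check on those numbers, here is a small Python sketch that computes slice utilization and lists which Spartan-3E parts would leave some headroom. The slice counts are quoted from memory of the Spartan-3E family data sheet and should be verified against it. The 80% ceiling is only the rule of thumb mentioned earlier in this thread, not a vendor limit, and the function names are made up for illustration.

```python
# Slice counts per Spartan-3E device (assumed from the family data sheet;
# verify before relying on them).
SPARTAN3E_SLICES = {
    "XC3S100E": 960,
    "XC3S250E": 2448,
    "XC3S500E": 4656,
    "XC3S1200E": 8672,
    "XC3S1600E": 14752,
}


def utilization(used_slices, device):
    """Return slice utilization of `device` in percent."""
    return 100.0 * used_slices / SPARTAN3E_SLICES[device]


def devices_that_fit(used_slices, max_percent=80.0):
    """List devices whose utilization stays at or below `max_percent`."""
    return [
        name
        for name in SPARTAN3E_SLICES
        if utilization(used_slices, name) <= max_percent
    ]


if __name__ == "__main__":
    used = 4386  # slice count from the utilization report quoted above
    for name in SPARTAN3E_SLICES:
        print(f"{name}: {utilization(used, name):.1f}%")
    print("fits with headroom:", devices_that_fit(used))
```

With 4386 slices the design is at about 94% of an XC3S500E and several times larger than an XC3S100E, so for the 100E the slice count would have to come down by a large factor, not just a few percent.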
 
