[SOLVED] How does Xilinx achieve a higher frequency, and what techniques are they using?


achaleus

Hi all, I did a project on a 1024-point FFT with 32-bit real and 32-bit complex input. My frequency is 125 MHz post-PAR on a Virtex-6. How is Xilinx achieving 366 to 400 MHz on the same device, with a complex multiplier in the design? Please suggest how I can improve my frequency.

suggestions are much appreciated

thank you.
 

A lot of time, effort, elaboration, and a good team of engineers - this is how Xilinx is achieving better performance

Have you pipelined your design? This can improve clock speed, but latency also increases.
 

The key points will be lots of pipelining and keeping logic to a max of 1 LUT between each register. Then probably a load of good timing constraints.
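To put rough numbers on the pipelining trade-off, here is a minimal Python sketch. It assumes a very simple model: the combinational delay splits evenly across the stages, and each register adds a fixed overhead (clock-to-out plus setup). The function names and the delay values are hypothetical, not taken from any Virtex-6 datasheet.

```python
def fmax_mhz(total_logic_ns, stages, reg_overhead_ns=0.5):
    """Estimated Fmax when a combinational path is split evenly into `stages` stages.

    Assumption: reg_overhead_ns (clock-to-out + setup) is an illustrative value.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    period_ns = total_logic_ns / stages + reg_overhead_ns
    return 1000.0 / period_ns


def latency_cycles(stages):
    """Each pipeline stage adds one clock cycle of latency."""
    return stages


if __name__ == "__main__":
    # Hypothetical 7.5 ns of logic: one stage versus three stages.
    for n in (1, 3):
        print(n, "stage(s):", round(fmax_mhz(7.5, n), 1), "MHz,",
              latency_cycles(n), "cycle(s) of latency")
```

In this model the clock rate goes up with every added stage, but the gain flattens out as the register overhead starts to dominate the period, and the latency in cycles keeps growing.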
 

But the multiplier degrades the overall frequency, even when it is pipelined (I built the 32x32 multiply out of 16x16 MACs, which gives 200 MHz post-PAR).
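For reference, the arithmetic behind building a 32x32 multiply from 16x16 products can be sketched in Python as below. This is only a model of the decomposition, assuming unsigned operands; the function name is hypothetical, and in hardware each partial product and each addition would be a natural place for a pipeline register.

```python
MASK16 = 0xFFFF


def mul32_from_16x16(a, b):
    """Unsigned 32x32 multiply built from four 16x16 partial products."""
    if not (0 <= a < 2**32 and 0 <= b < 2**32):
        raise ValueError("operands must be unsigned 32-bit values")

    # Split each operand into 16-bit halves.
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    # Four 16x16 multiplies (one per multiplier or MAC step in hardware).
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi

    # Shift each partial product to its weight and accumulate.
    return p_ll + ((p_lh + p_hl) << 16) + (p_hh << 32)


if __name__ == "__main__":
    x, y = 0x12345678, 0x9ABCDEF0
    print(mul32_from_16x16(x, y) == x * y)
```

Signed inputs need extra handling for the upper halves, which is not shown here.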
 

are you using the embedded multipliers, and feeding them with a high enough clock rate?
 

Yes, I am using a 16x16 multiplier (using the simple * operator), which is required in the radix calculation.
 

I think you are not instantiating the built-in multiplier for your task. Check this first.
Secondly, as mentioned, add pipeline registers wherever possible. Constrain the block placement so that the logic surrounding the multiplier is placed close to the multiplier block. This will reduce unwanted routing delays.
The DSP48E1 slices in a Virtex-6 FPGA contain a 25x18 multiplier with optional pipeline registers. This is what you should instantiate in your design. Keep the pipeline option on, and add pipelining to the other logic too, in order to improve overall performance.
Hope this is useful for you.
 

Using * probably won't give the best solution. You probably need to instantiate the hardware macro and ensure the built-in pipeline registers are used properly (using * tends not to be the most efficient for this). Also, make sure there are enough pipeline stages after the multiplier so that, once P&R has completed, the distance from the DSP slice to the next register isn't too long (this can often be a bottleneck, as you get a time penalty routing into and out of DSP and RAM blocks).

To get the very top speeds, I've seen people ditch the hardware multipliers and go for LUTs instead.
 

But I have a restriction: I may not use the multiplier from coregen. I have to develop it from scratch.
 

Any reason why? Without using the coregen version, you will not get optimal performance. The synthesizer can only infer so much; to make it work optimally you need to guide it in the right places.
 

Without the core generator, getting performance equivalent to or better than what Xilinx quotes can be a small project in itself. It will take more time and in-depth study.
You should also check whether you have a coregen license so that you can use the built-in multiplier.
 

But what are the other specs? Do they say you must reach 400 MHz, or is 125 MHz fine? If it's the latter, why are you so concerned? If you have to reach 400 MHz, you will have to use coregen or instantiate the library functions yourself.

A spec that says "no coregen" implies they want platform independent code. This independence comes at the cost of speed.
 
Yes, you are right, but they said to reach at least half of the Xilinx frequency for Virtex-6. That is where the problem comes in.
 

Then it's probably a problem of pipelining. 200 MHz should be easily achievable without coregen. Look at the timing reports and find the worst path. Look at the number of layers of logic between the registers, and increase the pipelining. Then re-run simulations, get it working, check timing, and increase pipelining again. Keep repeating until you get your desired Fmax.
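As a rough aid when reading the timing report, the target Fmax can be turned into a budget of logic levels between registers. The Python sketch below assumes a fixed delay per logic level (LUT plus routing) and a fixed register overhead; both numbers and the function name are hypothetical, and the real values come from the timing report for your own design.

```python
import math


def max_logic_levels(target_mhz, level_delay_ns, overhead_ns):
    """How many logic levels fit in one clock period at the target frequency.

    Assumptions: every level costs level_delay_ns (LUT + routing), and
    overhead_ns covers register clock-to-out, setup and clock skew.
    """
    period_ns = 1000.0 / target_mhz
    budget_ns = period_ns - overhead_ns
    if budget_ns <= 0:
        return 0
    return math.floor(budget_ns / level_delay_ns)


if __name__ == "__main__":
    # Illustrative numbers only: 1.0 ns per level, 1.0 ns overhead.
    for mhz in (125, 200, 400):
        print(mhz, "MHz ->", max_logic_levels(mhz, 1.0, 1.0), "level(s)")
```

Any path in the report with more levels than the budget is a candidate for another pipeline stage.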
 

Thanks for your help. I will post again if I find any problem. I will do the pipelining, but I have to look at the 32x32 multiplier logic (the 16x16 MAC is done), which is where the critical path was.
Any suggestions for implementing the multiplier would be much appreciated.
 

My specs say not to use coregen. It has to be our own code.

Just out of curiosity, do you know what the reason is for this "no coregen"? This requirement seems like a good way to ensure the project becomes more expensive and has lower performance for a given amount of hours spent.
 


I don't know. I always avoid coregen/megawizard as much as I can, but until recently I've only had clock frequencies around 125 MHz.
 

Well, the main reason I ask is that in my experience those sorts of hard "never use XYZ" rules tend to have ... curious motivations. Not always, but more often than not. So I was curious what the reason would be this time.

As for whether to use coregen or not ... I am not advocating using it all the time, but I am advocating using it where it is sensible to do so. For example, since I don't happen to have super awesome code for a better FIFO solution lying around, I use coregen to spit out FIFOs. Those are better than what I manage to come up with.

On the other hand, for clock generation, for example, I tend to use my own stuff because coregen is funny sometimes.
 

Just out of curiosity, do you know what the reason is for this "no coregen"? This requirement seems like a good way to ensure the project becomes more expensive and has lower performance for a given amount of hours spent.

They said that if coregen were allowed, we would simply use the FFT core directly rather than develop it ourselves. I don't know the reason beyond that.
 
