[SOLVED] Square root output in two clock cycles

Status
Not open for further replies.

dipin

Hi,

Does anyone know a method to compute a square root in 2 clock cycles?

With the Xilinx CORDIC IP core, without pipelining, I get the output with 2 clocks of latency, but in fully pipelined mode it takes 5 clocks.

I have checked the CORDIC algorithm, but it takes at least N/2 clock cycles.

I have written programs using both the CORDIC and the non-restoring algorithm. Both took N/2 clock cycles to produce the output (N = input width).

How can I get the output within 2 clock cycles? If anyone knows how, please help. I am tired of searching the internet.

Thanks and regards
 

Depending on the resolution you require, you could use a lookup table approach. This would obviously require significant memory resources, but it works if you are willing to trade off real estate for speed.
 

You can always line up multiple iterations of the CORDIC algorithm in a single clock cycle, at the expense of logic resources and a reduced maximum clock speed.
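
A rough Python model of that idea, in case it helps to see it. It is only a sketch: it uses the plain restoring digit-by-digit square root rather than CORDIC or the non-restoring variant, and the names `isqrt_unrolled` and `clocks_needed` are made up for this example. The point is that every pass is just a shift, a compare and a subtract, so several passes can be chained as combinational logic inside one clock period.

```python
def isqrt_unrolled(x: int, n_bits: int = 32) -> int:
    """Digit-by-digit integer square root: one result bit per loop pass."""
    root = 0
    remainder = 0
    for i in range(n_bits // 2 - 1, -1, -1):
        # Bring down the next two input bits, most significant pair first.
        remainder = (remainder << 2) | ((x >> (2 * i)) & 0b11)
        trial = (root << 2) | 1      # 4*root + 1, the value to try subtracting
        root <<= 1
        if remainder >= trial:       # compare + subtract: no multiplier needed
            remainder -= trial
            root |= 1
    return root


def clocks_needed(n_bits: int, passes_per_clock: int) -> int:
    """Clock cycles when `passes_per_clock` passes are chained combinationally."""
    passes = n_bits // 2
    return -(-passes // passes_per_clock)   # ceiling division


print(isqrt_unrolled(86485))   # integer part of the square root
print(clocks_needed(32, 8))    # 16 passes, 8 per clock
```

For a 32-bit input there are 16 passes, so chaining 8 of them per clock gives a result in 2 clocks. The cost is a long carry chain, which is exactly the reduced maximum clock speed mentioned above.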
 
Hi,

This is the link I referred to:
https://www.convict.lu/Jeunes/Math/square_root_CORDIC.htm

Is this really the CORDIC method? CORDIC is not supposed to use a multiplier, but this version needs one. Apart from this, I could not find any integer examples of the CORDIC method on the internet.

Please give your suggestions.

Regards
 

"You didn't mention why you want to use the CORDIC method?"
Hi,
Thanks for the reply, fvm.
I need to reduce the number of clock cycles it takes. The non-restoring method takes N/2 clock cycles (code posted in the previous thread).
Moreover, the Xilinx CORDIC core produces its output in 2 clock cycles (without pipelining), so I thought of trying the CORDIC method to reduce the clock cycle count.
For a 32-bit input I need the output in no more than 6 clock cycles.


Thanks and regards
 

What FMAX do you want to achieve? 32 bits is quite large.
Does the 32-bit value come from 16-bit X^2 + Y^2?
 

Hi,
"What FMAX do you want to achieve?"
First, I need to reduce the number of clock cycles.

"32 bits is quite large. Does the 32-bit value come from 16-bit X^2 + Y^2?"
Sorry, I did not understand this.
Thanks and regards
 

The question was: why do you need a 32-bit square root? Is it because the data comes from a 32-bit ADC?
You wrote that you tried the CORDIC algorithm (which requires X, Y coordinates).
 


So if FMAX is not critical, you can create a ROM with a 16-bit address whose output is the square root of the address. As the address, use the 16 most significant bits of your input word that are not all zero. Then rotate the output word left to get the real result.
Pseudocode:
Code:
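  if rising_edge(CLK_i) then
    -- Pick the 16-bit window just below the leading zero nibbles, look it up,
    -- then scale back: dropping 2k input bits means multiplying the root by 2^k.
    -- ROTATE_LEFT acts as a plain shift only while the upper bits of the ROM word are zero.
    if    INPUT(31 downto 16) = x"0000" then OUTPUT <= SQRT_ROM(INPUT(15 downto 0));
    elsif INPUT(31 downto 20) = x"000"  then OUTPUT <= ROTATE_LEFT(SQRT_ROM(INPUT(19 downto 4)), 2);
    elsif INPUT(31 downto 24) = x"00"   then OUTPUT <= ROTATE_LEFT(SQRT_ROM(INPUT(23 downto 8)), 4);
    elsif INPUT(31 downto 28) = x"0"    then OUTPUT <= ROTATE_LEFT(SQRT_ROM(INPUT(27 downto 12)), 6);
    else                                     OUTPUT <= ROTATE_LEFT(SQRT_ROM(INPUT(31 downto 16)), 8);
    end if;
  end if;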

The truncation error of this method should be less than 1% (around 0.5%).
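
If you want to check the error before building anything, a small Python model of the same if/elsif chain can do it. This is a sketch under assumptions: the ROM here holds floor(sqrt(address)) as a plain integer, and the names `ROM`, `sqrt_rom32` and `worst_relative_error` are invented for the example. With an integer ROM word, the rounding of the stored value adds to the truncation error, and for inputs of 2^16 and above the total in this model stays below 1/64 (about 1.6%). A wider ROM word with fractional bits would bring that down.

```python
import math

# Assumed ROM contents: floor(sqrt(address)) for every 16-bit address.
ROM = [math.isqrt(a) for a in range(1 << 16)]


def sqrt_rom32(x: int) -> int:
    """Model of the if/elsif chain: choose a 16-bit window, look it up, scale back."""
    for shift in (0, 4, 8, 12, 16):
        if x >> (16 + shift) == 0:           # all bits above this window are zero
            return ROM[x >> shift] << (shift // 2)
    raise ValueError("input is wider than 32 bits")


def worst_relative_error(samples) -> float:
    """Largest relative error of the model against math.sqrt over the samples."""
    return max(abs(sqrt_rom32(x) - math.sqrt(x)) / math.sqrt(x) for x in samples)


print(sqrt_rom32(4))
print(sqrt_rom32(86485), math.sqrt(86485))
print(worst_relative_error(range(1 << 16, 1 << 18, 3)))
```

For inputs below 2^16 the ROM is addressed directly, so the result is simply the integer part of the root.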
 
Hi,

Thanks for the reply.
Could you please comment a little more on the above code? I am not able to follow it completely.
Thanks
 

1. Create a ROM with a 16-bit address that outputs the square root of its address.
Example: the address is "0101_0100_0111_0101" (21621 dec) -> the value stored at that address is "1001_0011" (147 dec).

2. The input signal is 32 bits wide but the ROM address is only 16 bits, so the input word has to be shrunk. From basic math you know that SQRT(4x) = 2 * SQRT(x).
Multiplying by two is equal to a single rotation left.
Example: the input word is "0101_0100_0111_0101_01" (86485 dec), whose square root is 294.0833... -> so you take "0101_0100_0111_0101" as the ROM address (147 dec as output) and rotate it left once (294 dec) to get the output.

3. The zero checks in the if conditions are needed because, with a small input such as 4 ("0000_0000_0000_0000_0000_0000_0000_0100"), taking the upper bits would put "0000_0000_0000_0000" on the ROM address and the output would be 0 instead of 2.

4. You can generate the SQRT ROM values with a function like:

Code:
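  -- Needs ieee.numeric_std (unsigned, to_unsigned) and ieee.math_real (SQRT).
  -- unsigned_array is assumed to be a user-defined array of unsigned ROM words.
  FUNCTION SQRT2 (Number_of_samples : integer) RETURN unsigned_array IS
    variable result_v : unsigned_array(0 to Number_of_samples-1) := (others => (others => '0'));
  begin
    for i in result_v'range loop
      -- integer() rounds the real square root to the nearest integer;
      -- replace resize_to_number_of_bits_u_want with the ROM word width in bits.
      result_v(i) := to_unsigned(integer(SQRT(real(i))), resize_to_number_of_bits_u_want);
    end loop;
    RETURN result_v;
  END FUNCTION SQRT2;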

This uses the math_real library (VHDL). I don't know the Verilog libraries and coding standards.

Hope that helps.
 
The solution proposed by axcdd is nice, but you are going to need a large ROM.

I have never tried it, but maybe you could also split the function sqrt(x) into segments using linear approximations, so that the ROM only has to hold the "m" and "b" parameters of "y = mx + b" for each segment. Instead of the SQRT function itself, you will have a few linear functions. As the index you could use, for example, the most significant byte. You still need to check the error of this method, because I have never tried it with a SQRT function. The idea is to use less ROM than the axcdd method; otherwise I guess it is not worth it. Try it in MATLAB.

Does it make sense?
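
A quick way to check the error is a floating-point model like the Python sketch below. Everything in it is an assumption for illustration: a 16-bit input, 256 segments indexed by the most significant byte, and each line fitted through the two endpoints of its segment. The names `build_segments`, `sqrt_pwl` and `max_abs_error` are made up. Real hardware would store "m" and "b" in fixed point and would need one multiplier.

```python
import math

SEG_BITS = 8                          # segment index = most significant byte
IN_BITS = 16                          # assumed input width for this model
STEP = 1 << (IN_BITS - SEG_BITS)      # input values covered by one segment


def build_segments():
    """Return one (m, b) pair per segment, fitted through the segment endpoints."""
    params = []
    for k in range(1 << SEG_BITS):
        x0, x1 = k * STEP, (k + 1) * STEP
        m = (math.sqrt(x1) - math.sqrt(x0)) / STEP
        b = math.sqrt(x0) - m * x0
        params.append((m, b))
    return params


SEGMENTS = build_segments()           # 256 parameter pairs instead of 65536 ROM words


def sqrt_pwl(x: int) -> float:
    """Piecewise-linear approximation y = m*x + b for a 16-bit x."""
    m, b = SEGMENTS[x >> (IN_BITS - SEG_BITS)]
    return m * x + b


def max_abs_error(lo: int, hi: int) -> float:
    """Largest absolute error against math.sqrt for x in [lo, hi)."""
    return max(abs(sqrt_pwl(x) - math.sqrt(x)) for x in range(lo, hi))


print(max_abs_error(0, 256))          # first segment, where sqrt bends the most
print(max_abs_error(4096, 65536))     # inputs with a nonzero top nibble
```

The first segment is by far the worst because sqrt(x) is steepest near zero. If the input is first normalized so that its top nibble is nonzero, as in the axcdd scheme, that segment is never used and the error becomes much smaller.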
 
Table interpolation (piecewise linear approximation) makes sense for most math functions. For example, I represent half a quadrant of the atan function with a 256-point table, achieving 16-bit accuracy.
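
For the square root, an integer-only version of table interpolation could look like the Python sketch below. It is not the atan table mentioned above, which is not shown in this thread. The table size, `FRAC_BITS` and the name `sqrt_interp16` are assumptions for the example. Only the function values at the segment boundaries are stored, and the low byte of the input is used as the interpolation fraction.

```python
import math

FRAC_BITS = 8    # assumed: table values carry 8 fractional bits

# 257 entries: sqrt at every multiple of 256, including the upper end point.
TABLE = [round(math.sqrt(i << 8) * (1 << FRAC_BITS)) for i in range(257)]


def sqrt_interp16(x: int) -> int:
    """Interpolated sqrt of a 16-bit x; the result has FRAC_BITS fractional bits."""
    idx, frac = x >> 8, x & 0xFF      # upper byte picks the segment, lower byte the position
    y0, y1 = TABLE[idx], TABLE[idx + 1]
    # One small multiply, one shift and one add per result.
    return y0 + (((y1 - y0) * frac) >> 8)


x = 21621
print(sqrt_interp16(x) / (1 << FRAC_BITS), math.sqrt(x))
```

Compared with storing "m" and "b" separately, this needs a single table, and the slope comes from the difference of two neighbouring entries.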
 
