Arbitrary Precision integer arithmetic support on Altera FPGA board

Status
Not open for further replies.

AkhilKalathungal

Newbie level 3
Joined
Dec 11, 2012
Messages
4
Helped
0
Reputation
0
Reaction score
0
Trophy points
1,281
Visit site
Activity points
1,324
Hi,

This requirement is for my Masters Thesis project. My professor would like to see the implementation of arbitrary precision
integer arithmetic on Altera FPGA DE-2 board. The HDL used should be Verilog. One way to implement is, to modify
the NIOS ii processor architecture (which has 32 bit registers) by adding custom instructions. So, if there is a need for 1024 bit
variable support, we can have data-chunks of size 32 bits * 32 numbers to construct those 1024 bits.
(Please do not ask me to create a reg of 0:1023 or bit 0:1023 in Verilog, we would like if the synthesized core can go and fit into the hardware properly, hence need this arbitrary precision logic to be implemented somehow).

Since I am a beginner in Verilog, one question I have is there a possiblity of linked list type of implementation in Verilog?
If the above requirement was in C, we would have used a linked list structure with the first 32 bits pointing to the address of the next 32 bits, and so on. I was wondering how to implement the same in Verilog.

Please advice me here and any small hints will be really appreciated.


Thank You,
Akhil
 

Please do not ask me to create a reg of 0:1023 or bit 0:1023 in Verilog.
I don't understand the relation of this statement to your question. You should distinguish how the custom number format is acessed/represented in NIOS operation and in FPGA hardware respectively Verilog. There can be no doubt that the numbers have to be handled as a bitvector in FPGA hardware.

A different question is which arithmetic operation you want to implement with a operand length of 1024 and if the respective logic can be expected to fit your available FPGA resources in a usual implementation.
 

Since I am a beginner in Verilog, one question I have is there a possiblity of linked list type of implementation in Verilog?
Yes you could but I wouldn't implement that in HW. Linked lists make sense when using them for DMA engines as those are controlled by SW using a linked list as the data may be out of order packet data being reconstructed.

In the case of arithmetic data the data is unlikely to be distributed randomly over addresses in a memory, so there wouldn't be any reason to use a linked list.

I suggest a design with two data start addresses, two data lengths, and a shared n-bit ALU resource that operates on the data and produces an output for each n-bits of data at its inputs. n could be 32-bits if you want a Nios to control the arbitrary word width ALU. So for your 1024-bit operation you would have to read 32 32-bit data sequentially from the start addresses and preform your ALU operation on them and store them into a results memory. Any carries would be routed back to the ALU for the next 32-bit read of the input data, until all the bits of the two inputs are operated on. You'll have to come up with a method to deal with inputs that are non-32-bit multiples.

I'm sure there are other approaches, but this is the first thing that came to mind.

BTW this design would require multiple clock cycles to perform an ALU operation that increases proportionally to the number of bits in the numbers being used.

-alan
 


Hi,

Thank you for the prompt reply. Here the problem statement is, I should be able to represent an integer of any infinite length (say 1024 bits) and should be able to do the math operations in it. Like addition, subtraction, multiplication, division etc. And since the NIOS ii architecture is 32 bit and supports only math on the 32 bit registers, my professor asked me to look into the implementation of arithmetic on data signals that could be more than 32 bits by adding custom instructions for the NIOS ii.

A typical example application is the RSA implementation on the FPGA board where the Public Key can be a very long integer value. And the encryption part in RSA is given by: C = M ^ e (mod n) ; where M is the message, e the Public Key and n the product of two prime numbers p and q selected at random.
The decryption is given by D = C ^ d (mod n); where d is the Private Key.
This typically needs some arbitrary precision arithmetic logic I hope since the number of bits can go really large.


I hope I was able to explain my requirement properly. Thanks in advance for your valid response.

Thank you,
Akhil
 


Hi alan,

Thank you for your answer and now I understand implementation of a linked list type is no good and I can use arrays at a better performance.
For adding 32 bits, you mean to suggest me a ripple carry adder (which needs a delay for the carry to propagate) or a carry look ahead adder
(which can calculate carry in advance at the cost of a carry look ahead circuitry). While adding 1024 bits, the result can also be 1024 bit, and
a carry bit might be. Is it possible for me to add more custom instructions to NIOS ii core, example ADD1024 which can add two 1024 bit long operands
and put the results into a result register where the data chunks per word can be 32 bits, since the size of the register in NIOS ii architecture is 32 bits.
Which will be the best method of implementation?

Thank you in advance for your valid response.

Thanks,
Akhil
 

you can test both types of adders, as well as any of the other options.

The FPGA has a dedicated carry chain as well as fast logic for adders. This makes the default "x+y" inferred adder almost always the best choice. A ripple-carry adder will end up with a long chain of highly optimized logic and routing. A carry lookahead (and other non-ripple based options) will not be able to make use of this. The generate/propagate operations are also fairly small and may not make good use of the LUTs on the FPGA. The routing is also difficult and doesn't scale well either. This makes carry-lookahead and other similar methods uncompetitive in an FPGA. An example chart is here: https://cdstahl.org/?attachment_id=22 .

Likewise, the DSP slices will have fast, low power adders which can be used.

You can also pipeline additions (and accumulations) by breaking the addition up into smaller (eg 32/64b operations). This may be useful even for the inferred adders. The dedicated resources force the adder to use specific relative locations on the FPGA, and this can result in less than ideal placement constraints during implementation.
 

Status
Not open for further replies.
Cookies are required to use this site. You must accept them to continue using the site. Learn more…