I'm not sure what experience you have designing hardware...it looks like all you've done in the past is software. Comments like
Values are all double precision for better accuracy
and
I have a vector and a matrix of hexadecimal values stored in .mem files.
seem to indicate a software view of the design.
centers matrix is of eight rows and each row with eight entries in each row
This would mean you have an 8x8 matrix of double-precision values (sw hat on...64-bit IEEE-754 values), or in other words, sixty-four 64-bit IEEE-754 values.
You have a centers matrix? of
Code:
reg [63:0] centers [7:0];
This declares only eight 64-bit values, not a matrix of sixty-four 64-bit values. This looks like the first architectural error in the design.
The majority of Verilog coders I've met code memory arrays like this:
Code:
reg [63:0] centers [0:63];
using the opposite direction of the indices for the address. When displayed in a simulator, expanding the array shows [0] at the top and [63] at the bottom of the list. This also correctly gives the array sixty-four entries ([0:63]) of 64-bit ([63:0]) values, which is enough to hold the 8x8 matrix.
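A minimal sketch of that layout, assuming row-major storage (element (row, col) lives at address row*8 + col). The module name, the `addr` helper, and the file name `centers.mem` are placeholders for illustration, not taken from your design:

```verilog
module centers_mem_sketch;
  // 64 entries of 64 bits each: storage for an 8x8 matrix, row-major (assumed).
  reg [63:0] centers [0:63];

  // Hypothetical helper: flat address of element (row, col).
  function [5:0] addr;
    input [2:0] row;
    input [2:0] col;
    begin
      addr = {row, col};  // same value as row*8 + col
    end
  endfunction

  initial begin
    // "centers.mem" is an assumed file name; one 64-bit hex word per line.
    $readmemh("centers.mem", centers);
    $display("row 2, col 5 = %h", centers[addr(3'd2, 3'd5)]);
  end
endmodule
```

If the .mem file holds fewer than 64 words, the remaining entries stay at x in simulation, which is a quick way to spot a file that only contains one row.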
I hope you understand that this code does not iterate once per clock cycle, e.g. clock 1 i=0, clock 2 i=1, etc. Instead the loop is unrolled, so on every clock edge all eight test_input and centers entries are assigned to the flat 512-bit packed words in parallel.
Code:
integer i;
always @ (posedge aclk)
begin
    for (i = 0; i < 8; i = i+1)
    begin
        test_flat[64*i +: 64] <= test_input[i];
        center_flat[64*i +: 64] <= centers[i];
    end
end
Assuming that you knew that...this only assigns the first row of centers every clock cycle, not multiple rows. You need to index through the rows, which you can't do as there is only one row in your centers matrix to begin with.
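One way to index through the rows, sketched under the assumption that centers is resized to 64 entries stored row-major and that one row per clock is what you want. The module name, the `row` counter, `row_out`, and the active-low synchronous reset `aresetn` are all made up for this example:

```verilog
module row_flatten_sketch (
  input  wire         aclk,
  input  wire         aresetn,      // assumed active-low synchronous reset
  output reg  [511:0] center_flat,  // one full row: eight 64-bit entries
  output reg  [2:0]   row_out       // which row center_flat currently holds
);
  reg [63:0] centers [0:63];        // 8x8 matrix, row-major (assumed); load it elsewhere
  reg [2:0]  row;                   // row counter, advances once per clock
  integer    i;

  always @(posedge aclk) begin
    if (!aresetn) begin
      row <= 3'd0;
    end else begin
      // The for loop is still unrolled into eight parallel assignments;
      // only the row counter changes from one clock to the next.
      for (i = 0; i < 8; i = i + 1)
        center_flat[64*i +: 64] <= centers[{row, 3'b000} + i];
      row_out <= row;
      row     <= row + 3'd1;        // wraps from 7 back to 0
    end
  end
endmodule
```

Reading eight entries in the same cycle means the array will not map to a single block RAM port; whether that matters depends on your resource budget.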
I also don't get why you would code the testbench with the correct named association for the instantiated module and then use positional association for the instantiated submodule. Positional port mapping is prone to errors, because a swapped or missing port still compiles, and I would point this out in a code review.
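To illustrate the difference, here is a small sketch with a hypothetical submodule `sub64_sketch` (its ports and the integer subtract inside it are placeholders, not your actual subtractor):

```verilog
// Hypothetical submodule, used only to illustrate the two styles.
module sub64_sketch (
  input  wire        clk,
  input  wire [63:0] a,
  input  wire [63:0] b,
  output reg  [63:0] diff
);
  always @(posedge clk)
    diff <= a - b;   // plain integer subtract as a placeholder
endmodule

module port_assoc_sketch (
  input  wire        aclk,
  input  wire [63:0] x,
  input  wire [63:0] y,
  output wire [63:0] d_pos,
  output wire [63:0] d_named
);
  // Positional: compiles cleanly, but computes y - x because a and b are swapped.
  sub64_sketch u_pos (aclk, y, x, d_pos);

  // Named: order is irrelevant and each connection is explicit.
  sub64_sketch u_named (
    .clk  (aclk),
    .a    (x),
    .b    (y),
    .diff (d_named)
  );
endmodule
```

Both instances elaborate without warnings, which is exactly why the positional one is the kind of bug that survives until simulation.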
The fact that you're bit banging the interface with the large initial block seems to indicate you don't understand how to write a good testbench that uses bus functional models.
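As a rough sketch of the idea, a bus functional model wraps each bus transaction in a task so the stimulus reads as a list of transactions rather than a list of pin wiggles. I don't know your actual interface, so the `s_valid`/`s_data` signals, the `write_word` task, and the clock period below are all assumptions, and no DUT is instantiated:

```verilog
`timescale 1ns/1ps
module tb_bfm_sketch;
  reg         aclk    = 1'b0;
  reg         s_valid = 1'b0;   // hypothetical handshake signal
  reg  [63:0] s_data  = 64'd0;  // hypothetical data bus

  always #5 aclk = ~aclk;       // 10 ns period (assumed)

  // Bus functional model: one task call = one single-beat write.
  task write_word;
    input [63:0] value;
    begin
      @(posedge aclk);
      s_valid <= 1'b1;
      s_data  <= value;
      @(posedge aclk);
      s_valid <= 1'b0;
    end
  endtask

  integer k;
  initial begin
    for (k = 0; k < 8; k = k + 1)
      write_word(64'h3FF0_0000_0000_0000 + k);  // arbitrary stimulus words
    repeat (2) @(posedge aclk);
    $finish;
  end
endmodule
```

If the protocol changes (say a ready signal is added), only the task body changes and the stimulus loop stays the same.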
-----
As an aside, I'm not sure why you want to use 64-bit IEEE-754 floating point math in an FPGA hardware design. These eight copies of an IEEE-754 subtraction are going to use a lot of resources, let me repeat that, it's going to use a lot of resources. The majority of FPGA designs use either fixed point or integer math and can therefore take advantage of the hard IP DSP blocks in the majority of FPGA vendors' offerings. Unless your data has a dynamic range so enormous that it can't fit in fewer than 1024 bits of fixed point/integer, there is no reason to be using floating point.
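For comparison, a sketch of the same eight-way subtraction in fixed point. The signed Q16.16 format, the 32-bit element width, and the module and port names are assumptions picked for illustration; you would choose the format from the actual range and precision of your data:

```verilog
module fixed_sub8_sketch (
  input  wire         aclk,
  input  wire [255:0] test_flat,    // eight signed Q16.16 values (assumed format)
  input  wire [255:0] center_flat,  // eight signed Q16.16 values (assumed format)
  output reg  [263:0] diff_flat     // eight 33-bit results, one guard bit each
);
  integer i;
  always @(posedge aclk) begin
    for (i = 0; i < 8; i = i + 1)
      // Sign-extend both operands by one bit so the difference cannot overflow.
      diff_flat[33*i +: 33] <=
          {test_flat[32*i + 31],   test_flat[32*i +: 32]} -
          {center_flat[32*i + 31], center_flat[32*i +: 32]};
  end
endmodule
```

Each lane here is a single 33-bit two's complement subtractor with no alignment, normalization, or rounding logic, which is where most of the floating point cost comes from.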