Yes, my mistake.128 (KJ: Did you mean 0 here?)
You should look into creating a smaller packer with roughly 256 registers, where a variable amount of data is shifted in per cycle and a 128-bit word is then selected from it.
Code:
Next_Symbol <= Next_Symbol + MAX_SYMBOLS - 1 - empty;
I think that the "-1" is incorrect.
For example:
At system start, Next_Symbol = 0.
Suppose we get a message with only 1 valid symbol (i.e., empty = 15).
With the "-1" in place, Next_Symbol would remain 0 (when it should increment to 1).
I think it should be simply:
Code:
Next_Symbol <= Next_Symbol + MAX_SYMBOLS - empty;
Am I right?
Next, I suppose you agree that what I wrote in #18 would also do the work.
If so, how would you compare it to what you wrote in #20?
What are the pros and cons of each approach?
Quote: Am I right?
Yes, my mistake.
Quote: What are the pros and cons of each approach?
I didn't simulate yours or mine, so I don't know.
Quote: Yes, my mistake.
This part is not correct. Actually, it should be
Code:
Next_Symbol <= (Next_Symbol - empty) mod MAX_SYMBOLS;
If MAX_SYMBOLS is not a power of 2, then you might want to implement it as you noted.
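To make the pointer update concrete, here is a minimal sketch of a wrapping symbol counter. It is only an illustration: the entity name, the port names and the synchronous reset are assumptions, and it uses the "+ MAX_SYMBOLS" form so the intermediate value never goes negative. Whether `mod` with a non-power-of-2 constant synthesizes well depends on the tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity next_symbol_counter is
  generic (
    MAX_SYMBOLS : positive := 16  -- assumed: symbols per input word
  );
  port (
    clk         : in  std_logic;
    reset       : in  std_logic;  -- assumed: synchronous, active high
    valid       : in  std_logic;
    empty       : in  natural range 0 to MAX_SYMBOLS - 1;
    next_symbol : out natural range 0 to MAX_SYMBOLS - 1
  );
end entity next_symbol_counter;

architecture rtl of next_symbol_counter is
  signal next_symbol_i : natural range 0 to MAX_SYMBOLS - 1 := 0;
begin
  next_symbol <= next_symbol_i;

  process (clk) is
  begin
    if rising_edge(clk) then
      if reset = '1' then
        next_symbol_i <= 0;
      elsif valid = '1' then
        -- (MAX_SYMBOLS - empty) valid symbols arrived; wrap around at MAX_SYMBOLS.
        next_symbol_i <= (next_symbol_i + MAX_SYMBOLS - empty) mod MAX_SYMBOLS;
      end if;
    end if;
  end process;
end architecture rtl;
```

With MAX_SYMBOLS = 16, starting from 0 and receiving a message with empty = 15, the counter moves to (0 + 16 - 15) mod 16 = 1, which matches the example above.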
library ieee ;
use ieee.std_logic_1164.all ;
use ieee.numeric_std.all ;
use ieee.math_real.all ;
entity message_assembler is
port
(
IN_CLOCK : in std_logic ; -- Global clock.
IN_RESET : in std_logic ;
IN_VALID : in std_logic ;
IN_EMPTY : in unsigned ( 3 downto 0 ) ;
IN_DATA : in std_logic_vector ( 127 downto 0 ) ;
OUT_DATA : out std_logic_vector ( 12799 downto 0 )
) ;
end entity message_assembler ;
architecture synthesizable_message_assembler of message_assembler is
signal whole_message : std_logic_vector ( 12799 downto 0 ) ;
signal number_of_valid_symbols : unsigned ( positive ( ceil ( log2 ( real ( 16 + 1 ) ) ) ) - 1 downto 0 ) ;
signal pointer : unsigned ( positive ( ceil ( log2 ( real ( 12799 + 1 ) ) ) ) - 1 downto 0 ) ;
begin
OUT_DATA <= whole_message ;
number_of_valid_symbols <= 16 - resize ( IN_EMPTY , 5 ) ;
process ( IN_CLOCK , IN_RESET ) is
begin
if IN_RESET = '1' then
whole_message <= ( others => '0' ) ;
pointer <= ( others => '0' ) ; -- The write pointer must also start from a known value.
elsif rising_edge ( IN_CLOCK ) then
if IN_VALID = '1' then
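-- Write the valid symbols at the current pointer position ( the slice has to start at "pointer" so that both sides have the same length ).
whole_message ( ( to_integer ( pointer ) + to_integer ( number_of_valid_symbols ) * 8 - 1 ) downto to_integer ( pointer ) ) <= IN_DATA ( to_integer ( number_of_valid_symbols ) * 8 - 1 downto 0 ) ;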
pointer <= pointer + to_unsigned ( to_integer ( number_of_valid_symbols ) * 8 , pointer ' length ) ;
end if ;
end if ;
end process ;
end architecture synthesizable_message_assembler ;
Quote: As it explains in the message you posted, it's a Quartus bug, not a VHDL problem.
Yeah, and it persists in v14 as well.
Quote: The shift register can be 8*16 circular or 8*32 linear.
What do you mean by serial and linear shift registers in this case?
Please explain in code.
shift_register_16_bytes(0 to 15); -- downto will also work
in_data_bits(127 downto 0);
-- The parallel load: only load the valid bytes. Always loading to the same bits means no muxes.
shift_register_16_bytes(0) <= in_data_bits(7 downto 0); -- Always load the lowest byte (empty cannot be 16?)
-- The following can be a loop, unrolled for clarity
if in_empty < 15 then
shift_register_16_bytes(1) <= in_data_bits(15 downto 8);
end if;
if in_empty < 14 then
shift_register_16_bytes(2) <= in_data_bits(23 downto 16);
end if;
if in_empty < 13 then
shift_register_16_bytes(3) <= in_data_bits(31 downto 24);
end if;
.....
if in_empty < 1 then
shift_register_16_bytes(15) <= in_data_bits(127 downto 120);
end if;
for i in 0 to 14 loop
shift_register_16_bytes(i) <= shift_register_16_bytes(i + 1); -- No muxes needed to shift one step
end loop;
shift_register_16_bytes(15) <= shift_register_16_bytes(0); -- This is the "circular" shift
-- When 16 bytes have been loaded and shifted, the shift_register_16_bytes array contains one complete output word (128 bits = 16 bytes).
-- No shifting is needed if all 16 bytes are received at once; that special case can be optimized for speed.
shift_register_32_bytes(0 to 31) -- downto will also work. We load to (16 to 31) and get the output word in (0 to 15),
-- It will work even if one Avalon transfer wraps over two output words.
in_data_bits(127 downto 0);
-- Always load 128 bits. Always loading to the same bits means no muxes.
for i in 0 to 15 loop
shift_register_32_bytes(i+16) <= in_data_bits(i*8+7 downto i*8);
end loop;
for i in 0 to 30 loop
shift_register_32_bytes(i) <= shift_register_32_bytes(i + 1); -- No muxes needed to shift one step
end loop;
-- The data in shift_register_32_bytes(0) is thrown away, but it doesn't contain anything
-- When 16 bytes have been loaded and shifted, shift_register_32_bytes(0 to 15) contains one complete output word (128 bits = 16 bytes).
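For reference, the 32-byte linear variant above could be wrapped into a compilable entity roughly as follows. This is a sketch under my own assumptions, not the poster's code: the entity name, the in_ready/out_valid handshake and the two counters (to_shift, filled) are all hypothetical. It also assumes the valid bytes sit in the low-order bytes of in_data and that shifting proceeds one byte per clock, so the input is stalled while a transfer is being shifted in.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity linear_byte_packer is
  port (
    clk       : in  std_logic;
    reset     : in  std_logic;                     -- assumed: synchronous, active high
    in_valid  : in  std_logic;
    in_ready  : out std_logic;                     -- '1' when a new transfer can be accepted
    in_empty  : in  unsigned(3 downto 0);          -- number of unused bytes, 0 to 15
    in_data   : in  std_logic_vector(127 downto 0);
    out_valid : out std_logic;                     -- '1' for one cycle per packed word
    out_data  : out std_logic_vector(127 downto 0)
  );
end entity linear_byte_packer;

architecture rtl of linear_byte_packer is
  type byte_array_t is array (0 to 31) of std_logic_vector(7 downto 0);
  signal sr       : byte_array_t := (others => (others => '0'));
  signal to_shift : natural range 0 to 16 := 0;  -- bytes of the current transfer still to shift
  signal filled   : natural range 0 to 16 := 0;  -- valid bytes collected in sr(0 to 15)
begin
  in_ready  <= '1' when to_shift = 0 else '0';
  out_valid <= '1' when filled = 16 else '0';

  -- The packed word is always read from the lower half of the register.
  output_bytes : for i in 0 to 15 generate
    out_data(i * 8 + 7 downto i * 8) <= sr(i);
  end generate output_bytes;

  process (clk) is
  begin
    if rising_edge(clk) then
      if reset = '1' then
        to_shift <= 0;
        filled   <= 0;
      elsif to_shift /= 0 then
        -- Shift every byte one position towards index 0.
        for i in 0 to 30 loop
          sr(i) <= sr(i + 1);
        end loop;
        to_shift <= to_shift - 1;
        if filled = 16 then
          filled <= 1;  -- the previous word was presented; start counting the next one
        else
          filled <= filled + 1;
        end if;
      else
        if filled = 16 then
          filled <= 0;  -- the completed word was presented during this cycle
        end if;
        if in_valid = '1' then
          -- Parallel load, always into the same positions (the upper half).
          for i in 0 to 15 loop
            sr(i + 16) <= in_data(i * 8 + 7 downto i * 8);
          end loop;
          to_shift <= 16 - to_integer(in_empty);
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

The trade-off of this sketch is throughput: it moves one byte per clock, so a full 16-byte transfer takes 16 shift cycles before the next one is accepted.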
library ieee ;
use ieee.std_logic_1164.all ;
use ieee.numeric_std.all ;
use ieee.math_real.all ;
entity frame_assembler is
generic
(
messages_per_frame : positive := 3 ;
bits_per_symbol : positive := 2 ;
symbols_per_message : positive := 2
) ;
port
(
IN_CLOCK : in std_logic ;
IN_RESET : in std_logic ;
IN_VALID : in std_logic ;
IN_EMPTY : in unsigned ( positive ( ceil ( log2 ( real ( symbols_per_message + 1 ) ) ) ) - 1 downto 0 ) ;
IN_DATA : in std_logic_vector ( bits_per_symbol * symbols_per_message - 1 downto 0 ) ;
OUT_DATA : out std_logic_vector ( bits_per_symbol * symbols_per_message * messages_per_frame - 1 downto 0 )
) ;
end entity frame_assembler ;
architecture synthesizable_frame_assembler of frame_assembler is
constant bits_per_message : positive := bits_per_symbol * symbols_per_message ;
constant bits_per_frame : positive := bits_per_message * messages_per_frame ;
signal number_of_valid_symbols : unsigned ( positive ( ceil ( log2 ( real ( symbols_per_message + 1 + 1 ) ) ) ) - 1 downto 0 ) ;
signal number_of_valid_bits : unsigned ( positive ( ceil ( log2 ( real ( IN_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ;
signal current_bit : unsigned ( positive ( ceil ( log2 ( real ( OUT_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ;
signal next_bit : unsigned ( positive ( ceil ( log2 ( real ( OUT_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ;
signal long_data : std_logic_vector ( OUT_DATA ' range ) ;
begin
-- long_data is simply "IN_DATA" copied n times ( n = messages_per_frame ).
generating_long_data : for index in 0 to messages_per_frame - 1
generate
long_data ( ( bits_per_message * index + bits_per_message ) - 1 downto index * bits_per_message ) <= IN_DATA ;
end generate generating_long_data ;
number_of_valid_symbols <= symbols_per_message - resize ( IN_EMPTY , number_of_valid_symbols ' length ) ;
number_of_valid_bits <= resize ( bits_per_symbol * number_of_valid_symbols , number_of_valid_bits ' length) ;
next_bit <= current_bit + resize ( number_of_valid_bits , current_bit ' length ) ;
process ( IN_CLOCK , IN_RESET ) is
begin
if IN_RESET = '1' then
current_bit <= ( others => '0' ) ;
OUT_DATA <= ( others => '0' ) ;
elsif rising_edge ( IN_CLOCK ) then
if IN_VALID = '1' then
current_bit <= next_bit ;
for index in 0 to ( bits_per_frame - 1 ) loop
if index >= current_bit then -- Update only those bits that are left of the current pointer position ( don't touch the already updated bits ).
OUT_DATA ( index ) <= long_data ( index ) ;
end if ;
end loop ;
end if ;
end if ;
end process ;
end architecture synthesizable_frame_assembler ;
Quote: The default values for the generics are much lower than the numbers you have mentioned earlier. I think this design will explode (= not "pretty simple") if you correct the bugs and use the earlier numbers (IN_DATA = 16 symbols * 8 bits).
I scaled it down on purpose to make the RTL view more readable.
Quote: If you want to do it in one clock cycle you must use muxes, and that will get you into trouble if OUT_DATA is 1600 symbols = 12800 bits. Do you really want to deliver so many bits in parallel? What happens to the data in the next step?
std_match said: What is "messages_per_frame"?
That's the number of full IN_DATA messages in one frame.
For example: if a full "IN_DATA" message is 128 bits and the total frame size is 12800 bits, then messages_per_frame = 12800 / 128 = 100.
std_match said: Each symbol in OUT_DATA can only come from a certain symbol position in "IN_DATA". As "IN_EMPTY" is described earlier, each symbol in OUT_DATA could come from any symbol position in "IN_DATA".
You're correct - indeed a bug... thanks for pointing it out!
Can you suggest code that fixes this issue while maintaining the loop approach (no muxes)?
Quote: Can you suggest a code that fixes this issue while maintaining the loop approach ( no muxes ) ?
Yeah... the seven lines of code that I posted in #20, which are two nested for loops that compute 'My_Big_Message'.
Kevin
Quote: Yeah...the seven lines of code that I posted in #20 which are two nested for loops that computes 'My_Big_Message'
for i in 0 to (MSGS_PER_BIG_MSG-1) loop
if (Next_Symbol >= i) then -- KJ: Essentially creates the enable signal here
for i in Data_In'range loop
My_Big_Message(BITS_PER_SYMBOL * i) <= Data_In(i);
end loop;
end if;
end loop;
Quote: You have an if statement within the for loop that will synthesize into a mux.
No, the 'if' statement will synthesize into a clock enable; it's not muxing together any of the input bits, which is exactly what I said in the comment line with the posted code. The exact same thing will happen with the shift register approach.
Quote: Why do you use the same iteration index ( i ) in both loops ? Is this a typo?
As I said in the post, the code has errors, but it is meant to demonstrate the approach: you loop through all of the possible places where the input can end up (i.e. the 'messages per big message'), and as long as the starting place for the next symbol is above that loop index, you store the data. By not relying on the 'empty' signal to compute the data, it will result in simpler and potentially faster logic. The 'empty' signal is only used to update where the next symbol will be stored, which is a different logic path that terminates in the flip-flops that compute the next symbol.
for j in Data_In'range loop
My_Big_Message(BITS_PER_SYMBOL * i + j) <= Data_In(j);
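Putting the corrected inner loop back into a complete process might look something like the sketch below. It is my own illustration with assumed names (big_message_writer, Valid, Next_Slot, BITS_PER_MSG) and two assumptions to keep it short. First, data is stored at whole-message granularity, so 'empty' is not handled here. Second, slot i is written while i >= Next_Slot, which is the comparison used in frame_assembler above; the quoted snippet compares the other way round, and its author says it contains errors.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity big_message_writer is
  generic (
    BITS_PER_MSG     : positive := 8;  -- assumed: width of Data_In
    MSGS_PER_BIG_MSG : positive := 4   -- assumed: number of message slots
  );
  port (
    Clock          : in  std_logic;
    Valid          : in  std_logic;
    Next_Slot      : in  natural range 0 to MSGS_PER_BIG_MSG - 1;
    Data_In        : in  std_logic_vector(BITS_PER_MSG - 1 downto 0);
    My_Big_Message : out std_logic_vector(BITS_PER_MSG * MSGS_PER_BIG_MSG - 1 downto 0)
  );
end entity big_message_writer;

architecture rtl of big_message_writer is
begin
  process (Clock) is
  begin
    if rising_edge(Clock) then
      if Valid = '1' then
        -- Outer loop: one iteration per slot, using index i.
        for i in 0 to MSGS_PER_BIG_MSG - 1 loop
          -- The comparison only decides whether the slot's flip-flops are enabled.
          if i >= Next_Slot then
            -- Inner loop: a separate index j walks over the bits of Data_In.
            for j in Data_In'range loop
              My_Big_Message(BITS_PER_MSG * i + j) <= Data_In(j);
            end loop;
          end if;
        end loop;
      end if;
    end if;
  end process;
end architecture rtl;
```

Each bit of slot i can only ever be loaded from the same bit of Data_In, so the comparison ends up on the enable path and not in a data mux. That is also why this form cannot place a partially filled input at an arbitrary symbol offset.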