library ieee ;
use ieee.std_logic_1164.all ;
use ieee.numeric_std.all ;
use ieee.math_real.all ;
entity frame_assembler is
generic
(
messages_per_frame : positive := 3 ;
bits_per_symbol : positive := 2 ;
symbols_per_message : positive := 2
) ;
port
(
IN_CLOCK : in std_logic ;
IN_RESET : in std_logic ;
IN_VALID : in std_logic ;
IN_EMPTY : in unsigned ( positive ( ceil ( log2 ( real ( symbols_per_message + 1 ) ) ) ) - 1 downto 0 ) ;
IN_DATA : in std_logic_vector ( bits_per_symbol * symbols_per_message - 1 downto 0 ) ;
OUT_DATA : out std_logic_vector ( bits_per_symbol * symbols_per_message * messages_per_frame - 1 downto 0 )
) ;
end entity frame_assembler ;
architecture synthesizable_frame_assembler of frame_assembler is
constant bits_per_message : positive := bits_per_symbol * symbols_per_message ;
constant bits_per_frame : positive := bits_per_message * messages_per_frame ;
signal number_of_valid_symbols : unsigned ( positive ( ceil ( log2 ( real ( symbols_per_message + 1 + 1 ) ) ) ) - 1 downto 0 ) ;
signal number_of_valid_bits : unsigned ( positive ( ceil ( log2 ( real ( IN_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ;
signal current_bit : unsigned ( positive ( ceil ( log2 ( real ( OUT_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ;
signal next_bit : unsigned ( positive ( ceil ( log2 ( real ( OUT_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ;
signal long_data : std_logic_vector ( OUT_DATA ' range ) ;
begin
number_of_valid_symbols <= symbols_per_message - resize ( IN_EMPTY , number_of_valid_symbols ' length ) ;
number_of_valid_bits <= resize ( bits_per_symbol * number_of_valid_symbols , number_of_valid_bits ' length) ;
next_bit <= current_bit + resize ( number_of_valid_bits , current_bit ' length ) ;
process ( IN_CLOCK , IN_RESET ) is
begin
if IN_RESET = '1' then
current_bit <= ( others => '0' ) ;
OUT_DATA <= ( others => '0' ) ;
elsif rising_edge ( IN_CLOCK ) then
if IN_VALID = '1' then
current_bit <= next_bit ;
for i in 0 to ( bits_per_message - 1 ) loop
for j in 0 to ( bits_per_frame - 1 ) loop
if j >= current_bit then
OUT_DATA ( i + j ) <= IN_DATA ( i ) ;
end if ;
end loop ;
end loop ;
end if ;
end if ;
end process ;
end architecture synthesizable_frame_assembler ;
Your code will not work as it is now, Data_In only holds one symbol.No, the 'if' statement will synthesize into a clock enable, it's not muxing together any of the input bits, which is exactly what I said in the comment line with the posted code. The exact same thing will happen with the shift register approach.
std_match,
What about what I posted in #41?
Can you point out a scenario in which it will fail ?
My understanding of the problem is that the input symbol can end up at any of the 100 output symbol locations which does not require any datapath mux to implement. Instead, it requires clock enables which is the solution that I sketched out.Your code will not work as it is now, Data_In only holds one symbol.
Since the output symbols can come from any symbol position in the input data, muxes are needed for a single-clock-cycle solution. When your code is corrected, it will synthesize to muxes.
It's important for me to do it correctly but I'm having trouble understanding just by reading your explanation.both loops should be over symbols and a new loop that copies one symbol should be inside the "if" statement.
process ( IN_CLOCK , IN_RESET ) is
begin
if IN_RESET = '1' then
current_bit <= ( others => '0' ) ;
OUT_DATA <= ( others => '0' ) ;
elsif rising_edge ( IN_CLOCK ) then
if IN_VALID = '1' then
current_bit <= next_bit ;
for i in 0 to ( bits_per_message - 1 ) loop
for j in 0 to ( bits_per_frame - 1 ) loop
if j >= current_bit then
OUT_DATA ( i + j ) <= IN_DATA ( i ) ;
end if ;
end loop ;
end loop ;
end if ;
end if ;
end process
It is OK to copy both valid an non-valid symbols. The problem is that the calculation of the next symbol position depends on the number of valid symbols. The symbol to copy from in the next message depends on "empty" for the current message. If wrapping is not allowed, the first output symbol must come from the first input symbol. Depending on "empty" for the first input words, the second output symbol can come from the second symbol in the first word or the first symbol in the second word. The second output symbol will need a 2-input mux. The third output symbol will need a 3-input mux and so on. From output symbol 16 and up, 16-input muxes are needed. If wrapping is allowed, all output symbols need 16-input muxes.What you're saying with "output symbols can come from any symbol position in the input data" doesn't make much sense on the surface. The output is a collection of the time sequenced series of input symbols, where a valid input can contain multiple symbols. Once one knows the starting position to put the first symbol of the incoming message, the entire input message (valid and not valid symbols) can be copied over to the output message. Even if there is only one valid input symbol and the entire input is copied over this is not a problem because a subsequent input symbol will come along to replace it and the output is not flagged as valid until the entire bigger message has been assembled.
The advantage to copying the entire input message is that there is then no dependency on 'empty' as it relates to where to put the input message. Empty is used only to determine where to put the next incoming data message, it is not needed to figure out what needs to be done with the current incoming message.
It's important for me to do it correctly but I'm having trouble understanding just by reading your explanation.
Can you please show me what you mean by rewriting this process according to your notes?
process ( IN_CLOCK , IN_RESET ) is
begin
if IN_RESET = '1' then
current_bit <= ( others => '0' ) ;
OUT_DATA <= ( others => '0' ) ;
elsif rising_edge ( IN_CLOCK ) then
if IN_VALID = '1' then
current_bit <= next_bit ;
for i in 0 to ( symbols_per_message - 1 ) loop
for j in 0 to ( symbols_per_frame - 1 ) loop
if j >= current_output_symbol and j * bits_per_symbol <= OUT_DATA'HIGH then -- edited
for k in 0 to bits_per_symbol - 1 loop
OUT_DATA ( (i + j) * bits_per_symbol + k) <= IN_DATA ( i * bits_per_symbol + k) ;
end loop;
end if ;
end loop ;
end loop ;
end if ;
end if ;
end process
It is OK to copy both valid an non-valid symbols. The problem is that the calculation of the next symbol position depends on the number of valid symbols. The symbol to copy from in the next message depends on "empty" for the current message. If wrapping is not allowed, the first output symbol must come from the first input symbol. Depending on "empty" for the first input words, the second output symbol can come from the second symbol in the first word or the first symbol in the second word. The second output symbol will need a 2-input mux. The third output symbol will need a 3-input mux and so on. From output symbol 16 and up, 16-input muxes are needed. If wrapping is allowed, all output symbols need 16-input muxes.
This is taken care of by the Avalon Source and shall never happen - The number of seats equals the number of guests.What do do if there is two valid symbols in a particular incoming message and only space for one in the outgoing message is still an open question.
My understanding of the problem is that the input symbol can end up at any of the 100 output symbol locations which does not require any datapath mux to implement. Instead, it requires clock enables which is the solution that I sketched out.
My comment was not an argument about storing non-valid data or not. I was trying to explain why a single-cycle solution must use muxes. Storing non-valid data or not has no effect on the muxes, but it affects the comparators that decide if something should be stored or not in the output symbols (the write enables). The comparators are maybe simpler if also non-valid symbols are written, but one comparator per output symbol is needed anyway.So with Kevin's approach - it'd work correctly but with suboptimal performance ?It is OK to copy both valid an non-valid symbols. The problem is that the calculation of the next symbol position depends on the number of valid symbols. The symbol to copy from in the next message depends on "empty" for the current message. If wrapping is not allowed, the first output symbol must come from the first input symbol. Depending on "empty" for the first input words, the second output symbol can come from the second symbol in the first word or the first symbol in the second word. The second output symbol will need a 2-input mux. The third output symbol will need a 3-input mux and so on. From output symbol 16 and up, 16-input muxes are needed. If wrapping is allowed, all output symbols need 16-input muxes.
OK, that is where we diverged. Shaiko did not specify (or I missed it if he did), the format of the input data. My interpretation is that while the number of valid symbols could vary from transfer to transfer (as indicated by 'empty'), the location of the 'first' (and subsequent) symbols on a given data transfer does not change. So if a single byte is being transferred, it would be on bits 7-0. If two bytes are being transferred, the 'first' byte would still be from bits 7-0, the 'second' would come from bits 15-8 of the input data.The symbol to copy from in the next message depends on "empty" for the current message.
I think we're on the same page that the number of valid input bytes is 1-16 (if there are 0 valid, then one would think that the 'valid' signal would not be asserted).That sounds like a different and easier problem. The problem I was looking at allows 0-16 bytes to be valid on each cycle, and concatenates these input bytes until 12800 are accumulated.
No, the number of valid symbols does not affect the comparators in my approach. The comparison is with the signal that tracks where to put only the first symbol. The compare operation and therefore the write enable calculation is totally independent of the current value of 'empty'.Storing non-valid data or not has no effect on the muxes, but it affects the comparators that decide if something should be stored or not in the output symbols (the write enables). The comparators are maybe simpler if also non-valid symbols are written, but one comparator per output symbol is needed anyway.
OK, that is where we diverged. Shaiko did not specify (or I missed it if he did), the format of the input data. My interpretation is that while the number of valid symbols could vary from transfer to transfer (as indicated by 'empty'), the location of the 'first' (and subsequent) symbols on a given data transfer does not change. So if a single byte is being transferred, it would be on bits 7-0. If two bytes are being transferred, the 'first' byte would still be from bits 7-0, the 'second' would come from bits 15-8 of the input data.
If the location of the symbols do vary as you think they do, that would be a screwy protocol, but you're right you would need muxes to unscramble it all.
My interpretation is that while the number of valid symbols could vary from transfer to transfer (as indicated by 'empty'), the location of the 'first' (and subsequent) symbols on a given data transfer does not change. So if a single byte is being transferred, it would be on bits 7-0. If two bytes are being transferred, the 'first' byte would still be from bits 7-0, the 'second' would come from bits 15-8 of the input data.
I think we're on the same page that the number of valid input bytes is 1-16 (if there are 0 valid, then one would think that the 'valid' signal would not be asserted).
The input width is 16 bytes with a variable number of valid bytes (as indicated by 'empty') and the input data is always aligned to one end or the other. For convenience, I'm assuming that the first valid byte is on bits 7-0 on every transaction regardless of the value of 'empty' and regardless of what happened on previous cycles.
I doubt it, there is probably an error. But you need to write and test that in the simulator, not me.1. Kevin, will your code (started on post #29 and perfected in post #40) do the job ?
Perhaps you should create a testbench and test it yourself rather than expecting someone else to do it for you.2. std_match, will your code (post #47) do the job ?
Once you've written and tested both, you'll have the answer. Perhaps you'll consider sharing your results and post them to this forum. You know, give back a little.3. If the answer to both 1 & 2 is "yes" - What is the functional difference between both ?
By all means - I never implied that someone should do it for me.I doubt it, there is probably an error. But you need to write and test that in the simulator, not me.
No, the number of valid symbols does not affect the comparators in my approach. The comparison is with the signal that tracks where to put only the first symbol. The compare operation and therefore the write enable calculation is totally independent of the current value of 'empty'.Storing non-valid data or not has no effect on the muxes, but it affects the comparators that decide if something should be stored or not in the output symbols (the write enables). The comparators are maybe simpler if also non-valid symbols are written, but one comparator per output symbol is needed anyway.
The calculation for where to put the first symbol for the next transaction does depend on the current value of empty.
With that said I have 3 questions:
1. Kevin, will your code (started on post #29 and perfected in post #40) do the job ?
2. std_match, will your code (post #47) do the job ?
3. If the answer to both 1 & 2 is "yes" - What is the functional difference between both ?
No. But my boss does.Do you really plan to have 128 input bits and 12800 output bits?
Around 250MHz.What is the rate of the incoming data?
Would you agree that the same thing can be achieved using a modification of my own code?Writing outside of My_Big_Message must be prohibited. One solution is to add dummy bits that are written but not read. They will be removed at synthesis.
process ( IN_CLOCK , IN_RESET ) is
begin
if IN_RESET = '1' then
current_bit <= ( others => '0' ) ;
OUT_DATA <= ( others => '0' ) ;
elsif rising_edge ( IN_CLOCK ) then
if IN_VALID = '1' then
current_bit <= next_bit ;
for i in 0 to ( bits_per_message - 1 ) loop
for j in 0 to ( bits_per_frame - 1 ) loop
if j >= current_bit then
[COLOR="#FF0000"] if ( i + j ) <= ( bits_per_frame - 1 ) then[/COLOR]
OUT_DATA ( i + j ) <= IN_DATA ( i ) ;
end if ;
end if ;
end loop ;
end loop ;
end if ;
end if ;
end process ;
-- declare a "stretched_frame" signal to prevent vector boundary runout during assignment to OUT_DATA:
signal stretched_frame : ( 128 + bits_per_frame - 1 downto 0 ) ;
process ( IN_CLOCK , IN_RESET ) is
begin
if IN_RESET = '1' then
current_bit <= ( others => '0' ) ;
stretched_frame <= ( others => '0' ) ;
elsif rising_edge ( IN_CLOCK ) then
if IN_VALID = '1' then
current_bit <= next_bit ;
for i in 0 to ( bits_per_message - 1 ) loop
for j in 0 to ( bits_per_frame - 1 ) loop
if j >= current_bit then
[COLOR="#FF0000"]stretched_frame ( i + j ) <= IN_DATA ( i ) ;[/COLOR]
end if ;
end loop ;
end loop ;
end if ;
end if ;
end process ;
[COLOR="#FF0000"]OUT_DATA <= stretched_frame ( bits_per_frame - 1 downto 0 ) ;[/COLOR]
I agree. Muxing is required for symbols within a message, but not required for messages within the big message.This means that all output symbol locations except the first will need a mux to get the data from the correct input symbol.
I agree. Muxing is required for symbols within a message, but not required for messages within the big message.
I think you didn't post the following:And that's exactly what I've seen in the RTL viewer.
Kevin, what do you think about what I wrote in post #57?
We use cookies and similar technologies for the following purposes:
Do you accept cookies and these technologies?
We use cookies and similar technologies for the following purposes:
Do you accept cookies and these technologies?