Accessing records in VHDL

Status
Not open for further replies.
Thanks Kevin,
My own code has evolved into something very similar:

Code:
library ieee ;
use ieee.std_logic_1164.all ;
use ieee.numeric_std.all ;
use ieee.math_real.all ;
	
entity frame_assembler is 

generic 
(
  messages_per_frame : positive := 3 ;	
  bits_per_symbol : positive := 2 ;
  symbols_per_message : positive := 2 
) ;

port						
( 		
  IN_CLOCK : in std_logic ;	
  IN_RESET : in std_logic ;	
  IN_VALID : in std_logic ;
  IN_EMPTY : in unsigned ( positive ( ceil ( log2 ( real ( symbols_per_message + 1 ) ) ) ) - 1 downto 0 ) ; 
  IN_DATA : in std_logic_vector ( bits_per_symbol * symbols_per_message - 1 downto 0 ) ;
  
  OUT_DATA : out std_logic_vector ( bits_per_symbol * symbols_per_message * messages_per_frame - 1 downto 0 ) 
) ;	 	 			   
        
end entity frame_assembler ;

architecture synthesizable_frame_assembler of frame_assembler is

constant bits_per_message : positive := bits_per_symbol * symbols_per_message ;
constant bits_per_frame : positive := bits_per_message * messages_per_frame ;

signal number_of_valid_symbols : unsigned ( positive ( ceil ( log2 ( real ( symbols_per_message + 1 + 1 ) ) ) ) - 1 downto 0 ) ; 
signal number_of_valid_bits : unsigned ( positive ( ceil ( log2 ( real ( IN_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ; 
signal current_bit : unsigned ( positive ( ceil ( log2 ( real ( OUT_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ; 
signal next_bit : unsigned ( positive ( ceil ( log2 ( real ( OUT_DATA ' length + 1 ) ) ) ) - 1 downto 0 ) ; 

signal long_data : std_logic_vector ( OUT_DATA ' range ) ;  

begin
	
number_of_valid_symbols <= symbols_per_message - resize ( IN_EMPTY , number_of_valid_symbols ' length ) ;
number_of_valid_bits <= resize ( bits_per_symbol * number_of_valid_symbols , number_of_valid_bits ' length) ;
next_bit <= current_bit + resize ( number_of_valid_bits , current_bit ' length ) ;

process ( IN_CLOCK , IN_RESET ) is
begin
  if IN_RESET = '1' then 
     current_bit <= ( others => '0' ) ;
     OUT_DATA <= ( others => '0' ) ;
  elsif rising_edge ( IN_CLOCK ) then
     if IN_VALID = '1' then
        current_bit <= next_bit ; 
        for i in 0 to ( bits_per_message - 1 ) loop
          for j in 0 to ( bits_per_frame - 1 ) loop
            if j >= current_bit then
              OUT_DATA ( i + j ) <= IN_DATA ( i ) ;
            end if ;
          end loop ;	
       end loop ;
    end if ;	
  end if ;	
end process ;
	
end architecture synthesizable_frame_assembler ;

Would you agree that it basically does the same thing?
 

No, the 'if' statement will synthesize into a clock enable, it's not muxing together any of the input bits, which is exactly what I said in the comment line with the posted code. The exact same thing will happen with the shift register approach.
Your code will not work as it is now, Data_In only holds one symbol.
Since the output symbols can come from any symbol position in the input data, muxes are needed for a single-clock-cycle solution. When your code is corrected, it will synthesize to muxes.
 

std_match,

What about what I posted in #41?
Can you point out a scenario in which it will fail ?
 

std_match,

What about what I posted in #41?
Can you point out a scenario in which it will fail ?

The basic principle could work, but the code is bad. You will assign outside of OUT_DATA at the end of the "j" loop.
The outer loop should be over symbols instead of bits, otherwise you will probably create muxes with too many inputs (most of them will not be used) and too many comparators, if not the synthesis tool is very, very smart. As the code is now, the muxes will probably have 128 inputs. The muxes only need 16 inputs.

The best solution depends on the input and output sizes, and the speed requirement. Please synthesize with the correct numbers. The numbers you gave in #35 will require 1600 16-input muxes and 1600 comparators for a single-cycle solution if you loop over symbols.

Edit: both loops should be over symbols and a new loop that copies one symbol should be inside the "if" statement.
 
Last edited:
Reactions: shaiko

    shaiko

    Points: 2
    Helpful Answer Positive Rating
My understanding of the problem is that the input symbol can end up at any of the 100 output symbol locations which does not require any datapath mux to implement. Instead, it requires clock enables which is the solution that I sketched out.

What you're saying with "output symbols can come from any symbol position in the input data" doesn't make much sense on the surface. The output is a collection of the time sequenced series of input symbols, where a valid input can contain multiple symbols. Once one knows the starting position to put the first symbol of the incoming message, the entire input message (valid and not valid symbols) can be copied over to the output message. Even if there is only one valid input symbol and the entire input is copied over this is not a problem because a subsequent input symbol will come along to replace it and the output is not flagged as valid until the entire bigger message has been assembled.

The advantage to copying the entire input message is that there is then no dependency on 'empty' as it relates to where to put the input message. Empty is used only to determine where to put the next incoming data message, it is not needed to figure out what needs to be done with the current incoming message.

What do do if there is two valid symbols in a particular incoming message and only space for one in the outgoing message is still an open question. I pointed that out in my post with the code. I haven't seen a response on that point, so I'm assuming for now that situation does not occur due to other factors in the system that have not been put forth.

Kevin Jennings
 
Reactions: shaiko

    shaiko

    Points: 2
    Helpful Answer Positive Rating
both loops should be over symbols and a new loop that copies one symbol should be inside the "if" statement.
It's important for me to do it correctly but I'm having trouble understanding just by reading your explanation.
Can you please show me what you mean by rewriting this process according to your notes?

Code:
process ( IN_CLOCK , IN_RESET ) is
begin
  if IN_RESET = '1' then 
     current_bit <= ( others => '0' ) ;
     OUT_DATA <= ( others => '0' ) ;
  elsif rising_edge ( IN_CLOCK ) then
     if IN_VALID = '1' then
        current_bit <= next_bit ; 
        for i in 0 to ( bits_per_message - 1 ) loop
          for j in 0 to ( bits_per_frame - 1 ) loop
            if j >= current_bit then
              OUT_DATA ( i + j ) <= IN_DATA ( i ) ;
            end if ;
          end loop ;	
       end loop ;
    end if ;	
  end if ;	
end process
 

It is OK to copy both valid an non-valid symbols. The problem is that the calculation of the next symbol position depends on the number of valid symbols. The symbol to copy from in the next message depends on "empty" for the current message. If wrapping is not allowed, the first output symbol must come from the first input symbol. Depending on "empty" for the first input words, the second output symbol can come from the second symbol in the first word or the first symbol in the second word. The second output symbol will need a 2-input mux. The third output symbol will need a 3-input mux and so on. From output symbol 16 and up, 16-input muxes are needed. If wrapping is allowed, all output symbols need 16-input muxes.

- - - Updated - - -

It's important for me to do it correctly but I'm having trouble understanding just by reading your explanation.
Can you please show me what you mean by rewriting this process according to your notes?

Code:
process ( IN_CLOCK , IN_RESET ) is
begin
  if IN_RESET = '1' then 
     current_bit <= ( others => '0' ) ;
     OUT_DATA <= ( others => '0' ) ;
  elsif rising_edge ( IN_CLOCK ) then
     if IN_VALID = '1' then
        current_bit <= next_bit ; 
        for i in 0 to ( symbols_per_message - 1 ) loop
          for j in 0 to ( symbols_per_frame - 1 ) loop
            if j >= current_output_symbol and j * bits_per_symbol <= OUT_DATA'HIGH then -- edited
              for k in 0 to bits_per_symbol - 1 loop
                OUT_DATA ( (i + j) * bits_per_symbol + k) <= IN_DATA ( i * bits_per_symbol + k) ;
              end loop;
            end if ;
          end loop ;	
       end loop ;
    end if ;	
  end if ;	
end process

This approach can not handle frame breaks inside an input word (wrapping).

Edit: corrected the "if" condition
 
Last edited:
Reactions: shaiko

    shaiko

    Points: 2
    Helpful Answer Positive Rating

So with Kevin's approach - it'd work correctly but with suboptimal performance ?

- - - Updated - - -

Kevin,
What do do if there is two valid symbols in a particular incoming message and only space for one in the outgoing message is still an open question.
This is taken care of by the Avalon Source and shall never happen - The number of seats equals the number of guests.
 

My understanding of the problem is that the input symbol can end up at any of the 100 output symbol locations which does not require any datapath mux to implement. Instead, it requires clock enables which is the solution that I sketched out.

That sounds like a different and easier problem. The problem I was looking at allows 0-16 bytes to be valid on each cycle, and concatenates these input bytes until 12800 are accumulated.
 

My comment was not an argument about storing non-valid data or not. I was trying to explain why a single-cycle solution must use muxes. Storing non-valid data or not has no effect on the muxes, but it affects the comparators that decide if something should be stored or not in the output symbols (the write enables). The comparators are maybe simpler if also non-valid symbols are written, but one comparator per output symbol is needed anyway.
 

The symbol to copy from in the next message depends on "empty" for the current message.
OK, that is where we diverged. Shaiko did not specify (or I missed it if he did), the format of the input data. My interpretation is that while the number of valid symbols could vary from transfer to transfer (as indicated by 'empty'), the location of the 'first' (and subsequent) symbols on a given data transfer does not change. So if a single byte is being transferred, it would be on bits 7-0. If two bytes are being transferred, the 'first' byte would still be from bits 7-0, the 'second' would come from bits 15-8 of the input data.

If the location of the symbols do vary as you think they do, that would be a screwy protocol, but you're right you would need muxes to unscramble it all.

Kevin Jennings

- - - Updated - - -

That sounds like a different and easier problem. The problem I was looking at allows 0-16 bytes to be valid on each cycle, and concatenates these input bytes until 12800 are accumulated.
I think we're on the same page that the number of valid input bytes is 1-16 (if there are 0 valid, then one would think that the 'valid' signal would not be asserted).

The input width is 16 bytes with a variable number of valid bytes (as indicated by 'empty') and the input data is always aligned to one end or the other. For convenience, I'm assuming that the first valid byte is on bits 7-0 on every transaction regardless of the value of 'empty' and regardless of what happened on previous cycles.

Collect 100 samples of these 16 byte sub-samples (that could take anywhere from 1 to 16 transactions) and you have the full 1,600 byte (12,800 bits) needed.

But maybe I missed something along the way.

Kevin Jennings

- - - Updated - - -

No, the number of valid symbols does not affect the comparators in my approach. The comparison is with the signal that tracks where to put only the first symbol. The compare operation and therefore the write enable calculation is totally independent of the current value of 'empty'.

The calculation for where to put the first symbol for the next transaction does depend on the current value of empty.

Kevin Jennings
 
Reactions: shaiko

    shaiko

    Points: 2
    Helpful Answer Positive Rating

I think we agree about the input data formatting. For a certain number of valid input symbols, they will always be placed in the same positions (packed at the "low" end). It is the destination symbol locations that depend on what has happened in the previous transfers. This means that all output symbol locations except the first will need a mux to get the data from the correct input symbol.
 


I think we're on the same page that the number of valid input bytes is 1-16 (if there are 0 valid, then one would think that the 'valid' signal would not be asserted).


Kevin, all of the above is 100% correct and describes exactly what I want to do.

With that said I have 3 questions:

1. Kevin, will your code (started on post #29 and perfected in post #40) do the job ?
2. std_match, will your code (post #47) do the job ?
3. If the answer to both 1 & 2 is "yes" - What is the functional difference between both ?
 

1. Kevin, will your code (started on post #29 and perfected in post #40) do the job ?
I doubt it, there is probably an error. But you need to write and test that in the simulator, not me.
2. std_match, will your code (post #47) do the job ?
Perhaps you should create a testbench and test it yourself rather than expecting someone else to do it for you.
3. If the answer to both 1 & 2 is "yes" - What is the functional difference between both ?
Once you've written and tested both, you'll have the answer. Perhaps you'll consider sharing your results and post them to this forum. You know, give back a little.

Kevin
 

I doubt it, there is probably an error. But you need to write and test that in the simulator, not me.
By all means - I never implied that someone should do it for me.
I simply thought it's a "tell from a glance" thing for you...
 


I was only referring to the complexity of the comparator logic, not the input data. The comparator complexity is affected by the design decision to store non-valid data or not. Your approach to store everything probably have the lowest comparator complexity, so I am on your side there. I think the only disagreement we have is whether or not a working solution must synthesize to muxes.



1. I assume you mean the code started in post #20 (and corrected in #40).
I think there are some bugs, but they can be fixed.
Data_In should probably hold the complete input word (it was declared to hold only one symbol in post #20)
It is not clear if the innermost loop should loop over the bits in one symbol or over all input bits. There are contradictions in the code.
The outer loop should probably take symbol steps instead of message steps.
Writing outside of My_Big_Message must be prohibited. One solution is to add dummy bits that are written but not read. They will be removed at synthesis.
Another solution is what I did in #47.

2. Yes, but I don't want to call it my code. It is a minimal adjustment of your code in post #46 to reduce the risk for unwanted/unused logic. I think I prefer a corrected version of Kevin's code because it will probably be easier to understand.

3. The number of comparators should be identical, so don't expect a big difference in area. The speed can differ. You should synthesize both and compare.


Do you really plan to have 128 input bits and 12800 output bits?
What is the rate of the incoming data?
 
Reactions: shaiko

    shaiko

    Points: 2
    Helpful Answer Positive Rating
Do you really plan to have 128 input bits and 12800 output bits?
No. But my boss does.
What is the rate of the incoming data?
Around 250MHz.

- - - Updated - - -

Writing outside of My_Big_Message must be prohibited. One solution is to add dummy bits that are written but not read. They will be removed at synthesis.
Would you agree that the same thing can be achieved using a modification of my own code?

Either like this:
Code:
process ( IN_CLOCK , IN_RESET ) is
begin
  if IN_RESET = '1' then 
     current_bit <= ( others => '0' ) ;
     OUT_DATA <= ( others => '0' ) ;
  elsif rising_edge ( IN_CLOCK ) then
     if IN_VALID = '1' then
        current_bit <= next_bit ; 
        for i in 0 to ( bits_per_message - 1 ) loop
          for j in 0 to ( bits_per_frame - 1 ) loop
            if j >= current_bit then
             [COLOR="#FF0000"] if ( i + j ) <= ( bits_per_frame - 1 ) then[/COLOR]
                OUT_DATA ( i + j ) <= IN_DATA ( i ) ;
              end if ;
            end if ;
          end loop ;	
       end loop ;
    end if ;	
  end if ;	
end process ;

Or like this:
Code:
-- declare a "stretched_frame" signal to prevent vector boundary runout during assignment to OUT_DATA:
signal stretched_frame : ( 128 + bits_per_frame - 1 downto 0 ) ;

process ( IN_CLOCK , IN_RESET ) is
begin
  if IN_RESET = '1' then 
     current_bit <= ( others => '0' ) ;
     stretched_frame <= ( others => '0' ) ;
  elsif rising_edge ( IN_CLOCK ) then
     if IN_VALID = '1' then
        current_bit <= next_bit ; 
        for i in 0 to ( bits_per_message - 1 ) loop
          for j in 0 to ( bits_per_frame - 1 ) loop
            if j >= current_bit then
              [COLOR="#FF0000"]stretched_frame ( i + j ) <= IN_DATA ( i ) ;[/COLOR]
            end if ;
          end loop ;	
       end loop ;
    end if ;	
  end if ;	
end process ;

[COLOR="#FF0000"]OUT_DATA <= stretched_frame ( bits_per_frame - 1 downto 0 ) ;[/COLOR]
 
Last edited:

This means that all output symbol locations except the first will need a mux to get the data from the correct input symbol.
I agree. Muxing is required for symbols within a message, but not required for messages within the big message.

Kevin
 

I agree. Muxing is required for symbols within a message, but not required for messages within the big message.

And that's exactly what I've seen in the RTL viewer.

Kevin, what do you think about what I wrote in post #57?
Do you agree with that?
 

And that's exactly what I've seen in the RTL viewer.

Kevin, what do you think about what I wrote in post #57?
I think you didn't post the following:
- Which one (or both) meets all of your functional requirements passing all testbench testing
- What is the resource usage of both approaches
- What is the timing performance of both approaches?

Without that info, I don't think it is productive for me to post any further, I'm not going to try to eyeball some minor typo that you may or may not have that might have either a minor or major effect. With that info, I don't see any point in posting any further because the answer will be clear and my opinion not relevant. However, I will say that I don't see any use of 'empty' which must be an error or you're choosing not to show it. Showing part of your code isn't necessarily helpful.

Kevin
 

Status
Not open for further replies.

Similar threads

Cookies are required to use this site. You must accept them to continue using the site. Learn more…