Synchronous FIFO - flag generation

Status
Not open for further replies.
std_match said:
On what device did you develop the shift register FIFO? Which families have the FDCE_1?
UNISIM library... Therefore it's on a Xilinx device.

And from doing a little research... miralipoor worked on a Virtex-II Pro design a couple of years ago for some competition; from looking at the competition results, their team didn't appear to make it past the first rounds. I imagine that if they used any clocks above 200 MHz they had to play a lot of these kinds of tricks to get things to meet timing. I've used these parts with a 312 MHz DDR interface and had to hand place the interface registers to get the design to reliably meet timing, so we could get the interface down to the 156 MHz core clock (easy to meet timing at that frequency).

Keeping performance up on a large version of the shift register based FIFO will require placement of the entire FIFO and likely directed routing, then locking the routing and the placement in a core.

Building up designs in this fashion seems like a good way to end up in the layoff pool due to lack of productivity. There are reasons for not building designs up from primitives, e.g. lack of portability, development time for coding/simulation, debugging errors, and maintainability of the code base, for starters.
 

I don't quite understand what you mean. RTL usually refers to behavioural code; I would count yours as more structural. With good behavioural RTL it is possible to push the chip to its limits (>380 MHz on a rather full Stratix IV). Yes, you need to start pulling your registers apart and preventing merging, along with plenty of pipelining, but hand placement at the register level is a bit OTT, and rather unreadable. I would never start with your structural approach.
 
Miralipoor,

I'm trying to write generic, reusable IPs whenever possible, while your design example is tailored to a specific device vendor and won't be easily portable to a different one.

I don't understand the motivation behind such an approach. If your FIFO can't be easily ported, why bother describing it in HDL anyway? Just use the vendor's IP generation tool; you'll probably get an even better result.
 

I'm trying to write generic, reusable IPs whenever possible, while your design example is tailored to a specific device vendor and won't be easily portable to a different one.
Said like a true working engineer who is told 3 months into the project (after you've already produced the design specification and the majority of the RTL) that the device vendor you chose wasn't cost effective, so they want you to use a different vendor's parts because they can get them at a huge discount.

Of course you can tell them: sure, no problem, it will take a few days to regenerate some memory cores and get a new pinout. Miralipoor, on the other hand, will respond with: but I've already coded half the design, I'll need at least 2 weeks to convert all the instantiated structural components to the new vendor's library, then I'll need another month to finish the design.

Guess who gets a bonus at the end of the year and who gets RIF'd?

I don't understand the motivation behind such an approach. If your FIFO can't be easily ported, why bother describing it in HDL anyway?
The delusion of job security?

Regards
 

BTW:
How would you suggest integrating Almost Full and Almost Empty flags in my code below?
Please exemplify in code...

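Code:
library ieee ;
use ieee.std_logic_1164.all ;
use ieee.numeric_std.all ;

entity fifo_management is
port
(
   clock :         in std_logic ;
   reset :         in std_logic ;
   write_request : in std_logic ;
   read_request :  in std_logic ;

   full :          out std_logic ;
   empty :         out std_logic ;
   data_out :      out std_logic_vector ;  -- unconstrained and not driven by this block
   read_address :  out std_logic_vector ( 2 downto 0 ) ;
   write_address : out std_logic_vector ( 2 downto 0 )
) ;
end entity fifo_management ;

architecture rtl_fifo_management of fifo_management is

-- Pointers carry one wrap bit above the 3 address bits.
signal write_pointer : unsigned ( 3 downto 0 ) ;
signal read_pointer : unsigned ( 3 downto 0 ) ;
signal new_write_pointer : unsigned ( 3 downto 0 ) ;
signal new_read_pointer : unsigned ( 3 downto 0 ) ;

-- Internal copies of the flags: out ports cannot be read back before VHDL-2008.
signal full_i : std_logic ;
signal empty_i : std_logic ;

begin

   full <= full_i ;
   empty <= empty_i ;

   write_address <= std_logic_vector ( write_pointer ( write_pointer ' high - 1 downto 0 ) ) ;
   read_address <= std_logic_vector ( read_pointer ( read_pointer ' high - 1 downto 0 ) ) ;

   new_write_pointer <= write_pointer + 1 ;
   new_read_pointer <= read_pointer + 1 ;

   write_domain : process ( clock , reset ) is
      begin
         if reset = '1' then
            write_pointer <= ( others => '0' ) ;
            full_i <= '0' ;
         elsif rising_edge ( clock ) then
            if write_request = '1' and full_i = '0' then
               write_pointer <= new_write_pointer ;
               -- Full: same address bits, opposite wrap bit.
               if new_write_pointer ( 3 ) /= read_pointer ( 3 ) and
                  new_write_pointer ( 2 downto 0 ) = read_pointer ( 2 downto 0 ) then
                  full_i <= '1' ;
               end if ;
            end if ;
            -- A read in the same cycle overrides the full detection above.
            if read_request = '1' then
               full_i <= '0' ;
            end if ;
         end if ;
   end process write_domain ;

   read_domain : process ( clock , reset ) is
      begin
         if reset = '1' then
            read_pointer <= ( others => '0' ) ;
            empty_i <= '1' ;   -- the FIFO is empty after reset
         elsif rising_edge ( clock ) then
            if read_request = '1' and empty_i = '0' then
               read_pointer <= new_read_pointer ;
               -- Empty: pointers equal, wrap bit included.
               if new_read_pointer = write_pointer then
                  empty_i <= '1' ;
               end if ;
            end if ;
            -- A write in the same cycle overrides the empty detection above.
            if write_request = '1' then
               empty_i <= '0' ;
            end if ;
         end if ;
   end process read_domain ;

end architecture rtl_fifo_management ;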
 

- I would write it by having an explicit counter to keep track of how full the fifo is at any time and wouldn't bother with trying to do math on the pointers.
- I would pass in a generic that is an array of reals that specifies all of the flags that I want to have as status outputs (i.e. 0.5 would mean I want a half full output, 0.75 means 3/4, etc.)
- There would be an output vector of the same range as that generic where each bit corresponds to the 'full level' specified.
- I would repeat these steps to define a similarly arbitrary number of 'empty' flags, not necessarily the same number as the 'full' flags

That's what I would do.

Kevin
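
A minimal sketch of the list above, assuming VHDL-2008 (for the predefined real_vector type). The entity name fifo_level_flags, the generics DEPTH and FULL_LEVELS, and the ports write_en, read_en and full_flags are hypothetical names chosen for illustration; write_en and read_en are assumed to be strobes for writes and reads that were actually accepted. Only the 'full' levels are shown; a second generic and output vector for the 'empty' levels would follow the same pattern.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fifo_level_flags is
  generic (
    DEPTH       : positive    := 8;            -- hypothetical FIFO depth
    FULL_LEVELS : real_vector := (0.5, 0.75)   -- fractions of DEPTH to flag
  );
  port (
    clock      : in  std_logic;
    reset      : in  std_logic;
    write_en   : in  std_logic;   -- assumed: an accepted write
    read_en    : in  std_logic;   -- assumed: an accepted read
    full_flags : out std_logic_vector(FULL_LEVELS'range)
  );
end entity fifo_level_flags;

architecture rtl of fifo_level_flags is
  -- Explicit fill counter; no arithmetic on the pointers is needed.
  signal how_full : natural range 0 to DEPTH;
begin

  count : process (clock, reset) is
  begin
    if reset = '1' then
      how_full <= 0;
    elsif rising_edge(clock) then
      -- A simultaneous read and write leaves the count unchanged.
      if write_en = '1' and read_en = '0' and how_full < DEPTH then
        how_full <= how_full + 1;
      elsif write_en = '0' and read_en = '1' and how_full > 0 then
        how_full <= how_full - 1;
      end if;
    end if;
  end process count;

  -- One flag per requested level, in the same index range as the generic.
  flags : for i in FULL_LEVELS'range generate
    constant THRESHOLD : natural := integer(FULL_LEVELS(i) * real(DEPTH));
  begin
    full_flags(i) <= '1' when how_full >= THRESHOLD else '0';
  end generate flags;

end architecture rtl;
```

With the defaults, full_flags(0) asserts at 4 or more stored words and full_flags(1) at 6 or more. The flags here are decoded combinationally from the counter; registering them is an equally valid choice.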
 
I would write it by having an explicit counter to keep track of how full the fifo is at any time and wouldn't bother with trying to do math on the pointers.
I don't understand how you can tell how full the FIFO is without doing math on the pointers. Please insert your example into my code.
 

I don't understand how you can tell how full the FIFO is without doing math on the pointers. Please insert your example into my code.

Code:
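-- Fragment: belongs inside a clocked process.
-- how_full counts the stored words; N is the FIFO depth.
if write = '1' and read = '0' then
  if how_full = N-1 then
    full <= '1';      -- this write fills the last free slot
  end if;

  if how_full = 0 then
    empty <= '0';
  end if;

  how_full <= how_full + 1;

elsif write = '0' and read = '1' then

  if how_full = 1 then
    empty <= '1';     -- this read removes the last word
  end if;

  if how_full = N then
    full <= '0';
  end if;

  how_full <= how_full - 1;
end if;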

- - - Updated - - -

Then the write and read pointers are just incremented by the read and write flags only.

(of course, more logic would be needed to avoid overflows and underflows).
 
- I would write it by having an explicit counter to keep track of how full the fifo is at any time and wouldn't bother with trying to do math on the pointers.
I can confirm that at least the Altera Quartus FIFO IP uses this method. It has separate counters for the read pointer, the write pointer and the used datawords. Quartus users can review the a_dpfifo.tdf and a_fefifo.tdf AHDL sources. There's also a fast FF based FIFO option, which may be similar to the handcoded version by Miralipoor.
 


Some tests could be removed:

Code:
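-- Fragment: belongs inside a clocked process.
if write = '1' and read = '0' then
  if how_full = N-1 then
    full <= '1';
  end if;

  empty <= '0';       -- any write-only cycle leaves the FIFO non-empty

  how_full <= how_full + 1;

elsif write = '0' and read = '1' then

  if how_full = 1 then
    empty <= '1';
  end if;

  full <= '0';        -- any read-only cycle leaves the FIFO non-full

  how_full <= how_full - 1;
end if;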
 
Thanks for the example.

of course, more logic would be needed to avoid overflows and underflows
Please clarify...
The only parts I see absent are the read/write address counters. What am I missing? Please show it in code...
 

With my code, if you did a read when the FIFO was empty, or a write when the FIFO was full, the how_full signal would continue to decrement or increment past its limits. You just need to put guards in to prevent this (and to prevent the read/write pointers from moving as well).

- - - Updated - - -

You don't need them though. For example, the Altera FIFO has a generic to prevent over/underflows. With the generic set to "off", timing performance is increased, but it is down to the user to ensure over/underflows do not occur.
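
A minimal sketch of what the guarded, counter-based control could look like with almost full and almost empty flags added. This is one possible arrangement, not the code of any poster or vendor: the entity name fifo_count_ctrl, the generics ADDR_WIDTH, ALMOST_FULL and ALMOST_EMPTY, and the internal signals do_write and do_read are hypothetical. The flags are decoded from the counter here instead of being registered as in the fragments above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fifo_count_ctrl is
  generic (
    ADDR_WIDTH   : positive := 3;   -- depth is 2**ADDR_WIDTH
    ALMOST_FULL  : positive := 6;   -- hypothetical threshold, in words
    ALMOST_EMPTY : natural  := 2    -- hypothetical threshold, in words
  );
  port (
    clock, reset  : in  std_logic;
    write_request : in  std_logic;
    read_request  : in  std_logic;
    full, empty   : out std_logic;
    almost_full   : out std_logic;
    almost_empty  : out std_logic;
    write_address : out std_logic_vector(ADDR_WIDTH - 1 downto 0);
    read_address  : out std_logic_vector(ADDR_WIDTH - 1 downto 0)
  );
end entity fifo_count_ctrl;

architecture rtl of fifo_count_ctrl is
  constant DEPTH : positive := 2**ADDR_WIDTH;
  signal how_full          : natural range 0 to DEPTH;
  signal write_pointer     : unsigned(ADDR_WIDTH - 1 downto 0);
  signal read_pointer      : unsigned(ADDR_WIDTH - 1 downto 0);
  signal do_write, do_read : std_logic;
begin

  -- Guards: a request is only accepted when it cannot overflow or underflow.
  do_write <= '1' when write_request = '1' and how_full < DEPTH else '0';
  do_read  <= '1' when read_request = '1' and how_full > 0 else '0';

  control : process (clock, reset) is
  begin
    if reset = '1' then
      how_full      <= 0;
      write_pointer <= (others => '0');
      read_pointer  <= (others => '0');
    elsif rising_edge(clock) then
      -- The pointers move only on accepted requests.
      if do_write = '1' then
        write_pointer <= write_pointer + 1;
      end if;
      if do_read = '1' then
        read_pointer <= read_pointer + 1;
      end if;
      -- An accepted write and read in the same cycle leave the count unchanged.
      if do_write = '1' and do_read = '0' then
        how_full <= how_full + 1;
      elsif do_write = '0' and do_read = '1' then
        how_full <= how_full - 1;
      end if;
    end if;
  end process control;

  full          <= '1' when how_full = DEPTH else '0';
  empty         <= '1' when how_full = 0 else '0';
  almost_full   <= '1' when how_full >= ALMOST_FULL else '0';
  almost_empty  <= '1' when how_full <= ALMOST_EMPTY else '0';
  write_address <= std_logic_vector(write_pointer);
  read_address  <= std_logic_vector(read_pointer);

end architecture rtl;
```

Removing the two guard conditions gives the faster, unprotected behaviour described above, where the user must ensure that over/underflows never occur.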
 
TrickyDicky,

In regard to generating almost full and almost empty flags...
What are the pros and cons of your method compared to the method that involves calculating the differences between read and write addresses?
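
For comparison, a minimal sketch of the pointer-difference method mentioned in the question, assuming both pointers carry one wrap bit above the address bits. The entity name fifo_pointer_level and the generics are hypothetical. The subtraction wraps modulo 2**(ADDR_WIDTH+1), so the result is the number of stored words as long as the pointers never overflow or underflow.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fifo_pointer_level is
  generic (
    ADDR_WIDTH   : positive := 3;   -- depth is 2**ADDR_WIDTH
    ALMOST_FULL  : positive := 6;   -- hypothetical threshold, in words
    ALMOST_EMPTY : natural  := 2    -- hypothetical threshold, in words
  );
  port (
    write_pointer : in  unsigned(ADDR_WIDTH downto 0);  -- includes wrap bit
    read_pointer  : in  unsigned(ADDR_WIDTH downto 0);  -- includes wrap bit
    used_words    : out unsigned(ADDR_WIDTH downto 0);
    almost_full   : out std_logic;
    almost_empty  : out std_logic
  );
end entity fifo_pointer_level;

architecture rtl of fifo_pointer_level is
  signal level : unsigned(ADDR_WIDTH downto 0);
begin
  -- Modular subtraction of the two pointers gives the fill level.
  level        <= write_pointer - read_pointer;
  used_words   <= level;
  almost_full  <= '1' when level >= ALMOST_FULL else '0';
  almost_empty <= '1' when level <= ALMOST_EMPTY else '0';
end architecture rtl;
```

This version needs no extra counter register, at the cost of a subtractor and comparators in the combinational path behind the pointers.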
 

When read and write are in different clock domains, you can forget the "used datawords" counter. Flags must be generated from the pointers, and it isn't trivial.
Used data words still works with two clock domains, you simply have counters in both domains. From that you can generate all the flags you need in both domains.

What goes away with two clock domains is the whole concept of a single global status (i.e. 'empty', 'full' cannot be reported globally, only within a clock domain which will involve clock domain synchronizer latency).

Kevin
 


Can you explain what you mean with "counters in both domains"?
What do they count?
You can not create an "empty" or "almost empty" flag without using signals from both clock domains.

Also, I don't understand what you mean with a global "empty" signal. Only the read side needs the "empty" flag, and it must have no latency when going from '0' to '1'. Old information from the write side can create a false "empty", but that is OK. I can't see how counters in both domains will help in creating "empty", "full" or the corresponding "almost" flags.
 

An issue with using a read_data_count and a write_data_count in the two clock domain case arises when the two clock domains have a large difference in their respective frequencies and the data is written in bursts from a faster clock domain and read from a slower clock domain. The aggregate rates of both sides of the FIFO may be equal, but the bursts of writes in the faster clock domain need to be reflected in the read_data_count, which is on a slower clock domain. The increased complexity of accurately handling the transfer of write requests to the read side clock domain makes the idea of Gray coding and passing the pointers across the clock domains simpler.

One possible solution would be to queue up the write requests in the write clock domain and send them one at a time to the read clock domain as pulse-stretched signals lasting at least 2x the read clock period. This will incur a large latency before the read side "knows" there is more data in the FIFO.

You also need something similar in the read clock domain to allow the write side to distinguish between multiple contiguous reads and a single read (as a single read request will persist for multiple clock cycles in the write clock domain).

I typically use the read and write counter approach for single clock domain FIFOs and Gray pointer transfers for asynchronous FIFOs. Occasionally I'll implement the more complex read and write counter asynchronous FIFO when I need the count values for other reasons (e.g. store-n-forward type buffers in packet based designs, where I need to know that all the data is available for bursting). But usually in these cases I don't implement a FIFO but instead divide up a block RAM into multiple buffers, which simplifies the pointer design and the generation of status flags, which allows for higher performance. It also allows you to easily update things like packet headers before you transmit the packet, as the headers always reside at the same address offsets.


Regards
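
A minimal sketch of the Gray pointer transfer mentioned above, showing only the write pointer crossing into the read clock domain. The entity name gray_pointer_sync, its ports, and the two-stage synchronizer depth are assumptions for illustration; the sketch relies on the pointer changing by at most one count per write clock, which keeps the Gray code to a single bit change per step.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gray_pointer_sync is
  generic (
    PTR_WIDTH : positive := 4
  );
  port (
    write_clock      : in  std_logic;
    write_pointer    : in  unsigned(PTR_WIDTH - 1 downto 0);  -- binary, write domain
    read_clock       : in  std_logic;
    write_pointer_rd : out unsigned(PTR_WIDTH - 1 downto 0)   -- binary, read domain
  );
end entity gray_pointer_sync;

architecture rtl of gray_pointer_sync is
  signal gray_wr   : unsigned(PTR_WIDTH - 1 downto 0) := (others => '0');
  signal gray_meta : unsigned(PTR_WIDTH - 1 downto 0) := (others => '0');
  signal gray_rd   : unsigned(PTR_WIDTH - 1 downto 0) := (others => '0');

  -- Gray to binary: each binary bit is the XOR of all Gray bits at or above it.
  function gray_to_bin (g : unsigned) return unsigned is
    variable b  : unsigned(g'length - 1 downto 0);
    variable gn : unsigned(g'length - 1 downto 0) := g;
  begin
    b(b'high) := gn(gn'high);
    for i in b'high - 1 downto 0 loop
      b(i) := b(i + 1) xor gn(i);
    end loop;
    return b;
  end function gray_to_bin;
begin

  -- Register the Gray value in the write domain so that no combinational
  -- glitches are sampled by the other clock.
  to_gray : process (write_clock) is
  begin
    if rising_edge(write_clock) then
      gray_wr <= write_pointer xor shift_right(write_pointer, 1);
    end if;
  end process to_gray;

  -- Two-flop synchronizer in the read domain.
  sync : process (read_clock) is
  begin
    if rising_edge(read_clock) then
      gray_meta <= gray_wr;
      gray_rd   <= gray_meta;
    end if;
  end process sync;

  write_pointer_rd <= gray_to_bin(gray_rd);

end architecture rtl;
```

The value seen in the read domain lags the real write pointer by a few clocks, so a read-side empty flag derived from it can only be pessimistic. A mirrored instance would carry the read pointer into the write domain for the full flag.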
 
Can you explain what you mean with "counters in both domains"?
What do they count?
Both count how much is in the fifo from within their own clock domain.
You can not create an "empty" or "almost empty" flag without using signals from both clock domains.
What you do is create two sets of flags, each set is synchronized with one of the clock domains. There would be 'rdempty' and 'wrempty', which is the fifo empty status based on the used words counter in the read clock domain (for 'rdempty') and the used words counter in the write clock domain (for 'wrempty'). Do the same for full, and the 'almost' flags.

Also, I don't understand what you mean with a global "empty" signal. Only the read side needs the "empty" flag
That is not always true. There are applications where the writer needs to know the fifo is empty or almost empty as well as applications where the reader needs to know that the fifo is full or almost full.
The concept of full, empty and almost have other uses beyond low level control to prevent underflow/overflow.
I can't see how counters in both domains will help in creating "empty", "full" or the corresponding "almost" flags.
That's because there is no single 'full', 'empty', etc. Instead there are flags in both domains that are set based on the used words counter in each clock domain. Signals 'rdempty' and 'wrfull' will have the required behavior to prevent underflow/overflow. Signals 'wrempty' and 'rdfull' are status signals synchronized with the appropriate clock which can be freely used by the reader or writer since they are synchronized. There will be additional latency in generating those signals, but they wouldn't be used for low level control of the fifo anyway so that's OK.

Kevin Jennings
 
Both count how much is in the fifo from within their own clock domain.
I am missing something here. The counter in the write domain would only increment and the counter in the read domain would only decrement. How can they count in the other direction?
 

Not sure why you would 'occasionally' do either approach, I would think one would do it only once and then simply use it over and over in many different applications.

As you described, there are certainly more complexities to a dual clock versus a single clock fifo, it's not clear to me why the thread is off on this tangent.

Kevin Jennings

- - - Updated - - -

I am missing something here. The counter in the write domain would only increment and the counter in the read domain would only decrement. How can they count in the other direction?
They would count in the other direction after receiving a synchronized command from the other side.
 