ads-ee,
I followed your suggestion and started learning the design in this article:
**broken link removed**
I see on page 18 that the empty flag is asserted/deasserted combinatorially, outside of a clocked "always".
Code:
assign rbinnext = rbin + (rinc & ~rempty);
My question:
If it's good for an asynchronous FIFO, why isn't it good for a synchronous FIFO?
I think you're taking this a bit out of context. The sole reason for producing synchronous output flags is that those flags will get used with other logic in whatever application is using the fifo. Everything else being equal (and I realize that is never really the case), if the critical timing path is in the logic that uses the flags, then the max clock frequency of a design that generates the flags with logic will be lower than one where the flags come directly out of a flip flop.
Fifo flags can end up in that type of situation because they will typically get used to directly generate flow control logic. One view of something being 'Full' is that it is no longer ready for input; similarly, 'Empty' means that there is no output ready. A data processing algorithm typically consists of several modules strung together in some fashion into a processing pipeline. If one module is not ready for input, that will then tend to cause upstream modules to have to stop sending data which will in turn cause other upstream modules to pause as well.
That's the long winded explanation.
There are many situations where there is a critical timing path and it simply has to come out of logic and not a flip flop, so you could ask 'What is different about that case?' The answer there is that you would always like the status to come out of a flip flop (to improve Fmax again) as long as that doesn't break functionality or degrade performance for other reasons. The specific case of the fifo is an example of something where it is straightforward to produce clocked output status at little or no cost in terms of logic resources used.
So, given that you can produce a clocked status at little or no cost, and that producing a clocked output will improve Fmax, the question you should ask is 'Why would you not design something that has better Fmax at little or no logic resource cost?'
As for why the Cummings paper does not produce such an output, I don't know. It would be an improvement if it did. Maybe Cummings didn't see a way to produce the flags that way, or simply didn't consider the fact that it would be an improvement in the first place.
To bring it all back to what you are attempting though, consider the following:
Single clock fifo that you proposed: Try synthesizing your design as a standalone entity as you wrote it, comparing the pointers directly for equality, and check what the clock to output delay of the 'Full' and 'Empty' flags is. Also take note of the logic resources used. Then implement it as I suggested, where you check to see if the pointers are one away from the full and empty conditions with the clocked logic that I showed, and again check the clock to output delay for 'Full' and 'Empty' as well as the logic resources used. [1]
Feel free to try to hand place/route or whatever other tricks you might want to employ to both designs in order to improve performance. I'll ask you now to post those results here in this forum simply because there is a lot of talking going on and really nothing specific to compare. Actual synthesis results tend to dispel gum flapping (which I accept is what I'm doing since I haven't posted any similar results...but that's because you're the one with the deep interest in fifos, not me. I'm speaking from 20+ years of past design experience that includes fifos as well as all kinds of other stuff).
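For anyone wanting a concrete starting point for the second variant, here is a minimal sketch of "one away" look-ahead flags. It is not the clocked logic posted earlier in the thread, just one plausible way to write it. The module name, port names, parameters, the power-of-two depth, the synchronous reset and the asynchronous read port are all assumptions made for illustration.

```verilog
// Sketch only: single-clock FIFO whose 'full' and 'empty' come straight
// out of flip-flops. All names and parameters here are hypothetical.
module sync_fifo_regflags #(
    parameter AW = 4,              // address width, depth = 2**AW (AW >= 1)
    parameter DW = 8               // data width
) (
    input  wire          clk,
    input  wire          rst,      // synchronous, active high
    input  wire          wr_en,
    input  wire          rd_en,
    input  wire [DW-1:0] din,
    output wire [DW-1:0] dout,
    output reg           full,
    output reg           empty
);
    reg [DW-1:0] mem [0:(1<<AW)-1];
    reg [AW-1:0] wptr, rptr;

    // Requests are qualified by the registered flags.
    wire do_wr = wr_en & ~full;
    wire do_rd = rd_en & ~empty;

    // Where each pointer will be after this clock edge if it advances.
    wire [AW-1:0] wptr_nxt = wptr + 1'b1;
    wire [AW-1:0] rptr_nxt = rptr + 1'b1;

    assign dout = mem[rptr];       // asynchronous read of the head entry

    always @(posedge clk) begin
        if (rst) begin
            wptr  <= {AW{1'b0}};
            rptr  <= {AW{1'b0}};
            full  <= 1'b0;
            empty <= 1'b1;
        end else begin
            if (do_wr) begin
                mem[wptr] <= din;
                wptr      <= wptr_nxt;
            end
            if (do_rd)
                rptr <= rptr_nxt;

            // Look-ahead: decide the flag values for the NEXT cycle now,
            // so the outputs are flip-flop outputs, not comparator outputs.
            if (do_wr && !do_rd) begin
                empty <= 1'b0;
                full  <= (wptr_nxt == rptr);   // was one write away from full
            end else if (do_rd && !do_wr) begin
                full  <= 1'b0;
                empty <= (rptr_nxt == wptr);   // was one read away from empty
            end
            // Simultaneous read and write, or neither: occupancy and flags unchanged.
        end
    end
endmodule
```

The comparators are still there, but they now sit in front of the flag registers instead of behind the output pins, which is the point of the comparison suggested above. Whether that actually improves the clock to output delay on a given device is exactly what the synthesis runs should show.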
Dual clock fifo: You strongly implied (or maybe outright stated...or maybe I just interpreted it that way) that the dual clock fifo would compare the read and write pointers directly. If you really were intending that only pointers within the same clock domain would be compared, then you can ignore the rest of this paragraph. Comparing the raw pointers as generated from within their respective clock domains would be completely useless anywhere except in a simulation environment. The reason is that those flags would not be suitable for use by external logic no matter what clock domain they were in. Even worse, there would be nothing that could be done external to the fifo. Synchronizing the flags externally would not help, because you could never guarantee any timing margin.
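To illustrate what "only pointers within the same clock domain" can look like in practice, here is a sketch of the usual remedy: Gray-code the pointer, register it in its own domain, and pass it through a two flip-flop synchronizer before anything in the other domain compares against it. This is the general technique used in dual clock fifo designs such as the Cummings paper, not code from this thread, and the module and signal names are assumptions.

```verilog
// Sketch only: carry a write pointer into the read clock domain.
// Module and signal names are hypothetical.
module ptr_sync #(
    parameter W = 5                // pointer width
) (
    input  wire         wclk,
    input  wire         rclk,
    input  wire [W-1:0] wbin,      // binary write pointer, wclk domain
    output reg  [W-1:0] wgray_r2   // Gray-coded copy, usable in the rclk domain
);
    reg [W-1:0] wgray;             // Gray code, registered in the source domain
    reg [W-1:0] wgray_r1;          // first synchronizer stage

    // Registering the Gray code in wclk keeps glitches from the XOR logic
    // out of the crossing. Consecutive Gray values differ in one bit, so
    // the read side samples either the old or the new value, never a mix.
    always @(posedge wclk)
        wgray <= wbin ^ (wbin >> 1);

    // Two-stage synchronizer in the destination domain.
    always @(posedge rclk) begin
        wgray_r1 <= wgray;
        wgray_r2 <= wgray_r1;
    end
endmodule
```

The synchronized copy lags the real write pointer by a few clocks, so a read-side empty flag derived from it is pessimistic (it may stay asserted slightly longer than strictly necessary) but never wrong in the dangerous direction. The mirror-image module carries the read pointer into the write domain for the full flag.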
Another resource to Google would be the fifo-related work by Peter Alfke. Peter (deceased) was pretty much the original grand guru of fifo design and the creator (I believe) of the original pointer-based dual clock fifo a couple of decades back.
Kevin Jennings
[1] Also, very important, make sure that those two actually implement the exact same fifo functionality by simulating and checking that at every clock tick the two fifo designs are producing the exact same output. It seems trivial, but if they are not producing the exact same outputs, then it may not be a fair comparison since the two designs are not doing the same thing.
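One way to automate the check in [1] is a small simulation-only comparator that watches both designs on every clock edge. The sketch below assumes both fifos share one clock and reset, are driven with identical stimulus, and expose 'full', 'empty' and a data output; the module name and ports are hypothetical.

```verilog
// Sketch only: simulation-time checker that flags any clock tick on which
// two FIFO implementations disagree. Names and ports are hypothetical.
module fifo_compare #(
    parameter DW = 8
) (
    input  wire          clk,
    input  wire          rst,       // comparison is suppressed while high
    input  wire          full_a,
    input  wire          empty_a,
    input  wire [DW-1:0] dout_a,
    input  wire          full_b,
    input  wire          empty_b,
    input  wire [DW-1:0] dout_b,
    output reg           mismatch   // sticky: stays high after the first difference
);
    initial mismatch = 1'b0;

    always @(posedge clk) begin
        // !== also treats X/Z differences as mismatches.
        if (!rst && (full_a  !== full_b  ||
                     empty_a !== empty_b ||
                     dout_a  !== dout_b)) begin
            mismatch <= 1'b1;
            $display("%0t: mismatch full=%b/%b empty=%b/%b dout=%h/%h",
                     $time, full_a, full_b, empty_a, empty_b, dout_a, dout_b);
        end
    end
endmodule
```

Instantiate it in the testbench next to both fifos and fail the run if 'mismatch' ever goes high. Only then do the Fmax and resource numbers from the two synthesis runs describe the same function.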