It appears you have a rate mismatch between the input and the output of your buffer (fill rate vs. drain rate).
a) if the buffer only contains the 20-bit data without the headers (headers are added when reading buffer).
You need to unload the buffer at a faster clock rate so you have time to insert the headers. Assuming a serial output, every 20 input bits become 25 output bits (5-bit header + 20-bit data), so you need a buffer read clock of at least 25/20 = 5/4 * Freq_in.
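As a quick sanity check of that ratio, here is a minimal Python sketch of the arithmetic. The function name and the 100 MHz input clock are illustrative assumptions, not values from your design.

```python
from fractions import Fraction

def min_read_clock_ratio(data_bits=20, header_bits=5):
    """Minimum read/write clock ratio for a serial buffer when the
    header is inserted on the read side (case a)."""
    # Each data word costs data_bits cycles to write and
    # (header_bits + data_bits) cycles to read out serially.
    return Fraction(data_bits + header_bits, data_bits)

ratio = min_read_clock_ratio()
print(ratio)  # 5/4

# Hypothetical input clock, for illustration only.
freq_in_hz = 100_000_000
print(float(ratio * freq_in_hz))  # minimum read clock in Hz
```

Any read clock below this ratio drains the buffer slower than it fills, so it eventually overflows no matter how deep it is.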
b) if the buffer contains the 5-bit header and the 20-bit data (headers added when writing to buffer).
You should perform a serial-to-parallel conversion of the data before writing; otherwise you don't have enough clock cycles on the write side to add the 5-bit header (this assumes the buffer write clock is the same as the input data clock). You still need to perform the buffer reads at the higher rate, otherwise the buffer will overflow. Note: if you need the output to be serial, you have to perform a parallel-to-serial conversion of each header+data word.
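To make the data path in (b) concrete, here is a small behavioral model in Python (not HDL). The function names, the MSB-first bit order, and placing the header in the upper 5 bits of the 25-bit entry are all assumptions made for illustration.

```python
def deserialize(bits):
    """Serial-to-parallel: shift bits (MSB first) into one integer word."""
    word = 0
    for b in bits:
        word = (word << 1) | (b & 1)
    return word

def make_entry(header, data, data_bits=20):
    """Form one buffer entry: header in the upper bits, data in the lower bits."""
    return (header << data_bits) | data

def serialize(entry, width=25):
    """Parallel-to-serial: emit the entry MSB first."""
    return [(entry >> i) & 1 for i in range(width - 1, -1, -1)]

# 20 serial data bits arrive, get assembled, then tagged with a 5-bit header.
data_in = [(0xABCDE >> i) & 1 for i in range(19, -1, -1)]
entry = make_entry(0b10110, deserialize(data_in))
serial_out = serialize(entry)
print(hex(entry), len(serial_out))
```

The write side handles one 25-bit word every 20 input clocks, while the serial read side has to shift out 25 bits in that same time, which is where the higher read rate comes from.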
c) if the buffer can contain the 5-bit header, make the header a sideband of information.
Make your serial buffer a 2-bit-wide buffer that contains {hdr_bit, data_bit}. The header field could simply be a start bit, followed by the 5-bit header, followed by 14 zeros (1 + 5 + 14 = 20 bits, the same length as the data word). This framing might also be helpful for ensuring alignment of the serial data stream. In this case the write and read rates are the same.
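A rough Python model of that sideband layout is below. The function names and the MSB-first ordering are assumptions; only the start bit + 5-bit header + 14 zeros framing comes from the description above.

```python
def sideband_bits(header, data_bits=20, header_bits=5):
    """Header lane for one word: start bit, header (MSB first), zero padding."""
    hdr = [(header >> i) & 1 for i in range(header_bits - 1, -1, -1)]
    pad = [0] * (data_bits - 1 - header_bits)  # 14 zeros for 20/5
    return [1] + hdr + pad

def pack_word(header, data, data_bits=20):
    """Return 20 buffer entries, each a (hdr_bit, data_bit) pair."""
    data_lane = [(data >> i) & 1 for i in range(data_bits - 1, -1, -1)]
    return list(zip(sideband_bits(header, data_bits), data_lane))

entries = pack_word(0b10110, 0xABCDE)
print(len(entries))   # one entry per input clock
print(entries[:6])    # start bit and header travel alongside the first data bits
```

Since each input bit produces exactly one 2-bit entry, the buffer fills and drains at the same clock rate.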
Perhaps that will give you some ideas of how to approach your problem.
Note: all of these will introduce a fixed latency (per data word in a & b). You are asking for the impossible with "no intermediate delay", which I take to mean no extra latency between input and output.
regards