I agree with this, and it is a clue to a solution that might be better than what we have so far.
I think it should be a pipelined solution in 2 steps:
1. Build one complete message (16 symbols) and store it in a 128-bit register. This requires 15 symbol muxes with different numbers of inputs (2 to 16) and 16 comparators of unknown complexity. The implementation can be inspired by the earlier postings in this thread. Non-valid symbols may be written or not, but we never write outside this first 128-bit register (possibly with added dummy bits that will be removed during synthesis).
2. Connect the complete message in parallel to an arbitrary number of message registers (the frame). The only logic required is the generation of the write strobes. This can be done without comparators, since the message registers are always written in the same order. A shift register would be enough.
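To make the two steps concrete, here is a rough behavioral model in Python (not HDL, and not a definitive implementation). It assumes 8-bit symbols, so that 16 symbols fill the 128-bit register, and it assumes a "message valid" flag coming out of step 1. The names pack_message, Frame, strobe and message_valid are made up for this sketch. The mux/comparator selection of valid symbols in step 1 is left out on purpose; only the packing and the strobe shift register are modeled.

```python
SYMBOL_BITS = 8            # assumption: 16 symbols x 8 bits = 128 bits
SYMBOLS_PER_MESSAGE = 16


def pack_message(symbols):
    """Step 1 (simplified): pack 16 symbols into one 128-bit value.

    Symbol 0 lands in the least significant byte. The selection of
    valid symbols (muxes + comparators) is deliberately not modeled.
    """
    if len(symbols) != SYMBOLS_PER_MESSAGE:
        raise ValueError("a complete message needs exactly 16 symbols")
    word = 0
    for i, sym in enumerate(symbols):
        if not 0 <= sym < (1 << SYMBOL_BITS):
            raise ValueError("symbol does not fit in SYMBOL_BITS")
        word |= sym << (i * SYMBOL_BITS)
    return word


class Frame:
    """Step 2: message registers fed in parallel, one write strobe each."""

    def __init__(self, num_messages):
        self.regs = [0] * num_messages
        # Shift register holding a single set bit, starting at register 0.
        self.strobe = [True] + [False] * (num_messages - 1)

    def clock(self, message, message_valid):
        """Model one clock edge."""
        if not message_valid:
            return  # nothing is written and the strobe does not move
        # Every register sees the same 128-bit message; only the one
        # whose strobe bit is set captures it.
        self.regs = [message if s else r
                     for r, s in zip(self.regs, self.strobe)]
        # Rotate the strobe by one position. No comparator is involved.
        self.strobe = [self.strobe[-1]] + self.strobe[:-1]
```

In this model a Frame(3) that receives three valid messages fills registers 0, 1 and 2 in that order, and the fourth valid message wraps around to register 0 again. Changing num_messages only changes the length of the register list and of the strobe shift register.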
Describing it as 2 steps makes the code easier to write and understand. The register after step 1 is good for speed, and it will probably come for free, since there is a register bit directly after each LUT.
The logic (muxes + comparators) for this approach does not grow with the frame size. Only the storage for the frame takes additional space.