OK, I misunderstood. But the algorithm could be done fairly easily: just store the data and sum it, and put some control logic in there to share resources. What do you really want from us? The question is too broad.
Fairly easy? Can you describe the algorithm in a little more detail? Forgive me, but I don't have much experience with Verilog/VHDL; I used to program in C, where everything is sequential. I don't want to reinvent the wheel or make things way overcomplicated. If it were a single byte every clock, I could read a new value, store it, add it to the sum, output the sum every 5 clocks and reset the sum to zero.
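For reference, the single-byte case just described can be written as a cycle-by-cycle model. This is a minimal behavioural sketch in Python, not synthesizable HDL; the function name `sum_every_5` is hypothetical, and each loop iteration stands for one clock.

```python
def sum_every_5(stream):
    """Behavioural model: one byte arrives per clock; emit the sum of every 5 bytes."""
    outputs = []
    acc = 0      # running-sum register
    count = 0    # bytes accumulated so far in the current group (0..4)
    for byte in stream:          # one iteration == one clock cycle
        acc += byte
        count += 1
        if count == 5:           # fifth byte of the group: emit the sum and reset
            outputs.append(acc)
            acc = 0
            count = 0
    return outputs


print(sum_every_5([1, 2, 3, 4, 5, 10, 10, 10, 10, 10]))  # [15, 50]
```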
But I get 4 bytes every clock, so each block of 5 clocks needs to be something like this:
1. read first 4 bytes; add bytes 1-4 and store in sum1
2. read next 4; add byte 1 to sum1; add bytes 2-4 to sum2
3. read next 4; add bytes 1-2 to sum2; add bytes 3-4 to sum3
4. read next 4; add bytes 1-3 to sum3; add byte 4 to sum4
5. read next 4; add bytes 1-4 to sum4; output sum1, sum2, sum3, sum4
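The five-step schedule above follows a simple rule: on step `phase + 1` (phase 0..4), byte lane `lane` (0..3) carries byte number `4*phase + lane` of a 20-byte block, so it belongs to sum number `(4*phase + lane) // 5` (0-based). The Python sketch below is a behavioural model under that assumption, not HDL, and the names `route` and `four_lane_sums` are hypothetical. It reproduces the table and then simulates one 4-byte word per clock; as in registered logic, all the additions for a word are computed in the same cycle and committed together, so the accumulators never lag the input.

```python
def route(phase, lane):
    """0-based index of the 5-byte sum that `lane` feeds on step `phase + 1`."""
    return (4 * phase + lane) // 5


def four_lane_sums(words):
    """Model a datapath receiving one 4-byte word per clock.

    Every 5 clocks (20 bytes) it emits four sums, each covering 5 consecutive bytes.
    """
    outputs = []
    sums = [0, 0, 0, 0]   # sum1..sum4 accumulators
    phase = 0             # 0..4, i.e. the step number minus one
    for word in words:    # one iteration == one clock cycle
        assert len(word) == 4
        nxt = list(sums)
        if phase == 0:
            nxt = [0, 0, 0, 0]               # new block: clear all accumulators
        for lane, byte in enumerate(word):   # in hardware these adds run in parallel
            nxt[route(phase, lane)] += byte
        sums = nxt                           # "clock edge": commit the new state
        if phase == 4:                       # step 5: the block is complete
            outputs.append(tuple(sums))
        phase = (phase + 1) % 5
    return outputs


# Reproduce the table from the question (sum numbers shown 1-based).
for phase in range(5):
    print(f"step {phase + 1}:", [route(phase, lane) + 1 for lane in range(4)])

words = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16), (17, 18, 19, 20)]
print(four_lane_sums(words))  # [(15, 40, 65, 90)]
```

Read this way, the control logic might reduce to a counter that wraps from 4 to 0 plus a small five-entry table selecting which accumulator each lane adds into; that is one possible decomposition, not the only one.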
I've tried to come up with an algorithm and control logic, but no matter what I do, it seems like I am always at least a cycle behind and my buffer keeps overflowing.
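One way to sanity-check the overflow symptom is to compare rates: the input delivers 4 bytes every clock, so whatever drains the buffer must also retire 4 bytes every clock, with no extra cycles. The hypothetical Python model below assumes an unbounded FIFO, a consumer that pops up to `k` bytes per clock, and optionally an idle clock every `idle_every` cycles (for example, an extra clock spent writing out the sums). It only illustrates the arithmetic; it is not a description of your actual design.

```python
def fifo_occupancy(cycles, k, bytes_in=4, idle_every=0):
    """Bytes left in an (unbounded) FIFO after `cycles` clocks.

    Each clock the producer pushes `bytes_in` bytes; the consumer then pops up
    to `k` of them, except on every `idle_every`-th clock (0 = never idle).
    """
    level = 0
    for cycle in range(cycles):
        level += bytes_in                        # producer writes a full word
        idle = idle_every > 0 and (cycle + 1) % idle_every == 0
        if not idle:
            level -= min(k, level)               # consumer retires up to k bytes
    return level


for k in (1, 2, 3, 4):
    print(f"drain {k} byte(s)/clock -> {fifo_occupancy(100, k)} bytes queued after 100 clocks")
# 300, 200, 100 and 0 bytes respectively

# 4 bytes/clock, but one extra idle clock per 5-cycle block (6 clocks per 20 bytes):
print(fifo_occupancy(60, 4, idle_every=6))  # 40
```

Under these assumptions, anything short of retiring all 4 bytes on every clock, including a single extra clock per block, makes the backlog grow without bound, which matches the "always a cycle behind" symptom.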