I suspect that I am just not seeing the forest for the trees right now, but ... argh.
The problem is as follows:
I have 4 registers, all 8-bit (unsigned).
I could write down the whole lot, but essentially it boils down to this:
Sought-after result:
R = ((M1 + C1) modulo N) + ((M2 + C2) modulo N)
where N = 2^8 = 256
So essentially it takes two unsigned 8-bit operands M1 and C1, adds them, and truncates the sum to an 8-bit unsigned result. It does the same for M2 and C2. These two subresults are then added together to form a 9-bit unsigned result.
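To pin down the wanted behaviour, here is a minimal reference model in Python. It is only a sketch of the definition above, not of the pipeline; the names `MASK8` and `reference_r` are made up for illustration.

```python
MASK8 = 0xFF  # N - 1, with N = 2**8 = 256


def reference_r(m1, c1, m2, c2):
    """Return R as defined above for four unsigned 8-bit inputs."""
    s1 = (m1 + c1) & MASK8  # (M1 + C1) modulo 256, truncated to 8 bits
    s2 = (m2 + c2) & MASK8  # (M2 + C2) modulo 256, truncated to 8 bits
    return s1 + s2          # at most 255 + 255 = 510, so it fits in 9 bits
```

For example, `reference_r(200, 100, 10, 20)` gives 44 + 30 = 74, because 200 + 100 = 300 wraps around to 44.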
Okay, so far so good... But this thing is part of a pipeline, and there is no good way to calculate these M1+C1 and M2+C2 sums.
However, what can be calculated is:
T1=M1+M2
T2=C1+C2
Or any other variant based on M1, M2 and C1, C2. What I mean is that M1 and M2 are available at one point in the pipeline and C1 and C2 at another, but never all together. And no, I cannot just resynchronize the pipeline so that they are all available at the same time. Assume for the moment that M1+C1 etc. simply cannot be computed, for whatever reason. Part of the reason is that the C1+C2 result is also reused for other things...
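To show how T1 and T2 relate to the wanted result, here is a small Python sketch. It assumes T1 and T2 are kept at full 9-bit width, which I have not stated above. The helper names `split_sums` and `identity_holds`, and the carry names `k1` and `k2`, are made up for illustration. The model computes the carries directly from M1+C1 and M2+C2, which is exactly what the pipeline cannot do, so this only describes the arithmetic.

```python
MASK8 = 0xFF  # N - 1, with N = 2**8 = 256


def split_sums(m1, c1, m2, c2):
    """Return (T1, T2, k1, k2, R) for four unsigned 8-bit inputs."""
    t1 = m1 + m2          # T1, assumed kept at full 9-bit width
    t2 = c1 + c2          # T2, assumed kept at full 9-bit width
    k1 = (m1 + c1) >> 8   # carry out of M1 + C1 (0 or 1)
    k2 = (m2 + c2) >> 8   # carry out of M2 + C2 (0 or 1)
    r = ((m1 + c1) & MASK8) + ((m2 + c2) & MASK8)  # the wanted result R
    return t1, t2, k1, k2, r


def identity_holds(m1, c1, m2, c2):
    """Check that T1 + T2 == R + 256 * (k1 + k2)."""
    t1, t2, k1, k2, r = split_sums(m1, c1, m2, c2)
    return t1 + t2 == r + 256 * (k1 + k2)
```

With M1=200, C1=100, M2=10, C2=20 this gives T1=210, T2=120 and R=74, with k1=1 and k2=0, so T1+T2 = 330 = 74 + 256. The low 8 bits of T1+T2 therefore always match the low 8 bits of R; the difference is 256 times the number of truncated carries.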
Now how do I construct the wanted result from these T1 and T2 sums? I am sure I am missing something simple, but I don't see it.
So again, M1, M2, C1 and C2 are all 8-bit unsigned, and the result R is a 9-bit unsigned value. Beware the modulo.
Any ideas?