Carry-save accumulators
Suppose that we have two bits of storage per digit, so that each digit position can hold the value 0, 1, 2, or 3. One more binary number can then be added to our carry-save result without overflowing our storage capacity. But then what?
The key to success is that at the moment of each partial addition we add three bits:
* 0 or 1, from the number we are adding.
* 0 if the digit in our store is 0 or 2, or 1 if it is 1 or 3 (that is, the low bit of the stored digit).
* 0 if the digit to its right is 0 or 1, or 1 if it is 2 or 3 (that is, the high bit of the digit to the right).
To put it another way, we are taking a carry digit from the position on our right, and passing a carry digit to the left, just as in conventional addition; but the carry digit we pass to the left is the result of the previous calculation and not the current one. In each clock cycle, carries only have to move one step along, and not up to n steps, as they may when two n-digit numbers are added conventionally.
Because signals don't have to move as far, the clock can tick much faster.
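The three-bit rule above can be sketched in a few lines of Python. This is only an illustration, not a hardware description: the names `cs_add` and `cs_value` are hypothetical, the accumulator is modelled as a list of digits (each 0 to 3, least significant position first), and the list is assumed to be wide enough that no carry falls off the top.

```python
def cs_add(digits, x):
    """One carry-save accumulation step (illustrative sketch).

    digits: list of ints in 0..3, least significant position first.
    x: non-negative int to add; assumed to fit in len(digits) bits.
    Returns a new digit list representing the old value plus x.
    """
    if x >> len(digits) or digits[-1] >> 1:
        raise ValueError("accumulator too narrow for this step")
    new_digits = []
    for i, d in enumerate(digits):
        x_bit = (x >> i) & 1                # 0 or 1, from the number being added
        low_bit = d & 1                     # 1 if the stored digit is 1 or 3
        # 1 if the digit to the right is 2 or 3. This uses the OLD digit,
        # so the carry moves only one position per step.
        carry_in = digits[i - 1] >> 1 if i > 0 else 0
        new_digits.append(x_bit + low_bit + carry_in)
    return new_digits


def cs_value(digits):
    """Numeric value represented by a carry-save digit list."""
    return sum(d << i for i, d in enumerate(digits))


acc = [0] * 8
for x in (5, 3, 6):
    acc = cs_add(acc, x)
print(acc, cs_value(acc))
```

Every new digit depends only on the old contents of its own position and of its right-hand neighbour, so all positions could be updated at the same time; the loop here is just a sequential stand-in for that.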
The result still has to be converted to binary at the end of a calculation, which just means letting the carries travel all the way through the number, as in a conventional adder. But if we have done 512 additions in the course of a 512-bit multiplication, the cost of that final conversion is spread across those 512 additions, so each addition bears 1/512 of the cost of the final "conventional" addition.
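The final conversion can be sketched in the same style. The function name `cs_to_binary` is hypothetical, and the representation is again assumed to be a list of digits in the range 0 to 3, least significant first. The function adds the low bit of each digit to the high bit of the digit on its right, this time letting the carry ripple through every position, as a conventional adder would.

```python
def cs_to_binary(digits):
    """Resolve a carry-save digit list into an ordinary integer (sketch).

    digits: list of ints in 0..3, least significant position first.
    """
    result = 0
    carry = 0       # ripple carry, propagated through every position
    prev_high = 0   # high bit of the digit to the right
    for i, d in enumerate(digits):
        total = (d & 1) + prev_high + carry
        result |= (total & 1) << i
        carry = total >> 1
        prev_high = d >> 1
    # Anything left over belongs to the positions above the top digit.
    result += (prev_high + carry) << len(digits)
    return result


print(cs_to_binary([0, 3, 2, 0]))  # digits 0, 3, 2, 0 stand for 0*1 + 3*2 + 2*4 + 0*8
```

Unlike the accumulation step, the carry at each position here depends on the one computed just before it, which is why this pass is the slow one and is worth doing only once per calculation.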