Design overview
After putting some thought into the design of a barrel shifter it is clear that some sort of multiplexer circuitry is required to select how the input bits route to the output bits. A naive method of implementing a barrel shifter would be to use N (where N is the number of input bits) parallel N-to-1 multiplexers, one multiplexer that multiplexes all inputs into one of the outputs. This would create an equivalent N*N to N multiplexer, which would use significant hardware resources (multiplexer input grows in O(N2).
A more efficient implementation is possible by creating a hierarchy of multiplexers. For an 8 bit rotate component the multiplexers would be constructed as follows:
the top level multiplexers rotate the data by 4 bits
the second level rotates the data by 2 bits
the third level rotates the data by 1 bit.
Using this approach input data can be shifted an arbitrary number of bits. The number of multiplexers used for an 8 bit rotate component is as follows:
Level one: One 16-to-8 multiplexers
Level two: Two 8-to-4 multiplexers
Level three: Four 4-to-2 multiplexers
Total: 48 multiplexer inputs, 24 multiplexer outputs (16 of which are intermediate signals connected to lower level multiplexers). Using this design the number of multiplexer inputs grows at a rate of O(N log2(N)
The advantage of a hierarchy of multiplexers becomes especially important when the input data is wide. For instance a naive implementation of a 32 bit rotate function would take 32 * 32-to-1 multiplexer inputs (1024-to-32), in contrast a hierarchical design takes 32*log2(32) multiplexer inputs (160-to-80).
8 bit barrel shift (rotate left) diagram
VHDL Source
Note that this design has only been tested with the Xilinx ISE and Modelsim tools.