There are many ways to implement an LED matrix.
1) Discrete transistors. Source transistors (PNP bipolar or PMOS MOSFETs) and sink transistors (NPN bipolar or NMOS MOSFETs)
2) Array driver integrated circuits. Source (UDN2981, an 8-channel 500mA source driver) and sink (ULN2803, an 8-channel 500mA sink driver)
3) Shift registers with constant current drivers. (Most are sink drivers, e.g. the TLC5925 and a cheaper alternative, the STP16CPC26)
4) Memory-mapped constant current display drivers (MAX7219, MAX7221, HT16K33, HT1632C)
Each option has advantages and disadvantages in terms of:
1) Cost (may vary depending on location)
2) Speed
3) Size
4) Complexity
Option number 1:
The complexity of this one is fairly high: you would need 18 shift registers, 16 P-channel transistors, and 128 N-channel transistors. MOSFETs cost more than bipolar transistors but do not require the extra bias resistors (which add complexity). Cost for this option can be fairly low. Size is huge, since a lot of board space is required, and there are a lot of components to solder. Speed can be slow, since you have to explicitly drive 16 shift registers 16 times over a short span of time.
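To make the "16 shift registers 16 times" point concrete, here is a minimal host-side sketch in Python of what one refresh has to send. It assumes the part counts above describe a 16-row by 128-column matrix (16 registers for the columns, 2 for the rows); the names `scan_words` and `bits_per_refresh` are made up for illustration, and output polarity (active-low P-channel gates) is ignored.

```python
ROWS = 16    # assumed: rows switched by the 16 P-channel (source) transistors
COLS = 128   # assumed: columns switched by the 128 N-channel (sink) transistors


def scan_words(frame):
    """Turn a ROWS x COLS frame of 0/1 pixels into one
    (row_select, column_bits) pair per row, in scan order."""
    words = []
    for r, row in enumerate(frame):
        row_select = 1 << r          # exactly one row is enabled at a time
        column_bits = 0
        for c, pixel in enumerate(row):
            if pixel:
                column_bits |= 1 << c
        words.append((row_select, column_bits))
    return words


def bits_per_refresh(rows=ROWS, cols=COLS):
    """Bits clocked out for one full refresh: each row needs its
    column bits plus the row-select bits."""
    return rows * (cols + rows)
```

Each pair would be shifted out and latched in turn, and the whole list has to be repeated fast enough to avoid visible flicker, which is where the speed cost comes from.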
Option number 2:
This option is similar to option 1, except that it can cost more depending on the number of integrated driver circuits used. Speed is the same as option 1. Complexity and size are lower: you have 18 shift registers, 2 UDN2981, and 16 ULN2803. (You do not strictly need the ULN2803 on the sink side, since each LED only sinks about 20mA, not 500mA. However, the 74HC595 can only handle 6mA per pin when all pins are on at the same time; the 74AHC595 can handle 8mA. One way to cut cost for this option is to drop the ULN2803 and limit the current with a resistor (max 6mA), or to use a discrete N-channel transistor (max 20mA). This adds complexity. LED brightness depends on current, so a higher current gives a brighter display but requires a bigger power supply.)
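The current figures above translate into resistor values and supply current with simple arithmetic. The sketch below is only an illustration: the 5 V supply and 2.0 V LED forward voltage are assumptions (check your LED's datasheet), transistor and driver voltage drops are ignored, and the function names are hypothetical.

```python
def series_resistor(supply_v, led_forward_v, led_current_a):
    """Ohm's law for a series current-limiting resistor, in ohms."""
    if supply_v <= led_forward_v:
        raise ValueError("supply must exceed the LED forward voltage")
    return (supply_v - led_forward_v) / led_current_a


def row_current(leds_per_row, led_current_a):
    """Worst-case current in amps when every LED in the active row is lit."""
    return leds_per_row * led_current_a


# Assumed example values, not taken from any datasheet.
SUPPLY_V = 5.0
LED_VF = 2.0

r_6ma = series_resistor(SUPPLY_V, LED_VF, 0.006)    # roughly 500 ohms
r_20ma = series_resistor(SUPPLY_V, LED_VF, 0.020)   # roughly 150 ohms
i_row_6ma = row_current(128, 0.006)                 # a bit under 0.8 A
i_row_20ma = row_current(128, 0.020)                # about 2.6 A
```

Going from 6mA to 20mA per LED more than triples the current a fully lit row draws, which is the "bigger power supply" trade-off.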
Option number 3:
This is the middle ground of all the options. Speed is the same as options 1 and 2. Complexity and size are lower than options 1 and 2: you will have 2 UDN2981, 2 shift registers, and 8 TLC5925 (16-bit). Cost can be lower depending on the IC and region.
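The chip counts quoted for options 2 and 3 follow from dividing the number of rows and columns by the channels per chip. A small sketch, assuming a 16-row by 128-column matrix (inferred from the counts, not stated outright) and using hypothetical helper names:

```python
import math


def chips_needed(channels, channels_per_chip):
    """Number of chips required to cover the given number of channels."""
    return math.ceil(channels / channels_per_chip)


def option2_parts(rows=16, cols=128):
    """8-bit shift registers on both sides, plus 8-channel array drivers."""
    return {
        "shift registers": chips_needed(rows, 8) + chips_needed(cols, 8),
        "UDN2981": chips_needed(rows, 8),
        "ULN2803": chips_needed(cols, 8),
    }


def option3_parts(rows=16, cols=128):
    """16-bit constant current sink drivers replace the column registers."""
    return {
        "shift registers": chips_needed(rows, 8),
        "UDN2981": chips_needed(rows, 8),
        "TLC5925": chips_needed(cols, 16),
    }
```

Changing `rows` and `cols` gives a quick estimate of how the chip count scales for a different matrix size.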
Option number 4:
This option is the fastest of all, because the integrated circuit does all the multiplexing for you: you write to a group of registers, and it lights the LEDs. However, this is the most expensive option. Complexity is low, depending on the extra hardware required by the IC. Size is the smallest of all the options. (You can also create your own display driver by using two microcontrollers.)
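As an illustration of "write to a group of registers", here is a rough sketch of the 16-bit words you would send to a single MAX7219 (one chip drives one 8x8 block). The register addresses are the ones I recall from the MAX7219 datasheet, so double-check them; the function names are hypothetical, and the actual SPI transfer is left out.

```python
REG_DIGIT0 = 0x01        # digit (row) registers occupy 0x01..0x08
REG_DECODE_MODE = 0x09
REG_INTENSITY = 0x0A
REG_SCAN_LIMIT = 0x0B
REG_SHUTDOWN = 0x0C


def packet(register, value):
    """One 16-bit write: register address in the high byte, data in the low byte."""
    return ((register & 0x0F) << 8) | (value & 0xFF)


def init_packets(intensity=8):
    """Writes that take the chip out of shutdown with raw (non-BCD) LED control."""
    return [
        packet(REG_DECODE_MODE, 0x00),            # no BCD decoding
        packet(REG_SCAN_LIMIT, 0x07),             # scan all eight digits
        packet(REG_INTENSITY, intensity & 0x0F),  # brightness, 0..15
        packet(REG_SHUTDOWN, 0x01),               # normal operation
    ]


def bitmap_packets(rows):
    """rows: eight byte values, one per digit register."""
    if len(rows) != 8:
        raise ValueError("expected 8 row bytes")
    return [packet(REG_DIGIT0 + i, b) for i, b in enumerate(rows)]
```

After these writes the chip keeps scanning the display on its own, so the microcontroller only has to send new data when the image changes.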
Hope this helps