I want to build an NPU using the 32-bit multi-cycle MIPS microarchitecture as a base. What modules, in addition to those already present in the original MIPS microarchitecture (see IMAGE below), would you suggest adding to compute matrix multiplications, convolutions, and other linear algebra operations more efficiently? These operations are fundamental to artificial neural networks.
By "module" I mean a block such as the Instruction Memory, Data Memory, Register File, ALU, Control Unit, etc.
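
To make the target workload concrete, here is a minimal sketch of the kind of kernel I have in mind, written as a plain Python reference model rather than MIPS code. The function name `matmul_mac` and the list-of-lists matrix layout are assumptions made only for illustration. The point is that the innermost loop is a chain of multiply-accumulate (MAC) steps, and that is the part I would like the added modules to speed up.

```python
def matmul_mac(a, b):
    """Reference matrix multiply written as explicit multiply-accumulate steps.

    `a` is a rows x inner matrix and `b` is an inner x cols matrix,
    both given as lists of lists (an illustrative layout, not a requirement).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # running sum for one output element
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate (MAC) step
            c[i][j] = acc
    return c


if __name__ == "__main__":
    print(matmul_mac([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

A convolution reduces to the same pattern, since each output value is a sum of products over a sliding window.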