The APR9600 Voice Module records 2 to 8 analog voice messages, samples them for on-chip storage, and can then play back a selected message in analog form for listening.
Since the input message is never available in digital form, I also do not see how this circuit helps: there is no way to compare the recorded message with the messages stored in your library and branch to the desired command.
To do voice recognition, you need to capture a digital version of the person's command so that you can compare it with the library.
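
To make the "compare it with the library" step concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the captured command is taken to be a list of numbers (for example ADC samples or some per-frame feature), the comparison uses dynamic time warping, which is just one possible technique, and the names `dtw_distance`, `recognize`, and the toy `library` are made up for this example. A real recognizer would need proper feature extraction and a rejection threshold.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]


def recognize(captured, library):
    """Return the name of the library template closest to the captured data."""
    return min(library, key=lambda name: dtw_distance(captured, library[name]))


# Hypothetical library: toy sequences standing in for digitized commands.
library = {
    "forward": [0, 2, 5, 9, 5, 2, 0],
    "stop": [0, 8, 8, 0, 8, 8, 0],
}

# A slower (time-stretched) utterance of the "forward" template.
captured = [0, 2, 2, 5, 9, 9, 5, 2, 0]
command = recognize(captured, library)
```

The point is that `recognize` needs the captured command as numbers it can compare against the stored templates, and that is exactly what the APR9600 does not give you.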