Mel Filter Bank Processing

Status
Not open for further replies.

Phillip+

Newbie level 1
Joined
Dec 3, 2012
Hello,

Apologies for my ignorance; I am trying to learn this subject for a final project I am undertaking.

Brief background:

I am developing a speech recognition algorithm that identifies whether someone is saying a particular word, in this case "Yes" or "No".


I am computing MFCCs, following this paper: https://arxiv.org/pdf/1003.4083.pdf. What I have done so far is:

  • Pre-emphasis
  • Framing
  • Hamming Windowing
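
For readers following along, the three steps listed above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the poster's code: the pre-emphasis coefficient of 0.95, the frame length, the hop size, and all function names are assumptions chosen for the example, and the input is assumed to be non-empty.

```python
import math

def pre_emphasis(signal, alpha=0.95):
    """y[n] = x[n] - alpha * x[n-1]; the first sample is passed through unchanged."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop):
    """Split into overlapping frames; samples that do not fill a last frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def windowed_frames(signal, frame_len, hop, alpha=0.95):
    """Pre-emphasise, frame, then multiply every frame by the Hamming window."""
    emphasized = pre_emphasis(signal, alpha)
    window = hamming(frame_len)
    return [[s * w for s, w in zip(frame, window)]
            for frame in frame_signal(emphasized, frame_len, hop)]

# Toy input: 0.1 s of a 440 Hz tone at an assumed 8 kHz sampling rate.
fs = 8000
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(800)]
frames = windowed_frames(tone, frame_len=200, hop=80)
```

Each entry of `frames` is one windowed frame in the time domain, which is what would be passed on to the FFT.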

The part I am struggling with is "Step 4". If I take the FFT of each windowed frame in the time domain and multiply it by the frequency response of the Mel filters, would that be enough?
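
One common way to do what is described here is to take the power spectrum of each windowed frame, weight it by each triangular Mel filter, sum the result per filter, and take the log. The sketch below assumes the widely used mapping mel = 2595 * log10(1 + f / 700), which may or may not match the paper's exact form; the function names, the number of filters, and the naive DFT (used only to stay dependency-free) are illustrative choices, not the poster's method.

```python
import cmath
import math

def hz_to_mel(f_hz):
    """Commonly used Hz-to-mel mapping (assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2, using a naive O(N^2) DFT (fine for a sketch)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(fs / 2.0)
    # n_filters + 2 edge points: each filter spans three consecutive points.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, centre, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for k in range(n_bins):
            f = k * fs / n_fft  # frequency of bin k in Hz
            if lo < f <= centre:
                row.append((f - lo) / (centre - lo))   # rising slope
            elif centre < f < hi:
                row.append((hi - f) / (hi - centre))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_energies(frame, bank, floor=1e-10):
    """Weight the power spectrum by each filter, sum, and take the log."""
    spec = power_spectrum(frame)
    return [math.log(max(sum(w * p for w, p in zip(row, spec)), floor))
            for row in bank]

# Toy check: a 1 kHz tone at an assumed 8 kHz sampling rate, 64-sample frame.
fs, n_fft = 8000, 64
frame = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(n_fft)]
bank = mel_filterbank(n_filters=10, n_fft=n_fft, fs=fs)
energies = log_mel_energies(frame, bank)
strongest = energies.index(max(energies))  # filter that responds most to the tone
```

The multiplication happens in the frequency domain, bin by bin, and each filter then contributes a single number per frame. A DCT of these log energies would give the cepstral coefficients. In practice a real FFT routine would replace the naive DFT.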

I also have a problem with this equation:



For example, what does F represent: the FFT of the windowed frame, or the windowed frame in the time domain?


I hope someone can help. Sorry for my lack of understanding; I am still learning.
 

Hi, please see the following MATLAB code:

http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

In particular, the function "melcepst.m" implements a mel-cepstrum (MFCC) front end, which serves as the feature extractor for a speech decoder.


F in your equation is frequency. It covers the range from 0 to Fs in steps of Fs/N, where Fs is the sampling frequency and N is the length of the window.
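
As a small illustration of that frequency axis, the sketch below lists the frequency of each DFT bin for an assumed sampling rate and window length. The function name and the numbers are made up for the example.

```python
def bin_frequencies(fs, n):
    """Frequency in Hz of each of the N DFT bins: k * Fs / N for k = 0..N-1."""
    return [k * fs / n for k in range(n)]

# Assumed values for illustration: Fs = 8000 Hz, N = 8 samples per window.
freqs = bin_frequencies(fs=8000, n=8)

# For a real-valued window, bins above Fs/2 mirror the lower half,
# so only bins 0..N/2 are normally passed to the Mel filters.
one_sided = freqs[: 8 // 2 + 1]
```

With these values the bins are 1000 Hz apart, and the one-sided axis runs from 0 Hz up to Fs/2 = 4000 Hz.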
 
