Hello.
This is a DIY project. I am trying to design a sound detector on an ATmega328P (20 MHz) microcontroller (no, it's not an Arduino). Audio DSP is something I haven't done before, at least not at *this* low a level, so I would like to ask for some advice to see if I'm going about things correctly.
I designed a custom prototype PCB with an electret microphone. It is connected to a custom preamplifier whose sensitivity can be set by an on-board potentiometer. The circuit is capable of picking up some pretty faint noises. The opamp output is fed into an ADC input on the on-board ATmega328P. The ADC is configured to produce a stream of unsigned 8-bit samples (uint8_t) at about 39 kSPS. The goal is to process this audio stream and detect sudden bursts of noise. My question is: what would be the proper way to design the code for this (I am coding in C)? I can't run an FFT in real time due to lack of processing power. I've already implemented something, but I am not quite happy with it.
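For context, the sampling side looks roughly like this. This is a simplified sketch rather than my exact firmware: I'm assuming free-running mode with ADLAR set so only ADCH needs to be read, and the prescaler bits here are just an example (at 20 MHz a /32 prescaler lands in the right ballpark; the real rate depends on your setup):

```c
#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint8_t latest_sample;   /* most recent 8-bit ADC reading */
volatile uint8_t sample_ready;    /* flag for the main loop        */

/* ADC in free-running mode, result left-adjusted so the top 8 bits
 * can be read from ADCH alone. Prescaler is an example value. */
static void adc_init(void)
{
    ADMUX  = (1 << REFS0) | (1 << ADLAR);               /* AVcc ref, channel 0       */
    ADCSRA = (1 << ADEN) | (1 << ADATE) | (1 << ADIE)   /* enable, auto-trigger, IRQ */
           | (1 << ADPS2) | (1 << ADPS0);               /* prescaler /32 (example)   */
    ADCSRB = 0;                                         /* free-running trigger      */
    ADCSRA |= (1 << ADSC);                              /* kick off first conversion */
}

ISR(ADC_vect)
{
    latest_sample = ADCH;   /* 8-bit unsigned sample */
    sample_ready  = 1;
}
```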
My current code works like this:
- The code initially samples the mean value (the DC offset). This is done by summing up a large number of samples and dividing the sum by the sample count. The value is usually 127 or 128. Maybe I can skip this step and assume it's always going to be 127?
- For each new incoming sample the code calculates a delta value, which is basically the offset from the mean: delta = abs(sample - mean). This rectifies the audio stream into absolute values.
- I have two sliding windows which are basically an average approximation of the last N delta values. The first window is a signal window and is very short (an average of the past 5000 samples or so). The other window is a noise window and averages over a much longer span (say 20 seconds' worth of audio). This means the signal value rises and decays much faster than the noise value.
- The code checks the ratio between the signal and the noise. If the signal exceeds the noise by some margin (signal > noise * 1.3), the code assumes a sound is present.
- When the signal level slides back below a slightly lower margin (signal < noise * 1.2), the code assumes the sound is gone. The two margins give a bit of hysteresis.
- Finally, the code calculates the duration of the sound and rejects bursts shorter than 100 ms or so. A condensed sketch of this loop follows below.
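Condensed into code, the detection loop looks something like this. Again a simplified sketch of the logic above, not my literal firmware: the "sliding windows" are implemented as fixed-point exponential moving averages (a true 20-second boxcar over ~780k samples won't fit in the 328P's RAM), and all the constants are illustrative values, not the ones I've actually tuned:

```c
#include <stdint.h>

/* Example tuning values -- these are exactly the knobs that are hard to tune. */
#define MEAN            128U      /* assumed DC offset of the 8-bit stream       */
#define SIG_SHIFT       12        /* signal EMA: ~2^12 = 4096 samples (~100 ms)  */
#define NOISE_SHIFT     20        /* noise EMA: ~2^20 samples (~27 s at 39 kSPS) */
#define ON_NUM          13U       /* trigger when signal*10 > noise*13  (x1.3)   */
#define OFF_NUM         12U       /* release when signal*10 < noise*12  (x1.2)   */
#define MIN_ON_SAMPLES  3900UL    /* reject bursts shorter than ~100 ms          */

static uint32_t sig_acc;          /* signal EMA accumulator, scaled by 2^SIG_SHIFT  */
static uint32_t noise_acc;        /* noise EMA accumulator, scaled by 2^NOISE_SHIFT */
static uint32_t on_count;         /* samples since the sound was declared present   */
static uint8_t  sound_present;

/* Feed one ADC sample; returns 1 once a qualifying burst has ended. */
uint8_t process_sample(uint8_t sample)
{
    uint8_t event = 0;

    /* Rectify: absolute distance from the mean. */
    uint8_t delta = (sample >= MEAN) ? (sample - MEAN) : (MEAN - sample);

    /* Leaky integrators: acc converges to delta * 2^shift, so the shifted-
     * down value is a moving average with a ~2^shift sample time constant. */
    sig_acc   = sig_acc   - (sig_acc   >> SIG_SHIFT)   + delta;
    noise_acc = noise_acc - (noise_acc >> NOISE_SHIFT) + delta;

    uint32_t signal = sig_acc   >> SIG_SHIFT;
    uint32_t noise  = noise_acc >> NOISE_SHIFT;

    /* Ratio test with hysteresis; the *10 comparisons stand in for 1.3/1.2. */
    if (!sound_present) {
        if (signal * 10U > noise * ON_NUM) {
            sound_present = 1;
            on_count = 0;
        }
    } else {
        on_count++;
        if (signal * 10U < noise * OFF_NUM) {
            sound_present = 0;
            if (on_count >= MIN_ON_SAMPLES)   /* duration filter */
                event = 1;
        }
    }
    return event;
}
```

The main loop would just call process_sample(latest_sample) whenever sample_ready is set. At 39 kSPS there are only ~512 CPU cycles per sample at 20 MHz, which is part of why I went with shift-based integer math instead of floats.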
The issue with this approach is that it's very difficult to tune. The tuning variables are the signal/noise window decay speeds and the margins for noise/silence detection. In a quiet room it works decently well, but outside, where the noise floor is much higher, it produces many false positives. As an example, imagine constant traffic on a distant highway (no honking), so there is just the continuous noise of tires on asphalt. The code will sometimes trigger on this noise alone, even though my ears don't hear any noticeable deviation. When a dog barks somewhere closer by, the code does pick it up correctly, but it often misses birds singing.
Thoughts?