I needed to implement real-time, multichannel pitch detection in software on a small ARM Cortex-M4 microcontroller (MCU). My all-time favorite is the STM32F4 family from STMicroelectronics. It has DSP and single-precision FPU instructions and can reach up to 225 DMIPS/608 CoreMark at up to 180 MHz. Not too bad, actually, especially for this class of MCU, but it can easily get overwhelmed by the kind of complex DSP code we take for granted on a desktop or laptop machine with multiple cores running in the GHz range.
I’ve been working on this for quite some time now and I am quite pleased with the results. I now have fast, accurate, low-latency, phase-correct and efficient multichannel pitch detection, and I thought I’d share it. In case you are wondering, no, it is not for note-to-MIDI conversion, although that is obviously one application.
Pitch detection can be CPU hungry. As with most DSP, you can do it in the time domain or in the frequency domain. Frequency-domain pitch detection algorithms require an FFT to convert the samples from the time domain to the frequency domain, as a set of sine and cosine waves that represent the signal’s spectrum over time. Frequency-domain algorithms include the harmonic product spectrum, cepstral analysis, and maximum likelihood. Popular time-domain pitch detection algorithms typically use autocorrelation (ACF), which is basically an O(N²) operation for N samples. The procedure involves correlating a signal with a delayed copy of itself and then finding the maxima, which indicate where the pattern repeats. Variants include AMDF (Average Magnitude Difference Function), YIN, and MPM.
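To make the O(N²) cost concrete, here is a minimal sketch of brute-force autocorrelation in C. It is only an illustration of the textbook method, not the approach used in the rest of this article, and the function name `acf_best_lag` and its parameters are made up for this example.

```c
#include <stddef.h>

/* Brute-force autocorrelation: for every candidate lag, correlate the
   signal with a delayed copy of itself. The two nested loops over the
   samples are where the O(N^2) cost comes from.
   Returns the lag (in samples) with the highest correlation within
   [min_lag, max_lag], or 0 if the lag range is invalid. */
size_t acf_best_lag(const float *x, size_t n, size_t min_lag, size_t max_lag)
{
    if (min_lag == 0 || min_lag > max_lag || max_lag >= n)
        return 0;

    size_t best_lag = 0;
    float best_sum = 0.0f;
    for (size_t lag = min_lag; lag <= max_lag; ++lag) {
        float sum = 0.0f;
        for (size_t i = 0; i + lag < n; ++i)
            sum += x[i] * x[i + lag];
        if (best_lag == 0 || sum > best_sum) {
            best_sum = sum;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

The estimated frequency is then the sample rate divided by the returned lag. Practical implementations normalize the sum and restrict the lag range to the expected pitch range, which this sketch leaves out.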
These are generalized pitch detection algorithms that can detect pitch from almost any source, such as musical instruments and the human voice. In fact, a lot of the early research on pitch detection was related to speech processing. The algorithms had to cope with noise and the presence of non-periodic signals (e.g. noise-like unvoiced sounds such as ‘s’, ‘sh’). For clean periodic signals, such as those coming from electric guitar pickups, a less general and less complex (hence more efficient) approach can be used.
My pitch detection uses a novel feature detection scheme I haven’t seen implemented yet. Feature-based pitch detection schemes typically use zero-crossings to extract pitch information. The AXON system is one such example. To cope with multiple crossings due to harmonics, the AXON system employs a neural network to determine the actual pitch.
Instead of relying on zero-crossings, we will deal with peak detection.
Automatic Gain Control
The first step is to keep the signal normalized to the range -1.0 to 1.0. Before anything else, we use a DC Blocker to remove any DC offset that would skew the results. The signal must be perfectly centered. Then, we extract the envelope of the incoming signal using an Envelope Follower. The current level of the envelope determines the Automatic Gain Control (AGC) gain: the lower the signal, the higher the gain. At some point, however, the signal will be overcome by noise, so we also need a Noise Gate to inhibit output below a specific threshold.
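The chain described above (DC Blocker, Envelope Follower, AGC, Noise Gate) could be sketched roughly like this. This is a minimal illustration, not my production code: the type `agc_t`, the function `agc_process` and all constants are assumptions picked for the example, and the right values depend on your sample rate and signal.

```c
/* All names and constants below are illustrative assumptions. */
#define DC_R       0.995f   /* DC blocker pole; closer to 1 = lower cutoff */
#define ENV_DECAY  0.9995f  /* per-sample decay of the envelope follower */
#define GATE_LEVEL 0.001f   /* noise gate threshold, about -60 dB */

typedef struct {
    float x1, y1;  /* DC blocker state: previous input and output */
    float env;     /* envelope follower state */
} agc_t;

/* Process one sample; returns the normalized sample in [-1.0, 1.0]. */
float agc_process(agc_t *s, float x)
{
    /* DC blocker: y[n] = x[n] - x[n-1] + R * y[n-1] */
    float y = x - s->x1 + DC_R * s->y1;
    s->x1 = x;
    s->y1 = y;

    /* Envelope follower: instant attack, exponential decay */
    float mag = y < 0.0f ? -y : y;
    s->env = mag > s->env ? mag : s->env * ENV_DECAY;

    /* Noise gate: inhibit output when the envelope is below threshold */
    if (s->env < GATE_LEVEL)
        return 0.0f;

    /* AGC: the lower the envelope, the higher the gain (1 / env) */
    return y / s->env;
}
```

Because the envelope is never lower than the magnitude of the current sample, the output stays within -1.0 to 1.0. The gate threshold also caps the maximum gain, here at 1 / 0.001 = 1000.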
The Peak Trigger uses another Envelope Follower with a faster decay time. The fast decay creates small ripples that follow the waveform’s peaks. A Schmitt Trigger compares the raw input against 90% of the envelope follower’s output. The Schmitt Trigger switches when the envelope follower droops to this level. The scope shot at the right shows the Envelope Follower output (yellow) and the Schmitt Trigger output (blue), given a clean sine wave.
The scope shot below shows the input sine wave and the Peak Trigger output (below left). The scheme works very well with harmonics present (below right). This works out quite nicely because the Envelope Follower’s slope sufficiently masks smaller peaks.
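A rough sketch of the Peak Trigger idea in C follows. The 90% level comes from the description above; the decay rate, the amount of hysteresis and the names `peak_trigger_t` and `peak_trigger_process` are assumptions made for this example.

```c
/* Illustrative constants; only the 90% level is taken from the text. */
#define PT_DECAY 0.999f  /* fast per-sample envelope decay (assumed) */
#define PT_LEVEL 0.9f    /* compare the input against 90% of the envelope */
#define PT_HYST  0.05f   /* hysteresis, as a fraction of the envelope (assumed) */

typedef struct {
    float env;   /* fast-decay envelope follower state */
    int state;   /* Schmitt trigger output: 1 around a peak, else 0 */
} peak_trigger_t;

/* Process one sample of a positive-going signal; returns 1 around peaks. */
int peak_trigger_process(peak_trigger_t *s, float x)
{
    /* Envelope follower: jump to new peaks, droop in between */
    s->env = x > s->env ? x : s->env * PT_DECAY;

    /* Schmitt trigger: switch on above 90% of the envelope and
       switch off again only once the input is clearly below it */
    float ref = PT_LEVEL * s->env;
    if (!s->state && x > ref)
        s->state = 1;
    else if (s->state && x < ref - PT_HYST * s->env)
        s->state = 0;
    return s->state;
}
```

Smaller peaks that stay under the drooping envelope never cross the 90% reference, so they produce no output pulse. To detect negative peaks, feed the same function the inverted signal.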
Period Trigger: Dual Peak Triggers
The Period Trigger employs two Peak Triggers, one detecting positive peaks and the other detecting negative peaks. This makes the detector immune to multiple triggers from multiple peaks in either the positive or the negative peak detector. To further minimize local peaks, we also clip the middle of the waveform, e.g. by setting samples between -0.5 and 0.5 to zero.
Multiple triggers are caused by overtones overpowering the fundamental frequency. These harmonics typically occur as local positive or negative peaks close to the intensity of the highest peak. The scope shots below show the dual Peak Triggers in action with source inputs containing moderate to high harmonic content.
A positive peak sets the Period Trigger’s state to 1, while a negative peak sets the state to 0. A complete cycle runs from one rising edge of the state to the next rising edge. The detector is immune to multiple triggers from multiple peaks, since repeated positive or repeated negative triggers do not change the state. Scope shot below:
Multiple triggers spanning both the positive and negative peaks may still give false triggers. There are still ways to mitigate such extreme cases, but we will not deal with that here.
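Putting the Period Trigger description into code, a minimal sketch could look like this. It bundles a compact peak detector so the example stands on its own. The center-clip range of -0.5 to 0.5 and the 90% level come from the text; the other constants and all names are assumptions.

```c
typedef struct { float env; int state; } peak_t;

/* Compact peak detector: fast-decay envelope plus Schmitt trigger.
   The decay (0.999) and hysteresis (0.85 vs 0.9) values are assumed. */
static int peak_detect(peak_t *s, float x)
{
    s->env = x > s->env ? x : s->env * 0.999f;
    if (!s->state && x > 0.9f * s->env)
        s->state = 1;
    else if (s->state && x < 0.85f * s->env)
        s->state = 0;
    return s->state;
}

typedef struct {
    peak_t pos, neg;   /* positive and negative peak detectors */
    int state;         /* 1 after a positive peak, 0 after a negative one */
    int primed;        /* set once the first rising edge has been seen */
    unsigned count;    /* samples since the last rising edge */
    unsigned period;   /* last measured period in samples, 0 if none yet */
} period_trigger_t;

/* Zero the middle of the waveform to suppress small local peaks. */
static float center_clip(float x)
{
    return (x > -0.5f && x < 0.5f) ? 0.0f : x;
}

/* Process one normalized sample; returns the latest period in samples. */
unsigned period_trigger_process(period_trigger_t *s, float x)
{
    float c = center_clip(x);
    int pos = peak_detect(&s->pos, c);
    int neg = peak_detect(&s->neg, -c);

    ++s->count;
    if (pos && !s->state) {        /* rising edge: a new cycle starts */
        s->state = 1;
        if (s->primed)
            s->period = s->count;
        s->primed = 1;
        s->count = 0;
    } else if (neg && s->state) {  /* negative peak re-arms the trigger */
        s->state = 0;
    }
    return s->period;
}
```

A second positive peak within the same half cycle finds the state already at 1 and is ignored, which is the immunity described above. The fundamental frequency estimate is the sample rate divided by the returned period.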
Frequency Locked Synth
Wrapping it all up, the Frequency Locked Synth uses the AGC and the Period Trigger to extract the fundamental frequency and phase from the input audio, then uses this information to set the frequency and phase of a synthesizer. It generates the phase-accurate synthesized output required for polyphonic sustain, for example. The total latency of the system is also accounted for and compensated. Here are some screen shots of various inputs (yellow) against a synthesized sine wave (blue). Note that the synthesizer can be just about any kind of synthesizer (e.g. FM, Additive, Subtractive, PWM, etc.).
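One way to picture the frequency and phase locking is a phase-accumulator oscillator that is re-synchronized at the start of every detected cycle. The sketch below is an assumption about how this could be done, not a description of the actual implementation: the names `locked_osc_t`, `locked_osc_sync` and `locked_osc_tick`, and the `latency` parameter (detection delay in samples, assumed non-negative), are hypothetical.

```c
typedef struct {
    float phase;  /* current phase in cycles, range [0, 1) */
    float freq;   /* frequency in cycles per sample */
} locked_osc_t;

/* Call at the start of each detected cycle.
   period:  measured period in samples
   latency: detection delay in samples (hypothetical, must be >= 0);
            the phase is advanced by this amount to compensate. */
void locked_osc_sync(locked_osc_t *o, unsigned period, float latency)
{
    if (period == 0)
        return;                        /* no valid measurement yet */
    o->freq = 1.0f / (float)period;
    float ph = latency * o->freq;
    o->phase = ph - (float)(int)ph;    /* wrap into [0, 1) */
}

/* Return the phase for the current sample, then advance by one sample. */
float locked_osc_tick(locked_osc_t *o)
{
    float ph = o->phase;
    o->phase += o->freq;
    if (o->phase >= 1.0f)
        o->phase -= 1.0f;
    return ph;
}

/* Example waveform: map phase to a triangle wave in [-1, 1].
   Any other waveform generator driven by phase would do. */
float phase_to_triangle(float ph)
{
    return ph < 0.5f ? 4.0f * ph - 1.0f : 3.0f - 4.0f * ph;
}
```

The oscillator only produces a phase ramp. Driving a sine table, an FM operator or any other generator from that phase is what makes the choice of synthesizer independent of the detector.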
It is interesting to note that the electric guitar signal actually contains higher levels of the second (and even the third) harmonic than the fundamental, partly due to the dv/dt response of the pickups, which basically act as differentiators. What’s more interesting is that the human ear can discern the fundamental even if it is totally missing. There can be octave errors in pitch detection with our approach, especially as the signal evolves over time, or at the onset when you hit the strings hard enough. I’d say this is OK, and I think it is musical, especially for the current application that needs this pitch detector.
If you really want perfect pitch detection, well, nothing can be perfect. Even traditional autocorrelation based pitch detection schemes can have octave errors. Robert Bristow-Johnson, my favorite DSP guru, sums it up in his comment re. AMDF or ASDF: “when it makes an “octave error”, it is because there really is an objective ambiguity of what the fundamental frequency of the note is. Suppose you have a nice A-440 note and your PDA says it’s at 440 Hz. now add to it a little 220 Hz at -80 dB. You will not hear it as a 220 Hz note, but mathematically, that’s what it is”.
Having said that, if I really want to push this further, I can see potential improvements: 1) use autocorrelation based on the position and magnitude of the peaks; 2) use an artificial neural-net classifier, trained with guitar input data, again using the position and magnitude of the peaks.
For now, I am very pleased with what I have.
- Pitch Detection Methods Review
- Pitch Extraction and Fundamental Frequency: History and Current Techniques
- A High Resolution Pitch Detection Algorithm Based on AMDF and ACF
- Weighted Autocorrelation for Pitch Extraction of Noisy Speech
- Efficient Pitch Detection Techniques for Interactive Music
- A comparative latency study of hardware and software pitch-trackers
- Guitar Sound Analysis and Pitch Detection
- A Very Low Latency Pitch Tracker for Audio to MIDI Conversion
- High Accuracy Monophonic Pitch Estimation Using Normalized Autocorrelation
- YIN, a fundamental frequency estimator for speech and music
- A Smarter Way to Find Pitch
- Real time pitch detection
- helmholtz. Time domain pitch tracker for Pure Data
- Performance Evaluation of Pitch Detection Algorithms
- Re: [music-dsp] Autocorrelation – probably a daft question