Update: check out the Q Audio DSP Library.
A couple of posts back, I wrote about Bitstream Autocorrelation (Here and Here). It was an evolutionary path from even older posts on the subject (Here and Here). It has matured quite a bit since its inception a few months back.
I’ll discuss what’s new in another post, but before that, to give you an idea of how well this continuous pitch detector tracks, I’ll present some demo sound clips. The left channel contains the original raw guitar signal (normalized), while the right channel contains a simple synthesized pulse wave, modulated by its own envelope, tracking the source. We’ll start with single notes and then proceed to a few single-string phrases demonstrating various guitar playing techniques such as legato, hammer-on, pull-off, etc.
Single Note Tests
These clips track single notes all the way down to note release. They test frequency fidelity: the result should not break up and devolve into an indiscernible mess, and there should be no sudden jumps or shifts in frequency as the signal decays. Note that we are continuously tracking all the way down to -60dB, the signal release threshold.
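For reference, here is a minimal sketch of what a -60dB release threshold means in linear amplitude terms. The function names and the idea of comparing against a normalized envelope value are assumptions for illustration, not the detector's actual code.

```python
# Hypothetical helper names; the post only states the -60dB figure.
RELEASE_THRESHOLD_DB = -60.0


def db_to_linear(db):
    """Convert a level in decibels to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def is_released(envelope, threshold_db=RELEASE_THRESHOLD_DB):
    """Return True once a normalized envelope value falls below the threshold."""
    return envelope < db_to_linear(threshold_db)
```

With a signal normalized to a peak of 1.0, -60dB works out to an amplitude of about 0.001, so tracking continues until the envelope is roughly a thousandth of full scale.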
Prediction and Zero-Latency Attacks
In theory, any autocorrelation-based frequency detector requires at least two cycles. Essentially, a cycle needs to be correlated with the next cycle in order to determine the period. However, one thing I noticed is that we can predict the frequency earlier by looking at the running stream of peaks or zero-crossings at, or immediately preceding, note onset. For the guitar, the prediction is quite accurate given one full cycle. For the low-E string, that amounts to a latency of about 12ms (1/82.41Hz). That should be the minimum latency, right? Well, ehmmm…
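To make the one-cycle idea concrete, here is a simplified sketch that estimates the frequency from the distance between the first two rising zero-crossings. It is not the detector's actual algorithm: the function names are hypothetical, and it assumes a clean signal with exactly one rising zero-crossing per cycle, which a real guitar signal with strong harmonics will not always give you.

```python
def rising_zero_crossings(samples):
    """Return fractional sample positions where the signal crosses zero going upward."""
    crossings = []
    for i in range(1, len(samples)):
        prev, curr = samples[i - 1], samples[i]
        if prev < 0.0 <= curr:
            # Linear interpolation between the two samples straddling zero.
            crossings.append((i - 1) + (-prev) / (curr - prev))
    return crossings


def predict_frequency(samples, sample_rate):
    """Estimate frequency from the first full cycle, or None if there is not one yet."""
    crossings = rising_zero_crossings(samples)
    if len(crossings) < 2:
        return None
    period_in_samples = crossings[1] - crossings[0]
    return sample_rate / period_in_samples


def one_cycle_latency_ms(freq_hz):
    """Time taken by one full cycle, in milliseconds."""
    return 1000.0 / freq_hz
```

For the low-E string, one_cycle_latency_ms(82.41) gives a little over 12ms, which is the figure quoted above.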
Go back and listen to the “Low E” sample. Did you notice that there’s no latency at all? Yes, zero latency. How can that happen? Well, I faked it! Attack transients are special. They need not be periodic at all! Non-periodic attack transients are common. For example, Hammond organs are known to have a “key click” (regarded as an unfortunate defect by its designer, but embraced by users as part of the Hammond sound). That violin attack? That’s basically bow noise. The human voice may have unvoiced consonants. That guitar pick attack may sound like “clik”, “clak”, “clok”, “cluk”, depending on where you pick.
Any percussive or maybe even semi-periodic sound shorter than 20ms is fair game and will not sound out of tune to the human ear, regardless of frequency. Such sounds impart character, but do not contribute to establishing the note’s fundamental frequency. Listen to the “Low E” sample again. Did you notice a pitch shift at note onset? No, you can’t. The human ear can’t detect a tone from such a short fragment. Yet there it is (waveform below).
Let us isolate the synthesized sound so you can hear it more closely. In addition, I will include another variant with a short 10ms, 1.6kHz burst at the onset (second sample below). Take a close listen and hear the slight change in attack character. The first (original) has something like a “khh” at note onset, the second has this very mild tap-like “tak” sound which is evident once you hear the attack sine burst isolated in the third clip.
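Here is a rough sketch of how such an onset burst could be generated and mixed in. Only the 10ms duration and 1.6kHz frequency come from the description above; the sample rate, amplitude, Hann window shaping, and function names are all assumptions for illustration.

```python
import math


def sine_burst(freq_hz=1600.0, duration_s=0.010, sample_rate=44100, amplitude=0.5):
    """Generate a short sine burst, shaped by a Hann window (assumed) to avoid clicks."""
    n = round(duration_s * sample_rate)
    burst = []
    for i in range(n):
        window = 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        burst.append(amplitude * window * math.sin(2.0 * math.pi * freq_hz * i / sample_rate))
    return burst


def mix_at_onset(signal, burst):
    """Return a copy of the signal with the burst added starting at sample 0."""
    out = list(signal)
    for i, value in enumerate(burst[:len(out)]):
        out[i] += value
    return out
```

At 44.1kHz, the burst is 441 samples long and contains 16 cycles of the 1.6kHz sine, far too short to register as a pitch but enough to change the character of the attack.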
I think it is a good idea to make the attack transient distinct from the actual sustaining note. I imagine having different synthesized or sampled attack transients that can be modulated and morphed in real time, or perhaps even combined with the actual guitar attack transient. Human hearing is more sensitive to events arriving late, especially those of a percussive nature with quick attacks. Separate treatment of the attack transient could make this annoying latency issue a thing of the past.
Now here are a few phrase tests I recorded early on for testing pitch tracking. The tests exhibit typical playing techniques such as legato, hammer-on, pull-off, vibrato, staccato, and right-hand tapping. The staccato sample is the sole example that uses a synthesized source. I needed to quickly generate a very short burst. All the rest are recorded using the Nu multichannel pickup.
There are still some minor glitches (can you tell me what and where?). I thought I fixed those already, but it seems I might have reverted something after some commits. I’ll also need more phrase samples to give this detector decent test coverage. At any rate, I think by and large, this pitch detector is tracking very well.
Oh, and BTW, before someone asks again: no, this is not meant for pitch-to-MIDI conversion.