Fast and Efficient Pitch Detection: Bliss!


Updates: check out Revisited, Synth Tracking and the Q Audio DSP Library where this is being actively developed.

In my previous post, I introduced my invention, Bitstream Autocorrelation: an accurate, extremely fast and efficient, time-domain pitch detection scheme. I argued that it can be as accurate as standard Autocorrelation based pitch detection schemes, especially, or at least, for very specific source inputs, such as the guitar.

As far as I can tell, this is a new invention and has not been done like this before. And so, the past few weeks, I investigated deeper and studied its performance and characteristics on real world guitar samples. For analysis, I recorded single-note samples for all strings (6 strings for now) at various fret positions. Additionally, I also recorded various guitar audio samples incorporating techniques such as hammer-ons and pull-offs and fast right hand arpeggios. I am impressed!

Here are my findings and some direction changes and updates along the way…

Before anything else, please go back and review the previous article if you need to. That article establishes much of the groundwork. I’ll also be touching on older, but related articles as well.

XOR and XNOR

I remarked that the multiplication of two one-bit signals can be simplified to:

0 * 0 = 0
0 * 1 = 0
1 * 0 = 0
1 * 1 = 1

That is correct, but keeping in mind that binary 1 and 0 are symbolic representations of some other value, 0 can also represent the real value -1.0, which works better for us since signals in DSP run from -1.0 to +1.0. If that’s the case then we have:

-1.0 * -1.0 = +1.0
-1.0 * +1.0 = -1.0
+1.0 * -1.0 = -1.0
+1.0 * +1.0 = +1.0

So, given that binary 1 is a symbolic representation of +1.0 and binary 0 is a symbolic representation of -1.0, the table above is exactly the XNOR operation, which is the inverse of XOR. My intuition that the XOR operation might yield better results is therefore correct.
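The equivalence is easy to check by brute force. What follows is a minimal Python sketch, not code from the Q library; the helper names (`to_bipolar`, `xnor`, `mismatches`, `bipolar_correlation`) are made up for illustration. It also shows why counting XOR mismatches carries the same information as summing the ±1.0 products.

```python
def to_bipolar(bit):
    """Map binary 1 to +1.0 and binary 0 to -1.0."""
    return 1.0 if bit else -1.0


def xnor(a, b):
    """One-bit XNOR: 1 when the bits match, 0 when they differ."""
    return 1 - (a ^ b)


# The product of the bipolar values equals the bipolar value of XNOR.
for a in (0, 1):
    for b in (0, 1):
        assert to_bipolar(a) * to_bipolar(b) == to_bipolar(xnor(a, b))


def mismatches(x, y, nbits):
    """Count differing bits between two nbits-wide words (XOR, then popcount)."""
    mask = (1 << nbits) - 1
    return bin((x ^ y) & mask).count("1")


def bipolar_correlation(x, y, nbits):
    """Sum of the +/-1 products: matches minus mismatches."""
    return nbits - 2 * mismatches(x, y, nbits)


print(mismatches(0b1100, 0b1010, 4))           # 2
print(bipolar_correlation(0b1100, 0b1010, 4))  # 0
print(bipolar_correlation(0b1111, 0b1111, 4))  # 4
```

Since the bipolar sum is just `nbits - 2 * mismatches`, a low XOR count and a high XNOR (multiplication) sum describe the same alignment.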

Peak Triggers Revisited

Instead of detecting zero crossings, we now use the Peak Triggers introduced in my first Fast and Efficient Pitch Detection article. That article outlines a dual Peak Trigger mechanism intended to minimize multiple triggers per cycle, but for Bitstream Autocorrelation Function (BACF from now on), we only need one Peak Trigger. As we’ll see later, BACF is immune to multiple triggers.

The Peak Trigger uses an Envelope Follower with a relatively fast decay time. The fast decay will create small ripples following the waveform’s peaks. I find that a decay time constant of 10x the period of the lowest frequency of interest gives good results.

A Schmitt Trigger compares the raw input against a certain percentage (e.g. 90%) of the envelope follower’s output. The Schmitt Trigger switches when the envelope follower droops to this level. The scope shot at the right shows the Envelope Follower output (yellow) and the Schmitt Trigger output (blue), given a clean sine wave.
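To make the mechanism concrete, here is a rough Python sketch of a Peak Trigger under some assumptions of mine: the envelope follower is a simple peak-hold with exponential decay, the decay time constant is 10 periods of the lowest frequency, and the Schmitt Trigger uses a small fixed hysteresis. The function name `peak_trigger` and the `hysteresis` value are hypothetical, not taken from the actual implementation.

```python
import math


def peak_trigger(samples, sps, lowest_freq, ratio=0.9, hysteresis=0.01):
    """Turn an audio signal into a 0/1 bitstream that goes high near peaks."""
    tau = 10.0 / lowest_freq                # decay time constant in seconds
    decay = math.exp(-1.0 / (tau * sps))    # per-sample decay factor
    env = 0.0
    state = 0
    bits = []
    for x in samples:
        # Envelope follower: jump up to new peaks, decay slowly otherwise.
        env = max(x, env * decay)
        level = ratio * env
        # Schmitt Trigger: switch on above the level, off a bit below it.
        if state == 0 and x > level:
            state = 1
        elif state == 1 and x < level - hysteresis:
            state = 0
        bits.append(state)
    return bits


# A clean 110 Hz sine, 0.1 seconds at 44100 samples per second.
sine = [math.sin(2 * math.pi * 110 * n / 44100) for n in range(4410)]
bits = peak_trigger(sine, 44100, 82.41)
edges = sum(1 for a, b in zip([0] + bits, bits) if a == 0 and b == 1)
print("rising edges:", edges)
```

On a clean sine this should produce roughly one pulse per cycle; on a harmonically rich signal it may produce several.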

The waveforms below show a) The original waveform (top) b) The output of the Envelope Follower showing the ripples (bottom) and c) the output of the Peak Trigger (middle). The Peak Trigger may generate multiple triggers as evident in the evolving middle waveform below. As mentioned, and we’ll see more about that later, BACF is immune to multiple triggers.

Continuous Bitstream Autocorrelation

I noted a caveat that you need to start with a real positive edge and that you can’t start in the middle of a waveform. It turns out that that is wrong. As we will see later, just like standard Autocorrelation, with Bitstream Autocorrelation, there is no such limitation and you can start anywhere and it will still work.

As a matter of fact, I think now that due to the speed and efficiency of the BACF, there are certain advantages to doing this continuously. Normally, you only track the pitch at the note onset and then every 50ms or so afterwards to track pitch bends. That also makes sense because standard ACF and its friends are very costly (CPU hungry) operations.

In itself, note onset detection is rather tricky, especially if you have to do it in the time domain! It turns out that the result of the BACF can also be used for detecting note onsets, even if there’s no distinguishable amplitude changes. More on that later!

ACF and BACF require a buffer at least twice the period of the lowest frequency of interest. At 44100 samples per second, with the lowest frequency being the low E at 82.41 Hz, one period is about 536 samples, so we need a buffer of at least 536 x 2 = 1072 samples (aside: to allow drop tuning, we may lower that frequency to something like 70 Hz, and to speed up computation, we’ll want to use a buffer that is a power of 2). ACF (and BACF) runs through the data stream, correlating the first half of the buffer at every point from index zero (start of buffer) up to half the buffer size.
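The buffer arithmetic can be written out as a small helper. This is only an illustration; `bacf_buffer_size` is a name I made up, and rounding the period up to a whole sample is my assumption.

```python
import math


def bacf_buffer_size(sps, lowest_freq):
    """Return (period, minimum buffer, power-of-2 buffer), all in samples."""
    period = math.ceil(sps / lowest_freq)            # one cycle of the lowest note
    minimum = 2 * period                             # at least two periods
    power_of_two = 1 << (minimum - 1).bit_length()   # round up to a power of 2
    return period, minimum, power_of_two


print(bacf_buffer_size(44100, 82.41))  # (536, 1072, 2048)
print(bacf_buffer_size(44100, 70.0))   # (630, 1260, 2048)
```

Both the standard low E and the 70 Hz drop-tuning case end up with the same 2048-sample buffer once rounded to a power of 2.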

With continuous BACF, we repeat the process continuously in a loop, shifting data from the second half into the first half after each correlation step:

Recall that with BACF, the point where you have the deepest notch is where the cycle starts to repeat. The deeper the notch, the better the correlation.

With the current continuous BACF scheme, each BACF frame clearly starts with a blank. These blanks that start each BACF frame are an additional optimization: I tweaked the BACF to not correlate anything below a specified minimum period (maximum frequency). For example, with the guitar’s high E string, the maximum frequency that we need to detect is 329.63 Hz. The blank at each correlation step is a result of skipping that unnecessary computation.

The detected period is the distance between the start of the blank and the deepest part of the notch (image at the right). The depth of the notch is a measure of how periodic the signal is: its “Periodicity quality”.
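Putting the last few paragraphs together, a continuous BACF could be sketched roughly as below. This is an unoptimized Python illustration working on a list of 0/1 values, not the real implementation (which would presumably pack bits into machine words). The names `bacf`, `estimate_period` and `continuous_bacf` are hypothetical, and so is the formula used here for the periodicity quality (1.0 minus the fraction of mismatched bits).

```python
def bacf(bits, min_period):
    """XOR-count the first half of `bits` against itself at each lag.

    Lags below `min_period` are skipped (the blank at the start of a frame).
    Returns {lag: mismatch_count}; a low count is a deep notch.
    """
    half = len(bits) // 2
    return {
        lag: sum(bits[i] ^ bits[i + lag] for i in range(half))
        for lag in range(min_period, half + 1)
    }


def estimate_period(bits, min_period):
    """Return (period_in_samples, periodicity) for one frame."""
    half = len(bits) // 2
    counts = bacf(bits, min_period)
    period = min(counts, key=counts.get)       # earliest lag with fewest mismatches
    periodicity = 1.0 - counts[period] / half  # assumed quality measure, 0.0 to 1.0
    return period, periodicity


def continuous_bacf(stream, frame_size, min_period):
    """Analyze successive frames, advancing by half a frame each step."""
    hop = frame_size // 2
    for start in range(0, len(stream) - frame_size + 1, hop):
        yield estimate_period(stream[start:start + frame_size], min_period)


# Synthetic trigger output: a pulse every 100 samples.
stream = [1 if n % 100 < 30 else 0 for n in range(2048)]
results = list(continuous_bacf(stream, 512, 40))
print(results[0])  # (100, 1.0)
```

Note that the frames start at arbitrary phases of the pulse train and still report the same period, which matches the observation that you can start anywhere in the waveform.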

Here’s the obligatory Sine-wave example, but this time, I used a rapidly decaying sine wave. BACF is able to easily detect this. There’s no need for normalization or automatic gain control. It Just Works ™ 🙂

Multiple Edges

Here (below), we can see the harmonically rich low-D string as it evolves over time from note onset (left) and at a later point (right, amplified):

The plucked guitar string is very rich in harmonics. Typically, the 2nd and 3rd harmonics overpower the fundamental, as seen in the spectrum of the plucked D string below:

Such waveforms can easily generate multiple peaks per cycle. As promised, BACF is immune to multiple triggers. Here (below), we can see that the BACF output consistently and accurately detects the fundamental frequency despite the presence of multiple edges generated by the Peak Trigger. Here’s the D string waveform one or two seconds after attack (top), the Peak Trigger output (middle) and the Continuous BACF output (bottom).

Detecting Harmonics

Notice those additional squiggly lines in the BACF output? Each of those corresponds to a harmonic present in the signal. To better understand what’s happening, take a look at this graphic of a typical BACF frame:

Each notch corresponds to a harmonic (top-left image). Here I am presenting the fundamental plus the second harmonic. Sometimes a signal will evolve with an increasing amount of harmonic content, typically the second harmonic, as in the case of the guitar. That 2nd harmonic notch will get deeper and deeper until it almost reaches the fundamental’s level. Yet, the fundamental will still have the deepest notch, unlike a note an octave above the fundamental (top-right image), where both notches are of equal depth.
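One speculative way to turn this observation into a number is to compare the notch depth at half the detected period with the depth at the full period. The Python sketch below assumes the BACF frame is available as a mapping from lag to XOR mismatch count; the function names and the depth formula are my own assumptions for illustration, not an established measure.

```python
def notch_depth(counts, lag, window):
    """Depth of the notch at `lag`: 1.0 is a perfect match, 0.0 is none."""
    return 1.0 - counts[lag] / window


def second_harmonic_ratio(counts, period, window):
    """Depth of the half-period notch relative to the fundamental's notch.

    Returns None when the half-period lag was not correlated (for example,
    when it is below the minimum period) or when there is no notch at all.
    """
    half_lag = period // 2
    if half_lag not in counts:
        return None
    fundamental = notch_depth(counts, period, window)
    if fundamental <= 0.0:
        return None
    return notch_depth(counts, half_lag, window) / fundamental


# Hand-made mismatch counts over a 256-bit window, period of 100 samples.
strong_second_harmonic = {50: 64, 100: 0}
octave_above = {50: 0, 100: 0}
print(second_harmonic_ratio(strong_second_harmonic, 100, 256))  # 0.75
print(second_harmonic_ratio(octave_above, 100, 256))            # 1.0
```

A ratio creeping toward 1.0 would correspond to the 2nd harmonic notch almost touching down, while a ratio of exactly 1.0 looks like a note an octave higher.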

This information can be used to follow the intensities of the harmonics as a waveform evolves over time. It can be useful, say, in synthesis, when we want to capture the nuance of the vibrating string and present them as additional synthesis parameters. Take a look at this annotated BACF waveform:

Now, let us zoom in to that region where the 2nd harmonic almost touches down:

That’s an open D string. Take note that in this case, both notches at the left and right of the fundamental are from the 2nd harmonic.

Now, let’s compare that to the BACF output of the same string, plus an octave, 12th fret:

Here, we clearly have a note an octave higher. All the notches have maximal depths.

Note Onsets

Now here’s another welcome discovery: BACF quite remarkably detects note onsets. Typically (but not always), there’s higher energy happening in this region, but there’s little correlation. Here’s the low-E string onset (top), including the Peak Trigger output (middle) and the Continuous BACF output (bottom).

Aside: That activity before attack, that’s the noise coming from picking the string. I suppose that too can be a useful additional parameter to synthesis, like say, generating some unvoiced sound (e.g. clicks for organ like patches, synthesized pluck, etc.). I notice that that unvoiced region can be longer, depending on picking or plucking technique.

At note onset, there are no deep notches at all. The “Periodicity quality” is very low at that point. Ergo, low periodicity quality is an indication of note onset. It’s quite consistent. Every time there’s a note transition, we see a low periodicity quality. Take a look at the waveforms below of the first four notes of a right-hand tapping arpeggio on the D string:

Notice the visible gaps in the BACF output before each note transition? It works so well! I’m pleasantly surprised! The BACF is able to detect non-obvious note-onset transitions that are easily missed by other time domain based onset detection schemes. Take a look at this soft hammer-on, pull-off sample:

See that gap in the BACF output? There’s no way any time-domain based note onset detection scheme could detect that transition! Yet there it is, exposed by BACF.
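As a rough illustration of how those gaps might be picked out programmatically, the Python sketch below scans a sequence of per-frame periodicity values and reports the runs that fall below a threshold. The function name `low_periodicity_gaps` and the 0.8 threshold are assumptions of mine; a sensible threshold would have to be tuned on real recordings.

```python
def low_periodicity_gaps(periodicities, threshold=0.8):
    """Return (start, end) frame index pairs where periodicity is low.

    `end` is exclusive. Each gap is a candidate note onset or transition.
    """
    gaps = []
    start = None
    for i, p in enumerate(periodicities):
        if p < threshold and start is None:
            start = i                   # a gap begins
        elif p >= threshold and start is not None:
            gaps.append((start, i))     # the gap ends, the note has settled
            start = None
    if start is not None:
        gaps.append((start, len(periodicities)))
    return gaps


# Made-up periodicity values: pick noise, a note, a transition, another note.
frames = [0.2, 0.95, 0.97, 0.4, 0.3, 0.96, 0.98]
print(low_periodicity_gaps(frames))  # [(0, 1), (3, 5)]
```

The end of each gap marks the first frame where the new note correlates well, which is also the earliest point at which its pitch can be trusted.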

Conclusion

I am quite satisfied with the results of my tests. It is getting more exciting, and the more I explore, the more I discover possibilities way beyond what I originally hoped for. These unexpected outcomes are certainly worthy of further investigation and I suspect there’s still a lot to learn. For one, it’s very enlightening how it is even possible to extract harmonic information, up to some extent, solely in the time domain. Time domain note onset detection is another welcome outcome. I was very unhappy with the accuracy and performance of note onset detection schemes until I discovered the possibilities offered by BACF.

This is it! I’m totally sold! You?

