Fast and Efficient Pitch Detection: Bitstream Autocorrelation

      DSP, Electronics, Pitch Detection, Software

So, since last year, I’ve been mulling over a unique, and extremely fast(!) Autocorrelation scheme for monophonic pitch detection. Last weekend, I finally got myself to write the proof of concept. It’s not like any autocorrelation scheme I’ve seen before. I am still wondering why no one has thought about doing it this way. As far as I can tell, this is my invention, but please tell me if there’s something I am missing and if I’m not the first to actually do it this way. I dubbed the technique Bitstream Autocorrelation.

Unlike standard Autocorrelation, my scheme works on single bit binary data streams instead of floating point (or fixed point) real numbers. Compared to standard Autocorrelation, Bitstream Autocorrelation is wicked fast. As I’ve been working on multiple channels of audio on small Microcontrollers, I’ve consistently shied away from Autocorrelation schemes for pitch detection (see my original article: Fast and Efficient Pitch Detection). Popular time-domain Autocorrelation (ACF)  based pitch detection, including variants such as AMDF (Average Magnitude Difference Function), ASDF (Average Squared Difference Function),  YIN, and MPM,  are quite expensive in terms of CPU cycles required (ACF is basically an N² operation for N samples).

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals. (From Wikipedia)

Zero Crossings

Given a pure sine wave, you can detect its period (and the reciprocal, frequency) by timing the zero-crossings. However naive, it is the simplest form of pitch detection. It works only if you have pure sine waves or fairly clean waveforms such as triangles, sawtooth, PWM or square waves. Such naive detectors fail miserably when you have multiple transitions per cycle like in the example below:

That waveform above contains strong 2nd and 3rd harmonics, even stronger than the fundamental. Such waveforms are quite common! Each cycle contains multiple zero-crossing transitions. One way to “tame” the waveform is by heavily low-pass filtering the signal. Crude, but probably usable, especially if you know the range of frequencies you intend to work with. Unfortunately for me, heavy low-pass filtering is not an option because that will introduce some unavoidable phase lag.

I’ve been looking at these waveforms zipping through the scope for hours on end. In my mind, I can see patterns emerge and I suppose some form of pattern recognition could probably work. Look at that above, you have one fat square followed by two successively thinner pulses. And it repeats.

Surely, you can use Autocorrelation to detect the cycles! After all, ACF is a tool for finding repeating patterns! Even the image in Wikipedia Autocorrelation article makes use of square waves to illustrate what’s happening (illustration at the right).

Bitstream Autocorrelation

So, I thought to myself, instead of applying ACF over the input, why not apply it over the zero-crossings binary stream? Will the result be useful?

But why bother at all? Well, if we know that we are dealing only with binary data, we can optimize the computation using binary operations. Here’s why… in standard ACF you multiply the signal by itself, repeatedly (that’s the f * f operation in the illustration at the right). That, is the most expensive operation. You slide the signal over itself, multiplying the signal with a delayed version of itself. In a tight loop, you do that repeatedly for each sample in the input N/2 x N/2 times, where N is the number of samples. It is possible to use FFT to compute the autocorrelation, but that is still very expensive, especially if you have to process many channels at once, on a small Microcontroller, as I do. And, even with a faster computer, I wouldn’t mind using a faster algorithm for this purpose. DSP operations are already taxing the CPU cores!

And so, with only ones and zeros to deal with, the multiplication can be simplified:

0 * 0 = 0
0 * 1 = 0
1 * 0 = 0
1 * 1 = 1

That’s essentially an AND operation! You can perform 32 such operations in a single shot with a 32 bit integer. 64 with 64 bits. Then, you count all the bits that are set after each iteration. The highest point (yielding the highest count) will be the point where there’s good correlation, as you slide the signal over itself. That’s the highest (black) peak in the illustration at the right.

I thought it was worth a try!

In my rumination, I was thinking to myself that the XOR operation might yield better results. Just a hunch… In the case of the XOR, you have:

0 ^ 0 = 0
0 ^ 1 = 1
1 ^ 0 = 1
1 ^ 1 = 0

So, unlike AND, with XOR you get a one when there’s a mismatch. With XOR, the bits with the lowest count will be the point where there’s good correlation, again as you slide the signal over itself.

Definitely worth a try. I still was not sure if the this will yield good results. So, I quickly hacked some python code ( . And lo and behold, I got this:

Wow, that looks promising! That graph at the bottom is the Bitstream Autocorrelation result. The point where you have the deepest notch (at 100) is where the cycle starts to repeat. The distance between that the start is the waveform’s period.

Holy cow, anyone see Batman peeking there?  😆

C++ !!!

So, that was good. Good enough to go and code some C++. And so I did, and here’s a teaser:

Actual Frequency: 261.626 Hz
Estimated Frequency: 261.626 Hz
Error: 0.000619137 cents
Periodicity: 0.997619

Hey, look! That’s 619μ cents! I actually beat the baseline code from this article: High Accuracy Monophonic Pitch Estimation Using Normalized Autocorrelation which uses a technique called Normalized Autocorrelation (NAC). Read up. He got an error of 2 millicents. Not bad, actually, but mine’s better! Haha!  😎 … Well, perhaps at least for that specific test case. It’s too early to tell how well my scheme performs on a wider range of test cases. I’ll need a comprehensive set of tests!

Here’s my code: That’s a self-contained C++ file. A proof of concept. Right now it will compile only with GCC and Clang, but it’s easy enough to port to other platforms. I was lazy. It was a Sunday! But all that needs to be done is implement the count_bits function.

Acknowledgement: I loosely used some of the support code by Gerry Beauregard from that Normalized Autocorrelation article I linked to above, such as the trick he did for preventing octave errors. Go ahead and read his article. Very interesting read!

More Graphs

So the excitement continues. Now I have more results… Let’s start with a pure sine wave. E at 82.41 Hz. Nothing surprising there. The blue line is the original waveform. The red line is the zero crossings and the bold yellow line is the Bitstream Autocorrelation result which only goes half way through the window (ACF requires at least two cycles to operate on).

Here’s the waveform I used initially. The levels of the harmonics are as follows:

  1. Fundamental level: 0.3
  2. Second harmonic level: 0.4
  3. Third harmonic level: 0.3

Then, I tried A at 440 Hz:

I added some noise on the E (82.41 Hz) waveform. The notch is still distinguishable.

How about some soft noise near DC? Notice that the Bitstream Autocorrelation result just sits there at the top. There’s simply no correlation. The noise is below the zero-crossing detector’s hysteresis (-0.1 and 0.0).

Let’s crank up the noise! That zero-crossing (blue line) gets crazy, like a torrential storm complete with sharks falling from the sky! But the Bitstream Autocorrelation result just sits there at the top. There’s still no correlation (sharks don’t correlate!).

Let’s have more even more harmonics. Here, the 2nd and 3rd harmonics are overpowering, but the fundamental is still distinguishable:

Now, ACF is known to be very good at detecting cycles even if the fundamental is missing! Here, we remove the fundamental. Very interesting!


I am very surprised it works so well and at a fraction of the cost of a full (standard) ACF implementation. There are no multiplications, only cheap XOR operations —an inherently parallel operation, processing 32 (or 64) bits per step where each bit corresponds to one sample. And look at the results! I am amazed! We are dealing only with ones and zeros, and the accumulation of these to get the correlation point is very clean and predictable. This one is definitely a keeper!

Caveats? Well, for one, you need to start with a real positive edge (where the waveform crosses the zero point). You can’t start in the middle of a waveform. With standard ACF, you can start anywhere. But that’s a reasonable tradeoff. We typically start at onsets anyway. Other caveats? Before any further investigation and tests, I don’t know yet. This is, for now, just a proof of concept. Caveat emptor.

OK, well there you go. Next time, I’ll present the actual algorithms.

Further Reading

  1. Fast and Efficient Pitch Detection
  2. Fast and Efficient Pitch Detection: Double Trouble
  3. High Accuracy Monophonic Pitch Estimation Using Normalized Autocorrelation
  5. Sound recognition with neural networks
  6. HPS Algorithm for detecting the fundamental frequency of a guitar string
  7. Adaptive filter pitch detection?
  8. An overview of monophonic pitch detection algorithms
  9. Real time pitch detection
  11. Pitch Detection Algorithms
  12. Accurate and Efficient Fundamental Frequency Determination from Precise Partial Estimates
  13. A spectral/temporal method for robust fundamental frequency tracking
  14. Robert Bristow-Johnson on
    1. How can I measure the harmonicity of an audio spectrum?
    2. What is an AMDF?
    3. How to deal with low fundamental when using AMDF for pitch extraction?