Fast and Efficient Pitch Detection: Revisited


Last year, I wrote a series of articles about Bitstream Autocorrelation:

  1. Synth Tracking
  2. Bliss!
  3. Bitstream Autocorrelation

The code has matured significantly from its inception. It is long overdue, but now, as promised, it’s time to write about the technical details. Check out the Q Audio DSP Library where this is being actively developed.

Note: Reading up on the previous articles is a prerequisite for understanding this post.

Quick Overview

For those who tuned in late, Bitstream Autocorrelation, or BACF, is an accurate, extremely fast and efficient, time-domain pitch detection algorithm. BACF can be at least as accurate as any Autocorrelation based pitch detection schemes. Unlike the standard Autocorrelation function, or ACF, BACF works on zero crossings —the Bitstream, using bitwise operations.

BACF is extremely fast. The computation consumes an average of 50 nanoseconds per sample, on a 2016 MacBook Pro, tracking complex guitar phrases that exhibit typical playing techniques such as legato, hammer-on, pull-off, vibrato, staccato, and right-hand tapping.

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals. (From Wikipedia)

I was not the first… Initially, I noted that “As far as I can tell, this is my invention, but please tell me if there’s something I am missing and if I’m not the first to actually do it this way”. Recently, I was informed that there exists a prior invention in the 80s for a hardware based approach. The inventor is David Warrender and here’s the patent for his invention: Pitch analyzer. The patent is now expired.

Note: While I developed the code for a very specific source input: the guitar, the algorithm is generic and can be applied to other sources. Venturing outside the guitar realm is something I am very interested in, but unfortunately I have no time for… Yet…

Data Flow

Zero Crossing

A zero_crossing class saves zero-crossing information necessary to extract accurate timing information such as periods between pulses for performing analysis.

Zero Crossing Event

Each zero crossing pulse is saved in a ring buffer of info elements. Data include the maximum height of the waveform bounded by the pulse, the pulse width, as well as the leading edge and trailing edge frame positions (number of samples from the start) and y coordinates (the sample values before and after each zero crossing) of the zero crossings.

Sub-sample accurate period computation can be obtained using linear-interpolation of the x and y locations of each zero-crossing event. See image at the right. The two points P1 and P2 are the previous and current positions of the zero-crossing event from the raw data samples. The actual zero-crossing is the point where the dark blue line intersects the horizontal dotted line.


After collecting twice the BACF window’s worth of zero-crossing information, where the BACF window size is the period of the lowest frequency we want to detect, we transfer the information to a temporary bitset class, which stores the zero crossing bits as 32-bit or 64-bit integers (using the native integer size of the platform). Only significant pulses —those whose pulse heights are above a certain percentage (currently 80%) of the maximum pulse height of all pulses collected, are transferred to the bitset. This will eliminate spurious lower-level pulses and significantly improve detection.


Now we compute the bitstream autocorrelation using the XOR operation. A single XOR operation works on N bits of an integer. We get a speedup factor of N (e.g. 64 for a 64 bit machine) compared to standard ACF, and moreover, integer bit operations are a lot faster than floating-point multiplications.

With XOR you get a one when there’s a mismatch:

0 ^ 0 = 0
0 ^ 1 = 1
1 ^ 0 = 1
1 ^ 1 = 0

After XOR, the number of bits (set to 1) is counted. The lower the count, the higher the periodicity. A count of zero gives perfect correlation: there is no mismatch. Modern CPUs provide operations for counting bits in an integer. These are exposed by modern C++ compilers as built-in functions (e.g. __builtin_popcountll). The bit count, together with the window size, are used to provide a normalized periodicity value that ranges from 0.0 to 1.0, where 1.0 signifies perfect correlation.


Unlike standard autocorrelation (image at the right) where the signal window is successively shifted/delayed and correlated against the unshifted reference signal to get the peaks (or valleys, depending on detection scheme), we only need to correlate using the zero-crossing edges —another significant speedup.

Potential periods are the distances from each of the collected zero-crossing leading edge. For each potential period, we compute the BACF, discard ones with low periodicity, and collect the most significant results. Longer periods typically (but not always) have better BACF results due to better resolution from more samples. But longer periods may also be sub-harmonics of shorter periods. This is taken into account, and detected by determining if the durations are close to integer multiples of a known shorter period.

The sub-collector returns the period (taking advantage of sub-sample accurate computation) of the leading-edge-pair with the best periodicity index, divided by the integer multiple of the period of the shortest leading-edge-pair, with a sufficiently good periodicity index. The sub-collector also returns the periodicity index.

Essentially, the BACF result is only used as a hint used to choose the best edges that demarcate the correct cycle. The accuracy obtained from computing the period from the zero-crossing edges is superior to that found by extrapolating the period from the BACF’s periodicity peak.

For illustration, the figure below exemplifies a specific BACF in action.

The waveform at the top is the raw audio. The middle is the zero-crossing waveform, with non-significant pulses removed by the zero-crossing to bitset transfer filter as described in the Bitset section. The third is the computed BACF periodicity.

There are two distinct candidate periods, p1 and p2, demarcated by edge-pairs, with a duration less than the BACF window (the third, p3 is not distinct; it has the same duration as p2). These periods correspond to the dotted red BACF sections in the graph. It is obvious in this example that the second period, p2, is the best candidate since it has the best periodicity index. The period of p2 is returned, along with its periodicity index (the peak in the BACF result).

Suppose, however that there are more edges that correspond to multiple peaks in the BACF. As described in the Sub-Collector section, we return the period of the best edge-pair (best periodicity), divided by the integer multiple of the period of the shortest edge-pair (with sufficiently good periodicity). The image below is an example with multiple BACF peaks.

Higher frequencies will tend to have more peaks within the BACF window of interest. An octave above the lowest frequency of interest will have two significant peaks. Two octaves above, like in the example above, will have four. We take advantage of the multiple significant peaks to improve precision due to the fact that longer periods will average out to better precision —there are more samples to count (the period is proportional to the number of samples between two edges). That, and sub-sample accuracy from linear interpolation of the zero-crossing edges, contribute to excellent overall precision from the lowest to the highest frequency. 


Consider this BACF snapshot:

That is the BACF of the guitar’s D string as it evolves around 14 seconds after note onset. I amplified the raw waveform to make it clear that the second harmonic dominates the fundamental. Here, it is clear that the BACF still detects the fundamental (the higher peak), but at some point, the second harmonic will become more powerful and its BACF peak will eventually catch up. At that point, without taking into account the history of detected frequencies, the second harmonic will be detected as the fundamental. There will be a sudden octave jump.

The bias section takes into account the history and deals with abrupt harmonic shifts (not limited to the first harmonic only). Sudden jumps are not normal. The bias section compares the incoming frequency with the current detected frequency. If a sudden jump is detected, and the incoming frequency is a harmonic of the previous frequency, within a specific deviation (plus minus a few cents), the bias section will prefer the older stable frequency, using the incoming frequency (which may have shifted a bit), divided by the integer multiple of the previous stable frequency.

Finally, the bias section also employs a 3-point median filter in case there’s true (non-harmonic) shift. The median filter eliminates spurious spikes that can happen every once in a while.


The Q Audio DSP Library includes a comprehensive test suite. Here are the results from the latest tests involving pitch detection:

Test CaseAverage Error (cents)Min Error (cents)Max Error (cents)
Test_middle_C 0.001236 0.00009617 0.00231753
Test_middle_A 0.001998 0.0 0.00552
Test_low_E 0.00003447 0.00003447 0.00003447
Test_E_12th 0.0000359 0.00003447 0.0001258
Test_E_24th 0.00018 0.00003447 0.000606
Test_A 0.00012 0.00012 0.00012
Test_A_12th 0.000001413 0.000120 0.00012
Test_A_24th 0.000264 0.0 0.00108
Test_D 0.000189 0.0000207 0.000339
Test_D_12th 0.000733 0.0000207 0.00164
Test_D_24th 0.00151 0.0000207 0.009556
Test_G 0.0000601 0.0000601 0.0000601
Test_G_12th 0.000206 0.0000601 0.00021
Test_G_24th 0.000345 0.0000601 0.00668
Test_B 0.000643 0.00000166 0.0015
Test_B_12th 0.00436 0.00000166 0.0104
Test_B_24th 0.00332 0.00000166 0.0295
Test_high_E 0.00076 0.0000344 0.00308
Test_high_E_12th 0.00282 0.0000345 0.00805
Test_high_E_24th 0.01705 0.000286 0.0404

Phrase Tests

Here are the updated phrase tests from audio recordings. The left channel is the raw audio file. The right channel pitch-tracks the left channel, synthesizing a square wave, and an envelope-follower and envelope shaper, with control of attack, decay, sustain and release, which then controls gain and filter cutoff.

The tests exhibit typical playing techniques such as legato, hammer-on, pull-off, vibrato, staccato, and right-hand tapping. All the rest are recorded using the Nu multichannel pickup, except for the ShortStaccato which uses a short synthesized sine-burst.

Future Improvements

The pitch detector is very usable now, but it is still undergoing continuous refinement. One thing I want to improve is early detection on or immediately before onsets. Sometimes, there’s a very short “glitch” when the pitch detector does not have a good estimate yet. It is barely perceptible and can just be taken as part of the attack.

Let me quote my previous post on the matter. For reference, here’s the latest updated “Low E” sample:

Prediction and Zero-Latency Attacks

In theory, any Autocorrelation based frequency detector requires at least two cycles. Essentially, a cycle needs to be correlated with the next cycle in order to determine the period. However, one thing I noticed is that we can predict the frequency earlier by looking at the running stream of peaks or the zero-crossings on, or immediately preceding note onset. For the guitar, the prediction is quite accurate given one full cycle. For the low-E string, that amounts to a latency of 12ms (1/82.41Hz). That should be the minimum latency, right? Well, ehmmm…

Go back and listen to the “Low E” sample. Did you notice that there’s no latency at all? Yes, zero latency. How can that happen? Well, I faked it! Attack transients are special. They need not be periodic at all! Non-periodic attack transients are common. For example, Hammond organs are known to have a “key click” (regarded as an unfortunate defect by its designer, but embraced by users as part of the Hammond sound). That violin attack? That’s basically bow noise. The human voice may have unvoiced consonants. That guitar pick attack may sound like “clik”, “clak”, “clok”, “cluk”, depending on where you pick.

Any percussive or maybe even semi-periodic sound below 20ms is fair game and will not sound out of tune to the human ear, regardless of frequency. They will impart character, but will not contribute in establishing the note’s fundamental frequency. Listen to the “Low E” sample again. Did you notice a pitch shift at note onset? No, you can’t. The human ear can’t detect tones with such a short fragment.

There is still that very brief pitch shift if you look at the waveform:

One additional thing that I should mention is my observation that while transitions from a wrong higher frequency down to a correct lower frequency is barely perceptible, the opposite is not true. Transitions from a wrong lower frequency to a correct higher frequency are more pronounced. There are a few of those in the phrase tests above.

Here’s a specific example:

One possible solution is to bias the initial prediction to a higher frequency when the probability of obtaining a good prediction is low. Another possibility is to randomize the initial prediction, if it is indeterminate to begin with. Whatever the correct solution is, just-do-something is better than wait-and-do-nothing. A latent onset, especially with fast attacks, is displeasing to the ear. At any rate, this is one issue that needs further investigation.

Beyond the Guitar

These tests and the development surrounding it, are obviously guitar-centric. I do have tests extended for lower frequencies from the bass guitar, as well as 8-string guitar samples, but those are still guitar samples. I’d definitely love to get hold of samples from other instruments!

If you’ve read this far, then I suppose you must be truly interested. If you want this technology applied to other instruments, then the first thing that needs to be done is to add tests; and lots of it. You can help by sending me recordings of single notes covering all frequencies as well as phrase samples covering various styles typical of the instrument.

Notify of

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Newest Most Voted
Inline Feedbacks
View all comments
Michael Ross
Michael Ross
2 years ago

Have you compared your approach to that of Brian Kakcynski’s?

His latest product is the Unisyn:

2 years ago

>Did you notice a pitch shift at note onset? No, you can’t. The human ear can’t detect tones with such a short fragment.

Well, humans cannot recognise the presence of such minor directly directly, however our brains’ speech recognition system can. E.g. an presence/absence of 30ms onset is what subconsciously makes us realise the consonant is unvoiced.

More scientific details here:

Mixolydian Grey
Mixolydian Grey
2 years ago

If the zero-crossing algorithm gives you the start time and end time of each pulse, couldn’t you use that info directly to determine correlation without needing bitset/XOR operations? Or am I missing something?

For example, let’s say you have two pulses A and B. Pulse A starts at t=20, ends at t=30, and has width=10. Pulse B starts at t=42, ends at t=50, and has width=8. It should be easy to determine that pulse B starts 22 samples after pulse A and that the two match each other for 8 out of 10 sample periods. Count of matching samples = min(width of A, width of B). Count of non-matching samples = max(width of A, width of B) – count of matching samples. Using arithmetic to calculate the number of non-matching samples would give the same result as the XOR operations, wouldn’t it?

Mixolydian Grey
Mixolydian Grey
2 years ago
Reply to  Joel de Guzman

I partially worked out an example this morning, hopefully I can finish within a few days and see if it works. I think I need to make some diagrams. It’s hard to explain verbally. I believe that what I have in mind will achieve the exact same accuracy, but I’m not sure if performance would be better or worse.

I have a struct PulseInfo { start_time, end_time }, and I have a variable sized array of those structs. This info covers the entire sample buffer size. When checking for correlation, I grab two pointers into the array of PulseInfo. One pointer always starts at 0, the other starts in different places as I test different regions for autocorrelation.

The complicated part is that I don’t step both pulse pointers through the array at the same time. Depending on which pulse has a falling edge first, I step that pointer forward through the array and hold the other in place. Rapid short pulses result in comparing multiple small pulses to the same larger pulse. No part of the signal is skipped, but the performance may drop to a point where the cost of all the conditional branching outweighs the cost of bitstream XOR.

When the signal is clean, looking at the edges instead of doing a full bitstream XOR should be faster. When the signal is very noisy, XOR might perform better.

This approach was based on the observation that the bitstream XOR compares long strings of the same bit (111111000000111111), and that we already know the start/end/width of all of those regions. If those strings of the same bit value are long, then there are faster ways to calculate the number of bits that match between two intervals.

What’s the worst case number of pulses within the sample window that you’ve actually observed?

By the way, your site is a fantastic resource for a hobbyist like me; thanks for sharing all of this!

Nils Deitmers
Nils Deitmers
1 year ago

Would someone be interested in developing a pitch detection plugin for Construct3? Users have repeatedly asked for it and for someon familiar with the subject it should be straight forward to offer a pitch detection solution in the store.

Here’s an example of the Construct audio analyzer using the mic input and visualizing the data.

Joel de Guzman
9 months ago
Reply to  Joel de Guzman

Ooops, sorry, wrong email address. It should be joel-at-cycfi-dot-com.

Takahiko Suzuki
Takahiko Suzuki
9 months ago

Hi there, I think you are equally great, even if someone found the same thing before, because you found it from the scratch, just like Takakazu Seki. (I’ve also been involved in pitch detection since university student, from time to time.) By the way, I just found out that someone else have recently “rediscovered” it just like you did, with no references to Warrender and/or you. I just thought they were uninformed in this ages.

Joel de Guzman
9 months ago

Interesting! I do have an updated algorithm that I find to be even faster. If you want to discuss, please email me at joel-at-cycfi-dot-com.

3 months ago

Do you know of or have a suggestion for an arduino integer ‘ADC-values’ based implementation of of your concept?

Last edited 3 months ago by AndyBru
Joel de Guzman
3 months ago
Reply to  AndyBru

I’m sorry, but no, I don’t.

3 months ago

I mean your concept sounds cool. But it would better prove its
performance or capabilities if it could be applied to arduinos. And possibly compare it to
for example Arduino Guitar Pitch Detection : 9 Steps (with Pictures) – Instructables or
ArminJo/Arduino-FrequencyDetector: Fast audio frequency detector without fft for plain Arduino and Attiny85. Whistle switch example included. (

Joel de Guzman
3 months ago
Reply to  AndyBru

That one you posted is a very basic pitch detector for simple (slow) pitch detection. It won’t be able to detect complex and fast-moving signals. Anyone is free to implement my algorithm on the Arduino, of course. I’m sorry, but I won’t have time to do it. I don’t really use Arduinos.

Pascal Mazars
Pascal Mazars
3 months ago
Reply to  AndyBru

Jeff Snyder has ported Joel’s algorithm in his C audio library:
The pitch detector requires a 32-bit floating point unit so it should work on boards like STM32F4, Raspi Pico or Arduino Uno R4. Here is a proof of concept I made using a sinewave generator and Joel’s bacf algo:

Would love your thoughts, please comment.x