Bitstream Autocorrelation (BACF) is fast and accurate. What can be better? Well, two BACFs in parallel!
No PDA (Pitch Detection Algorithm) can achieve 100% accuracy. And the constraints of real-time pitch detection have proven to be a real challenge. BACF can at times give wrong results. For example, here’s the result of one of the tests (the hammer-on-pull-off example):
Take note of the two spikes. In all the tests I have run thus far, I am achieving error rates of less than 1%, which is pretty good. But it is not perfect. The spikes stick out like a sore thumb.
Now, the astute reader will probably want to comment that these outliers can be eliminated using a median filter: a non-linear digital filtering technique often used to remove noise. And indeed, that is what I did, up until recently. The problem with using a median filter is that it can delay detection at note onsets. Such additional latency is undesirable, especially in the lower frequency ranges.
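To give an idea of what such a filter looks like, here is a minimal sketch of a sliding-window median over successive pitch estimates. This is just an illustration of the general technique, not the actual code in the Q library:

```cpp
// Minimal sketch: a sliding-window median filter over successive pitch
// estimates (illustration only, not the Q library implementation).
// A spurious spike is rejected because it never occupies the majority of
// the window, but a legitimate jump (e.g. a new note) is likewise held
// back until it fills more than half the window. That hold-back is the
// source of the extra latency at note onsets.
#include <algorithm>
#include <cstddef>
#include <vector>

struct median_filter
{
   explicit median_filter(std::size_t size = 3)
    : _window(size, 0.0f)
   {}

   float operator()(float estimate)
   {
      // Shift the newest estimate into the window, dropping the oldest.
      _window.erase(_window.begin());
      _window.push_back(estimate);

      // Return the median of the current window.
      auto sorted = _window;
      auto mid = sorted.begin() + sorted.size() / 2;
      std::nth_element(sorted.begin(), mid, sorted.end());
      return *mid;
   }

   std::vector<float> _window;
};
```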
BACF is wicked fast! Here are some numbers from my tests (using a 2015 i7 MacBook Pro):
“Tapping D”: 46.5438 nanoseconds per sample.
“Hammer-Pull High E”: 47.9835 nanoseconds per sample.
“Slide G”: 46.0815 nanoseconds per sample.
“Bend-Slide G”: 45.1953 nanoseconds per sample.
“GStaccato”: 46.1376 nanoseconds per sample.
“Attack-Reset”: 47.3524 nanoseconds per sample.
These numbers mean that BACF pitch predictors can process hundreds of channels before saturating the CPU.
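To put this in perspective: at a 44.1 kHz sample rate, for example, each sample period is about 22,700 ns. At roughly 47 ns per sample, a single BACF channel takes up only around 0.2% of one core, which works out to something like 480 channels per core before saturating it.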
Let me remind you that, so far, BACF development is guitar-centric. I do have tests that extend to the lower frequencies of the bass guitar, as well as 8-string guitar samples, but those are still guitar samples. Check out the Q Audio DSP Library where this is being actively developed. Feel free to join the Discord channel or the FB Group for discussion.
So what can be better? Dual predictors per channel! By simply inverting the signal, BACF produces slightly different results. There are still a few errors, like the one above, but in different places. What’s happening is that BACF is now looking at the negative edges (inverting the signal turns the rising edges into falling edges, and vice versa). If you look closely at the waveforms, the signal is typically asymmetric:
This is the main reason why simply inverting the signal can give slightly different results. So, if we have two pitch predictors, one looking at the non-inverted signal and another at the inverted signal, we can use one to corroborate the other. We do that by computing the median of three values on each prediction: (1) the previous predicted frequency, (2) the non-inverted signal’s prediction, and (3) the inverted signal’s prediction. Doing this, we decrease the error rates significantly. Here’s the result:
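To make the scheme concrete, here is a minimal sketch of the dual-predictor corroboration. The `median3` and `dual_pitch_detector` names, and the window-based `Predictor` interface, are for illustration only and are not the Q library’s actual API:

```cpp
#include <algorithm>
#include <vector>

// Median of three values without sorting.
inline float median3(float a, float b, float c)
{
   return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Combines two single-channel pitch predictors: one sees the signal as-is,
// the other sees the inverted signal. `Predictor` is any callable taking a
// window of samples and returning a frequency estimate in Hz.
template <typename Predictor>
struct dual_pitch_detector
{
   dual_pitch_detector(Predictor pos, Predictor neg)
    : _pos(pos), _neg(neg)
   {}

   float operator()(std::vector<float> const& window)
   {
      // Invert the signal for the second predictor.
      auto inverted = window;
      for (auto& s : inverted)
         s = -s;

      auto f1 = _pos(window);     // non-inverted prediction
      auto f2 = _neg(inverted);   // inverted prediction

      // Median of three: the previous predicted frequency plus the two
      // current predictions. (The exact bookkeeping of "previous" may
      // differ in the actual implementation.)
      _prev = median3(_prev, f1, f2);
      return _prev;
   }

   Predictor _pos, _neg;
   float     _prev = 0.0f;
};
```

Note that a legitimate new pitch shows up in both of the current predictions, so the median of three follows it right away, instead of waiting for a window of past estimates to turn over the way a running median filter does.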
Now you must be thinking: but how does that impact CPU usage? Double? Well, surprisingly, no. Here are the numbers from my tests (again, using a 2015 i7 MacBook Pro) with the dual pitch detector:
“Tapping D”: 58.5506 nanoseconds per sample.
“Hammer-Pull High E”: 60.4616 nanoseconds per sample.
“Slide G”: 54.4627 nanoseconds per sample.
“Bend-Slide G”: 54.3654 nanoseconds per sample.
“GStaccato”: 56.9808 nanoseconds per sample.
“Attack-Reset”: 56.8274 nanoseconds per sample.
Interestingly, the onset detector I am working on uses up more CPU time than the dual pitch detector! The reason is that the onset detection algorithm requires its computation to be done in real time, immediately as each sample arrives. The pitch detector, on the other hand, can batch up (collect) at least two cycles of the signal before doing its work.
Oh and BTW, if you are already using my pitch detectors, I’d appreciate it if you drop me a line. I’d love to know what you are doing with it.