Before I jumped into this rabbit hole, I wouldn’t have thought that onset detection would be so deviously tricky! I just wanted to generate ADSR envelopes from the multichannel guitar signal. Simple, right? It looked easy, until I actually tried to implement it. But now I think I have something that works really well. IMO, the real acid test for true onset detection and ADSR is whether you can transform very fast legato with hammer-ons, pull-offs, etc., into staccato, much like a banjo. I emphasize “true” because I am aware that many implementations simply fake it by treating a whole legato run, from start to finish, as one single note-on and note-off. Such implementations use glorified transient detectors instead of true onset detectors. If you know of anything that can reliably do this in pure DSP (no additional hardware, and in real time), I’d love to know!
I know… I really meant to share the code when I posted about it the first time: Onset Detection. But in the end, I decided to minimize exposure of such intellectual property, which led me to Rethink Open Source. My apologies if I disappointed anyone. It was a pragmatic move.
That being said, please note that I will still share my insights on my DSP Adventures, both successes and failures, in narrative form. I’ll still share as much as I reasonably can, as long as my intellectual property is protected. More importantly, if there’s sufficient interest, I can package the code as reusable binaries that people can license in SDK form. I can make the binaries available for evaluation, and I can share proofs, along with my thoughts, on the validity of the algorithms.
And so, the onsets… How does it work? I have a transform (called HS-Detector) that converts audio into something that you can use to easily identify onsets. Without revealing my secrets or the actual implementation, let me just say that HS stands for Harmonic Shifts. In other words, this transform detects harmonic shifts in the waveform. I suspect that alone is enough of a hint for the smart DSP folks reading this.
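The following is not the HS-Detector, whose implementation is not disclosed here. It is only a generic, minimal sketch of one well-known way to notice a change in harmonic content: track the fundamental period of each frame with a normalized autocorrelation and flag frames where the period moves by more than a fraction of a semitone. Every name and parameter (`synth_tones`, `estimate_period`, `detect_pitch_shifts`, the frame size, the lag range, the threshold) is a hypothetical choice made for illustration, and the input is a synthetic two-tone signal rather than a real guitar recording.

```python
import math

def synth_tones(segments, sample_rate):
    """Constant-amplitude sine that changes frequency with continuous phase."""
    out, phase = [], 0.0
    for freq_hz, num_samples in segments:
        step = 2.0 * math.pi * freq_hz / sample_rate
        for _ in range(num_samples):
            out.append(math.sin(phase))
            phase += step
    return out

def estimate_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        num = e1 = e2 = 0.0
        for i in range(len(frame) - lag):
            a, b = frame[i], frame[i + lag]
            num += a * b
            e1 += a * a
            e2 += b * b
        denom = math.sqrt(e1 * e2)
        score = num / denom if denom > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def detect_pitch_shifts(samples, sample_rate, frame_size=200,
                        min_hz=140.0, max_hz=400.0, threshold_semitones=0.5):
    """Return start indices of frames whose period differs from the previous frame."""
    # Assumed search range; kept narrower than one octave so that a
    # multiple of the true period cannot win the autocorrelation search.
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    onsets, prev_period = [], None
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        period = estimate_period(samples[start:start + frame_size], min_lag, max_lag)
        if prev_period is not None:
            shift = abs(12.0 * math.log2(period / prev_period))
            if shift > threshold_semitones:
                onsets.append(start)
        prev_period = period
    return onsets

# Synthetic legato: 200 Hz for 0.2 s, then 250 Hz for 0.2 s, same amplitude.
SAMPLE_RATE = 8000
signal = synth_tones([(200.0, 1600), (250.0, 1600)], SAMPLE_RATE)
onsets = detect_pitch_shifts(signal, SAMPLE_RATE)
print("pitch-shift frames start at samples:", onsets)
```

This toy only handles a clean, monophonic tone whose note change happens to fall on a frame boundary. A real string signal would need overlapping frames, protection against octave errors, and some handling of noisy attacks, none of which is attempted here.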
The waveform below shows the HS-Detector in action. This is the same hammer-on/pull-off lick I presented in the original article:
Download link: hammer-pull.wav
Below, you can see the original audio in the first waveform and the HS-Detector result in the second:
Zooming closer, you can see the clean edges where the onsets happen:
Here’s another one. This one involves right-hand tapping with left-hand hammer-ons and pull-offs:
Download link: Tapping D.wav
Here’s the audio (first waveform) followed by the HS-Detector result (second waveform):
Again, zooming closer, you can see the clean edges where the onsets happen:
Aside: Isn’t it amazing that the human eye can detect the note-shifts by just looking at the waveform? We are excellent at pattern recognition.
Let’s zoom into the tricky parts. The second note onset in the waveform below involves very slight shifts, almost no amplitude jump. This happens when you release the right hand without doing a pull-off, while the string is still sustaining. Essentially, the frequency of the sustaining string is modulated.
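To see why a transition like this defeats a purely amplitude-based detector, here is a small, hypothetical experiment (again, not the HS-Detector). It synthesizes a constant-amplitude tone that changes frequency without any phase jump, then compares two per-frame measurements: RMS level and the number of upward zero crossings. The helper names, the 8 kHz sample rate, and the 160-sample frame are assumptions picked so that each frame holds a whole number of cycles.

```python
import math

def synth_tones(segments, sample_rate, start_phase=0.1):
    """Constant-amplitude sine that changes frequency without a phase jump."""
    out, phase = [], start_phase
    for freq_hz, num_samples in segments:
        step = 2.0 * math.pi * freq_hz / sample_rate
        for _ in range(num_samples):
            out.append(math.sin(phase))
            phase += step
    return out

def frame_rms(frame):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def upward_zero_crossings(frame):
    """Count negative-to-non-negative sign changes inside one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a < 0.0 <= b)

def analyze(samples, frame_size):
    """Return (start, rms, crossings) for each non-overlapping frame."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rows.append((start, frame_rms(frame), upward_zero_crossings(frame)))
    return rows

# 200 Hz for 0.2 s, then 250 Hz for 0.2 s; the note changes at sample 1600.
SAMPLE_RATE = 8000
signal = synth_tones([(200.0, 1600), (250.0, 1600)], SAMPLE_RATE)
rows = analyze(signal, frame_size=160)
for start, rms, crossings in rows:
    print(f"{start:5d}  rms={rms:.4f}  upward_crossings={crossings}")
```

In this synthetic case the RMS column stays flat across the note change, so a level-based transient detector has nothing to trigger on, while the zero-crossing count steps up at the frame where the frequency changes. Zero crossings are only a crude stand-in for frequency and would be unreliable on a harmonically rich string signal.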
The note transition on this one is even more subtle, yet it’s still detected:
And how about this one! The human eye can still catch the transition if you look very carefully. The HS-Detector still detects the onset effortlessly, and with flying colors.
Finally, if you haven’t noticed yet, the HS-Detector is also able to detect vibrato. Now I can really bet that this is already a giveaway for the smart DSP folks reading this :-). Can you guess what DSP I am doing to make this happen?
Looks good? Are you convinced yet, without seeing the code?