DDSP - Differentiable Digital Signal Processing
DSP (digital signal processing) is the technology behind so many wonderful things in our modern world: digital guitar effects, digital amp modeling, music synthesizers, audio plug-in effects for DAWs. We haven't even left the world of musical applications, and already the list keeps getting longer. Digital compression algorithms alone (for audio and speech, for images, for video) have transformed society. If you feel like streaming video is destroying civilized life, blame DSP for making it possible. Or is it all Claude Shannon's fault?
DDSP extends the power of conventional DSP by making it learn by example. Rather than hand-designing a particular architecture and an associated algorithm to perform some task, we can put together a database of examples and let the system learn a solution to the problem those examples represent.
Now this should sound very similar to what is going on in neural networks. And DDSP really is just a certain kind of neural network architecture, one that constrains the solution space to the set of adaptive DSP models available in the specific DDSP architecture being used for training.
And the key to how all neural networks learn is that they use gradient descent to reduce some error metric while being trained on a data set. DDSP is very similar. The extra D at the beginning stands for differentiable. We take standard DSP processing blocks and extend them by making them differentiable, so that they can learn just like a neural net does, via gradient descent.
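To make the idea concrete, here is a minimal sketch (not Google's code; all names and numbers are invented for illustration) of a differentiable DSP block trained by example: an FIR filter whose taps are learned by gradient descent to match input/output pairs produced by a hidden "target" filter.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hidden 3-tap FIR filter standing in for the system we want to learn.
true_taps = np.array([0.5, 0.3, 0.2])

# Training data: a random input signal and its filtered output.
x = rng.standard_normal(256)
y_target = np.convolve(x, true_taps, mode="full")[:len(x)]

# The differentiable DSP block: an FIR filter with learnable taps.
taps = np.zeros(3)
lr = 0.01
for step in range(2000):
    y_pred = np.convolve(x, taps, mode="full")[:len(x)]
    err = y_pred - y_target
    # Gradient of the mean squared error w.r.t. each tap:
    # correlate the error signal with the (shifted) input.
    grad = np.array([np.dot(err[k:], x[:len(x) - k]) for k in range(3)]) * 2 / len(x)
    taps -= lr * grad

print(np.round(taps, 3))  # recovers the hidden taps, ≈ [0.5, 0.3, 0.2]
```

Because convolution is linear in the taps, the gradient here can be written by hand; in a real DDSP system the blocks (oscillators, filters, reverbs) are expressed in an autodiff framework so the chain rule does this bookkeeping automatically.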
I did my graduate work in electrical engineering, with a specialization in digital signal processing (DSP). DDSP did not exist when I did my graduate studies, much to my misfortune, because it's a remarkable extension of DSP that seems very obvious in hindsight.
Let's take a look at a quick introduction to DDSP.
The work at Google they are referring to in the video is discussed in more detail in this Magenta project blog post.
It includes a link to a GitHub repository with their code.
The paper is available here.
The Google work is specifically focused on sound and music synthesis, but the ideas behind it are much more broadly applicable. One could imagine using fixed differentiable functional blocks inside a conventional neural network architecture. One example off the top of my head would be to use fixed vision processing blocks (that are differentiable) in combination with a more conventional neural network architecture.
Are there advantages to this approach for certain classes of vision problems, as there are in the case of audio processing?
I was first exposed to the Magenta project's DDSP work in Anna Huang's presentation at the 2020 AI Music Creativity conference, which we discussed in this post. This once again points out the importance of staying current with what is being discussed in workshops and conferences, and that topics like music creativity are just as important as what some would consider more 'serious' applications of deep learning (whatever those serious applications are).
I also think there is some connection (maybe only in my head, but I'm going to mention it anyway) between this research and the old Cascade-Correlation architecture for deep learning developed by Scott Fahlman. I'm familiar with it because I was working with it, along with more conventional PDP-group neural net architectures, when I was researching applications of neural networks for image style transfer back in the 90s. Nodes in that architecture could be any functional block; you weren't restricted to just linear weights and a nonlinearity at a node.