Transformers for Applications in Audio, Speech, and Music

This presentation is a seminar on Transformers for Applications in Audio, Speech, and Music by Prateek Verma of Stanford University (CCRMA research). It is a lecture from Stanford's very good Transformers United course, which covers recent (as of fall 2021) developments in transformer architectures.

There are some very good takeaway messages from this presentation: how to beat the WaveNet architecture; how to use k-means clustering to convert continuous embeddings into a discrete representation that transformers seem to love processing; how to incorporate ideas from wavelets into the transformer mix; and the advantages of a signal-specific learned adaptive front end to the transformer system.
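The k-means idea is worth making concrete: continuous audio frame embeddings are clustered, and each frame is then replaced by the integer id of its nearest centroid, yielding a discrete token sequence a transformer can model like text. Below is a minimal NumPy sketch of that quantization step, not the presenter's exact pipeline; the function name, the toy data, and all parameter values are illustrative assumptions.

```python
import numpy as np

def kmeans_quantize(embeddings, k=8, iters=20, seed=0):
    """Cluster continuous embeddings with k-means and return discrete
    token ids (each vector's nearest-centroid index) plus the codebook.
    Illustrative sketch, not the talk's actual implementation."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k randomly chosen embeddings
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(iters):
        # assign each embedding to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
        ids = dists.argmin(axis=1)
        # recompute each centroid as the mean of its assigned embeddings
        for c in range(k):
            if np.any(ids == c):
                centroids[c] = embeddings[ids == c].mean(axis=0)
    return ids, centroids

# toy example: 200 random 16-dim "frame embeddings" -> sequence of 200 token ids
emb = np.random.default_rng(1).normal(size=(200, 16))
tokens, codebook = kmeans_quantize(emb, k=8)
```

The resulting integer sequence plays the role that word ids play in language modeling, which is why this discretization makes audio so amenable to standard transformer training.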
