Posts

Showing posts from October, 2022

Dramatron 70B Script Writer

Alan Thompson gives an introduction to Dramatron, a scriptwriting system put together at DeepMind. It is based on the 70 billion parameter Chinchilla transformer-based language model. A script created by the model was presented as a live play at the Edmonton Fringe Festival to positive reviews. The paper 'Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals' is available here. Dramatron incorporates a recursive process to help the language model keep track of what is going on in a script over time. Chinchilla, the 70 billion parameter model that Dramatron is built on, is described in this publication and associated blog post. The goal of Chinchilla was to help answer the question: "What is the optimal model size and number of training tokens for a given compute budget?" The short answer is smaller models trained on more data. Chinchilla outperforms GPT-3 and some other models with more parameters.
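As a rough back-of-the-envelope illustration of that "smaller models on more data" result, here is my own sketch (not code from the paper) using the commonly cited approximation of roughly 6·N·D training FLOPs and Chinchilla's roughly 20-tokens-per-parameter compute-optimal ratio, which together land near the 70B parameter / 1.4T token operating point:

```python
# A minimal sketch of the Chinchilla rule of thumb. Both constants below
# (6 FLOPs per parameter per token, ~20 tokens per parameter) are rough
# approximations, not exact values from the paper's scaling-law fits.

import math

def compute_optimal_allocation(flops_budget, tokens_per_param=20.0):
    """Split a training FLOP budget into a model size and a token count."""
    # C = 6 * N * D and D = tokens_per_param * N  =>  N = sqrt(C / (6 * ratio))
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# A budget of ~5.8e23 FLOPs comes out near 70B parameters and 1.4T tokens.
params, tokens = compute_optimal_allocation(5.8e23)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")
```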

EvoJax: Hardware Accelerated NeuroEvolution

Above is a short presentation on the EvoJAX hardware-accelerated neuroevolution framework, presented at GECCO 2022. The associated EvoJAX paper is available here. Another paper that uses EvoJAX is 'Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts' (NeurIPS Creativity Workshop 2021, EvoMUSART 2022). EvoJAX is a scalable, general purpose, hardware-accelerated neuroevolution toolkit. Built on top of the JAX library, it enables neuroevolution algorithms to work with neural networks running in parallel across multiple TPUs/GPUs. EvoJAX achieves very high performance by implementing the evolution algorithm, neural network, and task all in NumPy, which is compiled just-in-time to run on accelerators.
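To make that "everything in NumPy, jit-compiled onto the accelerator" point concrete, here is a minimal evolution-strategy loop written directly in jax.numpy. This is my own illustrative sketch of the pattern EvoJAX accelerates, not the actual EvoJAX API:

```python
# A toy evolution strategy in JAX: the whole population evaluation is vmapped
# and the update step is jit-compiled, so it runs on CPU, GPU, or TPU unchanged.
# Illustrative only; EvoJAX wraps this pattern in its own algorithm/task classes.

import jax
import jax.numpy as jnp

POP_SIZE, SIGMA, LR = 256, 0.1, 0.05

def fitness(params):
    """Toy task: maximize the negative squared distance to a target vector."""
    target = jnp.linspace(-1.0, 1.0, params.shape[-1])
    return -jnp.sum((params - target) ** 2)

@jax.jit
def es_step(mean, key):
    # Sample a population of perturbations and evaluate them all in parallel.
    noise = jax.random.normal(key, (POP_SIZE, mean.shape[0]))
    scores = jax.vmap(fitness)(mean + SIGMA * noise)
    # OpenAI-ES style update: move the mean along the score-weighted noise.
    advantages = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad = (advantages[:, None] * noise).mean(axis=0) / SIGMA
    return mean + LR * grad

mean = jnp.zeros(16)
key = jax.random.PRNGKey(0)
for step in range(200):
    key, subkey = jax.random.split(key)
    mean = es_step(mean, subkey)
print("final fitness:", fitness(mean))
```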

Intelligence Beyond the Brain

This is a talk by Michael Levin titled 'Intelligence beyond the Brain: morphogenesis as an example of the scaling of basal cognition', presented in September 2022. Michael's research is really fascinating and provides a lot of food for thought about how biological systems actually work, going well beyond the conventional view that the genome absolutely defines everything. He also ties the mechanisms of intelligence in the brain to how intelligent behavior arises in collections of individual cells acting as a collective. This new way of thinking has a lot of potential applications in bio-medicine and also in artificial intelligence. I was first introduced to Michael's research in a really great podcast interview with Lex Fridman. The Levin Lab website is here. A description of the talk is as follows: each of us takes the remarkable journey from physics to mind: we start life as a quiescent oocyte (a collection of chemical reactions)...

LSTM is Dead, Long Live Transformers

This is a talk from December 2019 on how LSTM models for natural language processing have been replaced by transformer-based models. It was referenced in the Stanford intro-to-transformers talk in yesterday's post, and is a great source of additional introductory information on the basics of the transformer architecture. The speaker gives a good overview of the history of natural language processing: from bag-of-words, to recurrent neural nets (RNNs), to the LSTM architecture that supplanted vanilla RNNs, to the transformers that then supplanted both. He also covers transformer architecture specifics like multi-headed self-attention and positional encoding.
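For reference, the core operation behind the architecture the talk describes is scaled dot-product attention, which fits in a few lines of NumPy. This is my own minimal single-head sketch, not code from the talk:

```python
# Single-head scaled dot-product self-attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Multi-headed attention runs several of these in parallel on learned
# projections of the input and concatenates the results.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_*: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len) token similarities
    weights = softmax(scores, axis=-1)        # each token attends over all tokens
    return weights @ v                        # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))
out = self_attention(x, *(rng.normal(size=(d_model, d_k)) for _ in range(3)))
print(out.shape)  # (5, 8)
```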

Transformer Deep Learning Architecture Overview

This video is an introductory overview of the transformer deep learning architecture. It is the first lecture in the Stanford CS25 seminar on transformers, and includes an overview of transformers, attention mechanisms, self-attention, encoder-decoder architectures, and applications of transformers.
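As a small companion to the lecture material, here is the sinusoidal positional encoding from the original transformer paper, one of the pieces covered in introductions like this. The code is my own minimal sketch, not from the lecture:

```python
# Sinusoidal positional encoding from "Attention Is All You Need":
#   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
# These vectors are added to token embeddings so the model can use word order.

import numpy as np

def positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)   # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

print(positional_encoding(seq_len=50, d_model=64).shape)  # (50, 64)
```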

Transformers for Applications in Audio, Speech, and Music

This presentation is a seminar on Transformers for Applications in Audio, Speech, and Music by Prateek Verma of Stanford University (CCRMA research). It is a lecture in their very good Transformers United course, which covers recent (as of fall 2021) developments in transformer architectures. There are some very good takeaway messages from this presentation: how to beat the WaveNet architecture; how to use K-means clustering to convert continuous embeddings into a discrete representation that transformers seem to love processing; how to incorporate ideas from wavelets into the transformer mix; and the advantages of a signal-specific learned adaptive front end to the transformer system.
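The K-means point is easy to make concrete: fit a codebook over continuous frame embeddings, then replace each frame with the index of its nearest centroid. This is my own minimal sketch of that idea (the embeddings here are random stand-ins for real audio features), not code from the presentation:

```python
# K-means quantization of continuous frame embeddings into discrete tokens,
# similar in spirit to the quantization step used in audio models like HuBERT.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frame_embeddings = rng.normal(size=(1000, 128))   # (num_frames, embedding_dim)

# Fit a codebook of 512 centroids over the continuous embeddings.
codebook = KMeans(n_clusters=512, n_init=10, random_state=0).fit(frame_embeddings)

# Each frame becomes the index of its nearest centroid: a discrete "vocabulary"
# the transformer can model just like text tokens.
tokens = codebook.predict(frame_embeddings)
print(tokens[:20], "vocabulary size:", codebook.n_clusters)
```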

Devoxx keynote on Artificial Intelligence

Dr. Alan Thompson provides a great overview of recent artificial intelligence developments in his keynote address at the October 12, 2022 Devoxx conference. He discusses deep learning Transformer architectures, RoBERTa, GPT-3, Pathways + PaLM, Chinchilla, Google Imagen, Google Parti, NUWA-Infinity, Google Imagen Video, and much more. I really liked this presentation because it showcases the explosive growth we're currently experiencing in the AI field. He also discusses the potential economic implications of this trend over the next 10 years, which could eclipse those created by the development of the internet.

Progressive Distillation for Fast Sampling of Diffusion Models

This is a paper overview presentation of the 'Progressive Distillation for Fast Sampling of Diffusion Models' paper, as well as the follow-on 'On Distillation of Guided Diffusion Models' paper. Progressive distillation is briefly mentioned in the Imagen Video paper. The two papers discussed in this presentation explain the details of what progressive distillation means.

Observations

1: The whole student-teacher paradigm for cutting down the number of iteration cycles in the diffusion schedule is pretty interesting, and is probably a more general idea that could be applied to other problems.

2: Note the explanation of why you want to progressively reduce the cycles rather than doing it all in a single step: it avoids the blurry output you get from averaging multiple valid solutions when you try to do it in one shot.

3: Note how the second paper reduces the computation for classifier-free guided diffusion. Rather than running the model twice at each sampling step (once with and once without the conditioning) and mixing the two outputs, the distilled student reproduces the guided result in a single forward pass.
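To make observations 1 and 3 a bit more concrete, here is a toy sketch in my own notation (not the papers' code): classifier-free guidance costs two model calls per step, and progressive distillation trains a student to match two consecutive teacher steps with one. The denoiser and update rule below are deliberately simplified stand-ins for the real DDIM-style machinery:

```python
# Toy illustration of classifier-free guidance and the progressive distillation
# training target. `toy_denoise` is a stand-in for a trained diffusion model.

import numpy as np

rng = np.random.default_rng(0)

def toy_denoise(x, t, cond):
    """Stand-in for a diffusion model's prediction at noise level t."""
    shift = 0.0 if cond is None else 0.1 * cond
    return x * (1.0 - t) + shift

def guided_prediction(x, t, cond, w=3.0):
    # Classifier-free guidance: one unconditional and one conditional call,
    # mixed with guidance weight w. This is the 2x cost per step that the
    # second paper removes by distilling the mixed output into one student.
    uncond = toy_denoise(x, t, None)
    condp = toy_denoise(x, t, cond)
    return uncond + w * (condp - uncond)

def distillation_target(step_fn, x, t, dt):
    # Progressive distillation: the student learns to reproduce, in one step,
    # what the teacher produces in two consecutive steps, halving the sampling
    # schedule each round (e.g. 1024 -> 512 -> ... -> a handful of steps).
    x_mid = step_fn(x, t, dt)
    return step_fn(x_mid, t - dt, dt)

def teacher_step(x, t, dt):
    return x - dt * guided_prediction(x, t, cond=1.0)

x = rng.normal(size=(4,))
target = distillation_target(teacher_step, x, t=1.0, dt=0.1)
print("student regression target:", target)
```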