Building recurrent neural networks in Keras using textgenrnn

Let's continue our exploration of generating text using recurrent neural networks. In our #9 HTC Seminar Series lecture and accompanying online tutorial, Andrej Karpathy explained how to build and train recurrent deep learning neural networks.

The HTC post for seminar #9 includes links to the tutorial that Andrej put together for people interested in implementing his work.  He used Torch in that tutorial, which fit the time period the work was done in, but Torch is dated at this point as a platform for running experiments.

But fear not, there are better, more modern approaches available for experimenting with.  We will cover them below.  So if you are just getting started, I would read through all of the Karpathy tutorial material, but then move on to the work below for a more modern approach you can run directly in a Colab notebook in your browser to actually experiment with.

Max Woolf read through Karpathy's tutorials, and then put together a Python package called textgenrnn, which is built to work with the more modern Keras system.  We covered Keras in a previous post.

Textgenrnn abstracts the process of creating and training recurrent neural networks like char-rnns (the model used in Karpathy's tutorials) down to a few lines of code.  He also put together a great tutorial on how to use it.

He also put together a Colaboratory notebook, accessible from his tutorial, that you can use to run your textgenrnn experiments in.  Colaboratory notebooks are essentially Jupyter notebooks hosted in a virtual machine that includes 2 vCPUs, 13 GB of memory, and a K80 GPU. Yippee, train your recurrent neural network on a GPU in the cloud for free.  We just covered getting started with Colab earlier this week.

You may recall that Karpathy used LSTMs for running his recurrent text generation experiments, but kept them out of the code examples he presented to keep things simple.  A Keras feature textgenrnn takes advantage of is the CuDNN implementation of RNN layers like LSTM, which runs as native GPU kernels and offers up to a 7x speedup over the standard implementation.
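One piece of the char-rnn generation loop that both Karpathy's tutorial and textgenrnn rely on is temperature sampling: at each step the network outputs scores over the character vocabulary, and the next character is drawn from a temperature-scaled softmax. Here is a minimal, framework-free sketch of that sampling step (the logits and vocabulary are made-up example values, not tied to any particular trained model):

```python
import numpy as np

def sample_char(logits, temperature=1.0):
    """Sample the next character index from a char-rnn's output scores.

    Lower temperatures make the choice greedier (more repetitive text);
    higher temperatures add variety (and eventually nonsense).
    """
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Example with a tiny 4-character "vocabulary" and made-up logits
vocab = ['a', 'b', 'c', 'd']
logits = [2.0, 1.0, 0.1, -1.0]
next_char = vocab[sample_char(logits, temperature=0.5)]
```

At a very low temperature this collapses to always picking the highest-scoring character, which is why generated text at temperature near zero tends to loop.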

Max also has another tutorial post that covers how to generate text using the GPT-2 neural network model published by OpenAI.  It also includes a Colab notebook so you can follow along and try it out. Max put together a gpt-2-simple code library, similar to his textgenrnn package, to make using GPT-2 easier.

He also ran some experiments using GPT-2 to generate AI bots that post on Twitter, again using his gpt-2-simple code library.  There is another tutorial associated with this experiment.

Of course GPT-2 is old news now that OpenAI has GPT-3 out in beta.  GPT-3 is the much more powerful followup to GPT-2.  HTC still has not gotten approval to use the GPT-3 beta (most of the people who applied have not been approved yet), but OpenAI did set up Max Woolf with early access.  And he of course put together a blog post that discusses his experiments using it to generate text.

It's interesting to read through Karpathy's tutorial and Max Woolf's three blog posts to get a feel for the rapid advancement of the field of text generation using deep learning neural networks.

Microsoft just announced that it will have an exclusive license to OpenAI's GPT-3 neural model, which it will offer via its Azure cloud service.  OpenAI also has a post that discusses this new development.

GPT is an example of a transformer deep learning network. GPT stands for generative pre-trained transformer. We'll be diving into how it works internally in future posts.

