GPT-3 Roundup Discussion and Demos
Here it is in all its glory, hot off the presses (very recently posted): an extended panel discussion about GPT-3, in which learned individuals discourse. One of them (Gary Marcus) is a notorious deep learning curmudgeon, so be warned. The others are all fairly enthusiastic (the moderator a little less so than some, and the linguist is a linguist, so he's always going to have issues). It's all quite fascinating.
The beginning is a fast-paced, somewhat chaotic overview of key insights from a fairly long series of discussions; the rest of the video is the discussions themselves, plus some experiments with GPT-3 by the moderator.
1: Gary said something about DeepMind's Atari game system that was very misleading. He points out that if you move the location of the game paddle, the trained system is no longer able to play the game, implying brittleness and that the system didn't really learn how to play the game at all.
I see this as just a data augmentation issue, and it's an important distinction that Gary overlooks (probably because he has his own agenda of tearing down the performance of deep learning systems).
Think of an image classification system. If you naively train it on images of dogs and cats and then horizontally or vertically flip the input images, the trained system no longer works as well. That's a direct analogy to what Gary is talking about.
Sure, the system learns what you trained it on. But if you introduce data augmentation during training (randomly flipping and otherwise distorting the input), the system will learn to be invariant to those transformations, and it then works on things like flipped or distorted input images.
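To make the idea concrete, here's a minimal sketch of flip augmentation in NumPy. The `augment` function and its details are my own illustration, not anything from the video: at training time you'd apply something like this to each image so the network sees mirrored variants of the same scene.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip a 2-D image array (a hypothetical augmentation step).

    Each flip is applied with probability 0.5, so over many epochs the
    network sees the same content in several orientations.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip (mirror left-right)
    if rng.random() < 0.5:
        image = image[::-1, :]  # vertical flip (mirror top-bottom)
    return image

# Toy usage: the augmented image has the same pixels, possibly rearranged.
rng = np.random.default_rng(0)
img = np.arange(12).reshape(3, 4)
out = augment(img, rng)
print(out.shape)  # same shape as the input
```

Real pipelines do this with library transforms (random crops, rotations, color jitter, and so on), but the principle is the same: the transformations you randomize over are the ones the trained system becomes robust to.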
2: Conner briefly mentions manifold learning and latent spaces as another way to think about how deep learning systems work. This, to my mind, is the key concept.
I was listening to an interview of Ian Goodfellow by Lex Fridman, and Ian commented that his thinking on neural nets had evolved: he now thought of them as learning a series of programming steps. I remember thinking (with all respect to Ian) that this was disappointing, because you really want to think of these systems as learning a low-dimensional manifold mapping of the data.
Why does this work? Because data about the real world actually lives on lower-dimensional manifolds (lower-dimensional compared to the total variability of the signal's components). The neural net is a universal function approximator, and the function it is learning is this manifold mapping.
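A toy demonstration of "the data lives on a lower-dimensional manifold" (my own sketch, not from the video): take a circle, which is a one-dimensional curve, embed it in 100-dimensional space, and check the singular values of the resulting data matrix. Almost all of the variation collapses onto a couple of directions, even though each sample has 100 components.

```python
import numpy as np

rng = np.random.default_rng(42)

# 500 points on a circle: a 1-D manifold described by 2 coordinates.
t = np.linspace(0, 2 * np.pi, 500)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)   # shape (500, 2)

# Embed the circle in a 100-dimensional ambient space via a random
# linear map. Each sample now has 100 components, but the intrinsic
# structure is unchanged.
embed = rng.standard_normal((2, 100))
data = circle @ embed                               # shape (500, 100)

# SVD of the centered data: only ~2 singular values are non-negligible,
# revealing that the data occupies a low-dimensional subspace of the
# 100-dimensional space.
s = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
effective_dim = int(np.sum(s > s[0] * 1e-6))
print(effective_dim)
```

This example is linear, so plain SVD finds the structure; real-world manifolds (faces, sentences, game screens) are curved, which is why you need a flexible nonlinear function approximator, i.e. a neural net, to learn the mapping.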
Even though they discuss manifold learning, five minutes later they drop back into the misleading "it's computing a program" mindset.