Getting to know VGG16

VGG16 is a deep neural network developed for image classification.

It's a convolutional neural network model proposed by K. Simonyan and A. Zisserman from Oxford University.  You can read all about it in their paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”.  Here's a good overview of VGG16 on neurohive.

People care about it because the model achieved 92.7% top-5 test accuracy on the ImageNet benchmark.
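
Top-5 accuracy means a prediction counts as correct if the true label is anywhere among the model's five highest-scoring classes. Here is a minimal sketch of that metric in plain Python. The function name `top_k_accuracy` and the toy scores are made up for illustration, and the toy data uses only 6 classes, so it is evaluated at k=1 and k=2 rather than k=5.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (made up for illustration): 3 samples, 6 classes.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 is ranked 1st
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],  # true class 0 is ranked 2nd
    [0.02, 0.03, 0.10, 0.15, 0.30, 0.40],  # true class 0 is ranked last
]
labels = [1, 0, 0]

print(top_k_accuracy(scores, labels, k=1))  # only the first sample counts
print(top_k_accuracy(scores, labels, k=2))  # the first two samples count
```

With 1000 classes, top-5 is a much more forgiving measure than top-1, which is worth keeping in mind when comparing accuracy numbers between papers.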

ImageNet is a dataset of labeled images put together to use as a benchmark for different computational approaches to image classification.  The full dataset contains over 14 million labeled images; the subset used for the classification challenge (ILSVRC) covers 1000 classes.  The images were collected from the web and then labeled by human labelers recruited through the Amazon Mechanical Turk crowdsourcing platform.

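VGG16 expects an input image of size 224x224.  The usual PyTorch preprocessing resizes the image to 256 pixels on its shorter side and then takes a 224x224 center crop.  The model also makes some assumptions about data normalization in the input image.  All of the models we discuss use 224x224 for the default input size.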

So what do you do with VGG16?  You could use it to classify images.  But you could also use it as a pre-trained starting point for transfer learning.  You freeze most of the layers and their associated parameters, while allowing the others to be optimized for the new task you want the net to perform.

There are pre-trained VGG16 models available for both Keras and PyTorch.

Note that if you search Google, a lot of other versions of these models will come up, created by various people, that may or may not work properly. I selected these two links because they go to official models hosted directly by Keras and PyTorch respectively.

Here's an example of how you would go about working with VGG16 in PyTorch.

First you want to load the pre-trained model.

import torch
model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg16', pretrained=True).eval()  # eval() disables dropout for inference

Then you want to get an image and pre-process it so it has the correct size and normalization.

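from PIL import Image
from torchvision import transforms

filename = "your_image.jpg"  # placeholder path (assumption): point this at your own image
input_image = Image.open(filename).convert("RGB")
preprocess = transforms.Compose([
    transforms.Resize(256),      # shorter side to 256 pixels
    transforms.CenterCrop(224),  # 224x224 crop expected by the model
    transforms.ToTensor(),       # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

Then we can run the model on the normalized image.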

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output[0], dim=0))
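
Printing all 1000 probabilities is not very readable, so you will usually want just the top few. Here is a small sketch using `torch.topk`. To keep it self-contained, the `output` tensor is a random stand-in (an assumption) for the real `model(input_batch)` result, so the numbers it prints are meaningless.

```python
import torch

torch.manual_seed(0)
output = torch.randn(1, 1000)  # stand-in for model(input_batch); not real predictions

# Convert the unnormalized scores to probabilities, then keep the 5 largest.
probabilities = torch.nn.functional.softmax(output[0], dim=0)
top_prob, top_class = torch.topk(probabilities, 5)

for prob, idx in zip(top_prob, top_class):
    print(f"class index {idx.item():4d}: probability {prob.item():.4f}")
```

To turn the class indices into names you also need the list of ImageNet class labels, which is not included here.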

I selected VGG16 for this example because there are a number of good transfer learning examples that use it.  Things will get more interesting in our next post, when we look at a transfer learning example based on VGG16.

I should point out that VGG16 is pretty computationally intensive.  There are other more recent approaches that have somewhat better performance with far less computational load.  But maybe the way it is constructed with a super dense deep net offers some advantage for transfer learning?  Or not? Just wondering.
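
One rough way to get a feel for the size of the model is to count its parameters. The sketch below does the arithmetic by hand from the layer widths given in the VGG paper (13 convolutional layers with 3x3 kernels, then 3 fully connected layers). Parameter count is only a proxy, and it is not the same thing as the number of operations needed per image.

```python
# VGG16 layer widths as described in the original paper (configuration D).
conv_channels = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
fc_sizes = [4096, 4096, 1000]

params = 0
in_ch = 3  # RGB input
for out_ch in conv_channels:
    params += 3 * 3 * in_ch * out_ch + out_ch  # 3x3 kernel weights + biases
    in_ch = out_ch
conv_params = params

# After five 2x2 max-pools, a 224x224 input becomes a 7x7 map with 512 channels.
in_features = 512 * 7 * 7
for out_features in fc_sizes:
    params += in_features * out_features + out_features  # weights + biases
    in_features = out_features

print(f"convolutional layers: {conv_params:,}")  # 14,714,688
print(f"total parameters:     {params:,}")       # 138,357,544
```

So the model has roughly 138 million parameters, and most of them sit in the three fully connected layers at the end rather than in the convolutional stack.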

The way you work with the Keras pre-trained models is essentially the same: you load the model you want to use, normalize any images you want to run through the system, and use a simple set of calls to run the model and get the result.

You may have noticed that there is also a VGG19 model, and be wondering why we didn't use this deeper model.  VGG19 comes from the same paper as VGG16 and adds three more convolutional layers, but much of the transfer learning work we are looking at was originally built on VGG16.  The cool thing about working with standardized pre-trained models is that you can essentially just swap the other model in and see whether you get any performance improvements.  We can take a closer look at that when we delve into the examples.

Yann LeCun had a very interesting slide in the talk that we posted to our HTC Seminar Series that plots the accuracy vs the number of operations the neural net  computes for a wide variety of different deep learning neural net architectures.

Note that both of the VGG architectures on the plot above sit very far to the right.  So in some sense they could be considered overcomplete representations compared to the various architectures at the top left of the plot.  Whether this overcompleteness helps (or hurts or just doesn't matter) transfer learning is an interesting question (in my mind at least).

In the HTC Education Series: Getting Started with Deep Learning course, the ResNet architecture is used for a number of the different fastai code examples, like the various classification exercises.  But when Jeremy covers deep learning architectures that are generative, he uses VGG instead of ResNet.

