Showing posts from April, 2021

Representing Scenes as Neural Radiance Fields for View Synthesis

View synthesis, in the context of this discussion, means synthesizing any view of a 3D scene or object(s) given only a small, sparse set of input images. So why does this even work in the first place? Once again, we enter the domain of manifold representations. Real world data lives on a lower dimensional manifold: higher than 2D, higher than 3D, higher than 4D, but still lower than the potential variability in the data. View synthesis exploits this. Remember, neural nets do nonlinear function interpolation. The function is the manifold the view synthesis data lives on. By moving along the surface of the manifold, you can change the view.

And with those words of wisdom, let's dive into our good friend Yannic Kilcher's discussion of the Berkeley NeRF paper. You can check out the paper here. The project website is here.

Observations
1: It's fun to think about how to exploit this whole approach in more generic generative image synthesis schemes.
2: Modern cam…
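One concrete piece of the NeRF recipe worth internalizing is its positional encoding: each 3D coordinate is expanded into sine/cosine features at exponentially spaced frequencies before hitting the MLP, which is what lets the network represent high-frequency scene detail. A minimal numpy sketch (the frequency count of 10 matches the paper's choice for spatial coordinates; the feature ordering here is an illustrative choice):

```python
import numpy as np

def positional_encoding(p, n_freqs=10):
    """NeRF-style positional encoding: map each coordinate to sin/cos
    features at exponentially spaced frequencies 2^i * pi."""
    feats = []
    for i in range(n_freqs):
        freq = 2.0 ** i * np.pi
        feats.append(np.sin(freq * p))
        feats.append(np.cos(freq * p))
    return np.concatenate(feats, axis=-1)

xyz = np.array([0.5, -0.25, 0.1])   # a 3D sample point along a camera ray
encoded = positional_encoding(xyz)
print(encoded.shape)                # (60,) = 3 coords * 10 freqs * 2 fns
```

The MLP then eats these 60 features instead of the raw 3 coordinates, which is what moves the interpolation onto a manifold where fine detail is representable.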

HTC Updates - April Roundup (deep learning)

It's almost the end of the month. Time for another installment of HTC Updates, with extensive help from our friend at Henry AI Labs.

The idea of using quaternions to build deep learning architectures is pretty interesting. We will cover that in more depth in a future HTC post.

The GAN survey video looks interesting. We will also cover that in a future HTC post.

PyTorch Lightning is currently the best system for building deep learning architectures with PyTorch. More evidence of that here.

Checking Out the Nvidia DGX Station A100

 This was announced at GTC last week.  Pretty slick.  The ants that camp out in my Hawaii office would love it. Let's check it out.

Intelligence and Generalization

So this video is an exposition at the beginning, followed by a conversation with Francois Chollet about what deep learning does or does not do.

I don't get the 'controversy'. Yes, neural nets do interpolation. We've known that since the late 80's (at least). They do nonlinear function approximation. They can theoretically interpolate any nonlinear function. The reason this is interesting is that real world information lives on a low-dimensional manifold the neural net can learn to model.

Thinking of neural nets as 'running a program' is just a way to mislead yourself. MLST makes that mistake over and over again in the discussions on their videos. When analyzing an interpolative mapping system, what you care about is the manifold surface it is interpolating, not the mechanism used to build the interpolator.

Onwards.

Observations:
1: What exactly is generalization that is not interpolative? Wouldn't it then not be generalization?

GLOM model deep dive

GLOM is a computer vision model proposed by Geoffrey Hinton to decompose an image into a 'parse tree' like structure that represents objects and their parts. The parse tree is constructed dynamically, without changing the underlying neural net structure. It's a followup to Hinton's Capsule architecture that tries to address some of its limitations.

Our previous HTC Seminar featured a lecture by Hinton that covers this material, so you should watch that first. You can read the GLOM paper by Hinton here. You can snark at reddit snarks here.

Let's start off with our old friend Yannic Kilcher's deconstruction of the paper.

So how does this all relate to visual modeling studies? How does this all relate to scale space representations? How does this all relate to manifold learning?

Adaptive Discriminator Augmentation

This video from GTC this week caught my eye. It's about a technique for training GAN systems with limited data, by utilizing data augmentation to artificially expand the range of input data used by the model for training. The discussion of 'leaking' of augmentations into the Generator probability model is interesting. Is it relevant to other recent unsupervised learning techniques?

Here's some more specific info on what the video is talking about. Here's a link to the paper on Adaptive Discriminator Augmentation for GANs, titled 'Training Generative Adversarial Networks with Limited Data'. Here's an analysis of the technique described in the paper.

Observations
The 'mean of the system output representation' image kind of makes you wonder what this system is really learning. Note how the output images are always so closely spatially matched up. Just saying.

Denoising with Bias-Free CNNs

More information on this concept of removing the bias terms of a neural net trained to denoise images, and then using that as a prior for other tasks. Removing the bias leads to increased generalization.

The project page for this bias-free denoising research is here. The github link for bias-free denoising is here. The arXiv page for the paper is here. You can watch the paper presentation titled 'Robust and Interpretable Blind Image Denoising via Bias-Free Convolutional Neural Networks' from ICLR here. You can read the associated paper pdf here. There is a related paper titled 'How do neural networks denoise natural images?' you can find here.

You can use this denoiser model as a prior for solving linear inverse problems. The github associated with this research is here.

Image priors, manifolds, and noisy observations: visual images lie on a low-dimensional manifold, spanned by various natural deformations. Images on this manifold are approximately eq…
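Why does deleting the bias terms help generalization? One way to see it: a CNN built from bias-free convolutions and ReLUs is positively homogeneous, so rescaling the input rescales the output exactly, and a denoiser trained at one noise level transfers to others. A toy numpy sketch of that scaling property (the two-layer net and 3x3 kernels here are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """'Valid' 2D cross-correlation with no bias term."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

def bias_free_net(x):
    # conv -> ReLU -> conv, with every bias removed
    return conv2d(np.maximum(conv2d(x, k1), 0.0), k2)

x = rng.normal(size=(8, 8))
a = 2.5                                  # positive rescaling of the input
print(np.allclose(bias_free_net(a * x), a * bias_free_net(x)))  # True
```

With bias terms present that equality breaks, which is exactly the paper's argument for why bias-free networks generalize across noise levels.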

Migrating to Qt 6

Qt 6.0 was released right at the end of 2020, and they just released Qt 6.0.3. This video provides some pointers for how to migrate your existing Qt 5 code to Qt 6, and also covers some of the new features in Qt 6.

You will notice that ARM native support for macOS has slipped to September, for the planned Qt 6.2 release. The new graphics architecture is of particular note. We have discussed that in more detail in previous HTC posts. Watch out with that QList port info they mentioned.

HTC Seminar Series #34: How to represent part-whole hierarchies in a neural net

Geoffrey Hinton gave a really great talk at GTC this week, one among several neural net luminaries. The free virtual conference is well worth checking out from the comfort of your pandemic refuge. You would need to register for GTC 2021 to access and view the GTC talk on their web site. I did find another talk he presented in January 2021 on the same topic (same slides as the GTC talk, except a few more of them), so you can check this out below as this week's HTC seminar talk.

Observations
1: Obviously influenced by cortex columns in the brain.
2: Obviously influenced by forward-backward flow of info in the visual system in the brain.
3: Part hierarchy is really an alternative approach to create a scale space prior in the architecture (in my opinion).
4: Best explanation for what a transformer is really doing I have heard yet.
5: Implicit function decoder.
6: Is a part hierarchy just another sub-manifold? It's a constraint on the possible space of representations.
7: Cl…

Making Machine Learning Art in Colab - Part 1

We're going to be presenting a series of videos on the topic of making machine learning art with Colab. They are from a course put together by Derrick Schultz and Lia Coleman.

We have talked about Colab quite a bit here on HTC, so you've probably run into it before if you have read other HTC posts. It's a way to run Jupyter notebooks on a Google server that allows you to use GPU resources to run the notebook for free. They were used heavily in both of our HTC Education Series courses on deep learning.

The focus of this course and accompanying series of videos is to get digital artists comfortable enough to work with pre-built Colab implementations of different machine learning models. Machine learning is a pretty broad term, and what we are really talking about is deep learning neural networks, primarily ones that implement different generative models. Most if not all of the models have already been covered in some form here at HTC. And we'll be diving into more of th…

Latent Variable Energy Based Models (LV-EBMs)

It's time for another episode in the NYU Deep Learning Course videos (they are all excellent and worth watching in order). Alfredo schools us on latent variable energy based models (LV-EBMs). You may recall from other HTC posts that Yann LeCun's energy-based model theory of all things neural net generative is pretty slick, like a unified field theory for neural networks. Let's check it out.
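The core LV-EBM move is that inference itself is an optimization: given an observation, you search over the latent variable for the configuration with the lowest energy. A toy numpy sketch with a quadratic energy (the decoder weights, penalty strength, and gradient-descent settings are all illustrative choices, not anything from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))               # toy decoder weights
y = rng.normal(size=4)                    # an observation

def energy(y, z, W, lam=0.5):
    """Toy latent-variable energy: reconstruction error of decoding
    latent z, plus a quadratic penalty keeping z small."""
    return np.sum((y - W @ z) ** 2) + lam * np.sum(z ** 2)

def free_energy(y, W, lam=0.5, n_steps=1000, lr=0.02):
    """F(y) = min_z E(y, z): minimize over the latent variable by
    gradient descent -- the inference step in an LV-EBM."""
    z = np.zeros(W.shape[1])
    for _ in range(n_steps):
        grad = -2.0 * W.T @ (y - W @ z) + 2.0 * lam * z
        z -= lr * grad
    return energy(y, z, W, lam), z

F, z_star = free_energy(y, W)
print(F <= energy(y, np.zeros(2), W))     # True: inference lowered the energy
```

For this quadratic energy the minimizer also has a closed form, (WᵀW + λI)⁻¹Wᵀy, which is a handy sanity check that the gradient descent inference converged.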

GTC 2021 Keynote

GTC 2021 conference is happening this week, virtually. Looking forward to Jürgen's talk. The other 3 'fathers of deep learning' (i.e. Turing Award winners) are also speaking.

Nvidia CEO Jensen Huang once again addresses us from his kitchen, schooling us on the latest developments in Nvidia GPUs, graphics and AI technology. Feel free to skip the 18 minute commercial at the beginning before he comes on live.

Observations
1: Are we really living in the 'metaverse'? Neal Stephenson's Snow Crash novel is a story about a dystopia at its core. Surely we can come up with something better as a metaphor to strive for.
2: Omniverse is cool as a tool kit. Keep meaning to look into using the api for something more artistic.
3: Interesting how deep learning is now the key driver of increased GPU services in the cloud.
4: Look at all those ARM cores on that BlueField-3 DPU. So this is where we continue Moore's law growth? Now please stick DGX superpod in a…

ImageNet-trained CNNs are biased towards texture

This is a talk by Robert Geirhos presented at ICLR in 2019 titled 'ImageNet-trained CNNs are biased towards texture: increasing shape bias improves accuracy and robustness'. Key sentences from their abstract below:

"We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on 'Stylized-ImageNet', a stylized version of ImageNet. This provides a much better fit for human behavioural performance".

Observations:
1: Once again, the importance of the prior information inherent in the training set.
2: Shape bias in training set leads to noise robustness. What does this tell us about how we should approach data au…

Making Use of a Prior Implicit in a Denoiser

This is a recent talk from Eero Simoncelli of NYU that covers examining and using a prior implicit in a denoiser. This is really a talk about human perception and perceptual representations of images that uses denoising as an entry into that exploration. Let's check it out.

Observations:
1: Removing the bias in a CNN makes it work better (better generalization).
2: System learns adaptive filters (directional, based on input statistics of source and noise).
3: Projection into a low dimensional space. Sharp scale space, not blurry.
4: Spatial derivatives at the high end of the subspace.
5: Visual images lie on a low-dimensional surface (manifold).
6: Using a denoiser to do iterative gradient ascent to get to a manifold surface. This ends up being a generative model of natural images.
7: Adding a little bit of injected noise to #6 makes it converge better.
8: You can use this 'learned denoiser system' to solve linear inverse problems. Problems like inpainting, random…
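The observation about using a denoiser to do iterative gradient ascent to a manifold surface is the key trick: the denoiser residual D(x) - x points toward the image manifold, so iterating small steps along it walks you onto the manifold. A toy numpy sketch where the 'manifold' is just the unit circle and the 'denoiser' is hand-built (everything here is illustrative; a real system would use a trained bias-free CNN denoiser):

```python
import numpy as np

def toy_denoiser(x):
    """Stand-in for a trained denoiser: moves a 2D point part of the
    way toward the unit circle, our toy 'image manifold'."""
    target = x / np.linalg.norm(x)        # nearest point on the circle
    return x + 0.5 * (target - x)         # partial step, like a real denoiser

def ascend_to_manifold(x, n_steps=50, step=1.0):
    """Iterate x <- x + step * (D(x) - x): the denoiser residual points
    toward the manifold, so repeated steps land on it."""
    for _ in range(n_steps):
        x = x + step * (toy_denoiser(x) - x)
    return x

x0 = np.array([3.0, -1.5])
x_final = ascend_to_manifold(x0)
print(np.linalg.norm(x_final))            # ~1.0: on the toy manifold
```

Observation 7 then corresponds to adding a little noise at each iteration, which in the real system turns this procedure into a sampler for the implicit image prior.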

Neuralink Update

Cool new live Neuralink demo just came out. Should we add this as a new modulator option in the Studio Artist V5.5 paint synthesizer? Check it out.

We covered Neuralink in depth in this HTC post. Here's some more Neuralink update information below. ArsTechnica has a new article related to this recent monkey-playing-pong-with-its-thoughts video here.

Formal Reasoning and Program Synthesis

This is a discussion between Christian Szegedy of Google Research and the gang at Machine Learning Street Talk. It covers a lot of ground. One area is whether we can use neural net systems to auto-formalize the methodology of practicing and utilizing mathematics. Wolfram Research is jumping up and down at this point, saying 'hey guys, we already did that' (the auto-formalize part).

Another area is discussion about Transformers and how they work. Hot topic. Perhaps over-hyped in my humble opinion (AI lemmings running towards a transformer cliff), but that's for another post.

I'm not sure what I think about this notion of neural networks 'running programs'. Tim's comments at the very beginning about neural networks being systems that interpolate on a lower dimensional manifold get to the exact heart of how they work. If someone writes a program to implement a radial basis function interpolator, do we really care about the program, or just the fun…
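Since the radial basis function interpolator keeps coming up as the mental model here, it's worth seeing how little machinery it takes: solve one linear system for the weights, and the resulting function passes exactly through every training point while smoothly interpolating in between. A minimal numpy sketch (the Gaussian kernel width and node count are arbitrary illustrative choices):

```python
import numpy as np

def rbf_interpolator(x_train, y_train, gamma=2.0):
    """Gaussian radial basis function interpolator: solve for weights
    so the interpolant passes exactly through every training point."""
    phi = np.exp(-gamma * (x_train[:, None] - x_train[None, :]) ** 2)
    w = np.linalg.solve(phi, y_train)
    def interp(x):
        basis = np.exp(-gamma * (x[:, None] - x_train[None, :]) ** 2)
        return basis @ w
    return interp

x = np.linspace(0, 2 * np.pi, 15)         # 15 training samples of sin(x)
f = rbf_interpolator(x, np.sin(x))
print(np.max(np.abs(f(x) - np.sin(x))))   # ~0 at the training points
```

The 'program' is three lines of linear algebra; the interesting object is the interpolated surface, which is the point being made in the discussion.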

Deep Hierarchical Variational Autoencoder

The Nouveau variational autoencoder (NVAE) is a deep hierarchical VAE built for image generation that uses a stack of depth-wise separable convolutions and batch normalization to do the work. Here is a link to the paper, titled 'NVAE: A Deep Hierarchical Variational Autoencoder'. Yannic Kilcher runs us through an explanation of the paper below.

We recently had an HTC post on the followup architecture to this work you can check out here. Also, make sure to check out Max Welling's lecture on unifying VAEs and Flow architectures here. It's probably also time to dive back into David McAllester's presentation on VQ-VAEs. And just to drive it all home, let's check out this 2020 lecture on variational autoencoders from Paul Hand.

HTC Seminar Series #33: Probing Sensory Representations

Today's HTC Seminar Series presentation is by Eero Simoncelli, titled 'Probing Sensory Representations', and was presented at the MIT Brains, Minds, and Machines summer course in 2015. This is an awesome lecture, and he very clearly lays out how we can take this concept of visual metamerism and extrapolate it from color metamerism to texture metamerism to more elaborate higher order perceptual metamerism. And how that maps to increasingly higher order models of the human visual system.

Observations
1: I think you can push the whole visual metamerism angle much higher up the perceptual food chain than he does in this talk. Ask yourself why GAN systems even work at all. Why does the fake output look real to people? I believe you will find the answer right here.
2: The texture synthesis work he is describing pre-dates neural style transfer by quite a bit. I was quite familiar with it at the time it was done, but kind of missed the iterative gradient descent part for the…

Reverse Engineering Visual Intelligence

This is a talk by Jim DiCarlo from MIT at CCBM 2018 on 'Reverse Engineering Visual Intelligence'. Jim's lab does really great work on modeling and understanding the IT cortex (responsible for object recognition). There's a really great slide in this talk that shows off the ability of various kinds of computational models to represent specific details associated with IT cortex behavior. It's interesting that CNN AI models developed for ImageNet recognition originally surpassed the computational visual models, but then started to actually get worse at matching IT cortex behavior even as those CNN models got better at the ImageNet recognition task.

Lie Group and Human Visual Modeling

A Lie group is a differentiable manifold (that means it's smooth). A Lie algebra is a vector space together with an operation called a Lie bracket, which is an alternating bilinear map that satisfies the Jacobi identity. The Jacobi identity is a property of a binary operation that describes how the order of evaluation affects the result of the operation. Both the vector cross product and the Lie bracket operation satisfy the Jacobi identity.

Any Lie group gives rise to a Lie algebra, which is its tangent space at the identity. Any finite-dimensional Lie algebra over the real or complex numbers has a corresponding connected Lie group (unique up to finite coverings).

Ok, that was all very clear, right? So why do we care about this stuff anyway? In physics, Lie groups appear as symmetry groups of physical systems, and their Lie algebras (tangent vectors near the identity) may be thought of as infinitesimal symmetry motions. Thus Lie algebras and their representations are used extensively in…
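To make 'tangent space at the identity' concrete, here's a small numpy sketch: a skew-symmetric matrix is an element of the Lie algebra so(2), the matrix exponential maps it onto the rotation group SO(2), and the matrix commutator gives a Lie bracket that satisfies the Jacobi identity (the power-series truncation at 30 terms is an illustrative choice):

```python
import numpy as np

def bracket(A, B):
    """Lie bracket on matrices: the commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def matrix_exp(A, n_terms=30):
    """Matrix exponential via its power series: the exponential map
    from the Lie algebra into the Lie group."""
    result, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        result = result + term
    return result

# so(2) element: skew-symmetric, a tangent vector at the identity
theta = np.pi / 2
A = np.array([[0.0, -theta], [theta, 0.0]])
R = matrix_exp(A)                   # exp lands in SO(2): a 90-degree rotation
print(np.round(R, 6))

# the Jacobi identity holds for any matrix commutator
rng = np.random.default_rng(1)
X, Y, Z = rng.normal(size=(3, 3, 3))
jacobi = bracket(X, bracket(Y, Z)) + bracket(Y, bracket(Z, X)) + bracket(Z, bracket(X, Y))
print(np.allclose(jacobi, 0.0))     # True
```

Moving `theta` smoothly traces out a smooth path of rotations through the group, which is the 'infinitesimal symmetry motion' picture from the physics paragraph.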

iTheory: Visual Cortex and Deep Networks

iTheory is not a registered trademark of Apple Computer, although they will probably claim that they own it. What it embodies is a theory that describes a hierarchical feedforward network model of processing in the ventral visual pathway of the primate brain, a pathway that supports invariant object recognition. Tomaso Poggio of MIT takes us on this iTheory journey. Let's dive in.

Very Deep VAE (VDVAE) Architecture

U-Net and VAE architectures are like peanut butter and jelly: you just want to smash them together and see what happens. And the paper 'Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images' by Rewon Child does just that. You can find it here.

As you can see from the block diagram above, we have the classic U-Net with the horizontal skip connections mashed into a VAE encoder-decoder structure. Average pooling and nearest neighbor up-sampling for the pool and unpool layers. So immediately you could restructure that part to do better (since the whole point is image synthesis). Put on your thinking caps.

The GELU non-linearity threw me for a minute (instead of ReLU). You can read about it here. A transformer thing apparently.

They claim that N-layer VAEs are universal approximators of N-dimensional latent densities. So is the scale space prior imposed by the depth on the computation the reason why?

You can check out the PyTorch implementation here.

Observations…
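Since GELU was the head-scratcher here: it's just the input weighted by the standard normal CDF, GELU(x) = x·Φ(x), and most implementations use a cheap tanh approximation of it. A quick numpy check that the two forms agree (the evaluation grid is an arbitrary illustrative choice):

```python
import numpy as np
from math import erf, sqrt

def gelu_exact(x):
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF."""
    return np.array([v * 0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])

def gelu_tanh(x):
    """The widely used tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-3, 3, 7)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # small: the forms agree closely
```

Unlike ReLU it's smooth and slightly negative for small negative inputs, which is why it shows up in transformer-era architectures like this one.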

The Synapto-Dendrodendritic Web

Yesterday's post made a reference to the synapto-dendrodendritic web, which made me shake my head and go 'what's that now'. To quote from yesterday's reading:

'Is there a cortical anatomic substrate where these wavelets are formed and interact? The answer is in the affirmative. The study of these receptive electric fields is known as holonomy, and the plane waves carry out their action in the phase space, the Hilbert space. We know that the distal ends of axons split into teledendrons and form a web of interconnected fibers. These dendrites communicate through electrical and chemical synapses. Electric recordings reveal oscillations of depolarizations and hyperpolarizations of electric potential differences without electric currents. These oscillations in different cortical slices intersect with one another, producing waves of interference. Different neuronal ensembles overlap in this assembly of interfering complex plane waves, in a holographic manner.'

Tha…

The Biophysics of Visual Edge Detection

Here's a question for you. Is the strong visual perception of edges in images a function of the correlated response of multiple spatial frequency channels all being activated simultaneously?

With that question for the ages in mind, let's dive into a current review of the basic principles of visual perception: the peer reviewed article from 2020 titled 'The Biophysics of Visual Edge Detection: A Review of Basic Principles', which you can find here. And it's nice to find a great review article on this topic from within the last year, as opposed to something from Campbell and Robson, or Hubel and Wiesel.

And what do we learn?
1: A Gabor filter is the product of a Gaussian distribution and a sinusoidal wave function. And we see the fourier transform of the convolution of a Gabor filter and the stimulus intensity function when doing single cell recordings from the visual cortex.
2: We are told that nature utilizes a function that minimizes the uncertainty principle of…
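The Gabor filter point is easy to make concrete: multiply a Gaussian envelope by a sinusoidal carrier and you have a Gabor filter, the standard model of a V1 simple-cell receptive field. A 1D numpy sketch (the width, frequency, and phase values are arbitrary illustrative choices):

```python
import numpy as np

def gabor_1d(x, sigma=1.0, freq=1.5, phase=0.0):
    """1D Gabor filter: a Gaussian envelope multiplied by a sinusoidal
    carrier -- the standard model of V1 simple-cell receptive fields."""
    envelope = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * freq * x + phase)
    return envelope * carrier

x = np.linspace(-3, 3, 601)
g = gabor_1d(x)
print(g[300])          # peak of the even-phase filter at x = 0 -> 1.0
```

Sweeping `freq` gives the bank of spatial frequency channels the opening question is asking about; an edge drives many of them at once.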