Showing posts from June, 2021

HTC Seminar Series #36: Geometric Deep Learning: The Erlangen Programme of ML

This is the ICLR 2021 keynote presentation by Michael Bronstein (who always gives really great talks). In mathematics, symmetry was crucial in the foundation of geometry as we know it in the 19th century. Now it could have a similar impact on another emerging field. Deep learning's success in recent decades has been significant – from revolutionising data science to landmark achievements in computer vision, board games, and protein folding. At the same time, a lack of unifying principles makes it difficult to understand the relations between different neural network architectures, resulting in the reinvention and re-branding of the same concepts. Michael Bronstein is a professor at Imperial College London and Head of Graph ML Research at Twitter, who is working toward a geometric unification of deep learning through the lens of symmetry. In his ICLR 2021 keynote lecture, he presents a common mathematical framework for studying the most successful network architectures, giving a constructive procedure…
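
The core of the geometric deep learning argument is that successful architectures bake in equivariance to a symmetry group: convolutions commute with translations, for example. Here's a minimal sketch of that property (my own illustration, not from the talk), using a circular 1D convolution so the translation group acts exactly:

```python
import numpy as np

def circ_conv(x, k):
    """Circular 1D cross-correlation: y[i] = sum_j k[j] * x[(i + j) mod n]."""
    n = len(x)
    return np.array([sum(k[j] * x[(i + j) % n] for j in range(len(k)))
                     for i in range(n)])

x = np.random.rand(8)           # a signal
k = np.array([1.0, -2.0, 1.0])  # a filter
shift = lambda v: np.roll(v, 1)

# Translation equivariance: convolving a shifted signal equals
# shifting the convolved signal.
assert np.allclose(circ_conv(shift(x), k), shift(circ_conv(x, k)))
```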

Self-Supervised Vision Models

Dr. Ishan Misra is a Research Scientist at Facebook AI Research, where he works on computer vision and machine learning. His main research interest is reducing the need for human supervision, and indeed human knowledge, in visual learning systems. Today, though, we will be focusing on an exciting cluster of recent papers on unsupervised representation learning for computer vision released by FAIR: 'DINO: Emerging Properties in Self-Supervised Vision Transformers', 'Barlow Twins: Self-Supervised Learning via Redundancy Reduction', and 'PAWS: Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples'. All of these papers are hot off the press, having been officially released in the last month or so. Observation 1: What the moderator said about a cartoon banana and a real banana image not lying on the same perceptual manifold seems totally wrong. The visual system does see them as visually similar.
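
As a concrete taste of one of these methods, here's a minimal sketch of the Barlow Twins objective: embeddings of two augmented views are normalized per dimension, their cross-correlation matrix is computed, and the loss pushes it toward the identity. This is a simplified reading of the paper, not FAIR's released code, and the off-diagonal weight lamb is a hyperparameter:

```python
import torch

def barlow_twins_loss(z1, z2, lamb=5e-3):
    """z1, z2: (N, D) embeddings of two augmented views of the same batch."""
    # Normalize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    n, d = z1.shape
    c = (z1.T @ z2) / n  # (D, D) cross-correlation matrix

    # Push diagonal terms to 1 (invariance) and off-diagonal terms
    # to 0 (redundancy reduction).
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lamb * off_diag
```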

Physical Simulation and AI: Differentiability, Productivity, Performance, and the Taichi Programming Language

The Taichi programming language is pretty slick. Here's a quick intro to the language. Here's a more recent talk on using Taichi for differentiable physical simulation. It seems like Taichi takes some of the Halide ideas for speeding up imaging code and extends them to sparse representations and differentiability. Here's the 'DiffTaichi: Differentiable Programming for Physical Simulation' paper.
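
For flavor, here's a minimal Taichi kernel, a sketch of the basic programming model rather than anything from the talks: Python syntax, with the outermost loop of a @ti.kernel automatically parallelized and compiled for the chosen backend:

```python
import taichi as ti

ti.init(arch=ti.cpu)  # or ti.gpu where available

n = 320
pixels = ti.field(dtype=ti.f32, shape=(n, n))

@ti.kernel
def fill():
    # Struct-for over a field: Taichi parallelizes this loop.
    for i, j in pixels:
        pixels[i, j] = (i + j) / (2.0 * n)

fill()
print(pixels[10, 10])
```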

Animating Pictures with Eulerian Motion Fields

Cool approach to automatically converting a still image into a realistic looping animation. Our method relies on the observation that this type of natural motion can be convincingly reproduced from a static Eulerian motion description, i.e., a single, temporally constant flow field that defines the immediate motion of a particle at a given 2D location. We use an image-to-image translation network to encode motion priors of natural scenes collected from online videos, so that for a new photo we can synthesize a corresponding motion field. The image is then animated using the generated motion through a deep warping technique: pixels are encoded as deep features, those features are warped via Eulerian motion, and the resulting warped feature maps are decoded as images. In order to produce continuous, seamlessly looping video textures, we propose a novel video looping technique that flows features both forward and backward in time and then blends the results. We demonstrate the effectiveness…
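
The 'temporally constant flow field' idea boils down to Euler integration: every frame, each particle moves by the velocity the fixed field assigns to its current position. Here's a toy sketch of that recurrence (the paper warps deep features this way rather than raw points, and the flow function here is a hypothetical stand-in):

```python
import numpy as np

def advect(points, flow, n_steps):
    """Advance 2D points under a static Eulerian flow field.

    points: (N, 2) array of (x, y) positions
    flow:   function mapping (N, 2) positions -> (N, 2) velocities
    """
    trajectory = [points]
    for _ in range(n_steps):
        points = points + flow(points)  # x_{t+1} = x_t + M(x_t)
        trajectory.append(points)
    return np.stack(trajectory)

# Hypothetical flow: a gentle swirl about the origin.
swirl = lambda p: 0.05 * np.stack([-p[:, 1], p[:, 0]], axis=1)
frames = advect(np.random.rand(100, 2), swirl, n_steps=60)
```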

Siggraph 2021 Technical Papers Preview Trailer

Siggraph 2021 is being held virtually in August. This video shows off a few previews of this year's technical papers. One preview focuses on learning skeletal articulations with neural blend shapes; another focuses on a compiler for quantized simulations.

Halide: A Language for Fast, Portable Computation on Images and Tensors

Seems like a good time to take another look at Halide. Their claims about building neural nets using Halide are fascinating. It makes you wonder why more people don't use it. I don't know, I'm just asking. The new differentiable Halide is also very interesting. But let's start at the beginning. What is Halide, and how does it work? Let's check it out.
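
Halide's central idea is separating the algorithm (what to compute, written as pure functions over coordinates) from the schedule (how to order, vectorize, and parallelize it). Here's a minimal sketch using Halide's Python bindings; this is my own toy example, and exact binding signatures (notably realize) vary between Halide versions:

```python
import halide as hl

x, y = hl.Var("x"), hl.Var("y")

# Algorithm: a synthetic gradient image, then a clamped brightening.
gradient = hl.Func("gradient")
gradient[x, y] = hl.cast(hl.UInt(8), (x + y) % 256)

brighter = hl.Func("brighter")
brighter[x, y] = hl.cast(
    hl.UInt(8), hl.min(hl.cast(hl.UInt(16), gradient[x, y]) * 2, 255))

# Schedule: stated separately, without touching the algorithm above.
brighter.vectorize(x, 8)
brighter.parallel(y)

out = brighter.realize([640, 480])  # a Buffer holding the computed image
```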

Neural Rendering CVPR 2020 - morning session

I've been working through this great all-day tutorial on neural rendering from CVPR 2020. Somewhat of a slog, but worth it: this is a fast-moving field of research, and getting a full background dump on all of the different approaches from a year ago pays off.

ACORN: Adaptive Coordinate Networks for Neural Scene Representation

Continuing our recent implicit neural representation focus. This presentation is from a new paper that will be presented at Siggraph 2021 later this summer. It proposes a hybrid implicit-explicit neural representation. Pretty slick. The paper 'ACORN: Adaptive Coordinate Networks for Neural Scene Representation' can be found here. The project page is here. The github page is here (coming soon).
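
The 'explicit' half of a hybrid representation like this usually amounts to interpolating a feature grid that a network populates, so most queries become cheap lookups instead of full MLP evaluations. Here's a toy sketch of that interpolation step (my own illustration; ACORN's actual scheme is block-adaptive and more involved):

```python
import numpy as np

def interp_features(grid, coords):
    """Bilinearly interpolate an (H, W, C) feature grid at coords in [0, 1]^2."""
    h, w, _ = grid.shape
    fx = coords[:, 0] * (w - 1)
    fy = coords[:, 1] * (h - 1)
    x0, y0 = np.floor(fx).astype(int), np.floor(fy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = (fx - x0)[:, None], (fy - y0)[:, None]
    top = (1 - wx) * grid[y0, x0] + wx * grid[y0, x1]
    bot = (1 - wx) * grid[y1, x0] + wx * grid[y1, x1]
    return (1 - wy) * top + wy * bot

features = interp_features(np.random.rand(16, 16, 32), np.random.rand(100, 2))
```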

Implicit Neural Representations with Periodic Activation Functions

Continuing our recent exploration of implicit functional approximations created using neural nets. First, a quick overview of the paper. Here's a longer talk on the same material. Here's the project page for 'Implicit Neural Representations with Periodic Activation Functions'.
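
The core SIREN idea fits in a few lines: replace the ReLUs in a coordinate MLP with sin(w0 * Wx + b). A minimal PyTorch sketch (the paper also prescribes a specific weight initialization, omitted here for brevity):

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by a sine activation, the SIREN building block."""
    def __init__(self, in_features, out_features, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

# Map 2D pixel coordinates to RGB values.
siren = nn.Sequential(SineLayer(2, 256), SineLayer(256, 256), nn.Linear(256, 3))
coords = torch.rand(1024, 2) * 2 - 1  # coordinates in [-1, 1]^2
rgb = siren(coords)
```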

Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

This is a NeurIPS 2020 talk by Matthew Tancik titled 'Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains'. We've been discussing this approach to improving the representation of high frequency information in neural nets in the recent NeRF related HTC posts. And it seems to be related to using the Fourier transform as an alternative to attention in Transformer architectures as well. The 'Learned Initializations for Optimizing Coordinate-Based Neural Representations' paper is here. The project page is here. The project page for the 'Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains' paper is here. Ben Mildenhall's github page is here. Observation 1: Isn't the Fourier feature trick restructuring the problem in terms of a certain kind of predefined wavelet-like basis, as opposed to the basis you get when you just use the raw fully connected model?
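
The trick itself is tiny: before feeding low-dimensional coordinates v into the MLP, map them through gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)] with a random Gaussian matrix B. A sketch directly following the paper's formulation, where the scale of B is the key hyperparameter controlling how much high frequency content the network can fit:

```python
import numpy as np

def fourier_features(v, B):
    """gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)].

    v: (N, d) input coordinates; B: (m, d) random Gaussian projection.
    Output: (N, 2m) encoding fed to the MLP in place of raw coordinates.
    """
    proj = 2.0 * np.pi * v @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

B = np.random.normal(scale=10.0, size=(256, 2))  # scale tuned per task
encoded = fourier_features(np.random.rand(1024, 2), B)
```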

Generative Models as Distributions of Functions

Cool new paper on rethinking generative models as producing a resolution-independent functional representation, so your representation is scale independent. You can use the same architecture for generating images, 3D data, or audio data. They also use the same trick described in yesterday's NeRF post, where random Fourier features enable the model to represent high frequency data effectively. Here's a link to the paper 'Generative Models as Distributions of Functions'. Here's a link to the PyTorch code for the paper. JaeJun Yoo put together a nice set of slides for a talk that covers this material here.
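
Resolution independence falls out of the functional view: a generated sample is a function over coordinates, so you can evaluate it on a grid of any density. A toy sketch of that sampling step (the lambda is a hypothetical stand-in for a trained coordinate-based generator):

```python
import numpy as np

def sample_grid(f, res):
    """Evaluate a coordinate-based generator f on a res x res grid in [-1, 1]^2."""
    lin = np.linspace(-1.0, 1.0, res)
    coords = np.stack(np.meshgrid(lin, lin), axis=-1).reshape(-1, 2)
    return f(coords).reshape(res, res, -1)

# Hypothetical generator stand-in: one function, any output resolution.
f = lambda c: np.sin(10 * c[:, :1]) * np.cos(10 * c[:, 1:2])
thumbnail, poster = sample_grid(f, 32), sample_grid(f, 1024)
```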

Understanding and Extending Neural Radiance Fields

Neural Radiance Fields (Mildenhall, Srinivasan, Tancik, et al., ECCV 2020) are an effective and simple technique for synthesizing photorealistic novel views of complex scenes by optimizing an underlying continuous volumetric radiance field, parameterized by a (non-convolutional) neural network. This talk reviews NeRF, explains why it works, and then introduces some additional research work related to it. Here's a link to the 'NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis' paper. Here's a link to the 'Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains' paper. Here's a link to the 'NeRF++: Analyzing and Improving Neural Radiance Fields' paper. Here's a link to a paper that renders NeRFs in real time using PlenOctrees, 'PlenOctrees for Real-time Rendering of Neural Radiance Fields'. We previously covered Yannic Kilcher's analysis of the NeRF paper here.
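
At the heart of NeRF is the volume rendering quadrature that turns per-sample densities and colors along a camera ray into one pixel color. A minimal sketch of that compositing rule from the paper (in a real renderer the inputs would come from querying the radiance field MLP at sample points):

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """NeRF quadrature: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i.

    sigmas: (S,) densities, colors: (S, 3) RGB, deltas: (S,) sample spacings.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # per-segment opacity
    # T_i: transmittance, the probability the ray reaches sample i.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

pixel = composite_ray(np.random.rand(64), np.random.rand(64, 3),
                      np.full(64, 0.05))
```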

3D Vision for Virtual Video Production

This is a visual podcast with Vikas Reddy (co-founder of Occipital), interviewed by Satya Mallick and Phil Nelson of OpenCV. They cover Vikas's history founding various computer vision companies, and his latest venture, Lighttwist.