Lie Group and Human Visual Modeling

A Lie group is a group that is also a differentiable (smooth) manifold, with group multiplication and inversion that are smooth maps.

A Lie algebra is a vector space together with an operation called a Lie bracket, which is an alternating bilinear map that satisfies the Jacobi identity.

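The Jacobi identity is a property of a binary operation that describes how the order of evaluation affects the result of the operation. For a bracket [x, y] it says [x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0, and it stands in for the associativity that a Lie bracket does not have.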

Both the vector cross product and the Lie bracket satisfy the Jacobi identity.
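If you want to poke at this numerically, here is a minimal sketch (assuming NumPy is available; the helper names `jacobi_residual` and `commutator` are made up for illustration). It checks the Jacobi identity on random inputs for the cross product on R^3 and for the matrix commutator, and also shows that the cross product is not associative, which is the "order of evaluation" point.

```python
import numpy as np

def jacobi_residual(bracket, a, b, c):
    """Return [a,[b,c]] + [b,[c,a]] + [c,[a,b]], which should be zero."""
    return (bracket(a, bracket(b, c))
            + bracket(b, bracket(c, a))
            + bracket(c, bracket(a, b)))

def commutator(x, y):
    """Matrix commutator xy - yx, the Lie bracket on square matrices."""
    return x @ y - y @ x

rng = np.random.default_rng(0)

# Cross product on R^3 with three random vectors.
u, v, w = rng.normal(size=(3, 3))
cross_residual = jacobi_residual(np.cross, u, v, w)

# Commutator on three random 4x4 matrices.
A, B, C = rng.normal(size=(3, 4, 4))
comm_residual = jacobi_residual(commutator, A, B, C)

# The cross product is not associative: moving the parentheses changes the result.
assoc_gap = np.cross(u, np.cross(v, w)) - np.cross(np.cross(u, v), w)

print("cross product Jacobi residual:", np.abs(cross_residual).max())
print("commutator Jacobi residual:   ", np.abs(comm_residual).max())
print("cross product associativity gap:", np.abs(assoc_gap).max())
```

Both residuals should come out at floating-point noise level, while the associativity gap does not.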

Any Lie group gives rise to a Lie algebra, which is its tangent space at the identity.
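A concrete case helps here. The sketch below (assuming NumPy; `hat` and `expm_series` are illustrative helpers, not library functions) takes the rotation group SO(3), whose Lie algebra so(3) is the set of 3x3 skew-symmetric matrices. Exponentiating a tangent vector at the identity lands back on the group, and differentiating the resulting curve at the identity recovers the tangent vector.

```python
import numpy as np

def hat(omega):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    x, y, z = omega
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def expm_series(M, terms=30):
    """Matrix exponential by truncated power series (fine for small matrices of modest norm)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

# Tangent vector at the identity: infinitesimal rotation about the z axis.
generator = hat([0.0, 0.0, 1.0])

# Following it for a quarter turn gives an honest rotation matrix.
R = expm_series((np.pi / 2) * generator)

# Going the other way: the derivative of the curve t -> exp(t * generator)
# at t = 0 is approximately the generator itself.
t = 1e-6
tangent_estimate = (expm_series(t * generator) - np.eye(3)) / t

print(np.round(R, 6))
print(np.round(tangent_estimate, 4))
```

The power series is used only to keep the example dependency-light; `scipy.linalg.expm` would be the usual choice in real code.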

Any finite-dimensional Lie algebra over the real or complex numbers has a corresponding connected Lie group, unique up to coverings (Lie's third theorem).

Ok, that was all very clear, right?

So why do we care about this stuff anyway?

In physics, Lie groups appear as symmetry groups of physical systems, and their Lie algebras (tangent vectors near the identity) may be thought of as infinitesimal symmetry motions. Thus Lie algebras and their representations are used extensively in physics, notably in quantum mechanics and particle physics.

The formalism of Lie group math also seems to be related to notions of invariant transformations, which are things we very much care about in human perception, neural net representations, etc. And as Max Welling keeps pointing out, it is very interesting that the same math formalism associated with Lie groups keeps popping up in different kinds of phenomena that seem rather disparate at first (quantum mechanics, human perception, neural nets for object detection, etc).

So obviously we all need to learn more about this rather heady math thing and try to develop more intuitive representations for what it really means.

Onwards then, into the swamp, I mean into the literature.

'Learning the Irreducible representations of Commutative Lie Groups' can be found here.

'Learning Visual Flows: A Lie Algebraic Approach' by Lin, Grimson, Fisher, CSAIL, 1/1/09 can be found here.

They are talking about time-based flow, but there is no reason why you can't repurpose their approach for flow fields within a single image.

'B-Spline CNNs on Lie Groups' can be found here.

'A rotation-equivariant convolutional neural network model of primary visual cortex' can be found here.

Nice PDF on Gabor filters here.

Nice Python OpenCV-based Gabor filter tutorial.

'A Sub-Riemannian Model of the Visual Cortex with Frequency and Phase' you can find here. And this is where it starts to get very, very interesting indeed.

'Once the light reflects from a visual stimulus and arrives at the retina, it evokes some spikes, which are transmitted along the neural pathways to the simple cells in V1. Each simple cell gives a response called a receptive profile to those spikes. In other words, a receptive profile is the impulse response of a simple cell. The simple cells extract the information of local visual features by using their receptive profiles, and it is possible to represent the extracted features mathematically in a higher-dimensional space than in the given two-dimensional image plane. We call this space the lifted space or the lifted geometry. We will use an extended Gabor function as the receptive profile of the simple cells. We will see that this choice naturally induces the corresponding Lie algebra of the sub-Riemannian structure, which is the corresponding lifted geometry to our model. The Lie algebra and its integral curves model neural connectivity between the simple cells. Moreover, since some pairs of the algebra are not commutative, it is possible to formulate an uncertainty principle, and this principle is satisfied by the extended Gabor function. That is, the extended Gabor function minimizes uncertainties arising from simultaneous detection of frequency-phase and simultaneous detection of position-orientation '
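The non-commutativity in that quote is easiest to see in the simpler position-orientation group SE(2), so here is a minimal sketch for that case only (assuming NumPy; the 3x3 matrix representation and the names `X`, `Y`, `T` are my choice for illustration, and this is not the paper's extended frequency-phase geometry). Translation along x and rotation do not commute, and their bracket produces the remaining direction, translation along y.

```python
import numpy as np

# Generators of se(2), written as 3x3 matrices acting on homogeneous
# coordinates (x, y, 1).
X = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])   # infinitesimal translation along x
Y = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])   # infinitesimal translation along y
T = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])   # infinitesimal rotation

def bracket(a, b):
    """Lie bracket of two matrices: the commutator ab - ba."""
    return a @ b - b @ a

print("[T, X] =\n", bracket(T, X))   # the y-translation generator
print("[T, Y] =\n", bracket(T, Y))   # minus the x-translation generator
print("[X, Y] =\n", bracket(X, Y))   # zero: translations commute
```

So with only two "allowed" directions (move along x, rotate) you can still reach the third one through brackets, which is the flavor of structure a sub-Riemannian model builds on.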

'receptive field models consisting of cascades of linear filters and static nonlinearities may be adequate to account for responses to simple stimuli such as gratings and random checkerboards, but their predictions of responses to complicated stimuli (such as natural scenes) are correct only approximately. A variety of mechanisms such as response normalization, gain controls, cross-orientation suppression, and intracortical modulation can intervene to change radically the shape of the profile. Then any static and linear model for the receptive profiles has to be considered just as a very first approximation of the complex behavior of a real dynamic receptive profile, which is not perfectly described by any of the static wavelet frames.'

1: Ok, that is very different from a static fixed CNN architecture. Dynamically steerable filters! Modulated by feedback connections from higher-order areas, one would expect. Cool, file the patent now. Oh, too late, we just mentioned it in the public domain.

'we are interested here in two-dimensional visual perception based on orientation, frequency, and phase-sensitive simple cells. Differently from the case with orientation-scale sensitive simple cells, frequency-phase sensitive simple cells cannot be modeled in a straightforward way by Gaussian derivative functions. A different order Gaussian derivative must be used for the extraction of each frequency component of a given image. This requires the use of different functions, each corresponding to a certain frequency and thus to a certain-order derivative. In other words, the frequency is not a parameter as in the case of scale, but each frequency corresponds to a different function.'

'a Gabor function seems to be a good candidate for the detection of different orientation, frequency, and phase values in a two-dimensional image, since orientation, frequency, and phase are parameters of the Gabor function. In other words, instead of using different functions, we can use a single Gabor function corresponding to a set of parameter values to detect different feature values. In this way, we obtain a sub-Riemannian model geometry as the natural geometry induced directly by the Gabor function (i.e., by the receptive profile itself).'
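To make the "one function, many parameter values" point concrete, here is a minimal sketch (assuming NumPy; `gabor_kernel` and all parameter values are illustrative choices, and this plain Gabor is simpler than the extended Gabor function used in the paper). A single kernel function is evaluated over a set of orientations and applied to one image patch, which gives the fiber of the lifted representation over that one position.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, frequency, phase):
    """Gaussian window times an oriented cosine carrier."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Coordinate along the direction theta, where the carrier oscillates.
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    envelope = np.exp(-(xs**2 + ys**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * frequency * x_rot + phase)

size, sigma, frequency = 25, 4.0, 0.125
thetas = np.linspace(0.0, np.pi, 8, endpoint=False)

# Test patch: a grating whose orientation matches one of the sampled angles.
theta0 = thetas[3]
half = size // 2
ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
patch = np.cos(2.0 * np.pi * frequency * (xs * np.cos(theta0) + ys * np.sin(theta0)))

# One response per orientation: the lifted representation at this position.
responses = np.array([
    np.sum(patch * gabor_kernel(size, sigma, t, frequency, 0.0))
    for t in thetas
])
best_theta = thetas[int(np.argmax(responses))]

print("responses per orientation:", np.round(responses, 2))
print("preferred orientation (radians):", best_theta, "stimulus:", theta0)
```

Doing the same at every pixel, and also sweeping frequency and phase, turns the 2D image into a function on a higher-dimensional parameter space.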

'studies on group convolutional neural networks (G-CNN) are to be mentioned. A particularly relevant one to SE(2) sub-Riemannian geometry is explained by Bekkers [8]. Those neural networks use several neural layers for extraction and representation of the features necessary to perform proper high-level visual tasks such as object recognition. Feature extraction and the representation of the extracted features take advantage of the lifting of the image to a proper sub-Riemannian geometry (e.g., SE(2) geometry). Differently from the aforementioned approaches using a model function as a receptive profile, G-CNN learns the receptive profile through a feedback mechanism updating an initial arbitrary kernel by comparing the outputs of the whole network with the objects in the input image.'

'Our model is motivated biologically, and it relies on the psychophysically and neurophysiologically relevant cortical architecture. It is a phenomenological model, and it provides a geometrical explanation for the cortical architecture, which is compatible with the architecture.'

1: So they are doing dynamic diffusion in the 'lifted' Gabor space, and then reconstructing a generative image from that modified Gabor space representation. And they do it in a way where they keep all of the phase info and don't throw half of the useful information away like other magnitude-only methods would do.

2: What does phase information in the channels tell you about image structure?

3: So on some reflection, one of the key takeaways is the notion of modeling the lateral connections in the visual cortex, and what that lateral connectivity leads to.

'At the level of Gestalt organization, the neurogeometrical architecture in SE(2) [25] implements the psychophysical law of good continuation. The architecture in the affine group [92] implements good continuation and ladder (parallel chain of contours). The architecture in the Galilean group [2, 27] implements common fate. Finally, the architecture we consider here in a Gabor-based sub-Riemannian geometry implements similarity between textures/patterns and contains all the previous models employing the neurogeometrical approach.'

'adjacent simple cells in cat V1 that have a common preference for orientation and spatial frequency differ in spatial phase from each other by approximately π/2. This result is in coherence with that the receptive fields are conjugate pairs, that is, one even symmetric pair and one odd symmetric pair located around the same axis. Those experimental results support our choice of Gabor functions in such a way that adjacent simple cells can be interpreted as paired sine and cosine filters or Gabor functions.'
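The paired sine and cosine filters are easy to play with in one dimension. This is a minimal sketch (assuming NumPy; `quadrature_response` and the parameter values are illustrative, not taken from the paper) of an even/odd Gabor pair that differ in phase by π/2. The combined amplitude does not care about the phase of the stimulus, while the angle between the two responses recovers it, which is one way to see what a magnitude-only method would be throwing away.

```python
import numpy as np

x = np.arange(-32, 33, dtype=float)
sigma, frequency = 6.0, 0.1
envelope = np.exp(-x**2 / (2.0 * sigma**2))

# Quadrature pair: same envelope and frequency, carriers pi/2 apart.
even_filter = envelope * np.cos(2.0 * np.pi * frequency * x)
odd_filter = envelope * np.sin(2.0 * np.pi * frequency * x)

def quadrature_response(stimulus_phase):
    """Responses of the pair to a grating cos(2*pi*f*x + stimulus_phase)."""
    stimulus = np.cos(2.0 * np.pi * frequency * x + stimulus_phase)
    even = float(np.dot(even_filter, stimulus))
    odd = float(np.dot(odd_filter, stimulus))
    amplitude = float(np.hypot(even, odd))      # phase-invariant part
    phase = float(np.arctan2(-odd, even))       # estimate of the stimulus phase
    return even, odd, amplitude, phase

for phi in (0.0, np.pi / 4, np.pi / 2):
    even, odd, amplitude, phase = quadrature_response(phi)
    print(f"stimulus phase {phi:.3f}: even={even:+.3f} odd={odd:+.3f} "
          f"amplitude={amplitude:.3f} recovered phase={phase:.3f}")
```

Each filter on its own gives a response that swings with the stimulus phase; only the pair together separates "how much" from "where in the cycle".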

