The Biophysics of Visual Edge Detection

Here's a question for you. 

Is the strong visual perception of edges in images a function of the correlated response of multiple spatial frequency channels all being activated simultaneously?

With that question for the ages in mind, let's dive into a current review of the basic principles of visual perception: the 2020 peer-reviewed article 'The Biophysics of Visual Edge Detection: A Review of Basic Principles', which you can find here.

And it's nice to find a great review article on this topic from within the last year, as opposed to something from Campbell and Robson, or Hubel and Wiesel.

And what do we learn?

1:  A Gabor filter is the product of a Gaussian envelope and a complex plane wave. And what we record in single-cell recordings from the visual cortex is well modeled by the convolution of a Gabor filter with the stimulus intensity function.
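To make that concrete, here is a minimal numpy sketch of a Gabor filter as a Gaussian envelope multiplied by a complex plane wave. The size, sigma, and wavelength values are illustrative choices, not parameters from the paper:

```python
import numpy as np

def gabor_kernel(size=31, sigma=4.0, wavelength=8.0, theta=0.0):
    """Complex Gabor kernel: Gaussian envelope times a complex plane wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier wave propagates along `theta`.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(1j * 2 * np.pi * xr / wavelength)
    return envelope * carrier

g = gabor_kernel()
print(g.shape)             # (31, 31)
print(np.iscomplexobj(g))  # True
```

Convolving an image with a bank of such kernels is the standard way to simulate V1 simple-cell responses.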

2:  We are told that nature utilizes a function that minimizes the joint uncertainty of signal extraction; the Gabor function achieves the lower bound of the space-frequency uncertainty relation.
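We can check this numerically: among window functions, the Gaussian (approximately) achieves the lower bound of 0.5 on the spread product in space and angular frequency, while something like a boxcar window lands well above it. This is a discrete-sampling sketch, so the numbers are approximate:

```python
import numpy as np

def uncertainty_product(w, dx=0.01):
    """sigma_x * sigma_k for a sampled window w (k is angular frequency)."""
    x = (np.arange(w.size) - w.size // 2) * dx
    p = np.abs(w)**2
    p /= p.sum()
    var_x = (p * x**2).sum() - (p * x).sum()**2
    W = np.fft.fftshift(np.fft.fft(w))
    k = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(w.size, d=dx))
    q = np.abs(W)**2
    q /= q.sum()
    var_k = (q * k**2).sum() - (q * k).sum()**2
    return np.sqrt(var_x * var_k)

x = (np.arange(2001) - 1000) * 0.01
gauss = np.exp(-x**2 / (2 * 0.5**2))   # Gaussian window, sigma = 0.5
box = (np.abs(x) < 1.0).astype(float)  # boxcar window of comparable extent
up_g = uncertainty_product(gauss)
up_b = uncertainty_product(box)
print(up_g)         # close to 0.5, the theoretical lower bound
print(up_b > up_g)  # True: the boxcar spreads far more in frequency
```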

3:  The transition from the Hubel and Wiesel way of viewing the world to the Campbell and Robson way of viewing the world was a transition from feature detectors of bars and edges (with various orientations and widths) to Fourier analyzers of spatial-frequency-tuned channels (with a wide bandwidth of 1.6 octaves).

4: Both of these early models fall short in explaining the perception of a visual scene, and are called primal models.

A good model needs to combine all of the 'features of a scene': contrast, texture, orientation, color, reflectance. Ideally, we also need to deal with time.

5: The bell-shaped curve of contrast sensitivity (Campbell-Robson), which is used by every compression and halftoning algorithm for images viewed by humans.
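As an illustration, here is one common parametric fit of that curve, the Mannos-Sakrison contrast sensitivity model. This particular formula comes from the image-quality literature generally, not from the reviewed paper:

```python
import numpy as np

def csf(f):
    """Mannos-Sakrison contrast sensitivity fit; f in cycles per degree."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f)**1.1)

f = np.linspace(0.1, 60.0, 600)
s = csf(f)
peak = f[np.argmax(s)]
print(peak)  # sensitivity peaks near ~8 cycles/degree, falling off on both sides
```

A compression codec can weight its quantization error by a curve like this, spending bits where human sensitivity is highest.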

6: The visual system consists of different channels, each with a specific bandwidth. And the selective activation of an ensemble of neurons associated with a channel leads to a perceived visual construct.

7: The transition from the Fourier-domain way of thinking to the wavelet-transform way of thinking.

8: In nature, physical variables come in conjugate or Fourier pairs (acoustics, quantum mechanics, vision).

9: Some spatial frequencies have more cells devoted to them than others.  This maps to the contrast sensitivity function discussed above.

10: We define an intensity change by taking the second derivative of the convolution of the stimulus intensity with the function of a group of neurons. Specifically, we are detecting the zero-crossings, which correspond to the maxima of intensity change (a boundary, edge, or contour).
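This can be sketched in a few lines with scipy's Laplacian-of-Gaussian filter: smooth-and-differentiate, then mark the zero-crossings. The synthetic square image and the noise threshold here are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# Synthetic image: a bright square on a dark background.
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0

# Laplacian of Gaussian: second derivative of the Gaussian-smoothed image.
log = gaussian_laplace(img, sigma=2.0)

# Zero-crossings of the LoG mark the points of maximal intensity change,
# i.e. the edges.  Require a minimum swing to ignore numerical noise.
zc = np.zeros_like(img, dtype=bool)
zc[:-1, :] |= (log[:-1, :] * log[1:, :] < 0) & (np.abs(log[:-1, :] - log[1:, :]) > 1e-3)
zc[:, :-1] |= (log[:, :-1] * log[:, 1:] < 0) & (np.abs(log[:, :-1] - log[:, 1:]) > 1e-3)
print(zc.any())  # True: the crossings trace the square's border
```

The result is exactly the binary edge map the paper's summary describes: background and foreground separated by a contour of zero-crossings.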

11:  How are we able to perceive visual objects even when there is variance in the image of them (noise distortion, scale and other spatial variance)?  Look to the mathematics of symmetry groups, specifically Lie groups.

Gabor wavelets are a subgroup of the canonical coherent states related to the Weyl group.

Lie groups are ideal for preserving the geometry of a perceived object (preserving its shape).

A Lie group is a continuous group that is a differentiable manifold. This means one can perform calculus on the group, and Lie groups provide the natural environment for continuous symmetry operations.

12:  A visual percept is constructed by a pattern of excitation of an ensemble of neurons. This is obtained by convolving the stimulus function of an object with the receptive field of the corresponding neurons. The receptive field is a plane wave due to neuronal excitation. The convolution of a plane wave with the Gaussian distribution is the Gabor transform. The Gabor transforms form an affine group with affine displacements. They form a Weyl group with rotations, translations, and dilations.

The elementary Gabor signals have an odd (anti-symmetric) and an even (symmetric) component.
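A sketch of both ideas: a bank of Gabor wavelets generated from one mother wavelet by rotation and dilation, and a check that each wavelet splits into an even (symmetric, cosine) and odd (anti-symmetric, sine) component. Parameter values are illustrative:

```python
import numpy as np

def gabor(size, sigma, wavelength, theta):
    """Complex Gabor: Gaussian envelope times a plane wave along theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.exp(1j * 2 * np.pi * xr / wavelength)

# A small bank: rotations and dilations of one mother wavelet.
bank = [gabor(31, sigma=4 * s, wavelength=8 * s, theta=t)
        for s in (1.0, 2.0)
        for t in np.linspace(0, np.pi, 4, endpoint=False)]

g = bank[0]
even, odd = g.real, g.imag  # symmetric (cosine) and anti-symmetric (sine) parts
print(np.allclose(even, even[::-1, ::-1]))  # True: even part is symmetric
print(np.allclose(odd, -odd[::-1, ::-1]))   # True: odd part is anti-symmetric
```

Convolving a stimulus with every member of such a bank is the "ensemble of neurons" picture in miniature: one filter per receptive-field scale and orientation.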

13:  The synapto-dendritic web is a cortical anatomic substrate where these wavelets are formed and interact.  Different neuronal ensembles overlap in this assembly of interfering complex plane waves.

Cool.  So did we answer our question?

Well, what did we learn?

Let us read the summary of findings from the paper below.

Visual analysis involves feature extraction, with edge detection being a fundamental process. Edge detection involves analyzing discontinuities in an image and a change in contrast to the background. From edges or boundaries, other features, such as the perimeter, the area, and the contour or shape of an object, can be defined. Change means mathematical differentiation and involves first and second-order derivatives. We demonstrated how the Laplacian of the Gaussian (LOG), where the second derivative of the convolution of the Gaussian kernel and stimulus intensity was derived and is set to zero, reflects the maximum change in intensity at the zero-crossings. This is known as LOG filtering and the result is a Mexican hat function or operator. The image obtained is a binary image where the edges are defined as the zero-crossings between the background and foreground. The Gabor transform is a simulacrum of the cortical visual system. We defined it as the convolution of a Gaussian kernel and a complex planar wave in a spatial domain. Gabor filters can be derived from a mother wavelet by dilation, scaling, or rotation. A large complement of Gabor filters of various scalar values and orientation are convolved with the stimulus function to obtain a preliminary image.

Questions for the reader

1:  How much of the summary above is influenced by the perceptual prior we all had coming into this article?

2:  What, if anything, are we missing because of our preconceived notions about how this all works?

3:  How does this information explain how GANs work?  What is it telling us about the nature of the manifold representation needed to be learned by a generative deep learning system?

4:  How can we use this information to throw out unnecessary representational dimensions in our neural net universal function approximator system?  The one we use to learn the statistical distributions of collections of images from the modeling data we train it on.

