Cortical Processing Explained: A Deep Dive into the Visual System
The visual system stands out because a significant portion of visual processing happens outside the brain, specifically within the retina of the eye. The light-sensitive receptors in the eye convert images projected onto the retina into spatially distributed neural activity in the first neurons of the visual pathway. Within the retina, receptors form synapses with bipolar and horizontal cells, establishing the foundation for brightness and color contrasts.
Bipolar cells, the secondary visual afferents, then synapse with retinal ganglion cells and amacrine cells. These interactions enhance contrast effects, which are crucial for form vision and establishing the basis for movement detection. The information gathered is then transmitted from the eye via the axons of the retinal ganglion cells, which are the tertiary visual afferents, to the midbrain and diencephalon.
As with other sensory information, visual data must reach the cerebral cortex to be perceived. With one exception, this information reaches the cortex via the thalamus.
The axons of the retinal ganglion cells (3° visual afferents) form the optic nerve fiber layer of the retina on their course to the optic disc. At the optic disc, the 3° visual afferents exit the eye and form the optic nerve. The fibers of the optic nerve that originate from ganglion cells in the nasal half of the retina (i.e., the nasal hemiretina) decussate in the optic chiasm to the opposite optic tract.
Consequently, each optic tract contains retinal ganglion cell axons that originate in the nasal half of the contralateral retina and the temporal half of the ipsilateral retina. Recall that the ipsilateral temporal hemiretina and the contralateral nasal hemiretina have projected on them the images of corresponding halves of their visual fields.
For example, the temporal (left) hemiretina of the left eye and the nasal (left) hemiretina of the right eye both have projected on them the right halves of their respective visual fields.
Retinal ganglion cell axons also terminate in three nuclei that are not considered part of the visual pathway: the suprachiasmatic nucleus of the hypothalamus, for control of diurnal rhythms and hormonal changes; the pretectal area of the midbrain, for pupillary light reflexes; and the superior colliculus, for orienting movements of the head and eyes.

The visual pathway from the eyes to the brain.
Lateral Geniculate Nucleus (LGN)
The vast majority of optic tract fibers terminate on neurons in the lateral geniculate nucleus (LGN) of the thalamus. Like the retina, the lateral geniculate nucleus is a laminated structure, in this case, with six principal layers of cells. Thin layers of the smallest cells (i.e., the koniocellular neurons) are interposed between these principal layers.
The optic tract fibers (3° visual afferents) from each eye synapse in different layers of the LGN. Consequently, each LGN neuron responds to stimulation of one eye only.
The functional properties of LGN neurons are similar to those of retinal ganglion cells. The LGN neurons are monocular (i.e., respond to stimulation of one eye only) and have concentric (center-surround) receptive fields.
Primary Visual Cortex (V1)
The LGN neurons (4° visual afferents) send their axons in the internal capsule to the occipital lobe where they terminate in the striate cortex. The primary visual cortical receiving area is in the occipital lobe. The primary visual cortex is characterized by a unique layered appearance in Nissl stained tissue.
Nearly the entire caudal half of the cerebral cortex is dedicated to processing visual information.

The cerebral lobes of the brain.
Because of this striped appearance, it is called the striate cortex. It includes the calcarine cortex, which straddles the calcarine fissure, and it extends around the occipital pole to include the lateral aspect of the caudal occipital lobe.
The LGN axons fan out as the optic radiations of the internal capsule and travel through the temporal, parietal and occipital lobes. The LGN axons in the sublenticular segment of the optic radiations pass below the lenticular nuclei, loop around the inferior horn of the lateral ventricle within the temporal lobe and swing posteriorly to form Meyer’s loop.
Once around the inferior horn, they travel up to the inferior bank of the striate cortex, where they terminate. The LGN axons in the retrolenticular segment of the internal capsule pass superiorly through the parietal lobe to end in the superior bank of the striate cortex.
Electrical stimulation of V1 elicits visual sensations. The color (kLGN), shape (pLGN) and movement (mLGN) information from the thalamus is sent to different neurons within V1 for further processing and then on to different areas of the extrastriate visual cortex.
V1 Blob Cells
Some V1 cells resemble kLGN neurons:
- monocular (i.e., respond to stimulation of one eye only).
- color sensitive.
- characterized by small, concentric receptive fields.
- found in clusters (i.e., blob cells).
- a special target of the kLGN axon terminals.
The P-stream information processed by the V1 blob cells is used in color perception, color discrimination and the learning and memory of the color of objects.
V1 Interblob Cells
V1 interblob cells:
- binocular (i.e., respond to stimulation of either eye).
- not color sensitive.
- characterized by elongated (rectangular-shaped) receptive fields that may or may not have a center-surround type organization.
- found around the clusters of color-sensitive V1 blob cells.
- exhibit ocular dominance (i.e., respond best to stimulation of a preferred eye).
- exhibit orientation specificity (i.e., respond best when the stimulus is oriented in a particular plane).
One subset of V1 interblob cells responds best when the stimulus is in a specific location of the receptive field (i.e., they also exhibit location specificity). The P-stream information processed by the V1 interblob cells that exhibit orientation and location specificity but are not motion sensitive is used in object perception, discrimination, learning and memory or in spatial orientation.
These interblob cells are the "shape/form" processing cells and the "location" processing cells of V1.
Movement Sensitive V1 Interblob Cells
A second subset of interblob cells responds best to moving stimuli (i.e., exhibits movement sensitivity) without a preference for the direction of movement.
Direction Specific V1 Interblob Cells
A third subset displays a preference for movement in a particular direction (i.e., these cells also exhibit direction specificity). The M-stream information processed by the motion-sensitive V1 interblob cells is used to detect object movement and the direction/velocity of movement, and to guide eye movements.
Extrastriate Visual Cortex
The extrastriate cortex includes all of the occipital lobe areas surrounding the primary visual cortex. The extrastriate cortex in non-human primates has been subdivided into several functional areas, including V2, V3, and V4.
The primary visual cortex, V1, sends input to extrastriate cortex and to visual association cortex. The information from the “color”, “shape/form”, "location" and “motion” detecting V1 neurons is sent to different areas of the extrastriate cortex.
The flow of visual information from the primary visual cortex to other cortical areas depends on the type of information being processed. Information used to locate objects and detect their motion is sent to more superior cortex (a.k.a. the dorsal stream).
Information used to identify objects, their color and form is sent to more inferior cortex (a.k.a. the ventral stream).
Visual Association Cortex
The visual association cortex extends anteriorly from the extrastriate cortex to encompass adjacent areas of the posterior parietal lobe and much of the posterior temporal lobe. In most cases, these areas receive visual input via the extrastriate cortex, which sends color, shape/form, location and motion information to different areas of the visual association cortex.
The topographic (spatial) relationships of retinal neurons are maintained throughout the visual system, which preserves the retinotopic map of the visual world.
That is, the retina is mapped onto the LGN and striate cortex in an organized (topographic) fashion. Consequently, neighboring parts of retina project to neighboring parts of LGN and neighboring parts of LGN project to neighboring parts of the striate cortex.
You should recall the following regarding the spatial representation of the retinal image within the visual pathway.
- The optic image on the retina is upside-down and left-right reversed.
- The monocular visual fields of the two eyes overlap partially to form the binocular visual field.
- The temporal hemiretina of one eye and the nasal hemiretina of the other eye have projected on them the images of corresponding halves of their visual fields.
- For example, the temporal (left) hemiretina of the left eye and the nasal (left) hemiretina of the right eye both have projected on them the right half of each eye's visual field.
- Beyond the optic chiasm, the corresponding visual hemifields of the two eyes are represented in the contralateral side of the visual pathway.
- For example, the left hemifield of both eyes is represented in the right optic tract, right lateral geniculate nucleus, right optic radiations and right striate cortex.
- The fibers of the optic radiation fan out into the temporal, parietal and occipital lobes on their course to the striate cortex.
- Those forming the sublenticular optic radiations carry information about the superior hemifield, whereas those forming the retrolenticular optic radiations carry information about the inferior hemifield.
- The optic radiation fibers traveling the most direct course back to the striate cortex carry information about the central visual field.
- There are many more receptor cells in the fovea and many more bipolar and ganglion cells in the macula than in the periphery of the retina.
Consequently, the central visual field is disproportionately represented in the visual system. That is, more visual receptors, more optic nerve fibers and more LGN and cortical neurons are involved in processing and carrying information about that portion of the retinal image representing the center of the visual field.
Dorsal and Ventral Streams
The neurons in the parietal association cortex and superior and middle temporal visual association cortex have binocular receptive fields and process P-channel information about object location and M-channel information about object movement. The dorsal stream processes information about the “where” of the visual stimulus.
Damage to the dorsal visual association cortex results in deficits in spatial orientation, motion detection and the guidance of visual tracking eye movements.
The neurons in the inferior temporal visual association cortex process P-channel information about object color and form. This ventral stream processes information about the “what” of the visual stimulus.
Damage to the inferior visual association cortex produces deficits in complex visual perception tasks, attention and learning/memory.

The dorsal and ventral streams of visual processing.
Visual Field Defects
Visual field defects are areas of loss of vision in the visual field. Visual field defects are detected by perimetry testing, during which the patient fixates his eyes on a target and his ability to detect a small object in specific positions in space is determined.
A visual field defect provides clues to the structure(s) affected. That is, the area(s) of visual field loss and eye(s) exhibiting the visual field loss offer clues about the site of the damage.
Ophthalmoscopic examination of the fundus detects an abnormality in the nasal hemiretina of the left eye of a diabetic patient. Notice that the fundus of the patient's left eye appears to the right, just as it appears on the right side of the physician viewing the fundus.
The patient is having his semiannual physical examination. As he is diabetic, the physician examines his retinas and performs a confrontation test of his visual fields. An abnormality is detected in his left fundus but the confrontational field test detects nothing. Perimetry testing is requested.
The results indicate that the right eye's visual field is normal and that there is a peripheral scotoma (i.e., a loss of vision that does not follow the boundaries of the visual field quadrants) in the left eye's temporal hemifield.
A Unified Theory of Cortical Function
A unified theory of cortical function is proposed for guiding both neuroscience and artificial intelligence research. The theory offers an empirically testable framework for understanding how the brain accomplishes three key functions:
- Inference: Perception is nonconvex optimization that combines sensory input with prior expectation.
- Exploration: Inference relies on neural response variability to explore different possible interpretations.
- Prediction: Inference includes making predictions over a hierarchy of timescales.
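The "inference as optimization" idea can be made concrete with a toy sketch. The quadratic energy, the weighting parameter `lam`, and the gradient-descent settings below are illustrative assumptions, not the theory's actual formulation: the percept y is chosen to balance fidelity to the sensory input x against a prior expectation p.

```python
# Toy sketch of inference as optimization (illustrative, not the
# theory's actual energy function): choose a percept y that balances
# fidelity to the sensory input x against a prior expectation p by
# minimizing E(y) = (y - x)^2 + lam * (y - p)^2 via gradient descent.

def infer(x, p, lam=0.5, steps=200, lr=0.1):
    y = 0.0
    for _ in range(steps):
        grad = 2 * (y - x) + 2 * lam * (y - p)  # dE/dy
        y -= lr * grad
    return y

# For this quadratic energy the minimum is a prior-weighted average:
# y* = (x + lam * p) / (1 + lam).
estimate = infer(x=1.0, p=0.0, lam=0.5)   # ≈ 0.667
```

With a stronger prior (larger `lam`), the estimate is pulled further from the raw measurement toward the expectation, which is the sense in which perception combines sensory input with prior expectation.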
Most models of sensory processing in the brain have a feedforward architecture in which each stage comprises simple linear filtering operations and nonlinearities. Models of this form have been used to explain a wide range of neurophysiological and psychophysical data, and many recent successes in artificial intelligence (with deep convolutional neural nets) are based on this architecture.
However, neocortex is not a feedforward architecture. This paper proposes a first step toward an alternative computational framework in which neural activity in each brain area depends on a combination of feedforward drive (bottom-up from the previous processing stage), feedback drive (top-down context from the next stage), and prior drive (expectation).
The relative contributions of feedforward drive, feedback drive, and prior drive are controlled by a handful of state parameters, which I hypothesize correspond to neuromodulators and oscillatory activity. In some states, neural responses are dominated by the feedforward drive and the theory is identical to a conventional feedforward model, thereby preserving all of the desirable features of those models.
In other states, the theory is a generative model that constructs a sensory representation from an abstract representation, like memory recall. In still other states, the theory combines prior expectation with sensory input, explores different possible perceptual interpretations of ambiguous sensory inputs, and predicts forward in time.
Sensory stimuli are inherently ambiguous, so there are multiple (often infinitely many) possible interpretations of a sensory stimulus. People usually report a single interpretation, based on priors and expectations that have been learned through development and/or instantiated through evolution.
For example, the image in Fig. 1A is unrecognizable if you have never seen it before. However, it is readily identifiable once you have been told that it is an image of a Dalmatian sniffing the ground near the base of a tree.
Our brains explore alternative possible interpretations of a sensory stimulus, in an attempt to find an interpretation that best explains the sensory stimulus. This process of exploration happens unconsciously but can be revealed by multistable sensory stimuli (e.g., Fig. 1B), for which one’s percept changes over time.
Other examples of bistable or multistable perceptual phenomena include binocular rivalry, motion-induced blindness, the Necker cube, and Rubin’s face/vase figure (6). Models of perceptual multistability posit that variability of neural activity contributes to the process of exploring different possible interpretations (e.g., refs. 7-9), and empirical results support the idea that perception is a form of probabilistic sampling from a statistical distribution of possible percepts (9, 10).
This noise-driven process of exploration is presumably always taking place. We experience a stable percept most of the time because there is a single interpretation that is best (a global minimum) with respect to the sensory input and the prior.
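The noise-driven exploration described here can be sketched with a toy two-well energy landscape standing in for two rival interpretations of an ambiguous stimulus. The energy function, step size, noise level, and clamping range are all illustrative assumptions:

```python
import random

# Toy sketch of multistability as noisy exploration: E(y) = (y^2 - 1)^2
# has two minima (y = -1 and y = +1), standing in for two perceptual
# interpretations. Noisy gradient descent settles into one well but
# occasionally hops to the other, as in binocular rivalry.
# All parameter values here are illustrative assumptions.

def explore(steps=20000, lr=0.05, noise=0.45, seed=0):
    rng = random.Random(seed)
    y = 0.0
    visits = {"-1": 0, "+1": 0}
    for _ in range(steps):
        grad = 4 * y * (y * y - 1)               # dE/dy
        y += -lr * grad + noise * rng.gauss(0, 1)
        y = max(-2.5, min(2.5, y))               # keep the walk bounded
        visits["+1" if y > 0 else "-1"] += 1
    return visits

counts = explore()   # both interpretations are typically visited
```

With the noise turned down, the state stays in whichever well it first reaches; with noise, the system samples both interpretations over time, in the spirit of the probabilistic-sampling accounts cited above.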
Prediction may be a third general principle of cortical function, alongside inference and exploration. Information processing in the brain is dynamic. Visual perception, for example, occurs in both space and time. Visual signals from the environment enter our eyes as a continuous stream of information, which the brain must process in an ongoing, dynamic way.
How we perceive each stimulus depends on preceding stimuli and impacts our processing of subsequent stimuli. Most computational models of vision are, however, static; they deal with stimuli that are isolated in time or at best with instantaneous changes in a stimulus (e.g., motion velocity).
Dynamic and predictive processing is needed to control behavior in sync with or in advance of changes in the environment. Without prediction, behavioral responses to environmental events will always be too late because of the lag or latency in sensory and motor processing.
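The latency argument can be made concrete with a toy tracking sketch. The 100 ms delay, the constant-velocity motion model, and the names below are illustrative assumptions:

```python
# Toy sketch of why prediction is needed: sensory measurements arrive
# with a fixed delay, so reacting to the raw measurement always lags
# the world, whereas extrapolating by the known latency can cancel the
# lag (under an assumed constant-velocity model).

LATENCY = 0.1     # seconds of sensory/motor delay (illustrative)
VELOCITY = 2.0    # object speed in units per second (illustrative)

def true_position(t):
    return VELOCITY * t

def delayed_measurement(t):
    # What the observer "sees" at time t is the world at time t - LATENCY.
    return true_position(t - LATENCY)

def predicted_position(t):
    # Extrapolate the delayed measurement forward by the known latency.
    return delayed_measurement(t) + VELOCITY * LATENCY

t = 1.0
lag_error = true_position(t) - delayed_measurement(t)   # ≈ 0.2 units behind
pred_error = true_position(t) - predicted_position(t)   # ≈ 0.0 under this model
```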
Prediction is a key component of theories of motor control and in explanations of how an organism discounts sensory input caused by its own behavior (e.g., refs. 13-15). Prediction has also been hypothesized to be essential in sensory and perceptual processing (16-18).
The neocortex accomplishes these functions (inference, exploration, prediction) with a modular design. Anatomical evidence suggests the existence of canonical microcircuits that are replicated across cortical areas (24, 25). It has consequently been hypothesized that the brain relies on a set of canonical neural computations, repeating operations of the same form across brain regions and modalities, hierarchically (e.g., refs. 26 and 27).
As noted earlier, most models of sensory processing in the brain, and many artificial neural nets (deep convolutional neural nets), have a feedforward architecture in which each stage comprises a bank of linear filters followed by an output nonlinearity. These hierarchical, feedforward models have served us well, explaining a wide range of neurophysiological and psychophysical data and underpinning many recent successes in artificial intelligence. However, neocortex is not a feedforward architecture.
Perceptual phenomena also suggest a role for feedback in cortical processing. For example, memory contributes to what we perceive. Take another look at the Dalmatian image (Fig. 1A); then close your eyes and try to visualize the image. This form of memory recall (called visual imagery or mental imagery) generates patterns of activity in visual cortex that are similar to sensory stimulation (e.g., ref. 29).
This paper represents an attempt toward developing a unified theory of cortical function, an empirically testable computational framework for guiding both neuroscience research and the design of machine-learning algorithms with artificial neural networks. It is a conceptual theory that characterizes computations and algorithms, not the underlying circuit, cellular, molecular, and biophysical mechanisms.
According to the theory, neural activity in each brain area depends on feedforward drive (bottom-up from a previous stage in the processing hierarchy), feedback drive (top-down context from a subsequent processing stage), and prior drive (expectation). The relative contributions of feedforward drive, feedback drive, and prior drive are controlled by a handful of state parameters.
The theory makes explicit how information is processed continuously through time to perform inference, exploration, and prediction. In a typical feedforward model of visual processing, the underlying selectivity of each neuron is hypothesized to depend on a weighted sum of its inputs, followed by an output nonlinearity.
The weights (which can be positive or negative) differ across neurons conferring preferences for different stimulus features. For neurons in primary visual cortex (V1), for example, the choice of weights determines the neuron’s selectivity for orientation, spatial frequency, binocular disparity (by including inputs from both eyes), etc.
Neurons that have the same weights, but shifted to different spatial locations, are collectively called a “channel” (also called a “feature map” in the neural net literature). The responses of all of the neurons in a channel are computed as a convolution over space (i.e., weighted sums at each spatial position) with spatial arrays of inputs from channels in the previous stage in the processing hierarchy, followed by the output nonlinearity.
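That channel computation can be sketched in a few lines of plain Python. The 1-D input, the particular weights, and halfwave rectification as the output nonlinearity are illustrative assumptions:

```python
# Sketch of one "channel": the same weights are applied at every
# spatial position (a convolution), and each weighted sum is passed
# through an output nonlinearity (halfwave rectification here).
# The 1-D input and the filter weights are illustrative assumptions.

def channel_response(inputs, weights):
    k = len(weights)
    responses = []
    for i in range(len(inputs) - k + 1):        # valid positions only
        s = sum(w * x for w, x in zip(weights, inputs[i:i + k]))
        responses.append(max(0.0, s))           # halfwave rectification
    return responses

# A derivative-like weight profile responds where the input increases.
signal = [0, 0, 0, 1, 1, 1]
print(channel_response(signal, [-1, 0, 1]))     # → [0.0, 1.0, 1.0, 0.0]
```

Every output value is computed with the same weights, just shifted in position, which is what makes this set of responses a single channel (feature map).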
| Drive Type | Description | Source |
|---|---|---|
| Feedforward Drive | Bottom-up input from the previous processing stage. | Previous layer in the hierarchy |
| Feedback Drive | Top-down context from the next processing stage. | Subsequent layer in the hierarchy |
| Prior Drive | Expectation based on memory or prediction. | Memory or predictive models |
Neurons in each successive stage of visual processing have been proposed to perform the same computations. According to this idea, each layer 2 neuron computes a weighted sum of the responses of a subpopulation of layer 1 neurons, and then the response of each layer 2 neuron is a nonlinear function of the weighted sum.
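Stacking that layer-to-layer rule gives a minimal feedforward hierarchy. The specific weights and the two-layer depth below are illustrative assumptions:

```python
# Minimal feedforward hierarchy sketch: each neuron in a layer computes
# a weighted sum of the responses in the layer below, followed by a
# nonlinearity (rectification). The weights are illustrative, not
# fitted to any neural data.

def layer(responses, weight_rows):
    return [max(0.0, sum(w * r for w, r in zip(row, responses)))
            for row in weight_rows]

stimulus = [1.0, 0.5, 0.0]
w1 = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]   # layer-1 weights (assumed)
w2 = [[1.0, 1.0]]                            # layer-2 weights (assumed)

layer1 = layer(stimulus, w1)   # → [0.5, 0.5]
layer2 = layer(layer1, w2)     # → [1.0]
```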
Here, I take a different approach from the feedforward processing model, and instead propose a recurrent network. Similar to the feedforward network, there is again a hierarchy of processing stages, each comprising a number of channels. Also similar to the feedforward network, all neurons in a channel perform the same computation, with shifted copies of the same weights, and an output nonlinearity.
However, in addition, the network includes a feedback connection for every feedforward connection. Each neuron also has another input that I call a prior, which can be either prespecified or computed recursively. The response of each neuron is updated over time by summing contributions from the three inputs: feedforward drive, feedback drive, and prior drive.
Each neuron also provides two outputs: feedforward drive to the next layer, and feedback drive to the previous layer. Each neuron performs this computation locally, based on its inputs at each instant in time. However, the responses of the full population of neurons (across all channels and all layers) converge to minimize a global optimization criterion, which I call an energy function.
According to the theory's update equation, neural responses change over time through a combination of feedforward drive f, feedback drive b, and prior drive p. The first term, f, is the same feedforward drive as above, and the third term, p, is the same prior drive as above. The middle term, b, the feedback drive, is new.
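A minimal numerical sketch of that update, for a single neuron with scalar drives: here each drive pulls the response toward a target value, so the summed update descends a quadratic energy. The targets, state parameters, and step size are illustrative assumptions, not the theory's actual equations.

```python
# Sketch of the response update: at each time step the response y is
# nudged by the sum of feedforward drive f, feedback drive b, and
# prior drive p. Each drive is modeled as a pull toward a target, so
# the dynamics descend a quadratic energy. All values are illustrative.

F_TARGET, B_TARGET, P_TARGET = 1.0, 0.4, 0.0   # assumed drive targets
ALPHA, BETA, LAM = 1.0, 0.5, 0.25              # assumed state parameters
DT = 0.1                                        # integration step

def step(y):
    f = ALPHA * (F_TARGET - y)    # feedforward drive (bottom-up)
    b = BETA * (B_TARGET - y)     # feedback drive (top-down)
    p = LAM * (P_TARGET - y)      # prior drive (expectation)
    return y + DT * (f + b + p)

y = 0.0
for _ in range(300):
    y = step(y)
# y converges to the drive-weighted combination of the targets:
# (ALPHA*F_TARGET + BETA*B_TARGET + LAM*P_TARGET) / (ALPHA + BETA + LAM)
```

Shrinking BETA and LAM toward zero leaves the response dominated by the feedforward drive, mirroring the claim that in some states the theory reduces to a conventional feedforward model.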