The Ventriloquism Effect: How Vision Captures Sound
The ventriloquism effect is a perceptual phenomenon where a listener misattributes the location of a sound, believing it is coming from a source other than its actual origin. This effect illustrates how visual cues can influence auditory perception, leading people to perceive sounds as coming from a visual stimulus rather than the actual source. It showcases the interplay between different sensory modalities and highlights the brain's tendency to integrate information from various senses to create a coherent perception of reality.
Ventriloquism is the ancient art of making one's voice appear to come from elsewhere, an art exploited by the Greek and Roman oracles, and possibly earlier. We regularly experience the effect when watching television and movies, where the voices seem to emanate from the actors' lips rather than from the actual sound source. The ventriloquism effect describes an illusory phenomenon where the perceived location of an auditory stimulus is pulled toward the location of a visual stimulus.
The classic example of this illusion is a performer on a stage with a puppet sitting on their knee. The performer talks while moving their lips as little as possible, while making much more visible movements with the puppet’s mouth, so that an inanimate puppet is perceived to speak. Ventriloquists use the expression and suppression of their own and the puppet's mouth movements, as well as the direction of their respective eye gaze, to maximize the illusion.
Originally, ventriloquism was explained by performers projecting sound to their puppets by special techniques, but more recently it is assumed that ventriloquism results from vision "capturing" sound. In psychology, visual capture is the dominance of vision over other sense modalities in creating a percept. In this process, vision influences the other sensory systems, resulting in a perceived environment that is not congruent with the actual stimuli. Through this phenomenon, the visual system can override the information that another sensory system is conveying and impose a coherent explanation on whatever the environment provides.
When two sensory stimuli are presented simultaneously, vision is capable of dominating and capturing the other. This occurs because visual cues can distract from other sensations, making the stimulus seem to originate from the visual cue. When multiple stimuli reach the brain at once, vision therefore sits at the top of a hierarchy: the other sensory cues are perceived as though they align with the visual experience, regardless of where their actual source may be.
Experimental work on the spatial localization of audio-visual stimuli refines this picture. When visual localization is good, vision does indeed dominate and capture sound. However, for severely blurred visual stimuli (which are poorly localized), the reverse holds: sound captures vision. For less blurred stimuli, neither sense dominates and perception follows the mean position. Precision of bimodal localization is usually better than that of either the visual or the auditory unimodal presentation.
Our five main senses (sight, hearing, touch, smell and taste) take cues from our environment via specialized cells, sending a constant stream of signals to the brain. The brain usually does a good job of making sense of this cacophony, sometimes too good. When the brain comes across a “glitch” in the information it’s getting from its surroundings, it can work a bit too hard to make sense of it. Ventriloquism, when we perceive sound coming from an object that is in fact silent, is one example.
The Science Behind the Illusion
“Imagine you hear a loud sound, and at exactly the same time, there is an abrupt appearance of something. Then, automatically, because of the coincidence in time, you would tend to associate these two events as originating from the same cause,” says Salvador Soto-Faraco, ICREA Research Professor at the Universitat Pompeu Fabra in Spain, who researches perception and ventriloquism. In ordinary life, the assumption that sounds and movements go together is a very reasonable one. A loud bang and a flash of light that happen at the same time are usually the consequence of a single event, like a fizzle and spark from a plug or an ambulance’s siren and flashing lights.
“The ventriloquist illusion can be made more or less powerful, depending on the timing between flashing light and sound,” says Soto-Faraco. But in the case of ventriloquism, the stumbling block that results in the illusion comes from the different ways that the senses transmit information. Our sense of hearing, for example, reports information about the location of a sound in a very different way than our sense of sight. For people with good vision, locating an object in space is extremely easy, but trying to pin down exactly where a sound originates from hearing alone (i.e., with eyes closed) is much, much harder.
“Here we have two different sensory modalities giving information about spatial location, but in very different ways,” says Soto-Faraco. “We could think of different currencies. They could not be directly used together.” In this conversion calculation, the brain places much more weight on vision than on hearing, because vision tends to be much more accurate. But this is where the ventriloquist illusion trips the brain up: the sensory information from vision is not reliable. The puppet’s mouth is moving, but there’s no sound coming out. The sense of hearing is overruled by vision.
There is in fact a remarkably simple law for how the brain weighs the information it receives from different sensory sources. It appears to work by a model called “optimal integration theory.” This theory can be described in a mathematical formula for deciding how much reliability to place on any particular bit of sensory information at a given time.
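The article does not spell the formula out. One common way to write such a rule is reliability-weighted averaging, where each sense's location estimate is weighted by the inverse of its variance. The sketch below is an illustration under that assumption, not the specific model used by any study quoted here; the function name `combine_cues` and the example numbers (positions and spreads in degrees) are hypothetical.

```python
def combine_cues(visual_pos, visual_sigma, audio_pos, audio_sigma):
    """Reliability-weighted (inverse-variance) combination of two location estimates.

    Positions and sigmas share one unit (for example degrees of azimuth).
    Returns (combined position, combined sigma, weight given to vision).
    """
    if visual_sigma <= 0 or audio_sigma <= 0:
        raise ValueError("sigmas must be positive")
    # Reliability is taken to be the inverse of the variance of each estimate.
    visual_rel = 1.0 / visual_sigma ** 2
    audio_rel = 1.0 / audio_sigma ** 2
    visual_weight = visual_rel / (visual_rel + audio_rel)
    combined_pos = visual_weight * visual_pos + (1.0 - visual_weight) * audio_pos
    # The combined estimate is never less precise than the better single cue.
    combined_sigma = (1.0 / (visual_rel + audio_rel)) ** 0.5
    return combined_pos, combined_sigma, visual_weight


# Hypothetical scene: puppet's mouth seen at 0 degrees, voice arriving from 10 degrees.
for label, visual_sigma in [("sharp", 1.0), ("blurred", 8.0), ("very blurred", 32.0)]:
    pos, sigma, weight = combine_cues(0.0, visual_sigma, 10.0, 8.0)
    print(f"{label}: perceived at {pos:.2f} deg, sigma {sigma:.2f}, visual weight {weight:.2f}")
```

With a sharp visual cue the combined position sits almost on the puppet; with equal spreads it falls midway; with a heavily blurred visual cue it moves toward the true sound source. These are the three regimes described in the localization results above.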
The quirks of the ventriloquism illusion don’t end there. Our brains find ventriloquism so convincing that the sensation of misdirected sound can persist up to half an hour after we stop seeing the trick. Consider the waterfall illusion, in which stationary objects seem to drift upward after one has stared at falling water for a while. “This is explained by neural fatigue. After a while of the neurons for direction of motion being excited, they get tired,” says Soto-Faraco. “So they stop inhibiting the neurons that compete with them.” A similar thing happens with the ventriloquism effect. When someone is exposed for a while to a sound and a flash that have different spatial origins, the brain adapts to this difference.
Picture this: First, you see a person at the center of your gaze moving their lips, but the sound is actually coming from your right. Your brain compensates for this, and it seems like the person moving their lips is actually the one speaking. So far, so simple. But then the person in front of you actually starts to speak. Your brain will continue to compensate for the spatial discrepancy, so their voice briefly seems to be displaced from their lips. This illusory aftereffect happens only very briefly, so you have to move fast to measure it in the lab, Soto-Faraco says. “It’s also not clear whether this happens because of neural fatigue, like in the waterfall illusion.”
Stranger still, ventriloquism works even when there’s no puppet (or other similar object) involved, a recent study in the journal Psychological Science has found. In the lab, researchers tend to use a slightly more banal set-up than a ventriloquist with a puppet, instead opting for the simpler cues of a tone and a flashing circle. In this latest study, participants were trained to associate a circle on a screen with the tone. Then researchers recreated the ventriloquism effect by shifting the sound to come from a source away from the circle.
“We were surprised to find that the effects on participants’ perception of acoustic space were almost as strong for imagined stimuli as they were for real visual stimuli,” says study author Christopher Berger of the Karolinska Institute in Sweden in a 2018 press release. As well as shedding light on how our brains process sensory information, researchers believe that studying ventriloquism could help in a number of practical fields. Understanding this quirk of perception could help to train brain-computer interfaces and to help stroke patients regain their neural function, the Swedish researchers hope.
Similar illusions in different senses are of particular interest in developing virtual reality technology. More closely linking the sensory experiences that define a virtual world is key to advancing the sense of how real the virtual world feels. “It’s beautiful but very complex,” says Soto-Faraco. Far from a seemingly trivial party trick, the ventriloquism illusion actually says a lot about the fundamental ways our brains make sense of the sensory information we receive from the world around us.
Research has found that visual and auditory reflexive spatial orienting are controlled through a common underlying neural substrate. Furthermore, studies have shown that a visual stimulus has a significant effect on attention when it is visually attended to.
The thalamus is a section of the brain responsible for relaying sensory and motor signals to the cerebral cortex. It contains specific regions dedicated to each sense, and as stimuli pass through it, it is therefore able to sort out the multiple parts of an environment that an individual experiences in a given moment. The retina at the back of the eye detects visual stimuli, and its signals travel along the optic nerve and optic tract to the lateral geniculate nucleus (LGN) within the thalamus. The LGN is located near the medial geniculate nucleus (MGN), which is responsible for organizing auditory stimuli after one hears a specific sound. Because these two systems are located close to each other, research has suggested that this might be where vision takes over the perception of an environment, resulting in visual capture.
This phenomenon was first demonstrated by the Frenchman J. Tastevin in 1937, after he had studied the tactile Aristotle illusion. This illusion produces the sensation of touching two objects by crossing one's fingers and then holding a spherical object between them. Attention was again tied to visual cues during an experiment conducted by Michael Posner in 1980.
When a visual cue indicates the direction in which a stimulus will appear, response time improves (decreases) if the correct direction is attended to. (Conversely, if the cue is misleading, response time increases.) This ability to attend to a specific direction allows for a faster reaction time, even though the participant does not physically shift their visual focus during the pre-stimulus cue. The evidence that vision has an impact on reaction time demonstrates that vision has a neurological effect on the attentional process.
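The size of this cueing benefit is usually summarized as the difference between mean response times on misleading (invalid) and correct (valid) cue trials. The sketch below shows that arithmetic only; the function name `cueing_effect` and the response times are made up for illustration and are not data from Posner's experiment.

```python
from statistics import mean


def cueing_effect(trials):
    """Summarize cue validity effects from (cue_type, response_time_ms) pairs.

    Returns (mean valid RT, mean invalid RT, invalid minus valid).
    A positive difference means misleading cues slowed responses.
    """
    valid = [rt for cue, rt in trials if cue == "valid"]
    invalid = [rt for cue, rt in trials if cue == "invalid"]
    if not valid or not invalid:
        raise ValueError("need at least one valid and one invalid trial")
    mean_valid = mean(valid)
    mean_invalid = mean(invalid)
    return mean_valid, mean_invalid, mean_invalid - mean_valid


# Made-up response times in milliseconds, for illustration only.
example_trials = [("valid", 250), ("valid", 270), ("invalid", 310), ("invalid", 330)]
valid_rt, invalid_rt, effect = cueing_effect(example_trials)
print(f"valid: {valid_rt} ms, invalid: {invalid_rt} ms, cueing effect: {effect} ms")
```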
A number of studies have demonstrated the visual capture effect. One example comes from Ehrsson, Spence, & Passingham (2004), who used a rubber hand to show that vision is capable of determining how other senses react. As participants watched a rubber hand being stroked, their own hand was stroked in a similar fashion, leading them to attribute their own sensation to what they were watching rather than to what was happening to their own body. When the rubber hand was then threatened, for example by hitting it with a hammer, participants felt an immediate shock and pain, fearing that it was their own hand that was in danger.
A study by Remington, Johnston, & Yantis (1992) found that attention is involuntarily drawn away from a given task when a visual stimulus interferes. In this study, participants were presented with four boxes and told that an image would precede a letter that they were to memorize. The conditions were to attend to the same box as the image, to a different one, to all four, or to the center. Even when they were told not to attend to a certain box, participants were consistently drawn to the image before the letter, resulting in a longer response time in every condition except the same-box condition.
Not all research on visual capture favors constant visual dominance: Shams, Kamitani, & Shimojo (2000) found that a visual illusion can be induced by sound in a controlled environment. When a single flash of light is accompanied by a series of auditory beeps, participants perceive the flash as a series of flashes corresponding to the beeps.
An example of visual capture experienced in daily life is the ventriloquism effect, in which ventriloquists make their speech appear to come from their puppet rather than their own mouths. Another popular example of visual capture occurs while watching a movie in a theater, when the sound appears to be coming from the actors' lips.
A similar thing can happen when crossing a street. An individual hears the sound of an oncoming car, looks to the left, and sees that the next car is a few blocks away, so it seems safe to cross. But when they look to the right, a car they had not noticed is already passing them. This occurs because the individual attributed the sound of oncoming traffic to the first car they saw, unaware of the other, closer car.
A phantom limb is the sensation that an amputated limb is still attached. This can cause pain and distress amongst many amputees, and was thought to be incurable. However, in 1998, Vilayanur S. Ramachandran created a mirror box, which allows an amputee to place their intact limb on one side of the box and see what appears to be their missing limb by looking at the mirror image of the intact one. Through visual capture, the visual system is able to override the somatosensory system and send feedback to the brain that the arm is actually okay and not in pain.
The McGurk effect is a phenomenon that occurs when the perception of an auditory stimulus is determined by the visual system. For example, when the syllable “ba” is repeated over and over, and one sees an individual saying it, the individual is perceived to be saying “ba”. However, when the same audio is played over video of a person mouthing “fa”, the fact that the audio is “ba” is overridden, and the observer hears “fa”. This is once again because vision is able to dominate the auditory system and produce a response that is guided strictly by vision.
Understanding visual capture has many potential applications beyond relieving pain in phantom limb syndrome. Already, surround-sound systems have been built to provide unique listening experiences that “put you right in the middle of the action”.
Factors Influencing the Ventriloquism Effect
Timing and spatial alignment play crucial roles in enhancing the ventriloquism effect. If an auditory stimulus reaches the listener's ears after the corresponding visual cue, this can create confusion about where the sound originates. Additionally, when the auditory source is physically aligned with the visual source in space, it strengthens the illusion that they are linked. Effective manipulation of these factors can lead to stronger instances of misattribution, highlighting how sensitive our perceptual systems are to variations in sensory inputs.
Impact of Direct Eye Gaze
While the puppet's often exaggerated mouth movements have been demonstrated to enhance the ventriloquism effect, the contribution of direct eye gaze had remained unknown. One study addressed this in two experiments. In Experiment 1, participants viewed an image of a person's face while hearing a temporally synchronous recording of a voice originating from different locations on the azimuthal plane. The eyes of the facial stimuli were either looking directly at participants or were closed. Participants were more likely to misperceive a range of voice locations as coming from a central position when the eye gaze of the facial stimuli was directed toward them. Thus, direct gaze enhances the ventriloquist effect by attracting participants' perception of the voice locations toward the location of the face. An exploratory analysis found no evidence for an other-race effect between White and Asian listeners. Experiment 2 replicated the effect of direct eye gaze on the ventriloquism effect, and also showed that faces per se attract perceived sound locations compared with audio-only sound localization.
Applications in Virtual Reality and Audio Design
The ventriloquism effect has significant implications for virtual reality (VR) environments, particularly in designing immersive user experiences. By understanding how visual and auditory stimuli interact, designers can create more convincing and engaging VR worlds. For example, accurate spatial audio that aligns with visual cues enhances realism, allowing users to feel more present within virtual spaces.
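As a rough illustration of aligning a sound with the direction of a visual object, the sketch below uses constant-power stereo panning, a standard audio-engineering technique that the article itself does not describe. The function name `constant_power_gains` and the 45-degree range are assumptions, and a real VR engine would more likely use head-related transfer functions than simple two-speaker panning.

```python
import math


def constant_power_gains(azimuth_deg, max_azimuth_deg=45.0):
    """Left and right speaker gains for a source at azimuth_deg.

    Negative azimuth is to the listener's left, positive to the right.
    Azimuths beyond +/- max_azimuth_deg are clamped to the nearest speaker.
    """
    if max_azimuth_deg <= 0:
        raise ValueError("max_azimuth_deg must be positive")
    clamped = max(-max_azimuth_deg, min(max_azimuth_deg, azimuth_deg))
    # Map [-max, +max] onto a pan angle in [0, pi/2].
    pan = (clamped / max_azimuth_deg + 1.0) * math.pi / 4.0
    # cos^2 + sin^2 = 1, so total power stays constant as the source moves.
    return math.cos(pan), math.sin(pan)


# Hypothetical on-screen speaker positions, in degrees of azimuth.
for azimuth in (-45.0, 0.0, 20.0):
    left, right = constant_power_gains(azimuth)
    print(f"azimuth {azimuth:+.0f} deg: left gain {left:.3f}, right gain {right:.3f}")
```

Because visual capture pulls perceived sound toward a plausible visual source, small mismatches between the panned position and the on-screen object may go unnoticed, but larger ones are more likely to break the illusion.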
Key Facts About the Ventriloquism Effect
- The ventriloquism effect is often demonstrated in experiments where participants hear a sound that appears to come from a location different from where it is actually produced.
- This effect can occur in everyday situations, such as when watching television or movies, where the voices of characters seem to come directly from their on-screen mouths.
- The ventriloquism effect illustrates the brain's reliance on visual input to interpret auditory signals, especially in ambiguous situations.
- Factors such as the timing and spatial alignment of visual and auditory stimuli significantly impact the strength of the ventriloquism effect.
- This effect has practical applications in areas like virtual reality and audio design, where creating convincing spatial audio experiences relies on manipulating perceptual cues.