The McGurk Effect: How Visual Cues Influence What We Hear
The McGurk effect, named after Harry McGurk and John MacDonald, is a compelling demonstration of how we integrate visual speech information into what we 'hear'. Discovered in 1976, this auditory illusion reveals that our perception of speech is not solely based on auditory input but is also significantly influenced by visual cues.

The effect shows that we can't help but integrate visual speech into what we 'hear'. The syllable that we perceive depends on the strength of the auditory and visual information, and whether some compromise can be achieved. Regardless, integration of the discrepant audiovisual speech syllables is effortless and mandatory.
Understanding the McGurk Effect
To fully grasp the McGurk effect, it's helpful to experience it firsthand. Typically, this involves watching a video of a person articulating one sound while the audio plays a different sound. The result is often a perceived sound that differs from both the visual and auditory inputs.
Demonstration stimuli of this kind are typically made by dubbing a single repeated audio syllable onto several different visual syllables. Depending on the audiovisual syllable combination used:
- the visual syllable can override the auditory syllable to determine what we perceive
- the auditory and visual syllables can combine to produce a new perceived syllable
- the auditory syllable can override the visual syllable to determine what we perceive
Our speech function makes use of all types of relevant information, regardless of the modality. In fact, there is some evidence that the brain treats visual speech information as if it is auditory speech.
How General Is the McGurk Effect?
The McGurk effect is surprisingly robust and has been observed across various conditions and populations:
- The effect works on perceivers of all language backgrounds (e.g., Massaro, Cohen, Gesi, Heredia, & Tsuzaki, 1993; Sekiyama & Tohkura, 1993).
- The effect works on young infants (Rosenblum, Schmuckler, & Johnson, 1997).
- The effect works when the visual and auditory components are from speakers of different genders (Green, Kuhl, Meltzoff, & Stevens, 1991).
- The effect works with highly reduced face images (Rosenblum & Saldaña, 1996).
- The effect works when observers are unaware that they are looking at a face (Rosenblum & Saldaña, 1996).
- The effect works when observers touch, rather than look at, the face (Fowler & Dekle, 1991).
- The effect works less well with vowels than consonants (Summerfield & McGrath, 1984).
- The effect works less well with nonspeech 'pluck' and 'bow' stimuli (Saldaña & Rosenblum, 1994).
- The effect works better with some consonant combinations than others (e.g., McGurk & MacDonald, 1976).
Creating Your Own McGurk Effect Demonstration
To produce a 'live' demonstration of the McGurk effect (you'll need two other people besides yourself):
- Have an observer face you and keep looking at your face
- Have another person stand behind you so the observer can't see their face
- Starting in synchrony, repeatedly mouth the word 'vase' (silently) while the person behind you repeats the word 'base' out loud; you can achieve synchronization by counting down '3, 2, 1... vase, vase, vase,' etc.
- After about 8 repetitions, stop and ask the observer what they 'hear'; they should 'hear' 'vase'
- Now do the same thing, and this time tell the observer to shut their eyes after a few repetitions
- They should hear 'base' with their eyes shut
- The observer can try opening and shutting their eyes, and what they 'hear' should alternate between 'vase' and 'base'

Tips for making your own McGurk Stimuli:
Audiovisual dubbing can be achieved by using two videotape players or digitizing stimuli onto a computer and using software to mix the audio and video components.
- The quality of the auditory channel should be good, but the quality of the visual channel can be fair without much loss in the effect.
- The auditory and visual components should be synchronized so that the sound of the syllable seems to be coming from the visible mouth.
- However, the components do not have to be perfectly synchronized for the effect to work.
- The syllable combinations used in the above demonstration are known to be especially strong.
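If you digitize your clips, the dubbing step described above can be done with a command-line tool such as ffmpeg. Below is a minimal sketch: the file names (`visual_ga.mp4`, `audio_ba.wav`) are hypothetical placeholders for your own recordings, and the first two commands simply synthesize stand-in test media so the example runs on its own.

```shell
# Synthesize placeholder media (stand-ins for a recorded visual syllable
# clip and a recorded audio syllable track).
ffmpeg -y -f lavfi -i testsrc=duration=2:size=320x240:rate=25 -an visual_ga.mp4
ffmpeg -y -f lavfi -i sine=frequency=220:duration=2 audio_ba.wav

# Dub the audio syllable onto the visual syllable: take the video stream
# from the first input and the audio stream from the second, copy the
# video without re-encoding, and trim to the shorter stream so the
# components stay roughly aligned.
ffmpeg -y -i visual_ga.mp4 -i audio_ba.wav -map 0:v -map 1:a -c:v copy -shortest mcgurk_stimulus.mp4
```

If the effect seems weak, nudge the audio timing (ffmpeg's `-itsoffset` option shifts an input's timestamps) — though, as noted above, the components do not have to be perfectly synchronized for the effect to work.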
The McGurk effect highlights the brain’s reliance on multiple sensory inputs to interpret the world around us. It underscores the intricate interplay between auditory and visual information in speech perception.
Dr. Kevin Franck, director of audiology at Massachusetts Eye and Ear, says it all goes back to the brain’s inherent desire to assign meaning to confusing stimuli, like an ambiguous sound. “Your brain’s just trying to make sense of what’s coming at it,” he says. “It hates it when things don’t make sense.”
In terms of the McGurk effect, the brain relies on the eyes to settle the confusion coming at the ears, particularly when they’re seeing something as clear as a pair of lips distinctly mouthing a certain sound or word. “I have one ambiguous signal, which is the actual thing you hear, but then I say, ‘Oh, I’ve got another input, which is very clear. I trust my eyes now, because my ears aren’t sure,'” Franck explains. “It’s just taking all of these different inputs and making sense out of the mess.”
Once you know what’s going on, Franck says, you’ll likely be able to tell that you’re hearing the same thing in both instances, but left to its own devices, the brain will try to eliminate confusion every time.
“The brain just likes things to be consistent. It wants things to be neat and tidy,” Franck says.