Humans use vision as well as hearing to understand speech. There is no better example of this than the McGurk effect.
Imagine a recording of a person saying “bah” over and over again. What sound are you going to hear? “Bah”, obviously.
However, if this sound is accompanied by a video of this person saying “vah” you will start hearing “vah” or “fah” instead, even though the audio has not changed. Just have a look at the video:
Because sound and vision are combined so early in the process of speech recognition, the illusion persists even after you know how it works. The brain cannot tell whether it is hearing or seeing the sound, because these two senses usually work together subtly to help us comprehend what other people are saying.
Interestingly, most people are quite bad at lip-reading, yet they are still easily fooled by the McGurk effect.
Also interesting is that the intensity of the McGurk effect varies across speakers of different languages. For Italian, German and English speakers the effect is very strong; for Chinese and Japanese speakers it is quite weak. This is likely due to the different syllabic structure and tonality of Chinese and Japanese (compared to Indo-European languages), and possibly due to the culture of face avoidance in Japan (for example, children are instructed to look at a teacher’s Adam’s apple or tie knot instead of the eyes).
References and further reading
- Sekiyama, K. (1997) – “Cultural and linguistic factors in audiovisual speech processing: The McGurk effect in Chinese subjects” – Perception & Psychophysics
- Hisanaga, S., Sekiyama, K., Igasaki, T. & Murayama, N. (2009) – “Audiovisual speech perception in Japanese and English: Inter-language differences examined by event-related potentials”
- Rosenblum, L.D. (2010) – See what I’m saying: The extraordinary powers of our five senses – ISBN 978-0393339376
- Wikipedia page on the McGurk effect