Psychoacoustics Research News: Exploring Sound Perception and Beyond
The human auditory system is a marvel of biology. So far, even the most sophisticated computational models cannot analyze and interpret sound as well as the human auditory system can, but MIT neuroscientist Josh McDermott hopes to change that.

“Our long-term goal is to build good predictive models of the auditory system,” McDermott says.
MIT's Research on Auditory Perception
One aspect of audition that McDermott’s lab focuses on is “auditory scene analysis,” which includes tasks such as inferring what events in the environment caused a particular sound, and determining where a particular sound came from. This requires the ability to disentangle sounds produced by different events or objects, and the ability to tease out the effects of the environment.
“Sounds in the world have very particular properties, due to physics and the way that the world works,” McDermott says. “We believe that the brain internalizes those regularities, and you have models in your head of the way that sound is generated.”
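One way to make the scene-analysis problem concrete is the “ideal binary mask,” a textbook baseline in machine hearing rather than anything specific to McDermott’s models: given oracle knowledge of two sources, keep each time-frequency cell of the mixture’s spectrogram for whichever source dominates it. Here is a minimal Python sketch; the signals and STFT settings are illustrative assumptions.

```python
# Ideal-binary-mask separation of a two-source mixture: a textbook baseline
# for machine auditory scene analysis, NOT McDermott's model. It assumes
# oracle access to the isolated sources, which real listeners never have.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs                        # 1 second of audio
source_a = 0.5 * np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone
rng = np.random.default_rng(0)
source_b = 0.1 * rng.standard_normal(fs)      # broadband noise
mixture = source_a + source_b

# Compare per-source energy in each time-frequency cell of the spectrogram.
_, _, A = stft(source_a, fs=fs, nperseg=512)
_, _, B = stft(source_b, fs=fs, nperseg=512)
_, _, M = stft(mixture, fs=fs, nperseg=512)

mask = np.abs(A) > np.abs(B)                  # cells where the tone dominates
_, recovered_tone = istft(M * mask, fs=fs)    # estimate of source_a

print("fraction of cells assigned to the tone:", float(mask.mean()))
```

The gap between this oracle baseline and what the brain accomplishes with a single mixed waveform is exactly the gap that predictive models of audition are trying to close.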
McDermott’s lab also explores how exposure to different types of music affects people’s music preferences and even how they perceive music.
As an undergraduate at Harvard University, McDermott originally planned to study math and physics, but “I was very quickly seduced by the brain,” he says. After earning a master’s degree from University College London, he came to MIT to do a PhD in brain and cognitive sciences. At first his focus was on vision, which he studied with Ted Adelson, the John and Dorothy Wilson Professor of Vision Science, but he found himself increasingly interested in audition. He had always loved music, and around this time he started working as a radio and club DJ. To pursue his new interest, he did a postdoc at the University of Minnesota, working in a lab devoted to psychoacoustics, the study of how humans perceive sound. There, he studied auditory phenomena such as the “cocktail party effect,” the ability to focus on a particular person’s voice while tuning out background noise. During another postdoc, at New York University, he started working on computational models of the auditory system.
“The culture here surrounding brain and cognitive science really prioritizes and values computation, and that was a perspective that was important to me,” says McDermott, who is also a member of MIT’s McGovern Institute for Brain Research and the Center for Brains, Minds and Machines.
About 10 years ago, when McDermott was a postdoc, he started working on cross-cultural studies of how the human brain perceives music. Ricardo Godoy, an anthropologist at Brandeis University, asked McDermott to join him for some studies of the Tsimane’ people, who live in the Amazon rainforest. Since then, McDermott and some of his students have gone to Bolivia most summers to study sound perception among the Tsimane’. These studies have revealed both differences and similarities between Westerners and the Tsimane’ people.
McDermott, who counts soul, disco, and jazz-funk among his favorite types of music, has found that Westerners and the Tsimane’ differ in their perceptions of dissonance. He has also shown that people in Western society perceive sounds separated by an octave as similar, but the Tsimane’ do not. However, there are also some similarities between the two groups.
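The octave finding can be stated quantitatively: tones an octave apart stand in a 2:1 frequency ratio, and Western listeners treat them as sharing a pitch class. Below is a small sketch of that mapping under standard 12-tone equal temperament; the function name and the A4 = 440 Hz reference are common conventions, not details from McDermott’s studies.

```python
# Octave equivalence in Western pitch: frequencies differing by a factor of
# two map to the same pitch class (chroma). Standard 12-TET/MIDI arithmetic;
# this mapping is a common convention, not taken from the cross-cultural work.
import math

def pitch_class(freq_hz: float, ref_a4: float = 440.0) -> int:
    """Map a frequency to one of 12 pitch classes (0 = A)."""
    semitones_from_a4 = 12 * math.log2(freq_hz / ref_a4)
    return round(semitones_from_a4) % 12

print(pitch_class(440.0))  # A4 -> 0
print(pitch_class(880.0))  # A5, one octave up -> 0 (same pitch class)
print(pitch_class(660.0))  # ~E5, a fifth up -> 7 (different pitch class)
```

The cross-cultural result is that this equivalence, natural as it feels to Western listeners, is not shared by Tsimane’ listeners.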
“We’re finding both striking variation in some perceptual traits that many people presumed were common across cultures and listeners, and striking similarities in others,” McDermott says.
“Hearing impairment is the most common sensory disorder. It affects almost everybody as they get older, and the treatments are OK, but they’re not great,” he says. “We’re eventually going to all have personalized hearing aids that we walk around with, and we just need to develop the right algorithms in order to tell them what to do.”
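For a rough sense of what such an algorithm might look like, consider wide dynamic range compression (WDRC), a gain rule common in hearing aids that amplifies soft sounds more than loud ones. The sketch below is a toy single-band version with made-up parameters; it is not a clinical prescription method and not anything from McDermott’s lab.

```python
# Toy single-band wide-dynamic-range compression (WDRC): soft sounds get
# more gain than loud ones. Illustrative parameters only; real hearing aids
# are multiband and fitted per ear from an audiogram.
import numpy as np

def wdrc_gain_db(level_db, threshold_db=50.0, ratio=3.0, max_gain_db=30.0):
    """Full gain below the threshold; progressively less gain above it."""
    over = max(level_db - threshold_db, 0.0)
    gain = max_gain_db - over * (1.0 - 1.0 / ratio)
    return min(max(gain, 0.0), max_gain_db)

def compress(signal, fs, frame_ms=10.0):
    """Apply frame-by-frame WDRC gains to a mono signal."""
    frame = max(int(fs * frame_ms / 1000), 1)
    out = np.array(signal, dtype=float)
    for start in range(0, len(out) - frame + 1, frame):
        chunk = out[start:start + frame]
        rms = np.sqrt(np.mean(chunk ** 2)) + 1e-12
        level_db = 20 * np.log10(rms) + 94.0  # made-up calibration offset
        out[start:start + frame] = chunk * 10 ** (wdrc_gain_db(level_db) / 20.0)
    return out

# A quiet 1 kHz tone receives far more gain than a loud one.
fs = 16000
t = np.arange(fs) / fs
print(np.max(np.abs(compress(0.01 * np.sin(2 * np.pi * 1000 * t), fs))))  # ~0.29
print(np.max(np.abs(compress(0.5 * np.sin(2 * np.pi * 1000 * t), fs))))   # ~1.08
```

Personalizing such a scheme amounts to choosing thresholds, ratios, and gains per frequency band from an individual listener’s audiogram.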
RISD's Studio for Research in Sound and Technology (SRST)
RISD’s Studio for Research in Sound and Technology (SRST, formerly known as the Spatial Audio Studio) serves as an interdisciplinary hub for sound design experimentation as well as research in sonic interaction, experience, composition and performance. Headed up by Professor Shawn Greenlee 96 PR, the studio is located on the mezzanine level of 15 West.
Johnson: What’s really unique about the space is that you can conduct research in conjunction with materials, inside a framework for actually hearing and experimenting with sound. That radically alters how your research might unfold. Being in a space where sound is created in such an intentional way reframes the conversation and reorients the community.
Shonuga-Fleming: Even listening to existing music in that super-quiet space allows you to hear all the details in ways you never could before. And sound design is much more intuitive when you can hear everything.
Cetilia: In my Sound Synthesis and Sonic Practices classes offered through the Digital + Media department and Computation, Technology and Culture concentration, I try to offer opportunities for collaboration.
Greenlee: And it’s an entirely interdisciplinary space, which means it’s a great place for RISD students and faculty to connect with people in other departments.
Johnson: I think that part of it is trying to conceive of sound as an invisible material and to understand how it relates to the larger, nebulous world of aesthetics.
Greenlee: Absolutely, yes. Students are already working with sound, for example in the Film/Animation/Video department, but it’s also becoming more and more prevalent in other departments, like Painting and Sculpture, in which students are creating immersive installations and performances that incorporate audio components. Think about all of the headphones and loudspeakers you’re likely to encounter in a gallery exhibition. Sound design is also a key factor in many fields, including industrial design and graphic design, when multi-sensory experience is fully considered.
Demps: I teach Spatial Dynamics classes in the Experimental and Foundation Studies division, so I’m working with undergraduate students who haven’t selected a major yet. One assignment I’ve been giving students is to build an archive of their favorite places and pay close attention to the places’ sounds as a way to understand and navigate space.
Shonuga-Fleming: I’m studying Architecture, and part of what we’re learning is how to design acoustically. I also took Shawn’s Spatial Audio class, which really opened me up to thinking about acoustic space architecturally: how sound is positioned in space, what the acoustics of the room contribute, etc.
Chechile: My research revolves around “difference tones,” which are sounds the ear generates in response to specific acoustic tone combinations. Difference tones are perceived as localized within the head, so they create an additional nested layer of spatial depth within the multichannel loudspeaker dome. To work creatively with these tones, I conduct empirical research in psychoacoustics, and the results can be used not only in medical fields and the hearing sciences but also in instrument design and applied sonic arts. At RISD, I’m expanding my Ear Tone Toolbox software and writing new creative works that employ the phenomenon, among other projects.
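The arithmetic behind difference tones is standard psychoacoustics: given primary tones at frequencies f1 < f2, the ear’s nonlinearity can generate a quadratic difference tone at f2 - f1 and a cubic difference tone at 2*f1 - f2. Below is a minimal sketch of a two-tone stimulus and its predicted distortion products; the specific frequencies are illustrative, and the code is unrelated to the Ear Tone Toolbox itself.

```python
# Synthesize a two-tone stimulus and print the distortion products the ear
# can generate: the quadratic difference tone (f2 - f1) and the cubic
# difference tone (2*f1 - f2). Standard formulas; the frequencies are
# illustrative and this is not part of the Ear Tone Toolbox.
import numpy as np

fs = 44100
f1, f2 = 1000.0, 1200.0                 # primaries; f2/f1 near 1.2 is typical
t = np.arange(fs) / fs                  # 1 second
stimulus = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

print("quadratic difference tone:", f2 - f1, "Hz")   # 200.0 Hz
print("cubic difference tone:", 2 * f1 - f2, "Hz")   # 800.0 Hz
```

Played over loudspeakers, only f1 and f2 are physically present; the difference tones arise inside the listener’s ear, which is why they are perceived as localized within the head.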
Demps: It was amazing: very informative but also grounded and down to earth. It’s awesome to hear about how other folks are working, the problems they’re dealing with and how they’ve solved them. I always find inspiration in that.
Johnson: I’ve been following Camille Norment’s work forever. It’s really philosophical.
Greenlee: And it’s important to note that the event, an insightful discussion facilitated by Alex and Assistant Professor Jess Myers, was co-sponsored by the Architecture department and the Fleet Library, so it really brought the RISD community together.
Greenlee: RISD’s Studio for Research in Sound and Technology is becoming a hub for new software, tools and ideas in the fields of sonic arts and sound design. It has been so exciting to see how the students leapfrog off one another’s research ideas.
Key Research Areas in Psychoacoustics
The field of psychoacoustics encompasses a wide range of research areas, each contributing to our understanding of how humans perceive and process sound. Here are some key areas:
- Auditory scene analysis and sound object recognition: Research investigating how listeners distinguish between sound sources, process complex acoustic environments, and identify sound objects within natural and artificial soundscapes.
- Hearing loss, auditory perception, and assistive technologies: Studies examining how hearing loss alters psychoacoustic and cognitive aspects of sound perception, including changes in pitch, loudness, temporal and spatial processing (one such spatial cue, the interaural time difference, is sketched below).
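Among the spatial cues these studies manipulate, the interaural time difference (ITD) is the simplest to demonstrate: a sound reaches one ear slightly before the other, and the delay can be estimated by cross-correlating the two ear signals. The sketch below uses synthetic signals; the 300-microsecond delay and the noise burst are illustrative assumptions, not taken from any study cited here.

```python
# Estimate an interaural time difference (ITD) by cross-correlating left and
# right ear signals. Textbook method on synthetic data; the delay and the
# noise stimulus are illustrative, not from any cited study.
import numpy as np

fs = 44100
true_itd_s = 300e-6                       # 300 microseconds
shift = round(true_itd_s * fs)            # delay in samples (~13)

rng = np.random.default_rng(1)
source = rng.standard_normal(fs // 10)    # 100 ms noise burst
left = source
right = np.concatenate([np.zeros(shift), source[:-shift]])  # delayed copy

# The lag that maximizes the cross-correlation is the ITD estimate.
lags = np.arange(-len(left) + 1, len(left))
xcorr = np.correlate(right, left, mode="full")
estimated_itd_s = lags[np.argmax(xcorr)] / fs
print("estimated ITD:", estimated_itd_s * 1e6, "microseconds")  # ~295
```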
This collection aims to bridge theory with application by highlighting work that spans from the core science of psychoacoustics to innovative implementations in real-world audio systems and soundscapes. We also welcome contributions that are not restricted to the natural sciences but that work in close relationship with, or use methods borrowed from, the humanities and social sciences.
William M. Hartmann's Contribution to Psychoacoustics at MSU
“Michigan State University is a great place,” Hartmann said. “This university has been very good to me over the years.” Hartmann’s $2 million gift will establish a new William M. Hartmann endowed chair, a gift that, Science Dean Eric Hegg said, will support the college for years to come. Hartmann didn’t come to MSU planning to work in psychoacoustics.
Even his background was embroiled in the Vietnam War, an era when the public associated physics with making bombs and class enrollment dwindled. The sudden suggestion that he move into acoustics wasn’t out of nowhere, though: he had learned to play the piano and, later, the trumpet, and had been hooked on sound since his undergraduate college years.
He credits the university with letting him change fields the way he did. After attending an Acoustical Society of America conference, he found that everything he learned was interesting. His research examined how people determine where a sound is coming from, a process called sound localization, including how the size of listeners’ ears compared to their heads affected their ability to localize sound and how sound is converted into electrical signals in the brain, with much of the work conducted in a chamber with a suspended metal cable floor. Hartmann’s work didn’t go unnoticed: he eventually became president of the Acoustical Society of America and served as its acting director of acoustics. Physics and Astronomy Chair Steve Zepf also praised him for “endowing a chair in physics.” Hartmann hasn’t slowed down: he continues to work in his lab and has been publishing papers for more than 50 years.
Recent Publications in Psychoacoustics
Here are some of the recent publications in the field of psychoacoustics:
- Best, V., Ahlstrom, J. B., Mason, C. R., Perrachione, T. K., Kidd, G., Jr., & Dubno, J. R. (2024). Talker change detection by listeners varying in age and hearing loss.
- Best, V., & Roverud, E. (2024). Externalization of speech when listening with hearing aids.
- Miles, K., Best, V., & Buchholz, J. M. (2024). Feasibility of an adaptive version of the everyday conversational sentences in noise test.
- Andrejková, G., Best, V., & Kopčo, N. (2023). Time scales of adaptation to context in horizontal sound localization.
- Best, V., Boyd, A. D., & Sen, K. (2023). An effect of gaze direction in cocktail party listening.
- Byrne, A. J., Conroy, C., & Kidd, G., Jr. (2023). Individual differences in speech-on-speech masking are correlated with cognitive and visual task performance.
- Conroy, C., Buss, E., & Kidd, G., Jr. (2023). Cues to reduce modulation informational masking.
- Kidd, G., Jr., & Conroy, C. (2023). Auditory informational masking.
- Roverud, E., Villard, S., & Kidd, G., Jr. (2023). Strength of target source segregation cues affects the outcome of speech-on-speech masking experiments.
- Villard, S., Perrachione, T. K., Lim, S.-J., Alam, A., & Kidd, G., Jr. (2023). Energetic and informational masking place dissociable demands on listening effort: Evidence from simultaneous electroencephalography and pupillometry.
- Baltzell, L. S., Best, V., & Colburn, H. S. (2022). Effects of hearing loss on interaural time difference sensitivity at low and high frequencies.
- Byrne, A. J., Conroy, C., & Kidd, G., Jr. (2022). The effects of uncertainty in level on speech-on-speech masking.
- Cho, A. Y., & Kidd, G., Jr. (2022). Auditory motion as a cue for source segregation and selection in a “cocktail party” listening environment.
- Chou, K. F., Boyd, A. D., Best, V., Colburn, H. S., & Sen, K. (2022). A biologically oriented algorithm for spatial sound segregation.
- Conroy, C., Byrne, A. J., & Kidd, G., Jr. (2022). Forward masking of spectrotemporal modulation detection.
- Miles, K. M., Beechey, T., Best, V., & Buchholz, J. M. (2022). Measuring speech intelligibility and hearing-aid benefit using everyday conversational sentences in real-world environments.
- Prud’homme, L., Lavandier, M., & Best, V. (2022). A dynamic binaural harmonic-cancellation model to predict speech intelligibility against a harmonic masker varying in intonation, temporal envelope, and location.
- Prud’homme, L., Lavandier, M., & Best, V. (2022). Investigating the role of harmonic cancellation in speech-on-speech masking.
- Baltzell, L. S., & Best, V. (2021). High-resolution temporal weighting of interaural time differences in speech.
- Best, V., Goupell, M. J., & Colburn, H. S. (2021). Binaural hearing and across-channel processing. In: Litovsky, R. Y., Goupell, M. J., Fay, R. R., & Popper, A. N. (Eds.), Binaural Hearing.
- Conroy, C., & Kidd, G., Jr. (2021). Informational masking in the modulation domain.
- Goupell, M. J., Best, V., & Colburn, H. S.
- Jett, B., Buss, E., Best, V., Oleson, J., & Calandruccio, L. (2021). Does sentence-level coarticulation affect speech recognition in noise or a speech masker?
- Lavandier, M., Mason, C. R., Baltzell, L. S., & Best, V. (2021). Individual differences in speech intelligibility at a cocktail party: A modeling perspective.
- Roverud, E., Dubno, J. R., Richards, V. M., & Kidd, G., Jr. (2021). Cross-frequency weights in normal and impaired hearing: Stimulus factors, stimulus dimensions, and associations with speech recognition.
- Yun, D., Jennings, T. R., Kidd, G., Jr., & Goupell, M. J. (2021). Benefits of triple acoustic beamforming during speech-on-speech masking and sound localization for bilateral cochlear-implant users.
- Baltzell, L. S., Cho, A. Y., Swaminathan, J., & Best, V. (2020). Spectro-temporal weighting of interaural time differences in speech.
- Baltzell, L. S., Swaminathan, J., Cho, A. Y., Lavandier, M., & Best, V. (2020). Binaural sensitivity and release from speech-on-speech masking in listeners with and without hearing loss.
- Best, V., Baumgartner, R., Lavandier, M., Majdak, P., & Kopčo, N. (2020). Sound externalization: A review of recent research.
- Best, V., Conroy, C., & Kidd, G., Jr. (2020). Can background noise increase the informational masking in a speech mixture?
- Conroy, C., Best, V., Jennings, T.
- Conroy, C., Mason, C. R., & Kidd, G., Jr. (2020). Informational masking of negative masking.
- Kidd, G., Jr., Jennings, T. R., & Byrne, A. J. (2020). Enhancing the perceptual segregation and localization of sound sources with a triple beamformer.
- Prud’homme, L., Lavandier, M., & Best, V. (2020). A harmonic-cancellation-based model to predict speech intelligibility against a harmonic masker.
- Roverud, E., Bradlow, A., & Kidd, G., Jr. (2020).
- Roverud, E., Dubno, J. R., & Kidd, G., Jr. (2020). Hearing-impaired listeners show reduced attention to high-frequency information in the presence of low-frequency information.
- Villard, S., & Kidd, G., Jr. (2020). Assessing the benefit of acoustic beamforming for listeners with aphasia using modified psychoacoustic methods.
- Wang, L., Best, V., & Shinn-Cunningham, B. G. (2020). Benefits of beamforming with local spatial-cue preservation for speech localization and segregation.
- Best, V., Roverud, E., Baltzell, L., Rennies, J., & Lavandier, M. (2019). The importance of a broad bandwidth for understanding “glimpsed” speech.
- Best, V., & Swaminathan, J. (2019). Revisiting the detection of interaural time differences in listeners with hearing loss.
- Kidd, G., Mason, C. R., Best, V., Roverud, E., Swaminathan, J., Jennings, T., Clayton, K., & Colburn, H. S. (2019). Determining the energetic and informational components of speech-on-speech masking in listeners with sensorineural hearing loss.
- Rennies, J., Best, V., Roverud, E., & Kidd, G. (2019). Energetic and informational components of speech-on-speech masking in binaural speech intelligibility and perceived listening effort.
- Villard, S., & Kidd, G. (2019). Effects of acquired aphasia on the recognition of speech under energetic and informational masking conditions.
- Best, V., Ahlstrom, J. B., Mason, C. R., Roverud, E., Perrachione, T. K., Kidd, G., Jr., & Dubno, J. R. (2018). Talker identification: Effects of masking, hearing loss, and age.
- Best, V., Swaminathan, J., Kopčo, N., Roverud, E., & Shinn-Cunningham, B. G. (2018). A “buildup” of speech intelligibility in listeners with normal hearing and hearing loss.
- Cubick, J., Buchholz, J. M., Best, V., Lavandier, M., & Dau, T. (2018). Listening through hearing aids affects spatial perception and speech intelligibility in normal-hearing listeners.
- Dai, L., Best, V., & Shinn-Cunningham, B. G. (2018). Sensorineural hearing loss degrades behavioral and physiological measures of human spatial selective auditory attention.
- Rennies, J., & Kidd, G. (2018). Benefit of binaural listening as revealed by speech intelligibility and listening effort.
- Roverud, E., Best, V., Mason, C. R., Streeter, T., & Kidd, G., Jr. (2018). Evaluating the performance of a visually guided hearing aid using a dynamic auditory-visual word congruence task.
- Best, V., Mason, C. R., Swaminathan, J., Roverud, E., & Kidd, G., Jr. (2017). Use of a glimpsing model to understand the performance of listeners with and without hearing loss in spatialized speech mixtures.
- Best, V., Roverud, E., Mason, C. R., & Kidd, G., Jr. (2017). Examination of a hybrid beamformer that preserves auditory spatial cues.
- Best, V., Roverud, E., Streeter, T., Mason, C. R., & Kidd, G., Jr. (2017). The benefit of a visually guided beamformer in a dynamic speech task.
- Kidd, G., Jr. (2017). Enhancing auditory selective attention using a visually guided hearing aid.
- Kidd, G., Jr., & Colburn, H. S. (2017). Informational masking in speech recognition. In: The Auditory System at the Cocktail Party. J. C. Middlebrooks, J. Z. Simon, A. N. Popper and R. R. Fay (Eds.). Springer Nature, pp.
- Shinn-Cunningham, B., Best, V., & Lee, A. K. C. (2017). Auditory object formation and selection. In: The Auditory System at the Cocktail Party. J. C. Middlebrooks, J. Z. Simon, A. N. Popper and R. R. Fay (Eds.). Springer Nature, pp.
- Best, V., Streeter, T., Roverud, E., Mason, C. R., & Kidd, G., Jr. (2016). A flexible question-and-answer task for measuring speech understanding.
- Clayton, K., Swaminathan, J., Yazdanbakhsh, A., Zuk, J., Patel, A. D., & Kidd, G., Jr. (2016). Executive function, visual attention and the cocktail party problem in musicians and non-musicians.
- Kidd, G., Jr., Mason, C. R., Best, V., Swaminathan, J., Roverud, E., & Clayton, K. (2016). Determining the energetic and informational components of speech-on-speech masking.
- Roverud, E., Best, V., Mason, C., Swaminathan, J., & Kidd, G., Jr. (2016). Informational masking in normal-hearing and hearing-impaired listeners measured in a nonspeech pattern identification task.
- Swaminathan, J., Mason, C. R., Streeter, T., Best, V., Roverud, E., & Kidd, G., Jr. (2016). Role of binaural temporal fine structure and envelope cues in cocktail-party listening.
- Best, V., Mason, C. R., Kidd, G., Jr., Iyer, N., & Brungart, D. S. (2015). Better-ear glimpsing in hearing-impaired listeners.
- Kidd, G., Jr., Mason, C. R., Best, V., & Swaminathan, J. (2015). Benefits of acoustic beamforming for solving the cocktail party problem.
- Swaminathan, J., Mason, C. R., Streeter, T. M., Best, V., Kidd, G., Jr., & Patel, A. D. (2015). Musical training, individual differences and the cocktail party problem.
