Auditory Scene Analysis: The Perceptual Organization of Sound (Bregman, 1990)

Auditory scene analysis is the process by which our perceptual system parses the incoming complex vibration (sound) to produce a meaningful representation of the environment. For example, in the noisy scene of a city street, at any given time some of the sound components reaching your ears may belong to a motorcycle driving by, others to ambient traffic noise, and still others to the voices of people on the sidewalk next to you; your auditory system deciphers which is which.

Auditory scene analysis also involves grouping or separating sound events in time. The auditory system must group incoming sound components into units that are delimited in time (segmentation), for example musical notes, and decide which ones to group together into extended sequences such as melodies. This is called auditory streaming. The elements can be grouped together (integration), separated into layers (segregation), or separated into successive events (segmentation).

Auditory streaming example.

This spectrogram represents a 15-second excerpt from Robert Normandeau’s large-scale, multimovement electroacoustic composition Clair de Terre (1999). The movement from which this excerpt is taken is called “Micro-montage”, and the music lives up to its title: many brief sound events are juxtaposed or superimposed in a short amount of time. This raises an interesting question: how do we keep track of them all? In this excerpt, I can hear crashing chords, whistling wind, chirping birds, a revving motorcycle, and many more sound sources that I vaguely recognize but am hard-pressed to name.

Although this particular panoply occurs in the context of a piece of electroacoustic music, the experience of being bombarded with many different sounds is familiar from what the American psychologist William James called 'the blooming, buzzing confusion' of everyday life. How does the auditory system make sense of the amazingly complex and ever-changing air vibrations that reach the ear? How does it separate all of these sources into discrete perceptual units?

Understanding Auditory Scene Analysis: Principles and Applications

Principles Guiding Auditory Scene Analysis

Complicated though this task may be, relatively few principles guide the auditory system through it. These include:

  • Harmonicity: Frequencies (or partials) related by simple integer ratios tend to group together. For example, if the auditory scene contains frequencies at 110 Hz, 220 Hz, and 330 Hz (n, 2n, 3n), the auditory system will tend to fuse them together into a single complex sound, whereas frequencies at 110 Hz, 201 Hz, and 350 Hz, which are not related by simple ratios, are less likely to fuse. A related cue is frequency comodulation: our perceptual system tends to group the components of a complex tone whose frequencies change in parallel.
  • Amplitude comodulation: Sound components that get louder or softer in parallel tend to group together.
  • Source location: Sound components that originate from the same physical location in space tend to group together. For example, we tend to integrate or segregate auditory streams based on their perceived source location (panning).
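
The harmonicity example above can be sketched in a few lines of Python. This is only a toy illustration of the integer-ratio idea, not a model of how the auditory system actually works: the function name `fuses_as_harmonic`, the choice of the lowest component as the candidate fundamental, and the 3% mistuning tolerance are all assumptions made for this sketch.

```python
def fuses_as_harmonic(freqs_hz, tolerance=0.03):
    """Toy check: do all components sit near integer multiples of the lowest one?

    Assumptions (not from the source text): the lowest component is taken as
    the candidate fundamental, and `tolerance` is an arbitrary allowance for
    relative mistuning.
    """
    if not freqs_hz:
        return False
    f0 = min(freqs_hz)
    if f0 <= 0:
        raise ValueError("frequencies must be positive")
    for f in freqs_hz:
        n = round(f / f0)  # nearest harmonic number; at least 1 because f >= f0
        mistuning = abs(f - n * f0) / (n * f0)
        if mistuning > tolerance:
            return False
    return True


print(fuses_as_harmonic([110, 220, 330]))  # True: a harmonic series n, 2n, 3n
print(fuses_as_harmonic([110, 201, 350]))  # False: no simple integer relation
```

With the frequencies from the text, the first set passes because each component is an exact multiple of 110 Hz, while the second fails because 201 Hz lies about 9% away from the nearest harmonic (220 Hz).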

Principles of Auditory Scene Analysis.

For the most part, we are unaware that this process is happening, and take it for granted. But before the auditory scene makes it into your conscious awareness, an amazing feat of pre-attentive analysis has already converted the dizzying complexity of air vibrations around you into a coherent picture of the world.

Summary of Principles

The following table summarizes the key principles discussed:

| Principle | Description | Example |
| --- | --- | --- |
| Harmonicity | Frequencies related by simple integer ratios group together. | Frequencies at 110 Hz, 220 Hz, and 330 Hz (n, 2n, 3n) fuse into a single complex sound. |
| Amplitude comodulation | Sound components with parallel loudness changes group together. | Sounds that increase or decrease in volume simultaneously are perceived as a single source. |
| Source location | Sounds originating from the same physical location group together. | Sounds from the left side of the listener are grouped separately from sounds on the right. |
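
As a rough illustration of amplitude comodulation, one could measure how closely the loudness envelopes of two components rise and fall together. The sketch below uses a plain Pearson correlation for this; that measure, the helper names `envelope_correlation` and `tremolo_envelope`, and the synthetic tremolo rates are assumptions chosen for the example, not a method described by Bregman.

```python
import math


def envelope_correlation(env_a, env_b):
    """Pearson correlation between two equally sampled amplitude envelopes."""
    if len(env_a) != len(env_b) or len(env_a) < 2:
        raise ValueError("envelopes must have the same length (at least 2)")
    n = len(env_a)
    mean_a = sum(env_a) / n
    mean_b = sum(env_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(env_a, env_b))
    var_a = sum((a - mean_a) ** 2 for a in env_a)
    var_b = sum((b - mean_b) ** 2 for b in env_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat envelope carries no comodulation cue
    return cov / math.sqrt(var_a * var_b)


def tremolo_envelope(rate_hz, level=1.0, depth=0.5, sample_rate=100, seconds=1.0):
    """Hypothetical envelope: a steady level with sinusoidal loudness changes."""
    count = int(sample_rate * seconds)
    return [
        level * (1.0 + depth * math.sin(2 * math.pi * rate_hz * i / sample_rate))
        for i in range(count)
    ]


env_a = tremolo_envelope(4)             # swells four times per second
env_b = tremolo_envelope(4, level=0.6)  # quieter, but swells in parallel with env_a
env_c = tremolo_envelope(7)             # swells at an unrelated rate

print(round(envelope_correlation(env_a, env_b), 3))  # close to 1
print(round(envelope_correlation(env_a, env_c), 3))  # close to 0
```

Components whose envelopes correlate strongly (here `env_a` and `env_b`) are the ones this principle predicts will tend to group, regardless of their overall level; components modulated at unrelated rates (`env_a` and `env_c`) show no such cue.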

Reference

Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. The MIT Press.