
Auditory Scene Analysis: The Perceptual Organization of Sound

The process of parsing the incoming sound signal into a meaningful representation of the environment is called auditory scene analysis. At any moment, the sound reaching your ears is a mixture of many simultaneous sources. How does the auditory system separate these sources into discrete perceptual units?

Auditory Scene Analysis (ASA) describes how our perceptual system parses the incoming complex vibration (sound) to produce a meaningful representation of the environment. In his book, Bregman set out to integrate the existing research on the perceptual organization of sound by connecting it with the "scene analysis" problem encountered in machine vision, showing how a large number of auditory phenomena can be viewed as parts of the process of auditory scene analysis and explained through a limited number of principles of auditory grouping.

For example, in the noisy scene of a city street, some of the sound components reaching your ears may belong to a passing motorcycle, others to ambient traffic noise, and still others to the voices of people on the sidewalk next to you: your auditory system deciphers which is which. The auditory system must also group incoming sound components into units that are delimited in time (segmentation), such as musical notes, and decide which of these to connect into extended sequences such as melodies. This process of grouping or separating sound events over time is called auditory streaming. The elements can be grouped together (integration), separated into co-occurring layers (segregation), or divided into successive events (segmentation).
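The classic laboratory stimulus for studying streaming is a repeating ABA- "gallop" of pure tones, in which a small frequency separation between A and B tends to be heard as one integrated rhythm while a large separation splits into two streams. The Python sketch below synthesizes such a sequence; the function names and parameter choices (sample rate, tone duration, a 7-semitone gap) are my own illustrative assumptions, not values from the book.

```python
import math

def tone(freq, dur, sr=8000):
    """A plain sine tone of `dur` seconds at sample rate `sr`."""
    return [math.sin(2 * math.pi * freq * t / sr) for t in range(int(dur * sr))]

def aba_sequence(f_a=500.0, semitone_gap=7, tone_dur=0.1, repeats=4, sr=8000):
    """Build the ABA- 'galloping' sequence used in streaming experiments:
    tones A and B alternate, with silence in the fourth position.
    Small frequency separations tend to integrate into one gallop;
    large separations tend to segregate into two streams."""
    f_b = f_a * 2 ** (semitone_gap / 12)     # B sits `semitone_gap` semitones above A
    silence = [0.0] * int(tone_dur * sr)
    seq = []
    for _ in range(repeats):
        seq += tone(f_a, tone_dur, sr) + tone(f_b, tone_dur, sr) \
             + tone(f_a, tone_dur, sr) + silence
    return seq

seq = aba_sequence()
```

Varying `semitone_gap` while listening to the result is the usual way to find the boundary between the integrated and segregated percepts.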

Key Principles of Auditory Scene Analysis

Complicated though this task may be, relatively few principles guide the auditory system through it:

  • Harmonicity: Frequencies (or partials) related by simple integer ratios tend to group together. For example, if the auditory scene contains frequencies at 110 Hz, 220 Hz, and 330 Hz (n, 2n, 3n), the auditory system will tend to fuse them together into a single complex sound, whereas frequencies at 110 Hz, 201 Hz, and 350 Hz, which are not related by simple ratios, are less likely to fuse.
  • Amplitude comodulation: Sound components that get louder or softer in parallel tend to group together.
  • Source location: Sound components that originate from the same physical location in space tend to group together.
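As a rough numerical illustration of the harmonicity principle, here is a minimal Python sketch (a toy measure of my own, not anything from Bregman's book): a `harmonicity` score counting the fraction of partials that lie near an integer multiple of the lowest partial, applied to the harmonic and inharmonic frequency sets above.

```python
def harmonicity(partials, tol=0.03):
    """Toy harmonicity score: the fraction of partials lying within
    `tol` (relative to f0) of an integer multiple of the lowest
    partial, which is taken as the fundamental f0."""
    f0 = min(partials)
    hits = 0
    for f in partials:
        n = round(f / f0)                      # nearest harmonic number
        if n >= 1 and abs(f - n * f0) / f0 <= tol:
            hits += 1
    return hits / len(partials)

print(harmonicity([110, 220, 330]))   # 1.0: every partial is an exact multiple
print(harmonicity([110, 201, 350]))   # low: only the 110 Hz partial counts
```

A high score for the first set and a low score for the second mirrors the prediction that the first set fuses into a single complex tone while the second is less likely to.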

For the most part, we are unaware that this process is happening, and take it for granted. But before the auditory scene makes it into your conscious awareness, an amazing feat of pre-attentive analysis has already converted the dizzying complexity of air vibrations around you into a coherent picture of the world.

Chapter 1 introduces the idea of auditory scene analysis (ASA) and links it to the scene analysis problem in vision. The concept of the auditory stream is introduced and compared with the notion of the "object" in vision. Chapter 1 also introduces the idea that the grouping of sounds can be affected by listeners' attentional processes, guided by their schemas (knowledge of types of sounds and their properties), and proposes that the more "primitive" or basic processes of auditory integration and segregation are innate.

Sequential and Simultaneous Integration

ASA depends on two kinds of integration (and segregation) of auditory information: sequential and simultaneous. These are the subjects of Chapters 2 and 3.

Chapter 2 discusses sequential integration in detail, describing the history, methods, and findings that bear on this aspect of ASA. The role of sequential integration is to perceptually connect a subset of the auditory information, collected over time, into a stream that represents a single environmental sound source. The "streaming" phenomenon is presented as a laboratory demonstration that exposes many of the principles of sequential integration. The chapter describes the factors that affect streaming, its consequences for how listeners perceive sounds, the competition among alternative perceptual organizations, and the build-up of grouping over time. Finally, theories of sequential organization are discussed.

Chapter 3 is devoted to the integration of simultaneous events. The role of simultaneous integration is to partition the spectral information received at the same time into one or more concurrent sounds, each with its own qualities. The chapter discusses the causal factors that influence this process, such as harmonic relations and spatial and spectral separations, as well as the perceptual consequences of this type of integration. A powerful principle of grouping, the "old-plus-new heuristic," is shown to be involved in such phenomena as the illusory continuity of softer sounds through louder, interfering sounds.

Chapter 4 discusses the role of attention, expectation, and schemas in ASA and uses these ideas to explain a number of research findings. Bottom-up (stimulus-driven) and top-down (knowledge-driven) influences on ASA work differently and must be distinguished.

Chapter 5 uses the concepts developed in the earlier chapters to show how "primitive" auditory organization is involved in creating the architecture or "texture" of a piece of music. The same principles of grouping and segregation that we have studied in the laboratory can throw light on a number of musical phenomena: melodic coherence, compound melodic lines, phrasing, rhythm, the phenomenal dependency of one note on another, harmony, fusion of organ stops, the rules of counterpoint, the crossing of musical lines, the "control of dissonance", and other issues.

Chapter 6 discusses the role of perceptual organization in speech perception, showing how the acoustic bases of auditory organization influence this process. Research is cited that shows how the vowel quality of sounds can be affected by the grouping of spectral components and how the pitch trajectory contributes to the perceived continuity of an utterance. Concurrent speech sounds can be segregated by differences in fundamental frequency, harmonic relations, and spatial separations. The role of speech schemas in the segregation of concurrent sounds is also considered.

Chapter 7 continues the discussion of speech perception by considering the general question of "exclusive allocation", the principle that a single piece of auditory input can contribute to the mental description of only one sound at any given moment. The phenomenon of duplex perception of speech seems to violate this principle. The chapter examines whether two concurrent percepts can be built from the same acoustic information only when they are constructed by different mental "modules", such as a speech-perception module and a separate sounds-in-space module.

Chapter 8 summarizes the earlier chapters and draws conclusions. It also sets out directions for future research.

Separate chapters thus apply these principles to the study of music (Chapter 5) and of speech (Chapters 6 and 7).

For instance, one audio demonstration lets you hear how our perceptual system tends to group the components of a complex tone based on their frequency comodulation, that is, on partials whose frequencies rise and fall together.
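As a sketch of what such a demonstration might synthesize, the following Python code (the function name and parameter values are my own illustrative assumptions) applies one shared vibrato function to every partial of a complex tone, so that all frequencies rise and fall together, the coherent frequency modulation that promotes perceptual fusion.

```python
import math

def fm_complex(partials, vibrato_rate=5.0, vibrato_depth=0.02,
               duration=0.5, sr=8000):
    """Synthesize a complex tone whose partials share a common frequency
    modulation (coherent vibrato), a cue for grouping the partials into
    a single perceived source."""
    n = int(duration * sr)
    phases = [0.0] * len(partials)
    out = []
    for t in range(n):
        # Every partial is scaled by the same vibrato function, so their
        # frequency excursions are comodulated.
        mod = 1.0 + vibrato_depth * math.sin(2 * math.pi * vibrato_rate * t / sr)
        sample = 0.0
        for i, f in enumerate(partials):
            phases[i] += 2 * math.pi * f * mod / sr   # phase accumulation
            sample += math.sin(phases[i])
        out.append(sample / len(partials))
    return out

tone = fm_complex([220, 440, 660])
```

Applying the vibrato to only a subset of the partials, instead of all of them, is the usual way such demos make that subset pop out as a separate sound.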

Another demonstration shows how we tend to integrate or segregate auditory streams based on their perceived source location (panning).
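A simple way to simulate source location in a stereo demonstration is a constant-power panning law. The sketch below is my own illustration, not code from any particular demo: it computes left and right channel gains for two streams panned to opposite sides, the configuration that favors segregation.

```python
import math

def constant_power_pan(pan):
    """Constant-power panning law. `pan` is in [-1, 1], with -1 = full
    left and +1 = full right. Returns (left_gain, right_gain); the gains
    always satisfy left**2 + right**2 == 1, so loudness stays constant
    as a source moves across the stereo field."""
    theta = (pan + 1) * math.pi / 4        # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)

# Two simulated streams panned apart. ASA predicts these are more
# likely to segregate than the same signals panned to one location.
left_a, right_a = constant_power_pan(-0.8)   # stream A, mostly left
left_b, right_b = constant_power_pan(+0.8)   # stream B, mostly right
```

Sweeping `pan` from -1 toward +1 for one stream while holding the other fixed is an easy way to hear the transition from segregated to integrated percepts.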

This spectrogram represents a 15-second excerpt from Robert Normandeau’s large-scale, multimovement electroacoustic composition Clair de Terre (1999). The movement from which this excerpt is taken is called “Micro-montage” and the music lives up to its title: many brief sound events are juxtaposed or superimposed in a short amount of time. This raises an interesting question: how do we keep track of them all? How does the auditory system make sense of the amazingly complex and ever-changing air vibrations that reach the ear? In this excerpt, I can hear crashing chords, whistling wind, chirping birds, a revving motorcycle, and many more sound sources that I vaguely recognize but am hard-pressed to name. Although this particular panoply occurs in the context of a piece of electroacoustic music, the experience of being bombarded with many different sounds is familiar from what the American psychologist William James called 'the blooming, buzzing confusion' of everyday life.
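For readers curious how such a spectrogram is computed, here is a minimal short-time DFT in pure Python. It is a naive sketch written for clarity (production tools use the FFT and offer many window choices); the frame size, hop, and test signal are my own assumptions. For a pure test tone it recovers energy in the expected frequency bin of every frame.

```python
import math, cmath

def spectrogram(signal, frame_size=256, hop=128):
    """Magnitude spectrogram via a naive short-time DFT with a Hann
    window. Each row is one time frame; columns are frequency bins
    from 0 up to the Nyquist frequency."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        # Apply a Hann window to the current frame.
        frame = [signal[start + i] *
                 0.5 * (1 - math.cos(2 * math.pi * i / (frame_size - 1)))
                 for i in range(frame_size)]
        # Naive DFT of the windowed frame (keep bins 0 .. N/2).
        row = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_size)
                      for i in range(frame_size))
            row.append(abs(acc))
        frames.append(row)
    return frames

# A 440 Hz sine at sr = 8000 should concentrate its energy near bin
# 440 / 8000 * 256 ≈ 14 in every frame.
sr = 8000
sig = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1024)]
spec = spectrogram(sig)
```

Each bin k corresponds to the frequency k * sr / frame_size, which is how a spectrogram maps the vertical axis of plots like the one described above.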

Reference

Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: The MIT Press.