
The Influence of Pitch on Music Perception: A Comprehensive Overview

The ability to perceive pitch is fundamental to the human experience of music, enabling us to recognize melodies, to detect the identity and emotion conveyed by a voice, and to make sense of our auditory environment. A pitch percept is formed by weighting different acoustic cues (e.g., the signal's fundamental frequency and the spacing between its harmonics) and contextual cues (expectation). How these spectrotemporal cues are integrated to solve the mapping from acoustic waveform to pitch percept remains unknown.

This article delves into the intricate relationship between pitch and music perception, examining the various acoustic and contextual cues that contribute to our understanding of pitch. We will explore how these cues are processed in the brain and how they influence our musical experiences.

Acoustic Cues in Pitch Perception

The perception of pitch arises from a number of sources in the acoustic input. Sounds produced by musical instruments or voices comprise a fundamental frequency (F0) as well as a rich harmonic structure at integer multiples of F0. Several key acoustic cues play a significant role in pitch perception:

  • Fundamental Frequency (F0): The lowest periodic component present in the acoustic signal, which has a strong influence on pitch perception. A pure sinusoidal tone that repeats 440 times per second will be perceived as having a pitch of 440 Hz.
  • Interharmonic Spacing: The spacing between adjacent harmonic components, which also exerts a strong influence on pitch perception.

Even if the energy at F0 is removed entirely, listeners perceive the signal as eliciting the same pitch percept. This “missing fundamental” (MF) phenomenon is an important clue to how the auditory system extracts pitch from auditory inputs.
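These two cues can be illustrated with a few lines of synthesis code. The sketch below is illustrative (the sample rate and the choice of harmonics 2 through 6 are my own, not taken from the article): a pure tone carries energy only at F0, while a missing-fundamental tone carries essentially no energy at F0 yet keeps the F0 spacing between its harmonics, which is what drives the F0 pitch percept.

```python
import numpy as np

sr = 16000           # sample rate in Hz (illustrative choice)
dur = 0.3            # 300 ms, matching the tone duration used in the study
t = np.arange(int(sr * dur)) / sr
f0 = 220.0

# Pure tone: all energy at F0 itself.
pure = np.sin(2 * np.pi * f0 * t)

# Missing-fundamental tone: harmonics 2..6 of F0, with no component at F0.
mf = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(2, 7))

# Inspect the MF spectrum: (near-)zero energy at 220 Hz, strong energy at
# 440 Hz, and adjacent harmonics spaced exactly 220 Hz apart.
spectrum = np.abs(np.fft.rfft(mf))
freqs = np.fft.rfftfreq(len(mf), 1 / sr)
energy_at_f0 = spectrum[np.argmin(np.abs(freqs - f0))]
energy_at_2f0 = spectrum[np.argmin(np.abs(freqs - 2 * f0))]
```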

Harmonic Partials
Illustration of harmonic partials on a string, demonstrating the relationship between the fundamental frequency and its overtones.

Neural Encoding of Pitch

Auditory pitch determination begins at the cochlea, where frequencies are mapped to locations along the basilar membrane. This tonotopic organization is preserved as auditory information travels along the afferent auditory pathway. How frequency and harmonic information are represented once they reach the primary auditory cortex remains a point of contention: in some studies, tonotopic maps appear to represent perceived pitch, while in others they track only the frequency of pure tones, with timbre represented separately.

Pitch-selective cortical regions, or “pitch centers,” have been established in lateral Heschl's gyrus, planum temporale, and superior temporal gyrus, as well as in anterior nonprimary areas. This diversity of cortical representation could be driven by stimulus properties, with some areas demonstrating a preference for resolved harmonics or reflecting one specific stimulus feature. Alternatively, these areas could comprise a distributed system of neural ensembles representing pitch over the auditory cortex.


How Our Brains Process Music

The Role of Context

An additional cue to pitch perception is the context in which a sound occurs. Context has been shown to bias perception toward a contextually appropriate interpretation when two acoustic cues are ambiguous or in conflict. Furthermore, when there is a high tonal expectation of the next pitch, the expected pitch is neurally represented even when omitted from the end of a sequence.

This effect is demonstrably present as early as the primary auditory cortex, suggesting that context is a fundamental component of cortical pitch encoding. However, it is unknown how contextual expectation interacts with acoustic content to derive an ultimate pitch representation.

Auditory Cortex
Simplified illustration of the auditory pathway, showing the flow of auditory information from the ear to the auditory cortex.

Cross-Cultural Perspectives on Pitch Perception

Cross-cultural studies of music perception can shed light on the interplay between biological constraints and cultural influences that shape human perception. People accustomed to listening to Western music, which is based on a system of notes organized in octaves, can usually perceive the similarity between the same note played in different registers (say, high C and middle C). However, a longstanding question is whether this is a universal phenomenon or one ingrained by musical exposure.

Now, a new study led by researchers from MIT and the Max Planck Institute for Empirical Aesthetics has found that unlike residents of the United States, people living in a remote area of the Bolivian rainforest usually do not perceive the similarities between two versions of the same note played at different registers (high or low). The findings suggest that although there is a natural mathematical relationship between the frequencies of every “C,” no matter what octave it’s played in, the brain only becomes attuned to those similarities after hearing music based on octaves.

The study also found that members of the Bolivian tribe, known as the Tsimane', and Westerners do have a very similar upper limit on the frequency of notes that they can accurately distinguish, suggesting that this aspect of pitch perception may be independent of musical experience and biologically determined.

Limits of Perception

The study findings also shed light on the upper limits of pitch perception for humans. It has been known for a long time that Western listeners cannot accurately distinguish pitches above about 4,000 hertz, although they can still hear frequencies up to nearly 20,000 hertz. In a traditional 88-key piano, the highest note is about 4,100 hertz.
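That piano figure is easy to verify with the standard piano key-number formula (a well-known relation, not taken from the article), which tunes A4 as key 49 to 440 Hz and steps one equal-tempered semitone per key. It places the top key of an 88-key piano just above the ~4,000 Hz pitch limit:

```python
# Standard piano key-frequency formula: A4 is key 49 at 440 Hz, and each
# key step is one equal-tempered semitone (a factor of 2 ** (1 / 12)).
def piano_key_freq(n):
    return 440.0 * 2 ** ((n - 49) / 12)

highest = piano_key_freq(88)  # C8, roughly 4186 Hz
lowest = piano_key_freq(1)    # A0, exactly 27.5 Hz
```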

The researchers found that although Tsimane’ musical instruments usually have upper limits much lower than 4,000 hertz, Tsimane’ listeners could distinguish pitches very well up to about 4,000 hertz, as evidenced by accurate sung reproductions of those pitch intervals. Above that threshold, their perceptions broke down, very similarly to Western listeners.

One possible explanation for this limit is that once frequencies reach about 4,000 hertz, the firing rates of the neurons of our inner ear can’t keep up and we lose a critical cue with which to distinguish different frequencies.

Experimental Stimuli and Design

To disentangle three primary cues to pitch perception (F0, interharmonic spacing, and tonal context), MEG data were collected from 28 healthy adults presented with sequences of pitch-matched tones with varying spectral content, in and out of context. Pure tones and missing fundamental (MF) complex tones were used to represent F0 pitch cues and interharmonic spacing pitch cues, respectively. Ambiguous tones were used to investigate the neural representation of pitch under acoustic ambiguity.

Pure tones consisted of a single frequency component at F0 with no additional harmonics, for a total of 17 unique frequencies. Complex missing fundamental (MF) tones consisted of five consecutive integer multiples (partials) of the fundamental frequency, excluding the fundamental itself, for each of the 17 pitches. For example, a complex MF tone with energy at 440, 660, 880, 1,100, and 1,320 Hz has a spacing of 220 Hz between adjacent harmonics and will be perceived as having a pitch of 220 Hz, even though no energy is present at 220 Hz. These missing fundamental tones had a flat spectral envelope, with all five harmonics equally attenuated.

To build ambiguous tones, Diana Deutsch's adaptation of the Shepard tone illusion was used: the amplitudes of odd and even harmonics are varied systematically across the semitones spanning one musical octave, yielding a set of seven tones corresponding to the notes of a diatonic scale. Because the tones are ambiguous, playing the sequence repeatedly evokes an auditory barber-pole effect: the scale sounds as if it ascends or descends continuously, with no perceptible break at the point where the sequence repeats. Concretely, as the tones decrease in fundamental frequency, the even harmonics remain constant in amplitude while the odd harmonics decrease. This creates competing pitch cues: a harmonic spacing of 2 × F0 between the remaining strong (even) harmonics, and a highly attenuated, albeit present, F0.
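The odd/even weighting can be sketched as follows. This uses a simple linear attenuation of my own invention, not Deutsch's published amplitude values; it only illustrates the qualitative scheme the text describes, in which odd harmonics (including F0) fade as the tones descend while even harmonics stay fixed.

```python
# Illustrative sketch of the odd/even harmonic weighting described above:
# a simple linear attenuation, NOT Deutsch's exact amplitude values.
def harmonic_amplitudes(step, n_steps=7, n_harmonics=6):
    """Amplitudes of harmonics 1..n_harmonics at scale position `step`
    (step 0 = highest tone). Odd harmonics, including F0 itself, fade as
    F0 descends; even harmonics keep a fixed amplitude, so the 2*F0
    spacing between them becomes the dominant pitch cue."""
    odd_gain = 1.0 - step / n_steps
    return [1.0 if k % 2 == 0 else odd_gain
            for k in range(1, n_harmonics + 1)]
```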

Pure, missing fundamental, and ambiguous tones were synthesized in three major keys: A3 (220-440 Hz), C4 (262-524 Hz), and Eb4 (312-624 Hz). Frequency values were calculated according to the equal-tempered scale, in which adjacent semitones are separated by a constant frequency ratio of 2^(1/12). These frequencies ranged from 220 to 624 Hz, for a total of 17 unique pitches.
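Under equal temperament, the frequency n semitones above a reference is f_ref × 2^(n/12). A short sketch (the helper name and constant are my own) reproduces the eight tones of the A3 major scale spanning 220 to 440 Hz:

```python
# Equal-tempered pitch: adjacent semitones differ by a constant ratio of
# 2 ** (1 / 12), so n semitones above f_ref gives f_ref * 2 ** (n / 12).
def semitone_above(f_ref, n):
    return f_ref * 2 ** (n / 12)

# Major-scale degrees in semitones above the tonic (whole/half-step pattern).
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11, 12]

# Eight tones of the A3 major scale, spanning one octave from 220 to 440 Hz.
a3_major = [semitone_above(220.0, n) for n in MAJOR_STEPS]
```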

All tones were presented for a duration of 300 ms, with a 200 ms silent interstimulus interval. Pure and complex MF blocks each contained nine miniblocks: up, down, and random for each of the three musical scales. Each miniblock repeated the eight tones 15 times, with tone order depending on the condition. In the up and down conditions, the tones were presented in ascending and descending order, respectively, with a jittered 500-750 ms of silence in between each miniblock. In the random condition, the eight tones were presented in random order 15 times. All miniblocks were shuffled within the larger pure tone and complex MF tone blocks.
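The block structure above can be sketched schematically. This follows the counts given in the text (nine miniblocks, eight tones, 15 repetitions); the variable names and the per-repetition shuffle in the random condition are my own reading, not the authors' code:

```python
import random

# One block per tone type (pure or complex MF) contains nine miniblocks:
# up, down, and random for each of the three musical scales.
scales = ["A3", "C4", "Eb4"]
conditions = ["up", "down", "random"]
block = [(s, c) for s in scales for c in conditions]

def miniblock(tones, condition, reps=15, rng=random):
    """Presentation order for one miniblock: 15 repetitions of the scale's
    tones in ascending, descending, or (per repetition) random order."""
    order = []
    for _ in range(reps):
        if condition == "up":
            order += sorted(tones)
        elif condition == "down":
            order += sorted(tones, reverse=True)
        else:
            rep = list(tones)
            rng.shuffle(rep)
            order += rep
    return order
```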

Participants were presented with the ambiguous tones in a scale condition which mirrored that of the pure and complex MF tones. Because there are only seven ambiguous tones per scale, tones were presented in ascending (up condition) and descending (down condition) order. As in the pure and complex MF scale conditions, repetitions were separated by a pause of 500-750 ms. Lastly, during the random condition, the seven tones were presented in random order 15 times. These 15 miniblocks were shuffled.

All tones were presented binaurally to participants through earphones in the magnetically shielded room. Participants listened passively while watching silent videos of nature scenes. Eye tracking was used to ensure participants were not falling asleep. Block order was counterbalanced across participants. The experiment lasted ∼80 min.

Experimental Conditions
Tone Type                        Description                                    Acoustic Cue
Pure tones                       Single frequency component at F0               F0
Missing fundamental (MF) tones   Five integer multiples of F0, without F0       Interharmonic spacing
Ambiguous tones                  Varied amplitudes of odd and even harmonics    Acoustic ambiguity