Melodic Expectation in Music: Exploring Contour Perception Across Auditory Dimensions

The question of what makes a good melody has intrigued composers, music theorists, and psychologists alike. Cognitive psychologists have observed that when listeners encounter a sequence of notes, they quickly develop expectations about how the melodic sequence will continue. These expectations are based on prior exposure to the melody or on more general acquired or innate principles.

Studies of melodic expectation have identified two basic categories of expectations: those based on the perceived musical key (tonality) and those based on contour. Contour refers to the pattern of directions (up or down) of the intervals in a melody. While tonality influences melodic expectation, many established principles relate only to melodic contour.

Novel melodies are more easily distinguished from melodies with different contours than from melodies with similar contours. Preference and expectation for melodies are distinct concepts, but are closely related, as expected continuations are more likely to be preferred than unexpected continuations.

Melodic preferences, particularly those related to tonality, are likely to be culturally specific and so may depend on exposure to certain forms of music and melodies. Other preferences and expectations, particularly those related to melodic contour, may reflect more general perceptual principles related to the formation of auditory streams, and may not be specific to melodies or even music.

One way to test whether melodic contour expectations are domain specific, or whether they reflect more general perceptual principles, is to generate contours in dimensions other than pitch. Pitch is a perceptual auditory dimension primarily related to a sound’s overall periodicity or fundamental frequency (F0). The auditory dimension of brightness is an aspect of timbre related to the center of mass of a sound’s spectral envelope (sounds with more energy in the high-frequency range of the spectrum are perceived as being brighter). Loudness is primarily related to a sound’s intensity.

Among these dimensions of sound, pitch is unique in that it can be classified according to both pitch height (a linear scale) and pitch chroma (a circular scale that repeats with every doubling of F0). Furthermore, perceived relationships between pitches form tonal hierarchies: Western listeners, especially those with musical training, judge notes belonging to an established musical scale as better “completions” following that scale.
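The distinction between pitch height and pitch chroma can be illustrated with a short sketch. The 440 Hz reference and the function names are assumptions chosen for illustration; they are not taken from the study.

```python
import math

REFERENCE_HZ = 440.0  # A4; the reference is an arbitrary choice for this sketch


def pitch_height(f0_hz: float) -> float:
    """Pitch height in semitones relative to the reference (a linear scale)."""
    return 12.0 * math.log2(f0_hz / REFERENCE_HZ)


def pitch_chroma(f0_hz: float) -> float:
    """Pitch chroma in semitones on a 12-step circle; repeats with every doubling of F0."""
    return pitch_height(f0_hz) % 12.0


# Three tones an octave apart differ in height but share the same chroma.
for f0 in (220.0, 440.0, 880.0):
    print(f0, pitch_height(f0), pitch_chroma(f0))
```

Brightness and loudness have a counterpart only to `pitch_height`: nothing in those dimensions repeats with each doubling the way chroma does.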

In the dimensions of brightness and loudness, there are no analogies to pitch chroma or tonal hierarchy, only to pitch height. To the extent that melodic expectations are influenced by tonality, they should not be replicable in other auditory dimensions.

In this study we asked whether the same expectations that have been discovered for melodic contours in pitch also apply to contours in brightness and loudness. In two experiments, we presented our participants with 3-tone “melodies” that varied in pitch, brightness, or loudness, and we asked them to judge how well the final note of the melody completed the sequence.

Against these results, we tested three well-established rules of melodic continuation, derived from music theory and from cognitive studies based on pitch variations. If expectations for melodic contour extend beyond the pitch dimension, then we would expect listeners’ judgments to conform to the predictions of these rules, not only for pitch sequences, but also for sequences based on brightness and loudness.

Experiment Design and Stimuli

Harmonic complex tones were shaped with spectral envelopes determined by applying a Gaussian weighting function to the amplitudes of the individual harmonics. The standard deviation of the Gaussian was set to 25% of its center frequency. All the tones were gated on and off with 20-ms raised-cosine ramps. The tones were generated within MATLAB (The Mathworks, Natick, MA) and were played out from a 24-bit L22 soundcard (LynxStudio, Costa Mesa, CA) to both ears through HD580 headphones (Sennheiser USA, Old Lyme, CT), at a sampling rate of 48 kHz.

Pitch variations were achieved by varying the F0 of the tones; brightness variations were achieved by varying the center frequency of the Gaussian weighting function; and loudness variations were achieved by varying the overall sound level of the tones.
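The stimulus description above can be sketched in code. The original tones were generated in MATLAB; the Python below is only an illustrative approximation. It assumes a Gaussian defined on a linear frequency axis, sine-phase harmonics, and a level expressed in dB relative to a peak-normalized waveform (absolute dB SPL depends on hardware calibration). All function and parameter names are hypothetical.

```python
import math

SAMPLE_RATE = 48_000  # Hz, as reported in the text


def gaussian_weight(freq_hz, center_hz, sd_fraction=0.25):
    """Amplitude weight for one harmonic; SD is 25% of the center frequency.
    Assumption: the Gaussian is defined on a linear frequency axis."""
    sd = sd_fraction * center_hz
    return math.exp(-0.5 * ((freq_hz - center_hz) / sd) ** 2)


def raised_cosine_gate(n_samples, ramp_s=0.020, sample_rate=SAMPLE_RATE):
    """Envelope with 20-ms raised-cosine onset and offset ramps."""
    n_ramp = int(round(ramp_s * sample_rate))
    gate = [1.0] * n_samples
    for i in range(min(n_ramp, n_samples // 2)):
        g = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        gate[i] = g
        gate[n_samples - 1 - i] = g
    return gate


def harmonic_complex(f0_hz, center_hz, duration_s, level_db=0.0,
                     sample_rate=SAMPLE_RATE):
    """Gated harmonic complex tone as a list of samples.
    f0_hz sets pitch, center_hz sets brightness, level_db sets relative level."""
    nyquist = sample_rate / 2.0
    partials = []
    k = 1
    while k * f0_hz < nyquist:
        w = gaussian_weight(k * f0_hz, center_hz)
        if w > 1e-4:  # skip negligible harmonics to save computation
            partials.append((k * f0_hz, w))
        k += 1
    n = int(round(duration_s * sample_rate))
    gate = raised_cosine_gate(n, sample_rate=sample_rate)
    wave = []
    for i in range(n):
        t = i / sample_rate
        s = sum(w * math.sin(2.0 * math.pi * f * t) for f, w in partials)
        wave.append(gate[i] * s)
    peak = max((abs(x) for x in wave), default=0.0)
    gain = 10.0 ** (level_db / 20.0)  # dB re. the peak-normalized waveform
    return [gain * x / peak for x in wave] if peak > 0 else wave


tone = harmonic_complex(f0_hz=196.0, center_hz=1000.0, duration_s=0.35)
```

In this sketch each dimension is controlled by one argument, which mirrors how the text describes varying pitch, brightness, and loudness independently.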

Harmonic complex tones

The first step in designing the stimuli was to create broadly equivalent “scales” in the three dimensions of pitch, brightness, and loudness. This was achieved by using scale step sizes of 1 semitone (~6%) for F0, 2 semitones for the center frequency of the Gaussian weighting function, and 2 dB for the overall sound pressure level.

The step sizes were selected to be approximately equally salient, based on previously reported interval-discrimination thresholds for pitch, timbre, and loudness. It is important to note here that by “scale” we do not mean a musical key or any other kind of tonal hierarchy. Those elements of pitch melodies cannot be meaningfully translated into brightness or loudness melodies, since there is no analog to pitch chroma in those dimensions.

The scale for each dimension spanned 27 steps. In pitch, the F0s ranged from G3 (196 Hz) to A5 (880 Hz) in 1-semitone steps (an equal-temperament tuning including the A440 pitch standard). In brightness, the center frequency of the Gaussian function ranged from 196 Hz to 3951 Hz, in 2-semitone steps. In loudness, the overall level ranged from 30 to 82 dB SPL, in 2-dB steps.
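As a check on these ranges, the three scales can be reconstructed from the stated step sizes. This is a minimal sketch: the variable names are illustrative, and anchoring the pitch scale at A4 = 440 Hz is an assumption based on the text's mention of the A440 standard.

```python
N_VALUES = 27  # scale values per dimension


def semitone_ratio(n_semitones: float) -> float:
    """Frequency ratio for an interval of n equal-tempered semitones."""
    return 2.0 ** (n_semitones / 12.0)


# Pitch: 1-semitone steps, with A4 = 440 Hz at index 14 so that index 0 is G3.
pitch_f0_hz = [440.0 * semitone_ratio(k - 14) for k in range(N_VALUES)]

# Brightness: 2-semitone steps of the Gaussian center frequency, starting at G3.
brightness_center_hz = [pitch_f0_hz[0] * semitone_ratio(2 * k) for k in range(N_VALUES)]

# Loudness: 2-dB steps starting at 30 dB SPL.
loudness_db_spl = [30.0 + 2.0 * k for k in range(N_VALUES)]

print(round(semitone_ratio(1), 4))  # one semitone is a ratio of about 1.0595 (~6%)
print(round(pitch_f0_hz[0]), round(pitch_f0_hz[-1]))
print(round(brightness_center_hz[0]), round(brightness_center_hz[-1]))
print(loudness_db_spl[0], loudness_db_spl[-1])
```

Each list holds 27 values separated by 26 steps, which reproduces the endpoints quoted in the text: 196 to 880 Hz, 196 to 3951 Hz, and 30 to 82 dB SPL.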

The range of these scales was determined by various constraints. First, the minimum and maximum loudness values were chosen to be easily audible and not uncomfortable, respectively. This level range, combined with the step size of 2 dB, allowed for 27 scale steps. The same number of steps was then used for all three dimensions. The F0 range was selected to fall within the range normally used for melodies in Western music.

Scale Ranges for Pitch, Brightness, and Loudness
Dimension | Scale Range | Step Size
Pitch (F0) | G3 (196 Hz) to A5 (880 Hz) | 1 semitone (~6%)
Brightness (Spectral Center) | 196 Hz to 3951 Hz | 2 semitones
Loudness (SPL) | 30 dB to 82 dB | 2 dB

Once the scales were established, we adapted a paradigm that was used in an earlier study for generating pitch melodies to create melodies in the pitch, brightness, or loudness dimension. Melodies consisted of three notes each. The first two notes comprised the context interval. The third note is referred to as the “continuation tone”.

The same eight context intervals originally used by Cuddy and Lunney (1995) were used. In Western music, these intervals in pitch are referred to as the ascending and descending forms of the major second, minor third, major sixth, and minor seventh. These intervals correspond to the following number of steps respectively: ±2, ±3, ±9, and ±10 steps. For each context interval, every continuation tone from 12 steps below to 12 steps above the second tone (25 intervals total) was tested for a total of 200 trials (8 context intervals by 25 continuation intervals).

In every melody, the value of the second note was selected from a set of three equally probable values, corresponding to the three centermost values in the pitch, loudness, or brightness range. In pitch, for example, the second note of every melody was randomly sampled from the set of G4 (392.0 Hz), G#4/Ab4 (415.3 Hz), and A4 (440 Hz). The values of the first and third notes were then determined based on the value of the second note and the necessary interval sizes and directions for each trial.
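The trial structure described above can be sketched as follows. Notes are represented as indices 0 to 26 into a 27-value scale, so the same list applies to any of the three dimensions. The function name, the dictionary layout, and the use of a seeded `random.Random` are assumptions made for illustration.

```python
import random

CONTEXT_INTERVALS = [-10, -9, -3, -2, 2, 3, 9, 10]  # steps from note 1 to note 2
CONTINUATION_INTERVALS = list(range(-12, 13))       # steps from note 2 to note 3
CENTER_INDICES = [12, 13, 14]                       # three centermost of 27 scale values


def build_trials(rng: random.Random):
    """Return the 200 three-note melodies for one dimension, in random order."""
    trials = []
    for context in CONTEXT_INTERVALS:
        for continuation in CONTINUATION_INTERVALS:
            second = rng.choice(CENTER_INDICES)  # equally probable center values
            first = second - context             # so that second - first == context
            third = second + continuation
            trials.append({
                "context": context,
                "continuation": continuation,
                "notes": (first, second, third),
            })
    rng.shuffle(trials)  # context and continuation both vary from trial to trial
    return trials


trials = build_trials(random.Random(0))
print(len(trials))
```

With the second note restricted to indices 12 to 14, the largest context interval (10 steps) and the largest continuation interval (12 steps) both stay within the 27-value scale. This is consistent with the range constraints the text describes.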

The three notes were presented with the temporal relationships shown in Fig. 2B. The duration of each note (including onset and offset ramps) was 1150, 350, and 750 ms, respectively.

Melody Scales

Visual representation of the scales used for F0 (for pitch melodies), spectral center (for brightness melodies), and level (for loudness melodies).

Participants and Procedure

Eighteen listeners, 5 male and 13 female, were recruited from the Twin Cities campus of the University of Minnesota. Listeners ranged in age from 18 to 31 (M = 20.8, SD = 3.0). The average amount of musical training was 6.5 years (SD = 4.8; range 0 to 13 years). The five participants who reported the lowest amount of musical training (either 0 or 1 years) and the four participants who reported the highest amount of musical training (either 12 or 13 years) were taken as an approximation of the lower and upper quartiles, respectively, of participants ranked by musical experience.

Listeners gave subjective continuation ratings for 200 three-tone sequences each in pitch, brightness and loudness (600 total). After each sequence, the listener was asked to rate how well the third tone met expectations on a Likert scale from −3 (“Very Poorly”) to 3 (“Very Well”).

Experiment 1 deviated from the paradigm established by Cuddy and Lunney (1995) in two ways. Firstly, the previous study presented the 200 possible melodies in blocks based on context interval size, such that all melodies beginning with the 9-steps-ascending context interval were heard in immediate succession. To avoid possible long-term context effects associated with presenting the same stimulus repeatedly, we randomized the presentation of the 8 different context intervals from trial to trial, just as the presentation of the 25 different continuation tones was randomized from trial to trial.

Secondly, Cuddy and Lunney (1995) set the second note of their melodies as equal to C4 or F#4, alternating every other trial. With our selected step size in loudness (2 dB), the range required to follow this convention exactly would have been impossible to attain without presenting sounds that were either uncomfortably loud or inaudibly soft. Instead, the second note was drawn from the three centermost values of each scale, as described above.

The 200 trials in each condition were presented in a different random order for each participant and dimension.

Predictors of Melodic Continuation

Certain contour-based principles of melodic continuation have been well established and supported by previous studies of melodic continuation in pitch. The first predictor, Proximity, refers to the difference, in terms of scale steps, between the second and third notes, where positive values indicate that the third note was higher than the second. Previous research on pitch-based melodic expectancy has found that small absolute values of Proximity are more expected than large ones.

The second predictor, Inertia, corresponds to an expectation for pitch-based melodies to continue in the same direction after a small step. The third predictor, Post-skip Reversal, reflects the tendency for a melody to move in the opposite direction following a large leap.

Among the contour-based predictors, we selected Proximity because it is one of the most broadly supported by evidence. Post-skip Reversal is also well supported, though there is the question of whether it merely represents regression to the mean. There is less evidence for Inertia, with some studies, including the study on which ours was modeled, finding no support for it.

These are far from the only principles of melodic continuation that are supported by evidence, and there were alternative predictor variables we could have selected. However, many of these are disqualified from the present study because they are based on tonality, and as such there is no way to evaluate them in the dimensions of brightness and loudness.

For example, one well-supported predictive principle favors continuation tones that are the tonic (primary) note of a musical key containing the previous two notes. But this predictor could not be applied to brightness or loudness sequences, as musical keys cannot be formed in those dimensions.

We expected that lower absolute values of Proximity would lead to higher ratings, and that both Post-skip Reversal and Inertia would be generally supported by our data. Our primary hypothesis, however, was that listeners’ expectations for contours in loudness and brightness would be similar to their expectations for contours in pitch.

To evaluate the strength of these principles against our data, we coded each melody heard by listeners with a value indicating the degree to which that melody fulfilled each principle. Proximity was coded as the signed difference, in steps, between the second and third notes in a melody. For example, if the second and third notes were the same, Proximity was 0, and if the third note was 12 steps down from the second note, Proximity was −12.

Inertia was coded as True when a small interval (2 or 3 steps) was followed by a continuation in the same direction, False when a small interval was followed by a continuation in the opposite direction, and Neutral for any large context interval (9 or 10 steps).
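The coding of these two predictors can be sketched as below. Proximity is taken as the signed step difference, following the worked example in the text (a continuation 12 steps down gives −12). The function names and string labels are illustrative. The text does not say how a repeated note after a small interval is coded for Inertia, so the sketch marks that case "Unspecified" rather than guessing.

```python
SMALL_INTERVALS = (2, 3)   # context sizes, in steps, treated as small
LARGE_INTERVALS = (9, 10)  # context sizes, in steps, treated as large


def proximity(continuation_steps: int) -> int:
    """Signed difference in steps between the second and third notes."""
    return continuation_steps


def inertia(context_steps: int, continuation_steps: int) -> str:
    """Code Inertia as 'True', 'False', or 'Neutral' for one melody."""
    if abs(context_steps) in LARGE_INTERVALS:
        return "Neutral"
    if abs(context_steps) not in SMALL_INTERVALS:
        raise ValueError("context interval is not one of the eight used")
    if continuation_steps == 0:
        # Assumption: the source does not describe this case.
        return "Unspecified"
    same_direction = (context_steps > 0) == (continuation_steps > 0)
    return "True" if same_direction else "False"


print(proximity(-12), inertia(2, 3), inertia(-3, 4), inertia(9, 1))
```

Small absolute values of `proximity` correspond to the continuations that earlier pitch studies found most expected.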

Melodic Expectation and Tension in Music

A model of melodic expectation is proposed. The model assigns ratings to the expectedness of melodic events. The ratings depend on the hierarchic implementation of three primary factors (stability, proximity, and direction) and one secondary factor (mobility). The model explicitly links expectancy ratings to aspects of listeners' experiences of tension in melody. An approach to temporal expectations is discussed but not quantified. The model is situated within a framework for thinking about a type of schematic melodic expectations. This article assesses the position of these expectations within the broader cognitive processes invoked in listening to music. It suggests methods for investigating the expectations empirically.