
Tonal Perception: Definition and Linguistic Significance

The use of pitch in language to distinguish lexical or grammatical meaning is known as tone. In essence, tone differentiates or inflects words. All oral languages employ pitch to convey emotional and other para-linguistic information, as well as to express emphasis and contrast, a feature known as intonation. However, not all languages utilize tones to distinguish words or their inflections in a manner analogous to consonants and vowels.

Languages that utilize tone are called tonal languages. The distinctive tone patterns in these languages are sometimes referred to as tonemes, drawing a parallel with phonemes. While most languages use pitch as intonation to convey prosody and pragmatics, this does not automatically classify them as tonal languages.

In tonal languages, each syllable possesses an inherent pitch contour, leading to the existence of minimal pairs (or larger minimal sets) between syllables that share the same segmental features (consonants and vowels) but differ in tone.

The accurate perception of suprasegmental information is also crucial in language comprehension, as it can provide listeners with important linguistic cues, such as affective-prosodic cues and prosodic phrasing. This is particularly relevant in tonal languages, such as Chinese and Thai, where pitch differences are used to differentiate words.

Understanding Tonal Languages: A Beginner's Guide

How Tone Works

Tone is most frequently manifested on vowels, but in most tonal languages where voiced syllabic consonants occur they will bear tone as well. This is especially common with syllabic nasals, for example in many Bantu and Kru languages, but also occurs in Serbo-Croatian.

In a number of East Asian languages, tonal differences are closely intertwined with phonation differences. In Vietnamese, for example, the ngã and sắc tones are both high-rising but the former is distinguished by having glottalization in the middle. Similarly, the nặng and huyền tones are both low-falling, but the nặng tone is shorter and pronounced with creaky voice at the end, while the huyền tone is longer and often has breathy voice.

In some languages, such as Burmese, pitch and phonation are so closely intertwined that the two are combined in a single phonological system, where neither can be considered without the other. The distinctions of such systems are termed registers.

Tonal contrasts have long been treated as differences in pitch height. However, several studies have pointed out that tone is multidimensional: contour, duration, and phonation may all contribute to the differentiation of tones.

Many languages use tone in a more limited way. In Japanese, fewer than half of the words have a drop in pitch; words contrast according to which syllable this drop follows. Such minimal systems are sometimes called pitch accent since they are reminiscent of stress accent languages, which typically allow one principal stressed syllable per word.

Both lexical or grammatical tone and prosodic intonation are cued by changes in pitch, as well as sometimes by changes in phonation. Lexical tone coexists with intonation, with the lexical changes of pitch like waves superimposed on larger swells. For example, Luksaneeyanawin (1993) describes three intonational patterns in Thai: falling (with semantics of "finality, closedness, and definiteness"), rising ("non-finality, openness and non-definiteness") and "convoluted" (contrariness, conflict and emphasis).

Languages with simple tone systems or pitch accent may have one or two syllables specified for tone, with the rest of the word taking a default tone. Such languages differ in which tone is marked and which is the default. In Navajo, for example, syllables have a low tone by default, whereas marked syllables have high tone.

In many Bantu languages, tones are distinguished by their pitch level relative to each other. In multisyllable words, a single tone may be carried by the entire word rather than a different tone on each syllable.

In the most widely spoken tonal language, Mandarin Chinese, tones are distinguished by their distinctive shape, known as contour, with each tone having a different internal pattern of rising and falling pitch. Many words, especially monosyllabic ones, are differentiated solely by tone. In a multisyllabic word, each syllable often carries its own tone.

Most tonal languages have a combination of register and contour tones.

Tone is typical of several language groups, including Kra-Dai, Vietic, Sino-Tibetan, Afroasiatic, Khoisan, Niger-Congo and Nilo-Saharan.

Most varieties of Chinese use contour tones, where the distinguishing feature of the tones is their shift in pitch (that is, the pitch is a contour), such as rising, falling, dipping, or level. Most Bantu languages (except northwestern Bantu), on the other hand, have simpler tone systems, usually with high, low and one or two contour tones (usually on long vowels). In such systems there is a default tone, usually low in a two-tone system or mid in a three-tone system, that is more common and less salient than other tones.

Falling tones tend to fall further than rising tones rise; high-low tones are common, whereas low-high tones are quite rare. A language with contour tones will also generally have at least as many falling tones as rising tones.

Another difference between tonal languages is whether the tones apply independently to each syllable or to the word as a whole. In Cantonese, Thai, and Kru languages, each syllable may have a tone, whereas in Shanghainese, Swedish, Norwegian and many Bantu languages, the contour of each tone operates at the word level. That is, a trisyllabic word in a three-tone syllable-tone language has many more tonal possibilities (3 × 3 × 3 = 27) than a monosyllabic word (3), but there is no such difference in a word-tone language.
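The arithmetic in this comparison can be sketched in a few lines of Python. The function name `tonal_possibilities` and its `word_tone` flag are hypothetical, and the count is an idealization that ignores tone sandhi and any restrictions on which tones may co-occur.

```python
def tonal_possibilities(n_tones: int, n_syllables: int, word_tone: bool = False) -> int:
    """Count the tone patterns available to a word (idealized).

    Syllable-tone language: every syllable picks a tone independently,
    so the count grows as n_tones ** n_syllables.
    Word-tone language: one tone pattern is chosen for the whole word,
    so the count stays at n_tones regardless of word length.
    """
    if n_tones < 1 or n_syllables < 1:
        raise ValueError("n_tones and n_syllables must be positive")
    return n_tones if word_tone else n_tones ** n_syllables


for syllables in (1, 3):
    per_syllable = tonal_possibilities(3, syllables)
    per_word = tonal_possibilities(3, syllables, word_tone=True)
    print(f"{syllables} syllable(s): syllable-tone={per_syllable}, word-tone={per_word}")
```

With three tones, the syllable-tone count rises from 3 to 27 as the word grows from one to three syllables, while the word-tone count stays at 3.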

Tone sandhi is an intermediate situation, as tones are carried by individual syllables, but affect each other so that they are not independent of each other. For example, a number of Mandarin Chinese suffixes and grammatical particles have what is called (when describing Mandarin Chinese) a "neutral" tone, which has no independent existence.

After high level and high rising tones, the neutral syllable has an independent pitch that looks like a mid-register tone - the default tone in most register-tone languages. However, after a falling tone it takes on a low pitch; the contour tone remains on the first syllable, but the pitch of the second syllable matches where the contour leaves off. And after a low-dipping tone, the contour spreads to the second syllable: the contour remains the same (˨˩˦) whether the word has one syllable or two. In other words, the tone is now the property of the word, not the syllable.

Lexical tones are used to distinguish lexical meanings. Languages may distinguish up to five levels of pitch, though the Chori language of Nigeria is described as distinguishing six surface tone registers.

Since tone contours may involve up to two shifts in pitch, there are theoretically 5 × 5 × 5 = 125 distinct tones for a language with five registers. Several Kam-Sui languages of southern China have nine contrastive tones, including contour tones. For example, the Kam language has 9 tones: 3 more-or-less fixed tones (high, mid and low); 4 unidirectional tones (high and low rising, high and low falling); and 2 bidirectional tones (dipping and peaking). This assumes that checked syllables are not counted as having additional tones, as they traditionally are in China.
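As a rough check on the 5 × 5 × 5 figure, the sketch below enumerates every start, middle and end pitch on a five-level scale and sorts the results into the shape categories used for Kam above. Representing a tone as exactly three pitch points and the `classify` helper are assumptions made for illustration; real tone inventories use only a small fraction of these possibilities.

```python
from collections import Counter
from itertools import product

LEVELS = range(1, 6)  # five pitch registers, 1 = lowest, 5 = highest


def classify(start: int, middle: int, end: int) -> str:
    """Name the shape of a three-point pitch contour."""
    if start == middle == end:
        return "level"
    if start <= middle <= end:
        return "rising"    # unidirectional, upward
    if start >= middle >= end:
        return "falling"   # unidirectional, downward
    if middle < start and middle < end:
        return "dipping"   # bidirectional: down, then up
    return "peaking"       # bidirectional: up, then down


contours = list(product(LEVELS, repeat=3))
shapes = Counter(classify(*contour) for contour in contours)

print(len(contours))  # total number of theoretical contours
for shape, count in sorted(shapes.items()):
    print(shape, count)
```

The enumeration yields 125 contours in total: 5 level, and 30 each of rising, falling, dipping and peaking.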

Preliminary work on the Wobe language (part of the Wee continuum) of Liberia and Côte d'Ivoire, the Ticuna language of the Amazon and the Chatino languages of southern Mexico suggests that some dialects may distinguish fourteen or more tones. The Guere, Dan and Mano languages of Liberia and Côte d'Ivoire have around ten tones. The Oto-Manguean languages of Mexico also have very large tone inventories.

Tones are realized as pitch only in a relative sense. "High tone" and "low tone" are only meaningful relative to the speaker's vocal range and in comparing one syllable to the next, rather than as a contrast of absolute pitch such as one finds in music.
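To make the relative nature of tone concrete, the sketch below places a fundamental frequency (F0) value on a five-level scale relative to a speaker's own range. The logarithmic mapping, the function name `five_level` and the two example speaker ranges are illustrative assumptions, not a normalization method described here.

```python
import math


def five_level(f0_hz: float, f0_min_hz: float, f0_max_hz: float) -> int:
    """Map an F0 value to a level from 1 (lowest) to 5 (highest).

    The level depends on where f0_hz falls within the speaker's own range
    (f0_min_hz to f0_max_hz), measured on a logarithmic frequency scale.
    """
    if not 0 < f0_min_hz < f0_max_hz:
        raise ValueError("expected 0 < f0_min_hz < f0_max_hz")
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    position = math.log(f0_hz / f0_min_hz) / math.log(f0_max_hz / f0_min_hz)
    position = min(max(position, 0.0), 1.0)  # clamp values outside the range
    return min(5, int(position * 5) + 1)


# Hypothetical speakers: the same 200 Hz is the top of one range
# and the bottom of the other.
print(five_level(200, f0_min_hz=100, f0_max_hz=200))
print(five_level(200, f0_min_hz=200, f0_max_hz=400))
```

Under these assumptions, an identical 200 Hz pitch counts as the highest level for the first speaker and the lowest for the second, which is the sense in which "high" and "low" are relative.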

Tones may affect each other just as consonants and vowels do. In many register-tone languages, low tones may cause a downstep in following high or mid tones; the effect is such that even while the low tones remain at the lower end of the speaker's vocal range (which is itself descending due to downdrift), the high tones drop incrementally like steps in a stairway or terraced rice fields, until finally the tones merge and the system has to be reset. Sometimes a tone may remain as the sole realization of a grammatical particle after the original consonant and vowel disappear, so it can only be heard by its effect on other tones. It may cause downstep, or it may combine with other tones to form contours. In many contour-tone languages, one tone may affect the shape of an adjacent tone. The affected tone may become something new, a tone that only occurs in such situations, or it may be changed into a different existing tone. This is called tone sandhi.

Tone sandhi is a compulsory change that occurs when certain tones are juxtaposed. Tone change, however, is a morphologically conditioned alternation and is used as an inflectional or a derivational strategy.
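A well-known instance of tone sandhi, not discussed above, is the Mandarin third-tone rule: when two Tone 3 syllables are adjacent, the first is pronounced as Tone 2 (so nǐ hǎo is pronounced ní hǎo). The sketch below is a deliberately simplified version. The function name is hypothetical, and the rule is applied to every Tone 3 that precedes an underlying Tone 3, whereas real speech groups longer Tone 3 sequences by prosodic structure.

```python
def apply_third_tone_sandhi(tones: list[int]) -> list[int]:
    """Return surface tones after a simplified Mandarin third-tone sandhi.

    tones holds one tone number (1-4) per syllable. A Tone 3 becomes Tone 2
    whenever the following syllable is underlyingly Tone 3.
    """
    surface = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            surface[i] = 2
    return surface


print(apply_third_tone_sandhi([3, 3]))     # two adjacent third tones
print(apply_third_tone_sandhi([1, 3, 4]))  # no adjacent third tones
```

The first call returns `[2, 3]` and the second leaves the sequence unchanged, which illustrates the compulsory, context-triggered character of sandhi.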

In East Asia, tone is typically lexical. That is, tone is used to distinguish words which would otherwise be homonyms. However, in many African languages, especially in the Niger-Congo family, tone can be both lexical and grammatical. In Yoruba, much of the lexical and grammatical information is carried by tone.

Tonal languages are not distributed evenly across the same range as non-tonal languages. Instead, the majority of tone languages belong to the Niger-Congo, Sino-Tibetan and Vietic groups, each of which consists largely of tone languages and dominates a single region. Only in limited locations (South Africa, New Guinea, Mexico, Brazil and a few others) do tone languages occur as individual members or small clusters within a non-tone dominated area.

If one considers only the broad contrast between complex tone and no tone, it might be concluded that tone is almost always an ancient feature of a language family, highly conserved among its members.

A 2015 study by Caleb Everett argued that tonal languages are more common in hot and humid climates, which make them easier to pronounce, even when considering familial relationships.

Tone has long been viewed as a phonological system; only in recent years has it been found to play a role in inflectional morphology. In the Iau language (the most tonally complex Lakes Plain language, predominantly monosyllabic), nouns have an inherent tone (e.g. be˧ 'fire' but be˦˧ 'flower'), whereas verbs have no inherent tone.

For verbs, a tone is used to mark aspect.

Certain varieties of Chinese are known to express meaning by means of tone change although further investigations are required. Examples from two Yue dialects spoken in Guangdong Province are shown below.

In Taishan, tone change indicates the grammatical number of personal pronouns. The following table compares the personal pronouns of Sixian dialect (a dialect of Taiwanese Hakka) with Zaiwa and Jingpho (both Tibeto-Burman languages spoken in Yunnan and Burma).

There are several approaches to notating tones in the description of a language. A phonemic notation will typically lack any consideration of the actual phonetic values of the tones. Such notations are especially common when comparing dialects with wildly different phonetic realizations of what are historically the same set of tones.

In Chinese, for example, the "four tones" may be assigned numbers, such as ① to ④ or - after the historical tone split that affected all Chinese languages to at least some extent - ① to ⑧ (with odd numbers for the yin tones and even numbers for the yang). In traditional Chinese notation, the equivalent diacritics ⟨꜀◌ ꜂◌ ◌꜄ ◌꜆⟩ are attached to the Chinese character, marking the same distinctions, plus underlined ⟨꜁◌ ꜃◌ ◌꜅ ◌꜇⟩ for the yang tones where a split has occurred. If further splits occurred in some language or dialect, the results may be numbered '4a' and '4b' or something similar.

Among the Kra-Dai languages, tones are typically assigned the letters A through D, or, after a historical tone split similar to what occurred in Chinese, A1 to D1 and A2 to D2; see Proto-Tai language.

Also phonemic are upstep and downstep, which are indicated by the IPA diacritics ⟨ꜛ⟩ and ⟨ꜜ⟩, respectively, or by the typographic substitutes ⟨ꜞ⟩ and ⟨ꜝ⟩, respectively. Upstep and downstep affect the tones within a language as it is being spoken, typically due to grammatical inflection or when certain tones are brought together.

Phonetic notation records the actual relative pitch of the tones. The easiest notation from a typographical perspective - but one that is internationally ambiguous - is a numbering system, with the pitch levels assigned digits and each tone transcribed as a digit (or as a sequence of digits if a contour tone). Such systems tend to be idiosyncratic (high tone may be assigned the digit 1, 3, or 5, for example) and have therefore not been adopted for the International Phonetic Alphabet.

For instance, high tone is conventionally written with a 1 and low tone with a 4 or 5 when transcribing the Kru languages of Liberia, but with 1 for low and 5 for high for the Omotic languages of Ethiopia. The tone ⟨53⟩ in a Kru language is thus the same pitch contour as one written ⟨35⟩ in an Omotic language.
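The ambiguity of digit notation can be illustrated with a short sketch that reads the direction of a contour under either convention. The function `contour_direction` and its `high_digit` parameter are hypothetical, and the function only compares the first and last digit, so it says nothing about how far the pitch moves or in which register.

```python
def contour_direction(tone: str, high_digit: int) -> str:
    """Describe a digit-transcribed tone as rising, falling or level.

    high_digit is the digit the convention assigns to the highest pitch:
    1 for the Kru convention described above, 5 for the Omotic one.
    """
    if high_digit not in (1, 5):
        raise ValueError("high_digit must be 1 or 5")
    start, end = int(tone[0]), int(tone[-1])
    if start == end:
        return "level"
    digits_increase = end > start
    # When 1 is the highest pitch, increasing digits mean falling pitch.
    pitch_rises = digits_increase if high_digit == 5 else not digits_increase
    return "rising" if pitch_rises else "falling"


print(contour_direction("53", high_digit=1))  # Kru-style transcription
print(contour_direction("35", high_digit=5))  # Omotic-style transcription
```

Both calls report a rising contour, even though the digit strings are mirror images of each other.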

For simple tone systems, a series of diacritics such as the acute accent (´), grave accent (`) and macron (¯) is often used, with the unmarked vowel understood to have the default tone.

The most detailed and precise notation is a graphic representation of the tone contour or a transcription with tone letters, as used in the International Phonetic Alphabet: for instance, ⟨˥ ˦ ˧ ˨ ˩⟩ or ⟨◌̋ ◌́ ◌̄ ◌̀ ◌̏⟩. The symbol ⟨◌̌⟩ is used to indicate that the tone is rising, and ⟨◌̂⟩ to indicate that it is falling.

In the convention for Chinese, 1 is low and 5 is high. These tones combine with a syllable such as ma to produce different words.

Using this scheme, the tonal contour of a syllable can be described by a sequence of these numbers.

For example, Mandarin Chinese syllables have four lexical tones: the high level tone (Tone 1, or 55-tone, according to the five-level tone representation scheme), the high rising contour tone (Tone 2; 35-tone), the low falling-rising contour tone (Tone 3; 214-tone), and the high falling contour tone (Tone 4; 51-tone).

In addition to the full Tone 3 (214-tone), the half-Tone 3 (21-tone), a reduced form of the 214-tone frequently used in natural speech, is an allophonic variant of the traditional 214-tone.
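The tone inventory just described can be written down as a small lookup table. The digit strings and labels come from the description above. The names `MANDARIN_TONES`, `chao_digits` and `pitch_points` are hypothetical, and the glosses for the syllable ma are the standard textbook examples rather than something taken from this text.

```python
# Five-level digits and labels (1 = lowest pitch, 5 = highest pitch).
MANDARIN_TONES = {
    1: ("55", "high level"),
    2: ("35", "high rising"),
    3: ("214", "low falling-rising"),
    4: ("51", "high falling"),
}
HALF_TONE_3 = "21"  # reduced allophone of Tone 3 used in natural speech

# Standard textbook glosses for "ma"; an outside illustration, not from the text.
MA_GLOSSES = {1: "mother", 2: "hemp", 3: "horse", 4: "scold"}


def chao_digits(tone_number: int, reduced: bool = False) -> str:
    """Return the five-level digit string for a Mandarin tone number."""
    if tone_number not in MANDARIN_TONES:
        raise ValueError(f"unknown tone number: {tone_number}")
    if reduced and tone_number == 3:
        return HALF_TONE_3
    return MANDARIN_TONES[tone_number][0]


def pitch_points(digits: str) -> list[int]:
    """Split a digit string such as '214' into pitch levels [2, 1, 4]."""
    return [int(digit) for digit in digits]


for number, (digits, label) in MANDARIN_TONES.items():
    print(f"Tone {number} ({label}): ma{digits} = '{MA_GLOSSES[number]}'")
```

Keeping the half-Tone 3 as a flag on Tone 3, rather than as a fifth entry, mirrors its status as an allophonic variant and not a separate lexical tone.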

Figure 1. Representation of lexical tones in Mandarin Chinese (A) and Hailu Hakka (B) utilizing a five-level scale for tone marks (Chao, 1968). The digits on the left indicate the pitch level, where 1 corresponds to the lowest and 5 to the highest pitch. (A) The lexical tones of Mandarin using the five-level scale as described by Duanmu (2007); (B) The lexical tones of Hailu Hakka according to the study by Huang and Yu (2022).

Mandarin tones


Regarding the lexical tones of Hailu Hakka, there are seven distinct tones. Among these, four tones are similar to the lexical tones of Mandarin: the high-level tone (55-tone), which corresponds to Mandarin’s Tone 1; the high rising contour tone (35-tone), which resembles Mandarin’s Tone 2; the low falling tone (21-tone), akin to the half-Tone 3 in Mandarin; and the high falling tone (52-tone), which parallels Mandarin’s Tone 4.

According to the review by Huang and Yu (2022), the low falling tone in Hakka has been coded as both a 31-tone and a 21-tone in different studies. This variation arises due to the different methods of normalization and analysis employed across these studies. To ensure consistency and avoid confusion, the low falling tone in Hakka was referred to as the 21-tone in the present study.

Additionally, Hakka has unique tones that differentiate it from Mandarin: the middle-level tone (33-tone); the short high-level tone (55-tone), which is similar to Mandarin’s Tone 1 but shorter in duration; and the short middle falling tone (42-tone), characterized by a mid-level pitch that falls to a lower pitch with a shorter duration.

Building on the findings of Hacquard et al. (2007) regarding the impact of vowel inventory size on perceptual ability, one could posit that a similar correlation exists between the complexity of a tone system in tonal languages and the perceptual abilities of its speakers. That is, the additional tones in Hailu Hakka suggest a more complex tonal system, potentially indicating that speakers of Hailu Hakka have a more nuanced perceptual process sensitive to dynamic changes in tone.

Therefore, one might expect Mandarin participants to be less sensitive to tonal changes than Hakka-Mandarin speakers and to exhibit a reduced or delayed mismatch negativity (MMN) response.

The Role of MMN in Studying Tonal Perception

Numerous studies of speech perception have focused on MMN. The MMN paradigm typically involves a rapidly presented stream of repeated standard sounds occasionally interrupted by rare deviant sounds.

MMN activity can be measured in two ways: by comparing event-related potential (ERP) responses to the deviant sound with those to the standard sound, or by comparing ERP responses to the deviant sound in an MMN experiment with those to the same sound in an equal-probability control block.
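The first of these comparisons amounts to a difference wave: the averaged response to deviants minus the averaged response to standards. The sketch below shows that subtraction in plain Python. All function names are hypothetical, the 100 to 250 ms search window is an assumed default rather than a value given here, and the epoch values are made-up toy numbers, not recorded data.

```python
from statistics import fmean


def average_erp(epochs: list[list[float]]) -> list[float]:
    """Average equally long epochs sample by sample."""
    return [fmean(samples) for samples in zip(*epochs)]


def difference_wave(deviant_epochs, reference_epochs) -> list[float]:
    """Deviant average minus reference average, sample by sample."""
    deviant = average_erp(deviant_epochs)
    reference = average_erp(reference_epochs)
    return [d - r for d, r in zip(deviant, reference)]


def mmn_peak(diff, times_ms, window=(100, 250)):
    """Return (latency_ms, amplitude) of the most negative point in the window."""
    in_window = [(amp, t) for amp, t in zip(diff, times_ms) if window[0] <= t <= window[1]]
    amplitude, latency = min(in_window)
    return latency, amplitude


# Toy data in microvolts, sampled every 50 ms (illustrative only).
times_ms = [0, 50, 100, 150, 200, 250, 300]
standard_epochs = [[0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0],
                   [0.0, 0.3, 0.8, 0.7, 0.2, 0.0, 0.0]]
deviant_epochs = [[0.0, 0.5, 0.5, -1.5, -2.0, -0.5, 0.0],
                  [0.0, 0.3, 0.5, -1.1, -2.2, -0.7, 0.0]]

diff = difference_wave(deviant_epochs, standard_epochs)
latency, amplitude = mmn_peak(diff, times_ms)
print(f"peak at {latency} ms, amplitude {amplitude:.2f} microvolts")
```

For the second comparison, the same `difference_wave` call would take epochs of the identical sound from the equal-probability control block as `reference_epochs`. Both the peak amplitude and its latency are returned because each is used as a measure of perceptual sensitivity.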

Beyond ERP amplitude, ERP studies have also shown that delayed MMN latency is associated with poorer phonological perception during the processing of linguistic stimuli.

For example, native Japanese speakers often struggle to differentiate between the English phonemes /r/ and /l/, both of which are mapped to Japanese /l/. Zhang et al. (2005) used magnetoencephalography (MEG) to record MMN in response to /r/ and /l/ sounds in native Japanese and native American English listeners.

Chandrasekaran et al. (2007) used two experimental blocks with different tonal contrasts. In one block, the participants frequently heard the syllable /yi/ with 55-tone and occasionally heard /yi/ with 214-tone. In the other block, the standard stimulus was /yi/ with 35-tone.

The results showed that native Mandarin speakers’ MMN responses to the 55/214 contrast were larger than their MMN responses to the 55/35 contrast, indicating that MMN amplitude varies with the acoustic similarity between the standard and deviant sounds. Native English speakers’ MMN showed no such effect of tonal contrast.

Furthermore, while native Mandarin speakers’ MMN to the 55/214 contrast was larger than native English speakers’ MMN to the 55/214 contrast, there was no significant group difference in the 55/35 contrast.

Table 1. Voice onset time (VOT), the first three formant frequencies and F0 range for each stimulus.

Stimuli                  VOT (ms)  F1 (Hz)  F2 (Hz)  F3 (Hz)  F0 Range (Hz)
Hakka /so/ 55-tone       15        650      1200     2600     280-320
Hakka /so/ 21-tone       16        640      1180     2580     220-180
Mandarin /zu/ 55-tone    14        660      1220     2620     270-310
Mandarin /zu/ 21-tone    15        650      1200     2600     210-170

Speech waveforms and F0 contours of stimuli
