Standard Chinese phonology
from Wikipedia

The phonology of Standard Chinese has historically derived from the Beijing dialect of Mandarin. However, pronunciation varies widely among speakers, who may introduce elements of their local varieties. Television and radio announcers are chosen for their ability to affect a standard accent. The sound system has not only segments—i.e. vowels and consonants—but also tones, and each syllable has one. In addition to the four main tones, there is a neutral tone that appears on weak syllables.

This article uses the International Phonetic Alphabet (IPA) to compare the phonetic values corresponding to syllables romanized with pinyin.

Consonants

The sounds shown in parentheses are sometimes not analyzed as separate phonemes; for more on these, see § Alveolo-palatal series below. Excluding these, and excluding the glides [j], [ɥ], and [w], there are 19 consonant phonemes in the inventory.

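| | Labial | Denti-alveolar | Retroflex | Alveolo-palatal | Velar |
|---|---|---|---|---|---|
| Nasal | m | n | | | ŋ |
| Plosive, aspirated | pʰ | tʰ | | | kʰ |
| Plosive, unaspirated | p | t | | | k |
| Affricate, aspirated | | t͡sʰ | ʈ͡ʂʰ | (t͡ɕʰ) | |
| Affricate, unaspirated | | t͡s | ʈ͡ʂ | (t͡ɕ) | |
| Fricative | f | s | ʂ | (ɕ) | x~h |
| Liquid | | l | ʐ~ɻ | | |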

Between pairs of plosives or affricates having the same place of articulation and manner of articulation, the primary distinction is not voiced vs. voiceless (as in French or Russian), but unaspirated vs. aspirated (as in Scottish Gaelic or Icelandic). The unaspirated plosives and affricates may however become voiced in weak syllables (see § Syllable reduction below). In pinyin, an unaspirated/aspirated pair such as /p/ and /pʰ/ is represented with b and p respectively.

More details about the individual consonant sounds are given in the following table.

| Phoneme or sound | Approximate description | Example | Pinyin | Zhuyin | Wade–Giles* | Notes |
|---|---|---|---|---|---|---|
| /p/ | Like English p but unaspirated, as in spy | 邦/bāng | b | ㄅ | p | |
| /pʰ/ | Like an aspirated English p, as in pie | 旁/páng | p | ㄆ | p῾ | |
| /m/ | Like English m | 明/míng | m | ㄇ | m | |
| /f/ | Like English f | 非/fēi | f | ㄈ | f | |
| /t/ | Like English t but unaspirated, as in sty | 端/duān | d | ㄉ | t | See § Denti-alveolar and retroflex series. |
| /tʰ/ | Like an aspirated English t, as in tie | 透/tòu | t | ㄊ | t῾ | See § Denti-alveolar and retroflex series. |
| /n/ | Like English n | 泥/ní | n | ㄋ | n | See § Denti-alveolar and retroflex series. Can occur in the onset and/or coda of a syllable. |
| /l/ | Like English clear l, as in RP lay (never dark, i.e. velarized) | 来/來/lái | l | ㄌ | l | |
| /k/ | Like English k, but unaspirated, as in scar | 干/gān | g | ㄍ | k | |
| /kʰ/ | Like an aspirated English k, as in car | 口/kǒu | k | ㄎ | k῾ | |
| /ŋ/ | Like ng in English sing | 江/jiāng | ng | | ng | Occurs only in the syllable coda. |
| /x/ ([h ~ x])[1]: 27 | Varies between h in English hat and ch in Scottish loch | 火/huǒ | h | ㄏ | h | |
| [t͡ɕ] | Like an unaspirated English ch, but with an alveolo-palatal pronunciation | 叫/jiào | j | ㄐ | ch | See § Alveolo-palatal series. |
| [t͡ɕʰ] | As t͡ɕ/pinyin "j", with aspiration | 去/qù | q | ㄑ | ch῾ | See § Alveolo-palatal series. |
| [ɕ] | Similar to English sh, but with an alveolo-palatal pronunciation | 小/xiǎo | x | ㄒ | hs | See § Alveolo-palatal series. |
| /ʈ͡ʂ/ | Similar to ch in English chat, but with a retroflex articulation and no aspiration | 之/zhī | zh | ㄓ | ch | See § Denti-alveolar and retroflex series. |
| /ʈ͡ʂʰ/ | As ʈ͡ʂ/pinyin "zh", but with aspiration | 吃/chī | ch | ㄔ | ch῾ | See § Denti-alveolar and retroflex series. |
| /ʂ/ | Similar to English sh, but with a retroflex articulation | 矢/shǐ | sh | ㄕ | sh | See § Denti-alveolar and retroflex series. |
| /ʐ/ ([ʐ ~ ɻ])[a] | Similar to z in English zoo, but with a retroflex articulation. L2 learners may pronounce it as an English R, but the lips are unrounded. | 日/rì | r | ㄖ | j | For pronunciation in syllable-final position, see § Rhotic coda. |
| /t͡s/ | Like English ts in cats, without aspiration | 子/zǐ | z | ㄗ | ts | See § Denti-alveolar and retroflex series. |
| /t͡sʰ/ | As t͡s/pinyin "z", but with aspiration | 此/cǐ | c | ㄘ | ts῾ | See § Denti-alveolar and retroflex series. |
| /s/ | Like English s, but usually with the tongue on the lower teeth.[citation needed] | 私/sī | s | ㄙ | s | See § Denti-alveolar and retroflex series. |
*In Wade–Giles, the distinction between retroflex and alveolo-palatal affricates, which are both written as ch and ch῾, is indicated by the subsequent vowel coda, since the two consonant series occur in complementary distribution; for example, chi and chü correspond to pinyin ji and ju, respectively, whereas chih and chu correspond to pinyin zhi and zhu (see § Alveolo-palatal series).

All of the consonants may occur as the initial sound of a syllable, with the exception of /ŋ/ (unless the zero initial is assigned to this phoneme; see below). Excepting the rhotic coda, the only consonants that can appear in syllable coda (final) position are /n/ and /ŋ/ (although [m] may occur as an allophone of /n/ before labial consonants in fast speech). Final /n/, /ŋ/ may be pronounced without complete oral closure, resulting in a syllable that in fact ends with a long nasalized vowel.[1]: 72  See also § Syllable reduction, below.
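The pinyin-to-IPA correspondences for the initials can be collected into a small lookup. The following Python sketch is illustrative only: the names `PLAIN`, `ASPIRATED_SPELLING`, `OTHER`, `INITIAL_TO_IPA`, and `aspiration_pairs` are hypothetical, and the alveolo-palatals are listed as if they were separate sounds, which is only one of the analyses discussed in this article.

```python
# Unaspirated stops and affricates: pinyin spelling -> IPA (from the table above).
PLAIN = {"b": "p", "d": "t", "g": "k", "j": "t͡ɕ", "zh": "ʈ͡ʂ", "z": "t͡s"}

# Pinyin spelling of the aspirated partner of each unaspirated initial.
ASPIRATED_SPELLING = {"b": "p", "d": "t", "g": "k", "j": "q", "zh": "ch", "z": "c"}

# Remaining initials, which do not take part in the aspiration contrast.
OTHER = {"m": "m", "f": "f", "n": "n", "l": "l", "h": "x",
         "x": "ɕ", "sh": "ʂ", "r": "ʐ", "s": "s"}

# Full lookup: the aspirated IPA symbol is the plain one plus the aspiration mark.
INITIAL_TO_IPA = dict(OTHER)
for plain, ipa in PLAIN.items():
    INITIAL_TO_IPA[plain] = ipa
    INITIAL_TO_IPA[ASPIRATED_SPELLING[plain]] = ipa + "\u02b0"  # "ʰ"


def aspiration_pairs():
    """Return (unaspirated pinyin, aspirated pinyin, plain IPA, aspirated IPA)."""
    return [(plain, ASPIRATED_SPELLING[plain], ipa, ipa + "\u02b0")
            for plain, ipa in PLAIN.items()]


if __name__ == "__main__":
    for plain, asp, ipa_plain, ipa_asp in aspiration_pairs():
        print(f"{plain} /{ipa_plain}/  vs  {asp} /{ipa_asp}/")
```

Deriving the aspirated symbols from the plain ones keeps the two columns consistent and makes the point that pinyin b/p, d/t, g/k and so on differ in aspiration rather than voicing.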

Denti-alveolar and retroflex series

The consonants listed in the first table above as denti-alveolar are sometimes described as alveolars, and sometimes as dentals. The affricates and the fricative are particularly often described as dentals; these are generally pronounced with the tongue on the lower teeth.[1]: 27 

The retroflex consonants (like those of Polish) are actually apical rather than subapical, and so are considered by some authors not to be truly retroflex; they may be more accurately called post-alveolar.[2][3][4] Some speakers not from Beijing may lack the retroflexes in their native dialects, and may thus replace them with dentals.[1]: 26 

Alveolo-palatal series

The alveolo-palatal consonants (pinyin j, q, x) have standard pronunciations of [t͡ɕ, t͡ɕʰ, ɕ]. Some speakers realize them as palatalized dentals [t͡sʲ], [t͡sʰʲ], [sʲ]; this is claimed to be especially common among children and women,[1]: 33  although officially it is regarded as substandard and as a feature specific to the Beijing dialect.[5]

In phonological analysis, it is often assumed that, when not followed by one of the high front vowels [i] or [y], the alveolo-palatals consist of a consonant followed by a palatal glide ([j] or [ɥ]). That is, syllables represented in pinyin as beginning ⟨ji-⟩, ⟨qi-⟩, ⟨xi-⟩, ⟨ju-⟩, ⟨qu-⟩, ⟨xu-⟩ (followed by a vowel) are taken to begin [t͡ɕj], [t͡ɕʰj], [ɕj], [t͡ɕɥ], [t͡ɕʰɥ], [ɕɥ]. The actual pronunciations are more like [t͡ɕ], [t͡ɕʰ], [ɕ], [t͡ɕʷ], [t͡ɕʰʷ], [ɕʷ] (or for speakers using the dental variants, [t͡sʲ], [t͡sʰʲ], [sʲ], [t͡sᶣ], [t͡sʰᶣ], [sᶣ]). This is consistent with the general observation (see under § Glides) that medial glides are realized as palatalization and/or labialization of the preceding consonant (palatalization already being inherent in the case of the palatals).

On the above analysis, the alveolo-palatals are in complementary distribution with the dentals [t͡s, t͡sʰ, s], with the velars [k, kʰ, x], and with the retroflexes [ʈ͡ʂ, ʈ͡ʂʰ, ʂ], as none of these can occur before high front vowels or palatal glides, whereas the alveolo-palatals occur only before high front vowels or palatal glides. Therefore, linguists often prefer to classify [t͡ɕ, t͡ɕʰ, ɕ] not as independent phonemes, but as allophones of one of the other three series.[6] The existence of the above-mentioned dental variants inclines some to prefer to identify the alveolo-palatals with the dentals, but identification with any of the three series is possible (unless the empty rime /ɨ/ is identified with /i/, in which case the velars become the only candidate). The Yale and Wade–Giles systems mostly treat the alveolo-palatals as allophones of the retroflexes; Tongyong Pinyin mostly treats them as allophones of the dentals; and Mainland Chinese Braille treats them as allophones of the velars. In standard pinyin and bopomofo, however, they are represented as a separate sequence.
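The complementary distribution described above can be stated as a small predicate over pinyin initials and finals. This is a sketch under stated assumptions, not a full phonotactic checker: finals are given in their underlying spelling (with ü written out, so that jue is `("j", "üe")`), the bare final -i after dentals and retroflexes is treated as the empty rime rather than as the high vowel, and the name `series_allowed` is hypothetical.

```python
ALVEOLO_PALATALS = {"j", "q", "x"}
DENTAL_SIBILANTS = {"z", "c", "s"}
RETROFLEXES = {"zh", "ch", "sh", "r"}
VELARS = {"g", "k", "h"}


def series_allowed(initial: str, final: str) -> bool:
    """Check an initial/final pair against the complementary distribution.

    `final` must use the underlying spelling, e.g. "üe" rather than "ue".
    """
    # High front vowel or palatal glide: the final begins with i or ü.
    palatal_context = final[0] in ("i", "\u00fc")  # "\u00fc" is ü
    if initial in ALVEOLO_PALATALS:
        return palatal_context
    if initial in VELARS:
        return not palatal_context
    if initial in DENTAL_SIBILANTS or initial in RETROFLEXES:
        # Assumption: bare "-i" here spells the empty rime, not /i/.
        return final == "i" or not palatal_context
    # Other initials are outside the four series discussed here.
    return True
```

For instance, `series_allowed("j", "ia")` is true while `series_allowed("g", "ia")` is false, so no single environment admits both a velar and an alveolo-palatal.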

The alveolo-palatals arose historically from a merger of the dentals [t͡s, t͡sʰ, s] and velars [k, kʰ, x] before high front vowels and glides. Previously, some instances of modern [t͡ɕ(ʰ)i] were instead [k(ʰ)i], and others were [t͡s(ʰ)i]; distinguishing these two sources of [t͡ɕ(ʰ)i] is known as the "round-sharp" distinction. The change took place in the last two or three centuries at different times in different areas. This explains why some European transcriptions of Chinese names (especially in postal romanization) contain ⟨ki-⟩, ⟨hi-⟩, ⟨tsi-⟩, ⟨si-⟩ where an alveolo-palatal might be expected in modern Chinese. Examples are Peking for Beijing ([kiŋ] → [tɕiŋ]), Chungking for Chongqing ([kʰiŋ] → [tɕʰiŋ]), Fukien for Fujian (cf. Hokkien), Tientsin for Tianjin ([tsin] → [tɕin]), Sinkiang for Xinjiang ([sinkiaŋ] → [ɕintɕiaŋ]), and Sian for Xi'an ([si] → [ɕi]). The complementary distribution with the retroflex series arose when syllables that had a retroflex consonant followed by a medial glide lost the medial glide.

Zero onset

A full syllable such as ai, in which the vowel is not preceded by any of the standard initial consonants or glides, is said to have a null initial or zero onset. This may be realized as a consonant sound: [ʔ] and [ɣ] are possibilities, as are [ŋ] and [ɦ] in some non-standard varieties. It has been suggested by San Duanmu that such an onset be regarded as a special phoneme, or as an instance of the phoneme /ŋ/, although it can also be treated as no phoneme (absence of onset). By contrast, in the case of the particle a, which is a weak onset-less syllable, linking occurs with the previous syllable (as described under § Syllable reduction, below).[1]: 43 

When a stressed vowel-initial Chinese syllable follows a consonant-final syllable, the consonant does not directly link with the vowel. Instead, the zero onset seems to intervene in between. For example, 棉袄; mián'ǎo ("cotton jacket") becomes [mjɛnʔau] or [mjɛnɣau]. However, in connected speech neither of these output forms is natural. When the words are spoken together, the most natural pronunciation is rather similar to [mjɛ̃ːau], in which there is no nasal closure or any version of the zero onset, and the vowel is nasalized instead.[7]

Glides

The glides [j], [ɥ], and [w] sound respectively like the y in English yes, the (h)u in French huit, and the w in English we. (Beijing speakers often replace initial [w] with a labiodental [ʋ], except when it is followed by [o] or [u].[1]: 25 ) The glides are commonly analyzed not as independent phonemes, but as consonantal allophones of the high vowels: [i̯, y̯, u̯]. This is possible because there is no ambiguity in interpreting a sequence like yao/-iao as /iau/, and potentially problematic sequences such as */iu/ do not occur.

The glides may occur in initial position in a syllable. This occurs with [ɥ] in the syllables written yu, yuan, yue, and yun in pinyin; with [j] in other syllables written with initial y in pinyin (ya, yi, etc.); and with [w] in syllables written with initial w in pinyin (wa, wu, etc.). When a glide is followed by the vowel of which that glide is considered an allophone, the glide may be regarded as epenthetic (automatically inserted), and not as a separate realization of the phoneme. Hence the syllable yi, pronounced [ji], may be analyzed as consisting of the single phoneme /i/, and similarly yin may be analyzed as /in/, yu as /y/, and wu as /u/.[1]: 274ff  Both pronunciations, with and without the glide, may be heard from the same speaker, even in the same conversation.[1]: 274ff  For example, the number "one" (一; yī) may be heard as either [jí] or [í].

The glides can also occur in medial position, that is, after the initial consonant but before the main vowel. Here they are represented in pinyin as vowels: for example, the i in bie represents [j], and the u in duan represents [w]. There are some restrictions on the possible consonant-glide combinations: [w] does not occur after labials (except for some speakers in bo, po, mo, fo); [j] does not occur after retroflexes and velars (or after [f]); and [ɥ] occurs medially only in lüe and nüe and after alveolo-palatals (for which see above). A consonant-glide combination at the start of a syllable is articulated as a single sound – the glide is not in fact pronounced after the consonant, but is realized as palatalization [ʲ], labialization [ʷ], or both [ᶣ], of the consonant.[1]: 28  (The same modifications of initial consonants occur in syllables where they are followed by a high vowel, although normally no glide is considered to be present there. Hence a consonant is generally palatalized [ʲ] when followed by /i/, labialized [ʷ] when followed by /u/, and both [ᶣ] when followed by /y/.)

The glides [j] and [w] are also found as the final element in some syllables. These are commonly analyzed as diphthongs rather than vowel-glide sequences. For example, the syllable bai is assigned the underlying representation /pai̯/. (In pinyin, the second element is generally written ⟨-i⟩ or ⟨-u⟩, but /au̯/ is written as ⟨-ao⟩.)

Syllabic consonants

The syllables written in pinyin as zi, ci, si, zhi, chi, shi, ri may be described as a sibilant consonant (z, c, s, zh, ch, sh, r in pinyin) followed by a syllabic consonant (also known as an apical vowel in classic literature), transcribed [ɹ̩] in zi, ci, si and [ɻ̩] in zhi, chi, shi, ri.

Alternatively, the nucleus may[citation needed] be described not as a syllabic consonant, but as a vowel, written here as [ɨ].

Phonologically, these syllables may be analyzed as having their own vowel phoneme, /ɨ/. However, it is possible to merge this with the phoneme /i/ (to which it is historically related), since the two are in complementary distribution – provided that the § Alveolo-palatal series is either left un-merged, or is merged with the velars rather than the retroflex or alveolar series. (That is, [t͡ɕi], [t͡sɨ], and [ʈ͡ʂɨ] all exist, but *[ki] and *[kɨ] do not exist, so there is no problem merging both [i]~[ɨ] and [k]~[t͡ɕ] at the same time.)

Another approach is to regard the syllables assigned above to /ɨ/ as having an (underlying) empty nuclear slot ("empty rhyme", Chinese 空韵; kōngyùn), i.e. as not containing a vowel phoneme at all. This is more consistent with the syllabic consonant description of these syllables, and is consistent with the view that phonological representations are minimal (underspecified).[8] When this is the case, sometimes the phoneme is described as shifting from voiceless to voiced, e.g. becoming /sź̩/.

Syllabic consonants may also arise as a result of weak syllable reduction; see below. Syllabic nasal consonants are also heard in certain interjections; pronunciations of such words include [m], [n], [ŋ], [hm], [hŋ].

Vowels

Monophthongs of Mandarin Chinese as they are pronounced in Beijing (from Lee & Zee (2003:110)).
Part 1 of Mandarin Chinese diphthongs as they are pronounced in Beijing (from Lee & Zee (2003:110)).
Part 2 of Mandarin Chinese diphthongs as they are pronounced in Beijing (from Lee & Zee (2003:110)).

Standard Chinese can be analyzed as having between two and six vowel phonemes.[9] /i, u, y/ (which may also be analyzed as underlying glides) are high (close) vowels, /ə/ is mid whereas /a/ is low (open).

The precise realization of each vowel depends on its phonetic environment. In particular, the vowel /ə/ has two broad allophones [e] and [o] (corresponding respectively to pinyin e and o in most cases). These sounds can be treated as a single underlying phoneme because they are in complementary distribution. The mid vowel phoneme may also be treated as an under-specified vowel, attracting features either from the adjacent sounds or from default rules resulting in /ə/. (Apparent counterexamples are provided by certain interjections, such as [ɔ], [ɛ], [jɔ], and [lɔ], but these are normally treated as special cases operating outside the normal phonemic system.[b])

Transcriptions of the vowels' allophones (the ways they are pronounced in particular phonetic environments) differ somewhat between sources. More details about the individual vowel allophones are given in the following table (not including the values that occur with the rhotic coda).[c]

| Phoneme | Allophone | Description | Example | Pinyin | Wade–Giles | Gwoyeu Romatzyh |
|---|---|---|---|---|---|---|
| Depends on analysis (see below) | [i] | Like English ee as in bee | 比/bǐ | i | i | i |
| | [u] | Like English oo as in boo | 不/bù | u | u | u |
| | [ʊ] | Like English oo in took (varies between [o][10] and [u] depending on the speaker) | 空/kōng | o | u | o |
| | [y] | Like French u or German ü | 女/nǚ | ü, u | ü | iu |
| /ə/ | [e] | Somewhat like English ey as in prey | 别/bié | e, ê | e, eh | e |
| | [o] | Somewhat like southern British English awe or Scottish English oh | 火/huǒ | o | o | o |
| | [ɤ] | Pronounced as a sequence [ɰɤ̞] | 和/hé | e | ê, o | e |
| | [ə] | Schwa, like English a as in about | 很/hěn | e | ê, u | e |
| /a/ | [a] | Like English a as in palm | 巴/bā | a | a | a |
| | [ɛ] | Like English e as in then (varies between [e] and [a] depending on the speaker) | 边/biān | a | e, a | a |

Zhuyin represents vowels differently from normal romanisation schemes, and as such is not displayed in the above table.

The vowel nuclei may be preceded by a glide /j, w, ɥ/, and may be followed by a coda /i, u, n, ŋ/. The various combinations of glide, vowel, and coda have different surface manifestations, as shown in the tables below. Any of the three positions may be empty, i.e. occupied by a null meta-phoneme ∅.

Five vowel analysis (pinyin-based)

The following table provides a typical five vowel analysis according to Duanmu (2000, p. 37) and Lin (2007). In this analysis, the high vowels /i, u, y/ are fully phonemic and may form sequences with the nasal codas /n, ŋ/.

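| Medial | Nucleus | Coda | IPA | Pinyin (no initial) | Pinyin (after an initial) |
|---|---|---|---|---|---|
| ∅ | ∅ | ∅ | [ɹ̩~ɻ̩] | | -i |
| ∅ | /i/ | ∅ | [i] ([ji]) | yi | -i |
| ∅ | /i/ | /n/ | [in] | yin | -in |
| ∅ | /i/ | /ŋ/ | [iŋ] | ying | -ing |
| ∅ | /u/ | ∅ | [u] ([wu]) | wu | -u |
| ∅ | /u/ | /ŋ/ | [ʊŋ] | (none)³ | -ong |
| ∅ | /y/ | ∅ | [y] | yu | -ü¹ |
| ∅ | /y/ | /n/ | [yn] | yun | -ün¹ |
| ∅ | /ə/ | ∅ | [ɤ] ([e], [o]) | e (ê, o) | -e (-o) |
| ∅ | /ə/ | /i/ | [ei̯] | ei | -ei |
| ∅ | /ə/ | /u/ | [ou̯] | ou | -ou |
| ∅ | /ə/ | /n/ | [ən] | en | -en |
| ∅ | /ə/ | /ŋ/ | [əŋ] | eng | -eng |
| ∅ | /a/ | ∅ | [a] | a | -a |
| ∅ | /a/ | /i/ | [ai̯] | ai | -ai |
| ∅ | /a/ | /u/ | [au̯] | ao | -ao |
| ∅ | /a/ | /n/ | [an] | an | -an |
| ∅ | /a/ | /ŋ/ | [aŋ] | ang | -ang |
| /j/ | /u/ | /ŋ/ | [jʊŋ] | yong | -iong |
| /j/ | /ə/ | ∅ | [je] ([jo]) | ye (yo) | -ie (-io) |
| /j/ | /ə/ | /u/ | [jou̯] ([iu]) | you | -iu |
| /j/ | /a/ | ∅ | [ja] | ya | -ia |
| /j/ | /a/ | /u/ | [jau̯] | yao | -iao |
| /j/ | /a/ | /n/ | [jɛn] | yan | -ian |
| /j/ | /a/ | /ŋ/ | [jaŋ] | yang | -iang |
| /w/ | /ə/ | ∅ | [wo] | wo | -uo² |
| /w/ | /ə/ | /i/ | [wei̯] ([ui]) | wei | -ui |
| /w/ | /ə/ | /n/ | [wən] ([un]) | wen | -un |
| /w/ | /ə/ | /ŋ/ | [wəŋ] | weng | (none)³ |
| /w/ | /a/ | ∅ | [wa] | wa | -ua |
| /w/ | /a/ | /i/ | [wai̯] | wai | -uai |
| /w/ | /a/ | /n/ | [wan] | wan | -uan |
| /w/ | /a/ | /ŋ/ | [waŋ] | wang | -uang |
| /ɥ/ | /ə/ | ∅ | [ɥe] | yue | -üe¹ |
| /ɥ/ | /a/ | /n/ | [ɥɛn] | yuan | -üan¹ |

¹ ü is written as u after j, q, or x (the /u/ phoneme never occurs in these positions).
² uo is written as o after b, p, m, or f.
³ [wəŋ] and [ʊŋ] are in complementary distribution.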
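The spelling conventions in the table and its notes can be sketched as a small function that combines an initial and a final into a written pinyin syllable (without tone). This is an illustrative sketch only: the names `STANDALONE` and `spell_syllable` are hypothetical, finals are expected in their underlying form with ü written out, and only the rules stated above are covered.

```python
U_UMLAUT = "\u00fc"  # the letter ü

# Spelling of finals when no initial consonant precedes (from the table above).
STANDALONE = {
    "i": "yi", "in": "yin", "ing": "ying",
    "u": "wu", U_UMLAUT: "yu", U_UMLAUT + "n": "yun",
    "iong": "yong", "ie": "ye", "iu": "you",
    "ia": "ya", "iao": "yao", "ian": "yan", "iang": "yang",
    "uo": "wo", "ui": "wei", "un": "wen",
    "ua": "wa", "uai": "wai", "uan": "wan", "uang": "wang",
    U_UMLAUT + "e": "yue", U_UMLAUT + "an": "yuan",
}


def spell_syllable(initial: str, final: str) -> str:
    """Spell an initial + final combination following the notes above."""
    if not initial:
        # Finals without a glide or high vowel (a, ou, eng, ...) are unchanged.
        return STANDALONE.get(final, final)
    if initial in ("j", "q", "x") and final.startswith(U_UMLAUT):
        # Note 1: ü is written as u after j, q, or x.
        final = "u" + final[1:]
    elif initial in ("b", "p", "m", "f") and final == "uo":
        # Note 2: uo is written as o after b, p, m, or f.
        final = "o"
    return initial + final
```

For example, `spell_syllable("", "iu")` gives `"you"`, `spell_syllable("j", "üe")` gives `"jue"`, and `spell_syllable("n", "üe")` keeps the umlaut as `"nüe"`.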

Two vowel analysis (bopomofo-based)

Some linguists prefer to reduce the number of vowel phonemes drastically (at the expense of including underlying glides in their systems). Edwin G. Pulleyblank has proposed a system which includes underlying glides, but no vowels at all.[1]: 37  More common are systems with two vowels; for example, in Mantaro Hashimoto's system,[11] there are just two vowel nuclei, /ə, a/. In this analysis, the high vowels [i, u, y] are analyzed as glides /j, w, ɥ/ which surface as vowels before an empty nucleus (∅) or before /ən, əŋ/.

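| Medial | Nucleus | Coda | IPA | Zhuyin |
|---|---|---|---|---|
| ∅ | ∅ | ∅ | [ɹ̩~ɻ̩] | |
| ∅ | /ə/ | ∅ | [o, ɤ, e] | ㄛ, ㄜ, ㄝ |
| ∅ | /ə/ | /i/ | [ei̯] | ㄟ |
| ∅ | /ə/ | /u/ | [ou̯] | ㄡ |
| ∅ | /ə/ | /n/ | [ən] | ㄣ |
| ∅ | /ə/ | /ŋ/ | [əŋ] | ㄥ |
| ∅ | /a/ | ∅ | [a] | ㄚ |
| ∅ | /a/ | /i/ | [ai̯] | ㄞ |
| ∅ | /a/ | /u/ | [au̯] | ㄠ |
| ∅ | /a/ | /n/ | [an] | ㄢ |
| ∅ | /a/ | /ŋ/ | [aŋ] | ㄤ |
| /j/ | ∅ | ∅ | [i] | ㄧ |
| /j/ | /ə/ | ∅ | [je, jo] | ㄧㄝ, ㄧㄛ |
| /j/ | /ə/ | /u/ | [jou̯] | ㄧㄡ |
| /j/ | /ə/ | /n/ | [in] | ㄧㄣ |
| /j/ | /ə/ | /ŋ/ | [iŋ] | ㄧㄥ |
| /j/ | /a/ | ∅ | [ja] | ㄧㄚ |
| /j/ | /a/ | /i/ | *[jai̯] | *ㄧㄞ |
| /j/ | /a/ | /u/ | [jau̯] | ㄧㄠ |
| /j/ | /a/ | /n/ | [jɛn] | ㄧㄢ |
| /j/ | /a/ | /ŋ/ | [jaŋ] | ㄧㄤ |
| /w/ | ∅ | ∅ | [u] | ㄨ |
| /w/ | /ə/ | ∅ | [wo] | ㄨㄛ |
| /w/ | /ə/ | /i/ | [wei̯] | ㄨㄟ |
| /w/ | /ə/ | /n/ | [wən] | ㄨㄣ |
| /w/ | /ə/ | /ŋ/ | [wəŋ], [ʊŋ] | ㄨㄥ |
| /w/ | /a/ | ∅ | [wa] | ㄨㄚ |
| /w/ | /a/ | /i/ | [wai̯] | ㄨㄞ |
| /w/ | /a/ | /n/ | [wan] | ㄨㄢ |
| /w/ | /a/ | /ŋ/ | [waŋ] | ㄨㄤ |
| /ɥ/ | ∅ | ∅ | [y] | ㄩ |
| /ɥ/ | /ə/ | ∅ | [ɥe] | ㄩㄝ |
| /ɥ/ | /ə/ | /n/ | [yn] | ㄩㄣ |
| /ɥ/ | /ə/ | /ŋ/ | [jʊŋ] | ㄩㄥ |
| /ɥ/ | /a/ | /n/ | [ɥɛn] | ㄩㄢ |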

Other notes

As a general rule, vowels in open syllables (those which have no coda following the main vowel) are pronounced long, while others are pronounced short. This does not apply to weak syllables, in which all vowels are short.[1]: 42 

In Standard Chinese, the vowels [a] and [ə] harmonize in backness with the coda.[12][1]: 72–73  For [a], it is fronted [a̟] before /i, n/ and backed [a̠] before /u, ŋ/. For [ə], it is fronted [ə̟] before /n/ and backed [ə̠] before /ŋ/.
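The harmony rule just described amounts to a two-way lookup on the coda. A minimal Python sketch follows; the function name `backness` and the string labels `"front"` and `"back"` are hypothetical, and cases the paragraph does not mention (for example, an open syllable) return `None` rather than a guess.

```python
FRONTING_CODAS_A = {"i", "n"}   # /a/ is fronted before /i, n/
BACKING_CODAS_A = {"u", "ŋ"}    # /a/ is backed before /u, ŋ/


def backness(vowel: str, coda: str):
    """Return "front", "back", or None if the stated rule does not apply."""
    if vowel == "a":
        if coda in FRONTING_CODAS_A:
            return "front"
        if coda in BACKING_CODAS_A:
            return "back"
    elif vowel == "ə":
        # For the mid vowel the rule is stated only for the nasal codas.
        if coda == "n":
            return "front"
        if coda == "ŋ":
            return "back"
    return None
```

So the vowel of an counts as front and that of ang as back, matching the fronted and backed variants given above.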

Some native Mandarin speakers may pronounce [wei̯], [jou̯], and [wən] as [ui], [iu], and [un] respectively in the first or second tone.[13]: 69 

Rhotic coda

Standard Chinese features syllables that end with a rhotic coda /ɚ/. This feature, known in Chinese as erhua, is particularly characteristic of the Beijing dialect; many other dialects do not use it as much, and some not at all.[1]: 195  It occurs in two cases:

  1. In a small number of independent words or morphemes pronounced [ɚ] or [aɚ̯], written in pinyin as er, with some tone, such as 二; èr; 'two', 耳; ěr; 'ear', and 儿; 兒; ér; 'son'.
  2. In syllables in which the rhotic coda is added as a suffix to another morpheme. This suffix is represented by the character 儿; 兒; 'son', to which meaning it is historically related, and in pinyin as r. The suffix combines with the final sound of the syllable, and regular but complex sound changes occur as a result (described in detail under erhua).

The r final is pronounced with a relatively lax tongue, and has been described as a "retroflex vowel".[1]: 41 

In dialects that do not make use of the rhotic coda, it may be omitted in pronunciation, or in some cases a different word may be selected: for example, Beijing 这儿; 這兒; zhèr; 'here' and 那儿; 那兒; nàr; 'there' may be replaced by the synonyms 这里; 這裡; zhèlǐ and 那里; 那裡; nàlǐ.

Syllables

Syllables in Standard Chinese have the maximal form (CG)V(X)T,[13]: 48  traditionally analysed as an "initial" consonant C, a "final", and a tone T.[14] The final consists of a "medial" G (which may be one of the glides [j, w, ɥ]), a vowel V, and a coda X, which may be one of [n, ŋ, ɚ̯, i̯, u̯]. The vowel and coda may also be grouped as the "rhyme",[13]: 16  sometimes spelled "rime". Any of C, G, and X (and V, in some analyses) may be absent. However, in some analyses, C cannot be absent, due to the zero initial being considered a consonant.
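The traditional division into an "initial" and a "final" can be sketched for toneless pinyin spellings as follows. This is a simplified illustration, not a complete parser: the names `INITIALS` and `split_syllable` are hypothetical, the letters y and w are treated as part of the final (they spell glides or a zero initial), and tone marks are assumed to have been removed beforehand.

```python
VOWEL_LETTERS = "aeiou\u00fc"  # "\u00fc" is ü

# Two-letter initials must be tried before their one-letter prefixes.
INITIALS = ("zh", "ch", "sh",
            "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_syllable(syllable: str):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        rest = syllable[len(initial):]
        # Require a vowel letter after the initial, so that a syllabic
        # nasal such as "ng" is not split into an initial plus "g".
        if syllable.startswith(initial) and rest and rest[0] in VOWEL_LETTERS:
            return initial, rest
    return "", syllable
```

For example, `split_syllable("zhuang")` returns `("zh", "uang")`, while `split_syllable("an")` returns `("", "an")`, reflecting the zero onset.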

Only a few of the permutations possible under the above scheme actually occur. There are only some 35 final combinations (medial+rime) in actual syllables (see pinyin finals). There are only about 400 different syllables when tone is ignored, and only about 1300 when it is taken into account. This is a far smaller number of distinct syllables than one finds in English. Since a Chinese syllable usually constitutes a whole word, or morpheme at least, there are a lot of homophones. On the other hand, in Standard Chinese, the average word has almost exactly two syllables, eliminating most homophony issues, even when tone is disregarded, especially when context is taken into account.[15][16] (Still, due to the limited phonetic inventory, homophonic puns in Mandarin Chinese are very common and important in Chinese culture.[17][18])

For a list of all Standard Chinese syllables (excluding tone and rhotic coda) see the pinyin table or zhuyin table.

Full and weak syllables

Syllables can be classified as full (or strong), and weak. Weak syllables are usually grammatical markers such as le, or the second syllables of some compound words (although many other compounds consist of two or more full syllables).

A full syllable carries one of the four main tones, and some degree of stress. Weak syllables are unstressed, and have neutral tone. The contrast between full and weak syllables is distinctive; there are many minimal pairs such as 要事 yàoshì "important matter" and 钥匙 yàoshi "key", or 大意 dàyì "main idea" and (with the same characters) dàyi "careless", the second word in each case having a weak second syllable. Some linguists consider this contrast to be primarily one of stress, while others regard it as one of tone. For further discussion, see Neutral tone and Stress below.

There is also a difference in syllable length. Full syllables can be analyzed as having two morae ("heavy"), the vowel being lengthened if there is no coda. Weak syllables, however, have a single mora ("light"), and are pronounced approximately 50% shorter than full syllables.[1]: 88  Any weak syllable will usually be an instance of the same morpheme (and written with the same character) as some corresponding strong syllable; the weak form will often have a modified pronunciation, however, as detailed in the following section.

Syllable reduction

Apart from differences in tone, length, and stress, weak syllables are subject to certain other pronunciation changes (reduction).[19]

  • If a weak syllable begins with an unaspirated obstruent (/p, t, k, t͡s, ʈ͡ʂ, t͡ɕ/), that consonant may become voiced ([b, d, ɡ, d͡z, ɖ͡ʐ, d͡ʑ] respectively). For example, in 嘴巴 zuǐba ("mouth"), the second syllable is likely to begin with a [b] sound, rather than an unaspirated [p], and in 飞机 fēijī ("airplane"), the second syllable is likely to begin with [d͡ʑ] instead of an unaspirated [t͡ɕ].
  • The vowel of a weak syllable is often reduced, becoming more central. For example, in the word zuǐba just mentioned, the final vowel may become a schwa [ə].
  • The coda (final consonant or offglide) of a weak syllable is often dropped (this is linked to the shorter, single-mora nature of weak syllables, as referred to above). If the dropped coda was a nasal consonant, the vowel may be nasalized.[1]: 88  For example, 脑袋 nǎodai ("head") may end with a monophthong [ɛ] rather than a diphthong, and 春天 chūntian ("spring") may end with a centralized and nasalized vowel [ə̃].
  • In some cases, the vowel may be dropped altogether. This may occur, particularly with high vowels i, u, ü, when the unstressed syllable begins with a fricative f, h, sh, r, x, s or an aspirated p, t, k, q, ch, c consonant; for example, 豆腐 dòufu ("tofu") may be said as dòu-f, and 问题 wènti ("question") as wèn-t (the remaining initial consonant is pronounced as a syllabic consonant). The same may even occur in full syllables that have low ("half-third") tone.[1]: 258  The vowel (and coda) may also be dropped after a nasal, in such words as 我们 wǒmen ("we") and 什么 shénme ("what"), which may be said as wǒm and shém – these are examples of the merger of two syllables into one, which occurs in a variety of situations in connected speech.

The example of shénme → shém also involves assimilation, which is heard even in unreduced syllables in quick speech (for example, in guǎmbō for 广播 guǎngbō "broadcast"). A particular case of assimilation is that of the sentence-final exclamatory particle a, a weak syllable, which has different characters for its assimilated forms:

| Preceding sound | Form of particle (pinyin) | Character |
|---|---|---|
| [ŋ], [ɹ̩], [ɻ̩] | a | 啊 |
| [i], [y], [e], [o], [a] | ya (from ŋja) | 呀 |
| [u] | wa | 哇 |
| [n] | na | 哪 |
| le (grammatical marker) | combines to form la | 啦 |
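The assimilated forms of the particle can be expressed as a lookup keyed on the last sound of the preceding syllable. The sketch below is illustrative: the names `PARTICLE_FORMS`, `particle_form`, and `LE_PLUS_A` are hypothetical, sounds are given as IPA strings, and a sound not listed in the table yields `None` rather than a guessed form.

```python
SYLLABIC_DENTAL = "\u0279\u0329"      # [ɹ̩]
SYLLABIC_RETROFLEX = "\u027b\u0329"   # [ɻ̩]

# Last sound of the preceding syllable -> pinyin form of the particle.
PARTICLE_FORMS = {
    "ŋ": "a", SYLLABIC_DENTAL: "a", SYLLABIC_RETROFLEX: "a",
    "i": "ya", "y": "ya", "e": "ya", "o": "ya", "a": "ya",
    "u": "wa",
    "n": "na",
}

# The grammatical marker le combines with the particle to form la.
LE_PLUS_A = "la"


def particle_form(preceding_sound: str):
    """Return the pinyin form of the particle a, or None if not tabulated."""
    return PARTICLE_FORMS.get(preceding_sound)
```

For instance, `particle_form("u")` gives `"wa"` and `particle_form("n")` gives `"na"`.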

Tones

Standard Chinese tone contours, in Beijing and in Taipei.

Standard Chinese, like all varieties of Chinese, is tonal. This means that in addition to consonants and vowels, the pitch contour of a syllable is used to distinguish words from each other. Many non-native Chinese speakers have difficulties mastering the tones of each character, but correct tonal pronunciation is essential for intelligibility because of the vast number of words in the language that only differ by tone (i.e. are minimal pairs with respect to tone). Statistically, tones are as important as vowels in Standard Chinese.[d][20]

The following table shows the four main tones of Standard Chinese, together with the neutral (or fifth) tone. The pitch of each tone is represented on a five-level scale, visualized with Chao tone letters. The pitch values described by Chao are traditionally considered standard; however, slight regional and idiolectal variations in tone pronunciation also occur.

| Tone number | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Description | high | rising | low (dipping) | falling | neutral |
| Pinyin diacritic | ā | á | ǎ | à | a |
| Pitch contour per Chao (1968)[21] | ˥ 55 | ˧˥ 35 | ˨˩˦ 21(4) | ˥˩ 51, ˥˧ 53 | (various, see below) |
| Common realization (Beijing) | ˥ 55, ˦ 44 | ˨˥ 25 | ˨˩˨ 21(2) | ˥˨ 52, ˥˧ 53 | |
| Common realization (Taipei) | ˦ 44 | ˧˨˧ 323 | ˧˩˨ 31(2) | ˥˨ 52, ˥˧ 53 | |
| Other substandard variants[22] | ˥˦ 54, ˦˥ 45 | ˧˨˥ 325, ˨˦ 24 | ˨˩˧ 21(3), ˨ 22 | ˦˨ 42 | |
| IPA diacritic | /á/ | /ǎ/ [a᷄] | /à/ [à̰, a̰᷆, a̰᷉] | /â/ | |
| Tone name | 阴平; 陰平; yīnpíng | 阳平; 陽平; yángpíng | 上; shǎng | 去; qù | 轻; 輕; qīng |
| Examples | 巴/bā | 拔/bá | 把/bǎ | 爸/bà | 吧/ba |
The syllable ma with each of the primary tones in Standard Chinese

The Chinese names of the main four tones are respectively 阴平; 陰平; yīnpíng; 'dark level', 阳平; 陽平; yángpíng; 'light level', 上; shǎng[23][24] or shàng[25]; 'rising', and 去; qù; 'departing'. As descriptions, they apply rather to the predecessor Middle Chinese tones than to the modern tones.

Most romanization systems, including pinyin, represent the tones as diacritics on the vowels, as does bopomofo. Some, like Wade–Giles, use superscript numbers at the end of each syllable. The tone marks and numbers are rarely used outside of language textbooks: in particular, they are usually absent in public signs, company logos, and so forth. Gwoyeu Romatzyh is a rare example of a system where tones are represented using normal letters of the alphabet (although without a one-to-one correspondence).
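The relationship between numbered tones and pinyin diacritics can be illustrated with a short conversion routine. Note that the placement rule used here (the mark goes on a or e if present, on the o of ou, and otherwise on the last vowel letter) comes from standard pinyin orthography and is not stated in this article; the name `add_tone_mark` is hypothetical, and the input is assumed to be a single toneless syllable.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWEL_LETTERS = "aeiou\u00fc"  # "\u00fc" is ü


def add_tone_mark(syllable: str, tone: int) -> str:
    """Convert e.g. ("ba", 3) to "bǎ"; tone 5 (or 0) is the unmarked neutral tone."""
    syllable = unicodedata.normalize("NFC", syllable)
    if tone in (0, 5):
        return syllable
    # Assumed placement rule (standard pinyin orthography, not from the text).
    if "a" in syllable:
        idx = syllable.index("a")
    elif "e" in syllable:
        idx = syllable.index("e")
    elif "ou" in syllable:
        idx = syllable.index("ou")
    else:
        # Raises ValueError if the syllable contains no vowel letter.
        idx = max(i for i, ch in enumerate(syllable) if ch in VOWEL_LETTERS)
    marked = syllable[:idx + 1] + TONE_MARKS[tone] + syllable[idx + 1:]
    # Compose base letter and combining mark into one precomposed character.
    return unicodedata.normalize("NFC", marked)
```

Applied to the example row of the tone table, `add_tone_mark("ba", n)` for n from 1 to 5 yields bā, bá, bǎ, bà, ba.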

First tone

First tone is a high-level tone. It is a steady high sound, produced as if it were being sung instead of spoken. Its pitch is usually ˥ 55 or ˦ 44, at the same level where the fourth tone starts, or a little lower. Occasionally, slightly rising or falling high pitch (˥˦ 54 or ˦˥ 45) is also possible.[22]

In a few syllables, the quality of the vowel is changed when it carries first tone; see the vowel table above.

Second tone

Second tone is a rising tone. It is usually described as high-rising (˧˥ 35), with the pitch rising from middle to high (as in the English "What?!"). It starts at around pitch level 3 or 2, and then rises towards the pitch of the first tone (5 or 4).

It may also start with a falling or flat segment, which is quite short in male speakers (a quarter of the total second tone length), but longer in female speakers, reaching nearly half of the total length of the second tone. This initial dip is more apparent in Southern Chinese Mandarin accents, including Standard Taiwanese Mandarin, where the second tone is also lower and is alternatively described as dipping or low-rising, with an overall contour of ˧˨˧ 323 (its start is still slightly lower than its final pitch).[26][27][28][29]

This tone is usually one of the most difficult to master for Mandarin learners, as well as the speakers of non-Mandarin Chinese varieties, who often pronounce their second tone close to (full) third tone, especially in the word-final position before a pause.[29][30][22][31]

Third tone

Third tone is a low tone. It is also often termed a "dipping tone".

This tone is often demonstrated as having a rise in pitch after the low fall; however, third tone syllables that include the rise are significantly longer than other syllables. When a third-tone syllable is not said in isolation, this rise is normally heard only if it appears at the end of a sentence or before a pause, and then usually only on stressed monosyllables.[1]: 222  The third tone without the rise is sometimes called half third tone.

The overall pitch contour of the third tone is traditionally described as ˨˩˦ 214, but for modern Standard Chinese speakers, the rise, if present, is not that high. The third tone starts lower or around the starting point for the second tone. In Beijing, its value inclines to ˨˩˧ 213 or ˨˩˨ 212, while in Taiwan it is usually ˧˩˨ 312 (Taiwanese Standard Chinese speakers also tend to never pronounce the rising part in any context).[32] Unlike the other tones, third tone is usually pronounced with creaky voice.[33]

Two consecutive third tones are avoided by changing the first to second tone; see § Third tone sandhi below.
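The basic rule stated here can be sketched as a function over a sequence of tone numbers. This is only a minimal illustration of the two-syllable case: the name `apply_third_tone_sandhi` is hypothetical, and the treatment of runs of three or more third tones (every third tone followed by another third tone is changed) is a simplifying assumption, since the actual outcome in longer runs depends on how the words are grouped.

```python
def apply_third_tone_sandhi(tones):
    """Change a third tone to a second tone when another third tone follows.

    `tones` is a list of tone numbers (1-5); a new list is returned.
    """
    result = list(tones)
    for i in range(len(tones) - 1):
        # Decide from the underlying tones, not from already-changed ones.
        if tones[i] == 3 and tones[i + 1] == 3:
            result[i] = 2
    return result
```

For a two-syllable sequence, `apply_third_tone_sandhi([3, 3])` returns `[2, 3]`, while `[3, 4]` is left unchanged.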

Fourth tone

Fourth tone is a falling tone. It features a sharp fall from high to lower pitch (as is heard in curt commands in English, such as "Stop!").

It starts at the same pitch level as the first tone or higher, and then drops to pitch level 1 or 2. In connected speech, when followed by syllables with other full tones, it tends to fall only from high to mid-level. Similarly to the third tone, the final part is only pronounced before a pause or an unstressed syllable. Two consecutive fourth tones are pronounced in a zigzag pattern, with the first one higher, and the second one lower (˥˧ 53 - ˦˩ 41).[34][35][22]

Neutral tone

Figure: Pitch contours of the neutral tone.

Also called fifth tone or zeroth tone (轻声; 輕聲; qīngshēng; 'light tone'), the neutral tone is sometimes thought of as a lack of tone. It is associated with weak syllables, which are generally somewhat shorter than tonic syllables.

In Standard Chinese, about 15–20% of the syllables in written texts are considered unstressed, including certain suffixes, clitics, and particles. Second syllables of some disyllabic words are also unstressed in Northern Mandarin accents, but many Mandarin speakers in Southern China tend to preserve their inherent tone.

The pitch of a syllable with neutral tone is determined by the tone of the preceding syllable. Chao (1968) considered the neutral tone syllables to not have pitch contour. He introduced special dotted tone letters to denote its pitch. Later studies, however, found that the neutral tone syllables do have pitch contour. The following table shows the pitch at which the neutral tone is pronounced in Standard Chinese after each of the four main tones. For contoured pitch analysis, the first column shows the pitch contour directly after the full tone syllable, and the second column shows the pitch contour after another neutral tone syllable.[36][22][37][38][39]

Realization of neutral tones

| Tone of preceding syllable | Contourless pitch | Contoured pitch (first neutral syllable) | Contoured pitch (second neutral syllable) | Characters | Pinyin | Meaning | Transcription |
|---|---|---|---|---|---|---|---|
| First ˥ | ˨ 2 | ˦˩ 41 | ˨˩ 21 | 玻璃(的) | bōli (de) | '[of the] glass' | [pwo˥ li˦˩ də˨˩] |
| Second ˧˥ | ˧ 3 | ˥˨ 52 | ˧˨ 32 | 伯伯(的) | bóbo (de) | '[of an] uncle' | [pwo˨˥ bwo˥˨ də˧˨] |
| Third ˨˩ | ˦ 4 | ˧ 33 ~ ˨˧ 23 | ˧˨ 32 | 喇叭(的) | lǎba (de) | '[of a] horn' | [lä˨˩ bä˨˧ də˧˨] |
| Fourth ˥˩ | ˩ 1 | ˨˩ 21 | ˩ 11 | 兔子(的) | tùzi (de) | '[of a] rabbit' | [tʰu˥˨ d͡zɨ˨˩ də˩] |
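
The table can be read as a simple lookup keyed by the tone of the preceding syllable. The following is a minimal sketch of that lookup; the names `NEUTRAL_PITCH` and `neutral_tone_pitch` are illustrative and do not come from any library, and the pitch strings are copied from the table as Chao numerals.

```python
# Pitch of the neutral tone after each full tone (values taken from the table).
# "first"/"second" are the contoured values for the first and second neutral
# syllable in a row; "contourless" is the single-level analysis.
NEUTRAL_PITCH = {
    1: {"contourless": "2", "first": "41", "second": "21"},
    2: {"contourless": "3", "first": "52", "second": "32"},
    3: {"contourless": "4", "first": "33~23", "second": "32"},
    4: {"contourless": "1", "first": "21", "second": "11"},
}


def neutral_tone_pitch(preceding_tone, position=1, contoured=True):
    """Return the pitch of a neutral-tone syllable.

    preceding_tone: the last full tone (1-4) before the neutral syllable(s).
    position: 1 for the neutral syllable right after the full tone,
              2 for a neutral syllable that follows another neutral syllable.
    """
    row = NEUTRAL_PITCH[preceding_tone]
    if not contoured:
        return row["contourless"]
    return row["first"] if position == 1 else row["second"]
```

For instance, `neutral_tone_pitch(1)` gives the contour of li in bōli, and `neutral_tone_pitch(1, position=2)` gives the contour of a following de.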

Although the contrast between weak and full syllables is often distinctive, the neutral tone is often not described as a full-fledged tone; some linguists feel that it results from a "spreading out" of the tone on the preceding syllable. This idea is appealing because without it, the neutral tone needs relatively complex tone sandhi rules to be made sense of; indeed, it would have to have four allotones, one for each of the four tones that could precede it. However, the "spreading" theory incompletely characterizes the neutral tone, especially in sequences where more than one neutral-tone syllable is found adjacent.[40] In Modern Standard Mandarin as applied in A Dictionary of Current Chinese, the second syllable of words with a 'toneless final syllable variant' (·次輕詞語) can be read with either a neutral tone or with the normal tone.[41][42][43]

Relationship between Middle Chinese and modern tones

The four tones of Middle Chinese are not in one-to-one correspondence with the modern tones. The following table shows the development of the traditional tones as reflected in modern Standard Chinese. The development of each tone depends on the initial consonant of the syllable: whether it was a voiceless consonant (denoted in the table by v−), a voiced obstruent (v+), or a sonorant (s). (The voiced–voiceless distinction has been lost in modern Standard Chinese.)

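| Middle Chinese tone | Initial | Standard Chinese tone name | Tone contour |
|---|---|---|---|
| píng (平) | v− | yīnpíng (1st) | 55 |
| píng (平) | s, v+ | yángpíng (2nd) | 35 |
| shǎng (上) | v−, s | shǎng (3rd) | 21(4) |
| shǎng (上) | v+ | qù (4th) | 51 |
| qù (去) | v−, s, v+ | qù (4th) | 51 |
| rù (入) | v− | redistributed with no pattern | |
| rù (入) | s | qù (4th) | 51 |
| rù (入) | v+ | yángpíng (2nd) | 35 |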
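
The correspondence described above is a function of two inputs, the Middle Chinese tone and the class of the initial. A minimal sketch follows, assuming the usual reading of the table (voiceless initials give yīnpíng for píng syllables, voiced obstruents move shǎng syllables to the 4th tone, and so on); the names `REFLEX`, `CONTOUR` and `modern_reflex` are made up for illustration.

```python
# Keys: (Middle Chinese tone, initial class).
# Initial classes: "v-" voiceless, "s" sonorant, "v+" voiced obstruent.
# Values: modern tone number, or None where there is no regular reflex.
REFLEX = {
    ("ping", "v-"): 1, ("ping", "s"): 2, ("ping", "v+"): 2,
    ("shang", "v-"): 3, ("shang", "s"): 3, ("shang", "v+"): 4,
    ("qu", "v-"): 4, ("qu", "s"): 4, ("qu", "v+"): 4,
    ("ru", "v-"): None,  # redistributed with no pattern
    ("ru", "s"): 4, ("ru", "v+"): 2,
}

# Tone contours as given in the table.
CONTOUR = {1: "55", 2: "35", 3: "21(4)", 4: "51"}


def modern_reflex(mc_tone, initial):
    """Return (modern tone number, contour), or None if unpredictable."""
    tone = REFLEX[(mc_tone, initial)]
    if tone is None:
        return None
    return tone, CONTOUR[tone]
```

A `None` result marks the one cell where the modern tone has to be looked up word by word.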

Tone sandhi

Pronunciation also varies with context according to the rules of tone sandhi. Some such changes have been noted above in the descriptions of the individual tones; however, the most prominent phenomena of this kind relate to consecutive sequences of third-tone syllables. There are also a few common words that have variable tone.

Third tone sandhi

The principal rule of third tone sandhi is:

  • When there are two consecutive third-tone syllables, the first of them is pronounced with second tone.

For example, 老鼠; lǎoshǔ; 'mouse' is pronounced [lau̯˧˥ʂu˨˩] as if it were láoshǔ. It has been investigated whether the rising contour (˧˥) on the prior syllable is in fact identical to a normal second tone. It has been concluded that it is identical at least in terms of auditory perception.[1]: 237 

When there are three or more third tones in a row, the situation becomes more complicated since a third tone that precedes a second tone resulting from third tone sandhi may or may not be subject to sandhi itself. The results may depend on word boundaries, stress, and dialectal variations. General rules for three-syllable third-tone combinations can be formulated as follows:

  1. If the first word is two syllables and the second word is one syllable, the first two syllables become second tones. For example, 保管好; bǎoguǎn hǎo; 'to take good care of' is pronounced báoguán hǎo [pau̯˧˥kwan˧˥xau̯˨˩˦].
  2. If the first word has one syllable, and the second word has two syllables, the second syllable becomes second tone, but the first syllable remains third tone. For example, 老保管; lǎo bǎoguǎn; 'to take care of all the time' is pronounced lǎo báoguǎn [lau̯˨˩pau̯˧˥kwan˨˩˦].
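
The principal rule and the two three-syllable cases above can be reproduced by applying sandhi inside each word first and then across word boundaries. The sketch below is a simplified illustration under that assumption, not a full account of multi-syllable sandhi; the function name and the input format (a list of words, each a list of tone numbers) are hypothetical.

```python
def third_tone_sandhi(words):
    """Apply third tone sandhi word-internally, then across word boundaries.

    words: list of words, each a list of underlying tone numbers,
           e.g. [[3, 3], [3]] for bǎoguǎn hǎo.
    Returns the flat list of surface tones.
    """
    # Step 1: inside a word, a third tone directly before another
    # underlying third tone becomes second tone.
    surface = []
    for word in words:
        changed = [
            2 if tone == 3 and i + 1 < len(word) and word[i + 1] == 3 else tone
            for i, tone in enumerate(word)
        ]
        surface.append(changed)

    # Step 2: at a word boundary, the last syllable of a word changes only
    # if the next word still begins with a third tone after step 1.
    for i in range(len(surface) - 1):
        if surface[i][-1] == 3 and surface[i + 1][0] == 3:
            surface[i][-1] = 2

    return [tone for word in surface for tone in word]
```

With this ordering, `[[3, 3], [3]]` (bǎoguǎn hǎo) comes out as 2-2-3 and `[[3], [3, 3]]` (lǎo bǎoguǎn) as 3-2-3, matching the two examples. Stress and dialectal variation, which the text notes can change the outcome, are not modeled.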

Some linguists have put forward more comprehensive systems of sandhi rules for multiple third tone sequences. For example, it has been proposed[1]: 248  that modifications are applied cyclically, initially within rhythmic feet (trochees; see below) and that sandhi "need not apply between two cyclic branches".

Tones on special syllables

Special rules apply to the tones heard on the morphemes 不 (bù; 'not') and 一 (yī; 'one').

For 不 (bù):

  1. 不 is pronounced with second tone when followed by a fourth tone syllable.
    Example: 不是 (bù + shì, 'to not be') becomes búshì [pu˧˥ʂɻ̩˥˩]
  2. In other cases, 不 is pronounced with fourth tone. However, when used between words in an A-not-A question, it may become neutral in tone (e.g. 是不是 shìbushì).

For 一 (yī):

  1. 一 is pronounced with second tone when followed by a fourth tone syllable.
    Example: 一定 (yī + dìng, 'must') becomes yídìng [i˧˥tiŋ˥˩]
  2. Before a first, second or third tone syllable, 一 is pronounced with fourth tone.
    Examples: 一天 (yī + tiān, 'one day') becomes yìtiān [i˥˩tʰjɛn˥], 一年 (yī + nián, 'one year') becomes yìnián [i˥˩njɛn˧˥], 一起 (yī + qǐ, 'together') becomes yìqǐ [i˥˩t͡ɕʰi˨˩˦].
  3. When final, or when it comes at the end of a multi-syllable word (regardless of the tone of the first syllable of the next word), 一 is pronounced with first tone. It also has first tone when used as an ordinal number (or part of one), and when it is immediately followed by any digit (including another 一; hence both syllables of the word 一一 yīyī and its compounds have first tone).
  4. When 一 is used between two reduplicated words, it may become neutral in tone, e.g. 看一看 kànyikàn ('to take a look at')
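
The two rule sets above can be written as small decision functions. This is a sketch under stated assumptions: tones are numbered 1 to 4 with 0 for neutral, the optional neutral-tone readings are switched on by explicit flags (the text says only that they "may" occur), and the function and parameter names are invented for illustration.

```python
def bu_tone(next_tone, a_not_a=False):
    """Surface tone of 不 (bù).

    next_tone: tone (1-4) of the following syllable.
    a_not_a:   True if 不 sits inside an A-not-A question and the
               optional neutral reading is chosen.
    """
    if a_not_a:
        return 0
    return 2 if next_tone == 4 else 4


def yi_tone(next_tone=None, final=False, ordinal=False,
            before_digit=False, reduplicated=False):
    """Surface tone of 一 (yī).

    next_tone:    tone (1-4) of the following syllable, or None if none follows.
    final:        一 ends the utterance or a multi-syllable word.
    ordinal:      一 is (part of) an ordinal number.
    before_digit: 一 is immediately followed by a digit.
    reduplicated: 一 stands between two reduplicated words and the
                  optional neutral reading is chosen.
    """
    if reduplicated:
        return 0
    if final or ordinal or before_digit or next_tone is None:
        return 1
    if next_tone == 4:
        return 2
    if next_tone in (1, 2, 3):
        return 4
    raise ValueError("no rule stated for this context")
```

For example, `yi_tone(4)` models yídìng, `yi_tone(1)` models yìtiān, and `yi_tone(1, before_digit=True)` models the first syllable of yīyī.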

The numbers 七 (qī; 'seven') and 八 (bā; 'eight') sometimes display similar tonal behavior to 一 (yī), but for most modern speakers they are always pronounced with first tone. All of these numbers, and 不 (bù), were historically Ru tones, and as noted above, that tone does not have predictable reflexes in modern Chinese; this may account for the variation in tone on these words.[1]: 228 

Second and fourth tone change

In conversational speech, for the rising tone (tone 2) and falling tone (tone 4), there are some situations (based on which tones are used immediately before and after) where the pitch contours will change.[44][21][22][1]: 239–241 

Tone 2 becomes higher and changes its direction, approaching the tone 1 pitch contour, when put between tone 1 or 2 and any other full tone.

| Tone pattern | Nominal | Changed | Example words |
|---|---|---|---|
| 1-2-1 | ˥ ˧˥ ˥ | ˥ ˥˦ ˥ | kēxuéjiā (科学家; 科學家) |
| 1-2-4 | ˥ ˧˥ ˥˩ | ˥ ˥˦ ˥˩ | xīhóngshì (西红柿; 西紅柿) |
| 2-2-1 | ˧˥ ˧˥ ˥ | ˧˥ ˥˦ ˥ | liúxuéshēng (留学生; 留學生) |
| 2-2-4 | ˧˥ ˧˥ ˥˩ | ˧˥ ˥˦ ˥˩ | yuáncáiliào (原材料) |
| 1-2-2 | ˥ ˧˥ ˧˥ | ˥ ˥˦ ˧˥ | zhuō mícáng (捉迷藏) |
| 1-2-3 | ˥ ˧˥ ˨˩ | ˥ ˥˦ ˨˩ | shānhúdǎo (珊瑚岛; 珊瑚島) |
| 2-2-2 | ˧˥ ˧˥ ˧˥ | ˧˥ ˥˦ ˧˥ | xuélíngqián (学龄前; 學齡前) |
| 2-2-3 | ˧˥ ˧˥ ˨˩ | ˧˥ ˥˦ ˨˩ | Guómíndǎng (国民党; 國民黨) |

Rising tone induced by the tone 3 sandhi also undergoes this transformation.

| Tone pattern | Nominal | With sandhi | Changed | Example words |
|---|---|---|---|---|
| 3-3-3 | ˨˩ ˨˩ ˨˩ | ˧˥ ˧˥ ˨˩ | ˧˥ ˥˦ ˨˩ | pǎomǎchǎng (跑马场; 跑馬場) |
| 2-3-3 | ˧˥ ˨˩ ˨˩ | ˧˥ ˧˥ ˨˩ | ˧˥ ˥˦ ˨˩ | nóngchǎnpǐn (农产品; 農產品) |
| 1-3-3 | ˥ ˨˩ ˨˩ | ˥ ˧˥ ˨˩ | ˥ ˥˦ ˨˩ | kānshǒusuǒ (看守所) |

The status of this tone change is ambiguous, and some authors consider it a tone sandhi akin to the third tone sandhi. Yuen Ren Chao considered the changed tone 2 to be identical to tone 1, and Cao Wen treated it as tone 1 (before tones 1 or 4) or tone 4 (before tones 2 or 3).[21][22] Both views are generalizations; the exact pitch contour of the changed tone 2 varies between mid-level ˧ in isolated words or at a slower speaking rate, and slightly falling high ˥ in a carrier sentence, at a faster speaking rate.[44]

Tone 4 becomes lower and flatter, but still slightly falling, akin to Cantonese tone 3, when put between tone 3 or 4 and tone 1 or 4.

| Tone pattern | Pitch (nominal) | Pitch (changed) | Example words |
|---|---|---|---|
| 4-4-1 | ˥˩ ˥˩ ˥ | ˥˧ ˧ ˥ | jìsuànjī (计算机; 計算機) |
| 4-4-4 | ˥˩ ˥˩ ˥˩ | ˥˧ ˧ ˥˩ | bìmùshì (闭幕式; 閉幕式) |
| 3-4-1 | ˨˩ ˥˩ ˥ | ˨˩ ˧ ˥ | lǐbàitiān (礼拜天; 禮拜天) |
| 3-4-4 | ˨˩ ˥˩ ˥˩ | ˨˩ ˧ ˥˩ | gǎn xìngqù (感兴趣; 感興趣) |
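
The conditions for the tone 2 and tone 4 changes in this section can be summarized for a three-syllable sequence as follows. This is only a sketch of the tabulated patterns: it assumes the input tones are those left after any third tone sandhi has applied, it uses Chao numerals for the contours, and the names `NOMINAL` and `trisyllable_contours` are hypothetical. Speech rate and word frequency, which the text says influence the change, are not modeled.

```python
# Nominal contours in connected speech, as used in the tables
# (the third tone appears in its low "half third" form).
NOMINAL = {1: "55", 2: "35", 3: "21", 4: "51"}


def trisyllable_contours(t1, t2, t3):
    """Return the three pitch contours of a trisyllabic sequence.

    t1, t2, t3: full tones (1-4), after third tone sandhi.
    """
    out = [NOMINAL[t1], NOMINAL[t2], NOMINAL[t3]]
    if t2 == 2 and t1 in (1, 2):
        # Tone 2 between tone 1 or 2 and any full tone: high, slightly falling.
        out[1] = "54"
    elif t2 == 4 and t1 in (3, 4) and t3 in (1, 4):
        # Tone 4 between tone 3 or 4 and tone 1 or 4: lower and flatter.
        out[1] = "3"
        if t1 == 4:
            # In the tables, a preceding tone 4 falls only to mid level.
            out[0] = "53"
    return out
```

For pǎomǎchǎng (3-3-3), sandhi first gives 2-2-3, and `trisyllable_contours(2, 2, 3)` then yields the changed pattern listed in the table.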

Unlike with changed tone 2, the changed tone 4 pitch contour was only insignificantly influenced by the change of speaking rate, provided it was still at conversational speed. The resulting pitch contours, especially that of the changed tone 4, are not associated with a phonemic tone in Mandarin. In perceptual experiments, native Beijing Mandarin speakers could easily recognize the intended tone in the original word, but could not recognize it when it was stripped from the context by the adjacent syllables being replaced with white noise:[44]

  • Changed tone 2 was perceived as tone 1 in over 70% of responses
  • Changed tone 4 was perceived as tone 1 in over 50% of responses
  • Both of them were properly recognized in only 20% of responses

Besides the speech rate, the frequency of expression may also play a role in triggering this tone change. The changed tone 2 that normally required tone 1 or 2 to precede it is also said to occur in gòngchǎndǎng (共产党; 共產黨; 'communist party') in place of sandhi-tone 3, but it remains to be seen whether there are more examples with initial tone 4.[1]: 240 

Stress, rhythm and intonation

Stress within words (word stress) is not felt strongly by Chinese speakers, although contrastive stress is perceived easily (and functions much the same as in other languages). One of the reasons for the weaker perception of stress in Chinese may be that variations in the fundamental frequency of speech, which in many other languages serve as a cue for stress, are used in Chinese primarily to realize the tones. Nonetheless, there is still a link between stress and pitch – the range of pitch variation (for a given tone) has been observed to be greater on syllables that carry more stress.[1]: 134, 231 

As discussed above, weak syllables have neutral tone and are unstressed. Although this property can be contrastive, the contrast is interpreted by some as being primarily one of tone rather than stress. (Some linguists analyze Chinese as lacking word stress entirely.)[1]: 134 

Apart from this contrast between full and weak syllables, some linguists have also identified differences in levels of stress among full syllables. In some descriptions, a multi-syllable word or compound[e] is said to have the strongest stress on the final syllable, and the next strongest generally on the first syllable. Others, however, reject this analysis, noting that the apparent final-syllable stress can be ascribed purely to natural lengthening of the final syllable of a phrase, and disappears when a word is pronounced within a sentence rather than in isolation. San Duanmu[1]: 136ff  takes this view, and concludes that it is the first syllable that is most strongly stressed. He also notes a tendency for Chinese to produce trochees, that is, feet consisting of a stressed syllable followed by one (or in this case sometimes more) unstressed syllables. On this view, if the effect of "final-lengthening" is factored out:

  • In words (compounds) of two syllables, the first syllable has the main stress, and the second lacks stress.
  • In words (compounds) of three syllables, the first syllable is stressed most strongly, the second lacks stress, and the third may lack stress or have secondary stress.
  • In words (compounds) of four syllables, the first syllable is stressed most strongly, the second lacks stress, and the third or fourth may lack stress or have secondary stress depending on the syntactic structure of the compound.

The positions described here as lacking stress are the positions in which weak (neutral-tone) syllables may occur, although full syllables frequently occur in these positions also.

There is a strong tendency for Chinese prose to employ four-syllable 'prosodic words' consisting of alternating stressed and unstressed syllables which are further subdivided into two trochaic feet. This structure, sometimes known as a 'four-character template' (四字格), is particularly prevalent in chengyu, which are classical idioms that are usually four characters in length.[45] Statistical analysis of chengyu and other idiomatic phrases in vernacular texts indicates that the four-syllable prosodic word had become an important metrical consideration by the Wei and Jin dynasties (4th century CE).[46]

This preference for trochaic feet may even result in polysyllabic words in which the foot and word (morpheme) boundaries do not align. For example, 捷克斯洛伐克 'Czechoslovakia' is stressed as 捷克/斯洛/伐克 and 南斯拉夫 'Yugoslavia' is stressed as 南斯/拉夫, even though the morpheme boundaries are 捷克/斯洛伐克 'Czech[o]/slovak[ia]' and 南/斯拉夫 'South/slav[ia]', respectively. The preferred stress pattern also has a complex effect on tone sandhi for the various Chinese dialects.[47]

This preference for a trochaic metrical structure is also cited as a reason for certain phenomena of word order variation within complex compounds, and for the strong tendency to use disyllabic words rather than monosyllables in certain positions.[1]: 145–194  Many Chinese monosyllables have alternative disyllabic forms with virtually identical meaning – see Chinese grammar § Word formation.

Another function of voice pitch is to carry intonation. Chinese makes frequent use of particles to express certain meanings such as doubt, query, command, etc., reducing the need to use intonation. However, intonation is still present in Chinese (expressing meanings rather similarly as in standard English), although there are varying analyses of how it interacts with the lexical tones. Some linguists describe an additional intonation rise or fall at the end of the last syllable of an utterance, while others have found that the pitch of the entire utterance is raised or lowered according to the desired intonational meaning.[1]: 234 

from Grokipedia
Standard Chinese phonology encompasses the sound system of Standard Mandarin (Pǔtōnghuà), the official language of the People's Republic of China and one of the working languages of the United Nations, primarily based on the phonology of the Beijing dialect with influences from other northern Mandarin varieties. It is characterized by monosyllabic morphemes, a tonal system, and a constrained syllable structure that limits phonological complexity while relying heavily on tones and context for lexical distinction.

The basic syllable in Standard Chinese consists of three components: an optional initial (consonant), a final (vowel or diphthong, often with a medial glide and optional coda), and a tone. There are 21 initials, including labials (b, p, m, f), alveolars (d, t, n, l), velars (g, k, h), and three sibilant series: retroflex (zh, ch, sh, r), alveolo-palatal (j, q, x), and dental (z, c, s). Together these provide a rich set of contrasts such as aspiration and retroflexion. Finals number 36, comprising simple vowels (a, o, e, i, u, ü), diphthongs (ai, ei, ao, ou), and those with nasal codas (an, en, ang, eng, ong) or the rhotic final (er), with only two possible consonantal codas overall: the nasals /n/ and /ŋ/. This structure yields approximately 400 distinct syllables without tones, but the addition of tones expands the inventory to approximately 1,300 phonetically distinct combinations, resulting in significant homophony (e.g., over 50 characters pronounced shi in various tones).

Tones are central to the system, with five distinguished: the high level (first tone, mā), rising (second, má), low falling-rising (third, mǎ; often shortened to low falling before non-third tones), falling (fourth, mà), and neutral (light, unstressed, ma). The third tone undergoes a prominent sandhi rule, changing to a rising tone before another third tone, which affects prosody.

Other notable aspects include the apical vowel [ɿ] (as in zi), the retroflex vowel [ʅ] (as in zhi), and regional variations like erhua (r-suffixation, e.g., huār), which adds a retroflex element to finals for expressive or dialectal emphasis. These features contribute to the language's reliance on suprasegmental elements for meaning, with stress patterns typically falling on the final syllable in compounds.

Initial consonants

Stops and affricates

Standard Chinese features a set of voiceless stop and affricate consonants as syllable initials, distinguished primarily by aspiration (the presence or absence of a puff of air after the release) and place of articulation, with realizations based on the Beijing dialect that forms the phonological basis of the standard. The stops occur at bilabial, alveolar, and velar places, while affricates appear in alveolar, retroflex, and palatal positions, all lacking voiced counterparts in the strict sense; instead, the unaspirated series are voiceless but tense.

The bilabial stops consist of the aspirated /pʰ/, phonetically realized as [pʰ] with strong aspiration, and the unaspirated /p/, realized as tense voiceless [p] in Beijing Mandarin. A labialized variant [pʷʰ] appears before rounded vowels, as in syllables like po. For example, the word píng 'level' is pronounced [pʰiŋ˧˥], showcasing the aspirated bilabial stop.

Alveolar stops include the aspirated /tʰ/ [tʰ] and unaspirated /t/ [t], with the latter exhibiting tense voiceless articulation in Beijing realizations. These stops precede front vowels without additional coarticulation effects beyond aspiration.

The velar stops consist of the aspirated /kʰ/ [kʰ] and unaspirated /k/ [k], realized with tense voiceless articulation similar to the other stops. Examples include gāo 'high' [kɑʊ˥] and kāi 'open' [kʰaɪ˥].

Alveolar affricates are articulated at the alveolar ridge, featuring the aspirated /tsʰ/ [tsʰ] and unaspirated /ts/ [ts], both voiceless, with the unaspirated form tense. The aspirated /tsʰ/ corresponds to the pinyin initial c and is similar to the "ts" in "tsunami" with a strong puff of air. An example is cān 'participate' [tsʰan˥], highlighting the aspirated alveolar affricate.

Retroflex affricates are articulated with the tongue tip curled back, featuring the aspirated /ʈʂʰ/ [ʈʂʰ], with a labialized form [ʈʂʷʰ] before rounded vowels, alongside the unaspirated /ʈʂ/, realized as tense [ʈʂ]. The aspirated /ʈʂʰ/ corresponds to the pinyin initial ch and is similar to the "ch" in "church" but retroflexed and aspirated. An example is chī 'eat', pronounced [ʈʂʰʅ˥], highlighting the aspirated retroflex affricate.

Palatal affricates, produced with the blade of the tongue near the hard palate, comprise the aspirated /tɕʰ/ [tɕʰ] and unaspirated /tɕ/, realized as tense [tɕ]. The aspirated /tɕʰ/ corresponds to the pinyin initial q and is similar to the "ch" in "cheese" but more forward and aspirated. These contrast with fricatives primarily through their initial stop closure phase.

The aspirated affricates /tsʰ/ (c), /ʈʂʰ/ (ch), and /tɕʰ/ (q) are often confused by learners because they share a manner of articulation (affricated and aspirated) but differ in place of articulation (alveolar, retroflex, and alveolo-palatal, respectively). Numerous video tutorials on YouTube demonstrate the differences in mouth positions, tongue placement, and example words to help distinguish these sounds.

Fricatives and approximants

The fricatives and approximants in Standard Chinese function exclusively as syllable initials, producing continuant sounds through constriction, without the stop closure found in affricates. These consonants include five voiceless fricatives, one voiced retroflex approximant, and one lateral approximant, contributing to the language's contrasts across labiodental, dental, retroflex, palatal, and velar places of articulation.

The sole labiodental fricative is /f/, realized phonetically as [f], where the lower lip approximates the upper teeth to create frication. This sound appears in words like fēn 'fragrance' [fən˥]. Dental sibilants consist of the voiceless alveolar fricative /s/ [s], produced with the tongue blade near the alveolar ridge or teeth, as in sī 'private, think' [sɨ˥]. The retroflex sibilant /ʂ/ [ʂ] involves the tongue tip curled back toward the palate, yielding a distinct retroflex frication, exemplified by shū 'book' [ʂu˥]. The palatal fricative /ɕ/ [ɕ] is an alveolo-palatal sound with the tongue blade raised toward the hard palate, as heard in xī 'west' [ɕi˥]. The velar fricative /x/ [x] is produced with the back of the tongue approaching the soft palate, as in hǎo 'good' [xɑʊ̯˨˩˦].

The retroflex approximant /ʐ/ is realized as [ʐ] or [ɻ], a voiced continuant with the tongue curled back, as in rén 'person' [ʐən˧˥]. The lateral approximant /l/ [l] allows airflow around the sides of the tongue in contact with the alveolar ridge; it varies to a flap [ɺ] intervocalically in casual speech. An example is lù 'road, land' [lu˥˩].

Allophonic variations among the sibilants include palatalization of /s/ and /ʂ/ before /i/, where they may merge toward [ʃ] or [ɕ]-like realizations, enhancing contrast with non-front vowels; retroflex approximants may also exhibit post-vocalic retroflexion in connected speech.

Nasals

Standard Chinese has three nasal consonants: the bilabial /m/, the alveolar /n/, and the velar /ŋ/. Of these, /m/ and /n/ occur freely as syllable initials, while /ŋ/ is rare in that position.

The bilabial nasal /m/ is realized phonetically as [m], with airflow obstructed at the lips and resonance through the nasal cavity. A representative example is mǎ "horse," transcribed as [ma˨˩]. The alveolar nasal /n/ is realized as [n], though it has a dental allophone [n̪] before high front vowels like /i/, due to coarticulatory assimilation in place of articulation. An example is nán "south" or "difficult," pronounced [nan˧˥].

The velar nasal /ŋ/ is realized as [ŋ], with the back of the tongue contacting the soft palate, but it often palatalizes to [ɲ] before /i/ in contexts where it occurs. As a syllable initial, /ŋ/ typically appears only in loanwords, dialectal influences, or historical reconstructions; in standard pronunciation, it is frequently substituted with /n/ or a zero onset, as in some analyses of syllables like those transcribed as ŋo in non-standard romanizations. This variation highlights the dynamic interaction between nasal place features and adjacent segments in the syllable.

Medial and final sounds

Glides and semivowels

In Standard Chinese, glides (also known as semivowels) serve as non-syllabic consonantal elements that typically occupy the medial position within the syllable or act as onsets, originating historically from the offglides of high vowels. The system features three main glides: the palatal glide /j/ [j], derived from the high front unrounded vowel /i/; the labial glide /w/ [w], derived from the high back rounded vowel /u/; and the labio-palatal glide /ɥ/ [ɥ], derived from the high front rounded vowel /y/. These glides are underlyingly vowels that surface as consonants when positioned before another vowel in the nucleus, contributing to the formation of diphthongs without forming consonant clusters.

As medials, glides follow an initial and precede the primary nucleus, creating complex rime structures. For instance, the syllable jiā 'home' is transcribed as /tɕja/, where /j/ mediates between the alveolo-palatal affricate /tɕ/ and the vowel /a/, while guā appears as /kwa/, with /w/ linking the velar stop /k/ to /a/. Similarly, syllables like jué '觉' involve /ɥ/, as in /tɕɥɛ/, combining with front rounded vowels. This medial usage is obligatory in such combinations to maintain well-formedness.

Syllables lacking a consonantal onset but beginning with high vowels are systematically analyzed as having a glide-initial structure rather than a true zero onset, ensuring uniform treatment across the inventory. Thus, yī 'one' (first tone) is phonetically [ji˥] and underlyingly /ji˥/, where /j/ is the glide onset, while wǔ 'five' (third tone) is [wu˨˩] from /wu˨˩/, where /w/ substitutes for an initial /u/. This convention avoids positing empty onsets for these cases, aligning with the language's phonological principles that high vowels in onset position consonantalize to glides. The same applies to yuè '月' as /ɥɛ˥˩/, treated as glide-initial /ɥ/- rather than a bare vowel.

The standard phonetic realization of the labial glide /w/ is a rounded labial-velar [w], consistent with Mandarin norms. However, in some southern varieties of Standard Chinese, such as those influenced by southwestern Mandarin, /w/ may surface as a labiodental approximant [ʋ], reflecting regional substrate effects from languages lacking a pure [w]. This variation does not alter the phonemic status but affects perceptual distinctions in non-standard speech.

Syllabic consonants

In Standard Chinese, syllabic consonants function as the nucleus of a syllable in the absence of a vowel, forming a distinct category within the language's phonological inventory. These include the nasals /m/, /n/, and /ŋ/, as well as the retroflex approximant /ɻ/, which are realized as syllabic [m̩], [n̩], [ŋ̩], and [ɚ], respectively. Additionally, there are apical syllabic approximants: [ɿ] after alveolar sibilant initials (z, c, s), as in sī [sɿ], and retroflex [ʅ] after retroflex sibilants (zh, ch, sh, r), as in shī [ʂʅ]. These apical forms are the most productive syllabic consonants and occur in many content words. This structure allows these consonants to bear tones, contributing to lexical and grammatical distinctions, though their occurrence is limited compared to vowel-nucleated syllables.

The syllabic /m/ appears in rare contexts such as interjections, exemplified by an interjection pronounced [m̩˨˩], where the bilabial nasal serves as the sole syllabic element. Similarly, syllabic /n/ occurs in function words like a question particle realized as [n̩˨˩˦], highlighting its alveolar nasal quality in isolation. The syllabic /ŋ/ is found in onomatopoeic or emphatic expressions, as in ēng [ŋ̩˥], featuring a velar nasal nucleus with a high-level tone. The retroflex syllabic /ɻ̩/, often transcribed as [ɚ], is more productive: it forms syllables such as ěr [ɚ˨˩˦] and underlies the diminutive suffix -r, as in the rhotacized form háir [xaɚ˧˥].

Phonetically, these syllabic consonants maintain their consonantal articulation in isolation, with formant structures resembling nasal murmurs rather than full vowels; for instance, [m̩] exhibits low first-formant values typical of bilabial closure. In casual speech, denasalization may occur, reducing nasal airflow and shifting toward approximant-like realizations, though this is context-dependent and less prominent in careful pronunciation.

The nasal and rhotic forms are rare and predominantly restricted to monosyllabic function words, particles, interjections, or suffixes, comprising a small subset of the approximately 400 core syllables in Standard Chinese; they do not typically form the basis of content items. This distribution underscores their role in prosodic and grammatical marking rather than primary lexical encoding. The apical syllabics, however, are more common in lexical items.

Vowels and diphthongs

Standard Chinese features a set of vowels categorized by height as high (/i/, /u/, /ɨ/), mid (/e/, /ə/, /ɤ/, /o/), and low (/a/). These form the core vocalic elements in syllables, with /i/ realized as a high front unrounded vowel, /u/ as a high back rounded vowel, and /ɨ/ as a high central unrounded vowel often appearing after retroflex or alveolar initials.

| Height | Front unrounded | Central unrounded | Back unrounded | Back rounded |
|---|---|---|---|---|
| Close | /i/ | /ɨ/ | | /u/ |
| Mid | /e/ | /ə/ | /ɤ/ | /o/ |
| Open | | /a/ | | |

Analyses of the vowel system vary, with the pinyin romanization supporting a five-vowel framework (/a, e, i, o, u/), where additional qualities emerge as allophones or contextual variants, while the bopomofo (Zhuyin) system, used in Taiwan, posits seven distinct vowels (/a, ɛ, i, o, ɔ, u, ɨ/) to better capture phonetic distinctions in finals like -e [ɤ] versus -o [ɔ]. The /ə/ appears primarily as a schwa-like mid central vowel in unstressed syllables, such as in the neutral tone.

Diphthongs in Standard Chinese combine a primary vowel with a glide, forming sequences like /ai/ (as in āi [aɪ˥]), /ei/ (as in bēi [peɪ˥]), /ao/ (as in gāo [kaʊ˥]), and /ou/ (as in gōu [koʊ˥]). Nasalized variants occur in finals such as /an/ (as in ān [ãn˥]) and /ɛn/ or /ən/ (as in ēn [ən˥] or fēn [fən˥]), where the nasalization affects the vowel quality before /n/ or /ŋ/. These diphthongs often incorporate glides like /i̯/ or /u̯/, contributing to syllable complexity. An example of mid vowel realization is guǒ [ku̯ɔ˨˩˦], where /o/ appears as [ɔ] in diphthongal context.

Syllable structure

Onset-nucleus-coda framework

The onset-nucleus-coda (ONC) framework provides a standard phonological model for analyzing Standard Chinese syllables, highlighting their relative simplicity with no complex clusters and limited coda possibilities compared to languages like English. In this model, a syllable is divided into three main constituents: an optional onset, an obligatory nucleus, and an optional coda, allowing for a concise representation that aligns with the language's monosyllabic tendencies.

The onset is optional and comprises a single initial consonant (C) or a glide (G), such as /j/, /w/, or /ɥ/, which functions as a consonant in this position. For instance, in the syllable shān ("mountain"), the onset is the retroflex fricative /ʂ/. Glides in the onset position, as in yī ("one"), contribute to the syllable's consonantal initiation without forming clusters.

The nucleus forms the core of the syllable and consists of a vowel (V) or a diphthong, providing the primary sonority peak. Common nuclei include monophthongs like /a/ or syllabic consonants such as the apical vowel [ɿ] in sì ("four"). This central element carries the syllable's inherent prominence, often interacting with surrounding segments in phonetic realization.

The coda is optional and restricted to nasals (/n/, /ŋ/) or the rhotic (/ɚ/), reflecting the language's avoidance of obstruent finals. In shān, the coda is the alveolar nasal /n/, while many syllables lack a coda entirely, resulting in open syllables. This limitation contributes to the inventory of approximately 1,300 possible syllables (including tones), which is modest given the language's lexical size, as tones play a crucial role in differentiation rather than segmental variety.

The general formula for a Standard Chinese syllable in the ONC framework is (C)(G)V(N), where parentheses indicate optionality, G represents the prenuclear glide, V the nucleus, and N the nasal or rhotic coda. This structure accommodates the majority of syllables, with weak syllables representing reduced variants of this canonical form.
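
The (C)(G)V(N) template can be expressed as a regular expression over a broad IPA transcription. The sketch below is illustrative only: the initial inventory, the vowel character class, and the names `INITIALS`, `SYLLABLE` and `parse_syllable` are assumptions made for the example, tone letters must be stripped beforehand, and a rhotic-only syllable such as [ɚ] is not covered.

```python
import re

# Assumed broad transcriptions of the 21 initials; longer symbols come first
# so that, e.g., "tɕʰ" is tried before "tɕ" and "t".
INITIALS = ["ʈʂʰ", "tɕʰ", "tsʰ", "ʈʂ", "tɕ", "ts", "pʰ", "tʰ", "kʰ",
            "p", "t", "k", "m", "n", "f", "s", "ʂ", "ɕ", "x", "l", "ʐ"]

# (C)(G)V(N): optional initial, optional glide, nucleus, optional coda.
SYLLABLE = re.compile(
    "^(?P<C>" + "|".join(INITIALS) + ")?"
    "(?P<G>[jwɥ])?"
    "(?P<V>[aeiouyəɤɛɔɨɿʅɪʊ]+)"
    "(?P<N>n|ŋ|ɚ)?$"
)


def parse_syllable(ipa):
    """Split a toneless IPA syllable into C, G, V, N (None where absent).

    Returns None if the string does not fit the (C)(G)V(N) template.
    """
    match = SYLLABLE.match(ipa)
    return match.groupdict() if match else None
```

For shān, `parse_syllable("ʂan")` separates the onset /ʂ/, the nucleus /a/ and the coda /n/; a string with a consonant cluster, such as "spa", is rejected.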

Rhotic and other codas

In Standard Chinese, the possible codas are severely restricted, consisting solely of two nasal consonants and a distinctive rhotic suffix. The nasal codas are the alveolar /n/ and the velar /ŋ/, which occur after a wide range of vowels to form common rhyme structures such as [an], [ən], [aŋ], [ian], and [uaŋ]. These nasals are phonetically realized as [n] following front vowels or [ə]-like nuclei and as [ŋ] following back vowels, contributing to the language's limited set of about 60 possible syllable finals when combined with nuclei. For instance, the syllable for "rice" (fàn) is pronounced [fan˥˩], and "palace" (gōng) as [kʊŋ˥], illustrating how /n/ pairs with open vowels and /ŋ/ with rounded ones.

The rhotic coda, known as erhua (儿化音) or the -r suffix, is a productive morphological feature that adds a retroflex [ɻ], often realized as an r-colored vowel [ɚ], to the end of a syllable, often deriving diminutive or colloquial forms. Phonetically, this rhotic assimilates with the preceding nucleus through vowel rhotacization rather than standing as a separate segment. In standard pronunciation, erhua is optional and not mandatory for intelligibility, but it is prevalent in the Beijing dialect, which serves as the basis for Putonghua, and is frequently used to convey informality or regional flavor. The suffix attaches to virtually any syllable regardless of its original coda, though it is most common with open syllables or those ending in nasals, effectively expanding the coda inventory beyond strict nasals in colloquial speech.

Examples of erhua include transforming "book" (shū [ʂu˥]) into shūr [ʂuɚ˥], where the rhotic integrates seamlessly with the nucleus, or "flower" (huā [xwa˥]) becoming huār [xwaɻ˥], preserving the syllable's original tone while altering its phonetic quality. This rhotic coda fits within the broader onset-nucleus-coda framework by occupying the coda position, though its suffixal nature allows it to modify entire rhymes in ways that nasals do not.

Full versus weak syllables

In Standard Chinese, syllables are classified into full and weak categories based on prosodic strength, with full syllables serving as the primary carriers of lexical information. Full syllables are stressed and feature complete realization of their segments, including a full vowel and one of the four main lexical tones, such as the high-level tone on the first syllable of māma [ma˥ ma] 'mother'. These syllables conform to the structure (C)VX, where they are phonologically heavy due to bimoraic weight, allowing for contour tones and full vowel length.

Weak syllables, in contrast, are unstressed and exhibit phonetic reduction, typically bearing the neutral tone and featuring centralized or diminished vowels. They often occur as clitics or in prosodically subordinate positions, such as function words or the second element in compounds, where the vowel reduces to a schwa [ə] or is even elided entirely. The neutral tone on weak syllables results in a short, mid-level pitch that lacks contrastive value, distinguishing them from the tonal specifications of full syllables.

Reduction in weak syllables involves vowel centralization toward [ə], potential loss of the onset consonant in rapid speech, and shortening of duration, which underscores their phonological dependence on adjacent full syllables. For instance, the possessive particle de is realized as [tə] or [ə] in connected speech, while the aspectual clitic le appears as [lə]; syllabic consonants like [ɨ] in zhī may weaken further in unstressed contexts. These patterns highlight the role of weak syllables in maintaining rhythmic alternation, with strong-weak pairings prevalent in disyllabic words. Weak syllables are particularly frequent in compounds, where the initial syllable is full and the final one weakens, as well as in grammatical morphemes like de, le, and zi, and they make up a significant portion of everyday speech.

This distinction implies broader phonological constraints, such as the inability of weak syllables to host complex codas or rising tones, reinforcing the binary prosodic organization of Standard Chinese.

Tonal system

The four lexical tones

Standard Chinese, also known as Mandarin, employs a tonal system where pitch variations distinguish lexical meaning, with four primary lexical tones that form the core of its phonemic contrasts. These tones are realized as distinct pitch contours on monosyllabic units, typically notated using Yuen Ren Chao's five-level scale (1 low to 5 high), which captures the relative pitch height and direction over the syllable's duration. The tones serve a phonemic function, as demonstrated by minimal pairs where only the tone differs, leading to entirely different words and meanings.

The first tone is a high level pitch, transcribed as [˥] or 55 in Chao notation, maintaining a steady high pitch throughout the syllable. For example, mā [ma˥] means "mother." This tone starts and ends at the upper end of the speaker's pitch range, providing a flat, sustained contour that contrasts sharply with dynamic tones.

The second tone features a rising pitch, notated as [˧˥] or 35, beginning at mid-level and ascending to high. An example is má [ma˧˥], meaning "hemp" or "numb." This upward trajectory gives it an interrogative-like quality in isolation, though its primary role is lexical distinction.

The third tone is a low dipping contour, represented as [˨˩˦] or 214, starting mid-low, falling to the bottom of the pitch range, and then rising. It is illustrated by mǎ [ma˨˩˦], which means "horse." This complex shape, with its characteristic dip, makes it the most marked of the four tones phonetically.

The fourth tone is a high falling pitch, denoted [˥˩] or 51, dropping sharply from high to low. For instance, mà [ma˥˩] means "to scold." Its abrupt descent conveys a sense of finality or emphasis in utterance.

These tones exemplify phonemic contrasts in minimal sets like mā (mother), má (hemp), mǎ (horse), and mà (scold), where the sole difference in pitch contour alters the word's identity and semantics.
Realizations of the tones exhibit slight phonetic variations depending on factors such as the syllable's position within a word or phrase—earlier syllables may show compressed contours—and speaker gender, with females typically producing higher overall pitch levels than males. The neutral tone, an atonic variant, appears on certain unstressed syllables but does not contrast lexically like the four main tones.
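The correspondence between Chao numerals and IPA tone letters used above can be sketched in a few lines. The helper names `chao_to_letters` and `transcribe` are hypothetical, and collapsing a level tone such as 55 to a single letter is a convention adopted here to match the transcriptions in this section.

```python
# Chao's five-level scale: 1 is the lowest pitch, 5 the highest.
CHAO_LETTERS = {"1": "˩", "2": "˨", "3": "˧", "4": "˦", "5": "˥"}

# Citation values of the four lexical tones as given in this section.
LEXICAL_TONES = {1: "55", 2: "35", 3: "214", 4: "51"}


def chao_to_letters(digits):
    """Convert a Chao numeral string such as '214' to IPA tone letters."""
    # A level tone (all digits equal) is written with a single letter.
    if len(set(digits)) == 1:
        digits = digits[0]
    return "".join(CHAO_LETTERS[d] for d in digits)


def transcribe(segments, tone):
    """Attach the tone letters for lexical tone 1-4 to a segment string."""
    return f"[{segments}{chao_to_letters(LEXICAL_TONES[tone])}]"


# The minimal set on the syllable "ma".
for tone in (1, 2, 3, 4):
    print(tone, transcribe("ma", tone))
```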

Neutral tone and tone values

The neutral tone, also known as the light tone, is a suprasegmental feature in Standard Chinese phonology that applies to unstressed or weak syllables, resulting in a reduced, short realization without a full tonal specification. It is typically transcribed as [˧] in narrow phonetic notation, representing a mid-level pitch that lacks the dynamic contour of the four lexical tones. This tone is phonetically underspecified, with its fundamental frequency (F0) often interpolating between the targets of adjacent tones rather than having an independent target, leading to a sagging or linear connection from the preceding tone.

The neutral tone commonly occurs on clitic-like elements, such as the plural -men (e.g., rénmen [ɻə̌n.mən] "people") and the diminutive/nominalizer -zi (e.g., háizi [xái̯.t͡sɨ˧] "child"), as well as in certain pronouns, particles, and reduced function words. Its presence shortens the duration of the preceding lexical tone, enhancing the prosodic flow and reducing emphasis on the weak syllable; for instance, the full form of "thank you" is xièxie, realized as [ɕjɛ̂.ɕjɛ], where the second syllable bears the neutral tone and the tone of the first syllable is truncated. Acoustic studies show the neutral tone has a markedly shorter duration (often 50-70% of full tones) and variable intensity, contributing to its weak status.

In terms of acoustic values, the neutral tone's F0 is generally mid-level, around 250-300 Hz for female speakers, but exhibits variability depending on context, such as slight falling or rising influenced by the preceding tone. For comparison with the lexical tones (with which the neutral tone contrasts by lacking specification), approximate F0 contours for female speakers are as follows: the first tone maintains a high level at ~300-400 Hz; the second tone rises from ~200 Hz to 350 Hz; the third tone dips to ~100 Hz before rising; and the fourth tone falls sharply from ~350 Hz to 150 Hz.
These values reflect typical productions in isolation or citation forms, with overall F0 ranges for females spanning 80-400 Hz. The realization of the neutral tone shows dialectal variability; it may be absent or merged with a low tone in some southern varieties of Standard Chinese, such as those influenced by Southwestern Mandarin, and can be optionally emphasized to bear a full lexical tone for contrast or focus.

Historical development from Middle Chinese

Middle Chinese, as reconstructed from rime dictionaries like the Qieyun (601 CE), featured a tonal system with four primary categories: the level tone (píngshēng), rising tone (shǎngshēng), departing tone (qùshēng), and entering tone (rùshēng). Each category was subdivided into yin (clear, with voiceless initials) and yang (muddy, with voiced initials), yielding eight distinct tone classes. This register split emerged during tonogenesis, where the loss of final consonants (such as glottal stops for shang and fricatives for qu) interacted with initial voicing to create pitch contrasts. Voiced initials generally lowered the overall pitch contour, distinguishing yang tones from their yin counterparts, while the entering tone was characterized by a short duration and a coda stop (-p, -t, -k). In the transition to Modern Standard Chinese (based on Beijing Mandarin), these eight categories underwent mergers, simplifying the system to four lexical tones while preserving traces of the historical distinctions. The first tone (high level) primarily derives from the yin ping category, reflecting the high pitch of voiceless level tones. The second tone (rising) derives from the yang ping category, where voiced initials' lower starting pitch evolved into a rising contour. The fourth tone (falling) mainly comes from the qu tones (both registers), maintaining a departing pitch fall. The third tone (low dipping) derives from the shang tones (both registers). These changes occurred gradually between the 12th and 16th centuries, influenced by northern dialect shifts and the loss of voiced obstruent initials, which became voiceless but retained tonal reflexes of their original voicing. The entering tones lost their phonemic status as a separate category in northern varieties, with the coda stops disappearing and syllables lengthening slightly; they were redistributed into the other tones based on their historical register and category, often resulting in shorter realizations. 
For instance, a Middle Chinese entering tone word like mək (with a sonorant initial) developed into modern mò (fourth tone), absorbing into the falling category otherwise derived from qu. Similarly, a level tone example with a voiceless initial, such as pau, evolved into bāo (first tone), following the regular yin ping development, whereas level tone words with originally voiced initials aligned with the rising second tone. This redistribution preserved conceptual links to the original categories, contributing to patterns in modern Standard Chinese where historical mergers affect tonal alternations in compounds.

Tone sandhi and prosody

Third tone sandhi

Third tone sandhi is a prominent phonological rule in Standard Chinese whereby a syllable with an underlying third tone (T3), typically a low dipping contour [˨˩˦] in isolation, changes to a rising contour akin to the second tone [˧˥] when immediately followed by another T3 syllable. This process, systematically outlined by Chao (1968), prevents consecutive low tones in connected speech and is obligatory in natural utterances. For instance, the common greeting "nǐ hǎo" ('hello'), consisting of two T3 syllables, is realized as [ni˧˥ xau˨˩˦] when utterance-final or [ni˧˥ xau˨˩] otherwise.

In sequences of multiple T3 syllables, the rule applies iteratively from left to right, converting all preceding T3s to the rising form while leaving the final T3 unchanged. Thus, a trisyllabic string of T3s surfaces as T2 T2 T3, with the last T3 realized as full [˨˩˦] only if utterance-final and shortened to low [˨˩] in mid-utterance positions. Acoustic analyses confirm this pattern, showing the rising variant before T3 and a low realization before non-T3 tones, underscoring the rule's role in tonal contrast maintenance (Chen, 2007). The full dipping T3 occurs exclusively in isolation or utterance-final contexts, reflecting prosodic boundaries rather than lexical specification.

Regional variation further modulates application: in some regional varieties, the sandhi is applied less consistently than in Beijing-derived speech, with greater retention of full T3 forms in comparable contexts ( et al., 2020). In relation to the neutral tone, a preceding T3 adopts the shortened low form rather than the rising variant, as seen in "nǐmen" ('you all') [ni˨˩ mən].
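The rule as stated here can be sketched as a function over a sequence of citation tones. This is a simplified illustration, not a full model: it treats the whole list as one utterance, ignores the prosodic grouping that affects longer T3 strings in real speech, and uses the hypothetical convention that 0 marks a neutral-tone syllable, returned as `None` because it has no specified contour.

```python
# Citation contours (Chao numerals) for the tones that do not alternate here.
CITATION = {1: "55", 2: "35", 4: "51"}


def apply_third_tone_sandhi(tones):
    """Map citation tones (0 = neutral, 1-4 = lexical) to surface contours."""
    surface = []
    last = len(tones) - 1
    for i, tone in enumerate(tones):
        if tone == 0:
            surface.append(None)           # neutral tone: no fixed contour
        elif tone != 3:
            surface.append(CITATION[tone])
        elif i < last and tones[i + 1] == 3:
            surface.append("35")           # T3 before T3 becomes rising
        elif i == last:
            surface.append("214")          # full dipping form utterance-finally
        else:
            surface.append("21")           # low form before other tones
    return surface


print(apply_third_tone_sandhi([3, 3]))     # nǐ hǎo, utterance-final
print(apply_third_tone_sandhi([3, 0]))     # nǐmen
print(apply_third_tone_sandhi([3, 3, 3]))  # T2 T2 T3 pattern
```

Because each syllable is checked against the underlying tone of its neighbour, a run of T3 syllables comes out as rising forms followed by a single final T3, matching the T2 T2 T3 pattern described above.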

Other sandhi rules and exceptions

In Standard Chinese, several monosyllabic words undergo obligatory tone sandhi distinct from the primary third tone rule, primarily to avoid tonal clashes in connected speech.

The negative adverb bù (不, 'not'), which has a lexical fourth tone [pu˥˩], changes to a second tone [pu˧˥] when followed by another fourth tone syllable. This applies in phrases like bù shì (不是, 'is not'), realized as bú shì [pu˧˥ ʂɨ˥˩]. If the following syllable bears a non-fourth tone or neutral tone, bù retains its fourth tone, as in bù hǎo (不好, 'not good') [pu˥˩ xaʊ̯˨˩˦]. The rule is regular, but bù may be reduced to the neutral tone in fixed patterns such as verb-negation-verb constructions (e.g., hǎo bu hǎo, 'good or not').

The numeral yī (一, 'one') displays context-dependent tone alternations based on the following syllable's tone. Before a first, second, or third tone syllable, yī shifts to a fourth tone [ji˥˩]; before a fourth tone, it becomes a second tone [ji˧˥]. The second-tone form also appears before the classifier ge (个), whose underlying fourth tone is usually neutralized, as in yí ge (一个, 'one [classifier]') [ji˧˥ kə]. Examples include yì tiān (一天, 'one day') [ji˥˩ tʰjɛn˥] and yí suì (一岁, 'one year old') [ji˧˥ sweɪ̯˥˩]. These changes are lexical and apply across compounds, though yī keeps its first tone when it stands at the end of a word or phrase, when digits are read out, and in ordinal uses such as dì-yī (第一, 'first').

These rules exhibit exceptions in lexicalized compounds, where tones may resist change to preserve etymological forms; for example, some disyllabic idioms retain bù or yī in their citation tones despite contextual pressure. Dialectal variation also occurs: in some regional varieties of Mandarin, sandhi realizations show greater tonal reduction than in others, with bù and yī changes more often neutralized or half-toned owing to prosodic influences from local substrates, though the core rules align.
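The two alternations can be summarized as small lookup functions keyed on the underlying tone of the following syllable. This is a sketch: the names `bu_tone`, `yi_tone`, and `surface_form` are hypothetical, and leaving the word at its citation tone for any other input (no following syllable, or one treated as toneless) is a simplifying assumption that does not model neutralization or lexicalized exceptions.

```python
# Pinyin spellings for each surface tone of the two words.
BU_FORMS = {2: "bú", 4: "bù"}
YI_FORMS = {1: "yī", 2: "yí", 4: "yì"}


def bu_tone(next_tone):
    """Surface tone of bù: second tone before a fourth tone, else fourth."""
    return 2 if next_tone == 4 else 4


def yi_tone(next_tone):
    """Surface tone of yī, given the next syllable's underlying tone."""
    if next_tone == 4:
        return 2          # yí before a fourth tone
    if next_tone in (1, 2, 3):
        return 4          # yì before first, second, or third tone
    return 1              # citation tone otherwise (assumption)


def surface_form(word, next_tone=None):
    """Return the pinyin surface form for 'bu' or 'yi' (ASCII keys)."""
    if word == "bu":
        return BU_FORMS[bu_tone(next_tone)]
    if word == "yi":
        return YI_FORMS[yi_tone(next_tone)]
    raise ValueError("only 'bu' and 'yi' are handled in this sketch")


print(surface_form("bu", 4), "shì")   # 不是
print(surface_form("yi", 1), "tiān")  # 一天
print(surface_form("yi", 4), "suì")   # 一岁
```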

Stress, rhythm, and intonation

Standard Mandarin Chinese lacks lexical stress akin to that in English, where individual words carry fixed stressed syllables; instead, prosodic prominence arises at the phrasal level, with primary stress typically placed on the final full syllable of an intonational phrase. Secondary stress may occur on other syllables within the phrase to highlight semantic focus, achieved through increased duration and intensity rather than pitch accent. For instance, in the sentence "Wǒ ài nǐ" (I love you), the verb "ài" receives secondary emphasis, while "nǐ" bears primary stress at the phrase end.

The rhythm of Standard Mandarin is characterized as syllable-timed, meaning syllables tend to occur at relatively equal intervals, contrasting with stress-timed rhythms in languages like English where stressed syllables dominate timing. However, lexical tones influence this evenness: high and rising tones (first and second) are shorter, while the low dipping third tone is longer due to its contour, and the neutral tone on weak syllables further shortens them, creating subtle variations in syllable duration. This syllable-based timing supports the language's prosodic structure, where full syllables carry primary rhythmic weight and weak syllables with neutral tone serve as rhythmic reducers.

Intonation in Mandarin overlays global pitch contours on the lexical tone system, primarily through fundamental frequency (F0) adjustments that do not alter tone identities but modulate their realization for pragmatic purposes. Declarative sentences feature a falling intonation, with an initial high F0 peak followed by gradual lowering across the phrase, enhancing the perceptual salience of tones. Yes-no questions exhibit a rising terminal contour, often marked by a high boundary tone on the final syllable or an interrogative particle like "ma," while wh-questions maintain declarative-like falling patterns. Emphasis is conveyed by exaggerating tone height or duration on focused elements, as in boosting the F0 range of "ài" in "Wǒ ài nǐ" to stress that word.
Phrase-level prosody organizes speech into tone groups, each with an initial high pitch reset and final lowering, facilitating coherence without fixed stress patterns.
