
Tocharian script

from Wikipedia
Kizil Caves standing Buddha. Often attributed in the past to the 7th century AD,[1] but now carbon-dated to AD 245–340.[2] Tocharian B inscription reading:

Se pañäkte saṅketavattse ṣarsa papaiykau
"This Buddha was painted by the hand of Sanketava".[3][4][5][6]
Script type: Abugida
Period: 8th century
Languages: Tocharian languages
Related scripts
Parent systems: Brahmi
Sister systems: Gupta, Tamil-Brahmi, Bhattiprolu, Sinhala
Sample of Tocharian script on a tablet.

The Tocharian script,[7] also known as Central Asian slanting Gupta script or North Turkestan Brāhmī,[8] is an abugida which uses a system of diacritical marks to associate vowels with consonant symbols. One of the Brahmic scripts, it is a version of the Indian Brahmi script. It was used to write the Central Asian Indo-European Tocharian languages in texts dating mostly from the 8th century (with a few earlier ones, probably as early as AD 300),[9] written on palm leaves, wooden tablets and Chinese paper and preserved by the extremely dry climate of the Tarim Basin. Samples of the language have been discovered at sites in Kucha and Karasahr, including many mural inscriptions. Because they mistakenly identified the speakers of this language with the Tokharoi people of Tokharistan (the Bactria of the Greeks), early authors called these languages "Tocharian". The name has remained, although Agnean and Kuchean have been proposed as replacements.[10][7]

Tocharian A and B are not mutually intelligible. Strictly speaking, if the tentative interpretation of twqry as related to Tokharoi is accepted, only Tocharian A may be called Tocharian, while Tocharian B could be called Kuchean (its native name may have been kuśiññe); but since their grammars are usually treated together in scholarly works, the labels A and B have proven useful. A common Proto-Tocharian language must have preceded the attested languages by several centuries, probably dating to the 1st millennium BC. Given the small geographical range of Tocharian A and the lack of secular texts in it, Tocharian A may instead have been a liturgical language, the relationship between the two being similar to that between Classical Chinese and Mandarin. However, the apparent lack of a secular corpus in Tocharian A is by no means certain, given the fragmentary preservation of Tocharian texts in general.

History


The Tocharian script is derived from the Brahmi alphabetic syllabary (abugida) and is referred to as slanting Brahmi. Once the manuscripts were studied, it soon became apparent that a large proportion of them were translations of known Buddhist works in Sanskrit, and some were even bilingual, which facilitated the decipherment of the new language. Besides Buddhist and Manichaean religious texts, the finds include monastery correspondence and accounts, commercial documents, caravan permits, medical and magical texts, and one love poem. Many Tocharians embraced Manichaean dualism or Buddhism.

In 1998, Chinese linguist Ji Xianlin published a translation and analysis of fragments of a Tocharian Maitreyasamiti-Nataka discovered in 1974 in Yanqi.[11][12][13]

The Tocharian script probably died out after 840, when the Uyghurs, expelled from Mongolia by the Kyrgyz, retreated to the Tarim Basin. This theory is supported by the discovery of Uyghur translations of Tocharian texts. During Uyghur rule, the local peoples were assimilated by the Turkic-speaking Uyghurs of what is now Xinjiang.[citation needed]

Script


The Tocharian script is based on Brahmi: each consonant carries an inherent vowel, which can be changed by adding a vowel mark or removed by a special nullifying mark, the virama. Like Brahmi, Tocharian uses stacking for conjunct consonants and has irregular conjunct forms of ra.[14] Unlike other Brahmi scripts, Tocharian has a second set of characters called Fremdzeichen that duplicate several of the standard consonants but carry an inherent "Ä" vowel.[15] The eleven Fremdzeichen are most often found as substitutes for the standard consonant+virama in conjuncts, but they can appear in any context except with the explicit "Ä" vowel mark. The use of Fremdzeichen for consonant+virama is not found in later Tocharian texts.
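Tocharian itself has no Unicode encoding (a 2015 proposal has not been approved), but the mechanics just described (inherent vowel, vowel mark, virama, conjunct) can be illustrated with the Unicode Brahmi block, which is encoded. The following is a minimal Python sketch under that assumption: Brahmi characters stand in for Tocharian, the Fremdzeichen cannot be shown, and whether a conjunct is drawn stacked is left to the font and shaping engine. The helper name `brahmi` is hypothetical.

```python
import unicodedata


def brahmi(*names: str) -> str:
    """Build a string from Unicode Brahmi character names (a stand-in for Tocharian)."""
    return "".join(unicodedata.lookup("BRAHMI " + name) for name in names)


ka = brahmi("LETTER KA")                          # consonant alone: inherent vowel, read "ka"
ki = brahmi("LETTER KA", "VOWEL SIGN I")          # vowel mark replaces the inherent a: "ki"
k = brahmi("LETTER KA", "VIRAMA")                 # virama removes the vowel: bare "k"
kta = brahmi("LETTER KA", "VIRAMA", "LETTER TA")  # virama between consonants requests conjunct "kta"

for label, s in [("ka", ka), ("ki", ki), ("k", k), ("kta", kta)]:
    # Print the code point sequence behind each written unit.
    print(label, [unicodedata.name(c) for c in s])
```

In Tocharian manuscripts a Fremdzeichen could take the place of the consonant+virama sequence inside such a conjunct; that option has no Unicode counterpart to demonstrate here.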

Table of Tocharian letters

Tocharian vowels
Independent: A, Ā, I, Ī, U, Ū, R̥̄, E, Ai, O, Au, Ä
Vowel diacritics (here applied on tha as an example): Tha, Thā, Thi, Thī, Thu, Thū, Thr̥, Thr̥̄, The, Thai, Tho, Thau, Thä

Tocharian consonants (for velars, dentals, labials, sonorants and sibilants the table gives both standard forms and Fremdzeichen)
Velars: Ka, Kha, Ga, Gha, Ṅa
Palatals: Ca, Cha, Ja, Jha, Ña
Retroflexes: Ṭa, Ṭha, Ḍa, Ḍha, Ṇa
Dentals: Ta, Tha, Da, Dha, Na
Labials: Pa, Pha, Ba, Bha, Ma
Sonorants: Ya, Ra, La, Va
Sibilants: Śa, Ṣa, Sa, Ha

Other marks
Visarga, Anusvara, Virama (on na), Jihvamuliya, Upadhmaniya

Evolution from Brahmi to Tocharian

2nd-century CE Sanskrit manuscript, Kizil Caves. First line: "... [pa]kasah tasmad asma(d)vipaksapratipaksas...". Spitzer Manuscript, folio 383 fragment.

Manuscripts in Sanskrit, written in the Middle Brahmi script of the Kushan period and carbon-dated to the 2nd century CE, have been discovered in the Tarim Basin, particularly at Kizil. Some of these fragments, quite possibly the oldest Sanskrit manuscript of any type related to Buddhism or Hinduism discovered so far, were found in 1906 as a pile of more than 1,000 palm-leaf fragments at Ming-oi in the Kizil Caves, during the third Turfan expedition led by Albert Grünwedel. The calibrated radiocarbon age of this manuscript is 130 CE (80–230 CE), corresponding to the reign of the Kushan king Kanishka.

The Tocharian script evolved from the Middle Brahmi script of the Kushan Empire:[16]

Evolution from Brahmi to Kushan Brahmi, and to Tocharian[17]
a i u e o k- kh- g- gh- ṅ- c- ch- j- jh- ñ- ṭ- ṭh- ḍ- ḍh-
Brahmi 𑀅 𑀇 𑀉 𑀏 𑀑 𑀓 𑀔 𑀕 𑀖 𑀗 𑀘 𑀙 𑀚 𑀛 𑀜 𑀝 𑀞 𑀟 𑀠
ṇ- t- th- d- dh- n- p- ph- b- bh- m- y- r- l- v- ś- ṣ- s- h-
Brahmi 𑀡 𑀢 𑀣 𑀤 𑀥 𑀦 𑀧 𑀨 𑀩 𑀪 𑀫 𑀬 𑀭 𑀮 𑀯 𑀰 𑀱 𑀲 𑀳

Unicode


Tocharian script was proposed for inclusion in Unicode in 2015 but has not been approved.[18]

References

from Grokipedia
The Tocharian script is an abugida, a type of writing system in which consonants carry an inherent vowel that can be modified by diacritics or separate vowel signs. It is derived from the North Indian Brahmi script and adapted to the phonetic needs of the Tocharian languages.[1] It was used to record Tocharian A and Tocharian B, two extinct languages of the Tocharian branch of the Indo-European language family (a centum branch), spoken in the oases of the Tarim Basin in present-day Xinjiang, China, along the northern Silk Road.[2] The script reads from left to right and uses akṣaras (syllabic characters) that typically represent a consonant with the inherent vowel a, alongside explicit signs for other vowels such as i, u, e and o, and a distinctive marker for the central vowel ä (realized as [ɨ] or a schwa-like sound).[1] While the majority of texts are in this Slanting Brahmi variant, also known as North Turkestan Brahmi, a small number of Tocharian B documents are written in the Manichean script, reflecting cultural exchanges in the region.[3]

The script's development traces back to the introduction of Buddhism into the Tarim Basin around the 4th–5th centuries CE, evolving from earlier Gupta-derived forms of Brahmi brought by Indian and Central Asian intermediaries.[2] Manuscripts in Tocharian A, primarily from eastern sites such as Turfan and Karashahr, date from roughly the 7th to 10th centuries CE and often served liturgical purposes, while Tocharian B texts, found more widely including in Kucha, span a broader period from before 400 CE to the 10th century CE and include both religious and secular content such as monastic accounts, business letters, medical treatises, and graffiti.[3] Over 7,600 fragments and manuscripts survive, making Tocharian the best-attested extinct Indo-European language from Central Asia, although the corpus remains fragmentary: the arid climate preserved the manuscripts, but many were damaged or destroyed over the centuries.[2]

Tocharian texts were first identified in the West in 1892 through manuscripts collected in the Tarim Basin during Russian expeditions, and the languages were named "Tocharian" in 1907 by the German scholar Friedrich W. K. Müller, though this label may not correspond to the ancient speakers of the languages.[2] Later discoveries by German, British, French, and Japanese teams in the early 20th century, followed by Chinese excavations from the 1970s onward, have enriched the corpus and revealed the script's role in a multicultural hub shaped by Indian, Iranian, Turkic, and Chinese influences.[3] The script's adaptation highlights distinctive phonological traits, such as the absence of voiced stops and the presence of palatalized consonants, providing crucial evidence for reconstructing Proto-Indo-European and for understanding early medieval Central Asian linguistics.[1]

Background

Linguistic and historical context

The Tocharian languages form an extinct branch of the Indo-European language family, distinct from better-known branches such as Indo-Iranian and Hellenic, and are known primarily through two languages: Tocharian A, also called Turfanian or Agnean, and Tocharian B, known as Kuchean.[4] They were spoken by communities in the oases of the Tarim Basin, in present-day Xinjiang, northwest China, and are attested in manuscripts dating from around the 4th to the 13th centuries CE; Tocharian B is attested from before 400 CE to ca. 1200 CE, while Tocharian A dates from the 7th to 10th centuries CE.[5][2] The arid climate of the region preserved over 7,600 manuscripts and fragments, the primary evidence for these languages, which show centum characteristics that align them more closely with western Indo-European branches such as Celtic and Italic.[4][2]

In the historical context of Central Asia, Tocharian speakers inhabited key nodes along the Silk Road trade routes, which facilitated extensive cultural and linguistic exchange in the Tarim Basin from the 1st millennium BCE onward.[5] The region was a crossroads where Buddhism, originating in India, flourished from the 2nd century CE and profoundly influenced Tocharian society through the founding of monasteries and the translation of sacred texts.[5] Interactions with neighboring cultures took several forms: Indo-Aryan influence arrived through Sanskrit and Gāndhārī loanwords related to Buddhism (e.g., bodhisātve 'bodhisattva'), Iranian elements through Saka and Sogdian contacts (e.g., etswe 'mule' from Old Iranian), and Chinese influence through trade and administrative ties under dynasties such as the Tang.[4] These exchanges enriched Tocharian vocabulary while leaving its core Indo-European structure intact, highlighting the basin's role as a meeting point of Eurasian civilizations.[5]

The Tocharian script, derived from the Brahmi family of Indian origin, was used to document a diverse corpus, and its Indian origin contrasts with the languages' western Indo-European affinities.[4] It recorded predominantly Buddhist literature, including sutras, commentaries, and monastic hymns, alongside administrative documents such as caravan passes and contracts, and occasional secular texts such as poetry and medical treatises.[5] This choice of script reflects the deep penetration of Indo-Aryan cultural practices through the spread of Buddhism, in sharp contrast with Tocharian's phonological and grammatical ties to western Indo-European languages.[4]

The decline of Tocharian took place around the 9th to 10th centuries CE, accelerated by waves of Turkic migration into the Tarim Basin, including the Uyghur influx around 840 CE and the later Karakhanid expansion.[6] Under Tibetan rule from the late 8th century and later Turkic dominance, Tocharian lost its status as a prestige language, leading to bilingualism and a gradual shift to Old Turkic, with the last manuscripts appearing by the 11th to 13th centuries before the languages died out completely.[6][2] The process was compounded by the spread of Islam and the suppression of Buddhism, which erased Tocharian from the linguistic landscape of the region.[6]

Discovery and decipherment

The discovery of Tocharian manuscripts took place mainly during early 20th-century archaeological expeditions in the Tarim Basin, particularly the German Turfan expeditions conducted between 1902 and 1914. These expeditions, led by Albert Grünwedel and Albert von Le Coq, explored sites in the Turfan oasis, Kucha, and surrounding regions of Xinjiang, uncovering thousands of ancient documents among the ruins of Buddhist monasteries.[7] The teams recovered over 40,000 fragments in total, including significant numbers in the Tocharian script on materials such as paper, birch bark, and wooden slips, many of them Buddhist texts preserved in arid cave conditions.[8]

The decipherment of the Tocharian script and languages began shortly after these finds reached European collections, with pivotal contributions from the German scholars Emil Sieg and Wilhelm Siegling. In 1908 they published an article titled "Tocharisch," in which they identified the unknown language as Indo-European on the basis of grammatical analysis and recognizable roots, and distinguished two dialects: Tocharian A (from the Turfan region) and Tocharian B (from Kucha).[2] Their breakthrough relied heavily on bilingual texts, such as Sanskrit-Tocharian manuscripts like the Udānavarga, where parallel passages allowed comparative translation and interpretation of the script. Tocharian-Chinese bilinguals provided further contextual clues. Sieg and Siegling's multi-volume Tocharische Sprachreste (1921–1953) presented transliterations, facsimiles, and glossaries of key fragments, consolidating the decipherment despite the highly fragmented state of the paper and birch-bark manuscripts, many of which were incomplete or damaged by age and environmental exposure.[9]

Today, the major collections of Tocharian manuscripts are held by institutions such as the Berlin State Library, which holds approximately 4,000 fragments from the Turfan expeditions, and the British Library, with over 800 digitized items.[10][11] In total, more than 7,600 Tocharian documents have been catalogued worldwide, and digitization projects provide ongoing scholarly access.[2]

Development

Origins in Brahmi script

The Brahmi script originated in ancient India around the 3rd century BCE, as evidenced by the rock edicts of Emperor Ashoka, the earliest attested use of the script, in Prakrit inscriptions across the subcontinent.[12] This abugida, in which consonant signs carry an inherent vowel, became the foundation of the writing traditions of numerous Indo-Aryan languages and spread northward with the expansion of Buddhism.[13] Brahmi reached Central Asia primarily through the Kushan Empire (1st–3rd century CE), a multicultural realm that fostered cultural exchange along the Silk Road from northwestern India to Bactria and beyond.[14] Kushan rulers such as Kanishka promoted Buddhism and used Brahmi alongside other scripts such as Kharoshthi for administrative and religious purposes, including Sanskrit and Gandhari Prakrit inscriptions in regions such as Mathura and Gandhara.[15] The script reached the Tarim Basin by the 2nd century CE through Kushan intermediaries, as shown by the Spitzer manuscript (ca. 130 CE), a Sanskrit philosophical text in Kushan Brahmi found at Kizil, which attests the use of Brahmi for Buddhist and other texts in the region's multilingual environment.[16]

In the Tarim Basin, the first adaptations of Brahmi for the Tocharian languages preserved core features such as left-to-right writing, while developing cursive forms suited to writing on palm leaves, wood, and later paper.[2] The system added more explicit vowel notation through diacritics, together with specific signs (Fremdzeichen) for Tocharian sounds such as /ä/ and certain consonants, which somewhat reduced reliance on the inherent vowel typical of abugidas.[17] This development is visible in the North Turkestan Brahmi variant, also known as Slant Brahmi, which uses stacked akṣaras and a bar-shaped virama for consonant clusters, reflecting phonetic needs distinct from those of Indian Prakrit.[17] Early Brahmi texts from the Tarim region such as the Spitzer manuscript, found at oasis sites like Kizil and stylistically linked to Kushan Brahmi variants, predate the full attestation of Tocharian around the 5th century CE.[18] These precursors, usually in Prakrit or Sanskrit, show that the script was established before it was adapted for Tocharian A and B.[2]

Evolution and regional adaptations

The Tocharian script evolved through distinct chronological phases after its adaptation in the Tarim Basin. The initial "Slant Brahmi" phase, also termed Tarim Gupta, emerged in the 4th–5th centuries CE; derived from the Indian Gupta script, it was used primarily for Sanskrit texts alongside early Tocharian B inscriptions.[19] By the 5th–6th centuries CE, transitional forms known as Early Tarim Brahmi A and B developed, reflecting the first local modifications for Tocharian phonology while retaining Brahmi's abugida structure.[19] The mature phase of the 6th–8th centuries CE introduced cursive styles, seen in both calligraphic Buddhist manuscripts and secular documents along the northern and southern Tarim routes, with more fluid letter forms that allowed faster writing.[20] The "Late Tocharian" phase of the late 8th–10th centuries CE featured simplified strokes and reduced complexity; it is associated with the archaic, classical, and late stages distinguished in Tocharian B manuscripts, and with all known Tocharian A texts.[20]

Regional variants of the script reflect geographic and cultural differences across the Tarim Basin. Western styles, centered on Kucha, appeared in the 5th–6th centuries CE as Early Tarim Brahmi A; linked to transmission along the northern Silk Road, they were used for Tocharian B in administrative and religious contexts.[19] Eastern variants in the Turfan region, known as Tarim Brahmi North 1 and 2, developed by the 6th–7th centuries CE and served several languages, including Tocharian A and B, Sanskrit, and Tumshuqese; evidence from over 4,000 manuscripts shows consistent but locally adapted letter proportions.[19] Central areas such as Šorčuq exhibit intermediate forms blending western and eastern traits, as seen in the distribution of dialects among the manuscripts.[20]

The script was adapted to suit Tocharian phonology and local materials. Signs such as the Fremdvokal for the central vowel /ä/ and the Fremdzeichen (11 consonant signs with inherent ä) were added to represent Tocharian sounds absent from standard Brahmi, alongside other distinctive features such as stacked akṣaras and a bar-shaped virama.[17] Adaptations to writing media included cursive ligatures and abbreviations in paper manuscripts, which predominated from the 6th century CE onward, alongside earlier use of wooden tablets; these changes allowed denser text in Buddhist and secular documents copied on imported Chinese paper.[20]

The Tocharian script's decline began in the late 8th century CE and accelerated after 840–866 CE with the Uyghur migration into the Tarim Basin, where Turkic dominance suppressed Tocharian Buddhist institutions and promoted bilingualism.[6] By the 10th–11th centuries the script had been largely replaced by the Old Uyghur script (derived from Sogdian), as Turkic languages such as Uyghur became administrative and religious standards under Karakhanid rule, and it became extinct by the 13th–14th centuries CE amid Islamization and cultural change.[6]

Description

Character set and alphabet

The Tocharian script is an abugida derived from the Late Brahmi script used in the Kushan Empire, with approximately 33–44 consonant akṣaras (syllabic units, including variants and Fremdzeichen), each carrying an inherent vowel /a/, and 8–13 vowel signs (independent and dependent forms). This character set was adapted between about the 5th and 8th centuries CE to represent the phonology of the Tocharian languages. It includes a core of 24 consonants in the standard Indic varga (class) order, from ka to ma and from ya to ha, supplemented by aspirated stops (e.g., kha, gha) inherited from Indo-Aryan tradition and by regional variants for sibilants and semivowels. Although the script retains Brahmi's letters for voiced and aspirated stops, Tocharian has no phonemic voicing or aspiration contrast, so these letters stand for voiceless unaspirated stops and serve mainly for etymological spellings and loanword distinctions.[17][21] The script accommodates Tocharian's seven simple vowels (i, u, e, o, a, ä, ā) and three diphthongs (ai, au, oi), the diphthongs occurring mainly in Tocharian B and in Tocharian A only in loans or archaic forms. The language itself has no phonemic vowel length distinction, although the script retains the long-vowel signs of its Brahmi origin. Archaic forms, closer to Gupta-period Brahmi, appear in early 5th–6th century inscriptions from Kucha and Turfan, while standard, more cursive variants dominate the 7th–8th century manuscripts from the Tarim Basin.[22]

Vowels

The vowel system includes independent letters for word-initial positions and dependent diacritics for following consonants. Tocharian phonology features short vowels only, with <ā> representing a low central [a] rather than a long vowel; diphthongs are limited primarily to Tocharian B. Unique to Tocharian are the central vowel <ä> [ɨ] (high) and adaptations for non-Indic sounds, often marked with a special "Fremdvokal" sign (AE) in archaic texts.[23]
Independent form (romanized) | Dependent form | IPA value | Notes
A | -a | [ə] or [ʌ] (central unrounded) | Inherent vowel; standard short a. Archaic open form in early manuscripts.
Ā | | [a] (low central) | Transcription convention for a short low vowel; no true length.
I, Ī | -i, -ī | [i] (high front) | Short i; the long marker is not used phonemically.
U, Ū | -u, -ū | [u] (high back) | Short u; rounded back vowel.
E | -e | [e] (mid front) | Unrounded mid front; common in verbal endings.
O | -o | [o] (mid back) | Rounded mid back.
Ä (AE, Fremdvokal) | | [ɨ] or [ə] (central) | Unique to Tocharian; often written with a superscript dot or special stroke in the standard variant.
AI | -ai | [ai̯] (diphthong) | Low central with front off-glide; primarily Tocharian B, some in A.
AU | -au | [au̯] (diphthong) | Low central with back off-glide; primarily Tocharian B, some in A.
OI | -oi | [oi̯] (diphthong) | Mid back with front off-glide; rare, Tocharian B.

Consonants

Consonants are organized into five varga groups (gutturals, palatals, cerebrals, dentals, labials), plus semivowels, sibilants, and aspirates. Each carries an inherent /a/, which is removed with the virāma (a horizontal stroke below) in clusters. Phonetic values are voiceless stops and affricates ([p, t, k, ts, t͡ʃ]): there is no phonemic voicing or aspiration distinction, so letters for voiced and aspirated consonants represent voiceless unaspirated sounds, although aspirates sometimes denote fricatives in certain positions. The sibilants distinguish dental [s], retroflex [ʂ], and palatal [ɕ]. Subscript (subjoined) forms handle clusters, e.g., -r (ra-phalā, a small r below) and -y (ya-phalā, a hooked y below), common in prenasalized or liquid sequences. Aspirated stops (kh, gh, etc.) come from Indo-Aryan but often denote fricatives [x, ɣ] or breathy voice in Tocharian. Archaic variants have angular strokes, while standard forms are rounded and cursive. Not all Brahmi letters are in regular use; voiced aspirates (jh, bh), for example, are rare. Retroflex letters (ṭ, ḍ, etc.) are retained for loanwords but represent dental/alveolar sounds in native words.[23][22][17][21]
Varga/group | Romanized (with inherent a) | IPA value | Notes
Gutturals | ka | [k] | Voiceless velar stop.
Gutturals | kha | [k] or [x] | Aspirate letter; represents a voiceless stop, or a fricative in intervocalic positions. Archaic hooked form.
Gutturals | ga | [k] | Voiced letter represents voiceless velar stop; from palatalized k in some dialects.
Gutturals | gha | [k] or [ɣ] | Aspirate voiced letter; represents a voiceless stop or fricative; Indo-Aryan influence; rare.
Gutturals | ṅa (ṅ) | [ŋ] | Velar nasal; subjoined in clusters.
Palatals | ca | [t͡ʃ] | Voiceless palatal affricate.
Palatals | cha | [t͡ʃ] | Aspirate letter; used for [t͡ʃ] or the sibilant [ɕ].
Palatals | ja | [t͡ʃ] | Voiced letter represents voiceless affricate; rare.
Palatals | ña (ñ) | [ɲ] | Palatal nasal.
Cerebrals | ṭa | [t] or [ts] | Retroflex letter; represents a dental/alveolar voiceless stop or affricate in native words; [ʈ] in loans.
Cerebrals | ṭha | [t] | Aspirate retroflex letter; rare.
Cerebrals | ḍa | [t] | Voiced retroflex letter represents voiceless; uncommon.
Cerebrals | ṇa (ṇ) | [n] | Retroflex nasal letter; represents dental.
Cerebrals | ṣa (ṣ) | [ʂ] | Retroflex sibilant.
Dentals | ta | [t] | Voiceless dental stop.
Dentals | tha | [t] | Aspirate letter represents voiceless dental.
Dentals | da | [t] | Voiced letter represents voiceless dental stop.
Dentals | na (n) | [n] | Dental nasal.
Dentals | la (l) | [l] | Lateral approximant.
Dentals | sa (s) | [s] | Dental sibilant.
Labials | pa | [p] | Voiceless bilabial stop.
Labials | pha | [p] | Aspirate letter represents voiceless bilabial.
Labials | ba | [p] | Voiced letter represents voiceless bilabial stop.
Labials | ma (m) | [m] | Bilabial nasal.
Semivowels and others | ya (y) | [j] | Palatal glide; subscript for palatalization.
Semivowels and others | ra (r) | [r] | Alveolar trill; subscript ra-phalā common.
Semivowels and others | va (v) | [w] or [β] | Labial glide; varies by position.
Semivowels and others | śa (ś) | [ɕ] | Palatal sibilant.
Semivowels and others | ha (h) | [h] | Glottal fricative.
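Because several letters in these tables share one phonemic value, a romanized word can be read phonemically by table lookup. The Python sketch below is illustrative only and rests on stated assumptions: where a table gives two values it takes the first (kha as [k], va as [w]), it reads ä as [ɨ] following the vowel description above, it maps the voiced aspirates missing from the consonant table (jh, ḍh, dh, bh) to voiceless stops by the general rule stated above, it matches digraphs such as kh and ai before single letters, and it ignores positional allophones. The function name `to_ipa` is hypothetical.

```python
import unicodedata

# Letter -> phonemic value, from the vowel and consonant tables above.
# Where a table gives two values, the first is used (e.g. kha = [k], va = [w]).
_TABLE = {
    # vowels and diphthongs
    "a": "ə", "ā": "a", "i": "i", "ī": "i", "u": "u", "ū": "u",
    "e": "e", "o": "o", "ä": "ɨ", "ai": "ai̯", "au": "au̯", "oi": "oi̯",
    # gutturals: four different letters, one voiceless stop
    "k": "k", "kh": "k", "g": "k", "gh": "k", "ṅ": "ŋ",
    # palatals (jh is not in the table; assumed voiceless by the general rule)
    "c": "t͡ʃ", "ch": "t͡ʃ", "j": "t͡ʃ", "jh": "t͡ʃ", "ñ": "ɲ",
    # cerebrals: retroflex letters read as dental in native words (ḍh assumed)
    "ṭ": "t", "ṭh": "t", "ḍ": "t", "ḍh": "t", "ṇ": "n", "ṣ": "ʂ",
    # dentals (dh assumed)
    "t": "t", "th": "t", "d": "t", "dh": "t", "n": "n", "l": "l", "s": "s",
    # labials (bh assumed)
    "p": "p", "ph": "p", "b": "p", "bh": "p", "m": "m",
    # semivowels and others
    "y": "j", "r": "r", "v": "w", "ś": "ɕ", "h": "h",
}
_TABLE = {unicodedata.normalize("NFC", k): v for k, v in _TABLE.items()}
_KEYS = sorted(_TABLE, key=len, reverse=True)  # try digraphs (kh, ai, ...) before single letters


def to_ipa(translit: str) -> str:
    """Read a romanized Tocharian word phonemically by greedy longest-match lookup."""
    s = unicodedata.normalize("NFC", translit.lower())
    out, i = [], 0
    while i < len(s):
        if s[i].isspace():  # keep word boundaries
            out.append(s[i])
            i += 1
            continue
        for key in _KEYS:
            if s.startswith(key, i):
                out.append(_TABLE[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no table value for {s[i]!r}")
    return "".join(out)
```

For example, `to_ipa("pañäkte")` gives `pəɲɨkte`, and `to_ipa("ka")`, `to_ipa("kha")`, `to_ipa("ga")` and `to_ipa("gha")` all give `kə`, mirroring the table's point that four letters share one sound.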
Special symbols include the anusvāra (a superscript dot above a consonant, romanized as ṃ, for a nasal, which before a stop is pronounced homorganic with it, e.g., [ŋ] before [k]) and the visarga (a small h or two dots to the right, romanized as ḥ, for aspiration or final breathiness). These follow Brahmi conventions but are used sparingly in Tocharian to reflect euphonic rules.[17][22]

Orthographic features and conventions

The Tocharian script functions as an abugida, a writing system in which each consonant glyph inherently represents a syllable with the vowel /a/, while other vowels are written with dependent diacritics, known as matras, placed above, below, or beside the consonant.[17] These matras include forms for /i/, /u/, /e/, /o/, and the distinctive central vowel /ä/; the last is often marked by two dots or by one of the specialized Fremdzeichen ("foreign signs"), 11 consonant signs used specifically for consonants followed by /ä/.[17] To suppress the inherent /a/ in consonant clusters or final consonants, scribes used a virama, typically a horizontal or diagonal bar preceding the affected letter, which also allows stacked or subscript forms in conjuncts.[17][24]

Writing proceeds from left to right in horizontal lines, without spaces between words, producing a continuous flow of akṣaras (syllabic units) that may continue across line breaks without any indicator.[24] Consonant clusters are usually written as subscript forms stacked beneath the first consonant, which supports a compact cursive style in which letters join fluidly without distinct gaps.[17] Punctuation is minimal: the danda, a single vertical stroke (|), marks the end of a sentence or phrase, and the double danda (||) marks major divisions, alongside occasional dots or double dots for pauses; these marks help structure the otherwise unbroken text.[25]

The script incorporates adaptations to Tocharian's sound system, such as dedicated letters for palatalized consonants (e.g., ś for palatal /ɕ/ derived from earlier k, and c from t), which reflect sound changes such as palatalization before front vowels in verbal roots.[24] The vowel /ä/ receives special treatment through diacritics or Fremdzeichen that capture its reduced central quality, distinct from the inherent /a/, which helps represent Tocharian's vowel alternations (e.g., i/u shifting to y/w in certain positions).[17][24] In loanwords from Sanskrit, the script preserves original features such as retroflex consonants, written with their dedicated glyphs, and stem vowels, although Tocharian often reduces non-final a/ā to /ä/ or omits them, as in adaptations of Buddhist terminology.[21]

Scribal practice emphasized efficiency in Buddhist manuscript production, including abbreviations for recurrent terms such as postak (from Sanskrit pustaka, "book" or "manuscript") and pñi (from Sanskrit puṇya, "merit"), which appear frequently in colophons to indicate the religious context or the donor's intentions.[26] Texts are laid out on oblong palm-leaf or paper folios in the pustaka format, typically with 4–9 lines per side and a central string hole for binding, with versos numbered sequentially; corrections, when needed, involve striking through erroneous akṣaras with lines or dots.[26][27] These conventions, evident in over 4,000 surviving fragments from the 6th–11th centuries CE preserved by the Tarim Basin's arid climate, reflect a standardized monastic tradition.[17]
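The grouping of letters into akṣaras described above (a whole consonant cluster written as one stacked unit together with the vowel that follows it, with any word-final consonants left over) can be sketched on romanized text. This Python sketch is a simplification under stated assumptions: it works on transliteration rather than on the script itself, treats ai, au and oi as single vowels, does not handle vocalic r̥, and returns a leftover final cluster as its own unit, which in the script would carry a virama or be written with a Fremdzeichen. The function name `segment` is hypothetical.

```python
import unicodedata

# Vowels as romanized in this article; diphthongs come first so "ai" matches before "a".
VOWELS = [unicodedata.normalize("NFC", v) for v in
          ("ai", "au", "oi", "a", "ā", "i", "ī", "u", "ū", "e", "o", "ä")]


def segment(word: str) -> list:
    """Split a romanized word into akṣara-like units: consonant cluster + vowel, plus any final cluster."""
    word = unicodedata.normalize("NFC", word.lower())
    units, cluster, i = [], "", 0
    while i < len(word):
        for v in VOWELS:
            if word.startswith(v, i):
                units.append(cluster + v)  # the whole cluster stacks onto this vowel
                cluster, i = "", i + len(v)
                break
        else:
            cluster += word[i]  # consonant: keep collecting until the next vowel
            i += 1
    if cluster:
        units.append(cluster)  # word-final consonant(s): virama or Fremdzeichen in the script
    return units
```

Applied to a word from the Kizil inscription quoted in the caption at the top, `segment("saṅketavattse")` gives `["sa", "ṅke", "ta", "va", "ttse"]`: the three-consonant cluster ttse forms a single stacked unit.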

Usage

Primary texts and inscriptions

The surviving corpus of Tocharian writings comprises approximately 9,800 fragments, about 8,100 in Tocharian B and 1,700 in Tocharian A, most of them small and highly fragmentary.[21][28] They include Buddhist canonical texts such as sutras and vinaya translations from Sanskrit, as well as para-canonical works such as poetry, narratives, and dramatic pieces; monastic documents, including letters, contracts, accounts, confessions, donations, and blessings; and secular literature, including medical treatises, grammatical works, word lists, and a rare love poem.[21][29] Further secular items include technical texts on calendrical matters, magic, and divination, together with practical artifacts such as merchants' border passes and graffiti.[29][1] Among the earliest examples are inscriptions in the Kizil Caves, carbon-dated to 245–340 CE.[30]

The manuscripts come primarily from Buddhist monastic sites in the Tarim Basin in Xinjiang, China, including the oases of Kucha (for Tocharian B) and Turfan, with significant collections from the Subashi temple complex near Kucha and from cave temples such as Kizil and Kumtura.[1][31] Writing media vary, encompassing birch bark for pothi-format codices, wooden tablets for documents, and paper and occasionally silk, reflecting adaptation to local resources and to Indian manuscript traditions.[32][33] Among notable examples, the Tocharian B translation of the Mahāparinirvāṇa Sūtra stands out as one of the longest preserved texts, surviving in bilingual Sanskrit-Tocharian fragments that illustrate translation practice. Epigraphic uses of the script include donor inscriptions on cave walls at sites such as Kizil, recording the names and dedications of patrons who supported Buddhist art and architecture.[34]

Preservation has been aided by the arid desert climate but hampered by fragmentation, insect damage, and post-excavation handling, so that many texts survive only as isolated folios or scraps.[35] Radiocarbon dating of selected manuscripts indicates production mainly from the 5th to 8th centuries CE, with some extending to the 10th century, while inscriptions date as early as the 3rd–4th centuries CE.[35][30]

Dialectal variations in script usage

The Tocharian script, derived from Brahmi, shows dialectal variation mainly in orthographic conventions and stylistic tendencies between Tocharian A (Agnean) and Tocharian B (Kuchean), reflecting their geographic and temporal separation. Tocharian A texts, mainly from the eastern Turfan region and dating to the 7th–10th centuries CE, show a more uniform and conservative orthography and often render Sanskrit loanwords faithfully to their original forms.[21] Tocharian B, by contrast, prevalent in the western Kucha area from the late 4th century CE onward, shows greater orthographic variability and innovation, including frequent abbreviations and adaptations in everyday and administrative contexts.[24][36]

A key orthographic distinction lies in the treatment of palatalized consonants. In Tocharian A, the Proto-Tocharian sequence *-sḱ- typically simplifies to -ṣ-, as in verbal forms with the causative suffix (e.g., āśäṣ 'leads').[23] Tocharian B instead geminates it to -ṣṣ-, reflecting a more explicit palatalization process (e.g., aiṣṣäm 'splits', from *-sḱ-).[23][37] Vowel representation also diverges: Tocharian A frequently inserts the central vowel ä to break up consonant clusters, conservatively maintaining syllable structure (e.g., tänmäṣtär 'creator'), while Tocharian B relies more on diphthongs such as ai and au and uses ä less consistently.[23][24]

Stylistically, Tocharian B manuscripts from Kucha often feature rounder, cursive forms suited to wooden tablets and administrative documents, whereas Tocharian A inscriptions tend toward a more angular, formal script in Buddhist liturgical contexts.[36] Despite these differences, both dialects share a core Brahmi-derived alphabet with an inherent a vowel and similar consonant inventories, indicating a common origin before the dialectal split around the 5th century CE.[21] Tocharian A's fidelity to Sanskrit loans underscores its role as a liturgical language, while Tocharian B's innovations, such as abbreviated forms, match its vernacular and practical use, though it too was heavily influenced by Buddhist texts.[38] These variations help attribute fragmentary manuscripts to a specific dialect and support reconstructions of cultural exchange along the Silk Road, particularly Tocharian B's deeper integration with Buddhist traditions in the west.[21][24]

Modern study

Transcription systems

The romanization of Tocharian script into the Latin alphabet relies on a standardized system of diacritics. These capture the phonetic distinctions of the Brāhmī-derived characters and make scholarly analysis of the extinct languages easier. The system was formalized in the Tocharisches Elementarbuch (TEB) by Wolfgang Krause and Werner Thomas (1960). It marks long vowels with a macron (e.g., ā for /aː/) and uses a tilde for the palatal nasal (ñ). Sibilants are distinguished by diacritics: an acute accent for the palatal ś and an underdot for the retroflex ṣ. Aspirated stops are written as digraphs with a following h (e.g., kh, th, ph). The central vowel ä is often expressed in the script by a distinct "Fremdzeichen".[24]

A foundational version of this diacritic-based approach appears in Walter Couvreur's Syntaxe du tokharien B (1948) and in his comparative grammar (1947). Couvreur prioritizes phonetic fidelity, using signs such as ś and ñ to reflect the palatalization processes that arose in Proto-Tocharian. For practical phonetic transcription, Douglas Q. Adams modifies the standard in works such as A Dictionary of Tocharian B (1999, revised 2013). He simplifies clusters (e.g., rendering the kṣ of Sanskrit loans as ks to approximate pronunciation) and minimizes diacritics where they obscure morphological patterns. This aids comparative linguistics without sacrificing accuracy.[39] Across these systems, the macron is used consistently for vowel length to mark phonemic contrasts (e.g., a vs. ā).
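The diacritics described above are encoded in Unicode as combining marks. As a result, the same romanized word can be stored either with precomposed letters (ā, ñ) or as a base letter followed by a combining mark. The Python sketch below illustrates this and is not part of any published transcription standard. It assumes that transcriptions are stored as Unicode text. It uses the standard library's `unicodedata` module to show the decomposition and to compare two spellings of pañäkte after NFC normalization.

```python
import unicodedata

# TEB-style letters: macron (length), tilde (palatal nasal), acute (palatal
# sibilant), underdot (retroflex sibilant), diaeresis (central vowel ä).
SAMPLES = ["\u0101", "\u00f1", "\u015b", "\u1e63", "\u00e4"]  # ā ñ ś ṣ ä


def decompose(ch: str) -> list[str]:
    """Return the Unicode names of the code points in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


def same_transcription(a: str, b: str) -> bool:
    """Compare two romanized strings regardless of how diacritics were typed."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for ch in SAMPLES:
    print(ch, decompose(ch))

# "pañäkte" typed with precomposed letters vs. base letters + combining marks
typed_precomposed = "pa\u00f1\u00e4kte"
typed_decomposed = "pan\u0303a\u0308kte"
print(same_transcription(typed_precomposed, typed_decomposed))  # True
```

Normalizing to NFC before searching or comparing avoids spurious mismatches between otherwise identical readings. NFD is the alternative when each diacritic needs to be processed separately.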
Fragmentary manuscripts often contain ambiguous readings. To handle them, scholars use diplomatic transcriptions that reproduce the exact akṣara forms, marking damaged or unclear signs with parentheses or question marks. Normalized versions, in contrast, interpret the text on the basis of contextual grammar and parallel texts.[24][40]

These systems evolved from the ad hoc transcriptions that Emil Sieg and Wilhelm Siegling introduced in Tocharische Sprachreste (1921). Their work first adapted Sanskrit-style romanization to Tocharian innovations such as the ä-sign, but it had no uniform diacritic rules. By the mid-20th century, the TEB provided a comprehensive standard that incorporated insights from Couvreur's analyses. Contemporary projects such as the Comprehensive Edition of Tocharian Manuscripts (CEToM) integrate IPA-influenced notation for precise phonology while adhering to the TEB basics. Databases such as TITUS use dual transliteration (syllabic and linear) for digital accessibility.[41][28][42]

Dialectal inconsistencies complicate transcription. Tocharian A has a more uniform but archaizing orthography, while Tocharian B has a variable plene spelling (full vowel notation with matras), and both can obscure etymologies. More than 7,600 fragments and manuscripts survive, and scribal errors, omissions and ligatures compound the ambiguities, particularly in proper names and loanwords. Publishing guidelines in modern editions recommend diplomatic transcriptions for fidelity to the raw data and normalized transcriptions for interpretive editions. Variants are recorded in footnotes so that these issues are handled systematically.[20][2]
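A small parser can illustrate the split between diplomatic and normalized transcriptions. The markup used here is a hypothetical simplification, not the convention of any particular edition. Parentheses enclose damaged but restored characters, and ? stands for an illegible sign. The sketch only separates editorial markup from the reading. Real normalization also draws on grammar and parallel texts, as described above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical diplomatic markup (an assumption, not a published standard):
#   (xyz)  damaged characters restored by the editor
#   ?      an illegible sign
RESTORED = re.compile(r"\(([^()]*)\)")


@dataclass
class Reading:
    text: str                                           # reading without markup
    restored: list[str] = field(default_factory=list)   # bracketed spans
    illegible: int = 0                                  # count of '?' signs


def normalize_line(diplomatic: str) -> Reading:
    """Strip the hypothetical markup, recording what was restored or lost."""
    restored = RESTORED.findall(diplomatic)
    text = RESTORED.sub(r"\1", diplomatic)
    return Reading(text=text, restored=restored, illegible=text.count("?"))


r = normalize_line("se pañ(ä)kte saṅke?avattse")
print(r.text)       # se pañäkte saṅke?avattse
print(r.restored)   # ['ä']
print(r.illegible)  # 1
```

Keeping the restored spans and the illegible-sign count next to the cleaned text reflects the practice of recording variants in footnotes, rather than silently merging them into the reading.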

Digital representation and Unicode

The Tocharian script remains unencoded in the Unicode Standard as of Unicode 17.0, despite ongoing proposals to include it in the Supplementary Multilingual Plane. A key proposal, submitted in 2015 by the Script Encoding Initiative at the University of California, Berkeley, outlines the script's encoding model. It advocates a block of approximately 80 characters to accommodate the script's Brahmi-derived structure.[25] This provisional allocation is listed in the Unicode roadmap at U+11E00–U+11E7F, subject to final approval by the Unicode Technical Committee and ISO/IEC JTC1/SC2/WG2.[43] The proposed encoding treats Tocharian as a left-to-right abugida with 38 base consonant and vowel letters. It also includes combining diacritics, among them the superscript marks used for the script's distinctive vowel ä.[25] For example, the base forms would include characters such as the letter a (proposed U+11E00), and combining marks for i or u would sit in the U+11E70 series. The intent is for these to share shaping behavior in OpenType fonts with other Brahmi-family scripts such as Devanagari or Tibetan.[17] The model does not encode precomposed forms as separate characters. Instead, it prioritizes decomposable grapheme clusters for accurate text processing and searchability. Because of the script's provisional status, font and tool support is currently limited and relies on prototype implementations rather than standard distributions.
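Tocharian has no encoded characters yet. The grouping of a base letter with its combining vowel signs into one grapheme cluster can therefore only be demonstrated with an already-encoded Brahmi-family script. The sketch below uses Devanagari as a stand-in. It applies a deliberately naive rule: every combining mark (Unicode general categories Mn, Mc and Me) is attached to the preceding character. This simplifies the full grapheme-cluster rules of Unicode Standard Annex #29 and is not how any proposed Tocharian implementation works.

```python
import unicodedata

# Unicode general categories for combining marks:
# Mn (nonspacing), Mc (spacing combining), Me (enclosing).
COMBINING_CATEGORIES = {"Mn", "Mc", "Me"}


def naive_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    A simplification of Unicode extended grapheme clusters (UAX #29):
    it ignores virama-joined conjuncts, joiners and other special cases.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in COMBINING_CATEGORIES:
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# Devanagari stand-in: KA + VOWEL SIGN I, then KA + VOWEL SIGN U.
sample = "\u0915\u093f\u0915\u0941"
for cluster in naive_clusters(sample):
    print([unicodedata.name(c) for c in cluster])
```

Under this model, a search for a syllable matches a base letter plus its vowel sign as one unit, while the underlying code points stay decomposed.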
Custom fonts, such as the Tocharian A typeface developed by designer Lee Wilson, provide over 3,000 glyphs to handle ligatures and variant forms. These are often mapped to Private Use Area code points (U+E000–U+F8FF) for interim digital use.[44] Input methods are rudimentary and typically involve image-based insertion or custom keyboard layouts in software such as Microsoft Word or Adobe InDesign. Rendering remains difficult, particularly for cursive joining behaviors and stacked vowel marks, which require advanced font features not yet standardized for the script. No major font family such as Noto Sans supports Tocharian. Experimental font implementations for shaping engines such as Graphite and HarfBuzz, however, show how its complex shaping could be handled once the script is encoded.[25] Developments since 2010 include refined proposals addressing archaic variants, such as elongated forms and dialect-specific ligatures observed in Kucha and Turfan manuscripts, to improve coverage for historical accuracy.[17] These efforts support integration into digital archives. The International Dunhuang Project (IDP), for instance, digitizes Tocharian B fragments held by the British Library and provides high-resolution images and Latin-based transcriptions for scholarly access. This bridges the gap until full Unicode support enables native script rendering.[45]
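Private Use Area code points have no standard meaning. Text set in an interim PUA font is therefore only interpretable together with that font. One way an archive might keep such text searchable is to convert it to Latin transliteration with a font-specific table. In the Python sketch below, the table `PUA_TO_TRANSLIT` and its three entries are hypothetical placeholders, not the mapping of any real Tocharian font. The range check is limited to the Basic Multilingual Plane PUA given above.

```python
import unicodedata

# Basic Multilingual Plane Private Use Area, as given in the text above.
PUA_START, PUA_END = 0xE000, 0xF8FF

# Hypothetical font-specific table: every interim Tocharian font defines its
# own assignments, so these three entries are placeholders only.
PUA_TO_TRANSLIT = {
    0xE000: "ka",
    0xE001: "kä",
    0xE002: "ta",
}


def is_bmp_private_use(ch: str) -> bool:
    """True for BMP private-use characters (general category 'Co')."""
    return unicodedata.category(ch) == "Co" and PUA_START <= ord(ch) <= PUA_END


def to_transliteration(text: str) -> str:
    """Replace mapped PUA characters with Latin transliteration.

    Unmapped PUA code points are kept visible as <U+XXXX> instead of being
    dropped, and all other characters pass through unchanged.
    """
    out = []
    for ch in text:
        if is_bmp_private_use(ch):
            out.append(PUA_TO_TRANSLIT.get(ord(ch), f"<U+{ord(ch):04X}>"))
        else:
            out.append(ch)
    return "".join(out)


print(to_transliteration("\ue000\ue002 \ue0ff"))  # kata <U+E0FF>
```

Flagging unmapped code points instead of deleting them means that gaps in a font table remain visible in the transliterated output.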

References
