Saka language
from Wikipedia
Saka
Khotanese, Tumshuqese
Khotanese animal zodiac BLI6 OR11252 1R2 1
Native to: Kingdom of Khotan, Tumshuq, Murtuq, Shule Kingdom,[1] and Indo-Scythian Kingdom
Region: Tarim Basin (Xinjiang, China)
Ethnicity: Saka
Era: 100 BC – 1000 AD; developed into Wakhi[2]
Dialects:
  • Khotanese
  • Tumshuqese
  • Kanchaki?
Writing system: Brahmi, Kharosthi
Language codes
ISO 639-2: kho
ISO 639-3: either kho (Khotanese) or xtq (Tumshuqese)
Glottolog: saka1298
Khotanese Verses BLE4 IOLKHOT50 4R1 1
Book of Zambasta BLX3542 OR9614 5R1 1

Saka, or Sakan, was a variety of Eastern Iranian languages, attested from the ancient Buddhist kingdoms of Khotan, Kashgar and Tumshuq in the Tarim Basin, in what is now southern Xinjiang, China. It is a Middle Iranian language.[3] The kingdoms of Khotan and Tumshuq differed in dialect; their varieties are known as Khotanese and Tumshuqese.

The Saka rulers of the western regions of the Indian subcontinent, such as the Indo-Scythians and Western Satraps, are traditionally assumed to have spoken practically the same language.[4] This has however been questioned by more recent research.[5]

Documents on wood and paper were written in modified Brahmi script with the addition of extra characters over time and unusual conjuncts such as ys for z.[6] The documents date from the fourth to the eleventh century. Tumshuqese was more archaic than Khotanese,[7] but it is much less understood because it appears in fewer manuscripts compared to Khotanese. The Khotanese dialect is believed to share features with the modern Wakhi and Pashto.[8][9][10][11][12][13][14] Saka was known as "Hvatanai" (from which the name Khotan) in contemporary documents.[15] Many Prakrit terms were borrowed from Khotanese into the Tocharian languages.

Classification


Khotanese and Tumshuqese are closely related Eastern Iranian languages.[16]

The unusual phonological development of Proto-Iranian *ćw to Khotanese śś sets the latter apart from most other Iranian languages (which usually have sp or a product thereof). Similarities with Sogdian exist but could be due to parallel developments or areal features.[17]

History


The two known dialects of Saka are associated with a movement of the Scythians. No invasion of the region is recorded in Chinese records and one theory is that two tribes of the Saka, speaking the two dialects, settled in the region in about 200 BC before the Chinese accounts commence.[18]

Michaël Peyrot (2018) rejects a direct connection with the "Saka" (塞) of the Chinese Hanshu, who are recorded as having immigrated in the 2nd century BC to areas further west in Xinjiang, and instead connects Khotanese and Tumshuqese to the long-established Aqtala culture (also Aketala, in pinyin) which developed since ca. 1000 BC in the region.[19]

The Khotanese dialect is attested in texts between the 7th and 10th centuries, though some fragments are dated to the 5th and 6th centuries. The far more limited material in the Tumshuqese dialect cannot be dated with precision, but most of it is thought to date to the late 7th or the 8th century.[20][21]

The Saka language became extinct after invading Turkic Muslims conquered the Kingdom of Khotan in the Islamicisation and Turkicisation of Xinjiang.

In the 11th century, Mahmud al-Kashgari remarked that the people of Khotan still had their own language and script and did not know Turkic well.[22][23] According to Kashgari, some non-Turkic languages such as Kanchaki and Sogdian were still used in some areas.[24] Kanchaki is believed to have belonged to the Saka language group.[25] The Tarim Basin is believed to have become linguistically Turkified by the end of the 11th century.[26]

Old Khotanese phonology


Consonants

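[27][28][29]
Manner | Labial | Dental/Alveolar | Retroflex | Palatal/postalveolar | Velar | Glottal
Plosive, voiceless unaspirated | p /p/ | tt, t /t/ | ṭ /ʈ/ | | k /k/ | (t, g [ʔ])[a]
Plosive, voiceless aspirated | ph /pʰ/ | th /tʰ/ | ṭh /ʈʰ/ | | kh /kʰ/ |
Plosive, voiced | b /b/ | d /d/ | ḍ /ɖ/ | | gg /ɡ/ |
Affricate, voiceless unaspirated | | tc /ts/ | kṣ /ʈʂ/ | c, ky /tʃ/ | |
Affricate, voiceless aspirated | | ts /tsʰ/ | | ch /tʃʰ/ | |
Affricate, voiced | | js /dz/ | | j, gy /dʒ/ | |
Fricative, non-sibilant | | t /ð/ (later > ʔ) | | | g /ɣ/ (later > ʔ) |
Fricative, sibilant voiceless | | s /s/ | ṣṣ, ṣ /ʂ/ | śś, ś /ʃ/ | | h /h/
Fricative, sibilant voiced | | ys /z/ | ṣ /ʐ/ | ś /ʒ/ | |
Nasal | m /m/ | n, ṃ, ṅ /n/ | ṇ /ɳ/ | ñ /ɲ/ | |
Approximant, central | v /w/; hv /wʰ/, /hʷ/ | rr, r /ɹ/ | r /ɻ/ | y /j/ | |
Approximant, lateral | | l /l/ | | | |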

Vowels

Khotanese transliteration[30] | IPA phonemic | IPA phonetic
a | /a/ | [a]
ā | /a:/ | [a:]
i | /i/ | [i]
ī | /i:/ | [i:]
u | /u/ | [u]
ū | /u:/ | [u:]
ä | /ə/ | [ə]
e | /e:/[b] | [æ~æ:][c]
o | /o:/[b] | [o~o:][c]
ai | /ai̯/ |
au | /au̯/ |
ei | /ae̯/ |
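
The digraph conventions in the two tables above (for example ys for /z/ and tc for /ts/) can be illustrated with a small longest-match transliterator. This is only a sketch under simplifying assumptions: the mapping covers a subset of the table, letters with more than one reading (such as single t, g, r, ṣ and ś) are deliberately left out and pass through unchanged, and the names `TRANSLIT_TO_IPA` and `to_ipa` are hypothetical.

```python
import unicodedata

# Subset of the transliteration-to-IPA correspondences given in the tables.
# Ambiguous single letters (t, g, r, ṣ, ś) are omitted on purpose.
TRANSLIT_TO_IPA = {
    # consonant digraphs
    "tc": "ts", "ts": "tsʰ", "js": "dz", "ys": "z",
    "kṣ": "ʈʂ", "ch": "tʃʰ", "ṭh": "ʈʰ", "tt": "t", "gg": "ɡ",
    # single consonants
    "p": "p", "b": "b", "d": "d", "k": "k", "s": "s", "h": "h",
    "m": "m", "n": "n", "ñ": "ɲ", "l": "l", "v": "w", "y": "j",
    # vowels and diphthongs
    "a": "a", "ā": "a:", "i": "i", "ī": "i:", "u": "u", "ū": "u:",
    "ä": "ə", "e": "e:", "o": "o:", "ai": "ai̯", "au": "au̯",
}

# Normalize keys so precomposed and decomposed input behave the same way.
TRANSLIT_TO_IPA = {
    unicodedata.normalize("NFC", k): v for k, v in TRANSLIT_TO_IPA.items()
}
MAX_KEY_LEN = max(len(k) for k in TRANSLIT_TO_IPA)


def to_ipa(word: str) -> str:
    """Convert a transliterated form to IPA by greedy longest match."""
    word = unicodedata.normalize("NFC", word)
    out = []
    i = 0
    while i < len(word):
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = word[i:i + size]
            if chunk in TRANSLIT_TO_IPA:
                out.append(TRANSLIT_TO_IPA[chunk])
                i += size
                break
        else:
            # Unknown or ambiguous symbol: keep it as written.
            out.append(word[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_ipa("balysa"))
```

Trying digraphs before single letters matters here: without it, ys would be read as y followed by s rather than as the single sound /z/.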

Sound changes


Khotanese was characterized by pervasive lenition, developments of retroflexes and voiceless aspirated consonants.[31]

Changes shared in common Sakan
  • *ć, *j́ → s, ys, but *ćw, *j́w → śś, ś
  • *ft, *xt → *βd, *ɣd
  • Lenition of *b, *d, and *g → *β, ð, ɣ initially or after vowels or *r
  • Nasals + voiceless consonants → nasals + voiced consonants (*mp, *nt, *nč, *nk → *mb, *nd, *nj, *ng)
  • *ər (syllabic consonant) → *ur after labials *m, *p, *b, *β; then *ir or *ar elsewhere
  • *rn, *rm → rr
  • *sr → ṣ
  • *č, *ǰ → tc, js
Changes shared in East Sakan
  • Nasals + voiced consonants → geminate nasals (*mb, *nd → *mm, *nn, but *ng remained)
  • Questionable umlaut of *a into i and u before syllables with *i and *u, respectively (*masita → *misita → mista ~ mästa "big")
  • Lenition of *p, *t, *č, and *k → b, d, ǰ, and g after vowels or *r
  • *f, *x → *β, *ɣ before consonants
  • *ɣ → *i̯ between vowels a, i and a consonant (*daxsa- → *daɣsa- → *daisa- → dīs- "to burn")
  • w; , after vowels
  • *rð → l
  • *f, *θ, *x → *h after vowels
  • *w, *j → *β, *ʝ initially
  • *f, *θ, *x → *β, ð, ɣ initially before *r (*θrayah → *ðrayi → drai "three")
  • Lengthening of stressed vowels before clusters *rC and *ST (sibilants + dentals) (*sarta → *sārta → sāḍa "cold", *astaka → āstaa "bone" but not *aštā́ → haṣṭā "eight").
    • Compensatory lengthening of vowels before clusters containing non-sibilant fricatives and *r (*puhri → pūrä "son", *darɣa → dārä "long"); however, -ir- and -ur- from earlier *ər were unaffected (*mərɣa- → mura- "fowl").
  • Reduction of internal unstressed short and long vowels (*hámānaka → *hamanaka → hamaṅgä)
  • *uwu
  • *β, ð, ʝ, ɣ → b, d, ɟ, g initially
  • *f, *θ, *x → ph, th, kh (remaining instances)
  • *rth → ṭh; *rt, *rd → ḍ
  • Lenition of b, d, g (from earlier voiceless consonants) → β (→ w), ð, ɣ after vowels or *r
    • also phonetically became or in this position.
  • Palatalization of certain consonants:
Earlier | Later
*ky | c, ky
*gy | j, gy
*khy | ch
*tcy | c
*jsy | j
*tsy | ch
*ny | ñ, ny
*sy | śś
*ysy | ś
*st, *ṣṭ | śt, śc
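
Because these changes apply in sequence, a derivation can be modelled as an ordered list of rewrite rules. The sketch below is an illustration only, not a full model of Khotanese historical phonology: it encodes just two of the rules listed above (vowel lengthening before *rC and *rt, *rd → ḍ), ignores stress, and the names `RULES` and `derive` are hypothetical.

```python
import re

# Ordered rules: (label, pattern, replacement).
# Stress is ignored in this sketch, so the lengthening rule applies to
# any short a standing directly before r + dental stop.
RULES = [
    ("lengthening before *rC", re.compile(r"a(?=r[td])"), "ā"),
    ("*rt, *rd > ḍ", re.compile(r"r[td]"), "ḍ"),
]


def derive(form: str) -> list[str]:
    """Apply each rule in order and return every distinct stage reached."""
    stages = [form]
    for _label, pattern, replacement in RULES:
        changed = pattern.sub(replacement, stages[-1])
        if changed != stages[-1]:
            stages.append(changed)
    return stages


if __name__ == "__main__":
    print(" > ".join(derive("sarta")))
```

Rule order matters: if *rt → ḍ applied first, the *rC cluster that triggers lengthening would already be gone, and the long vowel of sāḍa "cold" would not be produced.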

Texts

Manuscript in Khotanese from Dandan Oilik, NE of Khotan. Now held in the British Library.

Other than an inscription from Issyk kurgan that has been tentatively identified as Khotanese (although written in Kharosthi), all of the surviving documents originate from Khotan or Tumshuq. Khotanese is attested from over 2,300 texts[32] preserved among the Dunhuang manuscripts, as opposed to just 15 texts[33] in Tumshuqese. These were deciphered by Harold Walter Bailey.[34] The earliest texts, from the fourth century, are mostly religious documents. There were several viharas in the Kingdom of Khotan and Buddhist translations are common at all periods of the documents. There are many reports to the royal court (called haṣḍa aurāsa) which are of historical importance, as well as private documents. An example of a document is Or.6400/2.3.

from Grokipedia
The Saka language, also referred to as Sakan, is an extinct variety of Eastern Iranian within the Indo-Iranian branch of the Indo-European family, primarily attested in the Tarim Basin of northwestern China through manuscripts and inscriptions from the ancient kingdoms of Khotan and Tumshuq. Spoken by the Saka people, nomadic Iranian tribes who settled in these oases, it represents a Middle Iranian stage of development, standing between Old Iranian and later Eastern Iranian languages such as Ossetic. The language survives in two principal dialects: Khotanese, the dominant form used in the Kingdom of Khotan from roughly the 5th to 10th centuries CE, and Tumshuqese, attested in a much smaller body of documents from the Kingdom of Tumshuq, most of them thought to date to the late 7th or 8th century CE.

Khotanese Saka, the better-documented dialect, was written in a cursive derivative of the Brahmi script adapted for Iranian phonology, with innovations for representing Iranian sounds absent from Indic scripts. Most surviving texts are religious, including translations of Buddhist sutras and medical treatises from Sanskrit originals, reflecting the kingdom's role as a major center of Buddhism along the Silk Road trade routes. Tumshuqese, less extensively preserved, shows archaic features and was also employed for Buddhist literature, with documents discovered in the ruins of Tumshuq.

Linguistically, Saka exhibits characteristic Eastern Iranian traits, such as the development of affricates from Proto-Iranian *č and *ǰ, and it influenced neighboring languages through loanwords and cultural exchanges. The study of Saka has been advanced by scholars like Harold Walter Bailey, whose works, including the Dictionary of Khotan Saka (1979), established it as a key to understanding Eastern Iranian evolution and Central Asian history. Despite its extinction following the Islamic conquest of Khotan around 1006 CE, Saka's legacy persists in modern Eastern Iranian languages like Wakhi, which share phonological and lexical affinities.

Classification

Linguistic affiliation

The Saka language is classified within the Indo-European language family, specifically as part of the Indo-Iranian branch, the Iranian subgroup, and the Eastern Iranian division, where it forms its own Saka branch alongside other extinct varieties such as Alanian. This hierarchical placement reflects its evolution from Proto-Indo-European through shared Indo-Iranian features before it diverged into Iranian-specific traits such as the development of aspirated stops into fricatives.

As a Middle Iranian language group, Saka is distinguished from Western Middle Iranian languages like Parthian and Middle Persian by its geographical and linguistic separation, with Saka attested primarily in the Tarim Basin during the first millennium CE. Unlike Western Iranian, which shows innovations such as the loss of initial *y-, Saka retains certain Eastern characteristics, including the preservation of intervocalic stops in some forms.

Comparative linguistics provides key evidence for Saka's Eastern Iranian affiliation through shared innovations, notably the development of Proto-Iranian *ć (< PIE *ḱ) to /s/ in Eastern Iranian languages, contrasting with /θ/ in Old Persian; for instance, "hundred" appears as Avestan satəm and Khotanese ssa versus Old Persian θata-. This sound shift, along with the retention of *θ in other positions (e.g., Sogdian myθ "month"), underscores the unity of the Eastern branch.

The name "Saka" originates from ancient attestations linking it to the nomadic peoples described in classical sources: Herodotus (Histories 7.64) reports that the Persians called all Scythians Sakai, and the term appears in Old Persian inscriptions as Saka- for these tribes. This etymological connection ties the language to the ethnic Saka/Scythian identity, though the precise root remains debated among Iranists.

Relation to other Iranian languages

The Saka language belongs to the Eastern Iranian branch and is classified as part of the broader Scythian group of languages, which encompasses various nomadic Iranian dialects spoken across the Eurasian steppes and Central Asia. This affiliation is evidenced by shared phonological innovations.

Saka exhibits close links to Sogdian through retained archaisms from Old Iranian, including certain nominal endings like the genitive plural forms that preserve long-vowel patterns not generalized in all Eastern Iranian languages. In Khotanese Saka, the third-person plural ending -āre mirrors developments in Sogdian and Yaghnobi, while causative suffixes like *-āuśaiśa- appear in both Saka and Sogdian, indicating shared morphological heritage. Like other Iranian languages, Saka lost aspiration in voiced stops, though with distinctive outcomes such as consonant clusters that are simplified compared to Avestan's more conservative system. Cognates like Khotanese aśśa- 'horse', corresponding to Avestan aspa- and Sogdian asb-, underscore lexical continuity.

Relative to modern Eastern Iranian languages, Saka represents an early parallel branch rather than a direct ancestor, sharing retroflexion patterns (e.g., *rt > ḍ) with Pashto and Wakhi but differing in specifics like *śr > ṣ. It aligns with Ossetic in broader traits, as Ossetic descends from the Alanian dialect of the Scytho-Sarmatian continuum, though debates persist on whether Saka forms a coordinate branch or a distinct eastern offshoot within this continuum. Examples include Saka nāma- 'name' beside Ossetic nom, highlighting persistent lexical ties despite geographic separation. Scholars debate the exact phyletic position, with some viewing Saka as innovative relative to the oldest attested Iranian languages but conservative compared to later developments.

Dialects

Khotanese

Khotanese, the most extensively documented dialect of the Saka language, was spoken in the Kingdom of Khotan, located in the southern Tarim Basin of modern-day Xinjiang, China, where it served as the primary language of a vibrant Buddhist culture from the 5th to the 10th centuries CE. This dialect flourished amid the oasis city-states along the Silk Road, with its speakers engaging in trade, administration, and religious scholarship, producing a rich literary tradition centered on Buddhism.

The Khotanese dialect is traditionally divided into two main phases: Old Khotanese (ca. 5th–8th centuries CE) and Late Khotanese (9th–10th centuries CE), with some scholars proposing a transitional Middle Khotanese phase in the 7th–8th centuries. Old Khotanese represents an earlier, more conservative stage, characterized by a complex phonological system with 11 vowel phonemes distinguished by quantity, while Late Khotanese shows shifts toward qualitative distinctions in vowels and greater simplification in morphology, such as the merger of certain case endings and the reduction of inflected forms. Morphologically, Old Khotanese retains a relatively intact inflectional system inherited from Old Iranian, including six cases in the singular (nominative, accusative, instrumental, ablative, genitive-dative, and locative) and five in the plural, a partial preservation of the proto-system's eight cases; in Late Khotanese, these undergo further syncretism, with the genitive-dative evolving into a general oblique marker.

Key linguistic features of Khotanese include its use of a Brahmi-derived script, adapted into a distinctive form known as Khotanese Brahmi, which evolved from earlier ornamental styles to more fluid cursive variants in later periods. This script facilitated the recording of Buddhist sutras, commentaries, and administrative documents, underscoring the dialect's role in preserving Indo-Iranian heritage within a Buddhist context.

Attestation of Khotanese is abundant, with over 2,300 manuscripts and fragments surviving, predominantly Buddhist texts such as translations of the Tripitaka and indigenous compositions like the Book of Zambasta. These documents, discovered at sites like Dunhuang and in the Khotan region, highlight the dialect's conservatism in vocabulary, retaining archaic Iranian terms such as ssa from Old Iranian sata- 'hundred', which persisted across phases unlike innovations in neighboring languages. In contrast to the fragmentarily attested Tumshuqese, Khotanese's extensive corpus provides the primary window into Saka linguistic evolution.

Tumshuqese

Tumshuqese, a dialect of the Saka language, was spoken in the Tumshuq region of the northern Tarim Basin in what is now Xinjiang, China, located north of the Khotan oasis and in proximity to areas where Tocharian languages were used, suggesting it may represent a northern or earlier variant of Eastern Iranian speech. This geographic position likely contributed to its distinct development, potentially incorporating substrate influences from neighboring non-Iranian languages.

The attestation of Tumshuqese is extremely limited, consisting of roughly 15 fragmentary manuscripts discovered during a Central Asian expedition of 1906–1908 at the ancient site near modern Tumshuq; recent handlists identify around 67 fragments, though many are small, with key publications by Konow and Maue. These documents, primarily written in a Northern Brahmi-derived script, date to the late 7th or early 8th century CE based on paleographic and historical analysis, and include a mix of secular materials such as administrative contracts, letters, and economic records, alongside a few Buddhist texts like portions of jātakas. The small corpus has made comprehensive study challenging, with many fragments remaining unpublished or only partially transliterated, limiting insights into its full grammatical structure.

Key phonological features of Tumshuqese distinguish it as more archaic than its southern counterpart, Khotanese; notably, it is said to retain initial *s- where Khotanese has h-, as in 'seven' (Tumshuqese sapt vs. Khotanese hapta). Morphologically, Tumshuqese displays a simplified system with fewer nominal cases (typically reduced to nominative, accusative, and genitive-dative) compared to the more elaborate case inventory in Khotanese, reflecting possible analogical leveling or contact-induced changes. Pronominal forms also preserve earlier Indo-Iranian stages, such as the first-person singular *azū versus Khotanese āzu.

Scholars debate the precise status of Tumshuqese within the dialect continuum, with some viewing it as a distinct northern dialect closely related to Khotanese but more conservative, while others propose it as a transitional form potentially influenced by Tocharian due to shared geographic and cultural contacts, evidenced by lexical borrowings and script adaptations. This uncertainty stems from the sparse evidence, but analyses of shared innovations confirm its affiliation as an Eastern Middle Iranian language alongside Khotanese.

History

Origins and speakers

The Sakas were nomadic Eastern Iranian tribes belonging to the broader Scythian cultural and linguistic confederations, originating from the Central Asian and Eurasian steppes. They undertook significant migrations southward and eastward between the 8th and 2nd centuries BCE, prompted by conflicts with neighboring nomads such as the Yuezhi, which displaced them from their steppe homelands toward the Tarim Basin and beyond.

Linguistically, the Saka language developed from Old Iranian as an Eastern Iranian variety within the Indo-Iranian branch of the Indo-European family, reflecting the divergence of Iranian speakers in Central Asia after the 2nd millennium BCE. Through interactions in the Tarim Basin, it incorporated influences from neighboring languages, seen in shared substrate vocabulary related to agriculture and religion, and from Tocharian, including loans for materials like mud bricks and iron, indicative of early cultural exchanges in the Bactria-Margiana region around 2000 BCE.

Early historical evidence of the Sakas appears in Han dynasty Chinese annals (ca. 206 BCE–220 CE), which refer to them as the "Sai" tribes and document their presence in the western regions of Central Asia, including migrations southward due to Yuezhi incursions before 128 BCE. In Indian literary sources, the Mahābhārata (composed ca. 400 BCE–400 CE) portrays the Sakas as a foreign mleccha tribe settled along the banks of the Indus River, associating them with northwestern borderlands and conflicts involving Indo-Aryan kingdoms. By the 1st century BCE, Saka groups had settled in the Tarim Basin oases, founding kingdoms in Khotan and Tumshuq, where they shifted from pastoral nomadism to sedentary agrarian and mercantile societies, increasingly integrating Buddhism as a dominant cultural and religious framework.

Period of attestation

The attested corpus of the Saka language, comprising both the Khotanese and Tumshuqese dialects, spans approximately from the mid-5th century CE to the early 11th century CE, with the bulk of surviving manuscripts dating to the 7th–10th centuries. The Tumshuqese fragments are assigned to the late 7th or 8th century CE based on paleographic analysis and contextual associations with dated Buddhist artifacts. Old Khotanese texts, representing the primary dialect from the Kingdom of Khotan, begin in the second half of the 5th century CE and reach their peak production between the 7th and 9th centuries CE, as evidenced by literary and documentary works such as Buddhist sūtras and administrative records. Late Khotanese, characterized by phonological and orthographic shifts, continued until around 1000 CE, with some fragments possibly extending slightly later.

In the sociolinguistic context of the Tarim Basin, Saka served as a key administrative and religious language in the Buddhist kingdoms, facilitating governance, trade along the Silk Road, and the dissemination of Mahāyāna Buddhism through translations and commentaries. It coexisted with Tocharian in the northern and eastern oases and with Chinese as a lingua franca for imperial interactions, reflecting the multilingual environment of the region, where Iranian, other Indo-European, and non-Indo-European languages interacted in Buddhist monastic and commercial settings.

The decline of Saka began with the gradual Turkic migrations into the Tarim Basin, culminating in the conquest of Khotan by the Turkic Muslim Kara-Khanids around 1006 CE, which imposed Islam and accelerated the shift toward Turkic varieties, including early Uyghur, leading to the extinction of Saka as a spoken language in the following centuries.

Archival evidence for these dates derives primarily from colophons (scribes' notes appended to manuscripts, which often include explicit dates) and from supplementary radiocarbon analyses of the supporting materials, such as paper or wood, which corroborate the textual attributions for many Khotanese documents from Dunhuang and Khotan sites.

Writing system

Scripts employed

The Saka language, encompassing the Khotanese and Tumshuqese dialects, was recorded using adapted variants of the Brahmi script, an abugida of Indian origin. This script was introduced to the Tarim Basin around the 4th to 5th centuries CE by Indian Buddhist missionaries, marking the earliest attested writing for Saka; no indigenous pre-Buddhist script has been documented for the language.

For Old Khotanese texts, the primary script was a formal variant of Central Asian Brahmi, influenced by Kushan and Gupta forms, which included modifications such as digraphs (e.g., ys for /z/) and new signs to accommodate Iranian phonemes absent from Indic. By the period of Late Khotanese (roughly 9th–10th centuries CE), the script evolved into more fluid cursive styles, particularly for administrative and documentary purposes, while retaining core Brahmi structures. Tumshuqese employed formal Brahmi in its North Turkestan literary style for religious texts like the Karmavacana, alongside cursive business script variants for practical documents; these shared close paleographic ties with Khotanese but featured distinct adaptations for local phonology.

Manuscripts in these scripts were inscribed on diverse materials, including wooden slips for early economic records, palm leaves for Buddhist sutras, and paper for later compositions, reflecting the technological exchanges along the Silk Road. The writing direction was consistently left-to-right, aligning with the standard orientation of Brahmi derivatives.

Phonology

Consonants

The consonant system of Old Khotanese Saka distinguishes phonemes across labial, dental, retroflex, palatal, velar, and glottal places of articulation, with voiceless plain, voiceless aspirated, and voiced series for stops and affricates, reflecting typical Eastern Iranian features including frequent palatalization and retroflexion. This inventory is reconstructed primarily from transliterations of Brāhmī-script manuscripts and comparative analysis with other Iranian languages.

The core stops include voiceless plain /p, t, ṭ, k/ and aspirated /ph, th, ṭh, kh/, with voiced /b, d, ḍ, g/. Affricates comprise palatal voiceless /č/, aspirated /čh/, and voiced /ǰ/, as well as alveolar voiceless /ts/, aspirated /tsh/, and voiced /dz/. Fricatives include voiceless /s/ (dental), /ṣ/ (retroflex), /š/ (palatal), /x/ (velar), and /h/ (glottal), with voiced counterparts /z/, /ẓ/, /ž/, /γ/. Nasals are /m/ (labial) and /n/ (dental, with allophones including palatal /ñ/ and retroflex /ṇ/); liquids are /r/ and /l/ (dental/alveolar, with retroflex /ṛ/); and semivowels are /w/ (labial) and /y/ (palatal). Palatalization processes, common in Eastern Iranian branches, often affect dentals and velars before front vowels, yielding affricates or palatal variants.
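Manner | Labial | Dental | Retroflex | Palatal | Velar | Glottal
Stops (voiceless plain) | p | t | ṭ | | k |
Stops (aspirated voiceless) | ph | th | ṭh | | kh |
Stops (voiced) | b | d | ḍ | | g |
Affricates (voiceless palatal) | | | | č | |
Affricates (aspirated palatal) | | | | čh | |
Affricates (voiced palatal) | | | | ǰ | |
Affricates (voiceless alveolar) | | ts | | | |
Affricates (aspirated alveolar) | | tsh | | | |
Affricates (voiced alveolar) | | dz | | | |
Fricatives (voiceless) | | s | ṣ | š | x | h
Fricatives (voiced) | | z | ẓ | ž | γ |
Nasals | m | n | ṇ | ñ | |
Liquids | | r, l | ṛ | | |
Semivowels | w | | | y | |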
Consonants occur broadly in initial, medial, and final positions, with allophones emerging in specific environments; for instance, /s/ is voiced intervocalically, and gemination occurs in consonant clusters following short vowels, often indicated orthographically by doubled signs. These patterns are evidenced through metrical analysis of Old Khotanese texts and comparative reconstruction from Proto-Iranian.

Vowels

The vowel system of Old Khotanese Saka, as preserved in texts from the 5th to 10th centuries CE, comprises a set of short and long monophthongs, along with diphthongs, reflecting an evolution from an earlier stage with up to 11 phonemes to a later simplification involving distinctions of quality over quantity. The core inventory includes the short vowels /a/, /i/, /u/, /e/, and /o/, contrasted phonemically with their long counterparts /ā/, /ī/, /ū/, /ē/, and /ō/. Diphthongs such as /ai/ and /au/ (with a possible long /āu/) are also attested, though these often monophthongized in later developments to mid vowels like /e/ and /o/. Some reconstructions propose an additional front rounded vowel /ö/, but this remains tentative and is not consistently reflected in primary attestations.

Vowel qualities encompass front unrounded (/i/, /e/), central (/a/, with reduced /ə/ in derived forms), and back rounded (/u/, /o/) articulations, varying by height (high for /i/, /u/; mid for /e/, /o/; low for /a/) and distinguished by tense-lax oppositions, where long vowels tend toward tenser realizations. Vowel length is phonemic, with long vowels typically bearing stress and resisting reduction, while short vowels exhibit laxer qualities and are prone to alteration. Nasalized vowels, such as /ã/, arise in environments adjacent to nasal consonants (e.g., before /n/ or /m/) and are phonologically distinct, often functioning as separate phonemes in specific morphological contexts.

In terms of distribution, vowels show positional constraints: long vowels are rare in initial syllables and more common in stressed medial or final positions, whereas short vowels predominate in unstressed syllables, where they frequently reduce to schwa (/ə/) or are lost entirely, as in examples like *tin-an > zn-an (loss of short /i/). Vowel harmony is limited, appearing sporadically in morphological environments rather than as a pervasive rule, with no strong evidence for systematic front-back or height-based assimilation across the corpus. Short vowels in unstressed positions undergo reduction more readily than long ones, contributing to syncope of initial or interior unstressed vowels, a hallmark of Khotanese phonology.

Orthographically, vowels are represented in the Brahmi-derived script of Khotanese texts, where an inherent vowel follows each consonant sign unless explicitly modified by diacritics or independent letters. Short /a/ is typically inherent to consonants, while other short vowels (/i/, /u/, /e/, /o/) use matras (vowel signs) attached to the base akṣara. Long vowels are indicated by elongated forms or additional markers, such as two dots above for /ī/ and similar diacritics for /ā/ and /ū/. Diphthongs like /ai/ and /au/ employ combined signs, and nasalization is marked by a single superscript dot over the akṣara. This system, adapted from Kushan Brahmi, allows for the notation of length and quality but often omits short vowels in non-final positions unless contextually necessary, reflecting the script's consonantal bias.
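
The inherent-vowel principle described above can be made concrete with a small data model. This is a minimal sketch rather than a description of any existing tool: the class name `Akshara`, its fields, and the use of ṃ to transliterate the nasalization dot are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Akshara:
    """One written syllable unit: consonant sign(s) plus an optional vowel sign."""

    consonants: Tuple[str, ...]        # more than one entry models a conjunct
    vowel_sign: Optional[str] = None   # None means no matra is written
    nasalized: bool = False            # superscript dot over the akṣara

    def transliterate(self) -> str:
        # With no matra, the inherent short a is read after the consonants.
        vowel = self.vowel_sign if self.vowel_sign is not None else "a"
        # Assumption: the nasalization dot is transliterated as ṃ.
        nasal = "ṃ" if self.nasalized else ""
        return "".join(self.consonants) + vowel + nasal


def transliterate_word(aksharas) -> str:
    """Join the transliterations of a sequence of akṣaras."""
    return "".join(a.transliterate() for a in aksharas)


if __name__ == "__main__":
    # balysa written as two akṣaras: ba + the conjunct l-ys with inherent a
    word = [Akshara(("b",)), Akshara(("l", "ys"))]
    print(transliterate_word(word))
```

Modelling the vowel sign as optional mirrors the script itself: the short a costs nothing to write, which is why it is the one vowel that never needs a matra.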

Phonological processes

The Saka languages, as Eastern Iranian varieties, inherited several phonological processes from Proto-Iranian, including the satemization of Proto-Indo-European palatovelars into affricates and fricatives. In Old Khotanese, Proto-Indo-European *ḱ and *ǵ developed into palatal affricates *ć and *ǰ, which further evolved into alveolar affricates such as /t͡s/ and /d͡z/, as evidenced in comparative forms across Eastern Iranian (e.g., developments in numerals or other inherited roots). A distinctive outcome appears before /w/: Khotanese aśśa- "horse" continues Proto-Iranian *aćwa-, with *ćw > śś by assimilation.

A hallmark change in Khotanese is the debuccalization of Proto-Iranian *s to /h/ in word-initial and intervocalic positions, distinguishing it from more conservative languages; for example, Proto-Iranian *sindhu- "river" yields Khotanese hīdu, while *θ > h occurs in *raθa- "chariot" > rraha-. Syncope of unstressed vowels, particularly initial and medial ones, is another inherited process, simplifying forms like Proto-Iranian *apa- > Khotanese pa- "back, away." These changes reflect the lenition typical of Middle Iranian stages, with further weakening in Late Khotanese through fricativization and voicing.

Internal phonological rules in Old Khotanese include progressive and regressive assimilation, notably intervocalic voicing of fricatives and stops, as in *jsa- > [dza]- in verbal roots, and nasal assimilation in clusters leading to gemination, such as potential nt > nn in participial forms (though less attested due to the sparse corpus). Metathesis occurs in consonant clusters, particularly involving liquids and fricatives, exemplified by Proto-Iranian *čaθwāra- "four" > Khotanese tcahaura- via *θw > hw metathesis followed by simplification. Vowel epenthesis, often of glides like /y/, breaks hiatus in vowel sequences, as in ā + i > āyi in nominal compounds like nätāyi "river" (from *nāra- + -i). Palatalization triggered by /i/ or /y/ affects adjacent consonants, affricating dentals to /t͡s/ or fricativizing them to /ś/, as in *mästa- > mästa- "drunk."

Dialectal variation is pronounced between Khotanese and the more archaic Tumshuqese, where Tumshuqese retains Proto-Iranian *s in positions where Khotanese shifts to /h/, as in Tumshuqese reṣth- "send" versus Khotanese hīṣṭ- (from *hiṣṭa-). Tumshuqese also preserves intervocalic *š as /ž/, yielding pyežu "protect" (from *pāś-), while Khotanese simplifies to pyūʾ via further lenition. In Late Khotanese, additional lenition includes spirantization of stops and diphthong reduction, contrasting with Tumshuqese's relative conservatism in clusters.

Comparatively, Khotanese processes align closely with Avestan in retaining aspirated series from Proto-Iranian fricatives (e.g., *x > kh) but diverge in *s > h, which Avestan resists, preserving s (cf. Avestan sə̄na- "river" vs. Khotanese hīdu). With Sogdian, shared Eastern Iranian traits include retroflex development from *r and *l (e.g., Khotanese muḍa- "wine" from *madhu-, akin to Sogdian muḍ) and cluster simplifications, though Sogdian favors δ > z shifts absent in early Khotanese. These alignments underscore Saka's position as a transitional Eastern Iranian branch.

Morphology and syntax

Nominal system

The nominal system of the Saka language, as attested primarily in Khotanese texts, is an inflectional paradigm inherited from Old Iranian. It features three grammatical genders (masculine, feminine, and neuter) and two numbers, singular and plural, with the dual being rare and mostly confined to pronouns or fixed expressions. Nouns and adjectives inflect according to stem classes, including vocalic stems in -a (masculine or neuter), -ā and -i (feminine), -u (masculine, sparsely attested), and various consonant stems (predominantly masculine, but some feminine or neuter). This system reflects a reduction from the fuller Proto-Indo-Iranian morphology, with neuter forms often marginal in usage by the attested period.

The case system comprises eight categories (nominative, accusative, genitive, dative, ablative, instrumental, locative, and vocative) directly inherited from Old Iranian, though extensive syncretism occurs, particularly in the distinction between direct (nominative-accusative-vocative) and oblique forms in both singular and plural. In Old Khotanese, the singular preserves six distinct case endings, while the plural retains five, with the genitive and dative merging early into a single genitive-dative form across genders and stems. The locative typically remains distinct, often marked by suffixes like -tä in singular masculine a-stems, while the instrumental and ablative show partial overlap even in earlier texts.

Declension patterns vary by stem type and gender. Masculine a-stems, the most productive class, follow a paradigm where the nominative singular ends in -a (e.g., balysa 'Buddha'), the accusative in -änu, genitive-dative in -i or -sya, ablative in -ätsä, instrumental in -ä, locative in -tä, and vocative in -a. For instance, a hypothetical masculine a-stem like rrāma 'joy' (cognate with Avestan rāma-) would decline as nominative singular rrāma, genitive-dative rrāmi, instrumental rrāmä, reflecting the typical shortening and vowel shifts in this class. Feminine ā-stems, such as mātā 'mother', exhibit nominative singular -ā, genitive-dative -āi, and locative -āta, while i-stems like būmi 'earth' show -i in nominative singular, -e in genitive-dative, and -iṣta in locative. Neuter n-stems, less common, align closely with masculine a-stems but often lack distinct nominative-accusative plural forms. Consonant stems, such as masculine ttā 'father' (r-stem), preserve older endings like genitive-dative -au and instrumental -ā, but show analogical leveling toward vocalic patterns.

Adjectives agree with nouns in case, number, and gender, typically following the same declension class as the head noun; for example, a masculine a-stem adjective like hīna- declines parallel to balysa-, yielding forms such as genitive-dative hīnasya to modify a noun in that case. Adjectives formed with suffixes like -īya- or -ka- also inflect identically, ensuring concord within noun phrases.

A key innovation in Late Khotanese involves further syncretism, particularly the merger of ablative and instrumental into a single oblique form across many stems. This reduces the functional distinction and aligns with broader Middle Iranian trends toward case simplification; it is evident in texts from the 9th-10th centuries, where endings like -ätsä/-ä blend more frequently. Such developments mark a transition toward the more analytic structures seen in later Eastern Iranian varieties.
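
The masculine a-stem pattern described above can be illustrated with a minimal Python sketch. It assumes only the singular endings listed in this section and attaches them mechanically to the stem; the names MASC_A_STEM_SG and decline_masc_a_stem are hypothetical, and the output is a schematic illustration rather than a set of attested forms (the locative in particular may look different in actual texts).

```python
# Singular endings for masculine a-stems, taken from the list in this section.
# Assumption: where the text gives two variants (genitive-dative -i or -sya),
# only the first is used here.
MASC_A_STEM_SG = {
    "nominative": "a",
    "accusative": "änu",
    "genitive-dative": "i",
    "ablative": "ätsä",
    "instrumental": "ä",
    "locative": "tä",
    "vocative": "a",
}


def decline_masc_a_stem(citation_form: str) -> dict[str, str]:
    """Strip the final -a of the citation form and append each singular ending."""
    if not citation_form.endswith("a"):
        raise ValueError("expected a citation form ending in -a")
    stem = citation_form[:-1]
    return {case: stem + ending for case, ending in MASC_A_STEM_SG.items()}


if __name__ == "__main__":
    # Print the schematic singular paradigm of the hypothetical noun rrāma.
    for case, form in decline_masc_a_stem("rrāma").items():
        print(f"{case:16} {form}")
```

For the hypothetical noun rrāma, this reproduces the three forms quoted in the text (nominative rrāma, genitive-dative rrāmi, instrumental rrāmä); the remaining cases are plain concatenations and should be checked against a grammar before use.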

Verbal system

The verbal system of Saka, particularly in its Khotanese variety, preserves a rich array of tenses and moods inherited from Old Iranian prototypes, including the present and perfect tenses and the indicative, optative, imperative, and subjunctive moods, with traces of the injunctive mood. Some tenses are typically expressed periphrastically rather than through synthetic forms, while the perfect can appear in simple or periphrastic constructions. Verbal stems are classified as thematic or athematic, with common formations including root stems, reduplicated stems for certain perfects, and causative derivatives marked by the suffix -aya-, which adds a causative meaning to the base (e.g., from a root like *gam- "to go" deriving a form meaning "to cause to go"). In Late Khotanese, periphrastic futures emerge, often involving a present form combined with the verb "to be" (hvā-) to indicate future action.

Conjugation patterns distinguish active and middle voices, with the middle often formed using the suffix -iya- in present stems to indicate reflexive or mediopassive functions (e.g., active *bū- "to be, become" vs. middle *bū-iya- "to become for oneself"). Personal endings mark person (1st, 2nd, 3rd) and number (singular, plural), showing palatalization in certain forms such as 2nd singular indicative -ahi and 3rd singular -ati, or optative singulars -yām, -hē, -atē. Many verbs differentiate transitive and intransitive stems in the perfect, where the past participle serves as the base for adding these endings.

A representative paradigm is that of the irregular verb "to be" (hvā- / ah- in present stem), which serves as an auxiliary and shows anomalous forms across tenses; for instance, in the present indicative active singular: 1sg aham, 2sg ahi, 3sg asti, with plural 1pl asmā, 2pl aθa, 3pl santi. In the middle voice present, forms like 3sg hvātai illustrate the voice distinction. The subjunctive and optative moods, used for volition or potentiality, lack simple perfect forms and rely on stem variations, such as lengthening or ablaut in the root. The imperative is restricted to the present tense, with singular forms often identical to the 2sg indicative stem and plural using -ta.

Sentence structure

The Saka language, as attested in Khotanese and Tumshuqese varieties, employs a predominantly subject-object-verb (SOV) word order, with indirect objects typically preceding direct objects in ditransitive constructions. This head-final order allows for some flexibility in constituent placement, facilitated by the language's robust case-marking system, which clearly delineates grammatical functions without strict reliance on position. Postpositions, rather than prepositions, are standard for expressing locative, instrumental, and other adpositional relations, aligning with the overall synthetic nature of Eastern Iranian syntax.

Relative clauses are commonly introduced by relative pronouns or adverbs and precede the head noun they modify; participles frequently serve to form these clauses, especially in descriptive or restrictive contexts. Coordination employs native conjunctions like u ("and") or o ("or"), though Sanskrit-influenced ca ("and") appears in Buddhist translations, reflecting calqued structures from source texts. Complex embeddings, such as nested subordinate clauses, occur in religious literature due to direct adaptations of the syntactic patterns of source texts, resulting in occasionally intricate sentence architectures.

Verbal agreement is marked on the verb with the subject in person and number, while adjectives concord with the modified noun in gender, number, and case, ensuring morphological agreement within noun phrases and clauses. Nominal sentences often omit the copula in present and past indicative tenses, relying on juxtaposition for equative or attributive expressions.

Lexicon

Inherited vocabulary

The Saka language, particularly its attested varieties Khotanese and Tumshuqese, preserves a substantial core vocabulary inherited from Proto-Iranian roots, reflecting its position within the Eastern Iranian branch of the Iranian languages. This inherited vocabulary forms the foundation of basic semantic fields, demonstrating phonological and morphological developments such as the shift of Proto-Iranian *s to *h or *ś in certain environments, while retaining much of the original structure. Scholars reconstruct these terms by comparing forms with cognates in Avestan, Sogdian, and other Iranian languages, highlighting the language's conservatism in everyday nomenclature.

In the domain of kinship terms, Saka exhibits clear retentions from Proto-Iranian, often with minimal alteration. For instance, "brother" is attested as brāte or bratar- in Khotanese, directly descending from Proto-Iranian *bráHtā-, itself from Proto-Indo-European *bʰréh₂tēr-. Similarly, "father" appears as piite or piitar-, from Proto-Iranian *pitā-, and "mother" as mata or matar-, from *mātár-. "Daughter" is duta or dutar-, continuing Proto-Iranian *duhitā-. These forms underscore Saka's adherence to Indo-Iranian patterns, where kinship vocabulary remains stable across dialects.

Numerals in Saka also show strong inheritance, with forms like "one" as śśau or ci in Khotanese, reflecting Proto-Iranian *aiwa- or related innovations while aligning with Avestan aēva-. "Two" is d(u)va, from Proto-Iranian *dwa-, and higher cardinals such as "four" (tcah(u) from *čatwār-) and "five" (pañcu from *panča-) preserve the syllabic structure and initial consonants typical of Eastern Iranian. These numerals illustrate Saka's retention of counting systems essential for daily enumeration, comparable to those in Sogdian.

Body parts form another conserved domain, with terms like "eye" as tcei’man- in Khotanese, derived from Proto-Iranian *čakšman-, and "head" as śīra-, from Proto-Iranian *sāra-. "Foot" is piia-, from *pāda-, emphasizing the language's fidelity to anatomical basics. Such vocabulary aids in reconstructing Proto-Iranian through parallels, as in the genitive first-person mana- "my" in Saka, cognate to Avestan dative ahmāi "to him/me" from Proto-Iranian *ahma-/*mana-, illustrating pronominal stability.

Semantic fields related to nature and animals further demonstrate Indo-Iranian retentions. For nature, "fire" is dai- or daa-, from Proto-Iranian *ātar-/*dā-, and "water" is yudä-, from *ap-. In animals, "dog" is śve or s’an-, directly from Proto-Iranian *śwan-, akin to Avestan span- and Sanskrit śván-. Daily life terms include "house" as bisa-, from *bhiša-, evoking settled or nomadic routines. Overall, these elements highlight Saka's high degree of lexical conservatism relative to Western Iranian languages, preserving much of the core structure of Proto-Iranian basics, as shown through comparative analysis with Avestan and Sogdian.

Borrowings and influences

The Saka language, particularly its Khotanese dialect, incorporated a substantial number of loanwords from Sanskrit and Prakrit, primarily through the adoption of Buddhist terminology as the region became a center of Buddhism along the Silk Road. These borrowings often entered via translations of Sanskrit texts into Khotanese, with examples including religious concepts like dharma (law or doctrine), which appears in Khotanese texts as dharma or adapted forms reflecting Prakrit influence such as dhamma in parallel Buddhist contexts. Other common loans encompass terms for Buddhist practices and cosmology, integrated into the lexicon to facilitate the dissemination of Buddhist teachings in local manuscripts.

Loanwords from Tocharian, the Indo-European language of neighboring oases, were fewer but notable for everyday and agricultural vocabulary, reflecting cultural exchange in the region. Examples include technical terms for local flora and farming practices borrowed during periods of close contact between Khotanese speakers and Tocharian communities, though only a handful of reliable instances are attested, such as potential adaptations for crop-related terminology. Similarly, borrowings from Chinese were limited but practical, often administrative or travel-related terms acquired through contact; a Khotanese phrasebook for merchants includes Chinese-derived words like śu ttama la (from Middle Chinese shuǐ dān lái, meaning "bring water"), adapted for practical use.

In the reverse direction, Saka exerted influence on neighboring languages, contributing loanwords to early Uyghur and modern Pamir languages like Wakhi due to prolonged contact and migration. In Old Uyghur, Saka provided place names such as Khotan (Hvatanai), embedded in historical texts like the Kutadgu Bilig. For Pamir languages, Khotanese terms persisted, illustrating lexical affinities on phonological and vocabulary levels.

These loans underwent phonological adaptation to fit Saka's sound system, such as the rendering of Sanskrit clusters like /kṣ/ as geminated /ṣṣ/ (e.g., in orthographic representations of borrowed terms like akṣara becoming aṣṣara- for "syllable" in Buddhist scripts). Semantic shifts also occurred, particularly in religious vocabulary, where Sanskrit terms for abstract concepts were repurposed in Khotanese to align with local Iranian cosmological frameworks, enhancing the expression of Buddhist ideas without fully retaining original connotations. Overall, borrowings constitute a significant portion of the Late Khotanese lexicon, with the majority stemming from Indic sources due to Buddhist dominance.

Corpus and texts

Discovery and preservation

The discovery of Saka language texts, primarily in the Khotanese dialect with some Tumshuqese fragments, began in the late 19th and early 20th centuries through European archaeological expeditions in the Tarim Basin. British archaeologist Marc Aurel Stein conducted multiple expeditions to the Khotan region between 1900 and 1910, unearthing numerous manuscripts from ruined sites such as the ancient city of Khotan (now in Xinjiang, China), including wooden tablets, scrolls, and fragments written in Brahmi script. These finds, often preserved in the arid desert environment, included official documents, Buddhist texts, and literary works dating from the 5th to 10th centuries CE.

A pivotal moment occurred during Stein's second expedition in 1906–1908, when he accessed the sealed Library Cave (Cave 17) at the Mogao Caves near Dunhuang in 1907, acquiring thousands of scrolls and fragments, among which were several in Khotanese from the 8th to 10th centuries, reflecting cultural exchanges along the Silk Road. Independently, French explorer Paul Pelliot visited the same cave in 1908, collecting additional Khotanese items. For Tumshuqese, fragments were first identified from explorations in the Tumshuq area during the early German (Prussian) Turfan expeditions starting around 1902–1905, led by Albert Grünwedel and Albert von Le Coq, yielding a small corpus of texts from sites like Tumshuq.

Preservation of these manuscripts has faced significant challenges due to their fragile materials, primarily paper and wood, despite the protective arid climate of the Tarim Basin that initially aided their survival for over a millennium. Early 20th-century transport to Europe often caused damage, as items were packed in cases during long overland and sea journeys, leading to fragmentation, infestation, and exposure to humidity; for instance, some Stein-acquired pieces arrived in London partially deteriorated. The total Saka corpus comprises approximately 4,000 documents and fragments, predominantly Khotanese (over 3,000 items), with around 100 Tumshuqese pieces, though exact counts vary due to ongoing cataloging.

Today, major collections are housed in institutions such as the British Library, which holds over 2,500 Khotanese manuscripts from Stein's expeditions, and the Bibliothèque nationale de France, with Pelliot's acquisitions including key Khotanese scrolls. Tumshuqese fragments are primarily in Berlin's Museum für Asiatische Kunst. Since the early 2000s, digitization efforts by the International Dunhuang Project (IDP), launched in 1994 and expanded thereafter, have scanned thousands of these items for global access, employing high-resolution imaging to mitigate further physical handling and support conservation.

Major works and genres

The Saka literary corpus, primarily in its Khotanese dialect with limited Tumshuqese materials, encompasses a range of genres dominated by Buddhist texts, alongside secular and administrative writings. Buddhist sutras form the core, including translations and adaptations from Sanskrit originals such as the Suvarṇabhāsottamasūtra (Sutra of Golden Light), a protective text emphasizing the merits of kingship, and fragments related to the Saddharmapuṇḍarīkasūtra (Lotus Sutra), which was highly revered in Khotan though often preserved in Sanskrit with Khotanese colophons rather than full vernacular translations. These sutras reflect the integration of Indian Buddhist traditions into local practice, often featuring bilingual glosses that interweave Sanskrit terms with Khotanese explanations to aid comprehension.

Medical texts represent a key non-Buddhist genre, exemplified by the Jīvakapustaka, a bilingual Sanskrit-Khotanese treatise on Ayurvedic medicine attributed to Jīvaka, the Buddha's physician. This work, preserved in a 10th-century Dunhuang manuscript, details diagnostics, treatments for ailments like poisons and wounds, and herbal remedies, alternating Sanskrit verses with Khotanese prose explanations, highlighting the adaptation of Indian medical knowledge to Saka contexts.

Folk tales and narrative genres include adaptations like the Khotanese Rāma story, a poetic retelling of the Indian epic Ramayana that incorporates local motifs such as heroic quests and familial bonds, distinct from Sanskrit versions and suggesting influences from oral storytelling traditions. Jātaka tales, moral stories of the Buddha's past lives, appear in Khotanese, though specific complete versions like the Vessantara Jātaka, which focuses on supreme generosity, are more attested in related Central Asian Iranian literatures, with fragments indicating similar narrative styles in Saka.

Secular genres feature administrative documents, such as contracts for land sales, water rights, and loans, which provide practical insights into daily life; for instance, records from Dandan-Uiliq detail disputes and property transfers in a bureaucratic script. In the Tumshuqese dialect, the sparse corpus includes texts such as the Karmavācanā (a dedication ceremony for lay Buddhists) and a fragment of the Araṇemijātaka, along with letters by officials and possible medical prescriptions, reflecting administrative and religious uses rather than extensive literary production.

A prominent original composition is the Book of Zambasta, a lengthy Khotanese poem compiling Buddhist doctrines across 24 chapters, blending excerpts with indigenous commentary, and culminating in prophetic oracles foretelling the dharma's future in Khotan. Themes across these works underscore Mahayana Buddhism's prevalence, with emphases on enlightenment and merit accumulation in sutras and Jātakas, while secular pieces explore the natural world and human relations, as seen in poetic fragments praising lovers amid landscapes. Bilingual elements, particularly Sanskrit-Khotanese glosses in religious texts, illustrate cultural synthesis along the Silk Road.

The significance of these genres lies in their evidence of an oral-to-written transition, where recited sutras and tales were committed to palm-leaf manuscripts, preserving a local identity amid Indian and Central Asian influences; unique forms like the Book of Zambasta's oracles offer rare glimpses into localized eschatological beliefs.

Scholarly study

Key researchers

Harold Walter Bailey (1899–1996), a pioneering figure in Saka studies, lectured in Iranian studies at the School of Oriental and African Studies (SOAS) in London before becoming Professor of Sanskrit at the University of Cambridge. He edited the foundational Khotanese Texts series from 1945 to 1967, offering transcriptions and linguistic analyses of manuscripts from the Tarim Basin. He popularized the term "Saka" for the language in his 1958 article and later produced the comprehensive Dictionary of Khotan Saka in 1979, which etymologically dissects Iranian terms in the corpus. Bailey also advanced the understanding of Tumshuqese, a related Saka dialect, through early editions that facilitated subsequent decipherments.

Ronald Eric Emmerick (1937–2001), another key scholar at SOAS, contributed significantly to Saka grammar in the 1960s, culminating in his 1968 publication Saka Grammatical Studies, which systematically describes Khotanese morphology and syntax using Late Khotanese materials. Emmerick's analyses built directly on Bailey's textual foundations and emphasized comparative Iranian linguistics.

Prods Oktor Skjærvø, Emeritus Professor of Iranian Studies at Harvard University, has driven advancements in Late Khotanese editions since the 1980s, co-editing the multi-volume Studies in the Vocabulary of Khotanese with Emmerick through the 1990s and producing critical texts like the Suvarṇabhāsottamasūtra. His work at Harvard's Department of Near Eastern Languages and Civilizations supports ongoing Indo-Iranian projects focused on Saka philology.

SOAS in London hosted many of the early 20th-century breakthroughs in Saka studies, while Harvard continues to lead contemporary efforts. More recently, digital corpora, including those hosted on platforms like khotanese.org (updated as of 2024 with comprehensive digital resources), have enhanced access to Saka materials, reflecting collaborative international scholarship. Recent philological work includes studies on Saka-Tocharian linguistic contacts, such as a 2022 dissertation examining loanwords between Khotanese, Tumshuqese, and Tocharian.

Reconstruction efforts

Reconstruction of the Saka language relies primarily on the comparative method, drawing parallels between attested Khotanese and Tumshuqese forms and related Iranian languages to hypothesize Proto-Saka phonology and morphology. For instance, correspondences in nominal endings, such as the genitive *-nam in Khotanese and Sogdian, have been reconstructed as short-vowel variants distinct from Western Iranian *-nām, supporting a shared Proto-Eastern Iranian innovation. Comparisons with Avestan provide insights into archaisms, while Sogdian illuminates Middle Iranian developments, and modern Eastern Iranian languages such as Wakhi offer modern reflexes for phonological shifts like lambdacism (r > l).

Internal reconstruction complements these efforts by analyzing diachronic changes within Saka itself, such as vowel shifts and morphological simplifications from Old Khotanese (ca. 5th–7th centuries CE) to Late Khotanese (ca. 8th–10th centuries CE). Prothetic h- insertions in Khotanese, for example, have been used to back-reconstruct Proto-Iranian word-initial structures lacking laryngeals. Gaps in the sparse Tumshuqese corpus, which consists of more than 50 documents (though fewer have been fully edited), are often filled by positing Khotanese parallels, assuming dialectal proximity within the Saka branch.

Key challenges include the limited corpus size, consisting primarily of Buddhist manuscripts discovered in the early 20th century, which restricts data for rare forms and hinders statistical reliability in reconstructions. Distinguishing native Saka elements from extensive loanwords, especially Sanskrit and Prakrit terms adopted via Buddhism, requires careful etymological sifting to avoid skewing Proto-Saka forms. More recently, computational tools such as probabilistic models and software have aided alignment of cognates across Iranian languages, though application to Saka remains preliminary due to data scarcity. These methods have yielded a hypothetical Proto-Saka lexicon, with over 3,000 entries in H.W. Bailey's Dictionary of Khotan Saka (1979) linking forms to Proto-Iranian roots.

Ongoing debates center on the linguistic unity of Saka and Scythian, with evidence suggesting a common Eastern Iranian substrate but questioning whether "Scythian" represents a single language or a continuum encompassing several varieties.
