Indo-Aryan languages
| Indo-Aryan | |
|---|---|
| Indic[a] | |
| Geographic distribution | South Asia, Europe |
| Native speakers | est. 1.5 billion (2024)[1] |
| Linguistic classification | Indo-European |
| Proto-language | Proto-Indo-Aryan |
| Subdivisions | |
| Language codes | |
| ISO 639-2 / 5 | inc |
| Linguasphere | 59= (phylozone) |
| Glottolog | indo1321 |
Present-day geographical distribution of the major Indo-Aryan language groups. Romani, Domari, Kholosi, Luwati, Fiji Hindi and Caribbean Hindustani are outside the scope of the map.
Khowar (Dardic)
Shina (Dardic)
Kohistani (Dardic)
Kashmiri (Dardic)
Sindhi (Northwestern)
Gujarati (Western)
Khandeshi (Western)
Bhili (Western)
Central Pahari (Northern)
Eastern Pahari (Northern)
Eastern Hindi (Central)
Bihari (Eastern)
Odia (Eastern)
Halbic (Eastern)
(not shown: Kunar (Dardic), Chinali-Lahuli (Unclassified))
The Indo-Aryan languages, or sometimes Indic languages,[a] are a branch of the Indo-Iranian languages in the Indo-European language family. As of 2024, there are more than 1.5 billion speakers, primarily concentrated east of the Indus river in Bangladesh, Northern India, Eastern Pakistan, Sri Lanka, Maldives and Nepal.[4] Moreover, apart from the Indian subcontinent, large immigrant and expatriate Indo-Aryan–speaking communities live in Northwestern Europe, Western Asia, North America, the Caribbean, Southeast Africa, Polynesia and Australia, along with several million speakers of Romani languages primarily concentrated in Southeastern Europe. There are over 200 known Indo-Aryan languages.[5]
Modern Indo-Aryan languages descend from Old Indo-Aryan languages such as early Vedic Sanskrit, through Middle Indo-Aryan languages (or Prakrits).[6][7][8][9] The largest such languages in terms of first-language speakers are Hindustani (Hindi/Urdu) (c. 330 million),[10] Bengali (242 million),[11] Punjabi (about 150 million),[12][13] Marathi (112 million), and Gujarati (60 million). A 2005 estimate placed the total number of native speakers of the Indo-Aryan languages at nearly 900 million people.[14] Other estimates are higher, suggesting a figure of 1.5 billion speakers of Indo-Aryan languages.[1]
Classification
Theories
The Indo-Aryan family as a whole is thought to represent a dialect continuum, where languages are often transitional towards neighbouring varieties.[15] Because of this, the division into languages vs. dialects is in many cases somewhat arbitrary. The classification of the Indo-Aryan languages is controversial, with many transitional areas that are assigned to different branches depending on classification.[16] There are concerns that a tree model is insufficient for explaining the development of New Indo-Aryan, with some scholars suggesting the wave model.[17]
Subgroups
The following table of proposals is expanded from Masica (1991) (from Hoernlé to Turner), and also includes subsequent classification proposals. The table lists only some modern Indo-Aryan languages.
| Model | Odia | Bengali–Assamese | Bihari | E. Hindi | W. Hindi | Rajasthani | Gujarati | Pahari | E. Punjabi | W. Punjabi | Sindhi | Dardic | Marathi–Konkani | Sinhala–Dhivehi | Romani |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hoernlé (1880) | E | E~W | W | N | W | ? | W | ? | S | ? | ? | ||||
| Grierson (−1927) | E | C~E | C | NW | non-IA | S | non-IA | ||||||||
| Chatterji (1926) | E | Midland | SW | N | NW | non-IA | S | NW | |||||||
| Grierson (1931) | E | Inter. | Midland | Inter. | NW | non-IA | S | non-IA | |||||||
| Katre (1968) | E | C | NW | Dardic | S | ? | |||||||||
| Nigam (1972) | E | C | C (+NW) | C | ? | NW | N | S | ? | ||||||
| Cardona (1974) | E | C | (S)W | NW | (S)W | ? | |||||||||
| Turner (−1975) | E | C | SW | C (C.)~NW (W.) | NW | SW | C | ||||||||
| Kausen (2006) | E | C | W | N | NW | Dardic | S | Romani | |||||||
| Kogan (2016) | E | ? | C | C~NW | NW | C~NW | C | NW | non-IA | S | Insular | C | |||
| Ethnologue (2020)[18] | E | EC | C | W | EC (E.)~W (C., W.) | W | NW | S | W | ||||||
| Glottolog (2024)[19] | E | Midland | N | NW | Dardic | S | Dhivehi-Sinhala | Midland | |||||||
Anton I. Kogan, in 2016, conducted a lexicostatistical study of the New Indo-Aryan languages based on a 100-word Swadesh list, using techniques developed by the glottochronologist and comparative linguist Sergei Starostin.[17] That grouping system is notable for Kogan's exclusion of Dardic from Indo-Aryan on the basis of his previous studies showing low lexical similarity to Indo-Aryan (43.5%) and a negligible difference from its similarity to Iranian (39.3%).[20] He also calculated Sinhala–Dhivehi to be the most divergent Indo-Aryan branch. Nevertheless, the modern consensus of Indo-Aryan linguists tends towards the inclusion of Dardic based on morphological and grammatical features.[citation needed]
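The percentages cited above are lexicostatistical similarity scores: for each meaning on a fixed word list, one checks whether two languages use cognate words and reports the share of matches. The following Python sketch illustrates that calculation only in outline. The cognate-class labels, the four-item list, and the function name `lexical_similarity` are invented for illustration and do not reproduce Kogan's data or procedure.

```python
from typing import Dict, Optional

# Each language maps a meaning to an arbitrary cognate-class label.
# Two words count as cognate when their labels match. The labels used
# below are made up for illustration; they are not real judgements.
WordList = Dict[str, Optional[str]]


def lexical_similarity(a: WordList, b: WordList) -> float:
    """Return the percentage of comparable meanings whose words are cognate.

    Meanings that are missing (None or absent) in either list are
    skipped in this sketch rather than counted as mismatches.
    """
    compared = 0
    matches = 0
    for meaning, class_a in a.items():
        class_b = b.get(meaning)
        if class_a is None or class_b is None:
            continue
        compared += 1
        if class_a == class_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable meanings")
    return 100.0 * matches / compared


# Hypothetical toy data: four meanings, two invented languages.
lang_x = {"one": "A", "two": "B", "fire": "C", "water": None}
lang_y = {"one": "A", "two": "B", "fire": "D", "water": "E"}

print(round(lexical_similarity(lang_x, lang_y), 1))
```

With a real 100-item list, the same function would be run for every pair of languages, and the resulting matrix of percentages would be the input to whatever grouping method the study uses.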
Inner–Outer hypothesis
The Inner–Outer hypothesis argues for a core and periphery of Indo-Aryan languages, with Outer Indo-Aryan (generally including Eastern and Southern Indo-Aryan, and sometimes Northwestern Indo-Aryan, Dardic and Pahari) representing an older stratum of Old Indo-Aryan that has been mixed to varying degrees with the newer stratum that is Inner Indo-Aryan. It is a contentious proposal with a long history, with varying degrees of claimed phonological and morphological evidence. Since its proposal by Rudolf Hoernlé in 1880 and refinement by George Grierson, it has undergone numerous revisions and a great deal of debate; the most recent iteration, by Franklin Southworth and Claus Peter Zoller, is based on robust linguistic evidence (particularly an Outer past tense in -l-). Sceptics of the theory include Suniti Kumar Chatterji and Colin P. Masica.[citation needed]
Groups
The classification below follows Masica (1991) and Kausen (2006).
Dardic
The Dardic languages (also Dardu or Pisaca) are a group of Indo-Aryan languages largely spoken in the northwestern extremities of the Indian subcontinent. Dardic was first formulated by George Abraham Grierson in his Linguistic Survey of India but he did not consider it to be a subfamily of Indo-Aryan. The Dardic group as a genetic grouping (rather than areal) has been scrutinised and questioned to a degree by recent scholarship: Southworth, for example, says "the viability of Dardic as a genuine subgroup of Indo-Aryan is doubtful" and "the similarities among [Dardic languages] may result from subsequent convergence".[21]: 149
The Dardic languages are thought to be transitional with Punjabi and Pahari (e.g. Zoller describes Kashmiri as "an interlink between Dardic and West Pahāṛī"),[22]: 83 as well as non-Indo-Aryan Nuristani; and are renowned for their relatively conservative features in the context of Proto-Indo-Aryan.
- Kashmiri: Kashmiri, Kishtwari, Poguli;
- Shina: Brokskad, Kundal Shahi, Shina, Ushojo, Kalkoti, Palula, Savi;
- Chitrali: Kalasha, Khowar;
- Kohistani: Bateri, Chilisso, Gowro, Indus Kohistani, Kalami, Tirahi, Torwali, Wotapuri-Katarqalai;
- Pashayi
- Kunar: Dameli, Gawar-Bati, Nangalami, Shumashti.
Northern Zone
The Northern Indo-Aryan languages, also known as the Pahari ('hill') languages, are spoken throughout the Himalayan regions of the subcontinent.
- Eastern Pahari: Nepali, Jumli, Doteli;
- Central Pahari: Garhwali, Kumaoni;
- Western Pahari: Dogri, Kangri, Bhadarwahi, Churahi, Bhateali, Bilaspuri, Chambeali, Gaddi, Pangwali, Mandeali, Mahasu Pahari, Jaunsari, Kullui, Pahari Kinnauri, Hinduri, Sarazi, Sirmauri.
Northwestern Zone
Northwestern Indo-Aryan languages are spoken in the northwestern region of India and eastern region of Pakistan. Punjabi is spoken predominantly in the Punjab region and is the official language of the northern Indian state of Punjab, in addition to being the most widely spoken language in Pakistan. Sindhi and its variants are spoken natively in the Pakistani province of Sindh and neighbouring regions. Northwestern languages are ultimately thought to be descended from Shauraseni Prakrit, with influence from Persian and Arabic.[23]
Western Zone
Western Indo-Aryan languages are spoken in central and western India, in states such as Madhya Pradesh and Rajasthan, in addition to contiguous regions in Pakistan. Gujarati is the official language of Gujarat, and is spoken by over 50 million people. In Europe, various Romani languages are spoken by the Romani people, an itinerant community who historically migrated from India. The Western Indo-Aryan languages are thought to have diverged from their northwestern counterparts, although they have a common antecedent in Shauraseni Prakrit.
- Rajasthani: Bagri, Marwari, Mewati, Dhundari, Harauti, Mewari, Shekhawati, Dhatki, Malvi, Nimadi, Gujari, Goaria, Loarki, Bhoyari/Pawari, Kanjari, Od, Lambadi;
- Gujarati: Gujarati, Jandavra, Saurashtra, Aer, Vaghri, Parkari Koli, Kachi Koli, Wadiyara Koli;
- Bhil: Kalto, Vasavi, Wagdi, Gamit, Vaagri Booli;
- Northern Bhil: Bauria, Bhilori, Magari;
- Central Bhil: Bhili proper, Bhilali, Chodri, Dhodia, Dhanki, Dubli;
- Bareli: Palya Bareli, Pauri Bareli, Rathwi Bareli, Pardhi;
- Khandeshi
- Domaaki
- Domari
- Romani: Carpathian Romani, Balkan Romani, Vlax Romani, Baltic Romani;
- Northern Romani
- British Romani: Angloromani, Welsh Romani
- Northwestern Romani: Sinte Romani, Finnish Kalo
Central Zone
Within India, Central Indo-Aryan languages are spoken primarily in the western Gangetic plains, including Delhi and parts of the Central Highlands, where they are often transitional with neighbouring lects. Many of these languages, including Braj and Awadhi, have rich literary and poetic traditions. Urdu, a Persianised derivative of Dehlavi descended from Shauraseni Prakrit, is the official language of Pakistan and also has strong historical connections to India, where it also has official status. Hindi, a standardised and Sanskritised register of Dehlavi, is the official language of the Government of India (along with English). Together with Urdu, it is the third most-spoken language in the world.
- Western Hindi: Hindustani (including Standard Hindi and Standard Urdu), Khariboli, Braj, Haryanvi, Bundeli, Kannauji, Parya, Sansi.
- Eastern Hindi: Bagheli, Chhattisgarhi, Surgujia, Awadhi (Fiji Hindi, Caribbean Hindustani).
Eastern Zone
The Eastern Indo-Aryan languages, also known as Magadhan languages, are spoken throughout the eastern subcontinent, alongside other regions surrounding the northwestern Himalayan corridor. Bengali is the seventh most-spoken language in the world, and has a strong literary tradition; the national anthems of India and Bangladesh are written in Bengali. Assamese and Odia are the official languages of Assam and Odisha, respectively. The Eastern Indo-Aryan languages descend from Magadhan Apabhraṃśa[24] and ultimately from Magadhi Prakrit.[25][26][24] Eastern Indo-Aryan languages display many morphosyntactic features similar to those of Munda languages, which are largely absent in western Indo-Aryan languages. It is suggested that "proto-Munda" languages may have once dominated the eastern Indo-Gangetic Plain, and were then absorbed by Indo-Aryan languages at an early date as Indo-Aryan spread east.[27][28]
- Bihari:
- Bhojpuri, Caribbean Hindustani, Fiji Hindi;
- Magahi, Khortha;
- Maithili, Angika, Bajjika, Thethi, Dehati;
- Sadanic: Nagpuri, Kurmali (Panchpargania);
- Tharu:[29] Kochila Tharu, Rana Tharu, Kathariya Tharu, Sonha Tharu, Dangaura Tharu, Chitwania Buksa, Majhi, Musasa;
- Kumhali, Kuswaric:[30] Danwar, Bote-Darai;
- Halbic: Halbi, Kamar, Bhunjia, Nahari;
- Odia: Baleswari, Kataki, Ganjami, Sundargadi, Sambalpuri, Desia;
- Bodo Parja, Bhatri, Reli, Kupia;
- Bengali–Assamese
- Bengali-Gauda: Bengali (Bangali, Rarhi, Varendri, Manbhumi, Dhakaiya Kutti, Mymensinghi, Dobhashi), Noakhali, Chittagonian, Sylheti, Bishnupriya Manipuri, Hajong, Chakma, Tanchangya, Rohingya;
- Kamarupic: Assamese (Kamrupi, East Goalpariya), Kamtapuri, Surjapuri, Rajbanshi;
Southern Zone
Marathi-Konkani languages are ultimately descended from Maharashtri Prakrit, whereas Insular Indo-Aryan languages are descended from Elu Prakrit and possess several characteristics that markedly distinguish them from most of their mainland Indo-Aryan counterparts. Insular Indo-Aryan languages (of Sri Lanka and Maldives) started developing independently and diverging from the continental Indo-Aryan languages from around the 5th century BCE.[17]
- Marathi-Konkani
- Marathic: Marathi, Varhadi, Andh, Agri, Zadi Boli, Thanjavur, Berar-Deccan Marathi, Phudagi, Judeo, Katkari, Varli, Kadodi;
- Konkanic: Konkani, Karnataki Konkani, Maharashtrian Konkani.
- Insular Indo-Aryan
Unclassified
The following languages are otherwise unclassified within Indo-Aryan:
History
Indian subcontinent
Dates indicate only a rough time frame.
- Proto-Indo-Aryan (before 1500 BCE, reconstructed)
- Old Indo-Aryan (c. 1500–300 BCE)
- early Old Indo-Aryan: includes Vedic Sanskrit (c. 1500 to 500 BCE)
- late Old Indo-Aryan: Epic Sanskrit, Classical Sanskrit (c. 200 CE to 1300 CE)
- Mitanni Indo-Aryan (c. 1400 BCE)
- Middle Indo-Aryan or Prakrits (c. 300 BCE to 1500 CE)
- early Jain and Buddhist texts (c. 6th or 5th century BCE)
- early Middle Indo-Aryan: e.g. Ashokan Prakrits, Pali, Gandhari (c. 300 BCE to 200 BCE)
- middle Middle Indo-Aryan: e.g. Dramatic Prakrits, Elu (c. 200 BCE to 700 CE)
- late Middle Indo-Aryan: e.g. Abahattha (c. 700 CE to 1500 CE)
- Early Modern Indo-Aryan (Late Medieval India): e.g. early Dakhini and emergence of the Dehlavi dialect

Proto-Indo-Aryan
Proto-Indo-Aryan (or sometimes Proto-Indic[a]) is the reconstructed proto-language of the Indo-Aryan languages. It is intended to reconstruct the language of the pre-Vedic Indo-Aryans. Proto-Indo-Aryan is meant to be the predecessor of Old Indo-Aryan (1500–300 BCE), which is directly attested as Vedic and Mitanni-Aryan. Despite the great archaicity of Vedic, however, the other Indo-Aryan languages preserve a small number of conservative features lost in Vedic.
Mitanni-Aryan hypothesis
Some theonyms, proper names, and other terminology of the Late Bronze Age Mitanni civilisation of Upper Mesopotamia exhibit an Indo-Aryan superstrate. While the few written records left by the Mitanni are in either Hurrian (which appears to have been the predominant language of their kingdom) or Akkadian (the main diplomatic language of the Late Bronze Age Near East), these apparently Indo-Aryan names suggest that an Indo-Aryan elite imposed itself over the Hurrians in the course of the Indo-Aryan expansion. If these traces are Indo-Aryan, they would be the earliest known direct evidence of Indo-Aryan, and would increase the precision in dating the split between the Indo-Aryan and Iranian languages (as the texts in which the apparent Indicisms occur can be dated with some accuracy).
In a treaty between the Hittites and the Mitanni, the deities Mitra, Varuna, Indra, and the Ashvins (Nasatya) are invoked. Kikkuli's horse training text includes technical terms such as aika (cf. Sanskrit eka, "one"), tera (tri, "three"), panza (panca, "five"), satta (sapta, "seven"), na (nava, "nine"), vartana (vartana, "turn", i.e. a round in the horse race). The numeral aika "one" is of particular importance because it places the superstrate in the vicinity of Indo-Aryan proper as opposed to Indo-Iranian in general or early Iranian (which has aiva).[32] Another text has babru (babhru, "brown"), parita (palita, "grey"), and pinkara (pingala, "red"). Their chief festival was the celebration of the solstice (vishuva) which was common in most cultures in the ancient world. The Mitanni warriors were called marya, the term for "warrior" in Sanskrit as well; note mišta-nnu (= miẓḍha, ≈ Sanskrit mīḍha) "payment (for catching a fugitive)" (M. Mayrhofer, Etymologisches Wörterbuch des Altindoarischen, Heidelberg, 1986–2000; Vol. II:358).
Sanskritic interpretations of Mitanni royal names render Artashumara (artaššumara) as Ṛtasmara "who thinks of Ṛta" (Mayrhofer II 780), Biridashva (biridašṷa, biriiašṷa) as Prītāśva "whose horse is dear" (Mayrhofer II 182), Priyamazda (priiamazda) as Priyamedha "whose wisdom is dear" (Mayrhofer II 189, II 378), Citrarata as Citraratha "whose chariot is shining" (Mayrhofer I 553), Indaruda/Endaruta as Indrota "helped by Indra" (Mayrhofer I 134), Shativaza (šattiṷaza) as Sātivāja "winning the race prize" (Mayrhofer II 540, 696), Šubandhu as Subandhu "having good relatives" (a name in Palestine, Mayrhofer II 209, 735), Tushratta (tṷišeratta, tušratta, etc.) as *tṷaiašaratha, Vedic Tvastar "whose chariot is vehement" (Mayrhofer, Etym. Wb., I 686, I 736).
Old Indo-Aryan
The earliest evidence of the group is from Vedic Sanskrit, which is used in the ancient preserved texts of the Indian subcontinent, the foundational canon of the Hindu synthesis known as the Vedas. The Indo-Aryan superstrate in Mitanni is of similar age to the language of the Rigveda, but the only evidence of it is a few proper names and specialised loanwords.[33]
While Old Indo-Aryan is the earliest stage of the Indo-Aryan branch, from which all known languages of the later stages Middle and New Indo-Aryan are derived, some documented Middle Indo-Aryan variants cannot fully be derived from the documented form of Old Indo-Aryan (on which Vedic and Classical Sanskrit are based), but betray features that must go back to other undocumented dialects of Old Indo-Aryan.[34]
From Vedic Sanskrit, "Sanskrit" (literally 'put together, perfected, elaborated') developed as the prestige language of culture, science and religion, as well as the court, theatre, etc. Sanskrit of the later Vedic texts is comparable to Classical Sanskrit, but is largely mutually unintelligible with Vedic Sanskrit.[35]
Middle Indo-Aryan (Prakrits)
Outside the learned sphere of Sanskrit, vernacular dialects (Prakrits) continued to evolve. The oldest attested Prakrits are the Buddhist and Jain canonical languages Pali and Ardhamagadhi Prakrit, respectively. Inscriptions in Ashokan Prakrit were also part of this early Middle Indo-Aryan stage.
By medieval times, the Prakrits had diversified into various Middle Indo-Aryan languages. Apabhraṃśa is the conventional cover term for transitional dialects connecting late Middle Indo-Aryan with early Modern Indo-Aryan, spanning roughly the 6th to 13th centuries. Some of these dialects showed considerable literary production; the Śravakacāra of Devasena (dated to the 930s) is now considered to be the first book written in Hindi.
The next major milestone occurred with the Muslim conquests in the Indian subcontinent in the 13th–16th centuries. Under the flourishing Turco-Mongol Mughal Empire, Persian became very influential as the language of prestige of the Islamic courts due to adoption of the foreign language by the Mughal emperors.
The largest languages that formed from Apabhraṃśa were Bengali, Bhojpuri, Hindustani, Assamese, Sindhi, Gujarati, Odia, Marathi, and Punjabi.
New Indo-Aryan
Medieval Hindustani
In the Central Zone Hindi-speaking areas, the prestige dialect was for a long time Braj Bhasha, but this was replaced in the 13th century by Dehlavi-based Hindustani. Hindustani was strongly influenced by Persian, with this and later Sanskrit influence leading to the emergence of Modern Standard Hindi and Modern Standard Urdu as registers of the Hindustani language.[36][37] This state of affairs continued until the division of the British Indian Empire in 1947, when Modern Standard Hindi became the official language in India and Modern Standard Urdu became official in Pakistan. Despite the different scripts, the fundamental grammar remains identical; the difference is more sociolinguistic than purely linguistic.[38][39][40] Today Hindustani is widely understood or spoken as a second or third language throughout South Asia[41] and is one of the most widely known languages in the world in terms of number of speakers.
Outside the Indian subcontinent
Domari
Domari is an Indo-Aryan language spoken by older Dom people scattered across the Middle East. The language is reported to be spoken as far north as Azerbaijan and as far south as central Sudan.[42]: 1 Based on the systematicity of sound changes, linguists have concluded that the ethnonyms Domari and Romani derive from the Indo-Aryan word ḍom.[43]
Lomavren
Lomavren is a nearly extinct mixed language, spoken by the Lom people, that arose from language contact between a language related to Romani and Domari[44] and the Armenian language.
Parya
Parya is spoken in Tajikistan and Uzbekistan by the descendants of migrants from the Indian subcontinent. The language retains many features similar to Punjabi and the Western Hindi dialects, while also bearing some influence from Tajik Persian.[45]
Romani
The Romani language is usually included in the Western Indo-Aryan languages.[46] Romani varieties, which are mainly spoken throughout Europe, are noted for their relatively conservative nature, maintaining the Middle Indo-Aryan present-tense person concord markers alongside consonantal endings for nominal case. Indeed, these features are no longer evident in most other modern Central Indo-Aryan languages. Moreover, Romani shares an innovative pattern of past-tense person, which corresponds to Dardic languages, such as Kashmiri and Shina. This is believed to be further indication that proto-Romani speakers were originally situated in central regions of the subcontinent, before migrating to northwestern regions. However, there are no known historical sources regarding the development of the Romani language specifically within India.
Research conducted by nineteenth-century scholars Pott (1845) and Miklosich (1882–1888) demonstrated that the Romani language is most aptly designated as a New Indo-Aryan language (NIA), as opposed to Middle Indo-Aryan (MIA), establishing that proto-Romani speakers could not have left India significantly earlier than AD 1000.
The principal argument favouring a migration during or after the transition period to NIA is the loss of the old system of nominal case, coupled with its reduction to a two-way nominative-oblique case system. A secondary argument concerns the system of gender differentiation, since Romani has only two genders (masculine and feminine). Middle Indo-Aryan languages generally employed three genders (masculine, feminine and neuter), and some modern Indo-Aryan languages retain this aspect today.
It is suggested that loss of the neuter gender did not occur until the transition to NIA. During this process, most of the neuter nouns became masculine, while several became feminine. For example, the neuter aggi "fire" in Prakrit morphed into the feminine āg in Hindustani, and jag in Romani. The parallels in grammatical gender evolution between Romani and other NIA languages have additionally been cited as indications that the forerunner of Romani remained on the Indian subcontinent until a later period, possibly as late as the tenth century.
Sindhic migrations
Kholosi, Jadgali, Luwati, Maimani and Al Sayigh[47] represent offshoots of the Sindhic subfamily of Indo-Aryan that have established themselves in the Persian Gulf region, perhaps through sea-based migrations. These are of a later origin than the Rom and Dom migrations which represent a different part of Indo-Aryan as well.
Indentured labourer migrations
The use by the British East India Company of indentured labourers led to the transplanting of Indo-Aryan languages around the world, leading to locally influenced lects that diverged from the source language, such as Fiji Hindi and Caribbean Hindustani.
Phonology
Consonants
Stop positions
The normative system of New Indo-Aryan stops consists of five places of articulation: labial, dental, "retroflex", palatal, and velar, which is the same as that of Sanskrit. The "retroflex" position may involve retroflexion, or curling the tongue to make the contact with the underside of the tip, or merely retraction. The point of contact may be alveolar or postalveolar, and the distinctive quality may arise more from the shaping than from the position of the tongue. Palatal stops have affricated release and are traditionally included as involving a distinctive tongue position (blade in contact with hard palate). They are widely transcribed as [tʃ], though Masica (1991:94) claims [cʃ] to be a more accurate rendering.
Moving away from the normative system, some languages and dialects have alveolar affricates [ts] instead of palatal, though some among them retain [tʃ] in certain positions: before front vowels (esp. /i/), before /j/, or when geminated. Alveolar as an additional point of articulation occurs in Marathi and Konkani where dialect mixture and other factors upset the aforementioned complementation to produce minimal environments, in some West Pahari dialects through internal developments (*t̪ɾ, t̪ > /tʃ/), and in Kashmiri. The addition of a retroflex affricate to this in some Dardic languages brings the maximum number of stop positions to seven (barring borrowed /q/), while a reduction to the inventory involves *ts > /s/, which has happened in Assamese, Chittagonian, Sinhala (though there have been other sources of a secondary /ts/), and Southern Mewari.
Further reductions in the number of stop articulations are found in Assamese and Romani, which have lost the characteristic dental/retroflex contrast, and in Chittagonian, which may lose its labial and velar articulations through spirantisation in many positions (> [f, x]).[48] /q x ɣ f/ are restricted to Perso-Arabic loanwords in most IA languages but they occur natively in Khowar.[49] According to Masica (1991), some dialects of Pashayi have a /θ/, which is unusual for IA languages. Domari, which is spoken in the Middle East and has had extensive contact with Middle Eastern languages, has /q ħ ʕ ʔ/ and emphatic consonants from loanwords.
| Stops | Languages | |||||||
|---|---|---|---|---|---|---|---|---|
| /p/ | /t̪/ | /ʈ/ ~ /t/ | /ʈ͡ʂ/ | /t͡ʃ/ ~ /t͡ɕ/ | /t͡s/ | /k/ | /q/ | |
| Khowar, Shina, Bashkarik, Kalasha | ||||||||
| Gawarbati, Phalura, Shumashti, Kanyawali, Pashai | ||||||||
| Marathi, Konkani, certain W. Pahari dialects (Bhadrawahi, Bhalesi, Mandeali, Padari, Simla, Satlej, maybe Kulu), Kashmiri, E. and N. dialects of Bengali (parts of Dhaka, Mymensingh, Rajshahi) | ||||||||
| Hindustani, Punjabi, Dogri, Sindhi, Gujarati, Sinhala, Odia, Standard Bengali, dialects of Rajasthani (except Lamani, NW. Marwari, S. Mewari), Sanskrit,[50] Prakrit, Pali, Maithili, Magahi, Bhojpuri | ||||||||
| Romani, Domari, Kholosi | ||||||||
| Nepali, dialects of Rajasthani (Lamani and NW. Marwari), Northern Lahnda's Kagani, Kumauni, many West Pahari dialects (not Chamba Mandeali, Jaunsari, or Sirmauri) | ||||||||
| Rajasthani's S. Mewari | ||||||||
| Assamese | ||||||||
| Chittagonian | ||||||||
| Sylheti | ||||||||
Nasals
Sanskrit was noted as having five nasal-stop articulations corresponding to its oral stops, and among modern languages and dialects Dogri, Kacchi, Kalasha, Rudhari, Shina, Saurashtri, and Sindhi have been analysed as having this full complement of phonemic nasals /m/ /n/ /ɳ/ /ɲ/ /ŋ/, with the last two generally as the result of the loss of the stop from a homorganic nasal + stop cluster ([ɲj] > [ɲ] and [ŋɡ] > [ŋ]), though there are other sources as well.[51]
In languages that lack phonemic nasals at some places of articulation, they can still occur allophonically from place assimilation in a nasal + stop cluster, e.g. Hindustani /nɡ/ > [ŋɡ].
| Nasals | Languages | ||||
|---|---|---|---|---|---|
| /m/ | /n/ | /ɳ/ | /ɲ/ | /ŋ/ | |
| Dogri, Kacchi, Kalasha, Rudhari, Shina, Saurashtri, Sindhi, Saraiki | |||||
| Sinhala | |||||
| Kalami, Odia, Dhundhari, Pashayi, Marwari | |||||
| Dhivehi[b] | |||||
| Gujarati, Kashmiri, Marathi, Punjabi, Rajasthani (Marwari) | |||||
| Hindustani, Nepali, Sylheti, Assamese, Bengali | |||||
| Romani, Domari | |||||
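The allophonic place assimilation described in this section (e.g. Hindustani /nɡ/ > [ŋɡ]) can be written as a simple rewrite rule: a nasal takes the place of articulation of an immediately following stop. The Python sketch below is a minimal illustration of that rule. The small stop-to-place table and the function name `assimilate_nasals` are assumptions of the sketch and do not describe the full phonology of any particular language.

```python
# Place of articulation for a handful of stops (illustrative subset only).
# Both ASCII "g" and IPA "ɡ" are listed so either spelling is accepted.
STOP_PLACE = {
    "p": "labial", "b": "labial",
    "t": "dental", "d": "dental",
    "ʈ": "retroflex", "ɖ": "retroflex",
    "k": "velar", "g": "velar", "ɡ": "velar",
}

# The nasal found at each place, following the five-way series in the text.
NASAL_AT = {
    "labial": "m",
    "dental": "n",
    "retroflex": "ɳ",
    "palatal": "ɲ",
    "velar": "ŋ",
}
NASALS = set(NASAL_AT.values())


def assimilate_nasals(segments):
    """Return a copy of `segments` in which each nasal directly before a
    listed stop is replaced by the nasal at that stop's place."""
    out = list(segments)
    for i in range(len(out) - 1):
        place = STOP_PLACE.get(out[i + 1])
        if out[i] in NASALS and place is not None:
            out[i] = NASAL_AT[place]
    return out


print(assimilate_nasals(["n", "g"]))  # nasal becomes velar before a velar stop
print(assimilate_nasals(["n", "a"]))  # unchanged: no stop follows
```

Working on a list of segments rather than a raw string keeps multi-character IPA symbols intact; a fuller model would also need the palatal affricates and language-specific exceptions.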
Aspiration and breathy-voice
Most Indo-Aryan languages have contrastive aspiration (/ʈ/ ~ /ʈʰ/), and some retain historical breathy voice on voiced consonants (/ɖ/ ~ /ɖʱ/). Sometimes both phenomena are analysed as a single aspiration contrast. The places and manners of articulation which allow contrastive aspiration vary by language; e.g. Sindhi permits phonemic /mʱ/, but the phonemic status of this sound in Hindustani is uncertain, and many "Dardic" languages lack aspirated retroflex sibilants despite having unaspirated equivalents.[52]
In languages that have lost breathy-voice, the contrast has often been replaced with tone.
Regional developments
Some of these are mentioned in Masica (1991:104–105).
- Implosives: Languages in the Sindhic subfamily, as well as Saraiki, western Marwari dialects, and some dialects of Gujarati have developed implosive consonants from historical intervocalic geminates and word-initial stops. Sindhi has a full implosive series except for the dental implosive: /ɠ ʄ ᶑ ɓ/. It has been claimed that Wadiyari Koli has the dental implosive too. Other languages have less complete implosive series, e.g. Kacchi has just /ᶑ ɓ/.
- Prenasalized stops: Sinhala and Maldivian (Dhivehi) have a series of prenasalised stops covering all places except for palatal: /ᵐb ⁿd ᶯɖ ᵑɡ/.
- Palatalization: Kashmiri (natively) and some Romani dialects (from contact with Slavic languages) have contrastive palatalisation.
- Voiceless lateral: Gawarbati, some Pashai dialects, partly Bashkarik, and some Shina dialects have /ɬ/ from clusters of tr, kr, or sometimes pr; dr, gr, and br merged with /l/ in these languages.
- Lateral affricates: Bhadarwahi has an unusual series of lateral retroflex affricates (/ʈ͡ꞎ ɖ͡ɭ ɖ͡ɭʱ/) derived from historical /Cɾ/ clusters.
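The cluster developments in the voiceless-lateral item above can be expressed as a pair of rewrite rules. The Python sketch below applies them to made-up strings; the inputs are not attested words, the function name `apply_cluster_rules` is invented for illustration, and the real changes are conditioned by dialect and position in ways not modelled here (the text notes, for instance, that pr is affected only sometimes).

```python
import re

# Rule 1: voiceless stop + r becomes the voiceless lateral.
# Rule 2: voiced stop + r merges with plain l.
# Treating pr like tr and kr is a simplification of "sometimes pr".
RULES = [
    (re.compile(r"[tkp]r"), "ɬ"),
    (re.compile(r"[dgb]r"), "l"),
]


def apply_cluster_rules(form: str) -> str:
    """Apply each cluster rule in turn to a romanised form."""
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


# Hypothetical inputs chosen only to exercise the rules.
for form in ["tra", "kri", "dra", "tara"]:
    print(form, "->", apply_cluster_rules(form))
```

The last input contains no stop + r cluster, so it passes through unchanged, which shows that the rules target clusters rather than individual consonants.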
Vowels
Vowel typologies are varied across Indo-Aryan due to diachronic mergers and (in some cases) splits, as well as different accounts by linguists for even the widely spoken languages. Vowel systems per Masica (1991:108–113) are listed below. Many languages also have phonemic nasal vowels.
| Vowels | Languages | |
|---|---|---|
| 16 | /iː i eː e ɨː ɨ əː ə aː a ɔː ɔ oː o uː u/ | Kashmiri |
| 14 | /ɪ iː ʊ uː e eː ə~ɐ əː o oː æ~ɛ a aː ɔ/ | Maithili |
| 13 | /iː i eː e æː æ aː a ə oː o uː u/ | Sinhala |
| 10 | /i ɪ e ɛ · a ə · ɔ o ʊ u/ | Hindustani, Punjabi, Sindhi, Kacchi, Hindko, Rajasthani (most varieties) |
| 9 | /i ɪ e æ~ɛ · a ə · o ʊ u/ | W. Pahari (Dogri, Rudhari, Mandeali, Pangwali, Khashali, Churahi), Saraiki |
| /i ɪ e · a ə · ɔ o ʊ u/ | W. Pahari (Shodochi, Surkhuli) | |
| /i ɪ e ɛ · a · ɔ o ʊ u/ | W. Pahari (Jaunsari, Shoracholi, Kullui) | |
| 8 | /i e ɛ · a ə · ɔ o u/ | Gujarati |
| /i e ɛ a · ɒ ɔ o u/ | Assamese | |
| /i ɪ e · a ə · o ʊ u/ | Halbi, Bhatri, W. Pahari (Garhwali, Chameali, Gaddi) | |
| 7 | /i e æ · a · ɔ o u/ | Bengali |
| 6 | /i e a · ɔ o u/ | Odia, Bishnupriya Manipuri |
| /i e · a ə · o u/ | Marathi, Lambadi, Sadri/Sadani | |
| /i e · a ʌ · o u/ | Nepali | |
| 5 | /i e · a · o u/ | Romani (European dialects) |
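The counts in the first column of the table can be checked mechanically against the inventories in the second column. The Python sketch below does this for a few rows. The parsing convention it relies on (symbols separated by spaces, `·` used only as a visual divider, `~` joining variant realisations of a single phoneme) is inferred from the table's layout and is an assumption, as is the function name `count_vowels`.

```python
def count_vowels(inventory: str) -> int:
    """Count the vowel phonemes in a slash-delimited inventory string.

    Symbols are whitespace-separated; the middle dot is a layout
    divider and is skipped; a form such as "æ~ɛ" counts as one phoneme.
    """
    symbols = inventory.strip("/").split()
    return sum(1 for symbol in symbols if symbol != "·")


# Rows copied from the table above.
INVENTORIES = {
    "Kashmiri": "/iː i eː e ɨː ɨ əː ə aː a ɔː ɔ oː o uː u/",
    "Maithili": "/ɪ iː ʊ uː e eː ə~ɐ əː o oː æ~ɛ a aː ɔ/",
    "Bengali": "/i e æ · a · ɔ o u/",
    "Romani (European dialects)": "/i e · a · o u/",
}

for language, inventory in INVENTORIES.items():
    print(language, count_vowels(inventory))
```

Keeping each inventory as a plain string mirrors the table; splitting on whitespace avoids having to treat length marks such as `ː` as separate characters.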
Sylheti is one of the few tonal Indo-Aryan languages, the others being Punjabi and a few Dardic languages. The vowels of Sylheti are listed below.[53]
| Vowels | Languages | |
|---|---|---|
| 5 | /i e · a · ɔ u/ | Sylheti |
Charts
The following are consonant systems of major and representative New Indo-Aryan languages, mostly following Masica (1991:106–107), though here they are in IPA. Parentheses indicate consonants found only in loanwords; square brackets indicate those with "very low functional load". The arrangement is roughly geographical.
Sociolinguistics
Register
In many Indo-Aryan languages, the literary register is often more archaic and draws on a different lexicon (Sanskrit or Perso-Arabic) than the spoken vernacular. One example is Bengali's high literary form, Sādhū bhāṣā, as opposed to the more modern Calita bhāṣā (Cholito-bhasha).[56] This distinction approaches diglossia.
Language and dialect
In the context of South Asia, the choice between the appellations "language" and "dialect" is a difficult one, and any distinction made using these terms is obscured by their ambiguity. In one general colloquial sense, a language is a "developed" dialect: one that is standardised, has a written tradition and enjoys social prestige. As there are degrees of development, the boundary between a language and a dialect thus defined is not clear-cut, and there is a large middle ground where assignment is contestable. There is a second meaning of these terms, in which the distinction is drawn on the basis of linguistic similarity. Though seemingly a "proper" linguistics sense of the terms, it is still problematic: methods that have been proposed for quantifying difference (for example, based on mutual intelligibility) have not been seriously applied in practice; and any relationship established in this framework is relative.[57]
References
- ^ a b "Development team" (PDF). inflibnet.ac.in. Retrieved 9 March 2024.
- ^ Reynolds, Mike; Verma, Mahendra (2007). "Indic languages". In Britain, David (ed.). Language in the British Isles. Cambridge University Press. pp. 293–307. ISBN 978-0-521-79488-6. Retrieved 4 October 2021.
- ^ Munshi, S (2009). "Indo-Aryan languages". In Keith Brown; Sarah Ogilvie (eds.). Concise Encyclopedia of Language of the World. Amsterdam: Elsevier. p. 522–528.
- ^ "Overview of Indo-Aryan languages". Encyclopædia Britannica. Retrieved 8 July 2018.
- ^ Various counts depend on where the line is drawn between a "dialect" and a "language".[citation needed] Glottolog 4.1 lists 224 languages.
- ^ Burde, Jayant (2004). Rituals, Mantras, and Science: An Integral Perspective. Motilal Banarsidass Publishers. p. 3. ISBN 978-81-208-2053-1.
The Aryans spoke an Indo-European language sometimes called the Vedic language from which have descended Sanskrit and other Indic languages ... Prakrit was a group of variants which developed alongside Sanskrit.
- ^ Jain, Danesh; Cardona, George (26 July 2007). The Indo-Aryan Languages. Routledge. p. 163. ISBN 978-1-135-79711-9.
... a number of their morphophonological and lexical features betray the fact that they are not direct continuations of R̥gvedic Sanskrit, the main base of 'Classical' Sanskrit; rather they descend from dialects which, despite many similarities, were different from R̥gvedic and in some regards even more archaic.
- ^ Chamber's Encyclopaedia, Volume 7. International Learnings Systems. 1968.
Most Aryan languages of India and Pakistan belong to the Indo-Aryan family, and are descended from Sanskrit through the intermediate stage of Prakrit. The Indo-Aryan languages are by far the most important numerically and the territory occupied by them extends over the whole of northern and central India and reaches as far south as Goa.
- ^ Donkin, R. A. (2003). Between East and West: The Moluccas and the Traffic in Spices Up to the Arrival of Europeans. American Philosophical Society. p. 60. ISBN 9780871692481.
The modern, regional Indo-Aryan languages developed from Prakrt, an early 'unrefined' (prakrta) form of Sanskrit, around the close of the first millennium A.D.
- ^ Standard Hindi first language: 260.3 million (2001), as second language: 120 million (1999). Urdu L1: 68.9 million (2001–2014), L2: 94 million (1999): Ethnologue 19.
- ^ Bengali or Bangla-Bhasa, L1: 242.3 million (2011), L2: 19.2 million (2011), Ethnologue
- ^ "Världens 100 största språk 2010" [The world's 100 largest languages in 2010]. Nationalencyclopedin (in Swedish). Government of Sweden publication. Archived from the original on 11 November 2012. Retrieved 30 August 2013.
- ^ "Punjabi speaking countries". WorldData.info.
- ^ Bryant, Edwin Francis; Patton, Laurie L. (2005). The Indo-Aryan Controversy: Evidence and Inference in Indian History. Routledge. pp. 246–247. ISBN 978-0-7007-1463-6.
- ^ Masica (1991), p. 25.
- ^ Masica (1991), pp. 446–463.
- ^ a b c Kogan, Anton I. (2016). "Genealogical classification of New Indo-Aryan languages and lexicostatistics" (PDF). Journal of Language Relationship. 14 (4): 227–258. doi:10.31826/jlr-2017-143-411. S2CID 212688418.
- ^ Eberhard, David M.; Simons, Gary F.; Fennig, Charles D., eds. (2020). Ethnologue: Languages of the World (23rd ed.). Dallas, Texas: SIL International.
- ^ Hammarström, Harald; Forkel, Robert; Haspelmath, Martin, eds. (2017). "Indo-Aryan". Glottolog 3.0. Jena, Germany: Max Planck Institute for the Science of Human History.
- ^ Kogan, Anton I. (2005). Dardskie yazyki. Geneticheskaya kharakteristika [The Dardic languages: a genetic characterization] (in Russian). Moskva: Vostochnaya literatura.
- ^ Southworth, Franklin C. (2005). Linguistic archaeology of South Asia. Routledge. ISBN 0-415-33323-7.
- ^ Zoller, Claus Peter (2016). "Outer and Inner Indo-Aryan, and northern India as an ancient linguistic area". Acta Orientalia. 77: 71–132.
- ^ Sigfried J. de Laet. History of Humanity: From the seventh to the sixteenth century UNESCO, 1994. ISBN 9231028138 p 734
- ^ a b Ray, Tapas S. (2007). "Eleven: "Oriya"". In Jain, Danesh; Cardona, George (eds.). The Indo-Aryan Languages. Routledge. p. 445. ISBN 978-1-135-79711-9.
- ^ Cardona, George; Jain, Dhanesh, eds. (2003). "The historical context and development of Indo-Aryan". The Indo-Aryan Languages. Routledge language family series. London: Routledge. pp. 46–66. ISBN 0-7007-1130-9.
- ^ Claus, Peter J.; Diamond, Sarah; Mills, Margaret Ann (2003). "Afghanistan, Bangladesh, India". South Asian folklore: an encyclopedia. Routledge. p. 203.
- ^ Peterson, John (2017). "The prehistorical spread of Austro-Asiatic in South Asia". Archived 11 April 2018 at the Wayback Machine. Presented at ICAAL 7, Kiel, Germany.
- ^ Ivani, Jessica K.; Paudyal, Netra; Peterson, John (1 September 2020). "Indo-Aryan – a house divided? Evidence for the east–west Indo-Aryan divide and its significance for the study of northern South Asia". Journal of South Asian Languages and Linguistics. 7 (2): 287–326. doi:10.1515/jsall-2021-2029. ISSN 2196-078X.
- ^ Hammarström, Harald; Forkel, Robert; Haspelmath, Martin, eds. (2017). "Tharuic". Glottolog 3.0. Jena, Germany: Max Planck Institute for the Science of Human History.
- ^ Hammarström, Harald; Forkel, Robert; Haspelmath, Martin, eds. (2017). "Kuswaric". Glottolog 3.0. Jena, Germany: Max Planck Institute for the Science of Human History.
- ^ Hammarström, Harald; Forkel, Robert; Haspelmath, Martin, eds. (2017). "Chinali–Lahul Lohar". Glottolog 3.0. Jena, Germany: Max Planck Institute for the Science of Human History.
- ^ Paul Thieme, The 'Aryan' Gods of the Mitanni Treaties. JAOS 80, 1960, 301–17
- ^ Parpola, Asko (2015). The Roots of Hinduism: The Early Aryans and The Indus Civilization. Oxford University Press.
- ^ Oberlies, Thomas (2007). "Chapter Five: Aśokan Prakrit and Pāli". In Cardona, George; Jain, Danesh (eds.). The Indo-Aryan Languages. Routledge. p. 179. ISBN 9781135797119.
- ^ Gombrich, Richard (14 April 2006). Theravada Buddhism: A Social History from Ancient Benares to Modern Colombo. Routledge. p. 24. ISBN 978-1-134-90352-8.
- ^ Kulshreshtha, Manisha; Mathur, Ramkumar (24 March 2012). Dialect Accent Features for Establishing Speaker Identity: A Case Study. Springer Science & Business Media. p. 16. ISBN 978-1-4614-1137-6.
- ^ Nunley, Robert E.; Roberts, Severin M.; Wubrick, George W.; Roy, Daniel L. (1999). The Cultural Landscape an Introduction to Human Geography. Prentice Hall. ISBN 978-0-13-080180-7.
... Hindustani is the basis for both languages ...
- ^ "Urdu and its Contribution to Secular Values". South Asian Voice. Archived from the original on 11 November 2007. Retrieved 26 February 2008.
- ^ "Hindi/Urdu Language Instruction". University of California, Davis. Archived from the original on 3 January 2015. Retrieved 3 January 2015.
- ^ "Ethnologue Report for Hindi". Ethnologue. Retrieved 26 February 2008.
- ^ Zwartjes, Otto (2011). Portuguese Missionary Grammars in Asia, Africa and Brazil, 1550–1800. John Benjamins Publishing. ISBN 978-9027283252.
- ^ Matras, Y. (2012). A grammar of Domari. Berlin: De Gruyter Mouton (Mouton Grammar Library).
- ^ "History of the Romani language". Archived from the original on 6 October 2022. Retrieved 16 July 2016.
- ^ "GYPSY ii. Gypsy Dialects – Encyclopaedia Iranica". Archived from the original on 2 April 2015. Retrieved 25 March 2015. Encyclopædia Iranica
- ^ Tiwari, Bholanath (1970). Tajuzbeki. National Publishing House.
- ^ "Romani (subgroup)". SIL International. n.d. Retrieved 15 September 2013.
- ^ Jahdhami, Said Humaid Al (31 July 2022). "Maimani Language and Lawati Language: Two Sides of the Same Coin?". Journal of Modern Languages. 32 (1): 37–57. doi:10.22452/jml.vol32no1.3. ISSN 2462-1986.
- ^ Masica (1991), pp. 94–95.
- ^ Cardona & Jain (2003), p. 932.
- ^ In Sanskrit, probably /cɕ/ is a more correct representation. Sometimes, only for representation, /c/ is also used.
- ^ Masica (1991), pp. 95–96.
- ^ Masica (1991), pp. 101–102.
- ^ Mahanta, Shakuntala; Gope, Amalesh (1 September 2018). "Tonal polarity in Sylheti in the context of noun faithfulness". Language Sciences. 69: 81. doi:10.1016/j.langsci.2018.06.010. ISSN 0388-0001. S2CID 149759441.
- ^ Gope, Amalesh; Mahanta, Shakuntala (2015). An Acoustic Analysis of Sylheti Phonemes (PDF). ICPhS 2015. Glasgow. Retrieved 11 November 2022.
- ^ Pandey, Anshuman (10 September 2010). "Proposal to Encode the Sindhi Script in ISO/IEC 10646" (PDF). Retrieved 11 November 2022.
- ^ Masica (1991), p. 57.
- ^ Masica (1991), pp. 23–27.
Further reading
- Morgenstierne, Georg. "Early Iranic Influence upon Indo-Aryan." Acta Iranica, I. série, Commemoration Cyrus. Vol. I. Hommage universel (1974): 271–279.
- John Beames, A comparative grammar of the modern Aryan languages of India: to wit, Hindi, Panjabi, Sindhi, Gujarati, Marathi, Oriya, and Bangali. London: Trübner, 1872–1879. 3 vols.
- Madhav Deshpande (1979). Sociolinguistic attitudes in India: An historical reconstruction. Ann Arbor: Karoma Publishers. ISBN 0-89720-007-1, ISBN 0-89720-008-X (pbk).
- Chakrabarti, Byomkes (1994). A comparative study of Santali and Bengali. Calcutta: K.P. Bagchi & Co. ISBN 81-7074-128-9
- Erdosy, George. (1995). The Indo-Aryans of ancient South Asia: Language, material culture and ethnicity. Berlin: Walter de Gruyter. ISBN 3-11-014447-6.
- Kausen, Ernst (2006). "Die Klassifikation der indogermanischen Sprachen (Microsoft Word, 133 KB)".
- Kobayashi, Masato.; & George Cardona (2004). Historical phonology of old Indo-Aryan consonants. Tokyo: Research Institute for Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies. ISBN 4-87297-894-3.
- Masica, Colin (1991), The Indo-Aryan Languages, Cambridge: Cambridge University Press, ISBN 978-0-521-29944-2.
- Misra, Satya Swarup. (1980). Fresh light on Indo-European classification and chronology. Varanasi: Ashutosh Prakashan Sansthan.
- Misra, Satya Swarup. (1991–1993). The Old-Indo-Aryan, a historical & comparative grammar (Vols. 1–2). Varanasi: Ashutosh Prakashan Sansthan.
- Sen, Sukumar. (1995). Syntactic studies of Indo-Aryan languages. Tokyo: Institute for the Study of Languages and Foreign Cultures of Asia and Africa, Tokyo University of Foreign Studies.
- Vacek, Jaroslav. (1976). The sibilants in Old Indo-Aryan: A contribution to the history of a linguistic area. Prague: Charles University.
External links
- The Indo-Aryan languages, 25 October 2009
- The Indo-Aryan languages, Colin P. Masica
- Survey of the syntax of the modern Indo-Aryan languages (Rajesh Bhatt), 7 February 2003.
Classification
Chronological stages
The Indo-Aryan languages evolved through three chronological stages, Old Indo-Aryan (OIA), Middle Indo-Aryan (MIA), and New Indo-Aryan (NIA), each defined by progressive linguistic innovations observable in textual attestations and in sound shifts reconstructed from earlier Proto-Indo-Aryan forms.[8] OIA, attested from approximately 1500 BCE to 500 BCE, is represented mainly by Vedic Sanskrit in the Rigveda and subsequent Vedic corpora, retaining Proto-Indo-European traits such as eight nominal cases, three numbers (singular, dual, plural), and a synthetic verbal system with active, middle, and passive voices.[8] This stage shows minimal deviation from reconstructed Proto-Indo-Aryan, with features like the ruki rule (where s becomes ṣ after r, u, k, i) already operative, linking it to broader Indo-Iranian developments.[9]
MIA, spanning roughly 600 BCE to 1000 CE and documented in Prakrit inscriptions (e.g., Aśokan edicts from the 3rd century BCE) and literary works, features systematic phonological reductions including monophthongization of diphthongs ai and au to e and o, replacement of vocalic liquids ṛ and ḷ with a, i, or u, shortening of long vowels before consonant clusters, and simplification of intervocalic stops and clusters via gemination or assimilation.[10] Morphologically, MIA simplifies OIA's complex endings (merging feminine i-/u- declensions into ī-/ū-, eliminating the dual, thematicizing athematic stems, and reducing cases from eight to a core set, often nominative, accusative/oblique, and genitive) while shifting toward analytic structures with postpositions supplanting inflections.[10][9] The middle voice fades, and verbal forms increasingly derive from present stems, with passive functions handled by active endings.[10]
Apabhramśas, emerging in late MIA from the 6th to 13th centuries CE, mark the transition to NIA through intensified case erosion (yielding absolutive-oblique distinctions), loss of synthetic perfects and aorists in favor of participial periphrases, and nascent postpositional syntagms that restructure spatial and relational notions previously encoded inflectionally.[11][9] These varieties, attested in Jain and Buddhist texts, exhibit regional divergences, such as in Western Apabhramśa contributing to animacy-based pronominal systems.[9]
NIA stages, post-1000 CE, consolidate these trends into fully analytic grammars, with hallmark innovations like split-ergative alignment, in which transitive subjects in perfective tenses receive ergative marking (e.g., via postpositions derived from genitives), contrasting with accusative alignment in imperfectives, alongside expanded serial verb constructions and lexical aspect marking via auxiliaries.[9] This ergativity, absent in OIA and incipient in MIA, reflects remodeling of the aspectual system, where past participles combine with light verbs to encode perfectivity.[9]
Subgrouping hypotheses
The subgrouping of Indo-Aryan languages relies primarily on identifying bundles of isoglosses (shared phonological, morphological, and lexical innovations) that indicate common descent or areal convergence, rather than geographic proximity alone, given the dialect continuum nature of the family across the Indian subcontinent.[12] Early classifications, such as those in Grierson's Linguistic Survey of India (1903–1928), emphasized northwest-to-southeast gradients but often conflated linguistic evidence with presumed migration paths, leading to critiques that subgrouping should prioritize empirical comparative data over speculative historical narratives. Modern approaches, informed by computational phylogenetics, test hypotheses against large datasets like Turner's Comparative Dictionary of the Indo-Aryan Languages (1966), which catalogs over 13,000 etymologies across dozens of varieties.[13]
The Inner–Outer hypothesis, a century-old framework, divides the family into an "inner" core of northwestern and central languages (e.g., those retaining more conservative features akin to Vedic Sanskrit) and an "outer" periphery of eastern and southern varieties, posited to reflect early dialectal fragmentation or substrate influences.[14] Key isoglosses include outer-specific innovations such as vocalic *ṛ > a (versus ī in inner), past tense suffixes in *-l- (versus *-t- or *-s-), and enhanced retroflexion patterns, potentially signaling peripheral developments from contact with non-Indo-Aryan substrates.[15] Proponents like Southworth (2005) and Zoller argue these reflect distinct proto-stages, but skeptics such as Masica (1991) highlight overlapping "genetic zones" where features diffuse across proposed boundaries, complicating a binary split.
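To make the idea of an isogloss bundle concrete, the sketch below tabulates which innovations each variety shows and reports the set shared by every pair. Everything in it is hypothetical: the variety labels (`variety_A` to `variety_D`), the feature labels, and the feature assignments are invented for illustration and are not data from Masica, Southworth, or Turner's dictionary.

```python
from itertools import combinations

# HYPOTHETICAL data: each variety maps to the set of innovations it shows.
# The feature labels loosely echo the isoglosses named in the text
# (vocalic r > a, l-past, extra retroflexion); the assignments are invented.
FEATURES: dict[str, set[str]] = {
    "variety_A": {"r_to_a", "l_past"},
    "variety_B": {"r_to_a", "l_past", "extra_retroflexion"},
    "variety_C": {"extra_retroflexion"},
    "variety_D": set(),
}


def shared_innovations(features: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return, for every pair of varieties, the innovations both of them show."""
    return {
        (a, b): features[a] & features[b]
        for a, b in combinations(sorted(features), 2)
    }


def bundle_size(features: dict[str, set[str]], a: str, b: str) -> int:
    """Number of innovations shared by varieties a and b."""
    return len(features[a] & features[b])


if __name__ == "__main__":
    for (a, b), shared in shared_innovations(FEATURES).items():
        print(f"{a} ~ {b}: {len(shared)} shared {sorted(shared)}")
```

A raw count like this cannot tell inherited innovations from features spread by contact, which is the difficulty the skeptics cited above raise against a binary split.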
A 2019 Bayesian analysis of lexical cognates from 33 languages supported cohesive core-periphery clustering but found the traditional inner-outer demarcation only partially corroborated, with model probabilities favoring gradual divergence over sharp subgroups.[14]
Complementing this, a 2021 structural study of 16 Indo-Aryan languages across 217 morphosyntactic features (e.g., case alignment, agreement patterns, and periphrastic constructions) revealed a robust east-west divide, with western varieties (northwestern cluster) statistically distinct from eastern and southern ones in dimensions like verb morphology and nominal inflection. Hierarchical clustering and principal component analysis in the study quantified this split, attributing it to post-Old Indo-Aryan innovations rather than geography per se, and cautioned against overvaluing Sanskrit's prestige, which has skewed classifications toward northwestern conservatism by privileging attested Vedic texts over underrepresented eastern prakrits. Such data-driven critiques reject outdated ties to racial or unidirectional invasion models, insisting on falsifiable isogloss criteria to avoid circular reasoning from incomplete corpora.[14]
Dardic and transitional languages
The Dardic languages, including Kashmiri spoken by approximately 7 million people in the Kashmir Valley and Shina by around 500,000 in northern Pakistan's Gilgit-Baltistan and Khyber Pakhtunkhwa regions, were proposed as a third primary branch of Indo-Iranian by George Grierson in his Linguistic Survey of India between 1919 and 1928, distinct from both Indo-Aryan and Iranian due to perceived archaic traits and geographic isolation in the Hindu Kush.[16] This classification emphasized features such as retention of voiced aspirates from Proto-Indo-Iranian, which Iranian languages lost through deaspiration, and certain palatalizations aligning with satem developments shared across Indo-Iranian but interpreted as bridging to Iranian peripheries.[17] Grierson's grouping encompassed subgroups like Chitral (e.g., Khowar), Shina, and Kashmiri, viewing them as relics of pre-Vedic Indo-Iranian diversity rather than derived from central Indo-Aryan Prakrits.[16]
Subsequent analyses rejected Grierson's separation, reassigning Dardic to the Indo-Aryan branch based on shared phonological innovations, such as the development of voiced fricatives (e.g., /z/, /ɣ/) absent in Old Indo-Aryan and most core Indo-Aryan descendants, and morphological parallels like ergative alignment patterns evolving from Middle Indo-Aryan.[17][18] Georg Morgenstierne's fieldwork in the 1920s–1950s demonstrated genetic affinity with Indo-Aryan through vocabulary cognates and syntactic structures, positioning Dardic as Northwestern Indo-Aryan peripherals shaped by areal convergence rather than archaic isolation.[18] For instance, Shina exhibits SOV word order and postpositions typical of Himalayan Indo-Aryan, while Kashmiri's partial SVO tendencies reflect contact-driven shifts but retain Indo-Aryan core lexicon exceeding 70% overlap with Sanskrit-derived forms.[19][20]
These languages occupy a northwest continuum, exhibiting transitional traits from substrate and adstrate effects, including lexical borrowings from now-separate Nuristani languages (e.g., Kati group), which Richard Strand disentangled from Dardic in 1973 based on distinct innovations like centum-like sibilant reflexes absent in Indo-Aryan.[21] Nuristani contact, rather than substrate dominance, accounts for isolated phonological quirks in Dardic, such as variable retroflexion patterns, without undermining their Indo-Aryan phylogeny; empirical tree reconstructions using probabilistic models confirm Dardic clustering within Indo-Aryan outer subgroups, contra third-branch hypotheses.[22] This peripheral conservatism, retaining aspirates amid regional pressures, highlights causal dynamics of geographic barriers preserving select Proto-Indo-Aryan elements while core areas underwent uniform Prakrit-level changes.[18]
Major zonal groups
The major zonal groups of modern Indo-Aryan languages are delineated primarily by geographical distribution in the Indian subcontinent, supplemented by evidence from shared phonological innovations (such as tone development or aspiration loss), morphological patterns (like gender systems or verbal suffixes), and quantitative measures including lexicostatistics and mutual intelligibility assessments, which reveal dialect continua rather than strict genetic trees.[23][12] These groupings refine earlier colonial-era surveys, such as George Grierson's Linguistic Survey of India (1903–1928), by prioritizing empirical clustering over arbitrary boundaries; for instance, lexicostatistical analyses of core vocabulary show cognate percentages clustering above 70% within zones, indicating recent common development.[23]
The Northwestern Zone, encompassing languages like Lahnda (including Hindko, Siraiki, and Pothwari), Sindhi, and Dardic varieties (such as Shina and Kashmiri), is characterized by archaic retentions like implosive consonants, retroflex flaps, and ergative alignments, with geographical focus in Pakistan's Punjab and northwestern India; mutual intelligibility is high among Lahnda dialects (over 80% lexical similarity), supporting their coherence despite substrate influences from Iranian or Tibeto-Burman languages.[23]
The Northern Zone (Pahari group), spoken in Himalayan foothills, includes Nepali (over 16 million speakers as of 2011), Garhwali, and Kumaoni, unified by innovations like tone systems and geminate consonant retention, with Nepali serving as a lingua franca; dialectometry highlights continuity from western to eastern Pahari, though hill isolates exhibit low intelligibility (below 50%) due to local substrates.[23][12]
In the Western Zone, languages such as Gujarati (around 55 million speakers in 2011), Rajasthani (including Marwari), and Bhili predominate in Gujarat and Rajasthan, sharing features like the retroflex lateral /ɭ/ and three-gender systems (masculine, neuter, feminine), with lexicostatistical data showing 75–85% similarity among them, distinguishing them from neighboring central varieties.[23]
The Central Zone, centered on the Hindi Belt, features Hindi-Urdu (over 500 million speakers combined in 2021 estimates), Braj, and Bundeli, defined by two-gender (masculine/feminine) morphology and conjunct verb constructions, where high mutual intelligibility (90%+ for dialects) forms a midland continuum based on phonological metrics like aspirated nasal preservation.[23][12]
The Eastern Zone includes Bengali (over 230 million speakers in 2011), Odia, Assamese, and Bihari varieties (Maithili, Magahi, Bhojpuri), marked by sibilant mergers, gender loss, and postposed subordinators; Bihari acts as a transitional bridge to the Central Zone, with Bhojpuri showing 70–80% lexical overlap with western Hindi dialects despite eastern phonological shifts, per refined lexicostatistical studies that challenge strict zonal divides.[23][12]
The Southern Zone, comprising Marathi (around 83 million speakers in 2011) and Konkani, exhibits Dravidian substrate effects like prenasalized stops and verb-final tendencies, with mutual intelligibility clustering tightly (80%+ similarity) in Maharashtra's coastal and inland areas.[23]
Certain hill and peripheral varieties, such as some Dardic or eastern Pahari isolates, remain unclassified due to low cognate matches (under 60%) with major zones, reflecting heavy substrate interference and isolation, as evidenced by dialectometric distances exceeding zonal norms.[23][12]
Origins and historical development
Proto-Indo-Aryan within Indo-Iranian
Proto-Indo-Aryan (*pIA) is the reconstructed proto-language ancestral to the Indo-Aryan branch, diverging from Proto-Indo-Iranian (*pIIr) around 2000–1800 BCE; it is reconstructed by applying the comparative method to early attested forms in Vedic Sanskrit and comparing them with Avestan evidence.[24] This stage preserves *pIIr innovations diagnostic of their joint separation from broader Indo-European, such as satem palatalization of Proto-Indo-European *ḱ, *ǵ to sibilants (*ś, *ź) and the ruki rule, whereby *s is retracted to *š (Sanskrit ṣ) after *r, *u, *k, or *i, as in Sanskrit viṣa- 'poison' beside Latin vīrus.[25] These shared phonological shifts, absent in centum branches like Greek or Italic, are complemented by retained vocabulary illustrating deeper Indo-European links, such as the term for 'father' *ph₂tḗr, reflected in *pIA *pitṛ́- (Sanskrit pitṛ), Iranian *pitā- (Avestan pitar-), Latin pater, and Greek patḗr.[26]
These shared phonological shifts and lexical items substantiate *pIIr unity before the *pIA-Iranian split, with causal divergence arising from geographic dispersal of pastoralist groups post-Andronovo horizon (ca. 2000–1500 BCE), as Indo-Aryan speakers separated southward while Iranian groups consolidated eastward and southward.[24] Lexical and morphological distinctions mark *pIA innovation, including the semantic specialization of *déwH- 'shining/divine' to devá- denoting benevolent gods in Indo-Aryan ritual contexts, contrasting Iranian daēuua- recast as malevolent entities in Zoroastrian opposition to *asura- 'lord' elevated to ahura-.[27] Retained *pIIr morphology includes thematic verbs with *-ati endings (e.g., *bʰárati 'carries') and augment *e- for past tenses, but *pIA shows early drift in ablaut patterns and sandhi rules favoring retroflexion, as in *sáhas- 'strength' influencing later developments.[28]
The earliest non-Indian attestation of *pIA appears in Mitanni kingdom documents from northern Mesopotamia circa 1700–1400 BCE, where an Indo-Aryan superstrate overlays Hurrian substrates, evidenced by treaty invocations of the deities Mitra, Varuṇa, and Indra, and by the numerals aika 'one', tera 'three', and satta 'seven', which mirror Vedic forms and diverge from Iranian cognates like Avestan aēuua-, θri-, hapta-.[29] This peripheral evidence, predating Rigvedic composition (ca. 1500–1200 BCE), indicates *pIA speakers had dispersed beyond core *pIIr zones by the late Bronze Age, with linguistic isolation reinforcing branch-specific evolutions like the merger of *pIIr *ć, *j to Indo-Aryan *j while Iranian developed distinct affricates.[25] Such attestations, derived from cuneiform archives rather than interpretive narratives, anchor *pIA reconstruction empirically, underscoring splits driven by migratory ecology rather than isolated cultural stasis.
Evidence from linguistics, archaeology, and genetics
Linguistic evidence for the external origins of Indo-Aryan languages includes the presence of Dravidian loanwords in Old Indo-Aryan texts from the middle Rigvedic period around 1200 BCE, indicating substrate influence on incoming Indo-Aryan speakers rather than vice versa.[30] This directional borrowing pattern, with over 300 Dravidian-derived terms in Sanskrit for agriculture, flora, and fauna absent in earlier Indo-European branches, supports an influx of Indo-Aryan into a pre-existing non-Indo-European linguistic landscape.[31] Additionally, the absence of centum-like phonetic retentions in potential South Asian substrates aligns with Indo-Aryan as a satem branch derived externally, without local evolution from a centum substrate.[32]
Archaeological correlations point to cultural shifts post-dating the Harappan decline around 1900 BCE, including the introduction of horse-drawn chariots linked to Sintashta-Petrovka cultures in the steppe (circa 2100–1800 BCE), which align temporally with Proto-Indo-Iranian material culture preceding Vedic assemblages.[33] Harappan sites lack evidence of domesticated horses or spoked-wheel chariots, technologies central to Rigvedic descriptions, suggesting their post-Harappan adoption via external technological diffusion rather than indigenous development.[34] These shifts coincide with the Late Harappan phase, marked by urban abandonment and ruralization, facilitating subsequent pastoralist integrations.[35]
Genetic data provide the most robust evidence, with ancient DNA analyses revealing a significant influx of Steppe Bronze Age ancestry into South Asia between 2000 and 1500 BCE, correlating with Indo-Aryan language spread. The 2019 study of Swat Valley samples (circa 1200 BCE) shows admixture of local Indus periphery ancestry with steppe-derived male lineages, particularly R1a-Z93 haplogroup, at frequencies up to 30% in northern populations today.[36] This migration exhibits male-biased dispersal, as evidenced by Y-chromosome R1a dominance contrasting lower autosomal steppe components, consistent with elite-driven language shifts.[37] Harappan genomes from Rakhigarhi (circa 2600 BCE) confirm absence of steppe ancestry, underscoring its post-IVC introduction. Among disciplines, genetics offers the strongest quantitative support for migration scale, while linguistics elucidates shift mechanisms.
Debates on migration and indigenous origins
The debate over the origins of Indo-Aryan languages centers on whether speakers migrated into the Indian subcontinent from the Pontic-Caspian steppe region around 2000–1500 BCE or developed indigenously within the Indian subcontinent. The migration hypothesis posits that Proto-Indo-Aryan speakers, part of the broader Indo-Iranian branch, entered via northwestern routes, introducing Indo-European linguistic elements through processes potentially involving elite dominance rather than large-scale population replacement.[38] This view aligns with linguistic phylogenies tracing Indo-European roots to steppe pastoralists, where shared innovations like satemization distinguish Indo-Iranian from other branches.[6]
Proponents of the indigenous origins or Out-of-India theory argue for continuity between the Indus Valley Civilization (IVC, circa 3300–1900 BCE) and Vedic culture, citing geographical references in the Rigveda (such as rivers like the Sarasvati) as evidence of an ancient Indian homeland for Indo-Europeans, with supposed outward migrations explaining global distribution.[39] They claim cultural and possibly script-based links between undeciphered IVC symbols and early Brahmi-derived writing, positing that Indo-Aryan languages evolved in situ without external influx.
However, these arguments falter on the undeciphered status of IVC script, which shows no verifiable Proto-Indo-European (PIE) traces, and the absence of pre-2000 BCE linguistic evidence for PIE in the Indian subcontinent, rendering claims of continuity speculative and unfalsifiable.[40] Genetic data, including ancient DNA from sites like Rakhigarhi (IVC, lacking steppe ancestry) and post-2000 BCE Swat Valley samples (showing 10–30% steppe-related components in northern populations), supports influx timing with Indo-Aryan arrival, correlating with linguistic shifts but indicating admixture rather than conquest.[41] Critiques of the indigenous theory highlight its incompatibility with the centum-satem isogloss and the lack of Dravidian loanwords in European Indo-European branches, which would be expected under an Indian origin.[42]
Political motivations influence both sides: Indian nationalist perspectives often dismiss migration evidence to preserve narratives of unbroken civilizational primacy, selectively ignoring genetic and archaeological data despite their empirical weight, while earlier Western colonial framings emphasized violent invasion without substantiating mass destruction, now refined to models of gradual elite-mediated language shift fitting the sparse archaeological record of disruption.[43] Causally, the migration model better integrates the multidisciplinary evidence (linguistic divergence, genetic admixture after the IVC decline, and the absence of early PIE markers in India), outweighing indigenous claims, which rely on reinterpretations lacking positive, predictive support.[44]
Old Indo-Aryan
Old Indo-Aryan constitutes the earliest attested phase of the Indo-Aryan branch, spanning roughly 1500–500 BCE, with its primary representatives in Vedic Sanskrit and the subsequent Classical Sanskrit. The language appears in the Vedic corpus, a collection of orally composed religious texts that preserve archaic Indo-European features such as the instrumental plural in -bhis, athematic verbs, and inherited vocabulary for kinship and cosmology. These texts reflect a society emphasizing ritual hymns, sacrifices, and cosmology, with linguistic evidence pointing to composition in the Punjab region amid pastoral and early agrarian contexts. The Rigveda, comprising 1,028 hymns in 10 books, stands as the oldest document, dated by linguistic and astronomical analysis to circa 1500–1200 BCE for its core layers, though transmission remained oral until much later. Subsequent Vedic layers include the Sāmaveda (melodic chants derived from Rigveda hymns), Yajurveda (prose ritual formulas), and Atharvaveda (spells and domestic rites), extending into the late Vedic period around 1200–500 BCE. This corpus exhibits grammatical archaisms like the retention of the dual number across nouns, verbs, and pronouns, alongside eight noun cases and a verbal system distinguishing aorist, imperfect, and perfect tenses as well as an injunctive, enabling precise expression of agency, tense, and aspect in ritual contexts. By the late Vedic phase, texts such as the Brāhmaṇas and early Upaniṣads reveal subtle innovations, including the augmentation of verbal roots and simplification of some sandhi rules, signaling dialectal diversification as Indo-Aryan speakers expanded eastward. Hints of regional variants emerge, with western forms retaining older phonology (e.g., consistent s for intervocalic sounds) contrasted against eastern influences in texts like the Śatapatha Brāhmaṇa, where phonetic lenitions and lexical borrowings suggest interaction with non-Indo-Aryan substrates.
Classical Sanskrit emerged as a codified norm through Pāṇini's Aṣṭādhyāyī (circa 400 BCE), a generative grammar of approximately 4,000 sūtras that standardized late Vedic usage for epic poetry like the Mahābhārata and philosophical treatises, prioritizing inflectional rigor over spoken variability while preserving core Old Indo-Aryan (OIA) morphology. This standardization facilitated a thematic lexicon centered on ṛta (cosmic order), deva (deities), and sacrificial terminology, underscoring continuity in religious and intellectual traditions.
Middle Indo-Aryan
Middle Indo-Aryan (MIA) encompasses the developmental stage of Indo-Aryan languages from roughly 600 BCE to 1000 CE, marked by phonological simplification, morphological streamlining, and the diversification into multiple Prakrit dialects spoken across northern and central India.[45] These languages evolved from Old Indo-Aryan through processes of erosion, including the reduction of complex vowel systems and the assimilation of local substrates, leading to greater dialectal variation than in prior stages.[46] The earliest documented evidence of MIA appears in the rock edicts of Emperor Ashoka, inscribed circa 260–232 BCE in eastern Prakrit varieties, which reflect vernacular speech patterns diverging from classical Sanskrit.[47] Literary standardization emerged with Pali, a western Prakrit used in the Buddhist Tipitaka canon compiled from oral traditions dating to the 5th–3rd centuries BCE, and Ardhamagadhi, an eastern variety preserved in Jain Agamas representing teachings from the 6th century BCE onward.[10] These texts facilitated the dissemination of Buddhist and Jain doctrines among non-elite populations, highlighting MIA's role as a medium for religious vernacularization rather than elite liturgical use.[48] Phonological innovations included vowel mergers, such as the collapse of distinctions between short *ṛ and *a in many contexts, and the widespread deletion of final consonants, contributing to syllable structure simplification and prosodic shifts.[46] Morphologically, MIA featured the elimination of the dual number across nominal paradigms, thematicization of athematic consonant stems (e.g., via vowel insertion), and consolidation of i-/u-stems into ā-like patterns alongside ī-/ū mergers, reducing the inherited eight-case system toward fewer oppositions.[10] Dialectal proliferation is evident in regional Prakrits like Shauraseni (central), Maharashtri (western), and Magadhi (eastern), each exhibiting localized sound shifts and lexical variances, fostering a spectrum of spoken forms.[45]
Substrate effects from pre-existing Dravidian and Munda (Austroasiatic) languages influenced MIA phonology, notably reinforcing retroflex consonants (e.g., ḍ, ṇ) absent in early Indo-Aryan inventories and introducing agglutinative traces in periphrastic constructions.[49] These non-Indo-Aryan contributions, likely from indigenous populations in the Gangetic plain, accelerated erosion of Indo-European case endings and promoted analytic tendencies.[50] In the later MIA phase, Apabhramsha dialects (circa 6th–13th centuries CE) represented further dialectal fragmentation and phonological decay, with intensified vowel leveling, consonant cluster reductions, and nominal case loss, positioning them as direct antecedents to emergent New Indo-Aryan vernaculars through intermediate poetic and inscriptional attestations.[51] This transitional erosion underscored MIA's role in bridging synthetic Old Indo-Aryan structures with the more isolating patterns of later stages.[10]
New Indo-Aryan emergence
The New Indo-Aryan (NIA) languages diversified from the Apabhramsha varieties of Middle Indo-Aryan around 1000–1200 CE, coinciding with the political fragmentation of northern and central India following the decline of centralized empires like the Gurjara-Pratiharas and the onset of Turkic invasions from 1001 CE onward under Mahmud of Ghazni. This era saw the Delhi Sultanate (1206–1526 CE) and subsequent regional kingdoms foster vernacular literatures in courts and trade hubs, accelerating the shift from Sanskrit-dominated elites to spoken dialects influenced by Persian and Arabic via administrative and mercantile interactions.[52][53] Regional isolation in fragmented polities, such as the Bengal Sultanate (1352–1576 CE) and Deccan kingdoms, promoted independent phonological and lexical innovations, yielding distinct modern forms like the Eastern and Southern NIA branches.[8] In the Ganges-Yamuna Doab, the Khariboli dialect of Western Hindi emerged as a contact vernacular during the 12th–13th centuries, serving as a bridge language between Persian-speaking rulers and local populations amid invasions by the Ghurids and Delhi Sultanate forces; by the 14th century, it incorporated Perso-Arabic vocabulary, forming the basis for Hindustani, which later bifurcated into standardized Hindi (in Devanagari script) and Urdu (in Perso-Arabic script).[54][53] Similarly, Bengali crystallized from Gaudiya Apabhramsha in eastern Magadha around the 10th–11th centuries, with the earliest attestations in Charyapada poems (c. 8th–12th centuries, compiled post-1000 CE) and proliferation under the Bengal Sultanate's patronage of local poets, diverging through vowel shifts and SOV syntax reinforcements.[55][56] Gujarati and Marathi likewise consolidated in western and southern regions by the 13th century, tied to trade routes and bhakti movements that vernacularized devotional texts.[8]
Standardization accelerated under British colonial administration from the 19th century, with the Linguistic Survey of India (1903–1928), directed by George Grierson, cataloging over 179 languages and dialects, including NIA varieties, through 50,000+ informant interviews; this influenced census classifications from 1901 onward, elevating Hindi (based on Khariboli) as a scheduled language.[57] Post-1947 independence, India's Constitution (1950) designated Hindi in Devanagari as an official language alongside English, spurring academies like the Central Hindi Directorate to codify grammar and promote diglossia, while Pakistan elevated Urdu; these policies reduced dialectal variation but sparked movements for regional NIA recognition, as in the States Reorganisation Act (1956).[58]
In the 2020s, computational linguistics has addressed challenges in low-resource NIA languages like Sindhi and Magahi, with efficient neural machine translation models leveraging multilingual transfer learning to achieve BLEU scores of 20–30 for Indo-Aryan-to-English pairs, despite limited corpora under 1 million sentences; initiatives like IndoLib toolkits integrate these for NLP tasks in under-documented varieties.[59][60] Such models highlight persistent vitality amid urbanization, though they underscore data scarcity from historical fragmentation.[61]
Linguistic features
Phonology
Indo-Aryan languages exhibit a consonant system characterized by five places of articulation (bilabial, dental/alveolar, retroflex, palatal, and velar), with stops in four series: voiceless unaspirated, voiceless aspirated, voiced unaspirated, and voiced breathy (murmured).[62] This retention of aspiration and breathy voice contrasts with the simplification in many other Indo-European branches, while the retroflex series represents an areal innovation influenced by substrate languages, featuring stops like /ʈ ʈʰ ɖ ɖʱ/ and often a retroflex approximant /ɻ/ or flap /ɽ/.[63] Fricatives are limited, typically including /s/ (dental or palato-alveolar) and /ɦ/ (breathy voiced glottal), with /ʂ/ (retroflex) appearing in some eastern varieties but merging with /s/ elsewhere; affricates /t͡ɕ t͡ɕʰ d͡ʑ d͡ʑʱ/ occur at the palatal place.[62] The following table illustrates a typical consonant inventory in many central and eastern Indo-Aryan languages, such as Hindi, using IPA notation:

| | Labial | Dental/alveolar | Retroflex | Palatal | Velar | Glottal |
|---|---|---|---|---|---|---|
| Plosive/affricate (voiceless unaspirated) | p | t | ʈ | t͡ɕ | k | |
| Plosive/affricate (voiceless aspirated) | pʰ | tʰ | ʈʰ | t͡ɕʰ | kʰ | |
| Plosive/affricate (voiced unaspirated) | b | d | ɖ | d͡ʑ | ɡ | |
| Plosive/affricate (breathy voiced) | bʱ | dʱ | ɖʱ | d͡ʑʱ | ɡʱ | |
| Nasal | m | n | ɳ | ɲ | ŋ | |
| Lateral approximant | | l | ɭ | | | |
| Flap | | ɾ | ɽ | | | |
| Fricative | | s | | | | ɦ |
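
The regular part of this inventory, four stop series crossed with five places of articulation, can be generated mechanically. The following is a minimal sketch in Python, assuming the symbols given in the table above; the names `PLACES`, `SERIES`, and `build_stop_inventory` are hypothetical and chosen only for illustration, and the sketch covers the plosive/affricate rows only, not nasals, liquids, or fricatives.

```python
# Illustrative sketch only: PLACES, SERIES and build_stop_inventory are
# hypothetical names, not part of any linguistic library.

# Base voiceless and voiced stop/affricate symbols (IPA) for each place of
# articulation, as listed in the consonant table.
PLACES = {
    "labial": ("p", "b"),
    "dental": ("t", "d"),
    "retroflex": ("ʈ", "ɖ"),
    "palatal": ("t͡ɕ", "d͡ʑ"),
    "velar": ("k", "ɡ"),
}

# Each series selects the voiceless (0) or voiced (1) base symbol and
# appends a diacritic: ʰ for aspiration, ʱ for breathy voice.
SERIES = {
    "voiceless unaspirated": (0, ""),
    "voiceless aspirated": (0, "ʰ"),
    "voiced unaspirated": (1, ""),
    "breathy voiced": (1, "ʱ"),
}


def build_stop_inventory():
    """Return {(series, place): IPA symbol} for the 4 x 5 stop grid."""
    return {
        (series, place): bases[index] + mark
        for series, (index, mark) in SERIES.items()
        for place, bases in PLACES.items()
    }


if __name__ == "__main__":
    inventory = build_stop_inventory()
    # Print one row per series, with columns in the order of PLACES.
    for series in SERIES:
        row = [inventory[(series, place)] for place in PLACES]
        print(f"{series:<22}", " ".join(row))
```

Storing only the base symbol per place and the diacritic per series mirrors the description of the system as a grid: each of the 20 stops is fully determined by its place and its series.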