Spurious languages
Spurious languages are languages that have been reported as existing in reputable works, while other research has reported that the language in question did not exist. Some spurious languages have been proven to not exist. Others have very little evidence supporting their existence, and have been dismissed in later scholarship. Others still are of uncertain existence due to limited research.
Below is a sampling of languages that have been claimed to exist in reputable sources but have subsequently been disproved or challenged. In some cases a purported language is tracked down and turns out to be another, known language. This is common when language varieties are named after places or ethnicities.
Some alleged languages turn out to be hoaxes, such as the Kukurá language of Brazil or the Taensa language of Louisiana. Others are honest errors that persist in the literature despite being corrected by the original authors; an example of this is Hongote, the name given in 1892 to two Colonial word lists, one of Tlingit and one of a Salishan language, that were mistakenly listed as Patagonian. The error was corrected three times that year, but nonetheless "Hongote" was still listed as a Patagonian language a century later in Greenberg (1987).[1]: 133
In the case of New Guinea, one of the most linguistically diverse areas on Earth, some spurious languages are simply the names of language surveys that the data was published under. Examples are Mapi, Kia, Upper Digul, Upper Kaeme, listed as Indo-Pacific languages in Ruhlen 1987; these are actually rivers that gave their names to language surveys in the Greater Awyu languages and Ok languages of New Guinea.[2]
Dubious languages
Dubious languages are those whose existence is uncertain. They include:
Spurious according to Ethnologue and ISO 639-3
Following is a list of ISO 639-3 language codes which have been retired since the standard was established in 2006, arranged by the year in which the actual retirement took effect; in most cases the change request for retirement was submitted in the preceding year. Also included is a partial list of languages (with their SIL codes) that appeared at one time in Ethnologue but were removed prior to 2006, arranged by the first edition in which they did not appear.
The list includes codes that have been retired from ISO 639-3 or languages removed from Ethnologue because the language apparently does not exist and cannot be identified with an existing language. The list does not include instances where the "language" turns out to be a spelling variant of another language or the name of a village where an already known language is spoken; these are cases of duplicates, which are resolved in ISO 639-3 by a code merger. It does include "languages" for which there is no evidence or which cannot be found. (In some cases, however, the evidence for nonexistence is a survey among the current population of the area, which would not identify extinct languages such as Ware below.)
SIL codes are upper case; ISO codes are lower case. Once retired, ISO 639-3 codes are not reused.[6] SIL codes that were retired prior to 2006 may have been re-used or may have reappeared as ISO codes for other languages.
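The case convention above can be applied mechanically when reading the entries that follow. The snippet below is a minimal sketch, assuming only what this section states (three upper-case letters mark a pre-2006 SIL code, three lower-case letters an ISO 639-3 code). The names `classify_code` and `codes_in_entry` are hypothetical helpers for illustration, not part of any SIL or ISO tooling.

```python
import re

# Bracketed three-letter codes as they are written in the lists below.
# Assumption: upper case marks a pre-2006 SIL code, lower case an ISO 639-3 code.
CODE_PATTERN = re.compile(r"\[([A-Za-z]{3})\]")


def classify_code(code: str) -> str:
    """Return 'SIL', 'ISO 639-3', or 'unknown' for a three-letter code."""
    if len(code) != 3 or not code.isalpha() or not code.isascii():
        return "unknown"
    if code.isupper():
        return "SIL"
    if code.islower():
        return "ISO 639-3"
    return "unknown"  # mixed case fits neither convention


def codes_in_entry(entry: str) -> list[tuple[str, str]]:
    """Extract every bracketed code from a list entry, with its classification."""
    return [(code, classify_code(code)) for code in CODE_PATTERN.findall(entry)]
```

Numeric citation markers such as `[6]` are not matched by the pattern, so only language codes are returned. The classification says nothing about whether an upper-case SIL code was later re-used for another language.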
Removed from Ethnologue, 12th ed., 1992
- Itaem (PNG) [ITM]
- Marajona (Brazil) [MPQ]
- Nemeyam (PNG) [NMY]
- Nereyama, Nereyó (Brazil) [NRY]
- Numbiaí (Orelha de Pau) [NUH]
- Oganibi (PNG) [OGA]
- Tijuana Sign Language (Mexico) [TJS] – added to Ethnologue 1988 by mistake due to a misunderstanding, removed in 1992. No evidence that it ever existed.
- Tyeliri Senoufo [TYE] – the Tyeliri are a caste of leather workers, and do not have their own language
- Wagumi [WGM]
- Zanofil [ZNF] – name of an ethnic group that speaks Yongkom [yon]
Removed from Ethnologue, 13th ed., 1996
- Bibasa (PNG) [BHE] – described as "isolate in need of survey" in the 12th ed.
Removed from Ethnologue, 14th ed., 2000
- Alak 2 [ALQ] – a mislabeled fragment of a word list[7]
- Dzorgai [DZI], Kortse [KBG], Pingfang [PFG], Thochu [TCJ], Lofuchai (Lophuchai) [LFU], Wagsod [WGS] – old names for Qiangic languages, some of uncertain correspondence to currently recognized names
- Hsifan [HSI] – an ethnic name for people speaking a variety of Qiangic or Jiarongic languages
- Scandinavian Pidgin Sign Language [SPF] – normal inter-language contact, not an established pidgin
- Wutana (Nigeria) [WUW] – an ethnic name
Removed from Ethnologue, 15th ed., 2005
- Jiji [JIJ][8]
- Kalanke [CKN][9]
- Lewada-Dewara [LWD], incl. Balamula/Mataru[10]
- Lowland Semang [ORB][11] (though other languages without ISO codes, such as Wila', are also called Lowland Semang)
- Mutús [MUF][12] – suspected to exist, e.g. by Adelaar 2005
- Nchinchege [NCQ][13]
- Nkwak [NKQ][14] – same as Tanjijili? Also a possible synonym for Kwak (retired in 2015)
- Oso (Southern Fungom) [OSO] – no evidence it is distinct from Fungom and Bum[15]
- Rungi [RUR][16]
- Wamsak [WBD][17]
Retired 2007
- Miarrã [xmi] – unattested[18][19]
- Atuence [atf] – an old town name,[20] likely referring to Dêqên
- Amapá Creole [amd][21]
Retired 2008
- Amikoana (Amikuân) [akn][22]
- Land Dayak [dyk] – language family name, not individual language[23]
- Ware [wre][24] – Ware is listed as extinct in Maho (2009). When an SIL team in Tanzania were not able to find any evidence of it being spoken, the code was retired.
- Bahau River Kenyah [bwv], Kayan River Kenyah [knh], Mahakam Kenyah [xkm], Upper Baram Kenyah [ubm] – Any current use is likely either Mainstream Kenyah [xkl] or Uma' Lung [ulu]
- Amerax [aex] – prison jargon
- Garreh-Ajuran [ggh] (Borana & Somali)
- Sufrai [suf] – two languages, Tarpia and Kaptiau, which are not close[25]
Retired 2009
Retired 2010
Retired 2011
- Ayi (China) [ayx]
- Dhanwar (India) [dha]
- Mahei [mja]
Retired 2012
- Palu [pbz]
- Pongyong [pgy]
- Elpaputih [elp] – could be either of two existing languages
- Wirangu-Nauo [wiw] – two varieties that do not form a unit[26]
Retired 2013
- Malakhel [mld] – likely Ormuri
- Forest Maninka [myq] – generic
Retired 2014
- Gugu Mini [ggm] – a generic name
- Maskoy Pidgin [mhh] – never existed
- Emok [emo] – never existed
- Yugh [yuu] – duplicate of Yug [yug]
- Lamam [lmm] – duplicate of Romam [rmx]
Retired 2015
- Mator-Taygi-Karagas [ymt] – duplicate of Mator
- Yiddish Sign Language [yds] – no evidence that it existed[27]
- The [thx] – duplicate of Oy
- Imraguen (Mauritania) [ime]
- Borna (Eborna) [bxx] – perhaps a typo for Boma (Eboma)[28]
- Bemba [bmy] – a tribal name
- Songa [sgo] – a tribal name
- Daza [dzd] – retired in 2015 (with the reason "Nonexistent") but that decision was reversed in 2023, bringing [dzd] back[29]
- Buya [byy]
- Kakauhua [kbf] – Kakauhua/Caucahue is an ethnonym, language unattested – see Alacalufan languages
- Subi [xsj] – retired as a duplicate of Shubi [suj], but that decision was reversed in 2019, bringing [xsj] back[30]
- Yangho [ynh] – does not exist
- ǂKxʼaoǁʼae ("=/Kx'au//'ein") [aue] – dialect of Juǀʼhoan [ktz][31]
Retired 2016
- Bhatola [btl]
- Cagua [cbh]
- Chipiajes [cbe] – a Saliba and Guahibo surname
- Coxima [kox]
- Iapama [iap] – uncontacted, and likely one of the neighboring languages
- Kabixí [xbx] – generic name for Parecis, Nambiquaras, or any hostile group (see Cabixi language for one specific use)
- Runa [rna]
- Savara (Dravidian) [svr]
- Xipináwa [xip] – unattested and may not be distinct[32]
- Yarí [yri] – dialect of Carijona[33]
And several supposed extinct Arawakan languages of Venezuela and Colombia:
- Cumeral [cum]
- Omejes [ome]
- Ponares [pod] – a Sáliba surname, perhaps just Piapoco or Achagua[34]
- Tomedes a.k.a. Tamudes [toe]
Additional languages and codes were retired in 2016, due to a lack of evidence that they existed, but were not necessarily spurious as languages.
Retired 2017
- Lua' [prb][35]
- Rennellese Sign Language [rsi] – a home sign system, not a full language[36]
- Rien [rie][37]
- Shinabo [snh][38]
- Pu Ko [puk] – no substantive evidence that the language ever existed.
Retired 2018
- Lyons Sign Language [lsg][39] – no substantive evidence that the language ever existed.
- Mediak [mwx][40]
- Mosiro [mwy] – a clan name[41]
Retired 2019
- Lui [lba][42]
- Khlor [llo] – duplicate of Kriang [ngt][43]
- Mina (India) [myi] – Meena, a tribe and caste name in India[44]
Retired 2020
- Arma [aoh][45]
- Tayabas Ayta [ayy][46]
- Babalia Creole Arabic [bbz][47]
- Barbacoas [bpb][48]
- Cauca [cca][49]
- Chamari [cdg][50]
- Degaru [dgu][51]
- Eastern Karnic [ekc][52]
- Khalaj [kjf][53]
- Lumbee [lmz][54]
- Palpa [plp][55]
- Tapeba [tbb][56]
Retired 2021
Retired 2022
- Warduji [wrd][58]
- Pini [pii][59]
- Judeo-Tunisian Arabic [ajt] – duplicate of Tunisian Arabic [aeb][60]
Retired 2023
- Tupí [tpw] – duplicate of Tupinamba [tpn][61]
- Karipúna [kgm] – duplicate of Palikur [plu][62]
- Koibal [zkb] – duplicate of Khakas [kjh][63]
- Salchuq [slq][64]
- Parsi [prp][65]
Retired 2024
- Dek (Cameroon) [dek] – duplicate of Suma [sqm]
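Several of the retirements listed above are described as duplicates, with the retired code pointing to a surviving one. The following is a minimal sketch of a lookup built only from pairs where both codes are stated in the lists above. The names `MERGED_INTO` and `resolve` are hypothetical, the table is not a complete ISO 639-3 merger record, and [xsj] is left out because its retirement was reversed.

```python
# Retired ISO 639-3 codes that the lists above describe as duplicates,
# mapped to the code of the language they duplicate.
MERGED_INTO = {
    "yuu": "yug",  # Yugh -> Yug
    "lmm": "rmx",  # Lamam -> Romam
    "llo": "ngt",  # Khlor -> Kriang
    "ajt": "aeb",  # Judeo-Tunisian Arabic -> Tunisian Arabic
    "tpw": "tpn",  # Tupí -> Tupinamba
    "kgm": "plu",  # Karipúna -> Palikur
    "zkb": "kjh",  # Koibal -> Khakas
    "dek": "sqm",  # Dek -> Suma
}


def resolve(code: str) -> str:
    """Return the surviving code for a retired duplicate, else the code unchanged."""
    return MERGED_INTO.get(code.lower(), code)
```

Codes retired as nonexistent, such as [emo], have no target and pass through unchanged, so a caller would need a separate check to detect them.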
Spurious according to Glottolog
Glottolog, maintained at the Max Planck Institute for Evolutionary Anthropology in Leipzig, classifies several languages, some with ISO 639 codes, as spurious/unattested in addition to those retired by the ISO. These include:
| Language Name | ISO 639-3 | Details |
|---|---|---|
| !Khuai | | Duplicate of ǀXam |
| Adabe | adb | Dialect of Wetarese, taken for a Papuan language |
| Adu | adu | Duplicate of Okpamheri |
| Agaria | agi | all likely candidates in the area already have ISO codes |
| Ahirani | ahr | Khandeshi dialect |
| Anasi | bpo | Misidentification of Nisa |
| Arakwal | rkw | An ethnic group, not a language |
| Baga Kaloum | bqf | Should be subsumed into Koga variant |
| Baga Sobané | bsv | Should be subsumed into Sitemu variant |
| Bainouk-Samik | bcb | Split from Bainouk-Gunyuño due solely to national border |
| Bhalay | bhx | A caste rather than a language |
| Bubia | bbx | |
| Buso | bso | Duplicate of Kwang |
| Chetco | ctc | Indistinguishable from Tolowa |
| Chuanqiandian Cluster Miao | cqd | |
| Con | cno | |
| Gengle | geg | Mutually intelligible with Kugama |
| Gowlan | goj | A caste rather than a language |
| Gowli | gok | A caste, not a language |
| Guajajara | gub | Mutually intelligible with Tenetehara |
| Ihievbe | ihi | Ibviosakan dialect |
| Inku | jat | SIL named the jat entry Jakati. Ethnologue editions 16 through 28 report 29,300 speakers in Ukraine, but the Ukrainian linguist Aleksej Barannikov contested this, suggesting the variety may be covered by Vlax Romani. The alternative name "Jat" may refer to at least two village dialects in Afghanistan, as reported by Aparna Rao and Charles Kieffer. Glottolog currently follows Kieffer's investigation in using the name Inku and considers it related to Saraiki. |
| Ir | irr | duplicate of Ong-Ir |
| Judeo-Berber | jbe | According to Glottolog, Jewish Berbers speak no differently than Muslim Berbers. However, there are claims, listed in the linked article, that this is not true. |
| Kang | kyp | |
| Kannada Kurumba | kfi | |
| Katukína | kav | Historical form of modern-day language, not considered distinct |
| Kayort | kyv | Duplicate of Rajbanshi |
| Kisankasa | kqh | |
| Kofa | kso | Duplicate of Bata |
| Kpatili | kpm | Purportedly the original language of the Kpatili people, who now speak Gbayi, but any such language is unattested |
| Kuanhua | xnh | Insufficient attestation; possibly Khmu |
| Kuku-Mangk | xmq | |
| Lama (Myanmar) | lay | Duplicate of Nung |
| Lambichhong | lmh | Yakkha language; name exists due to form errors |
| Lang'e | yne | |
| Laopang | lbg | Undocumented Loloish language |
| Loarki | lrk | Also covered under Gade Lohar (gda) |
| Lopi | lov | Undocumented Loloish language |
| Lumba-Yakkha | luu | Yakkha language; name exists due to form errors |
| Mawa (Nigeria) | wma | listed in Ethnologue but SIL has no evidence it ever existed. |
| Munda | unx | Duplicate of Mundari |
| Ndonde Hamba | njd | Dialect of Makonde language |
| Norra | nrr | Duplicate of Nung |
| Northwestern Fars | faz | all likely candidates in the area already have ISO codes |
| Odut | oda | Extinct and unattested Nigerian language |
| Old Turkish | otk | |
| Ontenu | ont | A place rather than a language |
| Phangduwali | phw | Yakkha language; name exists due to form errors |
| Pisabo | pig | Asserted to be both unattested and non-distinct by Glottolog |
| Pokangá | pok | Spurious misidentification of Waimajã |
| Potiguára | pog | Unattested language, Glottolog argues is likely Old Tupi |
| Puimei Naga | npu | Indistinct variety of one of the related languages |
| Putoh | put | |
| Quetzaltepec Mixe | pxm | |
| Rufiji | rui | |
| Skagit | ska | duplicate of Lushootseed |
| Snohomish | sno | duplicate of Lushootseed |
| Southern Lolopo | ysp | Confused entry duplicating either Lolopo or Miqie |
| Southwestern Nisu | nsv | Likely confused additional Nisu language (spoken in same locations as Southern Nisu) |
| Syerna Senoufo | shz | Should be subsumed into Sìcìté Sénoufo |
| Tawang Monpa | twm | Chinese and Indian name for Dakpakha |
| Tetete | teb | Unattested, but intelligible with Siona language |
| Thu Lao | tyl | Duplicate of Dai Zhuang |
| Tingui-Boto | tgv | Ethnic group speaking Dzubukuá |
| Welaung | weu | Place name, not a language |
| Yarsun | yrs | |
| Yauma | yax | |
References and notes
- ^ Campbell, Lyle (2012). "Classification of the indigenous languages of South America". In Grondona, Verónica; Campbell, Lyle (eds.). The Indigenous Languages of South America. The World of Linguistics. Vol. 2. Berlin: De Gruyter Mouton. pp. 59–166. ISBN 9783110255133.
- ^ Upper Kaeme may correspond to Korowai.
- ^ Tapeba at Ethnologue (17th ed., 2013)
- ^ Hammarström, Harald; Forkel, Robert; Haspelmath, Martin, eds. (2017). "Tapeba". Glottolog 3.0. Jena, Germany: Max Planck Institute for the Science of Human History.
- ^ "Glottolog 2.4 – Adabe". Glottolog.org. Retrieved 13 July 2015.
- ^ "ISO 639-3 Change History". 01.sil.org. Retrieved 13 July 2015.
- ^ Sidwell, 2009, Classifying the Austroasiatic languages
- ^ "Ethnologue 14 report for language code:JIJ". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:CKN". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:LWD". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:ORB". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:MUF". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:NCQ". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:NKQ". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:OSO". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:RUR". Ethnologue.com. Retrieved 24 September 2012.
- ^ "Ethnologue 14 report for language code:WBD". Ethnologue.com. Retrieved 24 September 2012.
- ^ Hurd, Conrad (8 August 2006). "Request Number 2006-016 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Hammarström, Harald; Forkel, Robert; Haspelmath, Martin, eds. (2017). "Miarra". Glottolog 3.0. Jena, Germany: Max Planck Institute for the Science of Human History.
- ^ Hurd, Conrad (26 March 2007). "Request Number 2006-122 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Hurd, Conrad (21 March 2007). "Request Number 2006-124 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Holbrook, David J. (5 April 2007). "Request Number 2007-003 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Peebles, Matt (1 September 2007). "Request Number 2007-254 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Woodward, Mark (23 May 2007). "Request Number 2007-024 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Hurd, Conrad (8 August 2006). "Request Number 2006-016 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Legère, Karsten (18 August 2011). "Request Number 2011-133 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Bickford, J. Albert (31 January 2014). "Request Number 2014-010 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ "Request Number 2014-032 for Change to ISO 639-3 Language Code" (PDF). SIL International. 25 July 2014. Retrieved 6 January 2019.
- ^ "639 Identifier Documentation: dzd". SIL International. Retrieved 13 February 2023.
- ^ "639 Identifier Documentation: xsj". SIL International. Retrieved 26 January 2019.
- ^ Dyer, Josh (28 August 2014). "Request Number 2014-059 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ "Request Number 2015-011 for Change to ISO 639-3 Language Code" (PDF). SIL International. 9 March 2015. Retrieved 6 January 2019.
- ^ "2015-022 | ISO 639-3". iso639-3.sil.org. Archived from the original on 29 January 2022. Retrieved 14 March 2025.
- ^ "Request Number 2015-032 for Change to ISO 639-3 Language Code" (PDF). SIL International. 28 May 2015. Retrieved 6 January 2019.
- ^ Cheeseman, Nate (16 February 2016). "Request Number 2016-010 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Bickford, Albert (23 September 2015). "Request Number 2016-002 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Cheeseman, Nate (27 October 2015). "Request Number 2016-005 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ "Request Number 2016-004 for Change to ISO 639-3 Language Code" (PDF). SIL International. 26 October 2015. Retrieved 6 January 2019.
- ^ Bickford, J. Albert (9 March 2017). "Request Number 2017-013 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Legère, Karsten (18 May 2017). "Request Number 2017-017 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ Legère, Karsten (31 August 2016). "Request Number 2016-029 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 6 January 2019.
- ^ "Request Number 2018-016 for Change to ISO 639-3 Language Code" (PDF). SIL International. 20 August 2018. Retrieved 15 January 2019.
- ^ Gehrmann, Ryan (22 January 2018). "Request Number 2018-008 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 25 January 2019.
- ^ "Request Number 2018-011 for Change to ISO 639-3 Language Code" (PDF). SIL International. 9 August 2018. Retrieved 25 January 2019.
- ^ "Request Number 2019-017 for Change to ISO 639-3 Language Code" (PDF). SIL International. 1 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-018 for Change to ISO 639-3 Language Code" (PDF). SIL International. 4 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-013 for Change to ISO 639-3 Language Code" (PDF). SIL International. 5 January 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-019 for Change to ISO 639-3 Language Code" (PDF). SIL International. 4 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-020 for Change to ISO 639-3 Language Code" (PDF). SIL International. 5 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-028 for Change to ISO 639-3 Language Code" (PDF). SIL International. 14 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-029 for Change to ISO 639-3 Language Code" (PDF). SIL International. 18 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-015 for Change to ISO 639-3 Language Code" (PDF). SIL International. 16 February 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-026 for Change to ISO 639-3 Language Code" (PDF). SIL International. 12 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-025 for Change to ISO 639-3 Language Code" (PDF). SIL International. 7 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-034 for Change to ISO 639-3 Language Code" (PDF). SIL International. 13 March 2019. Retrieved 5 February 2020.
- ^ "Request Number 2019-032 for Change to ISO 639-3 Language Code" (PDF). SIL International. 13 March 2019. Retrieved 5 February 2020.
- ^ "2020-026 | Iso 639-3".
- ^ "Request Number 2021-015 for Change to ISO 639-3 Language Code" (PDF). SIL International. 25 February 2021. Retrieved 4 February 2022.
- ^ "Request Number 2021-021 for Change to ISO 639-3 Language Code" (PDF). SIL International. 26 April 2021. Retrieved 4 February 2022.
- ^ Turki, Houcemeddine (21 April 2021). "Request Number 2021-020 for Change to ISO 639-3 Language Code" (PDF). SIL International. Retrieved 12 July 2023.
- ^ "Request Number 2022-012 for Change to ISO 639-3 Language Code" (PDF). SIL International. 30 June 2022. Retrieved 8 February 2023.
- ^ "Request Number 2022-012 for Change to ISO 639-3 Language Code" (PDF). SIL International. 30 June 2022. Retrieved 8 February 2023.
- ^ "Request Number 2022-011 for Change to ISO 639-3 Language Code" (PDF). SIL International. 30 June 2022. Retrieved 8 February 2023.
- ^ "Request Number 2022-015 for Change to ISO 639-3 Language Code" (PDF). SIL International. 30 June 2022. Retrieved 8 February 2023.
- ^ "Request Number 2022-009 for Change to ISO 639-3 Language Code" (PDF). SIL International. 24 June 2022. Retrieved 8 February 2023.
Spurious languages
Overview
Definition and Scope
Spurious languages refer to linguistic entities that have been documented and reported as existing in reputable sources, such as comprehensive catalogs like Ethnologue, but subsequent research has demonstrated that they do not exist as distinct languages, are fabrications, or are duplicates of other known languages due to insufficient or contradictory evidence. These cases typically arise from errors in data compilation, including the uncritical merging of unverified lists from disparate surveys or the inclusion of "thin" reports about speech communities that cannot be substantiated through fieldwork or reliable documentation.
The scope of spurious languages encompasses global linguistic documentation efforts, particularly those conducted in the 20th and 21st centuries, where rapid cataloging of diverse speech varieties worldwide has led to occasional inaccuracies. This phenomenon excludes extinct languages that possess historical attestation through texts or records, focusing instead on modern reports lacking verifiable speakers or structural data. In major catalogs, hundreds of such entries have been identified across editions: for instance, 191 in the 16th edition of Ethnologue (2009), decreasing to 141 by the 18th edition (2015) as refinements were made. These figures highlight ongoing efforts to refine linguistic inventories.
Key to understanding spurious languages is distinguishing them from related categories in linguistic classification. Unlike "unclassified" languages, which feature some attestation but defy affiliation to known families due to limited data, spurious ones are outright disproven as separate entities upon closer scrutiny. Similarly, pidgins or creoles may undergo reclassification if initially misidentified, but they are not deemed spurious if evidence confirms their existence as functional varieties.
Initial reports of spurious languages often stem from brief mentions in missionary accounts, colonial administrative surveys, or early ethnographic works, where hearsay or incomplete field notes were entered into catalogs without cross-verification. This underscores the challenges in building exhaustive language inventories amid incomplete global coverage.
Historical Context
The emergence of spurious languages as a recognized issue in linguistic documentation began in the early 20th century, amid colonial-era surveys and missionary activities in Africa and the Americas. Missionaries and colonial administrators, often with limited linguistic training, compiled reports on indigenous speech varieties based on hearsay, brief encounters, or misinterpretations of dialects as distinct languages, leading to unverified entries that persisted in later catalogs.[3][4] For instance, in southern Africa during the 1920s to 1950s, missionary linguists created or attributed names to speech forms like Tsonga and Ronga without sufficient evidence of their independence as separate languages, reflecting the era's emphasis on rapid Christianization over rigorous analysis.[4] Similar patterns occurred in the Americas, where colonial documentation instrumentalized indigenous languages for evangelization, resulting in textual records that conflated or invented linguistic distinctions.[3]
The development of systematic language catalogs in the mid-20th century amplified these issues by incorporating early unverified reports into broader inventories. Ethnologue, founded in 1951 by Richard S. Pittman under the Summer Institute of Linguistics (SIL) to track Bible translation needs, began as a modest 10-page mimeographed document covering 46 languages but expanded significantly after 1971 to encompass all known world languages, codifying a legacy of potentially spurious entries from prior sources.[5] By the 14th edition in 2000, it listed over 6,800 languages, many drawn from colonial-era missionary accounts without initial verification.[6] The establishment of ISO 639-3 in 2007, developed by SIL and published by the International Organization for Standardization, aimed to standardize three-letter codes for comprehensive language coverage, including mechanisms for change requests to address inaccuracies inherited from earlier compilations like Ethnologue's pre-2005 editions.[7][8]
A pivotal shift toward evidence-based verification occurred in the 1990s, as linguistic communities increasingly prioritized bibliographic and empirical standards amid growing awareness of documentation gaps. This era saw preparations for ISO 639 standards evolve, with SIL aligning codes to international norms and emphasizing documented evidence over anecdotal reports, setting the stage for retirements of dubious entries.[9] Glottolog, founded around 2011 by Harald Hammarström and collaborators at the Max Planck Institute for Evolutionary Anthropology, further advanced this by compiling exhaustive bibliographies to assess language existence, classifying entries as spurious when lacking verifiable references.[10]
The advent of digital databases from the 2000s onward transformed verification practices, enabling cross-referencing of global sources and systematic retirements. Platforms like ISO 639-3's change request system and Glottolog's linked bibliographic data allowed linguists to evaluate entries against primary sources, identifying "ghost languages" from early 20th-century reports as non-existent or duplicates, thus refining catalogs through collaborative, technology-driven scrutiny.[11][10] This digital infrastructure, including initiatives like Cross-Linguistic Data Formats, facilitated broader access to evidence, reducing the persistence of unverified languages in standardized references.[12] As of the 28th edition in 2025, Ethnologue lists 7,159 living languages, reflecting continued refinements including retirements of spurious entries.[13]
Types of Spurious Languages
Fabrications and Hoaxes
Fabrications and hoaxes represent a subset of spurious languages deliberately created and promoted as authentic, often for amusement, satire, fraud, or to expose flaws in linguistic documentation processes. These inventions typically involve fabricated grammars, vocabularies, or speaker communities that mimic natural languages but lack verifiable evidence of natural development or use. Unlike constructed languages such as Esperanto, which are openly artificial auxiliaries, fabrications are presented deceptively to infiltrate scholarly catalogs or narratives.
A prominent historical example is the Taensa language, purportedly spoken by a Native American group in 18th-century Louisiana. In 1880, a French seminary student named Jean Parisot published a grammar and vocabulary claiming it derived from Taensa informants, but investigations revealed it as a hoax blending French and Latin elements with no basis in indigenous speech. The fabrication was exposed by anthropologist Daniel G. Brinton in 1885, who demonstrated inconsistencies in the morphology and lack of corroborating evidence from colonial records. Taensa is now recognized as non-existent.
Another case is the Kukurá language, allegedly an isolate from Mato Grosso, Brazil, reported in early 20th-century expeditions. It was fabricated by an interpreter accompanying explorer Alberto Vojtěch Frič, who invented words and structures during interactions with Bororo speakers to mislead researchers. Linguistic analysis later identified it as a patchwork of Portuguese and local terms without independent attestation, leading to its classification as a phantom language. The entry persists in some older references but has been debunked in comprehensive surveys of South American indigenous tongues.
In more recent times, Europanto exemplifies a satirical fabrication entering formal catalogs. Created by journalist Diego Marani in 1996 as a mock "pan-European" pidgin blending English, French, German, and other languages to critique EU multilingualism policies, it was mistakenly coded as a natural language (eur) in ISO 639-3 drafts. Upon review, the ISO 639-3 Registration Authority retired the code in 2009, citing its non-existent status as a naturally occurring tongue and confirming it as an intentional jest with no native speakers.[14]
Motivations for such hoaxes vary: Parisot's Taensa may have stemmed from academic ambition or prankish intent to test scholarly credulity in an era of rapid Native American language documentation, while Frič's interpreter likely sought personal gain or amusement amid colonial exploration pressures. Satirical cases like Europanto aim to highlight bureaucratic absurdities in language standardization. In colonial contexts, fabrications sometimes served to embellish ethnographic reports, inflating the perceived diversity of subjugated regions.[14]
Detection typically occurs through rigorous verification: absence of native speakers, inconsistent linguistic features (e.g., unnatural syntax or borrowed elements), and failure to appear in independent fieldwork or archival sources. For instance, Europanto's retirement followed a 2008 change request documenting its constructed nature via Marani's publications, while Taensa's debunking relied on comparative analysis against known Muskogean languages. Modern catalogs like Glottolog and Ethnologue now employ stricter criteria, including community consultations, to prevent such entries.[15]
Misidentifications and Duplicates
Misidentifications and duplicates represent a significant category of spurious languages, stemming primarily from errors in distinguishing dialects from independent languages or from redundant entries arising from inconsistent transliterations and incomplete historical records. A common cause is the conflation of dialect clusters with distinct languages, as seen in the case of Land Dayak, where varieties within the Bidayuhic subgroup of Austronesian languages were initially cataloged as a single entity rather than a group of closely related dialects.[1] Duplicates often occur when the same linguistic variety receives multiple codes due to varying orthographic representations in early surveys or cross-border reporting discrepancies.[1]
Notable examples illustrate these issues. The ISO 639-3 code "dek" for Dek, reported in Cameroon, was retired in 2024 upon recognition as a duplicate of Suma (code "sqm"), a Gbaya language of the Central African Republic and Cameroon, based on overlapping lexical and ethnographic data.[16] Similarly, Bahau River Kenyah (code "bwv") was retired effective January 14, 2008, after linguistic analysis determined it was not a separate variety but likely encompassed within Mainstream Kenyah (xkl) or Uma' Lung (ulu), with no evidence of distinct usage.[17]
These spurious entries typically entered catalogs through initial inclusion reliant on preliminary field reports or unverified secondary sources from the mid-20th century, when documentation was sparse. Subsequent retirements occur via formal ISO 639-3 change requests, informed by comparative lexical studies, dialectometry, or genetic classifications that demonstrate high mutual intelligibility or identity, prompting mergers into established codes.[1]
The prevalence of such misidentifications before the 2000s contributed to inflated estimates of global linguistic diversity, with Ethnologue editions from that era listing hundreds of redundant or erroneous entries that overstated the number of mutually unintelligible languages by up to 10-15% in certain regions like Borneo and Central Africa.[1] Rigorous verification protocols introduced in later decades have mitigated these issues, enhancing the accuracy of language inventories.
Unattested or Insufficiently Documented
Unattested or insufficiently documented spurious languages are those cataloged in linguistic databases based on initial reports or mentions that lack subsequent verification, often originating from isolated traveler accounts, early ethnographic notes, or outdated surveys without supporting linguistic data or speaker confirmation. These entries highlight the challenges of early language documentation, where hearsay or misreported ethnonyms were sometimes interpreted as distinct languages, only to be retired upon closer scrutiny revealing no evidence of their existence as separate linguistic entities. The retirement process in standards like ISO 639-3 typically occurs when change requests demonstrate the absence of speakers, lexical material, or fieldwork validation, emphasizing the importance of rigorous evidence in language identification. Recent ISO 639-3 updates as of 2025 continue to retire unattested entries through annual change requests.[18]
A key characteristic of these languages is their reliance on unconfirmed sources, such as brief mentions in historical records or traveler narratives that fail to provide grammatical, lexical, or sociolinguistic details for corroboration. For example, Dzorgai, listed as a potential Qiangic variety but based on outdated 19th-century surveys, was retired around 2000 due to insufficient documentation and no identifiable speakers or materials. Similarly, Wutana, reported in early Nigerian ethnographies, was removed from Ethnologue in 2000 after surveys found no speakers or evidence, attributing the name to an ethnic group rather than a distinct language. These cases illustrate how initial inclusions in catalogs like Ethnologue propagated unattested entries until systematic reviews exposed their lack of foundation.
Verification of such languages poses significant challenges, particularly in remote or historically inaccessible locations where fieldwork is logistically difficult, or among groups that may have become extinct before modern documentation efforts. In the Americas, for instance, Chipiajes was retired from ISO 639-3 in 2016 after investigation revealed it to be a surname among Sáliba and Guahibo speakers rather than a separate language, complicated by the region's vast, under-explored indigenous territories and historical disruptions from colonization. Likewise, Mosiro, initially reported among Kenyan pastoralist communities, was retired in 2018 due to non-existence and insufficient data, with the name traced to a clan rather than a linguistic variety; efforts to locate speakers in arid, mobile populations proved fruitless amid limited archival records. These examples underscore how geographic isolation and cultural shifts exacerbate the difficulty of confirming or refuting early reports.[19]
Trends in unattested spurious languages show a higher incidence in understudied regions prior to the 1990s, when systematic linguistic surveys were scarce, such as in Papua New Guinea's diverse highlands or the Amazon basin's expansive riverine systems. Pre-1990s documentation often depended on sporadic expeditions, leading to entries like those in early Ethnologue editions that were later pruned through ISO 639-3's annual reviews. This pattern reflects the evolution of cataloging practices toward greater evidentiary standards, reducing new inclusions of insufficiently documented cases in recent decades.
Retirements in Ethnologue and ISO 639-3
Retirement Process and Criteria
The retirement process for language codes in ISO 639-3 and Ethnologue is managed by SIL International, which serves as the registration authority for ISO 639-3 and the publisher of Ethnologue. Change requests, including those for retiring codes due to spurious or non-existent languages, are submitted using a standardized form that requires detailed justification and supporting evidence, such as historical records, linguistic analyses, or fieldwork data demonstrating the absence of distinct linguistic features or speakers.[20] Submissions for ISO 639-3 are accepted annually from September 1 to August 31, after which they are posted publicly for comment from September 15 to December 15, reviewed by a panel of linguists in mid-December, and finalized, with decisions announced, by January 31 of the following year. A code is retired if the review finds no verifiable evidence that the language exists as a distinct entity, for example when its lexical similarity to another variety falls in the range typical of dialects (roughly 80-90%) or when no sociolinguistic distinctiveness can be shown, in line with ISO 639-3's criteria for individual languages.[20] SIL does not reuse retired codes to maintain stability in global language identification systems.[20]
Ethnologue integrates these ISO 639-3 changes into its annual editions, starting from the 15th edition in 2005, with updates based on similar evidence requirements including speaker population data, comparative wordlists, or surveys confirming duplication or fabrication. Entries are retired if they fail to meet these thresholds, often through collaboration with field linguists for validation via targeted research or archival review.[21][5]
A notable surge in retirements occurred between 2007 and 2010, following the initial publication of ISO 639-3 in 2007, as standardized reviews addressed legacy entries from earlier Ethnologue versions; over 900 change requests were processed from 2006 to 2012, resulting in numerous retirements for unattested or misidentified languages.[9] Retired codes are marked with specific reasons, such as "non-existence" or "duplicate," and archived to preserve historical context, contributing to a refined global count of approximately 7,100 living languages as of recent updates.[22][23]
Pre-2000 Retirements
Early editions of Ethnologue involved initial cleanups of entries based on insufficient documentation or misidentification, leading to the retirement of several spurious languages in the 1990s and 2000.
- 1992: Itaem (ISO 639-3: itm, Papua New Guinea) – retired as a fabricated entry with no verifiable speakers or linguistic data. [](https://www.ethnologue.com/)
- 1992: Marajona (ISO 639-3: mpq, Brazil) – retired as a hoax language lacking any attested materials or community. [](https://www.ethnologue.com/)
- 1996: Bibasa (ISO 639-3: bhe, Papua New Guinea) – retired after survey revealed it as an unconfirmed isolate with no evidence of existence. [](https://www.ethnologue.com/)
- 2000: Alak 2 (ISO 639-3: alq, Laos) – retired as a duplicate or mislabeled fragment of the Alak language. [](https://www.ethnologue.com/)
- 2000: Dzorgai (ISO 639-3: dzg, China) – retired as an unattested name for a Tibetan dialect, not a distinct language. [](https://www.ethnologue.com/)
- 2000: Other entries like Hsifan (ISO 639-3: hsi, China) – retired as an ethnic group name rather than a language. [](https://www.ethnologue.com/)
2000s Retirements
The 2000s saw retirements through the establishment of ISO 639-3 in 2007, focusing on hoax and constructed languages.
- 2005: Jiji (ISO 639-3: jij, Cameroon) – retired as a non-existent language confused with a place name.[24]
- 2005: Kalanke (ISO 639-3: ckn, India) – retired as a misidentification of a dialect, with no independent attestation.[25]
- 2007: Miarrã (ISO 639-3: mvr, Brazil) – retired as a fabricated entry from early missionary reports without linguistic evidence. [](https://www.ethnologue.com/)
- 2008: Amikoana (ISO 639-3: amk, Brazil) – retired as an unconfirmed name for an uncontacted group, not a distinct language. [](https://academic.oup.com/book/57386/chapter/464721551)
- 2008: Land Dayak (ISO 639-3: lnd, Indonesia) – retired as a cover term for multiple Dayak dialects, not a single language. [](https://www.ethnologue.com/)
- 2009: Aariya (ISO 639-3: aay, India) – retired as a duplicate of Aari, with no separate data. [](https://www.ethnologue.com/)
- 2009: Europanto (ISO 639-3: eur, Europe) – retired as a constructed auxiliary language or hoax, not a natural language. [](https://glottolog.org/resource/languoid/id/euro1250)
- 2010: Chimakum (ISO 639-3: cmk, United States) – retired as a duplicate of Chemakum [xch]. [](https://www.iana.org/assignments/lang-subtags-templates/cmk.txt)
2010s Retirements
Retirements in the 2010s increased with improved verification processes, targeting misidentifications and insufficiently documented cases.
- 2011: Ayi (ISO 639-3: ayi, China) – retired as a name for a Yi dialect, not a separate language. [](https://www.ethnologue.com/)
- 2014: Gugu Mini (ISO 639-3: gug, Australia) – retired as a historical name without modern attestation or speakers. [](https://www.ethnologue.com/)
- 2016: Bhatola (ISO 639-3: bho, India) – retired as a misreported dialect of Bhojpuri. [](https://www.ethnologue.com/)
- 2016: Cagua (ISO 639-3: cag, Papua New Guinea) – retired as an unverified entry from early surveys. [](https://www.ethnologue.com/)
- 2016: Other cases like Papavô (ISO 639-3: ppv, Brazil) – retired as a name for uncontacted groups, not a language. [](https://academic.oup.com/book/57386/chapter/464721551)
- 2018: Lyons Sign Language (ISO 639-3: lsg, France) – retired as a non-standard sign system, not a full language. [](https://www.ethnologue.com/)
- 2019: Lui (ISO 639-3: lui, Papua New Guinea) – retired as a duplicate of a local dialect. [](https://www.ethnologue.com/)
- 2019: Khlor (ISO 639-3: kht, Vietnam) – retired as an unattested Katuic variety. [](https://www.ethnologue.com/)
2020s Retirements
Recent retirements reflect ongoing updates to the latest Ethnologue editions, with a focus on duplicates and hoaxes.
- 2021: Bikaru (ISO 639-3: bku, Papua New Guinea) – retired as an insufficiently documented entry with no speakers identified. [](https://www.ethnologue.com/)
- 2024: Dek (ISO 639-3: dek, Cameroon) – retired as a duplicate of Suma [sqm], based on reanalysis of data. [](https://www.ethnologue.com/)
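As a rough illustration, the retirement lists above can be held as simple records and tallied by decade or by stated reason. The sketch below is in Python; the `Retirement` class, its field names, and the short reason labels are hypothetical simplifications made for this example, not the schema or reason categories of the ISO 639-3 Registration Authority. Only a handful of entries from the lists are copied in.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Retirement:
    """One retired entry; all field names are illustrative assumptions."""
    year: int     # year given in the lists above
    name: str     # language name as listed
    code: str     # three-letter code as listed
    reason: str   # hypothetical short label for the stated reason


# A few entries copied from the lists above. The reason labels paraphrase
# the list text and are not official categories.
RETIREMENTS = [
    Retirement(2000, "Alak 2", "alq", "duplicate"),
    Retirement(2005, "Jiji", "jij", "non-existent"),
    Retirement(2008, "Land Dayak", "lnd", "cover term"),
    Retirement(2009, "Europanto", "eur", "constructed"),
    Retirement(2010, "Chimakum", "cmk", "duplicate"),
    Retirement(2024, "Dek", "dek", "duplicate"),
]


def decade(year: int) -> str:
    """Return a decade label such as '2010s' for a calendar year."""
    return f"{year // 10 * 10}s"


def count_by(records, key):
    """Count records by an arbitrary key function."""
    return Counter(key(r) for r in records)


by_reason = count_by(RETIREMENTS, lambda r: r.reason)
by_decade = count_by(RETIREMENTS, lambda r: decade(r.year))

print(by_reason.most_common())
print(sorted(by_decade.items()))
```

Grouping by calendar decade in this way places the entries dated 2000 in the 2000s and Chimakum (2010) in the 2010s, which differs slightly from the section headings above.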
Spurious Languages in Glottolog
Classification Approach
Glottolog is an open-access, comprehensive database that catalogs the world's languages, dialects, and families, assigning unique glottocodes to each languoid for persistent identification and linking to bibliographic references.[10] In its latest edition, Glottolog 5.2.1 (released in 2025), it organizes over 8,000 languages into genealogical classifications based on historical-comparative linguistic research, with a strong emphasis on lesser-known and low-documentation languages.[27] The database classifies a languoid as spurious if it is mentioned in the literature but its existence as a distinct language cannot be verified beyond doubt, such as when it represents a place name, ethnic group, or unproven proposal rather than a genuine linguistic entity.[28]
The classification approach relies on a rigorous, evidence-based methodology grounded in the master bibliography, which aggregates thousands of sources including grey literature and requires multiple independent attestations for validation.[29] For inclusion, a proposed language must demonstrate distinctness through non-mutual intelligibility with other varieties, supported by form-meaning pairs (e.g., lexical or grammatical data) from at least 50 basic vocabulary items, and evidence of its use as a primary communication medium in a speech community.[29] Entries lacking such evidence, such as those based on a single, uncorroborated mention or contradicted by subsequent research, are marked spurious and placed in the "Bookkeeping" pseudo-family to maintain bibliographic completeness without implying validity. For instance, Welaung (glottocode: wela1234) was identified as a spurious entry because it is actually a place name associated with the Nga La language, not a separate entity.[30]
Unlike Ethnologue, which prioritizes speaker population data and retires ISO 639-3 codes for unverified languages, Glottolog focuses on bibliographic attestation over demographic metrics, explicitly including dialects and families in its inventory while marking unattested or spurious languoids as such without code retirement.[10] This reference-driven framework ensures transparency, as all classifications link directly to sources, allowing users to evaluate evidence independently.[31]
Updates to Glottolog occur periodically through collaborative curation on a public GitHub repository, incorporating new bibliographies and expert revisions, with no major methodological changes reported for 2025 beyond minor data refinements in version 5.2.1.[27]
Catalog of Spurious Entries
Glottolog maintains a catalog of spurious languoids, which are entries derived from the linguistic literature but deemed non-existent, misidentified, or insufficiently distinct as separate languages upon further scrutiny. These are classified under the "Bookkeeping" category and include retired ISO 639-3 codes where applicable. The following list gives selected spurious entries grouped by macro-area, with glottocodes, associated ISO codes (if retired), and brief evidentiary notes based on Glottolog assessments.[28]
Africa
- !Khuai (khua1244): Retired entry based on a misunderstanding of historical records; no evidence of a distinct language, likely conflated with /Xam or other Khoisan varieties.[32]
- Baga Kaloum (baga1271, ISO bqf): Attested in 19th-century colonial reports but identical to Baga Koga; considered extinct as a separate lect due to lack of differentiation.[33]
- Baga Sobané (baga1274, ISO bsv): Retired as a distinct Baga dialect; shares vocabulary and structure with Baga Sitemu, stemming from outdated ethnolinguistic classifications.[34]
- Ngombe (ngom1265, ISO nmj): Spurious as a separate language; refers to a Pygmy clan name rather than a linguistic entity, with no independent attestation.[35]
- Oropom (orop1234): Unattested and likely fabricated; based on unverified 20th-century reports of an extinct Ugandan language with no surviving data or descendants.[36]
- Gengle (geng1243, ISO geg): Bookkeeping entry for a purported Adamawa language; unverified and likely a mishearing or variant name for nearby lects like Kugama.[37]
Asia-Pacific
- Agariya (agar1251, ISO agi): Spurious Munda language entry; conflates caste names, place names, and Hindi varieties from colonial sources, with no unique linguistic features.[38]
- Ahirani (ahir1243, ISO ahr): Retired as a separate Indo-Aryan language; actually a dialect of Khandeshi, misidentified in early 20th-century surveys due to regional naming variations.[39]
- Welaung (wela1234): Place name mistaken for a Chin language; actually refers to a location within the Matu Chin area, with no independent lexical or grammatical evidence.[30]
- Arakwal (arak1254, ISO rkw): Not a distinct Pama-Nyungan language; one of multiple names for Bandjalang varieties in southeastern Australia, based on ethnic rather than linguistic separation.[40]
Americas
- Chetco (chet1237): Spurious Oregon Athabaskan entry; merged into Tolowa-Chetco as linguistically indistinguishable from Tolowa, as confirmed by modern revitalization efforts and comparative analysis.[41]
- Pisabo (pisa1244, ISO pig): Unattested Pano-Tacanan language; mentioned in early 20th-century sources but lacks any documentation, likely an unverified indigenous report or error.[42]
Other Regions
- Judeo-Berber (jude1262, ISO jbe): Bookkeeping entry for a purported Berber-Jewish lect; no distinct variety exists, as Berber-speaking Jews used regional dialects without unique innovations.[43]
- Old Turkish (oldt1247): Retired historical Turkic entry; redundant with established Old Turkic classifications, stemming from inconsistent terminological use in early comparative studies.[44]
- Tawang Monpa (tawa1289, ISO twm): Retired as a separate Sino-Tibetan language; now reclassified under Dakpa (Takpa), based on shared vocabulary and phonology from Arunachal Pradesh field data.[45]
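The catalog entries above share a regular layout: a name, a glottocode, an optional retired ISO 639-3 code, and a note. A minimal Python sketch of how such lines could be parsed into structured records follows. The `CatalogEntry` type and `parse_entry` function are hypothetical names introduced for this example, and the pattern assumes this article's own line layout rather than any Glottolog export format. It relies only on the code shapes themselves: a glottocode is four lowercase alphanumeric characters followed by four digits, and an ISO 639-3 code is three lowercase letters.

```python
import re
from typing import NamedTuple, Optional


class CatalogEntry(NamedTuple):
    """Parsed form of one catalog bullet (hypothetical structure)."""
    name: str
    glottocode: str
    iso: Optional[str]   # None when no retired ISO code is listed
    note: str


# Assumed layout: "- Name (glottocode[, ISO xxx]): note"
ENTRY_RE = re.compile(
    r"^-\s*(?P<name>.+?)\s*"
    r"\((?P<glottocode>[a-z0-9]{4}\d{4})"
    r"(?:,\s*ISO\s+(?P<iso>[a-z]{3}))?\)\s*:\s*"
    r"(?P<note>.+)$"
)


def parse_entry(line: str) -> Optional[CatalogEntry]:
    """Return a CatalogEntry, or None if the line is not a catalog bullet."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return CatalogEntry(m["name"], m["glottocode"], m["iso"], m["note"])


sample = [
    "Africa",
    "- Oropom (orop1234): Unattested and likely fabricated.",
    "- Gengle (geng1243, ISO geg): Bookkeeping entry for a purported Adamawa language.",
]
# Headings such as "Africa" do not match and are skipped.
entries = [e for e in map(parse_entry, sample) if e is not None]
for e in entries:
    print(e.name, e.glottocode, e.iso)
```

Lines that do not match, such as the macro-area headings, return `None` instead of raising an error, so a whole section can be passed through the parser line by line.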
