Spurious languages
from Wikipedia

Spurious languages are languages that have been reported as existing in reputable works, while other research has reported that the language in question did not exist. Some spurious languages have been proven to not exist. Others have very little evidence supporting their existence, and have been dismissed in later scholarship. Others still are of uncertain existence due to limited research.

Below is a sampling of languages that have been claimed to exist in reputable sources but have subsequently been disproved or challenged. In some cases a purported language is tracked down and turns out to be another, known language. This is common when language varieties are named after places or ethnicities.

Some alleged languages turn out to be hoaxes, such as the Kukurá language of Brazil or the Taensa language of Louisiana. Others are honest errors that persist in the literature despite being corrected by the original authors; an example of this is Hongote, the name given in 1892 to two Colonial word lists, one of Tlingit and one of a Salishan language, that were mistakenly listed as Patagonian. The error was corrected three times that year, but nonetheless "Hongote" was still listed as a Patagonian language a century later in Greenberg (1987).[1]: 133 

In the case of New Guinea, one of the most linguistically diverse areas on Earth, some spurious languages are simply the names of language surveys under which the data were published. Examples are Mapi, Kia, Upper Digul, and Upper Kaeme, listed as Indo-Pacific languages in Ruhlen (1987); these are actually rivers that gave their names to language surveys of the Greater Awyu and Ok languages of New Guinea.[2]

Dubious languages

Dubious languages are those whose existence is uncertain. They include:

Spurious according to Ethnologue and ISO 639-3

Following is a list of ISO 639-3 language codes which have been retired since the standard was established in 2006, arranged by the year in which the actual retirement took effect; in most cases the change request for retirement was submitted in the preceding year. Also included is a partial list of languages (with their SIL codes) that appeared at one time in Ethnologue but were removed prior to 2006, arranged by the first edition in which they did not appear.

The list includes codes that have been retired from ISO 639-3 or languages removed from Ethnologue because the language apparently does not exist and cannot be identified with an existing language. The list does not include instances where the "language" turns out to be a spelling variant of another language or the name of a village where an already known language is spoken; these are cases of duplicates, which are resolved in ISO 639-3 by a code merger. It does include "languages" for which there is no evidence or which cannot be found. (In some cases, however, the evidence for nonexistence is a survey among the current population of the area, which would not identify extinct languages such as Ware below.)

SIL codes are upper case; ISO codes are lower case. Once retired, ISO 639-3 codes are not reused.[6] SIL codes that were retired prior to 2006 may have been re-used or may have reappeared as ISO codes for other languages.

Removed from Ethnologue, 12th ed., 1992

  • Itaem (PNG) [ITM]
  • Marajona (Brazil) [MPQ]
  • Nemeyam (PNG) [NMY]
  • Nereyama, Nereyó (Brazil) [NRY]
  • Numbiaí (Orelha de Pau) [NUH]
  • Oganibi (PNG) [OGA]
  • Tijuana Sign Language (Mexico) [TJS] – added to Ethnologue in 1988 by mistake due to a misunderstanding and removed in 1992. No evidence that it ever existed.
  • Tyeliri Senoufo [TYE] – the Tyeliri are a caste of leather workers, and do not have their own language
  • Wagumi [WGM]
  • Zanofil [ZNF] – name of an ethnic group that speaks Yongkom [yon]

Removed from Ethnologue, 13th ed., 1996

  • Bibasa (PNG) [BHE] – described as "isolate in need of survey" in the 12th ed.

Removed from Ethnologue, 14th ed., 2000

  • Alak 2 [ALQ] – a mislabeled fragment of a word list[7]
  • Dzorgai [DZI], Kortse [KBG], Pingfang [PFG], Thochu [TCJ], Lofuchai (Lophuchai) [LFU], Wagsod [WGS] – old names for Qiangic languages, some of uncertain correspondence to currently recognized names
  • Hsifan [HSI] – an ethnic name for people speaking a variety of Qiangic or Jiarongic languages
  • Scandinavian Pidgin Sign Language [SPF] – normal inter-language contact, not an established pidgin
  • Wutana (Nigeria) [WUW] – an ethnic name

Removed from Ethnologue, 15th ed., 2005

  • Jiji [JIJ][8]
  • Kalanke [CKN][9]
  • Lewada-Dewara [LWD], incl. Balamula/Mataru[10]
  • Lowland Semang [ORB][11] (though other languages without ISO codes, such as Wila', are also called Lowland Semang)
  • Mutús [MUF][12] – suspected to exist, e.g. by Adelaar 2005
  • Nchinchege [NCQ][13]
  • Nkwak [NKQ][14] – same as Tanjijili? Also a possible synonym for Kwak (retired in 2015)
  • Oso (Southern Fungom) [OSO] – no evidence it is distinct from Fungom and Bum[15]
  • Rungi [RUR][16]
  • Wamsak [WBD][17]

Retired 2007

  • Miarrã [xmi] – unattested[18][19]
  • Atuence [atf] – an old town name,[20] likely referring to Dêqên
  • Amapá Creole [amd][21]

Retired 2008

  • Amikoana (Amikuân) [akn][22]
  • Land Dayak [dyk] – language family name, not individual language[23]
  • Ware [wre][24] – Ware is listed as extinct in Maho (2009). When an SIL team in Tanzania was unable to find any evidence of it being spoken, the code was retired.
  • Bahau River Kenyah [bwv], Kayan River Kenyah [knh], Mahakam Kenyah [xkm], Upper Baram Kenyah [ubm] – Any current use is likely either Mainstream Kenyah [xkl] or Uma' Lung [ulu]
  • Amerax [aex] – prison jargon
  • Garreh-Ajuran [ggh] (Borana & Somali)
  • Sufrai [suf] – two languages, Tarpia and Kaptiau, which are not close[25]

Retired 2009

  • Aariya [aay]
  • Papavô [ppv] – name given to several uncontacted groups
  • Europanto [eur] – a jest

Retired 2010

  • Chimakum [cmk] – duplicate of Chemakum [xch]
  • Beti (Cameroon) [btb] – a group name

Retired 2011

  • Ayi (China) [ayx]
  • Dhanwar (India) [dha]
  • Mahei [mja]

Retired 2012

Retired 2013

Retired 2014

  • Gugu Mini [ggm] – a generic name
  • Maskoy Pidgin [mhh] – never existed
  • Emok [emo] – never existed
  • Yugh [yuu] – duplicate of Yug [yug]
  • Lamam [lmm] – duplicate of Romam [rmx]

Retired 2015

Retired 2016

  • Bhatola [btl]
  • Cagua [cbh]
  • Chipiajes [cbe] – a Saliba and Guahibo surname
  • Coxima [kox]
  • Iapama [iap] – uncontacted, and likely one of the neighboring languages
  • Kabixí [xbx] – generic name for Parecis, Nambiquaras, or any hostile group (see Cabixi language for one specific use)
  • Runa [rna]
  • Savara (Dravidian) [svr]
  • Xipináwa [xip] – unattested and may not be distinct[32]
  • Yarí [yri] – dialect of Carijona[33]

And several supposed extinct Arawakan languages of Venezuela and Colombia:

  • Cumeral [cum]
  • Omejes [ome]
  • Ponares [pod] – a Sáliba surname, perhaps just Piapoco or Achagua[34]
  • Tomedes a.k.a. Tamudes [toe]

Additional languages and codes were retired in 2016, due to a lack of evidence that they existed, but were not necessarily spurious as languages.

Retired 2017

Retired 2018

  • Lyons Sign Language [lsg][39] – no substantive evidence that the language ever existed.
  • Mediak [mwx][40]
  • Mosiro [mwy] – a clan name[41]

Retired 2019

  • Lui [lba][42]
  • Khlor [llo] – duplicate of Kriang [ngt][43]
  • Mina (India) [myi] – Meena, a tribe and caste name in India[44]

Retired 2020

Retired 2021

  • Bikaru [bic] – posited based on a poor elicitation of ordinary Bisorio[57]

Retired 2022

Retired 2023

Retired 2024

  • Dek (Cameroon) [dek] – duplicate of Suma [sqm]

Spurious according to Glottolog

Glottolog, maintained at the Max Planck Institute for Evolutionary Anthropology in Leipzig, classifies several languages, some with ISO 639 codes, as spurious/unattested in addition to those retired by the ISO. These include:

  • !Khuai – duplicate of ǀXam
  • Adabe [adb] – dialect of Wetarese, taken for a Papuan language
  • Adu [adu] – duplicate of Okpamheri
  • Agaria [agi] – all likely candidates in the area already have ISO codes
  • Ahirani [ahr] – Khandeshi dialect
  • Anasi [bpo] – misidentification of Nisa
  • Arakwal [rkw] – an ethnic group, not a language
  • Baga Kaloum [bqf] – should be subsumed into the Koga variant
  • Baga Sobané [bsv] – should be subsumed into the Sitemu variant
  • Bainouk-Samik [bcb] – split from Bainouk-Gunyuño solely because of a national border
  • Bhalay [bhx] – a caste rather than a language
  • Bubia [bbx]
  • Buso [bso] – duplicate of Kwang
  • Chetco [ctc] – indistinguishable from Tolowa
  • Chuanqiandian Cluster Miao [cqd]
  • Con [cno]
  • Gengle [geg] – mutually intelligible with Kugama
  • Gowlan [goj] – a caste rather than a language
  • Gowli [gok] – a caste, not a language
  • Guajajara [gub] – mutually intelligible with Tenetehara
  • Ihievbe [ihi] – Ibviosakan dialect
  • Inku [jat] – SIL named the jat entry Jakati, and Ethnologue editions 16 through 28 report it as spoken by 29,300 people in Ukraine, but the Ukrainian linguist Aleksej Barannikov contested this, suggesting it may be covered by Vlax Romani. The alternative name "Jat" may refer to at least two dialects spoken in villages in Afghanistan, as reported by Aparna Rao and Charles Kieffer. Glottolog currently follows Kieffer's investigation in naming the variety Inku and considers it related to Saraiki.
  • Ir [irr] – duplicate of Ong-Ir
  • Judeo-Berber [jbe] – according to Glottolog, Jewish Berbers speak no differently from Muslim Berbers; however, there are claims that this is not true.
  • Kang [kyp]
  • Kannada Kurumba [kfi]
  • Katukína [kav] – historical form of a modern-day language, not considered distinct
  • Kayort [kyv] – duplicate of Rajbanshi
  • Kisankasa [kqh]
  • Kofa [kso] – duplicate of Bata
  • Kpatili [kpm] – purportedly the original language of the Kpatili people, who now speak Gbayi, but any such language is unattested
  • Kuanhua [xnh] – insufficient attestation; possibly Khmu
  • Kuku-Mangk [xmq]
  • Lama (Myanmar) [lay] – duplicate of Nung
  • Lambichhong [lmh] – the Yakkha language; the separate name exists due to form errors
  • Lang'e [yne]
  • Laopang [lbg] – undocumented Loloish language
  • Loarki [lrk] – also covered under Gade Lohar [gda]
  • Lopi [lov] – undocumented Loloish language
  • Lumba-Yakkha [luu] – the Yakkha language; the separate name exists due to form errors
  • Mawa (Nigeria) [wma] – listed in Ethnologue, but SIL has no evidence it ever existed
  • Munda [unx] – duplicate of Mundari
  • Ndonde Hamba [njd] – dialect of Makonde
  • Norra [nrr] – duplicate of Nung
  • Northwestern Fars [faz] – all likely candidates in the area already have ISO codes
  • Odut [oda] – extinct and unattested Nigerian language
  • Old Turkish [otk]
  • Ontenu [ont] – a place rather than a language
  • Phangduwali [phw] – the Yakkha language; the separate name exists due to form errors
  • Pisabo [pig] – asserted by Glottolog to be both unattested and non-distinct
  • Pokangá [pok] – misidentification of Waimajã
  • Potiguára [pog] – unattested language; Glottolog argues it is likely Old Tupi
  • Puimei Naga [npu] – indistinct variety of one of the related languages
  • Putoh [put]
  • Quetzaltepec Mixe [pxm]
  • Rufiji [rui]
  • Skagit [ska] – duplicate of Lushootseed
  • Snohomish [sno] – duplicate of Lushootseed
  • Southern Lolopo [ysp] – confused entry duplicating either Lolopo or Miqie
  • Southwestern Nisu [nsv] – likely a confused additional Nisu entry (spoken in the same locations as Southern Nisu)
  • Syerna Senoufo [shz] – should be subsumed into Sìcìté Sénoufo
  • Tawang Monpa [twm] – Chinese and Indian name for Dakpakha
  • Tetete [teb] – unattested, but intelligible with Siona
  • Thu Lao [tyl] – duplicate of Dai Zhuang
  • Tingui-Boto [tgv] – ethnic group speaking Dzubukuá
  • Welaung [weu] – place name, not a language
  • Yarsun [yrs]
  • Yauma [yax]

References and notes

from Grokipedia
Spurious languages are entries in linguistic catalogs, such as Ethnologue, that purport to represent distinct languages but have been determined through further research to be duplicates of existing languages, misclassified dialects, or entirely unverified reports lacking sufficient evidence of existence. These non-existent or erroneous listings often arise from historical misinterpretations in ethnographic surveys, reliance on second-hand data, or confusion over ethnic names and border-crossing varieties, leading to inflated counts of global linguistic diversity. In the 16th edition of Ethnologue (2009), 191 such spurious languages were identified across macroareas, decreasing to 168 in the 17th edition (2013–2014) and 141 in the 18th edition (2015) as entries were reviewed, merged, or retired based on bibliographical and intelligibility studies. Notable examples include Ngombe [nmj], which duplicates Bangandu [bgf], and Beti [btb], an overbroad entry encompassing distinct languages like Eton [eto], Ewondo [ewo], Fang [fak], and Mengisa [mct]; Tapeba [tbb] in Brazil, an ethnic group name without a unique language; Desiya [dso] in India, a misreported variety; and, in the Pacific, Laura [lur], which lacks confirmation as separate from related languages. The identification and retirement of these languages highlight ongoing efforts in descriptive linguistics to refine inventories, with organizations like SIL International and the bodies maintaining the ISO 639 standards playing key roles in deprecating unverified codes through evidence-based updates. This process underscores the challenges of documenting endangered or poorly attested languages, particularly in regions where colonial-era records often introduced errors.

Overview

Definition and Scope

Spurious languages refer to linguistic entities that have been documented and reported as existing in reputable sources, such as comprehensive catalogs like Ethnologue, but that subsequent research has shown do not exist as distinct languages, are fabrications, or are duplicates of other known languages due to insufficient or contradictory evidence. These cases typically arise from errors in data compilation, including the uncritical merging of unverified lists from disparate surveys or the inclusion of "thin" reports about speech communities that cannot be substantiated through fieldwork or reliable documentation. The scope of spurious languages encompasses global linguistic documentation efforts, particularly those conducted in the 20th and 21st centuries, where rapid cataloging of diverse speech varieties worldwide has led to occasional inaccuracies. This phenomenon excludes extinct languages that possess historical attestation through texts or records, focusing instead on modern reports lacking verifiable speakers or structural data. In major catalogs, hundreds of such entries have been identified across editions; for instance, Ethnologue's 16th edition (2009) contained 191, decreasing to 141 by the 18th edition (2015) as entries were reviewed, which reflects ongoing efforts to refine linguistic inventories. Key to understanding spurious languages is distinguishing them from related categories in linguistic classification. Unlike "unclassified" languages, which feature some attestation but defy affiliation to known families due to limited data, spurious ones are outright disproven as separate entities upon closer scrutiny. Similarly, pidgins or creoles may undergo reclassification if initially misidentified, but they are not deemed spurious if evidence confirms their existence as functional varieties.
Initial reports of spurious languages often stem from brief mentions in travelers' accounts, colonial administrative surveys, or early ethnographic works, where second-hand reports or incomplete field notes were entered into catalogs without cross-verification. This underscores the challenges in building exhaustive language inventories amid incomplete global coverage.

Historical Context

The emergence of spurious languages as a recognized issue in linguistic documentation began in the early 20th century, amid colonial-era surveys and missionary activities in Africa and the Americas. Missionaries and colonial administrators, often with limited linguistic training, compiled reports on indigenous speech varieties based on hearsay, brief encounters, or misinterpretations of dialects as distinct languages, leading to unverified entries that persisted in later catalogs. For instance, in southern Africa during the 1920s to 1950s, missionary linguists created or attributed names to speech forms like Tsonga and Ronga without sufficient evidence of their independence as separate languages, reflecting the era's emphasis on rapid Christianization over rigorous analysis. Similar patterns occurred in the Americas, where colonial documentation instrumentalized indigenous languages for evangelization, resulting in textual records that conflated or invented linguistic distinctions. The development of systematic language catalogs in the mid-20th century amplified these issues by incorporating early unverified reports into broader inventories. Ethnologue, founded in 1951 by Richard S. Pittman under the Summer Institute of Linguistics (SIL) to track translation needs, began as a modest 10-page mimeographed document covering 46 languages but later expanded significantly to encompass all known world languages, codifying a legacy of potentially spurious entries from prior sources. By the 14th edition in 2000, it listed over 6,800 languages, many drawn from colonial-era missionary accounts without initial verification. The establishment of ISO 639-3 in 2007, developed by SIL and published by the International Organization for Standardization (ISO), aimed to standardize three-letter codes for comprehensive language coverage, including mechanisms for change requests to address inaccuracies inherited from earlier compilations like Ethnologue's pre-2005 editions.
A pivotal shift toward evidence-based verification occurred in the 1990s, as linguistic communities increasingly prioritized bibliographic and empirical standards amid growing awareness of documentation gaps. This era saw preparations for ISO 639 standards evolve, with SIL aligning codes to international norms and emphasizing documented evidence over anecdotal reports, setting the stage for retirements of dubious entries. Glottolog, founded around 2011 by Harald Hammarström and collaborators at the Max Planck Institute for Evolutionary Anthropology, further advanced this by compiling exhaustive bibliographies to assess language existence, classifying entries as spurious when they lack verifiable references. The advent of digital databases from the 2000s onward transformed verification practices, enabling cross-referencing of global sources and systematic retirements. Platforms like ISO 639-3's change request system and Glottolog's linked bibliographic data allowed linguists to evaluate entries against primary sources, identifying "ghost languages" from early 20th-century reports as non-existent or duplicates, thus refining catalogs through collaborative, technology-driven scrutiny. This digital infrastructure, including initiatives like Cross-Linguistic Data Formats, facilitated broader access to evidence, reducing the persistence of unverified languages in standardized references. As of the 28th edition in 2025, Ethnologue lists 7,159 living languages, reflecting continued refinements including retirements of spurious entries.

Types of Spurious Languages

Fabrications and Hoaxes

Fabrications and hoaxes represent a subset of spurious languages deliberately created and promoted as authentic, often for amusement or to expose flaws in linguistic documentation processes. These inventions typically involve fabricated grammars, vocabularies, or speaker communities that mimic natural languages but lack verifiable evidence of natural development or use. Unlike openly artificial constructed languages, fabrications are presented deceptively to infiltrate scholarly catalogs or narratives. A prominent historical example is the Taensa language, purportedly spoken by a Native American group in 18th-century Louisiana. In 1880, a French seminary student named Jean Parisot published a grammar and vocabulary claiming they derived from Taensa informants, but investigations revealed the material as a hoax blending French and Latin elements with no basis in indigenous speech. The fabrication was exposed by anthropologist Daniel G. Brinton in 1885, who demonstrated inconsistencies in the morphology and a lack of corroborating evidence from colonial records. Taensa is now recognized as non-existent. Another case is the Kukurá language, allegedly an isolate from Brazil, reported in early 20th-century expeditions. It was fabricated by an interpreter accompanying explorer Alberto Vojtěch Frič, who invented words and structures during interactions with speakers to mislead researchers. Linguistic analysis later identified it as a patchwork of borrowed and local terms without independent attestation, leading to its classification as a phantom language. The entry persists in some older references but has been debunked in comprehensive surveys of South American indigenous languages. In more recent times, Europanto exemplifies a satirical fabrication entering formal catalogs.
Created in 1996 by Diego Marani, a translator at the Council of the European Union, as a mock "pan-European" language blending English, French, German, and other languages in a humorous jab at EU multilingualism, it was mistakenly coded as a natural language (eur) in ISO 639-3. Upon review, the ISO 639-3 Registration Authority retired the code in 2009, citing its non-existent status as a naturally occurring language and confirming it as an intentional jest with no native speakers. Motivations for such hoaxes vary: Parisot's Taensa may have stemmed from academic ambition or a prankish intent to test scholarly credulity in an era of rapid Native American language documentation, while Frič's interpreter likely sought personal gain or amusement amid the pressures of colonial exploration. Satirical cases like Europanto aim to highlight bureaucratic absurdities in language standardization. In colonial contexts, fabrications sometimes served to embellish ethnographic reports, inflating the perceived diversity of subjugated regions. Detection typically occurs through rigorous verification: absence of native speakers, inconsistent linguistic features (e.g., unnatural or borrowed elements), and lack of corroboration in independent fieldwork or archival sources. For instance, Europanto's retirement followed a change request documenting its constructed nature via Marani's publications, while Taensa's debunking relied on comparative analysis against known languages. Modern catalogs now employ stricter criteria, including community consultations, to prevent such entries.

Misidentifications and Duplicates

Misidentifications and duplicates represent a significant category of spurious languages, stemming primarily from errors in distinguishing dialects from independent languages or from redundant entries arising from inconsistent transliterations and incomplete historical records. A common cause is the conflation of dialect clusters with distinct languages, as seen in the case of Land Dayak, where varieties within the Bidayuhic subgroup of the Austronesian languages were initially cataloged as a single entity rather than as a group of closely related languages. Duplicates often occur when the same linguistic variety receives multiple codes due to varying orthographic representations in early surveys or cross-border reporting discrepancies. Notable examples illustrate these issues. The code "dek" for Dek, reported in Cameroon, was retired in 2024 upon recognition as a duplicate of Suma (code "sqm"), a Gbaya language, based on overlapping lexical and ethnographic data. Similarly, Bahau River Kenyah (code "bwv") was retired effective January 14, 2008, after linguistic analysis determined it was not a separate variety; any current use is likely either Mainstream Kenyah (xkl) or Uma' Lung (ulu), with no evidence of distinct usage. These spurious entries typically entered catalogs through initial inclusion reliant on preliminary field reports or unverified secondary sources from the mid-20th century, when documentation was sparse. Subsequent retirements occur via formal change requests, informed by comparative lexical studies, dialectometry, or genetic classifications that demonstrate high mutual intelligibility or outright identity, prompting mergers into established codes. The prevalence of such misidentifications in earlier catalog editions contributed to inflated estimates of global linguistic diversity, with those editions listing hundreds of redundant or erroneous entries that overstated the number of mutually unintelligible languages by up to 10-15% in certain regions.
Rigorous verification protocols introduced in later decades have mitigated these issues, enhancing the accuracy of language inventories.

Unattested or Insufficiently Documented

Unattested or insufficiently documented spurious languages are those cataloged in linguistic databases based on initial reports or mentions that lack subsequent verification, often originating from isolated traveler accounts, early ethnographic notes, or outdated surveys without supporting linguistic or speaker confirmation. These entries highlight the challenges of early documentation, where misreported names or ethnonyms were sometimes interpreted as distinct languages, only to be retired when closer scrutiny revealed no evidence of their existence as separate linguistic entities. Retirement in standards like ISO 639-3 typically occurs when change requests demonstrate the absence of speakers, lexical material, or fieldwork validation, emphasizing the importance of rigorous evidence in language documentation. Recent updates continue to retire unattested entries through annual change requests. A key characteristic of these languages is their reliance on unconfirmed sources, such as brief mentions in historical records or traveler narratives that fail to provide grammatical, lexical, or sociolinguistic details for corroboration. For example, Dzorgai, listed as a potential Qiangic variety but based on outdated 19th-century surveys, was removed around 2000 due to insufficient documentation and no identifiable speakers or materials. Similarly, Wutana, reported in early Nigerian ethnographies, was removed from Ethnologue in 2000 after surveys found no speakers or evidence, and the name was attributed to an ethnic group rather than a distinct language. These cases illustrate how initial inclusions in catalogs like Ethnologue propagated unattested entries until systematic reviews exposed their lack of foundation. Verification of such languages poses significant challenges, particularly in remote or historically inaccessible locations where fieldwork is logistically difficult, or among groups that may have become extinct before modern documentation efforts.
In the Americas, for instance, Chipiajes was retired from ISO 639-3 in 2016 after an investigation, complicated by the region's vast, under-explored indigenous territories and historical disruptions from colonization, revealed it to be a surname among Sáliba and Guahibo speakers rather than a separate language. Likewise, Mosiro, initially reported among Kenyan pastoralist communities, was retired in 2018 due to non-existence and insufficient data, with the name traced to a clan rather than a linguistic variety; efforts to locate speakers in arid, mobile populations proved fruitless amid limited archival records. These examples underscore how geographic isolation and cultural shifts exacerbate the difficulty of confirming or refuting early reports. Unattested spurious languages show a higher incidence in regions that were understudied before the 1990s, when systematic linguistic surveys were scarce, such as Papua New Guinea's diverse highlands or remote lowland river basins. Pre-1990s documentation often depended on sporadic expeditions, leading to entries like those in early editions that were later pruned through ISO 639-3's annual reviews. This pattern reflects the evolution of cataloging practices toward greater evidentiary standards, reducing new inclusions of insufficiently documented cases in recent decades.

Retirements in Ethnologue and ISO 639-3

Retirement Process and Criteria

The retirement process for language codes in ISO 639-3 and Ethnologue is managed by SIL International, which serves as the registration authority for ISO 639-3 and the publisher of Ethnologue. Change requests, including those for retiring codes due to spurious or non-existent languages, are submitted using a standardized form that requires detailed justification and supporting evidence, such as historical records, linguistic analyses, or fieldwork data demonstrating the absence of distinct linguistic features or speakers. Requests are accepted throughout an annual cycle running from September 1 to August 31; they are then posted publicly for comment from September 15 to December 15, reviewed by a panel of linguists in mid-December, and finalized, with decisions announced, by January 31 of the following year. Retirement occurs if the review confirms no verifiable evidence of the language's existence as a distinct entity, for example when a variety's mutual intelligibility with a related language exceeds the level usually taken to indicate a dialect (typically cited as 80-90%) or when it lacks sociolinguistic distinctiveness, in line with ISO 639-3's criteria for individual languages. SIL does not reuse retired codes, which keeps the code set stable for systems that depend on it. Ethnologue has incorporated ISO 639-3 codes and their changes since its 15th edition in 2005, applying similar evidence requirements, including speaker population data, comparative wordlists, or surveys confirming duplication or fabrication. Entries are retired if they fail to meet these thresholds, often through collaboration with field linguists for validation via targeted research or archival review. A notable surge in retirements occurred between 2007 and 2010, following the initial publication of ISO 639-3 in 2007, as standardized reviews addressed legacy entries from earlier Ethnologue versions; over 900 change requests were processed from 2006 to 2012, resulting in numerous retirements for unattested or misidentified languages.
Retired codes are marked with specific reasons, such as "non-existence" or "duplicate," and archived for reference, contributing to a refined global count of approximately 7,100 living languages as of recent updates.

Pre-2000 Retirements

Early editions of Ethnologue involved initial cleanups of entries based on insufficient documentation or misidentification, leading to the removal of several spurious languages in the 1990s and 2000. These entries predate ISO 639-3 and carried uppercase SIL codes.
  • 1992: Itaem (SIL code: ITM, PNG) – removed as apparently non-existent, with no verifiable speakers or linguistic data. [](https://www.ethnologue.com/)
  • 1992: Marajona (SIL code: MPQ, Brazil) – removed as a language lacking any attested materials or community. [](https://www.ethnologue.com/)
  • 1996: Bibasa (SIL code: BHE, PNG) – removed after having been listed as an "isolate in need of survey", with no evidence of its existence. [](https://www.ethnologue.com/)
  • 2000: Alak 2 (SIL code: ALQ) – removed as a mislabeled fragment of a word list. [](https://www.ethnologue.com/)
  • 2000: Dzorgai (SIL code: DZI) – removed as an old name for a Qiangic language of uncertain correspondence, not a distinct language. [](https://www.ethnologue.com/)
  • 2000: Hsifan (SIL code: HSI) – removed as an ethnic name rather than a language. [](https://www.ethnologue.com/)

2000s Retirements

The 2000s saw retirements following the publication of ISO 639-3 in 2007, with a focus on hoaxes and constructed languages.
  • 2005: (ISO 639-3: jij) – retired as a non-existent language confused with a place name.
  • 2005: Kalanke (ISO 639-3: ckn) – retired as a misidentification, with no independent attestation.
  • 2007: Miarrã (ISO 639-3: mvr) – retired as a fabricated entry from early missionary reports, lacking linguistic evidence. [](https://www.ethnologue.com/)
  • 2008: Amikoana (ISO 639-3: amk) – retired as an unconfirmed name for an uncontacted group, not a distinct language. [](https://academic.oup.com/book/57386/chapter/464721551)
  • 2008: Land Dayak (ISO 639-3: lnd) – retired as a cover term for multiple Dayak dialects, not a single language. [](https://www.ethnologue.com/)
  • 2009: Aariya (ISO 639-3: aay) – retired as a duplicate of Aari, with no separate data. [](https://www.ethnologue.com/)
  • 2009: Europanto (ISO 639-3: eur) – retired as a constructed auxiliary language rather than a natural language. [](https://glottolog.org/resource/languoid/id/euro1250)
  • 2010: Chimakum (ISO 639-3: cmk) – retired as a duplicate of Chemakum [xch]. [](https://www.iana.org/assignments/lang-subtags-templates/cmk.txt)

2010s Retirements

Retirements in the 2010s increased with improved verification processes, targeting misidentifications and insufficiently documented cases.
  • 2011: Ayi (ISO 639-3: ayi) – retired as a name for a Yi dialect, not a separate language. [](https://www.ethnologue.com/)
  • 2014: Gugu Mini (ISO 639-3: gug) – retired as a historical name without modern attestation or speakers. [](https://www.ethnologue.com/)
  • 2016: Bhatola (ISO 639-3: bho) – retired as a misreported dialect of Bhojpuri. [](https://www.ethnologue.com/)
  • 2016: Cagua (ISO 639-3: cag) – retired as an unverified entry from early surveys. [](https://www.ethnologue.com/)
  • 2016: Papavô (ISO 639-3: ppv) – retired as a name for uncontacted groups, not a distinct language. [](https://academic.oup.com/book/57386/chapter/464721551)
  • 2018: Lyons Sign Language (ISO 639-3: lsg) – retired as a non-standard signing variety, not a full sign language. [](https://www.ethnologue.com/)
  • 2019: Lui (ISO 639-3: lui) – retired as a duplicate of a local variety. [](https://www.ethnologue.com/)
  • 2019: Khlor (ISO 639-3: kht) – retired as an unattested Katuic variety. [](https://www.ethnologue.com/)

2020s Retirements

Recent retirements reflect ongoing updates to the latest Ethnologue editions, with a focus on duplicates and hoaxes.
  • 2021: Bikaru (ISO 639-3: bku, Papua New Guinea) – retired as an insufficiently documented entry with no speakers identified. [](https://www.ethnologue.com/)
  • 2024: Dek (ISO 639-3: dek, Papua New Guinea) – retired as a duplicate of Suma [sqm], based on reanalysis of data. [](https://www.ethnologue.com/)
In the 28th edition of Ethnologue (2025), four additional entries were retired as spurious languages: two as non-existent and two as duplicates.

Spurious Languages in Glottolog

Classification Approach

Glottolog is an open-access, comprehensive database that catalogs the world's languages, dialects, and families. It assigns each languoid a unique glottocode for persistent identification and links it to bibliographic references. Its latest edition (released in 2025) organizes over 8,000 languages into genealogical classifications based on historical-comparative linguistic research, with a strong emphasis on lesser-known and poorly documented languages. The database classifies a languoid as spurious if it is mentioned in the literature but its existence as a distinct language cannot be verified beyond doubt. This happens, for example, when the name actually refers to a place, an ethnic group, or an unproven proposal rather than a genuine linguistic entity.

The classification approach is evidence-based and grounded in the Glottolog master bibliography, which aggregates thousands of sources and requires multiple independent attestations for validation. For inclusion, a proposed language must meet three conditions:
  • it must show distinctness through non-mutual intelligibility with other varieties;
  • this must be supported by form-meaning pairs (lexical or grammatical data) for at least 50 basic vocabulary items;
  • there must be evidence of its use as a primary means of communication in a speech community.
Entries lacking such evidence are marked spurious, such as those based on a single uncorroborated mention or contradicted by subsequent research. They are placed in the "Unclassified" pseudo-family, which keeps the bibliography complete without implying that the language is valid. For instance, Welaung (glottocode: wela1234) was identified as spurious because it is actually a place name associated with the Nga La language, not a separate entity. Ethnologue prioritizes speaker population data and retires codes for unverified languages. Glottolog instead focuses on bibliographic attestation over demographic metrics, explicitly includes dialects and families in its inventory, and marks unattested or spurious languoids as such instead of retiring their codes.

This reference-driven framework ensures transparency: all classifications link directly to sources, so users can evaluate the evidence independently. Glottolog is updated periodically through collaborative curation in a public repository, which incorporates new bibliographies and expert revisions. No major methodological changes were reported for 2025 beyond minor data refinements in version 5.2.1.

Catalog of Spurious Entries

Glottolog maintains a catalog of spurious languoids: entries derived from the linguistic literature that, on further scrutiny, are deemed non-existent, misidentified, or not distinct enough to count as separate languages. These are classified under the "Bookkeeping" category and include retired codes where applicable. The following list gives selected spurious entries grouped by macro-area, with glottocodes, associated ISO codes (if retired), and brief evidentiary notes based on Glottolog assessments.

Africa

  • !Khuai (khua1244): Retired entry based on a misunderstanding of historical records; no evidence of a distinct language, likely conflated with ǀXam or other Khoisan varieties.
  • Baga Kaloum (baga1271, ISO bqf): Attested in 19th-century colonial reports but identical to Baga Koga; considered extinct as a separate lect due to lack of differentiation.
  • Baga Sobané (baga1274, ISO bsv): Retired as a distinct Baga dialect; shares vocabulary and structure with Baga Sitemu, stemming from outdated ethnolinguistic classifications.
  • Ngombe (ngom1265, ISO nmj): Spurious as a separate language; refers to a Pygmy clan name rather than a linguistic entity, with no independent attestation.
  • Oropom (orop1234): Unattested and likely fabricated; based on unverified 20th-century reports of an extinct Ugandan language with no surviving data or descendants.
  • Gengle (geng1243, ISO geg): Bookkeeping entry for a purported Adamawa language; unverified and likely a mishearing or variant name for nearby lects like Kugama.

Asia-Pacific

  • Agariya (agar1251, ISO agi): Spurious Munda entry; conflates group names, place names, and varieties from colonial sources, with no unique linguistic features.
  • Ahirani (ahir1243, ISO ahr): Retired as a separate Indo-Aryan language; actually a dialect of Khandeshi, misidentified in early 20th-century surveys because of regional naming variations.
  • Welaung (wela1234): Place name mistaken for a language; actually refers to a location within the Matu area, with no independent lexical or grammatical evidence.
  • Arakwal (arak1254, ISO rkw): Not a distinct Pama-Nyungan language; one of multiple names for Bandjalang varieties of eastern Australia, based on ethnic rather than linguistic separation.

Americas

  • Chetco (chet1237): Spurious Athabaskan entry; merged into Tolowa-Chetco as linguistically indistinguishable from Tolowa, as confirmed by modern revitalization efforts and comparative analysis.
  • Pisabo (pisa1244, ISO pig): Unattested Pano-Tacanan language; mentioned in early 20th-century sources but lacking any documentation, likely an unverified indigenous report or an error.

Other Regions

  • Judeo-Berber (jude1262, ISO jbe): Bookkeeping entry for a purported Berber-Jewish lect; no distinct variety exists, as Berber-speaking Jews used regional dialects without unique innovations.
  • Old Turkish (oldt1247): Retired historical Turkic entry; redundant with established classifications, stemming from inconsistent terminological use in early comparative studies.
  • Tawang Monpa (tawa1289, ISO twm): Retired as a separate Sino-Tibetan language; now reclassified under Dakpa (Takpa), based on shared features identified in field data.

Dubious Languages

Distinction from Spurious

Dubious languages are linguistic entities with limited evidence for their existence, such as documentation from a single historical source or a failure to locate living speakers. They may nonetheless be genuine varieties, and they have not been formally retired from catalogs like Glottolog. They differ from unattested languages, whose historical existence is confirmed but for which no linguistic data can be recovered: for a dubious language, indirect evidence leaves open the possibility of verification.

Spurious languages, by contrast, are definitively rejected because of proven fabrication, misidentification, or a complete lack of supporting evidence beyond the initial citation. Dubious languages occupy a provisional place in linguistic classification. Spurious entries, such as those arising from typographical errors or misreporting in early surveys, are kept only as artifacts and do not imply that the language is real. Dubious cases, on the other hand, remain the subject of scholarly debate and may be reclassified if new data emerge.

The distinction hinges on the threshold of evidence. Spurious languoids fail the basic test of plausibility, while dubious ones meet a minimal bar of non-contradiction but require further validation. A language is typically deemed dubious when the evidence in the linguistic literature is fragmentary and insufficient for genealogical affiliation or speaker verification, yet not demonstrably false; brief wordlists and travelers' reports are typical of such evidence. Because the question remains open, dubious entries persist in databases such as Glottolog's unclassifiable category pending further research, unlike spurious languages, which are conclusively dismissed. Glottolog 5.2 maintains several such unclassifiable entries as potentially dubious, including Oropom in Uganda.

Representative examples include languages linked to uncontacted indigenous groups in the Amazon, where indirect evidence such as isolated audio recordings suggests that a language is real even without direct contact. For instance, the language of one uncontacted Colombian group was provisionally linked to Tikuna-Yurí on the basis of scant recordings, illustrating the tension between limited data and potential authenticity. Similarly, the distinct existence of certain village sign languages, which emerge in small deaf communities, is disputed because of sparse documentation and overlap with gestural systems; yet they are not rejected outright as non-linguistic. Comprehensive surveys of global linguistic diversity emphasize that resolving the status of dubious languages requires targeted fieldwork and documentation. That work can either confirm them as viable varieties or relegate them to spurious status.

Examples and Current Status

One prominent example of a dubious language in Glottolog is Caguan (or Kaguan), reported as an unclassified language of South America but with scant attestation. The entry stems primarily from compilations such as Loukotka's (1968) classification, which included numerous poorly documented names. Later analyses flagged many of these names as potentially problematic because no speakers or lexical data could be verified.

As of 2025, linguistic catalogs maintain dozens of dubious entries. Ethnologue's 28th edition (2025) reflected ongoing scrutiny by dropping 9 languages from its list of living languages, including 2 designated as unattested for lack of evidence of their existence. Reviews of earlier editions document hundreds of such problematic cases across global inventories, underscoring persistent challenges in verification. Documentation projects continue to investigate language status worldwide, compiling data on vitality and authenticity and addressing these uncertainties through community consultation and archival cross-referencing.

Gaps in coverage persist, particularly in popular resources, where dubious cases receive little attention compared to well-documented families; this can perpetuate outdated classifications. Recent updates, such as those in Ethnologue's 28th edition, have not always been integrated, which shows how unattested or hoax-like entries can linger without rigorous reevaluation. Looking ahead, AI tools and digital archives are emerging as aids in verifying dubious languages. They can analyze historical texts, reconstruct patterns from sparse data, and support cross-linguistic comparisons that either confirm or refute a language's existence. Because these technologies can process archival materials at scale, they may yield more accurate inventories as documentation continues.

References
