Glottolog
from Wikipedia

Glottolog
Producer: Max Planck Institute of Geoanthropology (Germany)
Languages: English
Access
Cost: Free
Coverage
Disciplines: Linguistics
Links
Website: glottolog.org

Glottolog is an open-access online bibliographic database of the world's languages. In addition to listing linguistic materials (grammars, articles, dictionaries) describing individual languages, the database also contains the most up-to-date language affiliations based on the work of expert linguists.

Glottolog was first developed and maintained at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, and between 2015 and 2020 at the Max Planck Institute of Geoanthropology in Jena, Germany. Its main curators include Harald Hammarström and Martin Haspelmath.

Overview


Sebastian Nordhoff and Harald Hammarström established the Glottolog/Langdoc project in 2011.[1][2] The creation of Glottolog was partly motivated by the lack of a comprehensive language bibliography, especially in Ethnologue.[3]

Glottolog provides a catalogue of the world's languages and language families and a bibliography on individual languages. It differs from Ethnologue in several respects:

  • It includes only those languages that the editors have been able to confirm both exist and are distinct. Varieties that have not been confirmed, but are inherited from another source, are tagged as "spurious" or "unattested".[note 1]
  • It attempts only to classify languages into families demonstrated to be valid groupings based on research by linguists specializing in them.
  • Comprehensive bibliographic information is provided, especially for lesser-known or underdescribed languages.
  • To a limited extent, alternative names are listed according to the sources that use them.
  • Apart from a single point-location on a map at its geographic centre, no ethnographic or demographic information is provided.

Language names used in the bibliographic entries are identified by ISO 639-3 code or Glottolog's own code (Glottocode). External links are provided to ISO, Ethnologue and other online language databases.

Version 5.1 was released under the Creative Commons Attribution 4.0 International License in October 2024.

It is part of the Cross-Linguistic Linked Data project hosted by the Max Planck Institute of Geoanthropology.[4]

Language families


Glottolog is more conservative than other databases in assigning languages to families and in accepting larger groupings, given its strict criteria for postulating them. On the other hand, the database is more permissive in treating unclassified languages as isolates. Edition 4.8 lists 421 spoken language[note 2] families and isolates as follows:[5]

List of Glottolog genealogical families
Name Region[note 3] Languages
Atlantic-Congo Africa 1,410
Austronesian Africa, Eurasia, Oceania, South America 1,272
Indo-European Africa, Australia, Eurasia, North America, Oceania, South America 585
Sino-Tibetan Eurasia 506
Afro-Asiatic Africa, Eurasia 382
Nuclear Trans New Guinea Oceania 317
Pama-Nyungan Australia, Oceania 250
Otomanguean North America 181
Austroasiatic Eurasia 158
Tai-Kadai Eurasia 96
Dravidian Eurasia 82
Arawakan North America, South America 77
Mande Africa 75
Tupian South America 70
Uto-Aztecan North America 68
Central Sudanic Africa 63
Nilotic Africa 56
Nuclear Torricelli Oceania 55
Uralic Eurasia 49
Algic North America 47
Athabaskan-Eyak-Tlingit North America 46
Pano-Tacanan South America 45
Quechuan South America 43
Turkic Eurasia 43
Cariban South America 42
Hmong-Mien Eurasia 42
Kru Africa 38
Nakh-Daghestanian Eurasia 36
Sepik Oceania 36
Mayan North America 34
Lower Sepik-Ramu Oceania 30
Nuclear-Macro-Je South America 29
Chibchan North America, South America 27
Tucanoan South America 26
Salishan North America 25
Timor-Alor-Pantar Oceania 23
Dogon Africa 20
Lakes Plain Oceania 20
Mixe-Zoque North America 19
Ta-Ne-Omotic Africa 19
Yam Oceania 19
Siouan North America 18
Anim Oceania 17
Japonic Eurasia, Oceania 17
Mongolic-Khitan Eurasia 17
Border Oceania 15
North Halmahera Oceania 15
Tungusic Eurasia 15
Khoe-Kwadi Africa 14
Angan Oceania 13
Eskimo-Aleut Eurasia, North America 13
Miwok-Costanoan North America 13
Ndu Oceania 13
Nubian Africa 13
Tor-Orya Oceania 13
Totonacan North America 13
Chapacuran South America 12
Gunwinyguan Australia 12
Cochimi-Yuman North America 11
Iroquoian North America 11
Sko Oceania 11
Surmic Africa 11
Western Daly Australia 11
Geelvink Bay Oceania 10
Great Andamanese Eurasia, Oceania 10
Heibanic Africa 10
Ijoid Africa 10
Maban Africa 10
Nyulnyulan Australia 10
Saharan Africa 10
Songhay Africa 10
South Bougainville Oceania 10
Worrorran Australia 10
Chocoan South America 9
Dagan Oceania 9
Tuu Africa 9
Greater Kwerba Oceania 8
Kiowa-Tanoan North America 8
Koiarian Oceania 8
Mailuan Oceania 8
Narrow Talodi Africa 8
Bosavi Oceania 7
Chukotko-Kamchatkan Eurasia 7
Dajuic Africa 7
Huitotoan South America 7
Matacoan South America 7
Muskogean North America 7
Pomoan North America 7
Arawan South America 6
Baining Oceania 6
Barbacoan South America 6
Chumashan North America 6
East Strickland Oceania 6
Kadugli-Krongo Africa 6
Kiwaian Oceania 6
Left May Oceania 6
Lengua-Mascoy South America 6
Nambiquaran South America 6
South Bird's Head Family Oceania 6
Wakashan North America 6
Yanomamic South America 6
Zaparoan South America 6
Abkhaz-Adyge Eurasia 5
Arafundi Oceania 5
Caddoan North America 5
Eleman Oceania 5
Guahiboan South America 5
Guaicuruan South America 5
Kartvelian Eurasia 5
Keram Oceania 5
Koman Africa 5
Kxa Africa 5
Mirndi Australia 5
Misumalpan North America 5
Nimboranic Oceania 5
Pauwasi Oceania 5
Sahaptian North America 5
South Omotic Africa 5
West Bird's Head Oceania 5
Xincan North America 5
Yareban Oceania 5
Yeniseian Eurasia 5
Yuat Oceania 5
Aymaran South America 4
Blue Nile Mao Africa 4
Chicham South America 4
Chinookan North America 4
Chonan South America 4
Eastern Jebel Africa 4
Eastern Trans-Fly Oceania 4
Huavean North America 4
Iwaidjan Proper Australia 4
Kamakanan South America 4
Kunimaipan Oceania 4
Maiduan North America 4
Mangarrayi-Maran Australia 4
Maningrida Australia 4
Naduhup South America 4
North Bougainville Oceania 4
Sentanic Oceania 4
Shastan North America 4
Suki-Gogodala Oceania 4
Tamaic Africa 4
Tangkic Australia 4
Turama-Kikori Oceania 4
Walioic Oceania 4
Yokutsan North America 4
Yukaghir Eurasia 4
Ainu Eurasia 3
Bororoan South America 3
Bulaka River Oceania 3
Charruan South America 3
Dizoid Africa 3
East Bird's Head Oceania 3
Giimbiyu Australia 3
Gumuz Africa 3
Jarrakan Australia 3
Kalapuyan North America 3
Kamula-Elevala Oceania 3
Katla-Tima Africa 3
Kawesqar South America 3
Kayagaric Oceania 3
Kolopom Oceania 3
Kresh-Aja Africa 3
Kuliak Africa 3
Kwalean Oceania 3
Lepki-Murkim-Kembra Oceania 3
Mairasic Oceania 3
Peba-Yagua South America 3
Saliban South America 3
Tequistlatecan North America 3
Tsimshian North America 3
West Bomberai Oceania 3
Western Tasmanian Australia 3
Yangmanic Australia 3
Zamucoan South America 3
Amto-Musan Oceania 2
Araucanian South America 2
Baibai-Fas Oceania 2
Bayono-Awbono Oceania 2
Bogia Oceania 2
Boran South America 2
Bunaban Australia 2
Cahuapanan South America 2
Chimakuan North America 2
Chiquitano South America 2
Coosan North America 2
Doso-Turumsa Oceania 2
East Kutubu Oceania 2
Eastern Daly Australia 2
Furan Africa 2
Garrwan Australia 2
Haida North America 2
Harakmbut South America 2
Hatam-Mansim Oceania 2
Hibito-Cholon South America 2
Huarpean South America 2
Hurro-Urartian Eurasia 2
Inanwatan Oceania 2
Jarawa-Onge Eurasia 2
Jicaquean North America 2
Kakua-Nukak South America 2
Katukinan South America 2
Kaure-Kosare Oceania 2
Keresan North America 2
Konda-Yahadian Oceania 2
Koreanic Eurasia 2
Kwomtari-Nai Oceania 2
Lencan North America 2
Limilngan-Wulna Australia 2
Manubaran Oceania 2
Marrku-Wurrugu Australia 2
Mombum-Koneraw Oceania 2
Namla-Tofanma Oceania 2
Nivkh Eurasia 2
North-Eastern Tasmanian Australia 2
Northern Daly Australia 2
Nyimang Africa 2
Otomaco-Taparita South America 2
Pahoturi Oceania 2
Palaihnihan North America 2
Piawi Oceania 2
Puri-Coroado South America 2
Rashad Africa 2
Senagi Oceania 2
Somahai Oceania 2
South-Eastern Tasmanian Australia 2
Southern Daly Australia 2
Tarascan North America 2
Taulil-Butam Oceania 2
Teberan Oceania 2
Temeinic Africa 2
Ticuna-Yuri South America 2
Uru-Chipaya South America 2
Wintuan North America 2
Yawa-Saweru Oceania 2
Yuki-Wappo North America 2
Abinomn Oceania 1
Abun Oceania 1
Adai North America 1
Aewa South America 1
Aikanã South America 1
Alsea-Yaquina North America 1
Andaqui South America 1
Andoque South America 1
Anem Oceania 1
Arutani South America 1
Asabano Oceania 1
Atacame South America 1
Atakapa North America 1
Bangime Africa 1
Basque Eurasia 1
Beothuk North America 1
Berta Africa 1
Betoi-Jirara South America 1
Bilua Oceania 1
Bogaya Oceania 1
Burmeso Oceania 1
Burushaski Eurasia 1
Camsá South America 1
Candoshi-Shapra South America 1
Canichana South America 1
Cayubaba South America 1
Cayuse North America 1
Chimariko North America 1
Chitimacha North America 1
Chono South America 1
Coahuilteco North America 1
Cofán South America 1
Comecrudan North America 1
Cotoname North America 1
Cuitlatec North America 1
Culli South America 1
Damal Oceania 1
Dem Oceania 1
Dibiyaso Oceania 1
Duna Oceania 1
Elamite Eurasia 1
Elseng Oceania 1
Esselen North America 1
Etruscan Eurasia 1
Fasu Oceania 1
Fulniô South America 1
Fuyug Oceania 1
Gaagudju Australia 1
Guachi South America 1
Guaicurian North America 1
Guamo South America 1
Guató South America 1
Gule Africa 1
Guriaso Oceania 1
Hadza Africa 1
Hattic Eurasia 1
Hoti South America 1
Hruso Eurasia 1
Iberian Eurasia 1
Irántxe-Münkü South America 1
Itonama South America 1
Jalaa Africa 1
Jirajaran South America 1
Kaki Ae Oceania 1
Kanoê South America 1
Kapori Oceania 1
Karami Oceania 1
Karankawa North America 1
Kariri South America 1
Karok North America 1
Kehu Oceania 1
Kenaboi Eurasia 1
Kibiri Oceania 1
Kimki Oceania 1
Klamath-Modoc North America 1
Kol (Papua New Guinea) Oceania 1
Kujarge Africa 1
Kunama Africa 1
Kungarakany Australia 1
Kunza South America 1
Kuot Oceania 1
Kusunda Eurasia 1
Kutenai North America 1
Kwaza South America 1
Laal Africa 1
Lafofa Africa 1
Laragia Australia 1
Lavukaleve Oceania 1
Leco South America 1
Lule South America 1
Máku South America 1
Maratino North America 1
Marori Oceania 1
Massep Oceania 1
Matanawí South America 1
Mato Grosso Arára South America 1
Mawes Oceania 1
Maybrat-Karon Oceania 1
Meroitic Africa 1
Mimi-Gaudefroy Africa 1
Minkin Australia 1
Mochica South America 1
Molale North America 1
Molof Oceania 1
Mor (Bomberai Peninsula) Oceania 1
Mosetén-Chimané South America 1
Movima South America 1
Mpur Oceania 1
Muniche South America 1
Mure South America 1
Nara Africa 1
Natchez North America 1
Nihali Eurasia 1
Odiai Oceania 1
Omurano South America 1
Ongota Africa 1
Oti South America 1
Oyster Bay-Big River-Little Swanport Australia 1
Páez South America 1
Pankararú South America 1
Papi Oceania 1
Pawaia Oceania 1
Payagua South America 1
Pele-Ata Oceania 1
Pirahã South America 1
Puelche South America 1
Puinave South America 1
Pumé South America 1
Puquina South America 1
Purari Oceania 1
Pyu Oceania 1
Ramanos South America 1
Salinan North America 1
Sandawe Africa 1
Sapé South America 1
Sause Oceania 1
Savosavo Oceania 1
Sechuran South America 1
Seri North America 1
Shabo Africa 1
Shom Peng Eurasia 1
Siamou Africa 1
Siuslaw North America 1
Sulka Oceania 1
Sumerian Eurasia 1
Tabo Oceania 1
Taiap Oceania 1
Takelma North America 1
Tallán South America 1
Tambora Oceania 1
Tanahmerah Oceania 1
Taruma South America 1
Tauade Oceania 1
Taushiro South America 1
Timote-Cuica South America 1
Timucua North America 1
Tinigua South America 1
Tiwi Australia 1
Tonkawa North America 1
Touo Oceania 1
Trumai South America 1
Tunica North America 1
Tuxá South America 1
Umbugarla Australia 1
Urarina South America 1
Usku Oceania 1
Vilela South America 1
Wadjiginy Australia 1
Wageman Australia 1
Waorani South America 1
Warao South America 1
Washo North America 1
Wiru Oceania 1
Xukurú South America 1
Yale Oceania 1
Yámana South America 1
Yana North America 1
Yele Oceania 1
Yerakai Oceania 1
Yetfa Oceania 1
Yuchi North America 1
Yuracaré South America 1
Yurumanguí South America 1
Zuni North America 1

Creoles are classified with the language that supplied their basic lexicon.

In addition to the families and isolates listed above, Glottolog uses several non-genealogical families for various languages.[6]

from Grokipedia
Glottolog is an open-access online database that serves as a comprehensive catalogue of the world's languages, dialects, and language families, collectively termed "languoids," with a particular emphasis on lesser-known varieties. It assigns each languoid a unique and stable identifier called a Glottocode to facilitate consistent referencing in linguistic research. Developed as a transparent and collaborative resource, Glottolog bases its genealogical classifications on historical-comparative linguistic evidence curated by experts, distinguishing it from proprietary databases like Ethnologue by prioritizing open access and community contributions.

Initiated by the Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology (MPI-EVA) in Leipzig, Germany, Glottolog emerged from efforts to create an empirically grounded inventory of the world's linguistic diversity, addressing gaps in existing catalogues by defining languoids based on documented speech varieties rather than speaker numbers or administrative boundaries. The project was led by linguist Harald Hammarström, with significant contributions from Robert Forkel, Martin Haspelmath, and Sebastian Bank, among others; additional input on the bibliography has come from scholars like Alain Fabre and Jouni Maho, and from resources provided by SIL International. First released in 2011, Glottolog has evolved through regular updates, reaching version 5.2 as of 2025, with its data curated collaboratively via GitHub for version control and error reporting.

A core component of Glottolog is its extensive bibliography, comprising 447,613 references to linguistic documentation such as grammars, dictionaries, and articles as of November 2025, which users can search by filters including author, country, or genealogical affiliation. The database documents 8,231 languoids (languages, dialects, and families) as of May 2025, organized into families and macro-areas, and supports advanced queries for mapping linguistic distributions and exploring typological features.

As a free resource, Glottolog promotes scholarly access to global linguistic data, enabling research in areas such as typology while encouraging ongoing contributions from the academic community.

Introduction

Definition and Purpose

Glottolog is a comprehensive, expert-curated catalogue of the world's languages, dialects, and families, launched in 2011 as an open-access online resource. It functions primarily as a bibliographic database that links descriptive linguistic materials to specific languoids, enabling researchers to identify and reference all known linguistic units without relying on demographic or sociolinguistic metrics such as speaker populations.

The core purpose of Glottolog is to facilitate the identification of languages through comprehensive bibliographic coverage and a genealogical classification grounded in historical-comparative linguistics, prioritizing works like grammars and dictionaries that document linguistic structures and features. This approach treats "languoids" as a neutral, recursive term encompassing families, languages, and dialects, avoiding prescriptive distinctions between them and focusing instead on verifiable descriptive resources. Each languoid is assigned a unique Glottocode identifier to ensure stable referencing across scholarly work.

Glottolog is licensed under a Creative Commons Attribution 4.0 International License, promoting reusability and collaborative contributions to its data. It is hosted by the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, supporting its role as a public, versioned repository for linguistic documentation.

Development History

Glottolog was initiated in 2011 by linguists Sebastian Nordhoff and Harald Hammarström at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, as part of the Langdoc project, which aimed to compile a comprehensive bibliography for lesser-known languages. The project began with a focus on aggregating descriptive materials for underdocumented languages, addressing gaps in existing bibliographic tools by linking references directly to specific languoids.

In 2015, Glottolog transitioned to the Max Planck Institute for the Science of Human History (later renamed the Max Planck Institute of Geoanthropology) in Jena, where it continued development under the same core team, with Martin Haspelmath joining as a key contributor responsible for languoid names and dialect classifications. Harald Hammarström served as the primary curator, overseeing the compilation of bibliographies and genealogical classifications, while contributions from a global network of linguists were integrated to expand coverage. By around 2020, maintenance shifted back to the Max Planck Institute for Evolutionary Anthropology in Leipzig, reflecting institutional realignments within the Max Planck Society.

The project's evolution has been marked by regular version releases, progressing from early editions in the 2010s to version 5.2 in May 2025 and 5.2.1 in June 2025, each incorporating refinements to classifications and expanded bibliographic entries. Glottolog adopted an open-source collaborative model in the mid-2010s, using GitHub for version control to enable community-submitted updates through pull requests and issues, while expert curators maintain oversight to ensure accuracy and consistency. This approach has facilitated ongoing maintenance, with guidelines for contributions emphasizing verifiable sources and adherence to editorial standards.

Database Structure and Methodology

Languoids and Classification

In Glottolog, the term "languoids" serves as a neutral umbrella concept encompassing families, individual languages, and dialects, circumventing ongoing debates over the sociopolitical and linguistic boundaries between languages and dialects. This approach treats all such units as comparable entities within a structured inventory, allowing for consistent documentation without imposing subjective status judgments.

Glottolog's classification system is genealogical, grounded in the historical-comparative linguistic method and informed by expert consensus from the scholarly literature. It adopts a conservative stance, incorporating only well-established genetic affiliations and eschewing speculative or unverified links between distant groups, thereby prioritizing reliability over maximalist proposals. The classification organizes the world's languages into 430 families and isolates as of version 5.2 (updated May 27, 2025), where isolates are treated as single-member families lacking demonstrable relatives. Hierarchies are represented through tree structures that illustrate nested subgroups based on shared innovations and common ancestry, while flat lists provide inventories at each classificatory level for straightforward reference. Each languoid receives a unique Glottocode as an identifier.

To address linguistic diversity, Glottolog places sign languages, pidgins, mixed languages, and artificial languages in separate non-genealogical groupings (pseudo-families), recognizing their distinct developmental histories outside typical spoken-language phylogenies. Inclusion criteria require verifiable linguistic descriptions, including evidence of distinct form-meaning pairings, sufficient descriptive data, and attestation as primary communication systems; unattested varieties or dubious historical proposals are explicitly flagged and excluded from the core inventory.

The methodology eschews probabilistic modeling or automated computational techniques, depending instead on manual curation by linguists who synthesize evidence from the associated bibliography.
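The nesting described above (families containing languages, languages containing dialects, and isolates as one-member families) can be pictured with a small data structure. The following is a minimal sketch and not Glottolog's actual data model: the `Languoid` class, the `level` strings, and every code and name in the sample data are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Languoid:
    """One node in a languoid tree (hypothetical model, for illustration)."""
    glottocode: str
    name: str
    level: str  # assumed values: "family", "language", or "dialect"
    children: list["Languoid"] = field(default_factory=list)

    def languages(self):
        """Yield every language-level node in this subtree, depth first."""
        if self.level == "language":
            yield self
        for child in self.children:
            yield from child.languages()


def is_isolate(top: Languoid) -> bool:
    """Treat a top-level node with exactly one language as an isolate."""
    return len(list(top.languages())) == 1


# Placeholder data: these codes and names are invented, not real entries.
family = Languoid("fami0001", "Example family", "family", [
    Languoid("lang0001", "Language A", "language", [
        Languoid("dial0001", "Dialect A1", "dialect"),
    ]),
    Languoid("lang0002", "Language B", "language"),
])
isolate = Languoid("isol0001", "Example isolate", "language")

print([lang.name for lang in family.languages()])
print(is_isolate(family), is_isolate(isolate))
```

Modelling an isolate as a top-level node whose subtree holds a single language keeps it in the same structure as a multi-member family, which mirrors the "single-member family" treatment described in the text.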

Glottocodes and Identifiers

Glottocodes are unique identifiers assigned to each languoid in the Glottolog database, consisting of an 8-character alphanumeric string formatted as four lowercase letters or digits followed by four decimal digits. For example, the glottocode "stan1295" identifies German (ISO 639-3: deu). Although the initial four characters often appear mnemonic, drawing loosely from the languoid's name to enhance partial human readability, the system is primarily designed for machine compatibility and stability, avoiding reliance on interpretable abbreviations that could lead to unintended modifications.

The primary purpose of glottocodes is to provide stable, persistent identifiers for languoids (families, languages, and dialects alike) independent of external coding systems like ISO 639-3, whose codes can be retired, merged, or reassigned over time. This stability facilitates reliable linking between bibliographic references, classification trees, and other linguistic data, ensuring that references to a languoid remain consistent across Glottolog versions and external research tools. One glottocode is generated for each distinct languoid during the curation process by Glottolog maintainers, who assign new codes sequentially without recycling retired ones, even for bookkeeping entries like misunderstood or unclassified languoids.

Glottocodes maintain compatibility with ISO 639-3 by mapping one-to-one where possible for language-level entries, but they extend coverage to dialects, families, and additional languoids not represented in ISO, such as unclassified varieties or those lacking standardized codes. This broader scope allows Glottolog to include over 25,000 identifiers as of recent releases, supporting a comprehensive inventory beyond ISO's focus on living languages.

The advantages of glottocodes include their level-neutral application across the languoid hierarchy, persistence through versioned releases on platforms like GitHub, and enablement of precise data exchange in linguistic software, reducing errors in cross-referencing and phylogenetic analyses.
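The format described above (four lowercase letters or digits followed by four decimal digits) is simple enough to check mechanically. The sketch below is illustrative only: the function names are hypothetical, and the pattern verifies the shape of a string, not whether the code actually exists in Glottolog.

```python
import re

# Shape taken from the description above: four lowercase letters or
# digits, then four decimal digits.
GLOTTOCODE_RE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def is_glottocode(text: str) -> bool:
    """Return True if text has the shape of a Glottocode."""
    return GLOTTOCODE_RE.fullmatch(text) is not None


def find_glottocodes(text: str) -> list[str]:
    """Extract Glottocode-shaped tokens from free text."""
    return re.findall(r"\b[a-z0-9]{4}[0-9]{4}\b", text)


print(is_glottocode("stan1295"))   # shape matches
print(is_glottocode("deu"))        # an ISO 639-3 code has a different shape
print(find_glottocodes("German (stan1295; ISO 639-3: deu)"))
```

Because the first four characters may themselves be digits, any eight-digit number also passes this check, so a real pipeline would still need to look candidates up in a Glottolog release.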

Bibliographic Database

Glottolog's bibliography serves as a comprehensive repository of references to descriptive linguistic materials, encompassing grammars, dictionaries, articles, theses, and other works on the world's languages. As of 2025, it includes 447,613 entries linked to 27,034 varieties and families. These references are primarily drawn from the global linguistic literature, with a strong emphasis on descriptive works for under-documented and lesser-known languages, excluding those on major national languages that already have extensive coverage elsewhere.

The database is sourced through aggregation from diverse providers, including institutional archives such as the Alaska Native Language Archive, publishers like John Benjamins and SIL International, and individual contributions from linguists like Harald Hammarström, who compiled master bibliographies from personal collections. References are obtained via manual curation, automated parsing of existing bibliographies, and ongoing submissions, ensuring broad coverage of relevant materials. Each reference is structured with detailed metadata, including author, publication year, title, language of focus, medium (print or digital), and type of document, and is tagged to specific languoids using persistent glottocodes for precise linkage.

Quality control is maintained through a combination of manual review by expert curators and automated processes that tag references for coverage based on titles or explicit mentions, with duplicates systematically merged to avoid redundancy. The database is open to community submissions, but all additions are verified by the editorial team to ensure relevance and accuracy, and users are encouraged to report errors for continuous improvement. This curated approach prioritizes high-quality, verifiable sources that advance linguistic research.

References are accessible online through searchable interfaces on the Glottolog platform and can be downloaded in formats such as BibTeX for bibliographic management tools, CSV for tabular data analysis, and other structured exports like XML to facilitate integration with external databases and software. A unique strength of the database lies in its comprehensive focus on indigenous and minority languages, addressing significant gaps in commercial and general bibliographic resources by compiling hard-to-find descriptive materials that are essential for documenting endangered linguistic diversity.
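Because each reference is tagged with glottocodes, a tabular export can be filtered by languoid with a few lines of code. The sketch below assumes a hypothetical CSV layout: the column names, the semicolon-separated `glottocodes` field, the placeholder rows, and the code `abcd1234` are all invented for illustration and do not reflect Glottolog's actual export schema.

```python
import csv
import io

# Hypothetical export: column names and rows are placeholders, not
# Glottolog's real schema or real references.
SAMPLE_CSV = """author,year,title,glottocodes
Author A,1990,Placeholder grammar,stan1295
Author B,2005,Placeholder dictionary,abcd1234;stan1295
Author C,2012,Placeholder article,abcd1234
"""


def references_for(glottocode: str, csv_text: str) -> list[dict[str, str]]:
    """Return the rows whose glottocodes column lists the given code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if glottocode in row["glottocodes"].split(";")
    ]


titles = [row["title"] for row in references_for("stan1295", SAMPLE_CSV)]
print(titles)
```

Splitting the tag field before testing membership avoids false matches on substrings, which matters when one reference covers several languoids.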

Content and Coverage

Language Inventory

Glottolog's language inventory encompasses approximately 8,600 languoids (languages and dialects) as documented in version 5.2, released on May 27, 2025. This total reflects the catalogue's comprehensive approach to recording all known varieties that linguists have studied or referenced, assigning each a unique Glottocode for identification. Entries are included based on the presence of at least one bibliographic reference demonstrating linguistic documentation, such as grammars, dictionaries, or descriptive studies. Languoids are distinguished based on how researchers treat varieties in documentation, such as through separate descriptive studies, without relying on sociolinguistic factors such as speaker numbers or administrative boundaries.

The inventory provides strong coverage of Papuan, Austronesian, and Amazonian languages, while maintaining a global scope that prioritizes lesser-known and underdocumented varieties over widely studied ones. This emphasis stems from Glottolog's goal to fill gaps in linguistic documentation, particularly for regions with high linguistic diversity.

Languoids in the inventory are tagged with status indicators to reflect their evidential basis: "Confirmed" for those with sufficient data allowing classification into a family; "Spurious" for unverified or mistaken proposals, such as Yarsun, which arose from misunderstandings in earlier catalogues; and "Unattested" for hypothetical or extinct varieties without surviving records, like Taimviae or Teutae. These tags help users assess the reliability of entries based on available bibliographic evidence. The inventory undergoes annual updates through new releases that incorporate recent discoveries and revisions, with version history maintained via GitHub to track changes in entries and classifications.

Glottolog explicitly excludes quantitative data such as speaker numbers, focusing instead solely on the existence and quality of documentary sources to maintain neutrality and reliance on verifiable linguistic evidence.

Family Classifications

Glottolog classifies the world's languages into 430 genealogical units, encompassing 246 multi-member families and 184 isolates categorized as one-member families, along with select larger groupings supported by scholarly consensus. This structure reflects a comprehensive yet cautious inventory derived from historical-comparative linguistics, where isolates represent languages without demonstrable genetic ties to others. The hierarchical organization employs tree-based models for well-established families, featuring nested branches and sub-branches defined by shared phonological, morphological, and lexical innovations, while flatter structures apply to groupings with limited evidence. For instance, the Indo-European family branches into subgroups such as Germanic, Romance, and Indo-Iranian, illustrating deeper internal diversification.

Glottolog maintains a conservative stance, eschewing long-range hypotheses like Amerind due to insufficient evidence, and instead prioritizing rigorous criteria such as regular sound correspondences and cognate sets in basic vocabulary (typically requiring at least 50 form-meaning pairs) to affirm genetic relationships. Key examples, with counts taken from the edition 4.8 list, highlight the scale and distribution of these families: Atlantic-Congo, the largest, comprises 1,410 languages in Africa; Austronesian includes 1,272 languages spread across Africa, Eurasia, Oceania, and South America; and Sino-Tibetan accounts for 506 languages in Eurasia, including Sinitic and Tibeto-Burman subgroups. Glottolog does not list Niger-Congo as a family; Atlantic-Congo is its top-level unit.

Users can visualize these hierarchies through interactive family trees on the Glottolog platform, which support navigation and exploration of relationships, with export options in formats like Newick for phylogenetic software analysis. Classifications evolve across versions, with refinements driven by research in the 2020s, such as subgroup splits or mergers informed by new comparative data, ensuring ongoing alignment with empirical evidence.
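To make the Newick format mentioned above concrete, the following sketch serializes a small nested tree into a Newick string with labelled internal nodes. It is an illustration of the general format under stated assumptions, not a reproduction of Glottolog's own export: the `Node` class and helper names are hypothetical, and the exact labels Glottolog writes into its Newick files may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A named tree node (hypothetical helper type for this sketch)."""
    name: str
    children: list["Node"] = field(default_factory=list)


# Characters that force a Newick label to be wrapped in single quotes.
_SPECIAL = set(" ()[]':;,")


def _label(name: str) -> str:
    """Quote a label if needed, doubling any embedded single quotes."""
    if any(ch in _SPECIAL for ch in name):
        return "'" + name.replace("'", "''") + "'"
    return name


def to_newick(root: Node) -> str:
    """Serialize the tree as Newick, children first, then the node label."""
    def walk(node: Node) -> str:
        if not node.children:
            return _label(node.name)
        inner = ",".join(walk(child) for child in node.children)
        return "(" + inner + ")" + _label(node.name)
    return walk(root) + ";"


# Subgroup names taken from the Indo-European example in the text.
tree = Node("Indo-European", [
    Node("Germanic"), Node("Romance"), Node("Indo-Iranian"),
])
print(to_newick(tree))  # (Germanic,Romance,Indo-Iranian)Indo-European;
```

Branch lengths are omitted because a classification tree of this kind records only nesting, not time depth.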

Special Categories

Glottolog accommodates non-traditional languoids through specialized pseudo-families, which group languages that do not fit standard genealogical classifications based on the comparative method. These categories include sign languages, pidgins and creoles, artificial languages, and unclassified or isolate languages, allowing for systematic documentation without imposing artificial phylogenetic structures. This approach relies on bibliographic evidence to establish distinctness and shared traits, ensuring comprehensive coverage of linguistic diversity beyond spoken, inherited systems.

Sign languages are treated as a pseudo-family in Glottolog, subdivided into L1 sign languages, auxiliary sign systems, and pidgin sign languages, with approximately 227 entries classified into families or isolates based on lexical similarities and historical descent rather than sound correspondences. Related systems are grouped into a family where there is evidence of transmission through educational institutions and communities. This classification acknowledges the visual-gestural modality and the challenges of applying traditional methods, prioritizing documented historical relationships.

Pidgins and mixed languages are grouped under pseudo-families such as "Pidgin" (87 entries) or "Mixed languages" (9 entries), while creoles are often integrated into broader contact clusters like Atlantic Englishes or West African Creole English, emphasizing substrate influences from diverse linguistic sources. Because these varieties originate in language contact and simplification, they are documented on the basis of sociolinguistic and structural evidence from grammars and descriptions. For example, Sea Island Creole English is classified within English-based creoles, highlighting adstrate and substrate contributions from African languages.

Artificial languages, numbering 31 in Glottolog, are included as a pseudo-family or isolates when they possess documented linguistic structure that has been analyzed in grammatical studies despite their constructed nature. This treatment avoids equating them with natural languages but recognizes their role in linguistic research on typology and acquisition, justified by bibliographic references to their design and usage.

Unclassified languages (120 entries) and isolates (184 entries, treated as one-member families) represent languoids without proven relatives or sufficient data for affiliation, totaling around 300 cases tagged for potential future research. Isolates like Basque are distinguished by the absence of shared ancestry with neighboring families, while unclassified entries, such as certain Papuan varieties, await more documentation. This status is assigned conservatively, using evidence from available sources to avoid premature classification.

The rationale for these special categories is to prevent forcing atypical languoids into genealogical hierarchies, instead leveraging bibliographic rigor to justify their placement and facilitate targeted studies. Updates in Glottolog 5.2.1 (released June 11, 2025) have incorporated emerging sign languages from indigenous communities, such as those in Australian Aboriginal groups, expanding coverage based on recent ethnographic documentation.

Access and Tools

Search Functionality

Glottolog provides a web-based interface for searching its language inventory and bibliography, accessible without user login or registration. The primary entry point for language searches is the "Languages" section, where users can query by primary or alternative name (with options for whole-word or partial matches, including non-English names), Glottocode, ISO 639-3 code, or country/region. Results display as a paginated list of languoids (languages, dialects, or families), each linking to a detailed profile with geographic coordinates and reference counts. A name search, for example, returns matching languages and dialects alongside their family affiliations.

Family browsing is facilitated through a dedicated "Families" tool, offering a navigable tree that allows users to expand or collapse hierarchical classifications, from major families like Indo-European to isolates. This tree-based navigation supports quick exploration of genealogical relationships without requiring prior knowledge of specific codes. Glottocodes, as unique identifiers for languoids, can be entered directly to pinpoint exact entries within this browser.

The bibliographic search, centered on the "References" section, enables filtering of over 447,000 entries by author, title, publication year, country, languoid (via name or code), or genealogical affiliation. Advanced options in the complex query interface further refine results by document type, such as grammars, dictionaries, or articles, and by macro-area or annotation type (e.g., manual vs. automatic indexing). For instance, a query for grammars of Austronesian languages returns targeted citations that can be exported in bibliographic formats. Usability features include suggestions during name-based queries to handle variant spellings and an integrated reference index that cross-links bibliographies to languoid pages.

Geographic visualization is available through GlottoScope, an interactive tool that plots search results by location, overlaying data on endangerment status or extent of documentation for selected languages or families. The interface is fully text-based and does not integrate images or audio resources, so queries are limited to bibliographic and classificatory data. Since its version 4.0 release, the platform has been mobile-responsive, ensuring compatibility across devices.
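The distinction between Glottocode lookup and whole-word versus partial name matching can be sketched in a few lines of Python. This is only an illustration, not Glottolog's actual search implementation: the function names (`is_glottocode`, `name_matches`, `search`) and the tiny in-memory `SAMPLE` inventory are hypothetical. The only convention relied on is that a Glottocode consists of four lowercase alphanumeric characters followed by four digits, as in `stan1288`.

```python
import re

# A Glottocode is four lowercase alphanumeric characters plus four digits.
GLOTTOCODE_RE = re.compile(r"[a-z0-9]{4}[0-9]{4}")

# Hypothetical miniature inventory: Glottocode -> primary and alternative names.
SAMPLE = {
    "stan1288": ["Spanish", "Castilian", "español"],
    "basq1248": ["Basque", "Euskara"],
}


def is_glottocode(text: str) -> bool:
    """Return True if the text has the shape of a Glottocode."""
    return GLOTTOCODE_RE.fullmatch(text) is not None


def name_matches(query: str, name: str, whole_word: bool = False) -> bool:
    """Case-insensitive match of a query against one languoid name."""
    q, n = query.casefold(), name.casefold()
    if whole_word:
        # The query must appear as a complete word inside the name.
        return re.search(rf"\b{re.escape(q)}\b", n) is not None
    # Partial match: the query may appear anywhere in the name.
    return q in n


def search(query: str, whole_word: bool = False) -> list[str]:
    """Return Glottocodes whose code or names match the query."""
    if is_glottocode(query):
        return [query] if query in SAMPLE else []
    return sorted(
        code
        for code, names in SAMPLE.items()
        if any(name_matches(query, name, whole_word) for name in names)
    )
```

With this sketch, `search("span")` finds Spanish through a partial match, while `search("span", whole_word=True)` returns nothing because "span" is not a complete word in any listed name.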

Data Download and APIs

Glottolog offers multiple formats for downloading its database, enabling bulk access for research and integration purposes. The full database is available as a gzipped PostgreSQL 9.x dump (glottolog.sql.gz, approximately 75.8 MB), which includes all languoids, classifications, and bibliographic references. A comprehensive CLDF (Cross-Linguistic Data Formats) dataset is also provided as a zipped package (glottolog_dataset.cldf.zip, about 39.6 MB), consisting of CSV tables with accompanying metadata for structured reuse, covering languoids, trees, and references. These downloads are generated from the curated data in the project's GitHub repository and archived on Zenodo for each versioned release.

Partial exports facilitate targeted access to subsets of the data. For instance, a zipped CSV file (glottolog_languoid.csv.zip) contains detailed languoid information, including names, levels, and identifiers, while direct CSV endpoints like https://glottolog.org/glottolog/language.csv provide language-specific data such as ISO codes and macroareas. Bibliographic exports are integrated into the CLDF dataset, with references available in formats compatible with reference managers, and as RDF triples (e.g., glottolog_dataset.n3.gz, 50.9 MB) for linked-data applications. Version-specific downloads ensure reproducibility, with each release tagged and cited separately, such as Glottolog 5.2 from June 2025.

Programmatic access to Glottolog data is supported through its web interface and client libraries. Languoid details, references, and family trees can be queried via REST-like endpoints, such as /resource/languoid/id/{glottocode} for individual profiles (e.g., https://glottolog.org/resource/languoid/id/stan1288 for Spanish). The pyglottolog Python library provides a programmatic interface to a local clone of the Glottolog data repository, allowing users to load, query, and export data, including reference retrieval.

Glottolog data integrates with the CLLD (Cross-Linguistic Linked Data) framework, which powers the website and supports building custom linguistic tools and databases. The source code and data curation occur in a public GitHub repository (glottolog/glottolog), where community contributions for updates and extensions are encouraged via pull requests. All downloads and uses require proper attribution under the Creative Commons Attribution 4.0 International license, with citations to the editors (Hammarström et al.) and the specific version (e.g., "Glottolog 5.2. Leipzig: Max Planck Institute for Evolutionary Anthropology"). This supports academic integrity and tracks data provenance across releases.

Examples of tools leveraging Glottolog data include the Glottolog Data Explorer, an interactive tool for visualizing global distributions, endangerment statuses, and geographic mappings using JavaScript libraries. Such tools demonstrate the dataset's utility for spatial and typological analyses.
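The endpoint pattern and CSV exports described above can be combined in a short Python sketch. It makes several assumptions: the column names in `SAMPLE_CSV` (`id`, `name`, `level`, `iso639P3code`) are modeled on the languoid export but should be checked against the header of the file actually downloaded, the two sample rows are illustrative, and the helper names (`languoid_url`, `load_languoids`, `iso_index`) are hypothetical rather than part of any Glottolog library. No network request is made.

```python
import csv
import io

BASE_URL = "https://glottolog.org"

# Illustrative stand-in for an unzipped languoid CSV export.
# Column names are assumptions; verify them against the real file.
SAMPLE_CSV = """id,name,level,iso639P3code
stan1288,Spanish,language,spa
basq1248,Basque,language,eus
"""


def languoid_url(glottocode: str) -> str:
    """Build the profile URL for a languoid from its Glottocode."""
    return f"{BASE_URL}/resource/languoid/id/{glottocode}"


def load_languoids(fileobj) -> dict[str, dict[str, str]]:
    """Read languoid rows from a CSV file object, keyed by Glottocode."""
    return {row["id"]: row for row in csv.DictReader(fileobj)}


def iso_index(languoids: dict[str, dict[str, str]]) -> dict[str, str]:
    """Map ISO 639-3 codes to Glottocodes, skipping rows without an ISO code."""
    return {
        row["iso639P3code"]: code
        for code, row in languoids.items()
        if row.get("iso639P3code")
    }


languoids = load_languoids(io.StringIO(SAMPLE_CSV))
by_iso = iso_index(languoids)
spanish_url = languoid_url(by_iso["spa"])
```

In practice, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle on the extracted CSV. Keying on Glottocodes rather than ISO codes keeps the lookup usable for dialects and families, which often lack an ISO 639-3 code.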

Impact and Reception

Academic Use

Glottolog serves as a foundational resource in linguistic research, particularly for language identification during fieldwork, where its stable glottocodes and comprehensive bibliographic references enable researchers to locate and verify lesser-known languages and dialects. In historical linguistics, it facilitates the analysis of genetic relationships by providing detailed family classifications and reference materials, supporting studies that trace historical connections across language families. For typological studies, Glottolog's catalog of languoids and their statuses aids cross-linguistic comparison, allowing scholars to map diversity patterns systematically.

In educational settings, Glottolog is integrated into university courses on linguistics to illustrate global language diversity, with its interactive maps and searchable database used to teach concepts like dialect continua and phylogenetic trees. It is also embedded in computational tools such as LingPy, a Python library for quantitative historical linguistics, where Glottolog's identifiers standardize data for phylogenetic analyses and sequence comparisons in classroom exercises and student projects.

Glottolog supports language documentation projects by aggregating references to grammars, dictionaries, and texts, which are essential for revitalizing endangered languages; for instance, it aids Endangered Languages Documentation Programme (ELDP) initiatives by providing bibliographic leads for fieldwork on under-documented varieties. This helps researchers and communities locate primary sources, streamlining efforts to preserve oral traditions and cultural knowledge associated with at-risk languages.

Since its inception in 2011, Glottolog has been referenced in over 600 academic papers, establishing it as a standard resource for linguistic databases like the Automated Similarity Judgment Program (ASJP), which relies on its classifications for lexical comparison and phylogenetic modeling. Its bibliographic depth supports reliable sourcing in peer-reviewed work, from typological surveys to digital archives.

The project fosters community engagement through open collaboration on GitHub, where linguists worldwide contribute bibliographies, classifications, and data validations, alongside workshops such as those organized under Glottobank-ELDP small grants to train researchers in using its tools. This participatory model has built a global network of contributors, enhancing the database's accuracy and relevance.

In the 2020s, Glottolog's adoption has grown alongside cross-linguistic data formats like CLDF, which allow interoperable datasets. Its structured inventories are increasingly used in AI language modeling, providing ground-truth identifiers for training models on low-resource and endangered languages to improve multilingual representation.

Comparisons with Other Databases

Glottolog differs from Ethnologue in its conservative approach to classification, accepting only groupings demonstrated to be valid and avoiding unsubstantiated splits based on mutual intelligibility testing, which can lead to more fragmented inventories in Ethnologue. While Ethnologue emphasizes demographic data like speaker numbers and vitality status, Glottolog prioritizes bibliographic documentation, providing traceable references for classifications and excluding claims that lack scholarly support. Additionally, Glottolog is fully open-access, contrasting with Ethnologue's proprietary model requiring subscriptions, which has drawn criticism for limiting academic access.

In comparison to ISO 639-3, Glottolog offers broader coverage by including dialects, unattested varieties, and extinct languages that ISO 639-3 may exclude or retire as non-existent, giving a more comprehensive inventory for linguistic research. Glottocodes provide greater stability than ISO codes, as they are never retired even for provisional or bookkeeping entries, making them preferable for long-term scholarly tracking and historical analysis. This stability facilitates mappings between the two systems, with Glottolog aiming to cover all valid ISO 639-3 entries while adding unique identifiers for entities outside ISO's scope.

Relative to typological databases like the World Atlas of Language Structures (WALS), Glottolog emphasizes deep bibliographic linkages to primary sources rather than structural features such as phonological or grammatical traits, which WALS documents for a subset of languages. Glottolog therefore complements WALS rather than competing with it, supplying inventory and reference data that support typological studies and borrowing coordinates from WALS for geographic mapping.

Glottolog's strengths include its expert curation by linguists, who compile and verify classifications from global bibliographies, and its versioned release history, such as the 5.2 edition in 2025, which tracks changes transparently for reproducibility in research. These features make it particularly suited for diachronic linguistics, where stable identifiers and sourced phylogenies are essential.

Glottolog promotes interoperability by integrating with other catalogs, such as mapping Glottocodes to ISO 639-3 codes and incorporating endangerment data from external sources, enabling researchers to combine resources for multifaceted analyses like geospatial or vitality assessments. As of 2025, Glottolog catalogs 27,034 languoids with 447,613 references, giving it unusually deep, openly accessible, source-verified coverage of lesser-known languages.

Criticisms and Limitations

Glottolog's classification adopts a conservative stance, recognizing only genetic relationships supported by robust scholarly evidence while rejecting speculative hypotheses. This approach results in under-grouping languages in cases where broader affiliations, such as the proposed Altaic family linking Turkic, Mongolic, Tungusic, and other groups, remain debated or refuted. For example, Glottolog maintains separate families for these components, citing key critiques including Vovin (2005) on etymological flaws, Georg (2021) on methodological issues, and Janhunen (2024) on the lack of convincing shared innovations. Such conservatism frustrates some researchers, who argue that it hinders exploration of potential distant relationships, particularly in under-documented regions.

Coverage gaps persist in areas like urban and contact languages, where creoles and mixed lects are often subsumed under their lexifier's family rather than highlighted as distinct categories, complicating targeted searches. Dialect-level detail is also incomplete in some regions, due to reliance on unsystematically revised sources such as Multitree, leading to inconsistent variant representations.

The database's reliance on a core team of expert curators, including Harald Hammarström for genealogical relations and Martin Haspelmath for nomenclature, introduces potential subjective biases reflecting individual scholarly perspectives. This expert dependency contributes to slower updates in dynamic subfields like creolistics, where new evidence emerges rapidly but integration lags behind annual releases. Additionally, Glottolog's focus as a bibliographical catalog excludes multimedia resources such as audio or video samples, limiting its support for phonetic and prosodic analyses.

Accessibility challenges arise from the technical terminology and structure geared toward professional linguists, which require specialized knowledge to navigate advanced features like languoid hierarchies. The English-only interface further restricts use by non-English-speaking researchers or communities.

In response, Glottolog promotes collaborative input from diverse contributors via its public repository to mitigate biases and enhance coverage. Recent 2025 updates, including version 5.2.1, reflect ongoing efforts to refine data and bibliographical completeness, with calls for expanded expert involvement to address sustainability and review processes.

