Catalogue of Life
from Wikipedia

The Catalogue of Life (CoL) is an online database that provides an index of known species of animals, plants, fungi, and microorganisms. It was created in 2001 as a partnership between the global Species 2000 and the American Integrated Taxonomic Information System. The Catalogue is used by research scientists, citizen scientists, educators, and policy makers.[1] The Catalogue is also used by the Biodiversity Heritage Library, the Barcode of Life Data System, Encyclopedia of Life, and the Global Biodiversity Information Facility.[2] The Catalogue currently compiles data from 165 peer-reviewed taxonomic databases that are maintained by specialist institutions around the world. As of September 2022, the COL Checklist lists 2,067,951[3] of the roughly 2.2 million extant species currently known to taxonomists.

Structure

The Catalogue of Life employs a simple data structure to provide information on synonymy, grouping within a taxonomic hierarchy, common names, distribution and ecological environment.[4] It provides a dynamic edition,[5] which is updated monthly (and in which data can change without those changes being tracked), and an Annual Checklist,[6] which provides a dated, verifiable reference for the usage of names and associated data. Development of the Catalogue of Life was funded through the Species 2000 europa (EuroCat),[7] 4d4Life,[8] and i4Life[9] projects in 2003–2013, and later by the Naturalis Biodiversity Center in Leiden, the Netherlands, and the Species Files group at the Illinois Natural History Survey in Champaign-Urbana.

The people currently governing the CoL,[10] its contributors,[11] and other relevant information that changes over time are listed on the CoL website.

Usage

Much of the use of the Catalogue is to provide a backbone taxonomy for other global data portals and biological collections. Through the i4Life project, it has formal partnerships with the Global Biodiversity Information Facility, European Nucleotide Archive, Encyclopedia of Life, European Consortium for the Barcode of Life, IUCN Red List, and LifeWatch. The public interface includes both search and browse functions and offers multilingual services.[2]

The Catalogue listed 300,000 species by 2003, 500,000 species by 2005, and over 800,000 species by 2006.[12] As of 2019, the Catalogue listed 1.9 million extant and extinct species.[13] There are an estimated 14 million mainly unpublished species; however, this number is uncertain as there is a lack of data on the possible number of undescribed insects, nematodes, bacteria, fungi and many others.[14]

Catalogue of Life Plus

In 2015, an expert panel presented a consensus hierarchical classification of life[15] which included some sectors not yet represented in the published Catalogue. In the same year, the Catalogue of Life, Barcode of Life Data System, Biodiversity Heritage Library, Encyclopedia of Life, and the Global Biodiversity Information Facility (GBIF) met to consider building a single shared authoritative nomenclature and taxonomic foundation, "Catalogue of Life Plus" (COL+), that could be used to order and connect biodiversity data. It would include content not yet in CoL but available via other sources, and would serve both the users of the present Catalogue and users of extended taxonomic content (such as GBIF) through a common infrastructure. COL+ will develop a clearinghouse covering scientific names across all life, provide a single taxonomic view, and provide an avenue for feedback from content authorities.[2] The CoL is being developed in conjunction with the Global Species List Working Group to avoid replication and work towards an authoritative global list of species.

from Grokipedia
The Catalogue of Life (CoL) is an open-access online database that serves as the most comprehensive and authoritative global index of known species names for animals, plants, fungi, bacteria, and other organisms. Its 2025 annual release (as of November 2025) encompasses 2,238,246 living species and approximately 153,000 extinct species. Maintained through the collaborative efforts of hundreds of taxonomic experts worldwide, it aims to provide a verified, up-to-date checklist that supports biodiversity research, conservation, policy-making, and public education by integrating standardized taxonomic data into a single, searchable resource. The database distinguishes between a base release, which features expert-curated, non-overlapping species lists with high accuracy, and an eXtended Release, which incorporates additional unverified data from over 60,000 sources, including molecular sequences and common names, to offer broader coverage.

The Catalogue of Life was initiated in June 2001 as a partnership between Species 2000, a global federation of taxonomic databases, and the Integrated Taxonomic Information System (ITIS), a U.S.-based authority on North American and international taxonomy. It emerged from efforts to consolidate fragmented species lists into a unified global catalogue. This collaboration addressed the need for a consistent taxonomic backbone amid growing data demands, with annual releases beginning in 2000 (skipping 2001 and 2020) and monthly updates ensuring ongoing relevance. Over time, the project has expanded to include contributions from institutions such as the Global Biodiversity Information Facility (GBIF) and the Illinois Natural History Survey (INHS), fostering a decentralized network of over 200 taxonomic editors who review and update entries.

Downloads are available in formats such as Darwin Core Archives and TextTree for integration into other platforms, and the Catalogue provides the foundational index for other biodiversity initiatives. Despite covering a significant portion of described species (2,238,246 extant species as of the November 2025 edition), gaps persist in underrepresented groups such as microbes and deep-sea organisms, highlighting the ongoing challenge of cataloging Earth's estimated 8.7 million species. Through its infrastructure, including ChecklistBank for data hosting, CoL facilitates dynamic assembly of taxonomic information, promoting interoperability and long-term preservation of knowledge.

History and Development

Founding and Partnerships

The Catalogue of Life was founded in June 2001 as a collaborative partnership between Species 2000 and the Integrated Taxonomic Information System (ITIS). Species 2000, an international federation of taxonomic databases, had been initiated in 1997 by Frank Bisby with the aim of coordinating global efforts to index species names uniformly. ITIS, established in 1996 by a U.S. subcommittee involving government agencies from the United States, Canada, and Mexico, focused on providing authoritative taxonomic information for North American species and beyond. The primary goal of this partnership was to develop a single, integrated global index of all known species names, thereby addressing the fragmentation and inconsistencies prevalent in existing taxonomic databases worldwide.

Frank Bisby, as the key initiator through his leadership of Species 2000, played a pivotal role in envisioning and launching the project, emphasizing collaboration among taxonomists to create a comprehensive, validated resource. Species 2000 contributed its expertise in aggregating international databases, while ITIS provided specialized knowledge of North American taxa, forming the foundational backbone for the Catalogue's data integration.

Initial operational costs for the Catalogue of Life were covered by supporting institutions, enabling early development and coordination without relying solely on external grants. This support underscored the commitment of these institutions to fostering a unified platform for species documentation from the project's inception.

Key Milestones and Evolution

The Catalogue of Life released its first annual checklist in 2000, with the 2004 edition compiling approximately 323,000 species from multiple taxonomic databases to provide an initial global index of known organisms. This marked the transition from preliminary prototypes to a standardized annual update process, emphasizing integration of peer-reviewed taxonomic data. By 2007, the catalogue had grown to over 1 million species, reflecting rapid expansion through contributions from international partners and the inclusion of additional groups such as fungi. Subsequent years saw continued growth in species coverage, reaching about 1.35 million species by the 2013 annual checklist, driven by input from over 100 sources and refinements in taxonomic hierarchies.

During the 2010s, CoL began deeper integration with the Global Biodiversity Information Facility (GBIF), enabling the linkage of taxonomic names to occurrence records and improving data interoperability. This period highlighted the catalogue's evolution from a static list to a foundational global resource. A pivotal shift occurred in 2017 with the launch of the Catalogue of Life Plus (CoL+) project, which aimed to develop a dynamic infrastructure for real-time taxonomic updates and broader data incorporation, moving beyond annual static releases to support ongoing expert curation.

In 2021, CoL aligned its operations with emerging global principles for species list governance, including transparent decision-making on synonymy and taxonomic revisions, as outlined in collaborative frameworks to foster a unified index of life on Earth. By this time, the catalogue exceeded 2 million accepted species, underscoring its role in addressing taxonomic instability through consensus-based classifications. The 2025 annual release on July 9 incorporated contributions from hundreds of taxonomic experts and included 2,068,366 living and 152,871 extinct species across all domains of life, for a total of 2,221,237 species, with enhanced coverage of microbes.

On October 7, 2025, CoL launched the eXtended Release (COLXR), integrating additional nomenclatural and distributional data to align with more than 3.5 billion species occurrence records mediated through GBIF, thereby enhancing scalability for large-scale analyses. Throughout its evolution, CoL has tackled challenges such as frequent taxonomic revisions and synonym resolution by implementing multi-step quality control, including expert verification and automated checks for consistency, ensuring reliability amid evolving scientific understanding.

Organizational Framework

Governance and Contributors

The Catalogue of Life (COL) is governed by a structured framework established as a partnership between Species 2000 and the Integrated Taxonomic Information System (ITIS), formalized as a Dutch foundation (KVK 86436481) since 2022. This governance is overseen by a board responsible for international policy, appointments of key personnel, and strategic direction, currently comprising members such as Dr. Edward DeWalt (Acting Chair, USA), Dr. Olaf Bánki (Managing Director, Acting Secretary and Treasurer), and Prof. W. Alex Gray. An advisory Catalogue of Life Global Team, meeting every 6 to 9 months, handles taxonomic and IT policies, work program design, and quality control; its 19 members include taxonomic experts such as Thomas Pape (Chair). Specialized working groups, limited to 15 members each, focus on areas such as information systems and species list governance, ensuring expert input from global institutions.

The contributor network encompasses a global community of hundreds of taxonomic specialists, experts, and institutions that maintain and update checklists. COL integrates data from over 165 peer-reviewed taxonomic databases, with custodians ranging from individual experts to major organizations, covering sectors such as Animalia (e.g., via the World Register of Marine Species for marine groups) and Plantae (e.g., via the World Checklist of Vascular Plants). Regional editors, such as Camila Plata and Diana Hernández, coordinate contributions for specific geographic or taxonomic areas, while thousands of taxonomists worldwide provide consensus-based classifications through checklists hosted on platforms like ChecklistBank. This network emphasizes collaboration with initiatives like the Global Biodiversity Information Facility (GBIF) for infrastructure support and the World Register of Marine Species (WoRMS) for marine taxa, fostering a distributed model of expertise.

COL adheres to an open-access policy, licensing its content under the Creative Commons Attribution 4.0 International license unless otherwise specified, which promotes free access, reuse, and sharing while requiring attribution to original contributors. This approach aligns with global biodiversity standards, enabling integration with resources like GBIF and encouraging broad collaboration among taxonomic communities.

Funding for COL derives from a mix of institutional support and project grants, with the foundation relying on an international consortium including GBIF, the Illinois Natural History Survey, and the Smithsonian Institution for core operations. Historical grants have included support from the European Commission (approximately €20 million over 29 years through projects like BiCIKL, DiSSCo Prepare, and Synthesys+), the US National Science Foundation (NSF), and the Dutch Ministry of Science and Technology via the Netherlands Biodiversity Information Facility. Since 2021, COL has been moving toward a sustainable model through this consortium and targeted EU-funded initiatives like Biodiversity Meets Data and TETTRIs, focusing on policy-relevant species lists and taxonomic integration, supplemented by contributions from the World Bank/Global Environment Facility.

Data Integration Processes

The Catalogue of Life (COL) employs a structured aggregation workflow to compile taxonomic data from diverse global, regional, and specialized sources into a unified checklist. This process relies heavily on ChecklistBank, an open-source platform that hosts and standardizes incoming datasets in the COL Data Package (ColDP) format, enabling dynamic assembly through monthly update cycles. Data providers submit checklists, which are then cross-referenced against existing COL content to identify overlaps, synonyms, and taxonomic alignments using name-matching algorithms that account for variations in spelling and authorship. For instance, the workflow incorporates sources such as the Barcode Index Number (BIN) system from the BOLD database to integrate molecular identifiers, supporting phylogenetic refinements. This phased approach includes preparation of a curated Base Release, followed by the eXtended Release (COL XR), which programmatically enriches the base with additional datasets from over 59,000 checklists.

Quality assurance in COL's integration begins with automated checks to ensure nomenclatural compliance, adhering to standards like the International Code of Zoological Nomenclature (ICZN) for animals and the International Code of Nomenclature for algae, fungi, and plants (ICN) for others. These checks validate name authorship, publication references, and hierarchical consistency, flagging issues such as invalid synonyms or misclassifications for resolution. Expert reviewers from the COL Taxonomy Group and contributing specialists then conduct manual assessments, applying editorial decisions to reconcile discrepancies and maintain taxonomic stability. Community feedback mechanisms, including issue reporting, further support ongoing refinements, ensuring that integrated data reflects consensus on contentious classifications.

Data interoperability is achieved through adherence to Darwin Core standards, which structure taxonomic information into extensible archives for seamless exchange. Hierarchies from kingdom to subspecies are preserved using unique, persistent identifiers such as Life Science Identifiers (LSIDs), assigned to taxa to enable stable linking across versions and external databases. These LSIDs facilitate the representation of complex relationships, including synonyms and parent-child linkages, without altering source data integrity.

Key challenges in data integration include resolving conflicts arising from divergent classifications across sources, such as differing phylogenetic placements informed by molecular versus morphological evidence. COL addresses these through a consensus-building process, prioritizing expert-vetted decisions and selective incorporation of higher classification where gaps exist, while avoiding over-synchronization that could propagate errors. The inclusion of molecular data has enhanced phylogenetic accuracy but introduced complexities in aligning sequence-based clusters with traditional taxonomy, mitigated by linking to external resources like BOLD for verification. This iterative approach ensures a balanced, authoritative index despite source heterogeneity.
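
The name-matching step described in this section, in which incoming names are compared against existing content while tolerating variations in spelling and authorship, can be illustrated with a minimal sketch. It is not the algorithm used by ChecklistBank. The Name class, the match_name function, the status labels, and the 0.9 similarity threshold are all assumptions made for illustration, and the comparison uses Python's standard difflib module.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Name:
    """Hypothetical record: a scientific name split into name and author."""
    canonical: str   # name without author, e.g. "Puma concolor"
    authorship: str  # author string, e.g. "(Linnaeus, 1771)"


def _clean(text: str) -> str:
    """Lower-case and keep only letters and digits, so punctuation is ignored."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def match_name(candidate: Name, existing: List[Name],
               threshold: float = 0.9) -> Tuple[Optional[Name], str]:
    """Return the most similar existing name and a status label.

    The threshold is an illustrative assumption, not a documented value.
    """
    best, best_score = None, 0.0
    for name in existing:
        score = SequenceMatcher(None, _clean(candidate.canonical),
                                _clean(name.canonical)).ratio()
        if score > best_score:
            best, best_score = name, score
    if best is None or best_score < threshold:
        return None, "no match"
    same_spelling = _clean(candidate.canonical) == _clean(best.canonical)
    same_author = _clean(candidate.authorship) == _clean(best.authorship)
    if same_spelling and same_author:
        return best, "exact"
    if same_author:
        return best, "variant spelling"
    return best, "authorship differs"


checklist = [
    Name("Puma concolor", "(Linnaeus, 1771)"),
    Name("Panthera leo", "(Linnaeus, 1758)"),
]
incoming = Name("Puma concolour", "(Linnaeus 1771)")
print(match_name(incoming, checklist)[1])  # variant spelling
```

In this sketch a misspelled name with the same author is reported as a variant spelling, while a name whose author string disagrees is flagged for review instead of being merged automatically.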

Content and Taxonomy

Species Coverage and Classification

The Catalogue of Life (CoL) 2025 release encompasses over 2.2 million accepted names, providing a comprehensive index of nearly all described species, with partial coverage of prokaryotes. This compilation draws on global taxonomic expertise to provide a near-complete inventory of known species, focusing primarily on extant taxa while including some extinct species where data are available. The 2025 release added 48,766 new species, a 2% increase, with notable expansions in crustaceans (Animalia), ferns (Plantae), and prokaryotes through LPSN. The database's scope emphasizes multicellular organisms but extends to microbes through integrated sources, highlighting the ongoing challenge of cataloging microbial diversity.

The species are distributed across the major kingdoms, with the majority in Animalia, followed by Plantae and Fungi, and smaller numbers in the remaining kingdoms. These figures reflect contributions from specialized databases, such as ITIS for animals and AlgaeBase for algae, ensuring a balanced yet incomplete representation of biodiversity.

CoL employs a hierarchical Linnaean system, organizing taxa from domain to infraspecific ranks, including accepted names, synonyms, and common names in multiple languages where available. This structure facilitates precise identification and phylogenetic mapping, with infraspecific taxa (e.g., subspecies) incorporated for groups like vertebrates and flowering plants. The system integrates peer-reviewed taxonomic revisions to maintain nomenclatural stability under the International Code of Zoological Nomenclature and the International Code of Nomenclature for algae, fungi, and plants.

Despite its breadth, CoL exhibits notable gaps. Coverage is strongest in vertebrates and vascular plants, where completeness approaches 95% for described species. In contrast, invertebrates (particularly nematodes and arthropods beyond the major orders) and microbial groups remain underrepresented, comprising less than 10% of estimated totals due to taxonomic challenges and limited expert input. Ongoing efforts prioritize these areas through partnerships with initiatives like WoRMS for marine species and LPSN for prokaryotes, aiming to enhance completeness in underrepresented phyla.
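
The hierarchical arrangement described in this section, with accepted names linked to parent taxa and synonyms linked to accepted names, can be illustrated with a small data-structure sketch. This is not CoL's actual schema. The NameUsage class, its field names, and the identifiers "t1" to "s1" are assumptions made for the example. The taxa themselves are real: Felis concolor is a well-known synonym of Puma concolor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class NameUsage:
    """Hypothetical record for one name in a checklist."""
    id: str
    name: str
    rank: str
    parent_id: Optional[str] = None    # parent taxon, for accepted names
    accepted_id: Optional[str] = None  # accepted name, set only for synonyms


def resolve(usages: Dict[str, NameUsage], usage_id: str) -> NameUsage:
    """Follow a synonym to its accepted name; accepted names resolve to themselves."""
    usage = usages[usage_id]
    return usages[usage.accepted_id] if usage.accepted_id else usage


def classification(usages: Dict[str, NameUsage],
                   usage_id: str) -> List[Tuple[str, str]]:
    """Return (rank, name) pairs from the root of the hierarchy down to the taxon."""
    path: List[Tuple[str, str]] = []
    seen = set()  # guards against accidental cycles in parent links
    current: Optional[NameUsage] = resolve(usages, usage_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)
        path.append((current.rank, current.name))
        current = usages.get(current.parent_id) if current.parent_id else None
    return list(reversed(path))


records = [
    NameUsage("t1", "Animalia", "kingdom"),
    NameUsage("t2", "Chordata", "phylum", parent_id="t1"),
    NameUsage("t3", "Mammalia", "class", parent_id="t2"),
    NameUsage("t4", "Carnivora", "order", parent_id="t3"),
    NameUsage("t5", "Felidae", "family", parent_id="t4"),
    NameUsage("t6", "Puma", "genus", parent_id="t5"),
    NameUsage("t7", "Puma concolor", "species", parent_id="t6"),
    NameUsage("s1", "Felis concolor", "species", accepted_id="t7"),
]
usages = {record.id: record for record in records}

# Looking up the synonym yields the classification of the accepted species.
for rank, name in classification(usages, "s1"):
    print(f"{rank}: {name}")
```

Keeping synonyms as separate records that point to an accepted name, instead of overwriting them, means a search for an older name still leads to the current classification.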

Annual Releases and Quality Control

The Catalogue of Life produces two primary types of releases to balance comprehensiveness with data reliability: the Base Release and the eXtended Release. The Base Release is an annual, expert-curated compilation of non-overlapping taxonomic checklists, emphasizing high accuracy through verification by taxonomic specialists, though it may contain gaps in coverage. In contrast, the eXtended Release builds upon the Base Release by incorporating dynamic, unverified data from over 60,000 partner sources, including global, regional, and thematic databases, to enhance completeness while adding elements like molecular identifiers and vernacular names.

Annual Base Releases follow a consistent schedule, with the 2025 edition published on July 9, incorporating monthly updates accumulated throughout the year. Each release is assigned a persistent DOI for citability and long-term access, such as doi:10.48580/dgr6n for the 2025 version, enabling researchers to reference specific iterations reliably. Historical releases dating back to 2000 (excluding 2001 and 2020) are archived in ChecklistBank, supporting temporal analysis of taxonomic data. The eXtended Release complements this by providing ongoing monthly updates, available for at least one year before integration into the next annual Base Release.

Quality control in the Catalogue of Life employs a multi-step process to maintain data integrity. It begins at the data origin, where raw contributions are converted to the standardized ColDP format and issues such as structural inconsistencies or formatting errors are identified for correction by providers. During assembly in ChecklistBank, automated checks validate identifiers and detect duplicates, misclassifications, and incomplete taxa, with monthly comparisons flagging emerging problems for iterative refinement. Peer review by taxonomic editors and community experts is central to the Base Release, ensuring expert-vetted classifications, while the eXtended Release undergoes initial programmatic checks on additional sources, allowing editors to block erroneous data and solicit community input for rapid improvements. This tiered approach results in stricter controls for the Base Release's high accuracy and more flexible, completeness-focused verification for the eXtended Release, fostering ongoing enhancements through contributor pipelines and metadata reviews.
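
The automated checks described in this section (validating identifiers, detecting duplicates, and flagging incomplete taxa) can be sketched as a simple validation pass over checklist records. The record layout, the field names, and the wording of the issue messages are assumptions made for this example and do not reproduce the ColDP format or ChecklistBank's actual rules.

```python
from collections import Counter
from typing import Dict, List


def check_records(records: List[Dict[str, str]]) -> List[str]:
    """Return human-readable issues found in a list of hypothetical records."""
    issues: List[str] = []
    known_ids = {record["id"] for record in records}

    # Duplicate detection: the same accepted name and rank appearing twice.
    counts = Counter((record["name"], record["rank"])
                     for record in records if record["status"] == "accepted")
    for (name, _rank), count in sorted(counts.items()):
        if count > 1:
            issues.append(f"duplicate accepted name: {name}")

    for record in records:
        if record["status"] == "accepted":
            # Incomplete hierarchy: non-root taxa need a parent that exists.
            if record["rank"] != "kingdom" and record.get("parent_id") not in known_ids:
                issues.append(f"missing parent: {record['name']}")
        elif record["status"] == "synonym":
            # Identifier validation: synonyms must point at an existing record.
            if record.get("accepted_id") not in known_ids:
                issues.append(f"synonym without accepted name: {record['name']}")
    return issues


sample = [
    {"id": "1", "name": "Animalia", "rank": "kingdom", "status": "accepted"},
    {"id": "2", "name": "Puma", "rank": "genus", "status": "accepted", "parent_id": "1"},
    {"id": "3", "name": "Puma concolor", "rank": "species", "status": "accepted", "parent_id": "2"},
    {"id": "4", "name": "Puma concolor", "rank": "species", "status": "accepted", "parent_id": "2"},
    {"id": "5", "name": "Felis concolor", "rank": "species", "status": "synonym", "accepted_id": "9"},
    {"id": "6", "name": "Panthera leo", "rank": "species", "status": "accepted", "parent_id": "7"},
]
for issue in check_records(sample):
    print(issue)
```

Checks of this kind only flag candidates. In the tiered process described above, deciding how to resolve a flagged record remains with providers and editors.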

Features and Accessibility

Database Structure and Search Tools

The Catalogue of Life (COL) employs a robust technical architecture centered on ChecklistBank, an open-source backend infrastructure co-developed by COL and the Global Biodiversity Information Facility (GBIF). ChecklistBank serves as a repository and index for taxonomic checklists, enabling the standardization, publication, analysis, and curation of biodiversity datasets regardless of their original formats. It processes contributions in structured formats like the Catalogue of Life Data Package (ColDP) and supports the assembly of COL's annual releases by integrating global species databases (GSDs) into a unified structure. This backend ensures data consistency through features like dataset versioning, provenance tracking, and quality assessments, handling over 2 million entries with scalability for large-scale taxonomic operations.

The frontend is accessible via the COL portal at catalogueoflife.org, which provides an intuitive interface for users to interact with the database. Programmatic access is provided through the ChecklistBank API, featuring RESTful endpoints for tasks such as name lookups and metadata queries; for instance, endpoints like /name/usage/{id} allow retrieval of taxonomic usages. The API supports access to COL's content, with optional authentication for advanced features like custom downloads. The architecture's modularity allows seamless integration of partner sources, ensuring the portal reflects the latest verified taxonomy while linking to external resources.

Search tools within the COL portal enable advanced querying by scientific name, common name, or higher taxon levels, with filters for kingdoms (e.g., Animalia, Plantae) and geographic regions to refine results. Users can perform fuzzy matching for variant spellings or synonyms, and results include hierarchical navigation through taxonomic ranks. Export functionalities support formats such as CSV for tabular data, RDF for semantic web applications, and full dataset downloads via DOI-linked releases, facilitating integration into external workflows. These tools prioritize usability for both casual explorers and experts, with recent enhancements as of the 2025 release improving query speed for large result sets.

Taxon pages form a core feature, presenting detailed profiles for each entry with sections on geographic distribution, bibliographic references, and media such as images sourced from collaborators such as GBIF. These pages link to primary sources for verification and include identifiers for cross-referencing with other databases. Mobile accessibility is enhanced through app integration, allowing users to query COL taxa directly during field observations via API calls. This feature set supports efficient navigation and reuse without requiring deep technical expertise.

Technically, the system accommodates semantic queries through compatible endpoints linked to GBIF's infrastructure, enabling complex searches across COL's RDF exports for advanced users. Scalability is demonstrated by its management of approximately 5.4 million taxonomic names and synonyms, with ChecklistBank's cloud-based deployment handling high query volumes from global users. Search results draw on expert-verified base releases, though extended data may vary in completeness.
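
As an illustration of the programmatic access described in this section, the following sketch builds a request URL for the /name/usage/{id} path pattern quoted above and shows how a JSON response could be fetched using only Python's standard library. The base URL, the helper names, and the example identifier are assumptions. The path pattern is copied from the text and has not been checked against the live service, so the ChecklistBank API documentation should be consulted before relying on it.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL for the ChecklistBank API; verify against the official docs.
BASE_URL = "https://api.checklistbank.org"


def usage_url(usage_id: str) -> str:
    """Build a URL for the /name/usage/{id} pattern quoted in the text."""
    # quote(..., safe="") escapes characters such as "/" inside the identifier.
    return f"{BASE_URL}/name/usage/{quote(usage_id, safe='')}"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Fetch a URL and decode the body as JSON (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access; "EXAMPLE-ID" is a placeholder.
print(usage_url("EXAMPLE-ID"))
```

Separating URL construction from the network call keeps the first part easy to test offline, and lets the base URL or path be changed in one place if the real endpoint differs from the assumption made here.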

Usage in Research and Conservation

The Catalogue of Life (CoL) serves as a foundational taxonomic backbone for phylogenetic studies and modeling, providing standardized classifications that enable researchers to integrate diverse datasets across global scales. For instance, it has been used in constructing higher-level classifications of all living organisms, facilitating analyses of evolutionary relationships and diversification patterns among millions of species. In modeling, CoL data support community-driven syntheses that assemble expert checklists for assessing ecological patterns, aiding in the prediction of distributions and risks.

In conservation, CoL underpins IUCN Red List assessments by supplying verified taxonomic references for evaluating threat statuses of over 172,000 species, ensuring consistent nomenclature in global conservation priorities. It integrates with the Global Biodiversity Information Facility (GBIF) to map species occurrences, supporting protected area planning through tools that align occurrence data with taxonomic indices for optimizing habitat protection. Examples include its application in assessing ecological representation within China's protected areas network and identifying gaps in coverage for key functional groups.

CoL informs policy and education by offering reliable species data for governmental biosecurity measures and environmental impact assessments, particularly in monitoring invasive alien species and their ecological effects. In biosecurity, it provides the taxonomic framework for risk assessments of non-native species, as seen in initiatives cataloging invasive species management projects. For education, CoL functions as an accessible tool for species identification in platforms such as iSpot and in academic curricula, promoting hands-on learning through integrated dictionaries. As a free, open-access resource recognized as a Global Core Biodata Resource, CoL enhances global equity in biodiversity data by enabling discoveries such as the addition of 48,766 new names in its 2025 annual release, fostering linkages between previously disparate taxonomic records worldwide.

Extensions and Future Directions

Catalogue of Life Plus Initiative

The Catalogue of Life Plus (CoL+) project, initiated in 2017 and concluding in 2019, represented a major collaborative effort to modernize the Catalogue of Life by developing a robust, service-oriented infrastructure for global taxonomic data. Funded primarily by the Netherlands Biodiversity Information Facility (NLBIF) and the Netherlands Ministry of Education, Culture and Science, with contributions from partners including Species 2000 and the Illinois Natural History Survey, the initiative aimed to address limitations in the static annual checklists by fostering greater interoperability among databases.

Key objectives of CoL+ included establishing a global clearinghouse for scientific names, integrating additional taxonomic sources beyond the core Catalogue to create an extended, expert-reviewed catalogue, and separating nomenclature from taxonomic classifications using unique stable identifiers for enhanced precision. The project emphasized building APIs and web services to enable dynamic access to data, reducing reliance on inconsistent text-string matching algorithms in favor of standardized, interoperable systems that support content providers like nomenclatural databases and regional lists. Additionally, it sought to incorporate supplementary data such as references, vernacular names, and links to enrich the catalogue's utility for research.

Among its achievements, CoL+ rebuilt the Catalogue's core infrastructure, including a data store, importer tools, and enhanced editorial systems, which facilitated the integration of diverse sources and improved name resolution through the GBIF Backbone Taxonomy framework. This work culminated in the development of the ChecklistBank platform, providing access to the evolving catalogue via web services and reusable interface components, thereby enabling real-time querying and reducing duplication of effort. The project also piloted connections with partners such as Plazi, demonstrating practical interoperability for name curation and taxonomic resolution.

The legacy of CoL+ lies in its foundational role for subsequent advancements in the Catalogue of Life, transitioning the platform from periodic static releases to a continuously updated dynamic system that supports ongoing expert contributions and broader data linkages. This infrastructure has underpinned post-2019 enhancements, including expanded coverage of microbial and other taxa, and serves as a cornerstone for initiatives like the eXtended Release (COLXR), promoting a more comprehensive and accessible global index of life.

eXtended Release (COLXR) Developments

The eXtended Release (COLXR) of the Catalogue of Life was launched on October 7, 2025, marking a significant extension of the CoL Plus initiative by expanding taxonomic coverage to support the classification of over 3.5 billion species occurrence records hosted by the Global Biodiversity Information Facility (GBIF). This release builds upon the expert-curated Base Release by programmatically integrating approximately 17,500 additional data sources, including regional, national, and management checklists as well as digitized sources, to address gaps in the verified dataset.

Key new features include the dynamic incorporation of unverified scientific names, authorships, references, vernacular names, and higher classifications from these extended sources, which are distinctly marked with an XR icon to indicate their provisional status alongside verified Base Release data. Enhanced linkages to occurrence records are facilitated through GBIF's adoption of COLXR as its primary taxonomic backbone, enabling more precise mapping of observations to taxonomic concepts. The assembly process has been upgraded for monthly updates, allowing for continuous integration of open-access data under CC0 or CC BY licenses, with quality gradients clearly delineating verified content from extended, potentially overlapping entries.

Technically, COLXR leverages GBIF's infrastructure for scalable data processing while maintaining the Catalogue's role as the authoritative backbone, incorporating molecular data such as Barcode Index Numbers and DNA-based Species Hypotheses to bridge taxonomic uncertainties. This approach ensures broader coverage of described species without compromising the integrity of core classifications. Looking ahead, the release paves the way for integrating environmental DNA (eDNA) sequences to encompass undescribed diversity, aiming to progressively close gaps in representation through ongoing community feedback and source expansion.
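
The idea of enriching a verified base with additional, provisionally marked entries can be sketched as a simple merge. This is an illustration under stated assumptions, not the actual COLXR assembly process. The function name, the record layout, and the "origin" field (standing in for the XR marker) are hypothetical, and matching by lower-cased name is a deliberate simplification.

```python
from typing import Dict, Iterable, List


def build_extended(base: Iterable[Dict[str, str]],
                   extra_sources: Iterable[Iterable[Dict[str, str]]]
                   ) -> List[Dict[str, str]]:
    """Merge extra sources into a base list, keeping base records authoritative.

    Records already present in the base are never overwritten; names found
    only in extra sources are added once and marked as extended.
    """
    merged: List[Dict[str, str]] = []
    seen = set()
    for record in base:
        merged.append({**record, "origin": "base"})
        seen.add(record["name"].lower())
    for source in extra_sources:
        for record in source:
            key = record["name"].lower()
            if key in seen:
                continue  # already covered by the base or an earlier source
            merged.append({**record, "origin": "xr"})
            seen.add(key)
    return merged


base_release = [{"name": "Puma concolor"}]
regional_list = [{"name": "puma concolor"}, {"name": "Exampleus provisionalis"}]
national_list = [{"name": "Exampleus provisionalis"}]

for record in build_extended(base_release, [regional_list, national_list]):
    print(record["origin"], record["name"])
```

The made-up name "Exampleus provisionalis" stands in for a name that occurs only in extended sources. Carrying the origin on every record is what allows an interface to show verified and provisional entries side by side while keeping them distinguishable.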
