Catalogue of Life
The Catalogue of Life (CoL) is an online database that provides an index of known species of animals, plants, fungi, and microorganisms. It was created in 2001 as a partnership between the global Species 2000 and the American Integrated Taxonomic Information System. The Catalogue is used by research scientists, citizen scientists, educators, and policy makers.[1] The Catalogue is also used by the Biodiversity Heritage Library, the Barcode of Life Data System, Encyclopedia of Life, and the Global Biodiversity Information Facility.[2] The Catalogue currently compiles data from 165 peer-reviewed taxonomic databases that are maintained by specialist institutions around the world. As of September 2022, the COL Checklist lists 2,067,951[3] of the roughly 2.2 million extant species currently known to taxonomists.
Structure
The Catalogue of Life employs a simple data structure to provide information on synonymy, grouping within a taxonomic hierarchy, common names, distribution and ecological environment.[4] It provides a dynamic edition,[5] which is updated monthly (and in which data can change without tracking of those changes) and an Annual Checklist,[6] which provides a dated, verifiable reference for the usage of names and associated data. Development of the Catalogue of Life was funded through the Species 2000 europa (EuroCat),[7] 4d4Life,[8] and i4Life[9] projects in 2003–2013, and later by the Naturalis Biodiversity Center, Leiden, the Netherlands and the Species File Group at the Illinois Natural History Survey in Champaign-Urbana.
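As an illustration only, a record carrying these elements (synonymy, position in a hierarchy, common names, distribution and environment) might be modelled as follows. The `Taxon` class, its field names, and the sample values are assumptions made for this sketch and do not reflect the Catalogue's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Taxon:
    """Illustrative record; field names are assumptions, not CoL's schema."""
    name: str                                   # accepted scientific name
    rank: str                                   # e.g. "genus", "species"
    parent: Optional["Taxon"] = None            # next higher taxon in the hierarchy
    synonyms: list[str] = field(default_factory=list)
    common_names: dict[str, str] = field(default_factory=dict)  # language code -> name
    distribution: list[str] = field(default_factory=list)
    environment: list[str] = field(default_factory=list)

    def lineage(self) -> list[str]:
        """Return names from the highest ancestor down to this taxon."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))


# Intermediate ranks (phylum, class, order) are omitted to keep the sketch short.
animalia = Taxon("Animalia", "kingdom")
felidae = Taxon("Felidae", "family", parent=animalia)
panthera = Taxon("Panthera", "genus", parent=felidae)
lion = Taxon(
    "Panthera leo",
    "species",
    parent=panthera,
    synonyms=["Felis leo"],
    common_names={"en": "lion"},
    distribution=["Africa", "India"],
    environment=["terrestrial"],
)

print(" > ".join(lion.lineage()))
```

Storing only a parent link on each record keeps the hierarchy in one place: moving a genus to another family changes a single reference rather than every species beneath it.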
The people currently governing the CoL,[10] its contributors,[11] and other information that changes over time are listed on the CoL website.
Usage
The Catalogue is used largely to provide a backbone taxonomy for other global data portals and biological collections. Through the i4Life project, it has formal partnerships with Global Biodiversity Information Facility, European Nucleotide Archive, Encyclopedia of Life, European Consortium for the Barcode of Life, IUCN Red List, and LifeWatch. The public interface includes both search and browse functions and offers multilingual services.[2]
The Catalogue listed 300,000 species by 2003, 500,000 species by 2005, and over 800,000 species by 2006.[12] As of 2019, the Catalogue listed 1.9 million extant and extinct species.[13] There are an estimated 14 million mainly unpublished species; however, this number is uncertain as there is a lack of data on the possible number of undescribed insects, nematodes, bacteria, fungi and many others.[14]
Catalogue of Life Plus
In 2015, an expert panel presented a consensus hierarchical classification of life[15] which included some sectors not yet represented in the published Catalogue. In the same year, the Catalogue of Life, Barcode of Life Data System, Biodiversity Heritage Library, Encyclopedia of Life, and the Global Biodiversity Information Facility (GBIF) met to consider building a single shared authoritative nomenclatural and taxonomic foundation, "Catalogue of Life Plus" (COL+), that could be used to order and connect biodiversity data, including content not yet in CoL but available via other sources. The aim was to serve both the users of the present Catalogue and users of extended taxonomic content (such as GBIF) through a common infrastructure. COL+ will develop a clearinghouse covering scientific names across all life, provide a single taxonomic view, and provide an avenue for feedback from content authorities.[2] The CoL is being developed in conjunction with the Global Species List Working Group to avoid replication and work towards an authoritative global list of species.
References
- Harmon, Joanie (2 December 2016). "Animal, vegetable, data: Exploring the online 'Catalogue of Life'". UCLA News Room. Archived from the original on 24 June 2018. Retrieved 23 June 2018.
- Bánki, Olaf; Döring, Markus; Holleman, Ayco; Addink, Wouter (2018). "Catalogue of Life Plus: Innovating the CoL systems as a foundation for a clearinghouse for names and taxonomy". Biodiversity Information Science and Standards. 2 e26922. doi:10.3897/biss.2.26922.
- "COL". Archived from the original on 14 June 2021. Retrieved 12 June 2021.
- "About the Catalogue of Life: 2018 Annual Checklist". Catalogue of Life. Species 2000. Archived from the original on 30 May 2018. Retrieved 10 May 2018.
- "Catalogue of Life – 30th October 2017: Search all names". catalogueoflife.org. Archived from the original on 16 May 2015. Retrieved 5 May 2015.
- "Catalogue of Life – 2017 Annual Checklist: Search all names". catalogueoflife.org. Archived from the original on 27 April 2007. Retrieved 11 April 2007.
- "Welcome to Species 2000 europa". European Catalogue of Life Project. 3 February 2008. Archived from the original on 3 February 2008.
- Orvill. "Home – 4D4Life". Archived from the original on 13 October 2016.
- "i4life: Indexing For Life". i4life.eu. Archived from the original on 13 June 2013. Retrieved 4 October 2012.
- "COL Governance". CoL. 18 July 2023. Updated as required.
- "The COL contributors". CoL. July 2023. Updated as required.
- Cachuela-Palacio, Monalisa (2006). "Towards an index of all known species: The Catalogue of Life, its rationale, design and use". Integrative Zoology. 1 (1): 18–21. doi:10.1111/j.1749-4877.2006.00007.x. PMID 21395986.
- "Species estimates". Catalogue of Life. Species 2000. Archived from the original on 9 February 2019. Retrieved 7 February 2019.
- United Nations Environment Programme (2002). Global Environment Outlook 3: Past, Present and Future Perspectives. EarthScan Publications, London. p. 120.
- Ruggiero, Michael A; Gordon, Dennis P; Orrell, Thomas M; Bailly, Nicolas; Bourgoin, Thierry; Brusca, Richard C; Cavalier-Smith, Thomas; Guiry, Michael D; Kirk, Paul M (2015). "A Higher Level Classification of All Living Organisms". PLOS ONE. 10 (4) e0119248. Bibcode:2015PLoSO..1019248R. doi:10.1371/journal.pone.0119248. PMC 4418965. PMID 25923521.
Bibliography
- Blundell, Nigel (8 December 2005). "There's more to life on Earth". Telegraph Online. Retrieved 3 May 2012.
External links
- Catalogue of Life: historical checklist downloads
- 2022 Annual Checklist, 2,065,448 species – https://doi.org/10.48580/dfq8
- 2021 Annual Checklist, 2,008,947 species – https://doi.org/10.48580/dfq8
- 2019 Annual Checklist, 1,900,983 species (incl. 63,418 extinct species)
- 2018 Annual Checklist, 1,803,488 species (incl. 59,284 extinct species)
- 2017 Annual Checklist, 1,713,852 species (incl. 49,346 extinct species)
- 2016 Annual Checklist, 1,640,969 species
- 2015 Annual Checklist, 1,606,554 species
- 2014 Annual Checklist, 1,578,063 species
- 2013 Annual Checklist, 1,352,112 species
- 2012 Annual Checklist, 1,404,038 species
- 2011 Annual Checklist, 1,347,224 species
- 2010 Annual Checklist, 1,257,735 species
- 2009 Annual Checklist, 1,160,711 species
- 2008 Annual Checklist, 1,105,589 species
- 2007 Annual Checklist, 1,008,965 species
- 2006 Annual Checklist, 884,552 species
- 2005 Annual Checklist, 526,323 species
- A list of contributing databases
Catalogue of Life
History and Development
Founding and Partnerships
The Catalogue of Life was founded in June 2001 as a collaborative partnership between Species 2000 and the Integrated Taxonomic Information System (ITIS).[4] Species 2000, an international federation of taxonomic databases, had been initiated in 1997 by Frank Bisby at the University of Reading in the United Kingdom, with the aim of coordinating global efforts to index species names uniformly.[4] ITIS, established in 1996 by a U.S. White House subcommittee involving government agencies from the United States, Canada, and Mexico, focused on providing authoritative taxonomic information for North American species and beyond.[5]
The primary goal of this partnership was to develop a single, integrated global index of all known species names, thereby addressing the fragmentation and inconsistencies prevalent in existing taxonomic databases worldwide.[4] Frank Bisby, as the key initiator through his leadership of Species 2000, played a pivotal role in envisioning and launching the project, emphasizing collaboration among taxonomists to create a comprehensive, validated resource.[4] Species 2000 contributed its expertise in aggregating international databases, while ITIS provided specialized knowledge on North American taxonomy, forming the foundational backbone for the Catalogue's data integration.[4]
Initial operational costs for the Catalogue of Life were covered by the University of Reading and the Naturalis Biodiversity Center in the Netherlands, enabling the early development and coordination efforts without relying solely on external grants.[8] This support underscored the commitment of these institutions to fostering a unified platform for biodiversity documentation from its inception.[8]
Key Milestones and Evolution
The Catalogue of Life released its first annual checklist in 2000; the 2004 edition compiled approximately 323,000 species from multiple taxonomic databases to provide an initial global index of known organisms.[1] These early editions marked the transition from preliminary prototypes to a standardized annual update process, emphasizing integration of peer-reviewed taxonomic data. By 2007, the catalogue had grown to over 1 million species, reflecting rapid expansion through contributions from international partners and the inclusion of additional groups such as bacteria and fungi.[9]
Subsequent years saw continued growth in species coverage, reaching about 1.35 million species by the 2013 annual checklist, driven by enhanced data aggregation from over 100 sources and refinements in taxonomic hierarchies.[10] During the 2010s, CoL began deeper integration with the Global Biodiversity Information Facility (GBIF), enabling the linkage of taxonomic names to occurrence records and improving data interoperability for biodiversity research.[7] This period highlighted the catalogue's evolution from a static list to a foundational resource for global biodiversity informatics.
A pivotal shift occurred in 2017 with the launch of the Catalogue of Life Plus (CoL+) project, which aimed to develop a dynamic infrastructure for real-time taxonomic updates and broader data incorporation, moving beyond annual static releases to support ongoing expert curation.[11] In 2021, CoL aligned its operations with emerging global principles for species list governance, including transparent decision-making on synonymy and taxonomic revisions, as outlined in collaborative frameworks to foster a unified index of life on Earth.[4] By this time, the catalogue exceeded 2 million accepted species, underscoring its role in addressing taxonomic instability through consensus-based classifications.[5]
The 2025 annual release on July 9 incorporated contributions from hundreds of taxonomic experts, including 2,068,366 living species and 152,871 extinct species across all domains of life for a total of 2,221,237 species, with enhanced coverage of microbes.[1] On October 7, 2025, CoL launched the eXtended Release (COL XR), integrating additional nomenclatural and distributional data to align with more than 3.5 billion species occurrence records mediated through GBIF, thereby enhancing scalability for large-scale analyses.[12]
Throughout its evolution, CoL has tackled challenges such as frequent taxonomic revisions and synonym resolution by implementing multi-step quality assurance, including expert verification and automated checks for consistency, ensuring reliability amid evolving scientific understanding.[13]
Organizational Framework
Governance and Contributors
The Catalogue of Life (COL) is governed by a structured framework established as a partnership between Species 2000 and the Integrated Taxonomic Information System (ITIS), formalized as a Dutch foundation (KVK 86436481) since 2022.[14] This governance is overseen by a Board of Directors responsible for international policy, appointments of key personnel, and strategic direction, currently comprising members such as Dr. Edward DeWalt (Acting Chair, USA), Dr. Olaf Bánki (Managing Director, Acting Secretary & Treasurer, Netherlands), and Prof. W. Alex Gray (UK).[14] An advisory Catalogue of Life Global Team, meeting every 6-9 months, handles taxonomic and IT policies, work program design, and quality control, with 19 members including taxonomic experts like Thomas Pape (Chair, Denmark) from institutions such as the Smithsonian Institution and Naturalis Biodiversity Center.[14] Specialized working groups, limited to 15 members each, focus on areas like taxonomy, information systems, and species list governance, ensuring expert input from global institutions.[14]
The contributor network encompasses a vast global community of hundreds of taxonomic specialists, informatics experts, and institutions that maintain and update species checklists.[6] COL integrates data from over 165 peer-reviewed taxonomic databases, with custodians ranging from individual experts to major organizations, covering sectors like Animalia (e.g., via the World Register of Marine Species for marine groups) and Plantae (e.g., World Checklist of Vascular Plants for Fabaceae).[5] Regional editors, such as Camila Plata in Colombia and Diana Hernández in Mexico, coordinate contributions for specific geographic or taxonomic areas, while thousands of taxonomists worldwide provide consensus-based classifications through community-managed checklists hosted on platforms like ChecklistBank.[15] This network emphasizes collaboration with initiatives like the Global Biodiversity Information Facility (GBIF) for infrastructure support and the World Register of Marine Species (WoRMS) for marine taxonomy, fostering a distributed model of expertise.[15][16]
COL adheres to an open data policy, licensing its content under the Creative Commons Attribution 4.0 International License unless otherwise specified, which promotes free access, reuse, and sharing while requiring attribution to original contributors.[17] This approach aligns with global biodiversity standards, enabling integration with resources like GBIF and encouraging broad collaboration among taxonomic communities.[18]
Funding for COL derives from a mix of institutional support and project grants, with the foundation relying on an international consortium including GBIF, the Illinois Natural History Survey, and the Smithsonian Institution for core operations.[8] Historical grants have included support from the European Commission (approximately €20 million over 29 years through projects like BiCIKL, DiSSCo Prepare, and Synthesys+), the US National Science Foundation (NSF), and the Dutch Ministry of Science and Technology via the Netherlands Biodiversity Information Facility.[19][8] Post-2021, COL has transitioned toward a sustainable model through this consortium and targeted EU-funded initiatives like Biodiversity Meets Data and TETTRIs, focusing on policy-relevant species lists and taxonomic integration, supplemented by contributions from the World Bank/Global Environment Facility.[8][19]
Data Integration Processes
The Catalogue of Life (COL) employs a structured aggregation workflow to compile taxonomic data from diverse global, regional, and specialized sources into a unified checklist. This process relies heavily on ChecklistBank, an open-source platform that hosts and standardizes incoming datasets in the COL Data Package (ColDP) format, enabling dynamic assembly through monthly update cycles. Data providers submit checklists, which are then cross-referenced against existing COL content to identify overlaps, synonyms, and taxonomic alignments using name-matching algorithms that account for variations in spelling and authorship. For instance, the workflow incorporates sources such as the Barcode Index Number (BIN) system from the BOLD database to integrate molecular identifiers, supporting phylogenetic refinements since the early 2010s. This phased approach includes preparation of a curated Base Release, followed by the eXtended Release (COL XR), which programmatically enriches the base with additional datasets from over 59,000 checklists.[20][21][22]
Quality assurance in COL's integration begins with automated checks to ensure nomenclatural compliance, adhering to standards like the International Code of Zoological Nomenclature (ICZN) for animals and the International Code of Nomenclature for algae, fungi, and plants (ICN) for others. These checks validate name authorship, publication references, and hierarchical consistency, flagging issues such as invalid synonyms or misclassifications for resolution. Expert reviewers from the COL Taxonomy Group and contributing specialists then conduct manual assessments, applying editorial decisions to reconcile discrepancies and maintain taxonomic stability. Community feedback mechanisms, including GitHub issue reporting, further support ongoing refinements, ensuring that integrated data reflects consensus on contentious classifications.[20][21][22]
Data interoperability is achieved through adherence to Darwin Core standards, which structure taxonomic information into extensible archives for seamless exchange. Hierarchies from kingdom to subspecies are preserved using unique, persistent identifiers such as Life Science Identifiers (LSIDs), assigned to taxa to enable stable linking across versions and external databases. These LSIDs facilitate the representation of complex relationships, including synonyms and parent-child linkages, without altering source data integrity.[23][24][25]
Key challenges in data integration include resolving conflicts arising from divergent classifications across sources, such as differing phylogenetic placements informed by molecular versus morphological evidence. COL addresses these through a consensus-building process, prioritizing expert-vetted decisions and selective incorporation of higher taxonomy where gaps exist, while avoiding over-synchronization that could propagate errors. Since the 2010s, the inclusion of molecular data has enhanced phylogenetic accuracy but introduced complexities in aligning sequence-based clusters with traditional nomenclature, mitigated by linking to external resources like BOLD for verification. This iterative approach ensures a balanced, authoritative index despite source heterogeneity.[20][21][26]
Content and Taxonomy
Species Coverage and Classification
The Catalogue of Life (CoL) 2025 release encompasses over 2.2 million accepted species names, providing a comprehensive index of nearly all described species, with partial coverage of prokaryotes.[6] This extensive compilation draws from global taxonomic expertise to provide a near-complete inventory of known biodiversity, focusing primarily on extant taxa while including some extinct species where data is available. The 2025 release added 48,766 new species, a 2% increase, with notable expansions in insects and crustaceans (Animalia), ferns (Plantae), and prokaryotes through LPSN. The database's scope emphasizes multicellular organisms but extends to microbes through integrated sources, highlighting the ongoing challenge of cataloging microbial diversity.
The species are primarily distributed across major kingdoms, with the majority in Animalia, followed by Plantae, Fungi, Chromista, and smaller numbers in Bacteria, Archaea, and Protozoa. These figures reflect contributions from specialized databases, such as ITIS for animals and AlgaeBase for Chromista, ensuring a balanced yet incomplete representation of global biodiversity.
CoL employs a hierarchical Linnaean classification system, organizing taxa from domain to infraspecific ranks, including accepted names, synonyms, and common names in multiple languages where available. This structure facilitates precise identification and phylogenetic mapping, with infraspecific taxa (e.g., subspecies) incorporated for groups like vertebrates and flowering plants. The system integrates peer-reviewed taxonomic revisions to maintain nomenclatural stability under the International Code of Zoological Nomenclature and International Code of Nomenclature for algae, fungi, and plants.
Despite its breadth, CoL exhibits notable gaps, with strongest coverage in vertebrates and vascular plants, where completeness approaches 95% for described species. In contrast, invertebrates (particularly nematodes and arthropods beyond major orders) and microbial groups like Bacteria and Archaea remain underrepresented, comprising less than 10% of estimated totals due to taxonomic challenges and limited expert input. Ongoing efforts prioritize these areas through partnerships with initiatives like WoRMS for marine invertebrates and LPSN for prokaryotes, aiming to enhance completeness in underrepresented phyla.[6]
Annual Releases and Quality Control
The Catalogue of Life produces two primary types of releases to balance comprehensiveness with data reliability: the Base Release and the eXtended Release. The Base Release is an annual, expert-curated compilation of non-overlapping taxonomic checklists, emphasizing high accuracy through verification by taxonomic specialists, though it may contain gaps in coverage. In contrast, the eXtended Release builds upon the Base Release by incorporating dynamic, unverified data from over 60,000 partner sources, including global, regional, and thematic databases, to enhance completeness while adding elements like molecular identifiers and vernacular names.[3]
Annual Base Releases follow a consistent schedule, with the 2025 edition published on July 9, incorporating monthly updates accumulated throughout the year. Each release is assigned a persistent DOI for citability and long-term access, such as doi:10.48580/dgr6n for the 2025 version, enabling researchers to reference specific iterations reliably. Historical releases dating back to 2000 (excluding 2001 and 2020) are archived in ChecklistBank, supporting reproducibility and temporal analysis of taxonomic data. The eXtended Release complements this by providing ongoing monthly updates, available for at least one year before integration into the next annual Base Release.[6][3]
Quality control in the Catalogue of Life employs a multi-step process to maintain data integrity, beginning at the data origin where raw contributions are converted to the standardized ColDP format, identifying issues like structural inconsistencies or formatting errors for correction by providers. During assembly in ChecklistBank, automated checks validate identifiers, detect duplicates, misclassifications, and incomplete taxa, with monthly comparisons flagging emerging problems for iterative refinement. Peer review by taxonomic editors and community experts is central to the Base Release, ensuring expert-vetted classifications, while the eXtended Release undergoes initial programmatic checks on additional sources, allowing editors to block erroneous data and solicit community input for rapid improvements. This tiered approach results in stricter controls for the Base Release's high accuracy and more flexible, completeness-focused verification for the eXtended Release, fostering ongoing enhancements through contributor pipelines and metadata reviews.[13][3]
Features and Accessibility
Database Structure and Search Tools
The Catalogue of Life (COL) employs a robust technical architecture centered on ChecklistBank, an open-source backend infrastructure co-developed by COL and the Global Biodiversity Information Facility (GBIF). ChecklistBank serves as a repository and index for taxonomic checklists, enabling the standardization, publication, analysis, and curation of biodiversity datasets regardless of their original formats. It processes contributions in structured formats like the Catalogue of Life Data Package (ColDP) and supports the assembly of COL's annual releases by integrating global species databases (GSDs) into a unified structure. This backend ensures data consistency through features like dataset versioning, provenance tracking, and quality assessments, handling over 2 million species entries with scalability for large-scale taxonomic operations.[27][28][29]
The frontend is accessible via the COL web portal at catalogueoflife.org, which provides an intuitive interface for users to interact with the database. Programmatic access is facilitated through the ChecklistBank API, featuring RESTful endpoints for tasks such as taxon name lookups, dataset retrieval, and metadata queries. For instance, endpoints like /name/usage/{id} allow retrieval of taxonomic usages. This API supports open access to COL's content, with optional authentication for advanced features like custom downloads. The architecture's modularity allows seamless integration of partner datasets, ensuring the portal reflects the latest verified taxonomy while linking to external resources.[2][30][31]
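As a rough illustration of how a client might address such an endpoint, the sketch below only builds the request URL from the path pattern quoted above and makes no network call. The host `api.example.org`, the function name `name_usage_url`, and the sample identifiers are placeholders assumed for this example; the actual base URL and any dataset-specific path prefix should be taken from the ChecklistBank API documentation.

```python
from urllib.parse import quote

# Assumption: placeholder host for illustration; the text above does not give
# the API's base URL.
BASE_URL = "https://api.example.org"


def name_usage_url(usage_id: str, base_url: str = BASE_URL) -> str:
    """Build a lookup URL following the /name/usage/{id} pattern."""
    # Percent-encode the identifier so characters such as "/" or spaces
    # cannot alter the path structure.
    encoded_id = quote(str(usage_id), safe="")
    return f"{base_url.rstrip('/')}/name/usage/{encoded_id}"


print(name_usage_url("ABC123"))  # hypothetical identifier
print(name_usage_url("a/b c"))   # identifier that needs percent-encoding
```

Keeping URL construction in one small function makes it easy to swap in the real base URL later and to test identifier escaping without touching the network.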
Search tools within the COL portal enable advanced querying by scientific name, common name, or higher taxon levels, with filters for kingdoms (e.g., Animalia, Plantae) and geographic regions to refine results. Users can perform fuzzy matching for variant spellings or synonyms, and results include hierarchical navigation through taxonomic ranks. Export functionalities support formats such as CSV for tabular data, RDF for semantic web applications, and full dataset downloads via DOI-linked releases, facilitating integration into external workflows. These tools prioritize usability for both casual explorers and experts, with recent enhancements as of the 2025 release improving query speed for large result sets.[32]
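The idea of fuzzy matching for variant spellings can be sketched with Python's standard `difflib` module. This is not the portal's own matching algorithm, which is not described here; the name list, the `fuzzy_lookup` function, and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Small illustrative name list; a real index would hold millions of names.
KNOWN_NAMES = ["Panthera leo", "Panthera tigris", "Panthera pardus", "Puma concolor"]


def fuzzy_lookup(query: str, names=KNOWN_NAMES, cutoff: float = 0.8) -> list[str]:
    """Return known names similar to the query, best match first."""
    # Compare case-insensitively, then map hits back to the canonical spelling.
    by_lower = {name.lower(): name for name in names}
    hits = get_close_matches(query.strip().lower(), list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


print(fuzzy_lookup("Pantera leo"))    # misspelled genus
print(fuzzy_lookup("Quercus robur"))  # name absent from the list
```

A higher cutoff reduces false matches between congeneric species, whose names share a long common prefix, at the cost of missing more heavily misspelled queries.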
Taxon pages form a core feature, presenting detailed profiles for each entry with sections on geographic distribution (often mapped via partner data), bibliographic references, and multimedia like images sourced from collaborators such as Wikimedia Commons or GBIF. These pages link to primary sources for verification and include identifiers for cross-referencing with other biodiversity databases. Mobile accessibility is enhanced through integration with the iNaturalist app, allowing users to query COL taxa directly during field observations via API calls. This feature set supports efficient navigation and data reuse without requiring deep technical expertise.[2]
Technically, the system accommodates SPARQL queries through compatible endpoints linked to GBIF's infrastructure, enabling complex semantic searches across COL's RDF exports for advanced users. Scalability is demonstrated by its management of approximately 5.4 million taxonomic names and synonyms, with ChecklistBank's cloud-based deployment handling high query volumes from global users. Data quality in searches draws from expert-verified base releases, though extended data may vary in completeness.[7][28][32]
