Entrez
from Wikipedia

The Entrez (/ɒnˈtr/)[1] Global Query Cross-Database Search System is a federated search engine, or web portal, that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website.[2] The NCBI is a part of the National Library of Medicine (NLM), which is itself a department of the National Institutes of Health (NIH), which in turn is a part of the United States Department of Health and Human Services. The name "Entrez" (a greeting meaning "Come in" in French) was chosen to reflect the spirit of welcoming the public to search the content available from the NLM.

Entrez Global Query is an integrated search and retrieval system that provides access to all databases simultaneously with a single query string and user interface. Entrez can efficiently retrieve related sequences, structures, and references. The Entrez system can provide views of gene and protein sequences and chromosome maps. Some textbooks are also available online through the Entrez system.

Features

The Entrez front page provides, by default, access to the global query. All databases indexed by Entrez can be searched via a single query string, supporting Boolean operators and search term tags to limit parts of the search statement to particular fields. This returns a unified results page that shows the number of hits for the search in each of the databases, each linked to the actual search results for that particular database.

Entrez also provides a similar interface for searching each particular database and for refining search results. The Limits feature allows the user to narrow a search through a web forms interface. The History feature gives a numbered list of recently performed queries. Results of previous queries can be referred to by number and combined via Boolean operators. Search results can be saved temporarily in a Clipboard. Users with a MyNCBI account can save queries indefinitely, and can also choose to have updates with new search results e-mailed for saved queries of most databases. Entrez is widely used in the field of biotechnology as a reference tool for students and professionals alike.

Databases

Entrez searches the following databases:

  • PubMed: biomedical literature citations and abstracts, including Medline—articles from (mainly medical) journals, often including abstracts. Links to PubMed Central and other full-text resources are provided for articles from the 1990s.
  • PubMed Central: free, full-text journal articles
  • Site Search: NCBI web and FTP web sites
  • Books: online books
  • Online Mendelian Inheritance in Man (OMIM)
  • Nucleotide: sequence database (GenBank)
  • Protein: sequence database (GenPept)
  • Genome: whole genome sequences and mapping
  • Structure: three-dimensional macromolecular structures
  • Taxonomy: organisms in GenBank Taxonomy
  • dbSNP: single nucleotide polymorphism
  • Gene:[3] gene-centered information
  • HomoloGene: eukaryotic homology groups
  • PubChem Compound: unique small molecule chemical structures
  • PubChem Substance: deposited chemical substance records
  • Genome Project: genome project information
  • UniGene: gene-oriented clusters of transcript sequences
  • CDD: conserved protein domain database
  • PopSet: population study data sets (epidemiology)
  • GEO Profiles: expression and molecular abundance profiles
  • GEO DataSets: experimental sets of GEO data
  • Sequence read archive: high-throughput sequencing data
  • Cancer Chromosomes: cytogenetic databases
  • PubChem BioAssay: bioactivity screens of chemical substances
  • Probe: sequence-specific reagents
  • NLM Catalog: NLM bibliographic data for over 1.2 million journals, books, audiovisuals, computer software, electronic resources, and other materials resident in LocatorPlus (updated every weekday).

Access

In addition to using the search engine forms to query the data in Entrez, NCBI provides the Entrez Programming Utilities[4] (eUtils) for more direct access to query results. The eUtils are accessed by posting specially formed URLs to the NCBI server, and parsing the XML response. There was also an eUtils SOAP interface which was terminated in July 2015.[5]
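
The request-and-parse cycle described above can be sketched roughly as follows. This is a minimal illustration, not NCBI's reference client: the helper names `build_esearch_url` and `parse_esearch_ids` are hypothetical, the sample XML is hand-written to mimic the general shape of an ESearch reply, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the E-utilities endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for `term` in database `db` (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(xml_text):
    """Extract the list of UIDs from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./IdList/Id")]


# Hand-written sample response; the UIDs are placeholders, not real records.
SAMPLE_RESPONSE = (
    "<eSearchResult><Count>2</Count>"
    "<IdList><Id>111</Id><Id>222</Id></IdList>"
    "</eSearchResult>"
)

if __name__ == "__main__":
    print(build_esearch_url("pubmed", "BRCA1[gene] AND human[organism]"))
    print(parse_esearch_ids(SAMPLE_RESPONSE))
```

In practice the URL would be fetched with an HTTP client and the returned XML passed to the parser; keeping URL construction separate from parsing makes both steps easy to test offline.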

History

In 1991, Entrez was introduced in CD form. In 1993, a client-server version of the software provided connectivity with the internet. In 1994, NCBI established a website, and Entrez was a part of this initial release. In 2001, Entrez bookshelf was released and in 2003, the Entrez Gene database was developed.[6]

from Grokipedia
Entrez is the National Center for Biotechnology Information's (NCBI) primary text-based search and retrieval system, designed to provide integrated access to a wide array of biomedical and life sciences databases, including DNA and protein sequences, gene information, and genomic data. Developed initially in 1991 as a CD-ROM-based tool for querying linked databases, Entrez became network-accessible by 1993, enabling users worldwide to perform unified searches across disparate resources such as PubMed for literature, Nucleotide for DNA and RNA sequences, and Protein for amino acid sequences. Today, it encompasses over 30 interconnected databases, including BioProject, ClinVar, OMIM, and SNP, with cross-references and links that reveal relationships between molecular data, publications, and biological entities.

Key features of Entrez include advanced search capabilities with Boolean operators, field-specific queries, and facets for refining results, as well as tools like Search History and the Clipboard for managing and combining queries. Users can access Entrez through the NCBI website, where a single interface allows retrieval of records in formats such as XML, and integration with My NCBI enables personalized collections and alerts. The system's programming utilities, known as E-utilities, extend this functionality by letting developers build custom applications that query and retrieve data programmatically. Since its inception under NCBI, which was established in 1988, Entrez has become a cornerstone of biomedical research by providing open access to curated data.

Introduction

Purpose and Scope

Entrez serves as the National Center for Biotechnology Information's (NCBI) primary text-based search and retrieval system, designed to integrate diverse biomedical databases for unified querying across literature, molecular sequences, and related resources. Developed by NCBI, a division of the U.S. National Library of Medicine, it enables users to perform cross-database searches that connect disparate data types, such as linking a gene sequence to its associated publications or structural models.

The core purpose of Entrez is to facilitate efficient discovery and retrieval of biomedical information, supporting researchers, clinicians, and educators in navigating complex scientific datasets. By providing a single interface for querying over 30 NCBI databases, including those on DNA and protein sequences, genes, genomes, and genetic variations, it gives access to interconnected knowledge without requiring users to switch between isolated tools. This integration addresses the need for cohesive exploration where related data often spans multiple domains.

Entrez's scope is limited to public-domain biomedical data hosted by NCBI, encompassing molecular, genomic, and literature resources while excluding proprietary or non-biomedical content. It emphasizes free access to these resources worldwide, with no subscription barriers as of 2025. Entrez was first released in 1991 to resolve the fragmented access to databases that characterized the pre-1990s era.

Integration with NCBI Resources

Entrez serves as the primary unified interface for accessing and retrieving data from the National Center for Biotechnology Information (NCBI)'s suite of over 30 interconnected databases and tools, enabling users to perform cross-resource searches without needing to navigate multiple standalone platforms. Entrez search results link onward to specialized NCBI tools, such as BLAST for sequence similarity searches, Primer-BLAST for designing PCR primers against specific templates, and ClinVar for exploring clinically relevant genetic variants. By linking query outputs directly to these resources, Entrez supports efficient workflows for researchers, clinicians, and educators working with biomedical data.

A key example of this cross-integration is how Entrez queries can feed into visualization and analysis platforms like the Genome Data Viewer and the NCBI Datasets resource, which resulted from the June 2024 merger of the legacy Entrez Genome and Assembly websites to provide streamlined access to genome assemblies and related metadata. Users can start with an Entrez search, then move to Datasets to download complete datasets or to the Genome Data Viewer for interactive browsing of chromosomal contexts, annotations, and alignments. This interconnected approach ensures that data from sources like the Sequence Read Archive (SRA), which alone exceeds 47 petabytes, is accessible through a single entry point.

Entrez's integration provides "one-stop" access to NCBI's repository, encompassing 4.6 billion records across 31 knowledgebases as of August 2024, while handling the underlying indexing and retrieval processes to simplify use for non-specialized users. Entrez employs controlled vocabularies and ontologies, notably Medical Subject Headings (MeSH) for literature indexing in PubMed and the NCBI Taxonomy for organism classification, to enable standardized, precise querying across disparate resources.
These ontologies promote consistent data linkage and discovery, reducing ambiguity in searches involving biomedical terms or evolutionary relationships.

Supported Databases

Literature and Biomedical Databases

Entrez provides access to several key databases focused on biomedical literature and publications, enabling researchers to search, retrieve, and analyze citations, abstracts, and full-text content. These resources form the backbone of literature-based inquiries, supporting evidence-based research and knowledge synthesis.

PubMed serves as the primary literature database within Entrez, containing more than 39 million citations and abstracts of biomedical literature sourced from MEDLINE, life science journals, and online books. It includes links to full-text articles where available and employs Medical Subject Headings (MeSH) for precise indexing and retrieval, facilitating targeted searches across diverse biomedical topics. PubMed's coverage extends to journals from the 1940s onward, with comprehensive indexing beginning in 1966 and retrospective inclusion of earlier citations through OLDMEDLINE for pre-1966 literature.

PubMed Central (PMC) offers free full-text access to a growing archive of biomedical and life sciences journal articles deposited by publishers and authors. As of 2025, PMC supports compliance with the 2024 NIH Public Access Policy, which took effect on July 1, 2025 and requires that articles arising from NIH-funded research be publicly available in PMC without an embargo as of their official publication date.

The NCBI Bookshelf complements these resources by providing free online access to full-text books, reports, and documents in the biomedical and life sciences fields. Integrated into Entrez, Bookshelf enables contextual reading alongside journal literature, with searchable content from more than 13,000 titles that include authoritative textbooks, technical reports, and educational materials to support in-depth study and reference. These databases collectively allow Entrez users to perform unified searches across literature holdings, linking citations to related biomedical data.
Additional resources in this category include the Online Mendelian Inheritance in Man (OMIM) database, which catalogs genes and genetic phenotypes associated with inherited diseases.

Molecular Sequence and Gene Databases

The Nucleotide database in Entrez serves as a comprehensive repository for DNA and RNA sequences, primarily through its integration with GenBank, the annotated collection of publicly available nucleotide sequences submitted by researchers worldwide. GenBank, established in 1982, contains over 5.9 billion records encompassing 47.01 trillion bases as of release 268.0 in August 2025, covering sequences from viruses, prokaryotes, eukaryotes, and organelles. Each record includes detailed annotations such as gene names, protein products, biological source, and literature references, facilitating functional analysis and comparative genomics. Submission to GenBank follows standardized guidelines outlined by the International Nucleotide Sequence Database Collaboration (INSDC), ensuring data quality through validation tools like the Submission Portal and BankIt, which support formats including FASTA and feature annotations for exons, introns, and regulatory elements.

The Protein database in Entrez provides a centralized collection of protein sequences derived mainly from the conceptual translations of coding regions in nucleotide records, augmented by curated entries from sources such as Swiss-Prot and PDB. This database enables researchers to perform sequence alignments, homology searches, and functional predictions using integrated tools such as BLAST for identifying similar proteins across organisms. With a focus on non-redundant representations where possible, it supports applications such as evolutionary studies by linking sequences to experimental data like enzymatic activities and post-translational modifications. Entrez Protein emphasizes practical utilities, including viewers and prediction algorithms for secondary structure and domains.

Entrez Gene offers a gene-centered view of genomic data, aggregating curated records from multiple sources to provide summaries of gene function, location, expression patterns, and interactions across a wide range of organisms, including humans. Each record includes details on orthologs across species, genetic variation, pathways, and expression data from sources like GEO, with over 50 million loci documented as of 2025. In 2025, NCBI introduced redesigned gene pages through the Datasets tool, featuring an interface for downloading sequences, annotations, and metadata in formats such as TSV, improving accessibility for bulk download and visualization of gene models. This update integrates variant data from dbSNP, allowing users to explore SNPs, indels, and their clinical implications directly within gene contexts.

Key to navigating these databases is Entrez's support for standard data formats and identification systems, such as FASTA for sequence retrieval and display, which simplifies importing data into analysis software like sequence aligners or phylogenetic tools. Accession numbers serve as stable identifiers, with the legacy GI (GenInfo Identifier) system supplemented by unique IDs (UIDs) for versioning and tracking updates, ensuring traceability in publications and databases. Gene records also link to dbSNP for variant analysis, enabling queries on population frequencies and phenotypic associations without leaving the Entrez environment. The dbSNP database itself catalogs single nucleotide polymorphisms (SNPs), insertions, deletions, and other variants, supporting genetic association studies and population genetics. These features, combined with cross-links to PubMed for relevant literature, underscore Entrez's utility in integrating molecular sequence data for biological research.

Taxonomy and Structural Databases

The Entrez Taxonomy database provides a curated hierarchical classification and nomenclature system for organisms represented in public sequence databases, encompassing over 2.7 million taxonomic nodes as of 2025. This includes detailed lineage information tracing evolutionary relationships from domains to species, facilitating phylogenetic analysis through an interactive taxonomy browser that displays the tree structure and links to related genomic data. The database covers a broad spectrum of life forms, with approximately 595,000 nodes for bacteria, 15,000 for archaea, 1.8 million for eukaryotes (including major subgroups like metazoa, fungi, and viridiplantae), and 273,000 for viruses, enabling researchers to explore organismal diversity in evolutionary and structural biology contexts. Recent enhancements, including 2024 updates to prokaryotic classifications and integration with metagenomic data, support taxonomic assignment for uncultured microbial communities by incorporating environmental sequencing projects into the hierarchy. These updates align with the International Committee on Taxonomy of Viruses (ICTV) and other standards, improving resolution for viral and bacterial phylogenies. Additionally, the BioProject database within Entrez offers metadata on sequencing initiatives, such as project scope, organism associations, and assembly details, which link directly to taxonomy entries to contextualize large-scale genomic efforts without delving into raw sequence data from sources like GenBank. Following the 2024 merger of Entrez Genome and Assembly resources into NCBI Datasets, taxonomy records now provide streamlined access to genome assemblies, enhancing links between organism classifications and structural assemblies for viral, bacterial, and eukaryotic entries. 
The Entrez Structure database, centered on the Molecular Modeling Database (MMDB), archives three-dimensional molecular structures derived from the Protein Data Bank (PDB), focusing on proteins, nucleic acids, and complexes to support studies in structural biology and evolution. As of March 2025, MMDB contains over 233,000 structure records, each enhanced with annotations like chemical graphs, secondary structure assignments, and cross-references to sequence data for functional inference. These models enable visualization of evolutionary conservation through domain alignments and superposition tools. Integrated with the Cn3D viewer, users can interactively explore 3D structures alongside phylogenetic lineages from Taxonomy, highlighting structural motifs across related organisms without requiring separate software. Other notable databases in the taxonomy and structural categories include ClinVar, which aggregates information about genomic variations and their relationship to human health.

Core Features

Search and Query Capabilities

Entrez supports a range of search mechanisms designed to facilitate precise retrieval from its integrated databases. Users can construct queries using Boolean operators such as AND, OR, and NOT, which must be entered in uppercase to ensure proper processing. These operators allow for complex combinations, evaluated from left to right unless parentheses are used to group terms, as in a query of the form "A AND (B OR C)".

Field-specific searches enhance targeting by restricting terms to particular data elements, using square bracket notation like [field]. For instance, in PubMed, [tiab] limits searches to titles and abstracts, while [au] specifies authors and [organism] denotes species. Advanced filters further refine queries, including date ranges (e.g., "2015/3/1:2016/4/30[Publication Date]") and MeSH terms (e.g., "neoplasms[MeSH Terms]"), enabling users to narrow results by publication date, organism, or other indexed attributes.

The Global Query feature provides a unified search, allowing a single search string to span all Entrez databases simultaneously via the NCBI homepage. This returns ranked results across databases, ordered by relevance scoring based on term frequency and proximity, with options to filter by database type for focused exploration.

Search History maintains a record of recent queries for up to eight hours of inactivity, permitting users to revisit, combine, or modify them through the Advanced Search interface. Complementing this, the Clipboard temporarily stores up to 500 search results per database. Results from either can be exported via the "Send to" menu in formats such as XML or CSV, depending on the database, for offline analysis or integration with other tools.
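
As a rough illustration of the query syntax described above, the following sketch assembles a query string from fielded terms, a date range, and uppercase Boolean operators. The helper names (`fielded`, `date_range`, `combine`) are hypothetical conveniences, not part of any NCBI tool, and the search term "enhancer" is an arbitrary example.

```python
VALID_OPERATORS = {"AND", "OR", "NOT"}


def fielded(term, tag):
    """Restrict a term to one indexed field, e.g. promoter[tiab]."""
    return f"{term}[{tag}]"


def date_range(start, end, tag="Publication Date"):
    """Build a date-range clause such as 2015/3/1:2016/4/30[Publication Date]."""
    return f"{start}:{end}[{tag}]"


def combine(operator, *clauses):
    """Join clauses with an uppercase Boolean operator and wrap them in
    parentheses so grouping does not depend on left-to-right evaluation."""
    op = operator.upper()
    if op not in VALID_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return "(" + f" {op} ".join(clauses) + ")"


if __name__ == "__main__":
    query = combine(
        "AND",
        fielded("neoplasms", "MeSH Terms"),
        combine("or", fielded("promoter", "tiab"), fielded("enhancer", "tiab")),
        date_range("2015/3/1", "2016/4/30"),
    )
    print(query)
```

Always parenthesizing each combined group is a deliberate choice here: it costs a few extra characters but makes the intended precedence explicit regardless of how many clauses are chained.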

Linking and Cross-Database Navigation

Entrez employs a system of links to facilitate navigation between related records within and across its integrated databases, enabling users to discover contextual connections without reformulating searches. These links are categorized into two primary types: hard links and neighbor links. Hard links are direct, predefined connections derived from the inherent data relationships in the records, such as a PubMed article linking to a sequence record it cites or a Protein sequence record connecting to its corresponding three-dimensional structure in the Structure database. Neighbor links, in contrast, are computationally generated associations that identify similarities or co-occurrences, such as linking a sequence to its taxonomic lineage in the Taxonomy database or suggesting related articles in PubMed based on shared content. This dual approach allows for both explicit and inferred navigation.

A key feature of Entrez's cross-database navigation is the use of neighbor links to generate related searches, which provide suggestions based on patterns like co-citation or sequence similarity. For instance, searching for a specific gene in the Gene database may yield neighbor links to homologous sequences in the Nucleotide or Protein databases, derived from alignment algorithms that detect evolutionary relationships. These suggestions appear as facets or sidebar options in search results, allowing users to pivot to pertinent data in other databases, such as from a literature abstract to associated genomic variants in dbSNP. These automated connections support exploratory analysis, where users can trace pathways from molecular data to functional annotations without manual intervention.

Entrez's linking system incorporates concepts like Related Structures and NCBI Orthologs to represent complex biological networks. Related Structures uses the Vector Alignment Search Tool (VAST) to compute neighbor links between protein structures based on three-dimensional similarity, enabling navigation from one structure record to others with analogous folds or functions, such as linking a query structure to evolutionarily conserved homologs. Similarly, NCBI Orthologs aggregates orthologous genes across species through automated detection, providing links from a Gene record to 1:1 orthologs in over 100 species, which aids cross-species comparison. These features rely on underlying indexing that groups records by shared attributes, forming conceptual graphs of relatedness.

For efficient large-scale navigation, Entrez supports batch linking via Unique Identifiers (UIDs), allowing programmatic retrieval of connections for multiple records simultaneously through tools like the E-utilities' elink function. This capability is particularly useful for workflows involving high-throughput data, where users can fetch neighbor links for an entire set of articles to their cited genes or proteins in one operation. Overall, this infrastructure turns static database entries into an interconnected network, promoting interdisciplinary insights in biomedical research.
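
Batch linking by UID might be sketched as below. The function name `build_elink_url` is hypothetical and the UIDs are placeholders; the sketch only constructs the request URL and assumes the usual ELink convention that repeating the `id` parameter requests a separate link set per UID, while a single comma-separated `id` value requests one combined set.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_elink_url(dbfrom, db, uids, per_uid=True):
    """Build an ELink URL linking `uids` in `dbfrom` to records in `db`.

    per_uid=True  -> one `id` parameter per UID (separate link sets)
    per_uid=False -> a single comma-separated `id` value (combined link set)
    """
    params = [("dbfrom", dbfrom), ("db", db)]
    if per_uid:
        params += [("id", str(uid)) for uid in uids]
    else:
        params.append(("id", ",".join(str(uid) for uid in uids)))
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    placeholder_uids = [1, 2, 3]  # illustrative values, not real article UIDs
    print(build_elink_url("pubmed", "gene", placeholder_uids))
    print(build_elink_url("pubmed", "gene", placeholder_uids, per_uid=False))
```

Requesting per-UID link sets keeps the mapping from each source record to its targets, which is usually what a downstream join needs; the combined form is smaller when only the union of linked records matters.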

Access and Usage

Web-Based Interface

The Entrez web-based interface gives users a browser-accessible way to search and retrieve data from NCBI's interconnected databases, centered around a prominent search bar located at the top of the NCBI homepage. This search bar allows users to enter queries using terms, phrases, Boolean operators (such as AND, OR, and NOT), wildcards, and field-specific restrictions, with a pull-down menu for selecting from over 30 supported databases. Below the search bar, an advanced search option links to a dedicated builder tool that enables constructing complex queries via indexed fields and maintains a search history for iterative refinement. The overall layout emphasizes simplicity and accessibility, including skip-to-content links and access keys for keyboard navigation.

Upon submitting a search, results appear in a paginated summary view, displaying 20 records per page by default, with adjustable settings via a "Display Options" menu to show 10, 50, 100, or 200 items. The left-hand sidebar features facets for filtering results by attributes like publication date, organism, or availability of full text, allowing users to narrow large result sets efficiently. Individual records can be expanded to full views tailored to the database, revealing detailed metadata, abstracts, or sequences, while a "Send To" dropdown facilitates exporting selections to formats such as CSV or XML, or to tools like citation managers. Pagination controls at the bottom of result pages enable navigation through thousands of hits, supporting workflows from broad discovery to targeted retrieval.

To aid users, the interface integrates help resources, including inline tooltips, a searchable help manual with tutorials on query syntax, and guided examples for common tasks. Integration with NCBI Accounts via My NCBI allows registered users to save searches, set up alerts for new results, and store collections of records for later access, addressing limitations in anonymous sessions by persisting preferences across devices. The interface uses a responsive, mobile-first design that adapts to various screen sizes, improving usability on tablets and smartphones without requiring separate apps. As of 2025, these features reflect ongoing refinements to the user experience, with no major redesign implemented.

Programmatic and API Access

Entrez offers programmatic access primarily through the Entrez Programming Utilities (E-utilities), a suite of nine server-side programs that provide a stable interface for querying and retrieving data from its interconnected databases. These utilities enable developers to perform operations such as searching, fetching records, summarizing data, and linking across databases, supporting output formats including XML and, for select utilities, JSON. Key examples include ESearch, which retrieves unique identifiers (UIDs) matching a query term, and EFetch, which downloads full records based on those UIDs.

To prevent server overload, NCBI imposes rate limits on E-utilities requests: three per second without an API key and ten per second with a registered key obtained via an NCBI account. Developers must adhere to these guidelines, which also recommend batching large jobs by using the WebEnv parameter and the History server to store intermediate search results as temporary sessions, allowing subsequent utilities to reference and process them without repeated full queries. For instance, a workflow might use EPost to upload a large list of UIDs into a history session, followed by EFetch in batches to retrieve records while respecting limits.

Several programming libraries and tools simplify integration with E-utilities. In Python, Biopython's Bio.Entrez module wraps the utilities, offering functions like esearch() and efetch() that handle URL construction, XML parsing, and automatic rate limiting. For R users, the rentrez package provides similar functionality, including entrez_search() for querying and entrez_fetch() for data retrieval, with built-in support for API keys. On Unix-like systems, Entrez Direct (EDirect) enables command-line scripting through executables like esearch and efetch, facilitating pipeline automation and integration with other command-line tools for data processing; EDirect was updated to version 24.2 on June 20, 2025, with refactored archive paths.
While the web-based interface serves manual exploration, E-utilities and associated libraries are designed for scripted, high-volume access in research workflows, ensuring compliance with NCBI's policies on data usage and attribution. The E-utilities documentation was last updated on March 25, 2025.
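
The batching and rate-limit guidance above can be sketched as follows. This is an illustrative outline under stated assumptions: the function names are hypothetical, the WebEnv value is a placeholder rather than a real session token, and the code only prepares EFetch URLs and a pacing delay without contacting the server.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def requests_per_second(api_key=None):
    """Rate limits quoted in the text: 3/s without a key, 10/s with one."""
    return 10 if api_key else 3


def min_delay_seconds(api_key=None):
    """Smallest pause between requests that stays within the limit."""
    return 1.0 / requests_per_second(api_key)


def batched_efetch_urls(db, webenv, query_key, total, batch_size=200,
                        api_key=None, retmode="xml"):
    """Return EFetch URLs that page through a stored history session."""
    urls = []
    for start in range(0, total, batch_size):
        params = {
            "db": db,
            "WebEnv": webenv,
            "query_key": query_key,
            "retstart": start,      # offset of the first record in this batch
            "retmax": batch_size,   # number of records requested per batch
            "retmode": retmode,
        }
        if api_key:
            params["api_key"] = api_key
        urls.append(f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    urls = batched_efetch_urls("pubmed", "WEBENV_PLACEHOLDER", 1, total=450)
    print(len(urls), "requests, pausing", min_delay_seconds(), "s between each")
```

A caller would fetch each URL in turn, sleeping for at least `min_delay_seconds()` between requests; libraries such as Biopython's Bio.Entrez apply comparable pacing internally.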

History and Evolution

Origins and Initial Development

The National Center for Biotechnology Information (NCBI) was established in 1988 as part of the National Library of Medicine (NLM) under the U.S. National Institutes of Health, with a mandate to develop information systems for molecular biology and biotechnology data. This creation responded to the growing volume of genetic sequence data and the need for centralized management, particularly as the Human Genome Project was initiated in 1990, accelerating the growth of biological information. Prior to widespread internet connectivity, bioinformatics resources were fragmented across disparate databases, complicating retrieval and cross-referencing for researchers.

To address this fragmentation, NCBI staff, led by Jim Ostell, conceived and developed Entrez as an integrated search and retrieval system. The system was designed to link related records across multiple databases using neighbor relationships based on sequence similarity and citations, enabling unified access in a pre-web era dominated by local computing resources. Development emphasized text-based querying to facilitate discovery of connections among sequences, structures, and publications.

Entrez was first launched in 1991 as a CD-ROM-based product, providing linked access primarily to nucleotide sequences from GenBank, protein sequences from sources including PIR, SwissProt, and PRF, structural data from PDB, and related citations from MEDLINE. This initial release allowed users to navigate interconnected information without manual cross-database searches, directly tackling the problem of data silos in early bioinformatics. A pivotal early milestone came in 1993 with the release of Network Entrez, a client-server version that extended the system's reach beyond CD-ROM and initially integrated five core databases, including Protein and Structure (PDB). This transition to network accessibility laid the groundwork for broader adoption, aligning with the rapid influx of sequence data driven by genomic initiatives.

Key Milestones and Recent Updates

In the early 2000s, Entrez saw significant expansions that enhanced its utility for research. The introduction of Entrez Gene in 2003 marked a major advancement, providing a centralized repository for gene-specific information by integrating data from sequences, maps, and expression profiles, building on the foundational LocusLink system established in 2000. The Entrez Global Query system also launched in 2003, enabling users to perform simultaneous searches across all Entrez databases with a single query term, thereby improving cross-database navigation and discovery. The E-utilities API followed in 2004, offering a stable programmatic interface for automated querying, retrieval, and linking of data from Entrez resources, which facilitated integration into third-party software and workflows.

During the 2010s, Entrez evolved to support broader genomic analyses and user accessibility. In the 2020s, Entrez underwent modernizations to streamline data access and align with contemporary research needs. A key update in June 2024 involved merging the legacy Genome and Assembly databases into NCBI Datasets, providing a unified platform for downloading genome sequences, annotations, and metadata while maintaining Entrez compatibility for queries. In January 2025, the PopSet database was retired, with its data for population study sets of aligned sequences made accessible through alternative Entrez resources like Nucleotide and Protein. By 2025, Entrez had grown to encompass more than 30 databases, reflecting ongoing expansions in scope and interoperability. Recent enhancements ensured compliance with policies such as the NIH Public Access requirements.
