Named-entity recognition
Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names (PER), organizations (ORG), locations (LOC), geopolitical entities (GPE), vehicles (VEH), medical codes, time expressions, quantities, monetary values, percentages, etc.
Most research on NER/NEE systems has been structured as transducing an unannotated block of text, such as:
Jim bought 300 shares of Acme Corp. in 2006.
into an annotated block of text that highlights the names of entities:
[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified.
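As an illustration of this transduction in practice, the short sketch below uses spaCy (one of the toolkits listed under Approaches) to extract and label the entities in the example sentence. It assumes the en_core_web_sm model has been downloaded; spaCy's own label set (PERSON, ORG, DATE, CARDINAL) differs slightly from the Person/Organization/Time tags shown above.

```python
# Minimal NER sketch with spaCy; assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Jim bought 300 shares of Acme Corp. in 2006.")

for ent in doc.ents:
    # Each entity carries its surface text, character offsets, and a type label,
    # e.g. "Jim"/PERSON, "Acme Corp."/ORG, "2006"/DATE (exact output is model-dependent).
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```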
Problem
Definition
In the expression named entity, the word named restricts the task to those entities for which one or many strings, such as words or phrases, stand (fairly) consistently for some referent. This is closely related to rigid designators, as defined by Saul Kripke,[1][2] although in practice NER deals with many names and referents that are not philosophically "rigid". For instance, the automotive company created by Henry Ford in 1903 can be referred to as Ford or Ford Motor Company, although "Ford" can refer to many other entities as well (see Ford). Rigid designators include proper names as well as terms for certain biological species and substances,[3] but exclude pronouns (such as "it"; see coreference resolution), descriptions that pick out a referent by its properties (see also De dicto and de re), and names for kinds of things as opposed to individuals (for example "Bank").
Full named-entity recognition is often broken down, conceptually and possibly also in implementations,[4] as two distinct problems: detection of names, and classification of the names by the type of entity they refer to (e.g. person, organization, or location).[5] The first phase is typically simplified to a segmentation problem: names are defined to be contiguous spans of tokens, with no nesting, so that "Bank of America" is a single name, disregarding the fact that inside this name, the substring "America" is itself a name. This segmentation problem is formally similar to chunking. The second phase requires choosing an ontology by which to organize categories of things.
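In practice, the segmentation phase is often encoded as per-token sequence labels using a scheme such as BIO (begin/inside/outside). A minimal sketch of how a contiguous, non-nested name like "Bank of America" is represented under that convention (the surrounding tokens are illustrative):

```python
# BIO encoding of the detection phase: the whole contiguous span is one name,
# and the inner "America" is not marked separately because nesting is disallowed.
tokens = ["Shares", "of", "Bank",  "of",    "America", "rose", "."]
tags   = ["O",      "O",  "B-ORG", "I-ORG", "I-ORG",   "O",    "O"]

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```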
Temporal expressions and some numerical expressions (e.g., money, percentages, etc.) may also be considered as named entities in the context of the NER task. While some instances of these types are good examples of rigid designators (e.g., the year 2001) there are also many invalid ones (e.g., I take my vacations in “June”). In the first case, the year 2001 refers to the 2001st year of the Gregorian calendar. In the second case, the month June may refer to the month of an undefined year (past June, next June, every June, etc.). It is arguable that the definition of named entity is loosened in such cases for practical reasons. The definition of the term named entity is therefore not strict and often has to be explained in the context in which it is used.[6]
Certain hierarchies of named entity types have been proposed in the literature. BBN categories, proposed in 2002, are used for question answering and consist of 29 types and 64 subtypes.[7] Sekine's extended hierarchy, proposed in 2002, consists of 200 subtypes.[8] More recently, in 2011, Ritter used a hierarchy based on common Freebase entity types in ground-breaking experiments on NER over social media text.[9]
Difficulties
NER involves ambiguities. The same name can refer to different entities of the same type. For example, "JFK" can refer to the former president or his son. This is basically a reference resolution problem.
The same name can refer to completely different types. "JFK" might refer to the airport in New York. "IRA" can refer to Individual Retirement Account or International Reading Association.
This can be caused by metonymy. For example, "The White House" can refer to an organization instead of a location.
Formal evaluation
To evaluate the quality of an NER system's output, several measures have been defined. The usual measures are called precision, recall, and F1 score. However, several issues remain in just how to calculate those values.
These statistical measures work reasonably well for the obvious cases of finding or missing a real entity exactly, and for finding a non-entity. However, NER can fail in many other ways, many of which are arguably "partially correct" and should not be counted as complete successes or failures. For example, identifying a real entity, but:
- with fewer tokens than desired (for example, missing the last token of "John Smith, M.D.")
- with more tokens than desired (for example, including the first word of "The University of MD")
- partitioning adjacent entities differently (for example, treating "Smith, Jones Robinson" as 2 vs. 3 entities)
- assigning it a completely wrong type (for example, calling a personal name an organization)
- assigning it a related but inexact type (for example, "substance" vs. "drug", or "school" vs. "organization")
- correctly identifying an entity, when what the user wanted was a smaller- or larger-scope entity (for example, identifying "James Madison" as a personal name, when it's part of "James Madison University")
Some NER systems impose the restriction that entities may never overlap or nest, which means that in some cases one must make arbitrary or task-specific choices.
One overly simple method of measuring accuracy is merely to count what fraction of all tokens in the text were correctly or incorrectly identified as part of entity references (or as being entities of the correct type). This suffers from at least two problems: first, the vast majority of tokens in real-world text are not part of entity names, so the baseline accuracy (always predict "not an entity") is extravagantly high, typically >90%; and second, mispredicting the full span of an entity name is not properly penalized (finding only a person's first name when his last name follows might be scored as ½ accuracy).
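A small worked example of both problems, using a toy ten-token sentence whose only entity is the two-token name "John Smith"; the tag sequences are illustrative only.

```python
# Token-level accuracy rewards the trivial "predict nothing" baseline and
# barely penalizes a half-found name.
gold     = ["B-PER", "I-PER"] + ["O"] * 8      # "John Smith" plus 8 non-entity tokens
baseline = ["O"] * 10                          # never predicts an entity
partial  = ["B-PER", "O"] + ["O"] * 8          # finds "John" but misses "Smith"

def token_accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

print(token_accuracy(baseline, gold))   # 0.8 despite finding nothing
print(token_accuracy(partial, gold))    # 0.9 despite missing half the name
```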
In academic conferences such as CoNLL, a variant of the F1 score has been defined as follows:[5]
- Precision is the fraction of predicted entity name spans that line up exactly with spans in the gold-standard evaluation data. I.e., when [Person Hans] [Person Blick] is predicted but [Person Hans Blick] was required, precision for the predicted name is zero. Precision is then averaged over all predicted entity names.
- Recall is similarly the fraction of names in the gold standard that appear at exactly the same location in the predictions.
- F1 score is the harmonic mean of these two.
It follows from the above definition that any prediction that misses a single token, includes a spurious token, or has the wrong class, is a hard error and does not contribute positively to either precision or recall. Thus, this measure may be said to be pessimistic: it can be the case that many "errors" are close to correct, and might be adequate for a given purpose. For example, one system might always omit titles such as "Ms." or "Ph.D.", but be compared to a system or ground-truth data that expects titles to be included. In that case, every such name is treated as an error. Because of such issues, it is important actually to examine the kinds of errors, and decide how important they are given one's goals and requirements.
Evaluation models based on a token-by-token matching have been proposed.[10] Such models may be given partial credit for overlapping matches (such as using the Intersection over Union criterion). They allow a finer grained evaluation and comparison of extraction systems.
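A minimal sketch of the exact-match, entity-level scoring described above; representing entities as (start, end, type) triples with exclusive end indices is an illustrative choice, not part of any standard.

```python
# Exact-match entity-level precision/recall/F1 in the CoNLL style: an entity
# counts only if both its span and its type match the gold annotation exactly.
def entity_scores(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                        # exact span + type matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# "[Person Hans] [Person Blick]" predicted where "[Person Hans Blick]" was required:
pred = [(0, 1, "PER"), (1, 2, "PER")]                 # token spans, end exclusive
gold = [(0, 2, "PER")]
print(entity_scores(pred, gold))                      # (0.0, 0.0, 0.0): no credit for the near-miss
```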
Approaches
NER systems have been created that use linguistic grammar-based techniques as well as statistical models such as machine learning. State-of-the-art systems may incorporate multiple approaches.
- GATE supports NER across many languages and domains out of the box, usable via a graphical interface and a Java API.
- OpenNLP includes rule-based and statistical named-entity recognition.
- spaCy features fast statistical NER as well as an open-source named-entity visualizer.
Hand-crafted grammar-based systems typically obtain better precision, but at the cost of lower recall and months of work by experienced computational linguists.[11]
Statistical NER systems typically require a large amount of manually annotated training data. Semisupervised approaches have been suggested to avoid part of the annotation effort.[12][13]
In the statistical learning era, NER was usually performed by learning a simple linear model on engineered features, then decoding with a Viterbi-style algorithm. Some commonly used features include the following (a feature-extraction sketch appears after the list):[14]
- Lexical items: The token itself being labeled.
- Stemmed lexical items.
- Shape: The orthographic pattern of the target word. For example, all lowercase, all uppercase, initial uppercase, mixed case, uppercase followed by a period (as in a middle initial or abbreviation), contains hyphen, etc.
- Affixes of the target word and surrounding words.
- Part of speech of the word.
- Whether the word appears in one or more named entity lists (gazetteers).
- Words and/or n-grams occurring in the surrounding context.
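As a concrete illustration of this feature list, the sketch below assembles several of these features for a single token position. The toy gazetteer, the shape encoding, and the assumption that part-of-speech tags are supplied externally are all simplifications made for the example.

```python
# Illustrative feature extraction for one token: identity, shape, affixes,
# part of speech, gazetteer membership, and surrounding words.
GAZETTEER = {"acme", "ford", "new york"}   # toy, single-token entries only

def word_shape(token):
    # Collapse characters into an orthographic pattern, e.g. "Acme" -> "Xxxx".
    return "".join("X" if c.isupper() else "x" if c.islower()
                   else "d" if c.isdigit() else c for c in token)

def token_features(tokens, pos_tags, i):
    token = tokens[i]
    return {
        "token": token.lower(),
        "shape": word_shape(token),
        "prefix3": token[:3].lower(),
        "suffix3": token[-3:].lower(),
        "pos": pos_tags[i],
        "in_gazetteer": token.lower() in GAZETTEER,
        "prev_token": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_token": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens   = ["Jim", "bought", "300", "shares", "of", "Acme", "Corp.", "in", "2006", "."]
pos_tags = ["NNP", "VBD",    "CD",  "NNS",    "IN", "NNP",  "NNP",   "IN", "CD",   "."]
print(token_features(tokens, pos_tags, 5))   # features for "Acme"
```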
A gazetteer is a list of names and their types, such as "General Electric" (an organization). Gazetteers can be used to augment any NER system and were often used in the era of statistical machine learning.[15][16]
Many different classifier types have been used to perform machine-learned NER, with conditional random fields being a typical choice.[17] The Transformers library provides token classification using deep learning models.[18]
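A minimal sketch of the token-classification interface in the Transformers library; the checkpoint name is an assumption (any model fine-tuned for NER can be substituted), and running it downloads the model.

```python
# NER via the Transformers token-classification pipeline. The checkpoint is an
# assumed example; aggregation_strategy="simple" merges word pieces into entity spans.
from transformers import pipeline

ner = pipeline("token-classification",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")

for entity in ner("Jim bought 300 shares of Acme Corp. in 2006."):
    print(entity["word"], entity["entity_group"], round(entity["score"], 3))
```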
History
Early work in NER systems in the 1990s was aimed primarily at extraction from journalistic articles. Attention then turned to processing of military dispatches and reports. Later stages of the automatic content extraction (ACE) evaluation also included several types of informal text styles, such as weblogs and transcripts of conversational telephone speech. Since about 1998, there has been a great deal of interest in entity identification in the molecular biology, bioinformatics, and medical natural language processing communities. The most common entities of interest in that domain have been names of genes and gene products. There has also been considerable interest in the recognition of chemical entities and drugs in the context of the CHEMDNER competition, with 27 teams participating in this task.[19]
In 2001, research indicated that even state-of-the-art NER systems were brittle, meaning that NER systems developed for one domain did not typically perform well on other domains.[20] Considerable effort is involved in tuning NER systems to perform well in a new domain; this is true for both rule-based and trainable statistical systems.
As of 2007, state-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 achieved an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.[21][22]
Current challenges
Despite the high F1 numbers reported on the MUC-7 dataset, the problem of named-entity recognition is far from solved. The main efforts are directed at reducing the annotation labor through semi-supervised learning,[12][23] achieving robust performance across domains,[24][25] and scaling up to fine-grained entity types.[8][26] In recent years, many projects have turned to crowdsourcing, which is a promising way to obtain high-quality aggregate human judgments for supervised and semi-supervised machine learning approaches to NER.[27] Another challenging task is devising models to deal with linguistically complex contexts such as Twitter and search queries.[28]
Some researchers have compared NER performance across statistical models such as hidden Markov models (HMM), maximum entropy (ME), and conditional random fields (CRF), and across feature sets.[29] Others have proposed graph-based semi-supervised learning models for language-specific NER tasks.[30]
A recently emerging task of identifying "important expressions" in text and cross-linking them to Wikipedia[31][32][33] can be seen as an instance of extremely fine-grained named-entity recognition, where the types are the actual Wikipedia pages describing the (potentially ambiguous) concepts. Below is an example output of a Wikification system:
<ENTITY url="https://en.wikipedia.org/wiki/Michael_I._Jordan"> Michael Jordan </ENTITY> is a professor at <ENTITY url="https://en.wikipedia.org/wiki/University_of_California,_Berkeley"> Berkeley </ENTITY>
Another field that has seen progress but remains challenging is the application of NER to Twitter and other microblogs, considered "noisy" due to non-standard orthography, shortness and informality of texts.[34][35] NER challenges in English Tweets have been organized by research communities to compare performances of various approaches, such as bidirectional LSTMs, Learning-to-Search, or CRFs.[36][37][38]
See also
- Controlled vocabulary
- Coreference resolution
- Entity linking (aka named entity normalization, entity disambiguation)
- Information extraction
- Knowledge extraction
- Onomastics
- Record linkage
- Smart tag (Microsoft)
References
- ^ Kripke, Saul (1971). "Identity and Necessity". In M.K. Munitz (ed.). Identity and Individuation. New York: New York University Press. pp. 135–64.
- ^ LaPorte, Joseph (2018). "Rigid Designators". The Stanford Encyclopedia of Philosophy.
- ^ Nadeau, David; Sekine, Satoshi (2007). A survey of named entity recognition and classification (PDF). Lingvisticae Investigationes.
- ^ Carreras, Xavier; Màrquez, Lluís; Padró, Lluís (2003). A simple named entity extractor using AdaBoost (PDF). CoNLL.
- ^ a b Tjong Kim Sang, Erik F.; De Meulder, Fien (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. CoNLL.
- ^ Named Entity Definition. Webknox.com. Retrieved on 2013-07-21.
- ^ Brunstein, Ada. "Annotation Guidelines for Answer Types". LDC Catalog. Linguistic Data Consortium. Archived from the original on 16 April 2016. Retrieved 21 July 2013.
- ^ a b Sekine's Extended Named Entity Hierarchy. Nlp.cs.nyu.edu. Retrieved on 2013-07-21.
- ^ Ritter, A.; Clark, S.; Mausam; Etzioni., O. (2011). Named Entity Recognition in Tweets: An Experimental Study (PDF). Proc. Empirical Methods in Natural Language Processing.
- ^ Esuli, Andrea; Sebastiani, Fabrizio (2010). Evaluating Information Extraction (PDF). Cross-Language Evaluation Forum (CLEF). pp. 100–111.
- ^ Kapetanios, Epaminondas; Tatar, Doina; Sacarea, Christian (2013-11-14). Natural Language Processing: Semantic Aspects. CRC Press. p. 298. ISBN 9781466584969.
- ^ a b Lin, Dekang; Wu, Xiaoyun (2009). Phrase clustering for discriminative learning (PDF). Annual Meeting of the ACL and IJCNLP. pp. 1030–1038.
- ^ Nothman, Joel; et al. (2013). "Learning multilingual named entity recognition from Wikipedia". Artificial Intelligence. 194: 151–175. doi:10.1016/j.artint.2012.03.006.
- ^ Jurafsky, Dan; Martin, James H. (2009). "22.1. Named Entity Recognition". Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall series in artificial intelligence (2 ed.). Upper Saddle River, N.J: Pearson Prentice Hall. ISBN 978-0-13-187321-6. OCLC 213375806.
- ^ Mikheev, Andrei; Moens, Marc; Grover, Claire (June 1999). Thompson, Henry S.; Lascarides, Alex (eds.). "Named Entity Recognition without Gazetteers". Ninth Conference of the European Chapter of the Association for Computational Linguistics. Bergen, Norway: Association for Computational Linguistics: 1–8.
- ^ Nadeau, David; Turney, Peter D.; Matwin, Stan (2006). "Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity". In Lamontagne, Luc; Marchand, Mario (eds.). Advances in Artificial Intelligence. Lecture Notes in Computer Science. Vol. 3060. Berlin, Heidelberg: Springer. pp. 266–277. doi:10.1007/11766247_23. ISBN 978-3-540-34630-2.
- ^ Jenny Rose Finkel; Trond Grenager; Christopher Manning (2005). Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling (PDF). 43rd Annual Meeting of the Association for Computational Linguistics. pp. 363–370.
- ^ Wolf; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Pierric; Rault, Tim; Louf, Remi; Funtowicz, Morgan; Davison, Joe; Shleifer, Sam; von Platen, Patrick; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Wolf, Thomas; Rush, Alexander (2020). Transformers: State-of-the-art natural language processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. pp. 38–45.
- ^ Krallinger, M; Leitner, F; Rabal, O; Vazquez, M; Oyarzabal, J; Valencia, A (2013). "Overview of the chemical compound and drug name recognition (CHEMDNER) task". Proceedings of the Fourth BioCreative Challenge Evaluation Workshop vol. 2. pp. 6–37. CiteSeerX 10.1.1.684.4118.
- ^ Poibeau, Thierry; Kosseim, Leila (2001). "Proper Name Extraction from Non-Journalistic Texts" (PDF). Computational Linguistics in the Netherlands 2000. Language and Computers. Vol. 37. pp. 144–157. doi:10.1163/9789004333901_011. ISBN 978-90-04-33390-1. S2CID 12591786. Archived from the original (PDF) on 2019-07-30.
- ^ Elaine Marsh, Dennis Perzanowski, "MUC-7 Evaluation of IE Technology: Overview of Results", 29 April 1998 PDF
- ^ MUC-07 Proceedings (Named Entity Tasks)
- ^ Turian, J., Ratinov, L., & Bengio, Y. (2010, July). Word representations: a simple and general method for semi-supervised learning. In Proceeding of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 384–394). Association for Computational Linguistics. PDF
- ^ Ratinov, L., & Roth, D. (2009, June). Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 147–155). Association for Computational Linguistics.
- ^ "Frustratingly Easy Domain Adaptation" (PDF). Archived from the original (PDF) on 2010-06-13. Retrieved 2012-04-05.
- ^ Lee, Changki; Hwang, Yi-Gyu; Oh, Hyo-Jung; Lim, Soojong; Heo, Jeong; Lee, Chung-Hee; Kim, Hyeon-Jin; Wang, Ji-Hyun; Jang, Myung-Gil (2006). "Fine-Grained Named Entity Recognition Using Conditional Random Fields for Question Answering". Information Retrieval Technology. Lecture Notes in Computer Science. Vol. 4182. pp. 581–587. doi:10.1007/11880592_49. ISBN 978-3-540-45780-0.
- ^ Web 2.0-based crowdsourcing for high-quality gold standard development in clinical Natural Language Processing
- ^ Eiselt, Andreas; Figueroa, Alejandro (2013). A Two-Step Named Entity Recognizer for Open-Domain Search Queries. IJCNLP. pp. 829–833.
- ^ Han, Li-Feng Aaron, Wong, Fai, Chao, Lidia Sam. (2013). Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics. Proceeding of International Conference of Language Processing and Intelligent Information Systems. M.A. Klopotek et al. (Eds.): IIS 2013, LNCS Vol. 7912, pp. 57–68 [1]
- ^ Han, Li-Feng Aaron, Wong, Zeng, Xiaodong, Derek Fai, Chao, Lidia Sam. (2015). Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model. In Proceedings of SIGHAN workshop in ACL-IJCNLP. 2015. [2]
- ^ Linking Documents to Encyclopedic Knowledge.
- ^ "Learning to link with Wikipedia" (PDF). Archived from the original (PDF) on 2019-01-25. Retrieved 2014-07-21.
- ^ Local and Global Algorithms for Disambiguation to Wikipedia.
- ^ Derczynski, Leon and Diana Maynard, Giuseppe Rizzo, Marieke van Erp, Genevieve Gorrell, Raphael Troncy, Johann Petrak, and Kalian Botcheva (2014). “Analysis of named entity recognition and linking for tweets”. Information Processing and Management 51(2) : pages 32–49.
- ^ Baldwin, Timothy; de Marneffe, Marie Catherine; Han, Bo; Kim, Young-Bum; Ritter, Alan; Xu, Wei (July 2015). "Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition". Proceedings of the Workshop on Noisy User-generated Text. Beijing, China: Association for Computational Linguistics: 126–135. doi:10.18653/v1/W15-4319. hdl:2078.1/284718. S2CID 14500933.
- ^ "COLING 2016 Workshop on Noisy User-generated Text (W-NUT)". noisy-text.github.io. Retrieved 2022-08-13.
- ^ Partalas, Ioannis; Lopez, Cédric; Derbas, Nadia; Kalitvianski, Ruslan (December 2016). "Learning to Search for Recognizing Named Entities in Twitter". Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT). Osaka, Japan: The COLING 2016 Organizing Committee: 171–177.
- ^ Limsopatham, Nut; Collier, Nigel (December 2016). "Bidirectional LSTM for Named Entity Recognition in Twitter Messages". Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT). Osaka, Japan: The COLING 2016 Organizing Committee: 145–152.
- Jurafsky, Daniel; Martin, James H. (2008). "13.5 Partial Parsing". Speech and Language Processing (2nd ed.). Upper Saddle River, N.J.: Prentice Hall. ISBN 978-0131873216.
- Jurafsky, Daniel; Martin, James H. (2008). "22.1. Named Entity Recognition". Speech and Language Processing (2nd ed.). Upper Saddle River, N.J.: Prentice Hall. ISBN 978-0131873216.
Fundamentals
Definition and Scope
Named-entity recognition (NER), also known as named-entity identification, is a subtask of information extraction within natural language processing that aims to locate and classify named entities in unstructured text into predefined categories such as persons, organizations, locations, and temporal or numerical expressions.[6][7] The term was coined during the Sixth Message Understanding Conference (MUC-6) in 1995, where it was formalized as a core component for extracting structured information from free-form text like news articles.[1] This process transforms raw textual data into a more analyzable form by tagging entities with their types, enabling further semantic understanding without requiring full sentence parsing.[7]
NER differs from related natural language processing tasks such as part-of-speech tagging, which assigns broad grammatical categories (e.g., noun, verb) to individual words regardless of semantic content; NER instead focuses on semantically specific entity identification and multi-word spans.[7] Similarly, it is distinct from coreference resolution, which resolves references to the same entity across different mentions in a text (e.g., linking "the president" to a prior named person), rather than merely detecting and categorizing the entities themselves.[7] These distinctions highlight NER's emphasis on entity-level semantics over syntactic structure or discourse linkage.
The basic process of NER typically begins with tokenization, which segments the input text into words or subword units, followed by entity boundary detection to identify the start and end positions of potential entity spans, and concludes with classification to assign each detected entity to a predefined category.[7] This sequential approach ensures precise localization and typing, often leveraging contextual clues to disambiguate ambiguous cases.
The scope of NER is generally limited to predefined entity types, as established in early frameworks like MUC-6, which contrasts with open-domain extraction methods that aim to identify entities and relations without fixed categories or schemas.[6][7][8] NER's reliance on such predefined sets facilitates consistent evaluation and integration into structured knowledge bases but may overlook novel or domain-specific entities outside the schema.
Entity Types and Categories
Named entity recognition systems typically identify a core set of standard categories derived from early benchmarks like the Message Understanding Conference (MUC-7), which defined entities under ENAMEX for proper names, including persons (PER) (e.g., "John Smith"), organizations (ORG) (e.g., "Microsoft Corporation"), and locations (LOC) (e.g., "New York"); NUMEX for numerical expressions such as money (MNY) (e.g., "$100 million") and percentages (PERC) (e.g., "25%"); and TIMEX for temporal expressions like dates (DAT) (e.g., "July 4, 1776") and times (TIM) (e.g., "3:00 PM"). These categories emphasize referential and quantitative entities central to information extraction in general-domain text.[7]
Subsequent benchmarks introduced hierarchical schemes to capture nested structures, where entities can contain sub-entities of different types. In the Automatic Content Extraction (ACE) program, entities are organized into seven main types (person, organization, location, facility, weapon, vehicle, and geo-political entity (GPE)), with subtypes and nesting, such as a location nested within an organization (e.g., "headquarters in Paris", where "Paris" is a LOC within the ORG). Similarly, the OntoNotes 5.0 corpus employs a multi-level ontology with 18 core entity types, including person, organization, GPE, location, facility, NORP (nationalities, religious or political groups), event, work of art, law, language, date, time, money, percent, quantity, ordinal, cardinal, and product, allowing for hierarchical annotations like a date nested within an event description. These schemes enable recognition of complex, overlapping entities beyond flat structures, improving coverage for real-world texts.
Domain-specific NER adapts these categories to specialized vocabularies. In biomedical texts, common types include genes/proteins (e.g., "BRCA1"), diseases (e.g., "Alzheimer's disease"), chemicals/drugs (e.g., "aspirin"), cell types/lines (e.g., "HeLa cells"), and DNA/RNA sequences, as seen in datasets like JNLPBA and BC5CDR, which focus on molecular and clinical entities for tasks such as literature mining.[9] In legal documents, entity types extend to statutes (e.g., "Section 230 of the Communications Decency Act"), courts (e.g., "Supreme Court"), petitioners/respondents (e.g., party names in cases), provisions, precedents, judges, and witnesses, tailored to extract structured information from judgments and contracts.[10]
The categorization in NER has evolved from flat structures in early systems like MUC, which treated entities as non-overlapping spans, to nested and hierarchical representations in ACE and OntoNotes, accommodating real-world complexities such as embedded entities and multi-type overlaps.[11] This progression reflects a shift toward more expressive models capable of handling ambiguity and granularity, influencing evaluation by requiring metrics that account for nesting depth and type hierarchies.[12]
Challenges
Inherent Difficulties
Named-entity recognition (NER) faces significant ambiguity in determining entity boundaries, as the same word or phrase can refer to different types of entities depending on context. For instance, the term "Washington" may denote a person (e.g., George Washington), a location (e.g., Washington state or D.C.), or an organization, requiring precise boundary detection to avoid misclassification. This ambiguity arises because natural language lacks explicit markers for entity spans, making it difficult for models to consistently identify the correct start and end positions without additional contextual cues.[13][14]
Contextual dependencies further complicate NER, as entity identification often relies on coreference resolution and disambiguation that demand extensive world knowledge. Coreference occurs when multiple mentions refer to the same entity (e.g., "the president" and "Biden" in a sentence), necessitating understanding of prior references to accurately tag subsequent spans. Disambiguation, meanwhile, involves resolving polysemous terms using external knowledge, such as distinguishing "Apple" as a company versus a fruit based on surrounding discourse or real-world associations. These processes highlight NER's dependence on broader linguistic and encyclopedic understanding, beyond mere pattern matching.[15][16]
Nested and overlapping entities pose another inherent challenge, where one entity is embedded within another, complicating span extraction. For example, in the phrase "New York City Council", the location "New York City" is nested inside the larger organization span; traditional flat NER models struggle to capture such hierarchies without losing precision on inner or outer boundaries. Nesting occurs frequently in real-world texts, such as legal documents or news, where entities like persons (PER) within organizations (ORG) overlap, demanding models capable of handling multi-level structures.[17][18]
Processing informal text exacerbates these issues, as abbreviations, typos, and code-switching introduce variability not present in standard corpora. Abbreviations like "Dr." for doctor or "NYC" for New York City require expansion or normalization to match entity patterns, while typos (e.g., "Washingtin" for Washington) can evade detection altogether. In multilingual contexts, code-switching (alternating between languages mid-sentence, common in social media) disrupts entity continuity, as seen in Hindi-English mixes where entity spans cross linguistic boundaries. These elements of user-generated content demand robust preprocessing and adaptability, underscoring NER's sensitivity to text quality.[19][20][21]
Evaluation Metrics
The performance of named entity recognition (NER) systems is primarily assessed using precision, recall, and the F1-score, which quantify the accuracy of entity detection and classification. These metrics are derived from counts of true positives (TP, correctly identified entities), false positives (FP, incorrectly identified entities), and false negatives (FN, missed entities). Precision measures the proportion of predicted entities that are correct:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall measures the proportion of actual entities that are detected:

$$\text{Recall} = \frac{TP}{TP + FN}$$

The F1-score, as the harmonic mean of precision and recall, balances these measures and is the most commonly reported metric in NER evaluations:[22][23]

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
Evaluations can occur at the entity level or token level, with entity-level being standard for NER to emphasize complete entity identification rather than isolated word tags. In entity-level assessment, an entity prediction is correct only if its full span (boundaries) and type exactly match the gold annotation, often using the BIO tagging scheme (where "B" denotes the beginning of an entity, "I" the interior, and "O" outside any entity) to delineate boundaries precisely. Token-level evaluation, by contrast, scores each tag independently, which may inflate performance by rewarding partial boundary accuracy but fails to penalize incomplete entities. The CoNLL shared tasks, for instance, adopted entity-level F1 with exact matching to ensure robust boundary detection.[22][23]
Prominent benchmarks for NER include the CoNLL-2003 dataset, a foundational English resource from Reuters news articles annotating four entity types (person, location, organization, miscellaneous) across approximately 300,000 tokens (training, development, and test sets combined), serving as the de facto standard for flat, non-nested NER, with reported F1 scores around 90-93% for state-of-the-art systems. OntoNotes 5.0 extends this with a larger, multi-genre corpus (over 2 million words) supporting multilingual annotations and nested structures across 18 entity types, enabling evaluation of complex hierarchies in domains like broadcast news and web text. The WNUT series, particularly WNUT-17, targets emerging entities in noisy social media (e.g., Twitter), with 6 entity types including novel terms like hashtags or events, where F1 scores typically range from 50-70% due to informal language challenges.[22][24][23]
For datasets with nested entities like OntoNotes 5.0, metrics distinguish strict matching, which requires exact span and type overlap for credit, from partial matching, which awards partial credit for boundary approximations or inner/outer entity detection to better capture system capabilities in hierarchical scenarios. Strict matching aligns with flat benchmarks like CoNLL-2003, ensuring conservative scores, while partial variants (e.g., relaxed F1) are used in nested contexts to evaluate boundary tolerance without overpenalizing near-misses.[23][25]
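A minimal sketch of the step that makes entity-level comparison possible, namely decoding a BIO tag sequence into typed spans before matching predictions against the gold annotation; the strict handling of stray I- tags here is one reasonable convention among several.

```python
# Decode BIO tags into (start, end, type) spans (end exclusive) so predictions
# and gold annotations can be compared entity by entity rather than token by token.
def bio_to_spans(tags):
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):            # sentinel flushes a trailing entity
        inside = etype is not None and tag == f"I-{etype}"
        if not inside:
            if etype is not None:
                spans.append((start, i, etype))       # close the current entity
            # B- starts a new entity; O or a stray I- tag starts nothing
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans

tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
print(bio_to_spans(tags))   # [(0, 2, 'PER'), (3, 6, 'ORG')]
```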
Methodologies
Classical Approaches
Classical approaches to named entity recognition (NER) primarily relied on rule-based systems, which employed hand-crafted patterns and linguistic rules to identify and classify entities in text. These systems operated deterministically, matching predefined templates against input text to detect entity boundaries and types, such as person names or locations, without requiring training data. For instance, patterns could specify syntactic structures like capitalized words following verbs of attribution to flag potential person names.[26]
A key component of these systems was the use of gazetteers, which are curated lists of known entities, such as city names or organization titles, to perform exact or fuzzy matching against text spans. Gazetteers enhanced precision by providing lexical resources for entity lookup, often integrated with part-of-speech tagging to filter candidates. In specialized domains like biomedicine, gazetteers drawn from synonym dictionaries helped recognize protein or gene names by associating text mentions with database entries.[27][28]
Boundary detection in rule-based NER frequently utilized regular expressions to capture patterns indicative of entities, such as sequences of capitalized words or specific punctuation, and finite-state transducers to model sequential dependencies in entity spans. Regular expressions, for example, could define patterns like [A-Z][a-z]+ for proper nouns, while finite-state transducers processed text as automata to recognize multi-word entities like "New York City" as a single location. These tools allowed efficient scanning of text for potential entity starts and ends.[26][28]
Classification often involved integrating dictionaries—structured collections of entity terms—with heuristics, such as contextual clues like preceding prepositions or domain-specific triggers, to assign entity types. Dictionaries supplemented gazetteers by providing broader lexical coverage, and heuristics resolved ambiguities by prioritizing rules based on confidence scores derived from pattern specificity. This combination enabled systems to handle basic entity categorization in controlled environments, as formalized in early evaluations like those from the Message Understanding Conference.[29][26]
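A toy sketch in this classical style, combining a regular-expression pattern for capitalized multi-word spans with a small gazetteer that assigns types; the pattern and the list entries are illustrative assumptions rather than a production rule set.

```python
# Toy rule-based recognizer: a regular expression proposes capitalized
# multi-word candidate spans, and a small gazetteer types the ones it knows.
import re

GAZETTEER = {
    "new york city": "LOC",
    "general electric": "ORG",
}

# One or more capitalized tokens in a row, e.g. "New York City".
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

def rule_based_ner(text):
    entities = []
    for match in CANDIDATE.finditer(text):
        span = match.group(0)
        etype = GAZETTEER.get(span.lower())
        if etype:                                  # keep only spans the gazetteer knows
            entities.append((span, match.start(), match.end(), etype))
    return entities

print(rule_based_ner("The mayor of New York City met executives from General Electric."))
# [('New York City', 13, 26, 'LOC'), ('General Electric', 47, 63, 'ORG')]
```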
Despite their interpretability and high precision on well-defined patterns, rule-based systems suffered from significant limitations, including poor scalability to new domains due to the need for extensive manual rule engineering and their inability to generalize beyond explicit patterns. The high manual effort required for creating and maintaining rules often made these approaches labor-intensive, limiting their applicability to diverse or evolving text corpora. Early systems achieved F1-scores around 90-93% on benchmark tasks but struggled with recall for unseen variations.[28][29]
