Ontology engineering
In computer science, information science and systems engineering, ontology engineering is a field which studies the methods and methodologies for building ontologies, which encompass the representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities of a given domain of interest. In a broader sense, the field also includes knowledge construction for the domain using formal ontology representations such as OWL/RDF. A large-scale representation of abstract concepts such as actions, time, physical objects and beliefs would be an example of ontological engineering.[2] Ontology engineering is one of the areas of applied ontology, and can be seen as an application of philosophical ontology. Core ideas and objectives of ontology engineering are also central in conceptual modeling.
Ontology engineering aims at making explicit the knowledge contained within software applications, and within enterprises and business procedures for a particular domain. Ontology engineering offers a direction towards solving the inter-operability problems brought about by semantic obstacles, i.e. the obstacles related to the definitions of business terms and software classes. Ontology engineering is a set of tasks related to the development of ontologies for a particular domain.
— Line Pouchard, Nenad Ivezic and Craig Schlenoff, [3]
Automated processing of information not interpretable by software agents can be improved by adding rich semantics to the corresponding resources, such as video files. One of the approaches for the formal conceptualization of represented knowledge domains is the use of machine-interpretable ontologies, which provide structured data in, or based on, RDF, RDFS, and OWL. Ontology engineering is the design and creation of such ontologies, which can contain more than just the list of terms (controlled vocabulary); they contain terminological, assertional, and relational axioms to define concepts (classes), individuals, and roles (properties) (TBox, ABox, and RBox, respectively).[4] Ontology engineering is a relatively new field of study concerning the ontology development process, the ontology life cycle, the methods and methodologies for building ontologies,[5][6] and the tool suites and languages that support them. A common way to provide the logical underpinning of ontologies is to formalize the axioms with description logics, which can then be translated to any serialization of RDF, such as RDF/XML or Turtle. Beyond the description logic axioms, ontologies might also contain SWRL rules. The concept definitions can be mapped to any kind of resource or resource segment in RDF, such as images, videos, and regions of interest, to annotate objects, persons, etc., and interlink them with related resources across knowledge bases, ontologies, and LOD datasets. This information, based on human experience and knowledge, is valuable for reasoners for the automated interpretation of sophisticated and ambiguous contents, such as the visual content of multimedia resources.[7] Application areas of ontology-based reasoning include, but are not limited to, information retrieval, automated scene interpretation, and knowledge discovery.
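As a minimal illustration of these layers (a sketch using the Python rdflib library; the namespace, class, property, and individual names are invented for the example), the following builds a tiny terminological and assertional layer and serializes it as Turtle:

```python
# Minimal sketch with rdflib (assumed installed); names and namespace are illustrative.
from rdflib import Graph, Namespace, Literal, RDF, RDFS, OWL

EX = Namespace("http://example.org/onto#")
g = Graph()
g.bind("ex", EX)

# Terminological axioms (TBox): classes and a property with a declared domain.
g.add((EX.Person, RDF.type, OWL.Class))
g.add((EX.Researcher, RDF.type, OWL.Class))
g.add((EX.Researcher, RDFS.subClassOf, EX.Person))
g.add((EX.worksOn, RDF.type, OWL.ObjectProperty))
g.add((EX.worksOn, RDFS.domain, EX.Researcher))

# Assertional axioms (ABox): an individual and a link to an annotated resource.
g.add((EX.alice, RDF.type, EX.Researcher))
g.add((EX.alice, EX.worksOn, EX.video42))
g.add((EX.video42, RDFS.label, Literal("Sample video resource")))

print(g.serialize(format="turtle"))
```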
Languages
An ontology language is a formal language used to encode the ontology. There are a number of such languages for ontologies, both proprietary and standards-based:
- Common logic is ISO standard 24707, a specification for a family of ontology languages that can be accurately translated into each other.
- The Cyc project has its own ontology language called CycL, based on first-order predicate calculus with some higher-order extensions.
- The Gellish language includes rules for its own extension and thus integrates an ontology with an ontology language.
- IDEF5 is a software engineering method to develop and maintain usable, accurate, domain ontologies.
- KIF is a syntax for first-order logic that is based on S-expressions.
- Rule Interchange Format (RIF), F-Logic and its successor ObjectLogic combine ontologies and rules.
- OWL is a language for making ontological statements, developed as a follow-on from RDF and RDFS, as well as earlier ontology language projects including OIL, DAML and DAML+OIL. OWL is intended to be used over the World Wide Web, and all its elements (classes, properties and individuals) are defined as RDF resources, and identified by URIs.
- OntoUML is a well-founded language for specifying reference ontologies.
- SHACL (Shapes Constraint Language) is a language for describing the structure of RDF data (see the validation sketch after this list). It can be used together with RDFS and OWL, or independently of them.
- XBRL (Extensible Business Reporting Language) is a syntax for expressing business semantics.
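As a hedged illustration of the SHACL entry above (a sketch assuming the Python rdflib and pyshacl packages; the shape, class, and property names are invented), the following checks that every instance of an example class carries a name literal:

```python
# Sketch using rdflib + pyshacl (both assumed installed); vocabulary is illustrative.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/onto#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ; sh:datatype xsd:string ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix ex: <http://example.org/onto#> .
ex:alice a ex:Person .                    # missing ex:name, should be reported
ex:bob   a ex:Person ; ex:name "Bob" .
""", format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)       # False: ex:alice violates the minCount constraint
print(report_text)
```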
Methodologies and tools
In life sciences
The life sciences are flourishing with ontologies that biologists use to make sense of their experiments.[9] To support correct conclusions from experiments, ontologies have to be structured optimally against the knowledge base they represent. The structure of an ontology needs to change continuously so that it remains an accurate representation of the underlying domain.
Recently, an automated method was introduced for engineering ontologies in the life sciences, such as the Gene Ontology (GO),[10] one of the most successful and widely used biomedical ontologies.[11] Based on information theory, it restructures ontologies so that the levels represent the desired specificity of the concepts. Similar information-theoretic approaches have also been used for optimal partitioning of the Gene Ontology.[12] Given the mathematical nature of such engineering algorithms, these optimizations can be automated to produce a principled and scalable architecture to restructure ontologies such as GO.
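One common information-theoretic notion used in such work is the information content of a term, estimated from annotation frequencies; the sketch below (plain Python with made-up counts, not real GO data or the method of the cited work) computes it as the negative log-probability of a term:

```python
# Illustrative sketch only: toy annotation frequencies for three terms of
# increasing specificity, not actual Gene Ontology statistics.
import math

annotations = {
    "GO:binding": 900,         # broad term -> low specificity
    "GO:protein_binding": 90,
    "GO:atp_binding": 10,      # narrow term -> high specificity
}
total = sum(annotations.values())

def information_content(term):
    """IC(t) = -log p(t), where p(t) is the term's annotation frequency."""
    return -math.log(annotations[term] / total)

for term in annotations:
    print(f"{term}: IC = {information_content(term):.2f}")
```

More specific terms receive higher information content, which is the sense in which levels of an ontology can be tuned toward a desired specificity.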
Open Biomedical Ontologies (OBO), a 2006 initiative of the U.S. National Center for Biomedical Ontology, provides a common 'foundry' for various ontology initiatives, amongst which are:
- The Generic Model Organism Project (GMOD)
- Gene Ontology Consortium
- Sequence Ontology
- Ontology Lookup Service
- The Plant Ontology Consortium
- Standards and Ontologies for Functional Genomics
and more
References
This article incorporates public domain material from the National Institute of Standards and Technology
- ^ Peter Shames, Joseph Skipper. "Toward a Framework for Modeling Space Systems Architectures" Archived 2009-02-27 at the Wayback Machine. NASA, JPL.
- ^ "Beyond Concepts: Ontology as Reality Representation" (PDF). Archived from the original (PDF) on 2006-03-03.
- ^ Line Pouchard, Nenad Ivezic and Craig Schlenoff (2000) "Ontology Engineering for Distributed Collaboration in Manufacturing". In Proceedings of the AIS2000 conference, March 2000.
- ^ Sikos, L. F. (14 March 2016). "A Novel Approach to Multimedia Ontology Engineering for Automated Reasoning over Audiovisual LOD Datasets". Lecture Notes in Artificial Intelligence. Vol. 9621. Springer. pp. 1–13. arXiv:1608.08072. doi:10.1007/978-3-662-49381-6_1.
- ^ Asunción Gómez-Pérez, Mariano Fernández-López, Oscar Corcho (2004). Ontological Engineering: With Examples from the Areas of Knowledge Management, E-commerce and the Semantic Web. Springer, 2004.
- ^ De Nicola, A; Missikoff, M; Navigli, R (2009). "A software engineering approach to ontology building" (PDF). Information Systems. 34 (2): 258. CiteSeerX 10.1.1.149.7258. doi:10.1016/j.is.2008.07.002.
- ^ Zarka, M; Ammar, AB; Alimi, AM (2015). "Fuzzy reasoning framework to improve semantic video interpretation". Multimedia Tools and Applications. 75 (10): 5719–5750. doi:10.1007/s11042-015-2537-1. S2CID 16505884.
- ^ Fathallah, Nadeen; Das, Arunav; De Giorgis, Stefano; Poltronieri, Andrea; Haase, Peter; Kovriguina, Liubov (2024-05-26). NeOn-GPT: A Large Language Model-Powered Pipeline for Ontology Learning (PDF). Extended Semantic Web Conference 2024. Hersonissos, Greece.
- ^ Malone, J; Holloway, E; Adamusiak, T; Kapushesky, M; Zheng, J; Kolesnikov, N; Zhukova, A; Brazma, A; Parkinson, H (2010). "Modeling sample variables with an Experimental Factor Ontology". Bioinformatics. 26 (8): 1112–1118. doi:10.1093/bioinformatics/btq099. PMC 2853691. PMID 20200009.
- ^ Alterovitz, G; Xiang, M; Hill, DP; Lomax, J; Liu, J; Cherkassky, M; Dreyfuss, J; Mungall, C; et al. (2010). "Ontology engineering". Nature Biotechnology. 28 (2): 128–30. doi:10.1038/nbt0210-128. PMC 4829499. PMID 20139945.
- ^ Botstein, David; Cherry, J. Michael; Ashburner, Michael; Ball, Catherine A.; Blake, Judith A.; Butler, Heather; Davis, Allan P.; Dolinski, Kara; et al. (2000). "Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium" (PDF). Nature Genetics. 25 (1): 25–9. doi:10.1038/75556. PMC 3037419. PMID 10802651. Archived from the original (PDF) on 2011-05-26.
- ^ Alterovitz, G.; Xiang, M.; Mohan, M.; Ramoni, M. F. (2007). "GO PaD: The Gene Ontology Partition Database". Nucleic Acids Research. 35 (Database issue): D322–7. doi:10.1093/nar/gkl799. PMC 1669720. PMID 17098937.
Further reading
- Kotis, K., A. Papasalouros, G. A. Vouros, N. Pappas, and K. Zoumpatianos, "Enhancing the Collective Knowledge for the Engineering of Ontologies in Open and Socially Constructed Learning Spaces", Journal of Universal Computer Science, vol. 17, issue 12, pp. 1710–1742, 08/2011
- Kotis, K., and A. Papasalouros, "Learning useful kick-off ontologies from Query Logs: HCOME revised", 4th International Conference on Complex, Intelligent and Software Intensive Systems (CISIS-2010), Kracow, IEEE Computer Society Press, 2010.
- John Davies (Ed.) (2006). Semantic Web Technologies: Trends and Research in Ontology-based Systems. Wiley. ISBN 978-0-470-02596-3
- Asunción Gómez-Pérez, Mariano Fernández-López, Oscar Corcho (2004). Ontological Engineering: With Examples from the Areas of Knowledge Management, E-commerce and the Semantic Web. Springer, 2004.
- Jarrar, Mustafa (2006). "Position paper". Proceedings of the 15th international conference on World Wide Web - WWW '06. pp. 497–503. doi:10.1145/1135777.1135850. ISBN 978-1-59593-323-2. S2CID 14184354.
- Mustafa Jarrar and Robert Meersman (2008). "Ontology Engineering -The DOGMA Approach". Book Chapter (Chapter 3). In Advances in Web Semantics I. Volume LNCS 4891, Springer.
- Riichiro Mizoguchi (2004). "Tutorial on ontological engineering: part 3: Advanced course of ontological engineering" Archived 2013-03-09 at the Wayback Machine. In: New Generation Computing. Ohmsha & Springer-Verlag, 22(2):198-220.
- Elena Paslaru Bontas Simperl and Christoph Tempich (2006). "Ontology Engineering: A Reality Check"
- Devedzić, Vladan (2002). "Understanding ontological engineering". Communications of the ACM. 45 (4): 136–144. CiteSeerX 10.1.1.218.7546. doi:10.1145/505248.506002. S2CID 5352880.
- Sure, York, Staab, Steffen and Studer, Rudi (2009). Ontology Engineering Methodology. In Staab, Steffen & Studer, Rudi (eds.) Handbook on Ontologies (2nd edition), Springer-Verlag, Heidelberg. ISBN 978-3-540-70999-2
External links
- Ontopia.net: Metadata? Thesauri? Taxonomies? Topic Maps! Making Sense of it All, by Lars Marius Garshol, 2004.
- OntologyEngineering.org: Ontology Engineering With Diagrams Archived 2023-06-09 at the Wayback Machine
Ontology engineering
Fundamentals
Definition and Scope
Ontology engineering is the systematic process of developing, maintaining, and evolving ontologies to explicitly represent domain knowledge for computational use.[1] An ontology itself is defined as a formal, explicit specification of a conceptualization, encompassing the objects, concepts, and entities presumed to exist in some area of interest, along with their properties and interrelations.[2] This engineering discipline focuses on creating structured, logic-based knowledge representations that are machine-readable and reusable across applications.[1] The primary objectives of ontology engineering include enhancing interoperability between heterogeneous information systems, facilitating semantic integration of data from diverse sources, and enabling automated reasoning to infer new knowledge or validate consistency.[1] By providing a shared vocabulary and formal constraints, it supports precise knowledge sharing and reduces ambiguity in data exchange, particularly in distributed environments like the Semantic Web.[4] Ontologies enable modeling of domain-specific concepts, allowing for accurate data retrieval and analysis in various fields.[1]

Ontology engineering differs from knowledge engineering in its emphasis on formal, logic-based structures optimized for computational inference, whereas knowledge engineering encompasses broader activities like informal knowledge acquisition from experts and diverse representation techniques without strict formalization.[1] While both aim to capture expertise for intelligent systems, ontology engineering prioritizes explicit axioms and machine-interpretable semantics to support automated processes.[1]

Core components of an ontology include classes (representing concepts or categories), properties (such as object properties for relations between classes and data properties for attributes), relations (defining how classes interact), instances (specific examples of classes), and axioms (logical rules or constraints that govern the ontology).[10] These elements form a knowledge base typically divided into a TBox (terminological knowledge about classes and properties) and an ABox (assertional knowledge about instances).[1]

Within computer science, information science, and systems engineering, the scope of ontology engineering extends to applications such as knowledge representation, data integration, and semantic query answering, yielding benefits like improved data sharing across silos and enhanced decision-making through inferential capabilities.[1] It plays a crucial role in fields requiring precise semantic modeling, including artificial intelligence and database systems, where ontologies promote consistency and scalability in knowledge management.[1]

Historical Development
The roots of ontology engineering trace back to ancient philosophy, particularly Aristotle's work in the Categories, where he outlined a foundational classification of entities into ten fundamental types, such as substance, quantity, and quality, to systematically describe what exists in the world.[11] This approach laid the groundwork for ontological inquiry as a means of categorizing reality. In the 20th century, analytic philosophy further advanced these ideas, with Willard Van Orman Quine's 1948 paper "On What There Is" introducing the concept of ontological commitment, which posits that a theory's commitments to entities are determined by the variables it quantifies over in first-order logic, influencing later computational interpretations of existence and representation.[12]

The emergence of ontology engineering in artificial intelligence began in the 1970s and 1980s, driven by the need for structured knowledge representation in expert systems. During this period, frame-based systems, proposed by Marvin Minsky in his 1974 paper "A Framework for Representing Knowledge," provided a method to organize knowledge into reusable structures that capture stereotyped situations and their attributes, facilitating reasoning in early AI applications. A seminal project in this era was the Cyc initiative, launched in 1984 by Douglas Lenat at the Microelectronics and Computer Technology Corporation, which aimed to encode a vast common-sense knowledge base to enable machine understanding of everyday concepts, marking an early large-scale effort in manual ontology construction.[13]

The 1990s marked a pivotal shift toward ontology engineering in the context of the semantic web, with Tom Gruber's 1993 paper "A Translation Approach to Portable Ontology Specifications" defining an ontology as "an explicit specification of a conceptualization," emphasizing its role in enabling shared understanding across systems.[14] Key U.S. Defense Advanced Research Projects Agency (DARPA) initiatives, such as the 1990 Summer Ontology Project and the later DARPA Agent Markup Language (DAML) program in the late 1990s, promoted reusable ontologies for knowledge sharing and integration in distributed AI systems.[15] This momentum culminated in the World Wide Web Consortium's (W3C) standardization of the Web Ontology Language (OWL) in 2004, providing a formal framework for web-based ontologies that built on RDF and supported advanced reasoning.[16]

Post-2000, the field saw significant growth, particularly in biomedical domains, exemplified by the Gene Ontology project initiated in 1998 but expanding rapidly after 2000 to standardize gene function annotations across species.[17] The early 2000s also witnessed the formal establishment of the ontology engineering community, with dedicated surveys and methodologies emerging around 2006 to address systematic development processes.[18]

Formal Foundations
Ontology Languages and Standards
Ontology engineering relies on standardized languages to formally represent knowledge structures, ensuring interoperability and machine readability across systems. The Resource Description Framework (RDF) serves as a foundational model for this purpose, defining data as directed graphs composed of triples in the form of subject-predicate-object statements.[19] This graph-based approach allows for flexible representation of relationships between resources, identified by URIs, and supports the integration of heterogeneous data sources without a fixed schema. RDF 1.1, published in 2014, provides the current specification with enhancements such as support for named graphs and improved internationalization.

Building upon RDF, the Web Ontology Language (OWL) provides a more expressive framework for defining ontologies, enabling the specification of classes, properties, and axioms with precise semantics. OWL was standardized by the W3C in 2004, with OWL 2 published in 2009 to address limitations in expressivity and performance.[4] OWL 2 defines two main semantics: OWL 2 DL, which supports description logic constructs while maintaining computational decidability for automated reasoning; and OWL 2 Full, which allows unrestricted use of RDF vocabulary but at the cost of potential undecidability. Additionally, OWL 2 includes three tractable profiles—OWL 2 EL for existential expressivity in large ontologies, OWL 2 QL for query answering via database technologies, and OWL 2 RL for rule-based implementations—optimized for efficiency in specific applications.[20]

Beyond OWL, several other languages facilitate ontology representation, particularly for rule-based and logic-intensive extensions. Common Logic (CL), defined in ISO/IEC 24707, offers a family of first-order logic dialects for interchanging knowledge across systems, emphasizing modular and extensible syntax.[21] The Knowledge Interchange Format (KIF) provides a predicate calculus-based language for sharing knowledge among disparate programs, with declarative semantics that avoid procedural implications.[22] F-Logic extends frame-based systems with logic programming features, supporting rule-based ontology definitions through stratified negation and inheritance mechanisms.[23]

Standards bodies play a crucial role in governing these languages and their implementations. The W3C oversees RDF and OWL specifications, ensuring web compatibility, while ISO standardizes broader logic frameworks like Common Logic.[24] The OBO Foundry coordinates domain-specific ontologies, particularly in biomedicine, by enforcing principles such as adherence to shared syntax and serialization standards.[25] Serialization formats like Turtle, a compact textual syntax for RDF graphs, and N-Triples, a line-based format for simple triple encoding, are defined by W3C to promote data exchange.[26][27]

A key distinction in ontology languages lies in their expressivity trade-offs. OWL 2 DL achieves decidability through syntactic restrictions aligned with description logics, enabling sound and complete automated reasoning.[28] In contrast, OWL 2 Full's unrestricted RDF integration leads to undecidability, as it permits constructs that exceed the decidable fragment of first-order logic.[29]
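To make the contrast between serializations concrete (a sketch with the Python rdflib library; the resource names are invented), the same single triple can be written out in both formats:

```python
# Sketch with rdflib (assumed installed); URIs are illustrative.
from rdflib import Graph, Namespace, RDFS

EX = Namespace("http://example.org/onto#")
g = Graph()
g.bind("ex", EX)
g.add((EX.Enzyme, RDFS.subClassOf, EX.Protein))

# Compact, prefix-based Turtle versus one-triple-per-line N-Triples.
print(g.serialize(format="turtle"))
print(g.serialize(format="nt"))
```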
Representation Formalisms

Description logics (DLs) provide the mathematical and logical foundations for representing ontologies in ontology engineering, offering a decidable fragment of first-order logic tailored for knowledge representation. DLs model domain knowledge through concepts (interpreted as unary predicates or sets of individuals), roles (binary predicates or relations between individuals), and axioms that define relationships such as subsumption. The family of DLs, particularly the ALC (Attributive Language with Complements) subfamily, serves as the cornerstone for expressive ontology languages. ALC includes basic constructors such as intersection (C ⊓ D), which combines concepts into their logical conjunction; union (C ⊔ D), representing disjunction; negation (¬C), for complementation; universal quantification (∀R.C), restricting all fillers of role R to the concept C; and existential quantification (∃R.C), requiring at least one R-filler belonging to C. These constructors enable precise definitions of complex concepts while maintaining computational tractability.[30]

Formally, an ontology in DLs is often defined as a tuple O = (C, R, A, I), where C denotes the set of concepts, R the set of roles, A the set of axioms (including general concept inclusions and role inclusions), and I the set of individuals with assertions linking them to concepts and roles. Semantics are provided by interpretations ℐ = (Δ^ℐ, ·^ℐ), where Δ^ℐ is a non-empty domain, each concept C is mapped to a subset C^ℐ ⊆ Δ^ℐ, and each role R to a binary relation R^ℐ on Δ^ℐ. For instance, the intersection constructor satisfies (C ⊓ D)^ℐ = C^ℐ ∩ D^ℐ, ensuring set-theoretic consistency. This structure supports the separation of terminological knowledge (TBox axioms over concepts and roles) from assertional knowledge (ABox assertions about individuals). DLs like ALC form the basis for standards such as OWL 2 DL, which corresponds to the more expressive SROIQ(D) logic.[30][31][32]

Reasoning in DLs relies on tableau algorithms, which construct models (tableaux) by systematically expanding an initial ABox through non-deterministic rules that enforce concept and role constraints. These algorithms apply expansion rules for the constructors, such as the ⊔-rule (branching on the disjuncts of a union) and the ∃-rule (introducing fresh role successors), while clash detection identifies contradictions such as an individual asserted to belong to both a concept and its negation. Tableau methods are sound, preserving satisfiability from the initial to the expanded ABox, and complete, as blocking techniques (e.g., pair-wise or subset blocking) ensure termination by preventing infinite expansions in cyclic models. For expressive DLs like SROIQ(D), underlying OWL 2 DL, reasoning tasks such as concept satisfiability and subsumption are N2ExpTime-complete, reflecting the added complexity from features like transitive roles (S), role hierarchies (H), nominals (O), inverse roles (I), and datatypes (D). Despite this worst-case complexity, optimizations like caching and rule ordering enhance practical efficiency.[33][32]

Extensions to basic DLs address limitations in modeling dynamic or uncertain domains. Hybrid logics augment DLs with nominals (singleton concepts) and binders (state variables for referencing individuals), enabling more flexible ontology representations while preserving decidability in restricted forms; for example, adding nominals to ALC yields ALCO, which supports extensional class definitions common in ontologies.
Temporal DLs incorporate linear-time operators (e.g., "always" □ or "eventually" ◇) to capture evolving knowledge in dynamic ontologies, such as business processes, with decidable fragments achieving EXPTIME complexity for satisfiability over bounded traces. Probabilistic extensions of ALC integrate probability bounds on conditional probabilities to handle uncertainty, ensuring consistency through non-empty probabilistic models and enabling inferences like range refinement under statistical assumptions.[34][35][36]

In contrast to full first-order logic (FOL), which offers unrestricted expressivity but is undecidable for satisfiability, DLs are carefully restricted fragments that ensure decidability. ALC, for instance, embeds into the two-variable fragment of FOL (FO²), which is NEXPTIME-complete, but avoids features like arbitrary quantifier nesting or function symbols that lead to undecidability. While FOL supports full predicate expressivity, DLs prioritize tailored constructors and optimized reasoning, trading some generality for computational feasibility in ontology applications.[30][37]
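As a small worked example of the ALC constructors and set-theoretic semantics above (illustrative only, not drawn from the cited sources), a concept definition and its interpretation can be written as:

```latex
% An ALC definition: a parent is a person with at least one child who is a person.
\mathit{Parent} \equiv \mathit{Person} \sqcap \exists\,\mathit{hasChild}.\mathit{Person}

% Its semantics under an interpretation \mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}}):
\mathit{Parent}^{\mathcal{I}} \;=\;
  \mathit{Person}^{\mathcal{I}} \,\cap\,
  \{\, x \in \Delta^{\mathcal{I}} \mid \exists y .\, (x,y) \in \mathit{hasChild}^{\mathcal{I}}
       \wedge y \in \mathit{Person}^{\mathcal{I}} \,\}
```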
Development Methodologies
Engineering Processes
Ontology engineering processes provide structured approaches to the systematic development, refinement, and maintenance of ontologies, ensuring they meet domain-specific requirements while supporting interoperability and scalability. These processes typically follow lifecycle models that emphasize iteration, collaboration, and validation to address the complexity of knowledge representation. Seminal methodologies have emerged to guide practitioners in transforming informal domain knowledge into formal, computable structures.

One foundational methodology is METHONTOLOGY, introduced in 1997, which outlines a lifecycle based on evolving prototypes divided into five main phases: specification, where the ontology's purpose, scope, and explicit assumptions are defined; conceptualization, involving the organization of knowledge into abstract models such as hierarchies and taxonomies; formalization, where these models are expressed using semi-formal or formal representations; implementation, translating the formal model into a specific ontology language; and maintenance, encompassing updates and corrections to ensure ongoing relevance.[38] This approach draws from software engineering principles to treat ontology development as an engineering discipline rather than an ad hoc art.[39]

Building on such foundations, the NeOn Methodology, developed through an EU-funded project from 2006 to 2010, extends lifecycle support to networked and collaborative ontology engineering. It adopts a scenario-based framework that tailors processes to specific contexts, such as reusing existing ontologies or integrating distributed knowledge sources in team environments, thereby facilitating the construction of ontology networks.[40] The methodology includes guidelines for scheduling activities across the lifecycle, emphasizing adaptability for large-scale, interdisciplinary projects.[41]

A widely referenced practical guide is Ontology Development 101, published in 2001, which proposes a seven-step process for beginners: determining the domain and scope via competency questions; considering reuse of existing ontologies; enumerating important terms; defining classes and the class hierarchy; defining properties of classes; defining facets of slots (properties); and creating instances.[42] This iterative process encourages incremental building and testing, starting from core concepts and expanding outward.
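Competency questions are commonly operationalized as queries run against a draft ontology; the sketch below (Python with rdflib, over an invented mini-ontology) checks the question "which kinds of things can have an author?" with SPARQL:

```python
# Sketch with rdflib (assumed installed); the mini-ontology is invented for illustration.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex:   <http://example.org/onto#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
ex:Document a owl:Class .
ex:Report   a owl:Class ; rdfs:subClassOf ex:Document .
ex:hasAuthor a owl:ObjectProperty ; rdfs:domain ex:Document .
""", format="turtle")

# Competency question: what kinds of things can have an author?
q = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/onto#>
SELECT ?cls WHERE { ex:hasAuthor rdfs:domain ?dom . ?cls rdfs:subClassOf* ?dom . }
"""
for row in g.query(q):
    print(row.cls)   # ex:Document and ex:Report satisfy the question
```

An empty result would indicate that the draft ontology cannot yet answer the question, signalling missing classes or properties.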
Across these methodologies, ontology lifecycles incorporate iterative refinement to accommodate evolving domain knowledge, version control mechanisms to track changes and manage dependencies, and evaluation metrics focused on consistency (ensuring no logical contradictions within the ontology) and coherence (verifying the ontology's alignment with domain knowledge through techniques like competency questions).[1] Such aspects promote reliability, with consistency checks often involving automated reasoning to detect contradictions, and coherence evaluations comparing the ontology against competency questions.[43]

Key principles guiding these processes include the use of competency questions to elicit precise requirements—what specific queries the ontology must support—to define scope and validate coverage from the outset.[42] Additionally, a middle-out development strategy balances top-down (starting from broad categories) and bottom-up (from specific instances) approaches by prioritizing central, domain-relevant concepts first, then extending to abstractions and details, which enhances modularity and eases integration with reuse techniques like modular ontology design.[44] Recent developments include LLM-based approaches for automated ontology drafting and collaborative engineering, enhancing efficiency in knowledge elicitation and alignment (as of 2025).[45]

Reuse and Integration Techniques
Ontology reuse strategies enable the efficient construction of new ontologies by leveraging existing ones, particularly upper-level or foundational ontologies such as DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering) and BFO (Basic Formal Ontology), which provide generic conceptual structures applicable across domains.[46] Extension involves adding domain-specific concepts and axioms to an upper ontology while preserving its core structure, allowing for specialization without altering foundational elements.[47] Subsetting, conversely, extracts relevant portions of an upper ontology to focus on a narrower scope, reducing complexity and ensuring alignment with specific requirements.[48] Merging combines multiple upper ontologies or fragments to create a unified representation, often requiring resolution of overlapping concepts to avoid redundancy.[49] These strategies are integral to methodologies like NeOn, which outline scenarios for reusing ontological resources in network-based engineering.[50]

Alignment techniques address semantic heterogeneity between ontologies by establishing correspondences between entities, facilitating integration. Entity matching commonly employs string similarity measures, such as Levenshtein distance or Jaro-Winkler, to compare lexical labels and identifiers. Structural analysis extends this by examining relational patterns, like subclass hierarchies or property connections, to infer alignments based on graph isomorphism or subgraph matching. Semantic embedding techniques incorporate external knowledge sources, such as WordNet, to capture contextual meanings through synset mappings and hypernym relations, enhancing accuracy for ambiguous terms.[51] These methods are often combined in hybrid approaches to balance precision and recall, as surveyed in comprehensive ontology matching frameworks.

Modularization techniques partition large ontologies into manageable modules to support reuse and maintenance, promoting scalability in engineering processes. Extraction creates modules by selecting subgraphs centered on specific concepts or axioms, ensuring logical consistency through criteria like locality preservation.[52] Pruning removes irrelevant elements from an ontology while retaining core semantics, often guided by relevance metrics to minimize information loss.[53] Merging modules assembles them into a cohesive whole, adhering to principles like ontology double articulation, which ensures bidirectional logical connections between modules without introducing inconsistencies.[54] These partitioning approaches, rooted in formal semantics, enable distributed development and targeted reuse.[52]

Tools for ontology integration, such as AgreementMaker, automate mapping tasks by integrating multiple alignment techniques into a unified workflow. AgreementMaker employs a weighted combination of lexical, structural, and semantic matchers to generate entity correspondences, supporting large-scale ontologies through efficient algorithms.[55] Its effectiveness is evaluated using precision (correct mappings among proposed ones) and recall (proposed mappings among true ones) against gold standard references from benchmarks like the Ontology Alignment Evaluation Initiative (OAEI).[56] For instance, AgreementMaker has demonstrated strong performance in OAEI tracks, such as ranking highly in the anatomy task.

Challenges in ontology integration arise from heterogeneity in representational choices and axiom structures, complicating seamless merging.
Heterogeneity resolution requires reconciling differences in naming conventions, granularity, and modeling paradigms across source ontologies, often demanding manual intervention for ambiguous cases.[57] Conflict detection focuses on identifying inconsistent axioms post-integration, such as contradictory subclass relations or property constraints, using reasoning tools to verify coherence.[58] These issues can propagate errors in downstream applications, underscoring the need for systematic frameworks that prioritize logical consistency during reuse.[59]
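A minimal sketch of the lexical entity-matching step discussed above (plain Python, using the standard library's difflib as a stand-in for a dedicated Levenshtein or Jaro-Winkler implementation; the labels are invented):

```python
# Illustrative label matcher over two tiny sets of class labels.
from difflib import SequenceMatcher

onto_a = ["Heart", "Blood Vessel", "Cardiac Muscle"]
onto_b = ["heart", "blood vessels", "myocardium"]

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Propose a correspondence for each source label above a simple threshold.
for label in onto_a:
    best = max(onto_b, key=lambda other: similarity(label, other))
    score = similarity(label, best)
    if score >= 0.8:
        print(f"{label}  <->  {best}   (score {score:.2f})")
    else:
        print(f"{label}: no lexical match (best {best}, score {score:.2f})")
```

Purely lexical scores miss synonym pairs such as "Cardiac Muscle" and "myocardium", which is where the structural and embedding-based methods described above come in.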
Tools and Technologies
Editing and Building Tools
Ontology engineering relies on specialized software environments that facilitate the creation, modification, and maintenance of ontologies, enabling users to define classes, properties, and relationships in a structured manner.[60] These tools typically provide graphical user interfaces (GUIs) to abstract the underlying formalisms, making ontology development accessible to domain experts without deep programming knowledge. Prominent examples include both open-source and commercial platforms that support standards like OWL and RDF, with varying emphases on collaboration, scalability, and integration.[61]

Protégé, developed by Stanford University since 2002, is a widely adopted open-source ontology editor that supports OWL 2 and provides an extensive plugin architecture for tasks such as visualization of class hierarchies and collaborative editing through WebProtégé. The tool was updated to version 5.6.8 in September 2025, enhancing stability and macOS compatibility.[60] It allows users to build ontologies via intuitive forms for defining axioms, individuals, and annotations, while supporting import and export in formats like OWL/XML and RDF/XML.[62] The tool's plugin ecosystem enables extensions for specific needs, such as diagrammatic representations using tools like OWLviz, enhancing usability for complex ontology structures.

TopBraid Composer, an enterprise-grade tool from TopQuadrant, offers advanced editing capabilities for RDF and OWL ontologies, including integrated SPARQL querying and SHACL-based validation to ensure data quality during development. It features a robust GUI for managing large-scale semantic models, with support for modular ontology design and automation through scripting in SPARQL and Java.[63] As part of the TopBraid EDG platform, it emphasizes enterprise deployment, handling interconnected knowledge graphs with high performance.[64]

Open-source alternatives include the NeOn Toolkit, a legacy tool from the EU-funded NeOn project (2007–2013) designed for networked ontology development, which supported collaborative workflows across distributed teams and integrated multiple ontology languages like OWL and F-Logic for reuse in modular environments.[61] Another option is VocBench, a web-based editor focused on SKOS vocabularies and OWL ontologies, providing multilingual support for thesaurus management and concept scheme editing in collaborative settings.[65] These tools prioritize accessibility for vocabulary-centric tasks, such as defining skos:Concept hierarchies and broader/narrower relations.[65]

Common features across these tools include graphical interfaces for visualizing and editing class hierarchies, asserting properties on instances, and handling relationships like subclassOf or objectPropertyDomain.[60] They support import/export in standard formats such as OWL/XML, Turtle, and RDF/XML, facilitating interoperability with other semantic technologies.[66] Many incorporate version control and diff tools to track changes in ontology evolution.[61]

In comparisons, tools like Protégé excel in usability for domain experts due to their intuitive drag-and-drop interfaces and extensive tutorials.[60] Conversely, enterprise tools such as TopBraid Composer demonstrate superior scalability for large ontologies, supporting efficient querying and editing over massive datasets in production environments, though at the cost of a steeper learning curve for non-technical users.[67] Open-source options like NeOn and VocBench balance usability and scalability for collaborative, mid-sized projects, particularly in networked or vocabulary-focused scenarios.[61]
These editing tools often serve as front-ends in broader reasoning workflows, where ontologies are loaded into inference engines for consistency checking.[60] Recent advancements include tools leveraging large language models (LLMs) for ontology engineering tasks, such as DeepOnto, a Python package released in 2023 that supports ontology alignment, completion, and other deep learning-based operations.[68]

Reasoning and Validation Tools
Reasoning and validation tools in ontology engineering enable the inference of implicit knowledge from explicit axioms and the assessment of ontology quality through automated analysis. These tools leverage description logics (DLs) underlying languages like OWL to perform tasks such as consistency checking, subsumption computation, and classification, ensuring ontologies are logically sound and free from structural errors.

HermiT is a Java-based OWL 2 reasoner employing a hypertableau calculus, a variant of tableau-based algorithms optimized for nondeterministic expansions and model caching to handle OWL DL inferences efficiently. It supports key reasoning services including ontology consistency checking, entailment verification, and class/property classification, often outperforming traditional tableau reasoners on complex ontologies with thousands of axioms.[69][70] FaCT++ is an open-source C++ reasoner implementing tableau-based procedures for OWL 2 DL, focusing on optimized absorption and automation techniques to compute subsumption hierarchies and detect unsatisfiable concepts. It excels in classification tasks for large-scale ontologies, providing entailment support through modular decomposition of axioms to reduce computational overhead.[71] Pellet serves as a comprehensive OWL 2 DL reasoner with tableau expansion for consistency checking and incremental reasoning capabilities, extending to rule-based inference via integration with SWRL for deriving new facts from horn-like rules alongside DL axioms. This allows validation of ontology coherence while incorporating procedural knowledge, such as deriving class memberships from property assertions.[72][73]

For structural validation, OOPS! (Ontology Pitfall Scanner) automates the detection of common modeling errors in OWL ontologies, including circular hierarchies where cyclic subclass relationships violate acyclic assumptions, as well as inconsistencies like multiple inheritance paths leading to logical paradoxes. It evaluates over 40 pitfalls categorized by severity, providing remediation suggestions based on best practices.[74] OntoMetric facilitates quality assessment through metric computation, measuring aspects like coverage by evaluating the ratio of defined concepts to total entities and relationship density to gauge completeness relative to domain requirements. This tool supports comparative analysis across ontology versions or peers, emphasizing semantic richness without exhaustive enumeration.[75]

Performance optimizations in these tools often target tractable DL fragments, such as the OWL 2 EL profile based on EL++, which admits polynomial-time reasoning for subsumption and consistency via completion rules that avoid exponential blowup in existential restrictions and conjunctions.[76] Integration of reasoners into development environments occurs via standardized APIs, such as the OWL API in Protégé, allowing seamless embedding of HermiT or Pellet for on-the-fly inference during ontology authoring without separate invocation.[77] Newer reasoners, such as Whelk (introduced around 2024), support combined OWL EL+RL reasoning, enabling efficient inference for biological and other large-scale ontologies.[78]
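As a hedged sketch of how such reasoners are invoked programmatically (using the owlready2 Python package, which bundles HermiT and requires a Java runtime; the classes here are invented), an unsatisfiable class can be detected as follows:

```python
# Sketch with owlready2 (assumed installed; Java is required for the bundled HermiT).
from owlready2 import get_ontology, Thing, AllDisjoint, sync_reasoner, default_world

onto = get_ontology("http://example.org/demo.owl")

with onto:
    class Protein(Thing): pass
    class Gene(Thing): pass
    AllDisjoint([Protein, Gene])           # disjointness axiom
    class Hybrid(Protein, Gene): pass      # subclass of two disjoint classes

# Classify the ontology; unsatisfiable classes become equivalent to owl:Nothing.
with onto:
    sync_reasoner()

print(list(default_world.inconsistent_classes()))   # expected to contain Hybrid
```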
Applications
In Life Sciences
Ontology engineering in the life sciences involves developing structured knowledge representations tailored to biomedical and biological domains, facilitating data integration, annotation, and analysis across diverse datasets. These ontologies address the complexity of biological systems by standardizing terminology for genes, proteins, diseases, and clinical concepts, enabling precise querying and inference in research and healthcare applications. Key examples include the Gene Ontology (GO) and SNOMED CT, which exemplify domain-specific adaptations through hierarchical structures and formal axioms.

The Gene Ontology (GO), initiated in 1998 by the Gene Ontology Consortium, provides a controlled vocabulary to describe gene and gene product attributes across organisms.[79] It is organized into three independent branches: molecular function, which captures activities such as catalytic or binding activities at the molecular level; biological process, encompassing series of molecular events like signaling or metabolic pathways; and cellular component, denoting locations such as organelles or supramolecular complexes. This structure supports functional annotation of over 1.6 million gene products from 5,495 species (as of October 2025), promoting interoperability in genomic databases.[80]

SNOMED CT (Systematized Nomenclature of Medicine—Clinical Terms) serves as a comprehensive clinical terminology ontology designed for healthcare interoperability, representing clinical information such as diagnoses, procedures, and observations.[81] It encompasses over 375,000 unique concepts, organized hierarchically with formal definitions based on description logics, allowing for consistent encoding in electronic health records across more than 80 countries.[82] This scale enables detailed clinical documentation and supports automated decision-making in patient care systems.[81]

Engineering approaches in life sciences ontologies emphasize automation and rigor to handle vast biomedical literature and ensure conceptual accuracy. Automated term extraction from scientific texts using natural language processing (NLP) techniques identifies candidate concepts and relations, as seen in pipelines that process PubMed abstracts to populate ontology hierarchies semi-automatically.[83] Quality assurance incorporates logical axioms, such as disjointness constraints, to prevent overlaps between classes—for instance, ensuring that certain cellular components remain mutually exclusive—and detect inconsistencies during ontology maintenance.[84] These methods, including description logic-based checks, enhance the formal validity of ontologies like GO. Recent developments include integration of ontologies with large language models for automated ontology learning and expansion in biomedicine.[3]

A prominent case study is the OBO Foundry, a collaborative initiative establishing principles for building orthogonal ontologies in biology that cover distinct domains without redundancy.[85] Core principles include openness for community contributions, orthogonality to minimize overlap (e.g., separating anatomy from phenotype ontologies), and adherence to formal syntax like OWL for interoperability. This framework has coordinated over 100 ontologies, fostering reuse and integration in model organism databases.[86] The impact of these ontologies is evident in enabling semantic queries across large-scale resources, such as UniProt and PubMed.
In UniProt, GO annotations allow federated queries to retrieve proteins by function or process, integrating data from multiple species for comparative genomics.[87] Similarly, tools like GO2PUB expand PubMed searches using GO hierarchies, improving retrieval of relevant literature on gene functions through inheritance-based term expansion.[88] Methodologies like METHONTOLOGY have been adapted for bio-ontologies to guide specification and evaluation phases.[89]
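A small sketch of this kind of hierarchy-based query expansion (Python with rdflib, over an invented fragment of an is-a hierarchy; real GO identifiers and predicates differ):

```python
# Illustrative only: a toy is-a hierarchy, not actual Gene Ontology data.
from rdflib import Graph, Namespace, RDFS

GO = Namespace("http://example.org/go#")
g = Graph()
g.add((GO.kinase_activity, RDFS.subClassOf, GO.catalytic_activity))
g.add((GO.protein_kinase_activity, RDFS.subClassOf, GO.kinase_activity))

# Expand a search term to itself plus all of its descendants in the hierarchy,
# so that a query for "catalytic activity" also retrieves kinase annotations.
expanded = set(g.transitive_subjects(RDFS.subClassOf, GO.catalytic_activity))
print(expanded)
```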
In Semantic Web and Knowledge Graphs

Ontology engineering plays a pivotal role in the Semantic Web by providing the schema layer that structures RDF data, enabling the principles of Linked Data through standardized vocabularies like RDFS and OWL. These ontologies define classes, properties, and relationships, allowing machines to interpret and infer new knowledge from distributed datasets represented as triples (subject-predicate-object). For example, RDFS extends RDF with basic schema elements such as rdfs:subClassOf for hierarchical organization, while OWL adds advanced constructs like owl:TransitiveProperty to support complex reasoning and entailment over web-scale data.[90][91] This formalization facilitates the interlinking of datasets, promoting a web of machine-readable information where inference engines can derive implicit facts, such as subclass relationships or property domains, from explicit RDF statements.[90]

In knowledge graphs, ontology engineering involves designing schemas to integrate and populate vast, heterogeneous data sources, often drawing from unstructured text. DBpedia exemplifies this approach, where a crowd-sourced ontology with 768 classes and over 3,000 properties serves as the schema for extracting structured entities and relations from Wikipedia articles and infoboxes using natural language processing techniques.[92][93] The extraction process populates the graph with billions of RDF triples (approximately 9.5 billion), enabling queries via SPARQL and linking to other Linked Open Data resources. Similarly, the Google Knowledge Graph leverages ontology-inspired schemas, incorporating types from schema.org to organize entities like people, places, and events extracted from web sources, enhancing search results with contextual inferences.[94]

Key techniques in this domain include lightweight ontologies for broad applicability and extensions for specialized dimensions. Schema.org, developed collaboratively by major search engines, functions as a lightweight ontology with over 800 types and 1,500 properties, allowing web publishers to annotate content using simple markup formats like JSON-LD or RDFa without heavy formal semantics.[95] This approach supports rapid schema design for everyday web data, focusing on core domains such as CreativeWork and Organization to improve discoverability. For more advanced needs, YAGO incorporates temporal and spatial extensions into its ontology, anchoring facts to time intervals and geographic coordinates extracted from Wikipedia and GeoNames, resulting in a knowledge base with approximately 49 million entities and 109 million facts (as of 2024).[96]

A prominent case study is Wikidata, which employs an upper-level ontology to integrate structured data across Wikipedia's multilingual articles, representing knowledge as semantic triples with items, properties, and values. This ontology enables the reconciliation of diverse sources, supporting complex queries like lineage tracing or population statistics, and serves over 119 million entities (as of August 2025) through APIs compatible with Semantic Web tools.[6][97] By centralizing structured data from Wikimedia projects, Wikidata facilitates cross-language data federation, allowing inferences that bridge linguistic and cultural gaps in global knowledge representation. Recent trends include enhanced use of Wikidata in AI-driven knowledge graphs for real-time data integration.
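A minimal sketch of the schema-level entailment described at the start of this section (Python with rdflib plus the owlrl package, which materializes RDFS/OWL-RL closures; the vocabulary is invented):

```python
# Sketch using rdflib + owlrl (both assumed installed); names are illustrative.
from rdflib import Graph, Namespace, RDF, RDFS
import owlrl

EX = Namespace("http://example.org/kg#")
g = Graph()
g.add((EX.City, RDFS.subClassOf, EX.Place))     # schema (ontology) triple
g.add((EX.Berlin, RDF.type, EX.City))           # data triple

# Materialize the RDFS entailments: Berlin is inferred to be a Place.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
print((EX.Berlin, RDF.type, EX.Place) in g)     # True after expansion
```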
The application of ontologies in knowledge graphs yields significant benefits, particularly in enhancing search precision, recommendation systems, and data federation. In search engines, ontological schemas enable semantic matching beyond keywords, delivering contextually relevant results—such as disambiguating entities via types and relations—which improves user experience in platforms like Google Search.[98] For recommendations, the structured relationships in graphs power personalized suggestions in e-commerce, like Amazon's product graphs, by inferring user preferences through entity linkages. Data federation across silos is streamlined, as ontologies provide a common schema for integrating disparate sources in social platforms, reducing redundancy and enabling scalable analytics without proprietary data movement.[98]

Challenges and Trends
Key Challenges
Ontology engineering faces several persistent technical and practical obstacles that hinder the development and deployment of effective knowledge representations. These challenges range from computational limitations to socio-technical issues, impacting the reliability and adoption of ontologies in diverse domains. Addressing them requires a balance between theoretical rigor and practical implementation, often leveraging automated tools for partial mitigation.

Scalability remains a core issue in ontology engineering, particularly when handling large-scale ontologies comprising millions of concepts and relations. For instance, reasoning over ontologies expressed in expressive description logics (DLs), such as SHOIN(D), incurs exponential time complexity due to the inherent computational demands of satisfiability checking and inference tasks, which can become prohibitive for real-world applications with vast axiom sets.[99] In domains like healthcare, where ontologies must integrate massive datasets from genomics and electronic health records, static structures struggle to scale efficiently, necessitating modular designs to manage growing data volumes while preserving performance.[100] Biomedical ontologies, such as the Gene Ontology with 39,354 terms (as of October 2025), exemplify this, where continuous integration pipelines are employed to handle interdependencies, yet full reasoning remains resource-intensive.[101][102]

Interoperability barriers arise from semantic drift and vocabulary mismatches, complicating the alignment and integration of ontologies across domains. Semantic drift occurs as concepts evolve over ontology versions, leading to gradual shifts in meaning that undermine mappings and reuse; for example, changes in concept extensions or intensions can result in up to 25-31% term overlap but less than 9% actual reuse in biomedical contexts.[103][101] These mismatches manifest in hierarchical misalignments and terminological heterogeneity, as seen in standards like SNOMED CT and HL7 FHIR, where disparate representations of the same domain knowledge impede seamless data exchange without extensive manual reconciliation.[100] Automated alignment techniques, while helpful, often fail to fully resolve these drifts, exacerbating silos in multi-ontology environments.

Maintenance overhead poses significant challenges due to the dynamic nature of domains, requiring ontologies to evolve in response to new knowledge or requirements while minimizing disruption. Ontology evolution involves detecting changes from sources like usage logs or external corpora, suggesting modifications, validating them for consistency, and assessing impacts on dependent systems—a process that is resource-intensive and lacks standardized change languages, leading to high manual effort.[104] In practice, tools like Protégé facilitate versioning, but propagating updates across modular or distributed ontologies incurs substantial storage and communication costs, particularly when ensuring backward compatibility with applications and queries.[104] This overhead deters widespread adoption, as seen in Semantic Web projects where fragmented evolution strategies increase long-term costs without automated mechanisms for propagation.[104]

Quality assurance in ontology engineering is complicated by the need to detect redundancies, inconsistencies, and incompleteness without relying solely on exhaustive manual reviews.
Redundant hierarchical relations, such as multiple paths between concepts in is-a structures, can inflate ontology size and degrade inference accuracy; for example, the SNOMED CT ontology contains hundreds of such redundancies that automated tools like FEDRR can identify in seconds using dynamic programming.[105] Inconsistencies, including logical contradictions from evolving axioms, further compromise trustworthiness, with reasoners providing partial automated detection but struggling at scale due to DL complexity.[105] Incompleteness assessments remain subjective, often requiring domain-specific metrics to evaluate coverage, yet current methods fall short of comprehensive validation, leading to persistent quality gaps in large ontologies like the Gene Ontology.[101]

Human factors introduce additional hurdles, particularly in balancing ontology expressivity with usability for non-experts in collaborative settings. The steep learning curve of formalisms like OWL and tools like Protégé alienates domain specialists without technical backgrounds, resulting in modeling errors and low adoption rates in team-based development.[101] Collaborative ontology engineering demands intuitive interfaces for non-experts to contribute without deep semantic knowledge, yet existing methodologies often prioritize expressivity over accessibility, leading to usability issues such as cumbersome annotation processes and poor support for iterative feedback.[106] In distributed environments, these factors amplify coordination challenges, where mismatched expertise levels hinder consensus on concept definitions and alignments.[107] Reasoners and simpler design patterns offer some mitigation by automating consistency checks, allowing focus on conceptual contributions.[101]
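The redundancy check described above can be sketched in a few lines of plain Python (toy hierarchy, not SNOMED CT data and not the FEDRR algorithm): a directly asserted is-a edge is redundant when its target is also reachable through another asserted parent.

```python
# Illustrative redundancy detection over a toy is-a hierarchy.
# Direct subclass edges: child -> set of asserted parents.
is_a = {
    "ProteinKinase": {"Kinase", "Enzyme"},   # the "Enzyme" edge is redundant
    "Kinase": {"Enzyme"},
    "Enzyme": {"Protein"},
    "Protein": set(),
}

def ancestors(term, hierarchy, seen=None):
    """All concepts reachable from `term` via one or more is-a edges."""
    seen = set() if seen is None else seen
    for parent in hierarchy.get(term, ()):
        if parent not in seen:
            seen.add(parent)
            ancestors(parent, hierarchy, seen)
    return seen

redundant = []
for child, parents in is_a.items():
    for parent in parents:
        # Redundant if `parent` is also reachable through another asserted parent.
        if any(parent in ancestors(other, is_a) for other in parents - {parent}):
            redundant.append((child, parent))

print(redundant)   # [('ProteinKinase', 'Enzyme')]
```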
Emerging Developments

Recent advancements in ontology engineering have increasingly incorporated large language models (LLMs) for automated ontology learning, enabling the extraction of concepts, relations, and axioms from unstructured text sources.[108] A notable example is the NeOn-GPT pipeline, introduced in 2024, which integrates the structured NeOn methodology with LLMs to translate natural language descriptions into formal ontology representations, facilitating semi-automated schema generation and reducing manual effort in domain-specific ontology development.[108] This approach was highlighted in the LLMs4OL 2024 challenge, the first dedicated evaluation of LLMs for ontology learning tasks, demonstrating improved accuracy in concept extraction from diverse datasets compared to traditional rule-based methods.[109] This progress continued with the second LLMs4OL challenge at ISWC 2025, further evaluating LLMs for ontology learning tasks.[110]

Hybrid approaches combining ontologies with graph neural networks (GNNs) have emerged to support dynamic knowledge graphs, allowing for real-time updates and inference over evolving data structures.[111] For instance, neuro-symbolic frameworks leverage GNNs for embedding learning while preserving ontological constraints, enabling scalable reasoning in enterprise knowledge graphs without sacrificing interpretability.[112] These methods address scalability challenges by iteratively refining graph representations through symbolic reasoning, as seen in approaches like ReasonKGE, which corrects inconsistent predictions in knowledge graph embeddings.[113]

Advances in ontology modularization have benefited from AI-assisted alignment techniques using embeddings, particularly post-2020 developments with models like BERT.[114] The BERTMap system, for example, fine-tunes BERT on textual ontology elements to predict entity matches in both unsupervised and semi-supervised settings, achieving higher precision in aligning large-scale ontologies than classical string-based matchers.[114] This embedding-driven modularization supports the decomposition and reuse of ontology components, enhancing interoperability in distributed systems.

Emerging trends include ontology-based explainable AI (XAI), where ontologies provide structured vocabularies to generate human-interpretable explanations for AI decisions.[115] Ontologies serve multiple roles in XAI, such as defining explanation scopes and anchoring post-hoc interpretations to domain knowledge, as explored in surveys on semantic-based XAI applications in manufacturing.[116] Additionally, federated learning integrated with ontologies enables privacy-preserving knowledge sharing by aligning distributed models through shared ontological schemas without exchanging raw data.[117] Frameworks like ontology-guided federated unlearning use knowledge distillation to remove sensitive information while maintaining model utility across institutions.[118]

Recent milestones from the ESWC 2024 conference underscore LLM pipelines for ontology engineering, including end-to-end workflows for automated term extraction and axiom generation.[108] Furthermore, integrations with vector databases enhance semantic search over ontological knowledge graphs by combining graph traversals with vector similarity queries, as in hybrid RAG systems that boost retrieval accuracy for complex queries.[119]
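A hedged sketch of the embedding-based matching idea behind systems like BERTMap (not BERTMap itself; it assumes the sentence-transformers package and a small pretrained encoder, and the model name and labels are only illustrative):

```python
# Illustrative only; any small sentence encoder could be substituted.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

source_labels = ["myocardial infarction", "renal failure"]
target_labels = ["heart attack", "kidney failure", "bone fracture"]

src = model.encode(source_labels, convert_to_tensor=True)
tgt = model.encode(target_labels, convert_to_tensor=True)

# Cosine similarity between every source/target label pair; pick the best target.
scores = util.cos_sim(src, tgt)
for i, label in enumerate(source_labels):
    j = int(scores[i].argmax())
    print(f"{label}  <->  {target_labels[j]}   (cos {float(scores[i][j]):.2f})")
```

Unlike the purely lexical matcher sketched earlier, the embeddings pair synonymous labels with little surface overlap.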
References
- https://www.wikidata.org/wiki/Wikidata:Statistics
