Hubbry Logo
logo
Semantic similarity
Community hub

Semantic similarity

logo
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Contribute something to knowledge base
Hub AI

Semantic similarity AI simulator

(@Semantic similarity_simulator)

Semantic similarity

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content[citation needed] as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature. The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. For example, "car" is similar to "bus", but is also related to "road" and "driving".

Computationally, semantic similarity can be estimated by defining a topological similarity, by using ontologies to define the distance between terms/concepts. For example, a naive metric for the comparison of concepts ordered in a partially ordered set and represented as nodes of a directed acyclic graph (e.g., a taxonomy), would be the shortest-path linking the two concept nodes. Based on text analyses, semantic relatedness between units of language (e.g., words, sentences) can also be estimated using statistical means such as a vector space model to correlate words and textual contexts from a suitable text corpus. The evaluation of the proposed semantic similarity / relatedness measures are evaluated through two main ways. The former is based on the use of datasets designed by experts and composed of word pairs with semantic similarity / relatedness degree estimation. The second way is based on the integration of the measures inside specific applications such as information retrieval, recommender systems, natural language processing, etc.

The concept of semantic similarity is more specific than semantic relatedness, as the latter includes concepts as antonymy and meronymy, while similarity does not. However, much of the literature uses these terms interchangeably, along with terms like semantic distance. In essence, semantic similarity, semantic distance, and semantic relatedness all mean, "How much does term A have to do with term B?" The answer to this question is usually a number between −1 and 1, or between 0 and 1, where 1 signifies extremely high similarity.

An intuitive way of visualizing the semantic similarity of terms is by grouping together terms which are closely related and spacing wider apart the ones which are distantly related. This is also common in practice for mind maps and concept maps.

A more direct way of visualizing the semantic similarity of two linguistic items can be seen with the Semantic Folding approach. In this approach a linguistic item such as a term or a text can be represented by generating a pixel for each of its active semantic features in e.g. a 128 x 128 grid. This allows for a direct visual comparison of the semantics of two items by comparing image representations of their respective feature sets.

Semantic similarity measures have been applied and developed in biomedical ontologies. They are mainly used to compare genes and proteins based on the similarity of their functions rather than on their sequence similarity, but they are also being extended to other bioentities, such as diseases.

These comparisons can be done using tools freely available on the web:

Similarity is also applied in geoinformatics to find similar geographic features or feature types:

See all
User Avatar
No comments yet.