Hubbry Logo
Graph edit distanceGraph edit distanceMain
Open search
Graph edit distance
Community hub
Graph edit distance
logo
8 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Graph edit distance
Graph edit distance
from Wikipedia
At first glance, the GED (graph edit distance) may appear to be 7, removing 3 edges, adding the white vertex, and adding edges between it and the 3 other vertices. However, the optimal set of operations would be to remove the edge between 2 colors of choice (for example, red and blue), change the third (green) to white, add a vertex of the now missing color (green), and connect it to the newly white vertex, for a GED of 4.

In mathematics and computer science, graph edit distance (GED) is a measure of similarity (or dissimilarity) between two graphs. The concept of graph edit distance was first formalized mathematically by Alberto Sanfeliu and King-Sun Fu in 1983.[1] A major application of graph edit distance is in inexact graph matching, such as error-tolerant pattern recognition in machine learning.[2]

The graph edit distance between two graphs is related to the string edit distance between strings. With the interpretation of strings as connected, directed acyclic graphs of maximum degree one, classical definitions of edit distance such as Levenshtein distance,[3][4] Hamming distance[5] and Jaro–Winkler distance may be interpreted as graph edit distances between suitably constrained graphs. Likewise, graph edit distance is also a generalization of tree edit distance between rooted trees.[6][7][8][9][10]

Formal definitions and properties

[edit]

The mathematical definition of graph edit distance is dependent upon the definitions of the graphs over which it is defined, i.e. whether and how the vertices and edges of the graph are labeled and whether the edges are directed. Generally, given a set of graph edit operations (also known as elementary graph operations), the graph edit distance between two graphs and , written as can be defined as

where denotes the set of edit paths transforming into (a graph isomorphic to) and is the cost of each graph edit operation .

The set of elementary graph edit operators typically includes:

vertex insertion to introduce a single new labeled vertex to a graph.
vertex deletion to remove a single (often disconnected) vertex from a graph.
vertex substitution to change the label (or color) of a given vertex.
edge insertion to introduce a new colored edge between a pair of vertices.
edge deletion to remove a single edge between a pair of vertices.
edge substitution to change the label (or color) of a given edge.

Additional, but less common operators, include operations such as edge splitting that introduces a new vertex into an edge (also creating a new edge), and edge contraction that eliminates vertices of degree two between edges (of the same color). Although such complex edit operators can be defined in terms of more elementary transformations, their use allows finer parameterization of the cost function when the operator is cheaper than the sum of its constituents.

A deep analysis of the elementary graph edit operators is presented in [11][12][13]

And some methods have been presented to automatically deduce these elementary graph edit operators.[14][15][16][17][18] And some algorithms learn these costs online:[19]

Applications

[edit]

Graph edit distance finds applications in handwriting recognition,[20] fingerprint recognition[21] and cheminformatics.[22]

Algorithms and complexity

[edit]

Exact algorithms for computing the graph edit distance between a pair of graphs typically transform the problem into one of finding the minimum cost edit path between the two graphs. The computation of the optimal edit path is cast as a pathfinding search or shortest path problem, often implemented as an A* search algorithm.

In addition to exact algorithms, a number of efficient approximation algorithms are also known. Most of them have cubic computational time [23][24] [25] [26] [27]

Moreover, there is an algorithm that deduces an approximation of the GED in linear time [28]

Despite the above algorithms sometimes working well in practice, in general the problem of computing graph edit distance is NP-hard (for a proof that's available online, see Section 2 of Zeng et al.), and is even hard to approximate (formally, it is APX-hard[29]).

References

[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
Graph edit distance (GED) is a measure of similarity between two graphs, defined as the minimum cost required to transform one graph into the other through a sequence of edit operations, including the insertion, deletion, and substitution of nodes and edges. These operations can be assigned costs based on attributes of the nodes and edges, allowing GED to handle both unlabeled and attributed graphs in an error-tolerant manner for inexact . Formally, for two graphs G1=(V1,E1)G_1 = (V_1, E_1) and G2=(V2,E2)G_2 = (V_2, E_2), GED is the cost of the least expensive edit path, often modeled as an where mappings between nodes and edges minimize the total edit cost. The concept of GED was first formalized by Sanfeliu and Fu in 1983 as a distance measure between attributed relational graphs for tasks. It built on earlier ideas from edit distance and was further developed by Bunke and Allermann in 1983 for inexact in structural pattern recognition, emphasizing its role in handling distortions and noise in graph representations. Subsequent work by Messmer and Bunke in the 1990s introduced extensions like error-tolerant subgraph matching and probabilistic interpretations, establishing GED as a foundational metric in graph-based . Over time, GED has been generalized to include various cost functions, such as those forming convex polygons in labeling spaces, which influence the shape and properties of the edit surface. Computing the exact GED is NP-hard, equivalent to finding the minimum-cost edit path in a search space that grows factorially with graph size, often reduced to a quadratic assignment problem or for approximations. Exact algorithms, such as branch-and-bound or A* search, are feasible only for small graphs (typically under 20 vertices), while heuristics like , self-organizing maps for cost learning, or neural network-based approximations (e.g., via graph neural networks) enable efficient for larger structures. Key challenges include defining appropriate edit costs, handling attribute mismatches, and scaling to real-world datasets, with recent advances incorporating to estimate distances in sub-quadratic time. GED finds broad applications in , such as offline handwritten character recognition, shape and image analysis, and video indexing, where graphs represent structural relationships tolerant to deformations. In bioinformatics, it measures similarity between molecular structures or protein interaction networks; in , it supports sketch-to-photo matching and scene recognition; and in , it underpins tasks like graph classification and clustering by providing a flexible dissimilarity metric. Its adaptability to different domains, combined with ongoing research into faster approximations and learned edit costs, continues to drive its relevance in graph-based .

Fundamentals

Definition

Graph edit distance (GED) is a measure of similarity between two graphs, defined as the minimum total cost of a sequence of edit operations required to transform a source graph G1G_1 into a target graph G2G_2. This edit path allows for structural and label differences to be quantified in a flexible manner, making GED particularly useful for inexact in scenarios where perfect is unlikely. The concept was introduced to address error-tolerant comparisons in recognition, where graphs represent relational data with potential noise or variations. Formally, let G1=(V1,E1,1)G_1 = (V_1, E_1, \ell_1) and G2=(V2,E2,2)G_2 = (V_2, E_2, \ell_2) be two undirected labeled graphs, where ViV_i denotes the vertex set, EiE_i the set, and i\ell_i the labeling function assigning labels from finite alphabets to both vertices (iV:ViΣV\ell_{i_V}: V_i \to \Sigma_V) and edges (iE:EiΣE\ell_{i_E}: E_i \to \Sigma_E). The GED is given by GED(G1,G2)=min{k=1mc(ok)  |  (o1,,om)Ψ(G1,G2)},\text{GED}(G_1, G_2) = \min \left\{ \sum_{k=1}^m c(o_k) \;\middle|\; (o_1, \dots, o_m) \in \Psi(G_1, G_2) \right\},
Add your contribution
Related Hubs
User Avatar
No comments yet.