Graph embedding

from Wikipedia

The Heawood graph and associated map embedded in the torus.

In topological graph theory, an embedding (also spelled imbedding) of a graph G on a surface Σ is a representation of G on Σ in which points of Σ are associated with vertices and simple arcs (homeomorphic images of [0,1]) are associated with edges in such a way that:

  • the endpoints of the arc associated with an edge e are the points associated with the end vertices of e,
  • no arcs include points associated with other vertices,
  • two arcs never intersect at a point which is interior to either of the arcs.

Here a surface is a connected 2-manifold.

Informally, an embedding of a graph into a surface is a drawing of the graph on the surface in such a way that its edges may intersect only at their endpoints. It is well known that any finite graph can be embedded in 3-dimensional Euclidean space ℝ³.[1] A planar graph is one that can be embedded in 2-dimensional Euclidean space ℝ².

Often, an embedding is regarded as an equivalence class (under homeomorphisms of Σ) of representations of the kind just described.

Some authors define a weaker version of the definition of "graph embedding" by omitting the non-intersection condition for edges. In such contexts the stricter definition is described as "non-crossing graph embedding".[2]

This article deals only with the strict definition of graph embedding. The weaker definition is discussed in the articles "graph drawing" and "crossing number".

Terminology

If a graph G is embedded on a closed surface Σ, the complement of the union of the points and arcs associated with the vertices and edges of G is a family of regions (or faces).[3] A 2-cell embedding, cellular embedding or map is an embedding in which every face is homeomorphic to an open disk.[4] A closed 2-cell embedding is an embedding in which the closure of every face is homeomorphic to a closed disk.

The genus of a graph is the minimal integer n such that the graph can be embedded in a surface of genus n. In particular, a planar graph has genus 0, because it can be drawn on a sphere without self-crossing. A graph that can be embedded on a torus is called a toroidal graph.

The non-orientable genus of a graph is the minimal integer n such that the graph can be embedded in a non-orientable surface of (non-orientable) genus n.[3]

The Euler genus of a graph is the minimal integer n such that the graph can be embedded in an orientable surface of (orientable) genus n/2 or in a non-orientable surface of (non-orientable) genus n. A graph is orientably simple if its Euler genus is smaller than its non-orientable genus.

The maximum genus of a graph is the maximal integer n such that the graph can be 2-cell embedded in an orientable surface of genus n.

Combinatorial embedding

An embedded graph uniquely defines cyclic orders of edges incident to the same vertex. The set of all these cyclic orders is called a rotation system. Embeddings with the same rotation system are considered to be equivalent and the corresponding equivalence class of embeddings is called a combinatorial embedding (as opposed to the term topological embedding, which refers to the previous definition in terms of points and curves). Sometimes, the rotation system itself is called a "combinatorial embedding".[5][6][7]

An embedded graph also defines natural cyclic orders of the edges that constitute the boundaries of the faces of the embedding. However, handling these face-based orders is less straightforward, since in some cases some edges may be traversed twice along a face boundary. For example, this is always the case for embeddings of trees, which have a single face. To overcome this combinatorial nuisance, one may consider that every edge is "split" lengthwise into two "half-edges", or "sides". Under this convention, in all face boundary traversals each half-edge is traversed only once, and the two half-edges of the same edge are always traversed in opposite directions.

Other equivalent representations for cellular embeddings include the ribbon graph, a topological space formed by gluing together topological disks for the vertices and edges of an embedded graph, and the graph-encoded map, an edge-colored cubic graph with four vertices for each edge of the embedded graph.

Computational complexity

The problem of finding the graph genus is NP-hard (the problem of determining whether an n-vertex graph has genus g is NP-complete).[8]

At the same time, the graph genus problem is fixed-parameter tractable, i.e., polynomial time algorithms are known to check whether a graph can be embedded into a surface of a given fixed genus as well as to find the embedding.

The first breakthrough in this respect happened in 1979, when algorithms of time complexity O(n^O(g)) were independently submitted to the Annual ACM Symposium on Theory of Computing: one by I. Filotti and G.L. Miller and another one by John Reif. Their approaches were quite different, but upon the suggestion of the program committee they presented a joint paper.[9] However, Wendy Myrvold and William Kocay proved in 2011 that the algorithm given by Filotti, Miller and Reif was incorrect.[10]

In 1999 it was reported that the fixed-genus case can be solved in time linear in the graph size and doubly exponential in the genus.[11]

Embeddings of graphs into higher-dimensional spaces

It is known that any finite graph can be embedded into a three-dimensional space.[1]

One method for doing this is to place the points on any line in space and to draw the edges as curves each of which lies in a distinct halfplane, with all halfplanes having that line as their common boundary. An embedding like this in which the edges are drawn on halfplanes is called a book embedding of the graph. This metaphor comes from imagining that each of the planes where an edge is drawn is like a page of a book. It was observed that in fact several edges may be drawn in the same "page"; the book thickness of the graph is the minimum number of halfplanes needed for such a drawing.

Alternatively, any finite graph can be drawn with straight-line edges in three dimensions without crossings by placing its vertices in general position so that no four are coplanar. For instance, this may be achieved by placing the ith vertex at the point (i, i², i³) of the moment curve.
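The moment-curve placement can be checked concretely: two straight segments can only cross if their four endpoints are coplanar, so verifying that no four of the chosen points are coplanar confirms a crossing-free straight-line embedding. A minimal sketch (illustrative only; integer parameters keep the determinant arithmetic exact):

```python
from itertools import combinations

def moment_point(t):
    """Point (t, t^2, t^3) on the moment curve."""
    return (t, t * t, t * t * t)

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def in_general_position(points):
    """True if no four points are coplanar, i.e. every 4-tuple spans a
    tetrahedron of nonzero volume; then segments between the points can
    only meet at shared endpoints."""
    for p, q, r, s in combinations(points, 4):
        u = tuple(q[i] - p[i] for i in range(3))
        v = tuple(r[i] - p[i] for i in range(3))
        w = tuple(s[i] - p[i] for i in range(3))
        if det3(u, v, w) == 0:
            return False
    return True

# 8 vertices on the moment curve suffice for any 8-vertex graph, e.g. K8.
pts = [moment_point(i) for i in range(1, 9)]
print(in_general_position(pts))  # True
```

By contrast, four corners of a unit square are coplanar, so `in_general_position` rejects them.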

An embedding of a graph into three-dimensional space in which no two of the cycles are topologically linked is called a linkless embedding. A graph has a linkless embedding if and only if it does not have one of the seven graphs of the Petersen family as a minor.

See also

References

from Grokipedia
Graph embedding is a concept in graph theory and computer science with two primary interpretations. In classical graph theory, it refers to a representation of a graph on a surface (such as a plane or sphere) or in higher-dimensional space where vertices are mapped to distinct points and edges to non-intersecting curves, preserving adjacency relations without crossings; this is central to topics like planarity and genus.[1] In the modern machine learning context, also known as graph representation learning, it involves mapping nodes, edges, or substructures of a graph $ G = (V, E) $ into a low-dimensional vector space of dimension $ d $ (where $ d \ll |V| $), such that the structural properties and relational information of the original graph are preserved as much as possible. This process transforms high-dimensional, sparse graph data into dense, continuous embeddings that facilitate downstream machine learning tasks by enabling efficient computation on relational structures.[2][3] The classical notion of graph embedding dates back to the 19th century with studies on planar graphs, evolving through key results like Kuratowski's theorem in 1930; detailed historical context is provided in the Fundamentals section. In parallel, the development of representation learning techniques traces back to the early 2000s, when initial efforts focused on dimensionality reduction for non-relational data using similarity graphs, such as Laplacian eigenmaps for manifold learning. 
A significant shift occurred around 2010 with the advent of direct graph embedding methods that leverage auxiliary information and deep learning to capture complex network proximities, addressing challenges in scalability and expressiveness for large-scale graphs.[2] By the mid-2010s, foundational algorithms like DeepWalk (2014), which applies random walks and SkipGram models to learn node representations preserving second-order proximity, and node2vec (2016), which introduces biased random walks for flexible neighborhood exploration, established random walk-based approaches as a cornerstone.[2] Graph embedding methods in representation learning are broadly categorized into several paradigms, including matrix factorization techniques that decompose graph adjacency or Laplacian matrices to capture global similarities (e.g., LINE for edge preservation); random walk-based methods that sample paths to model local microstructures; deep learning approaches such as Graph Convolutional Networks (GCNs, 2017) for message passing on node features and Graph Attention Networks (GATs, 2017) for weighted neighbor aggregation; and edge reconstruction models like TransE (2013) for knowledge graph embeddings via translational distances. More recent advancements incorporate graph isomorphism networks (GINs, 2019) for distinguishing graph structures and graph transformers (e.g., Graphormer, 2021) for long-range dependencies.[2][3] Detailed discussions of topological and geometric embeddings appear in their respective sections. These embeddings, particularly in the representation learning sense, enable a wide array of applications across domains, including node-level tasks like classification and clustering in social networks, edge-level predictions such as link prediction and recommendation in e-commerce systems, and graph-level analyses like molecular property prediction in bioinformatics or visualization in network analysis. 
In knowledge graphs, embeddings support entity resolution and relation inference, while in dynamic settings, they aid anomaly detection in temporal networks.[2][3] Trends in representation learning since 2020 have emphasized self-supervised and contrastive learning paradigms to reduce reliance on labeled data, with methods like InfoGraph (2020) maximizing mutual information between graph views, alongside extensions to heterogeneous, multimodal, and temporal graphs for real-world scenarios involving diverse data modalities or evolving structures. As of 2025, further advancements include the integration of large language models (LLMs) with knowledge graph embeddings to enhance semantic understanding and reasoning.[3][4]

Fundamentals

Definition and Terminology

In graph theory, particularly within topological graph theory, a graph embedding refers to a representation of a graph $ G = (V(G), E(G)) $ on a host structure, such as a surface or Euclidean space, where vertices are mapped to distinct points and edges to curves or line segments connecting those points, such that adjacency is preserved: for vertices $ u, v \in V(G) $, if $ (u, v) \in E(G) $, then the images $ f(u) $ and $ f(v) $ are joined by the image of the edge under the mapping $ f $. In such embeddings, edges may intersect only at shared endpoints (vertices), ensuring no improper crossings in the interior. This preservation extends to non-adjacency conditions in certain contexts, where non-adjacent vertices map to non-adjacent points in the host.[5] Key terminology includes the crossing number of a graph, denoted $ \operatorname{cr}(G) $, which is the minimum number of edge crossings over all possible drawings of $ G $ in the plane.[6] The genus of a graph $ g(G) $ is the smallest integer $ g $ such that $ G $ admits an embedding without crossings on an orientable surface of genus $ g $ (e.g., a sphere for $ g = 0 $, a torus for $ g = 1 $).[7] A straight-line embedding represents edges as straight line segments between vertex points, as guaranteed for any planar graph by Fáry's theorem. Minor-closed families are classes of graphs closed under taking minors (subgraphs obtained by deleting vertices/edges and contracting edges), such as the family of planar graphs, which exclude $ K_5 $ and $ K_{3,3} $ as minors.[8] Embeddings are typically required to be injective, meaning no two vertices map to the same point, distinguishing them from immersions, where the mapping is locally injective but edges may cross without overlapping interiors (though vertices remain distinct).
The concept originates from topology and early graph theory studies in the 1930s, particularly investigations into planar graphs by Kazimierz Kuratowski, who characterized non-planar graphs via forbidden subdivisions, and Hassler Whitney, who introduced combinatorial approaches to planarity.[5] It is important to distinguish these topological and geometric embeddings from the notion of graph embedding in machine learning and representation learning, where the term refers to mapping nodes (and sometimes edges or subgraphs) into low-dimensional vector spaces to capture structural similarities for tasks like node classification or link prediction, often via methods such as random walks or graph neural networks.[9] In contrast, graph theory embeddings focus on spatial or surface placements preserving combinatorial or geometric properties without dimensional reduction for computation.[6]

Historical Context

The concept of graph embedding originated in the field of topology during the early 20th century, where mathematicians sought to characterize graphs that could be drawn on a plane without edge crossings. In 1930, Kazimierz Kuratowski established a foundational result by proving that a graph is non-planar if and only if it contains a subdivision of $ K_5 $ or $ K_{3,3} $, providing a forbidden subgraph criterion for planarity.[10] This theorem marked a pivotal milestone in understanding the embeddability of graphs in the plane. Two years later, in 1932, Hassler Whitney offered a dual characterization of planar graphs using cycle spaces and abstract duality, further solidifying the topological foundations of graph embeddings.[11] These early contributions by Kuratowski and Whitney shifted focus from geometric drawings to abstract combinatorial properties, influencing subsequent developments in graph theory. Building on these topological insights, mid-20th-century research emphasized constructive and computational aspects of embeddings. In 1948, István Fáry demonstrated that every planar graph admits a straight-line embedding in the plane without crossings, bridging abstract planarity with explicit geometric realizations.[12] This result complemented earlier work and highlighted the feasibility of non-curved representations. By the 1960s and 1970s, attention turned to embeddings on higher-genus surfaces, culminating in the 1968 solution to the Heawood conjecture by Gerhard Ringel and J. W. T. Youngs, who proved the map color theorem for orientable surfaces by constructing explicit genus embeddings for complete graphs.[13] Concurrently, the computational turn arrived with the 1974 linear-time planarity testing algorithm by John Hopcroft and Robert Tarjan, which efficiently determines embeddability and produces an embedding when possible, enabling practical applications in algorithm design.
Frank Harary played a key role in this era through his standardization of graph theory terminology and exploration of embeddings in topological contexts, as detailed in his influential 1969 textbook. From the 2000s onward, graph embedding evolved into an interdisciplinary tool, particularly in machine learning for representation learning on networks. This shift was propelled by the need to encode graph structures into low-dimensional vector spaces for tasks like node classification and link prediction. A seminal advancement came in 2014 with Bryan Perozzi et al.'s DeepWalk, which adapted word2vec techniques to generate node embeddings via random walks on graphs, capturing structural similarities in social and biological networks.[14] This work marked the transition from pure mathematical topology and algorithms to data-driven AI applications, fostering a surge in scalable embedding methods for large-scale graphs.

Topological Embeddings

Combinatorial Embeddings

In combinatorial graph theory, an embedding of a graph is specified abstractly by a rotation system, which consists of a cyclic ordering of the edges incident to each vertex, independent of any geometric realization. This system defines the facial walks of the embedding by tracing cycles formed by successive edges in the rotations and their reverses, thereby determining the faces and the associated dual graph.[15] A key property of such embeddings on orientable surfaces is the generalized Euler characteristic, given by
$$ V - E + F = 2 - 2g, $$
where $ V $ is the number of vertices, $ E $ the number of edges, $ F $ the number of faces, and $ g $ the genus of the surface. Additionally, the handshaking lemma for faces holds: the sum of the degrees of the faces equals $ 2E $, ensuring that each edge bounds exactly two faces. An embedding is valid if the rotation system produces closed facial walks and satisfies the Euler characteristic for some genus $ g \geq 0 $.[15][16] Two combinatorial embeddings are equivalent if there exists a homeomorphism of the underlying surface that preserves the rotation systems at each vertex. For planar graphs (embeddable on the sphere, where $ g = 0 $), Whitney's theorem establishes that every 3-connected graph admits a unique embedding up to reflection and homeomorphism of the sphere.[17] Representative examples illustrate these concepts: a cycle graph $ C_n $ (for $ n \geq 3 $) embeds combinatorially on the sphere with two faces (the interior and exterior), satisfying Euler's formula with $ V = n $, $ E = n $, and $ F = 2 $. In contrast, the complete graph $ K_5 $ cannot embed on the sphere but admits multiple combinatorial embeddings on the torus ($ g = 1 $), such as a rectangular embedding where faces are quadrilaterals.[15][18]

Embeddings on Surfaces

Graph embeddings on surfaces extend the concept of planar embeddings to two-dimensional manifolds with higher topological complexity. These surfaces are broadly classified into orientable and non-orientable types. Orientable surfaces include the sphere, which has genus 0 and corresponds to the plane via stereographic projection, and the torus, which has genus 1. Non-orientable surfaces encompass the projective plane, equivalent to a sphere with one crosscap, and the Klein bottle, formed by a sphere with two crosscaps. Embeddings on such surfaces require drawing the graph without edge crossings, preserving the topological properties of the surface.[15] The orientable genus of a graph $ G $ is defined as the smallest integer $ g \geq 0 $ such that $ G $ admits a crossing-free embedding on an orientable surface of genus $ g $. For instance, the complete graph $ K_7 $ has orientable genus 1 and can be embedded on the torus, where its 21 edges divide the surface into 14 triangular faces, saturating Euler's formula for the torus ($ V - E + F = 0 $). In contrast, $ K_7 $ is non-planar and requires crossings on the sphere. The formula for the genus of the complete graph $ K_n $ (for $ n \geq 3 $) is $ \gamma(K_n) = \lceil (n-3)(n-4)/12 \rceil $, which approximates $ (n-3)(n-4)/12 $ for large $ n $ and provides a lower bound derived from Euler's characteristic.[19][20] A key result in this area is the Heawood number $ H(g) $, which bounds the maximum chromatic number of any graph embeddable on an orientable surface of genus $ g \geq 1 $:
$$ H(g) = \left\lfloor \frac{7 + \sqrt{1 + 48g}}{2} \right\rfloor. $$
This formula arises from applying a greedy coloring algorithm to graphs satisfying the surface's Euler characteristic, yielding an upper bound on the chromatic number. For non-embeddable graphs on a given surface, the surface crossing number is the minimum number of edge crossings over all possible drawings on that surface, generalizing the classical crossing number to topological constraints. Bounds on surface crossing numbers often rely on Euler's formula and density arguments, with algorithms achieving approximations within $ O(\log^2 g) $ of the optimum for genus $ g $.[13][6][21] Illustrative examples highlight these concepts. The Petersen graph, a non-planar 3-regular graph on 10 vertices, embeds without crossings on the projective plane, a non-orientable surface of genus 1, where it forms a symmetric embedding with pentagonal faces. Rectangular grid graphs, such as the $ m \times n $ grid, embed naturally on the torus via periodic boundary conditions, forming a toroidal grid that wraps around both dimensions without crossings. The Ringel-Youngs theorem (1968) resolved the Heawood map-coloring conjecture by explicitly constructing embeddings of complete graphs $ K_n $ on orientable surfaces of genus $ \gamma(K_n) $, proving that the chromatic number of such surfaces equals $ H(g) $ for $ g \geq 1 $, except for the non-orientable Klein bottle (where it is 6 instead of 7). These constructions, often using voltage graphs or current graphs, saturated the Euler bound and confirmed the theorem for all cases.[22][13]
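The two formulas above can be cross-checked against each other: Ringel-Youngs implies that $ K_{H(g)} $ embeds on the orientable surface of genus $ g $, i.e. $ \gamma(K_{H(g)}) \leq g $. A short sketch (integer arithmetic via `isqrt` keeps both floors and ceilings exact):

```python
from math import isqrt

def heawood(g):
    """Heawood number H(g) = floor((7 + sqrt(1 + 48g)) / 2).
    Exact, since floor((7 + sqrt(x)) / 2) == (7 + isqrt(x)) // 2
    for any nonnegative integer x."""
    return (7 + isqrt(1 + 48 * g)) // 2

def complete_graph_genus(n):
    """gamma(K_n) = ceil((n - 3)(n - 4) / 12) for n >= 3."""
    return -((-(n - 3) * (n - 4)) // 12)  # ceiling via floor division

# Torus (g = 1): H(1) = 7, and K_7 indeed has genus 1.
print(heawood(1), complete_graph_genus(7))  # 7 1
```

The check `complete_graph_genus(heawood(g)) <= g` holds for every genus, which is exactly the embedding half of the Ringel-Youngs theorem.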

Geometric Embeddings

Planar Embeddings

A planar embedding of a graph is a drawing of the graph in the Euclidean plane in which vertices are represented by distinct points, edges by continuous curves (Jordan arcs) connecting their incident vertices, and no two edges intersect except possibly at shared endpoints.[23] This definition ensures the embedding is crossing-free, preserving the combinatorial structure without topological distortions beyond the plane's constraints. Embeddings in the plane are topologically equivalent to those on the sphere, as any planar drawing can be projected onto the sphere via stereographic projection, and vice versa. The planarity of a graph is characterized by forbidden subgraph criteria, most notably Kuratowski's theorem from 1930, which states that a finite undirected graph is planar if and only if it contains no subgraph homeomorphic to (i.e., a subdivision of) the complete graph $ K_5 $ or the complete bipartite graph $ K_{3,3} $.[10] Independently, Wagner's theorem provides an equivalent characterization in terms of minors: a graph is planar if and only if it has neither $ K_5 $ nor $ K_{3,3} $ as a minor, where a minor is obtained by deleting vertices/edges and contracting edges (merging adjacent vertices and removing self-loops).[24] These theorems are linked because subdivisions relate to topological minors, while contractions enable reduction to core forbidden structures; specifically, any subdivision of $ K_5 $ or $ K_{3,3} $ contains one of them as a minor, and the operations preserve non-planarity.[25] Fáry's theorem, established in 1948, extends the representational possibilities for planar graphs by proving that every simple planar graph admits a straight-line embedding in the plane, where all edges are represented as straight line segments without crossings.[26] This result highlights the flexibility of planar embeddings, allowing geometric realizations that simplify computations in areas like computational geometry. 
A fundamental relation governing planar embeddings is Euler's formula, which for any connected planar graph with $ v $ vertices, $ e $ edges, and $ f $ faces (including the infinite outer face) satisfies:
$$ v - e + f = 2. $$
This formula arises from the topology of the plane (Euler characteristic 2) and holds for any maximal embedding, providing bounds on graph density (e.g., $ e \leq 3v - 6 $ for simple planar graphs with $ v \geq 3 $).[27] Examples of planar graphs include all trees, which embed without cycles and thus without crossings, and outerplanar graphs, which admit embeddings where all vertices lie on the boundary of the outer face.[27] In contrast, the utility graph $ K_{3,3} $—modeling connections between three houses and three utilities—is non-planar, as it is itself a forbidden subdivision under Kuratowski's theorem, famously demonstrating the impossibility of crossing-free connections in the plane.[27] To verify planarity via these criteria, a proof sketch for the equivalence and forbidden structure detection relies on minors and contractions. Assume a graph $ G $ is non-planar; then, by maximality arguments, it must contain a subdivision of $ K_5 $ or $ K_{3,3} $, as any planar supergraph would contradict embedding properties. Conversely, if $ G $ has a $ K_5 $ or $ K_{3,3} $ minor, non-planarity follows because contracting edges in a hypothetical planar embedding of $ G $ yields a planar minor, but $ K_5 $ and $ K_{3,3} $ are known non-planar (e.g., $ K_5 $ violates Euler's formula with $ v=5 $, $ e=10 $, implying $ f=7 $ but exceeding face degree bounds). The detection process iteratively applies deletions and contractions to simplify $ G $ toward a forbidden minor: delete non-incident edges/vertices to isolate branches, then contract paths in subdivisions to recover $ K_5 $ or $ K_{3,3} $, preserving the embedding contradiction at each step since contractions merge faces without introducing crossings in planar graphs. This reduction establishes the if-and-only-if condition, with full proofs often using induction on vertex count and case analysis for 3-connected components.[24]
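The density bounds derived from Euler's formula already settle both classical examples. The bound $ e \leq 3v - 6 $ stated above rules out $ K_5 $ directly; for $ K_{3,3} $ one needs the standard triangle-free refinement $ e \leq 2v - 4 $ (every face has at least four sides), which is not stated above but follows the same argument. A short sketch:

```python
def passes_planar_density(v, e, triangle_free=False):
    """Necessary (not sufficient) planarity conditions from Euler's
    formula: e <= 3v - 6 for simple planar graphs with v >= 3, and
    e <= 2v - 4 when the graph additionally contains no triangles."""
    if v < 3:
        return True
    bound = 2 * v - 4 if triangle_free else 3 * v - 6
    return e <= bound

# K5: v = 5, e = 10 > 3*5 - 6 = 9, so it cannot be planar.
print(passes_planar_density(5, 10))                      # False
# K_{3,3}: v = 6, e = 9 passes the general bound (9 <= 12) ...
print(passes_planar_density(6, 9))                       # True
# ... but, being triangle-free, fails e <= 2v - 4: 9 > 8.
print(passes_planar_density(6, 9, triangle_free=True))   # False
```

Since these are only necessary conditions, a passing graph still requires a full planarity test.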

Higher-Dimensional Embeddings

Higher-dimensional embeddings of graphs involve mapping the vertices to distinct points in Euclidean space $ \mathbb{R}^d $ for $ d \geq 3 $, with edges represented as straight-line segments between these points. In dimensions greater than or equal to three, such embeddings permit edge crossings in general, though constructions exist that avoid them entirely for arbitrary graphs. This contrasts with two-dimensional embeddings, where crossings must be absent for planar graphs to maintain a valid drawing.[28] Planar graphs admit straight-line crossing-free embeddings in $ \mathbb{R}^2 $ by Fáry's theorem. For general graphs, $ \mathbb{R}^3 $ suffices for straight-line embeddings without edge crossings. A canonical construction positions the $ n $ vertices along the moment curve parametrized by $ (t, t^2, t^3) $ for distinct real numbers $ t_1 < t_2 < \cdots < t_n $; the resulting straight-line segments for edges do not intersect except at shared vertices, due to the curve's property that any four points span a tetrahedron of positive volume, preventing interior intersections of chords. This approach yields a crossing-free straight-line embedding of any graph in $ \mathbb{R}^3 $, including dense graphs like the complete graph $ K_n $.[29] The $ n $-dimensional hypercube graph $ Q_n $ embeds naturally and crossing-free in $ \mathbb{R}^n $, with vertices placed at the $ 2^n $ corners of the unit hypercube $ \{0,1\}^n $ and edges as axis-parallel segments of length 1. This realization preserves the graph's bipartite structure and connectivity while leveraging the ambient space's coordinates. Beyond topological properties, higher-dimensional embeddings often aim to preserve metric aspects, such as shortest-path distances between vertices.
Bourgain's theorem guarantees that any finite metric space with $ n $ points can be embedded into an $ \ell_p $ space ($ 1 \leq p < \infty $) with distortion $ O(\log n) $, meaning distances are approximated up to this multiplicative factor. For graph metrics induced by shortest paths, this enables low-distortion embeddings into Euclidean space $ \mathbb{R}^d $ with $ d = O(\log n) $, facilitating applications like dimension reduction for analysis. Such embeddings find practical use in VLSI design, where multi-layer chip layouts benefit from 3D straight-line representations to optimize wire routing and minimize interlayer crossings, and in network visualization, enabling intuitive 3D layouts of complex interconnections for better comprehension of large-scale systems.[28]
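The hypercube realization described above is easy to make concrete: vertices are the binary strings of length $ n $, and edges join strings at Hamming distance 1, giving axis-parallel unit segments. A small sketch (the edge count $ n \cdot 2^{n-1} $ is a standard identity):

```python
from itertools import product

def hypercube(n):
    """Q_n realized on the corners of the unit n-cube: vertices are
    0/1 tuples, edges join tuples differing in exactly one coordinate
    (hence axis-parallel segments of length 1)."""
    verts = list(product((0, 1), repeat=n))
    edges = [(u, v) for i, u in enumerate(verts) for v in verts[i + 1:]
             if sum(a != b for a, b in zip(u, v)) == 1]
    return verts, edges

verts, edges = hypercube(3)
print(len(verts), len(edges))  # 8 12, i.e. 2**n vertices, n * 2**(n-1) edges
```

Each edge has squared Euclidean length 1, and coordinate-sum parity alternates across every edge, exhibiting the bipartite structure mentioned above.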

Computational Aspects

Planarity Testing Algorithms

Planarity testing algorithms determine whether a given graph admits a planar embedding and, if so, construct one explicitly. These algorithms are crucial for computational geometry and graph drawing, as they enable the verification of planarity in linear time relative to the number of vertices. Seminal approaches rely on depth-first search (DFS) to explore the graph structure while detecting forbidden configurations that violate planarity. The Hopcroft-Tarjan algorithm, introduced in 1974, was the first to achieve linear time complexity, O(V), for planarity testing. It employs DFS to construct a spanning tree and classifies edges as tree edges or back edges. Key to the method are lowpoint values, which for each vertex represent the smallest discovery time reachable from its subtree via back edges or adjacent tree edges. By examining these lowpoints, the algorithm identifies interlaced paths that form subdivisions of the complete graph K_5 or the complete bipartite graph K_{3,3}, whose subdivisions are the forbidden configurations for planarity per Kuratowski's theorem. If no such subdivisions are found, the algorithm proceeds to build an embedding by adding paths to a base cycle (the "spine") while ensuring no crossings occur.[30] Building on similar principles, the Boyer-Myrvold algorithm from 2004 provides a simplified linear-time planarity test that also outputs a combinatorial embedding. It uses DFS to generate a palm tree decomposition, where the DFS tree is augmented with back edges classified by their endpoints' discovery times. Edges are categorized into tree edges, fronds (back edges to ancestors), and return edges. Non-planarity is detected by testing for interlaced back edges that cannot be embedded without crossings, using a block-cut tree structure to handle biconnected components and allow efficient flipping of subembeddings.
The algorithm constructs the embedding incrementally by adding edges and vertices, maintaining a rotation system that specifies the cyclic order of edges around each vertex. This approach isolates Kuratowski subgraphs when the graph is non-planar, providing explicit evidence of forbidden subdivisions.[31] A dedicated Kuratowski subgraph finder explicitly searches for subdivisions of K_5 or K_{3,3} to certify non-planarity. Integrated into algorithms like Boyer-Myrvold, this involves tracing paths in the DFS tree that connect back edge endpoints in a way that forms the forbidden structures, often by identifying multiple disjoint paths between branch vertices. Such finders are particularly useful in planarization tasks, where removing edges from the identified subgraph yields a maximal planar graph.[31] In practice, these algorithms represent graphs using adjacency lists for efficient traversal, with O(V + E) space. The output is typically a rotation system, a combinatorial representation of the embedding consisting of cyclic orders at vertices, which can be converted to a straight-line drawing. For example, applying the Hopcroft-Tarjan algorithm to K_5 reveals non-planarity through the detection of multiple crossing back edges that cannot be resolved without forming a K_5 subdivision, as all possible orderings lead to at least one pair of interlaced paths.[30] Recent improvements include certified embedding methods using PQ-trees, as explored by de Fraysseix et al. in 1990, which provide linear-time verification of planarity while supporting grid-based drawings. PQ-trees encode permissible vertex orderings on the outer face, allowing efficient testing and construction of planar layouts by reducing the problem to consecutive ones properties in matrices derived from the graph. These structures ensure the embedding is canonical and certifiable, enhancing reliability in applications like VLSI design.[32]
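The lowpoint machinery at the heart of these tests can be sketched independently of the full algorithms: a DFS assigns discovery times, classifies non-tree edges as back edges, and propagates upward the smallest discovery time reachable from each subtree. The sketch below computes only the disc/low values (the planarity logic that consumes them is far more involved and is not attempted here):

```python
def dfs_lowpoints(adj, root):
    """Iterative DFS computing discovery times and lowpoints.
    low[v] = smallest discovery time reachable from v's subtree via
    tree edges plus at most one back edge."""
    disc = {root: 0}
    low = {root: 0}
    parent = {root: None}
    time = 1
    stack = [(root, iter(adj[root]))]
    while stack:
        v, it = stack[-1]
        pushed = False
        for w in it:
            if w not in disc:
                parent[w] = v
                disc[w] = low[w] = time   # tree edge v -> w
                time += 1
                stack.append((w, iter(adj[w])))
                pushed = True
                break
            elif w != parent[v]:
                low[v] = min(low[v], disc[w])   # back edge v -> w
        if not pushed:
            stack.pop()                          # subtree of v finished
            if stack:
                u = stack[-1][0]
                low[u] = min(low[u], low[v])     # propagate to parent
    return disc, low

# On a 4-cycle every vertex can reach the root via a back edge,
# so all lowpoints collapse to 0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(dfs_lowpoints(adj, 0)[1])
```

A tree edge $ v \to w $ with low[w] > disc[v] is a bridge, which is how lowpoints also drive bridge and articulation-point detection.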

Complexity Results

Planarity testing, the problem of determining whether a graph admits a crossing-free embedding in the plane, can be solved in linear time O(n), where n is the number of vertices, using algorithms such as the one developed by Hopcroft and Tarjan.[30] In contrast, finding a minimum genus embedding, i.e. the smallest genus g of an orientable surface on which the graph can be embedded without crossings, is NP-complete, as established by Thomassen (1989).[33] The crossing number minimization problem, which asks for the minimum number of edge crossings in a drawing of the graph in the plane, is NP-hard, and remains so even when restricted to simple cubic graphs (Hliněný, 2006).[34][35] For each fixed genus g, deciding whether a graph embeds on a surface of genus at most g is solvable in polynomial time: the graphs of genus at most g form a minor-closed class, so by the graph minors theory of Robertson and Seymour they are characterized by a finite set of forbidden minors, although the constants hidden in this characterization are enormous and grow with g.[36] In the parameterized complexity framework, with the genus g as the parameter, the embedding problem is fixed-parameter tractable (FPT), admitting algorithms that run in time f(g) \cdot n^{O(1)} for some function f depending only on g; Mohar gave an explicit linear-time algorithm for each fixed surface.[36] For example, deciding whether a graph embeds on the torus (genus 1) is polynomial-time solvable, but the general minimum genus problem remains NP-complete; constant-factor approximation algorithms exist for bounded-degree graphs (Eisenbrand et al., 2015).[37] More recent developments include an efficient polynomial-time approximation scheme (EPTAS) for the genus of dense graphs (Jing and Mohar, 2020).[38]
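Although computing the minimum genus is hard, Euler's formula yields an easy lower bound: for a simple connected graph with V ≥ 3 vertices and E edges, a cellular embedding on an orientable surface of genus g satisfies E ≤ 3V - 6 + 6g, hence g ≥ ⌈(E - 3V + 6)/6⌉. A minimal sketch (the function name is illustrative):

```python
from math import ceil, comb

def genus_lower_bound(V, E):
    """Euler-formula lower bound on the orientable genus of a simple
    connected graph with V >= 3 vertices and E edges:
        g >= ceil((E - 3V + 6) / 6)."""
    return max(0, ceil((E - 3 * V + 6) / 6))

# K5: 10 edges on 5 vertices gives bound 1, and K5 does embed on the torus.
print(genus_lower_bound(5, comb(5, 2)))   # 1
# K7: bound ceil((21 - 21 + 6)/6) = 1; the genus of K7 is indeed 1.
print(genus_lower_bound(7, comb(7, 2)))   # 1
# Planar graphs such as K4 get bound 0.
print(genus_lower_bound(4, 6))            # 0
```

The bound is tight for K5 and K7 but not in general; closing the gap between such bounds and the true genus is exactly what makes the minimum genus problem difficult.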

Representation Learning Embeddings

Node Embeddings

Node embeddings represent individual nodes in a graph as low-dimensional vectors ϕ(v) ∈ ℝ^d, where the mapping preserves structural similarity: nodes in close proximity (e.g., connected by short paths) receive nearby vectors in the embedding space. This facilitates machine learning tasks such as node classification and link prediction by transforming graph data into a format amenable to standard algorithms, such as logistic regression or neural networks. The core idea draws from natural language processing, treating graph neighborhoods as analogous to linguistic contexts and learning representations that capture both local and global structure.[39]

Seminal methods include DeepWalk, introduced in 2014, which generates sequences of nodes via uniform random walks on the graph and applies the Skip-Gram model from word2vec to learn embeddings by predicting nearby nodes in these sequences. Node2Vec, proposed in 2016, extends this with biased random walks that balance breadth-first search (BFS) for local exploration against depth-first search (DFS) for global structure, controlled by parameters p and q that tune the walk behavior. In contrast, LINE, from 2015, optimizes embeddings directly from edges, preserving first-order proximity (direct connections) and second-order proximity (shared neighbors) through edge sampling and stochastic gradient descent, avoiding random walks for efficiency. These methods typically maximize the log-probability of observing a neighbor n given a node v, approximated as log P(n|v) ≈ ϕ(n) · ϕ(v) under the Skip-Gram objective, often with negative sampling for scalability.[39][40]

Embeddings are evaluated on downstream tasks such as node classification and link prediction. For instance, on the Cora citation network, random walk-based methods like DeepWalk capture homophily well enough for node classification with a linear classifier. Dimensions are commonly set between d = 128 and d = 512, balancing expressiveness and computational cost, with performance stabilizing around 128 for many networks. Scalability is achieved through sampling techniques, enabling the processing of graphs with millions of nodes; DeepWalk, for example, handles the 1.1 million-node YouTube network in hours on a single machine. In social networks, such embeddings place friends' vectors close together, reflecting relational similarity and aiding recommendation tasks.[39]
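The DeepWalk pipeline described above, random walks followed by Skip-Gram with negative sampling, can be sketched compactly. This is a didactic toy, not the published implementation; the function names, hyperparameter defaults, and the tiny example graph are all illustrative:

```python
import random
import numpy as np

def deepwalk_walks(adj, num_walks=10, walk_length=8, seed=0):
    """Uniform random walks (DeepWalk-style) over an adjacency list."""
    rng = random.Random(seed)
    walks, nodes = [], list(range(len(adj)))
    for _ in range(num_walks):
        rng.shuffle(nodes)                       # one walk per node, per pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def skipgram_embed(walks, n_nodes, d=16, window=2, lr=0.025, neg=3, epochs=5, seed=0):
    """Skip-gram with negative sampling over walk sequences.
    Returns phi, an (n_nodes, d) matrix with one embedding per node."""
    rng = np.random.default_rng(seed)
    phi = rng.normal(scale=0.1, size=(n_nodes, d))   # node ("input") vectors
    ctx = np.zeros((n_nodes, d))                     # context ("output") vectors
    for _ in range(epochs):
        for walk in walks:
            for i, v in enumerate(walk):
                lo, hi = max(0, i - window), min(len(walk), i + window + 1)
                for j in range(lo, hi):
                    if i == j:
                        continue
                    # one positive pair plus `neg` random negative samples
                    pairs = [(walk[j], 1.0)]
                    pairs += [(int(rng.integers(n_nodes)), 0.0) for _ in range(neg)]
                    for u, label in pairs:
                        score = 1.0 / (1.0 + np.exp(-phi[v] @ ctx[u]))
                        g = lr * (label - score)     # gradient of log-sigmoid loss
                        dv = g * ctx[u]
                        ctx[u] += g * phi[v]
                        phi[v] += dv
    return phi

# Two triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
walks = deepwalk_walks(adj)
phi = skipgram_embed(walks, n_nodes=6, d=8)
print(phi.shape)  # (6, 8)
```

On this toy graph, nodes within the same triangle co-occur in walks far more often than nodes across the bridge, which is exactly the signal the Skip-Gram objective turns into nearby embedding vectors.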

Graph-Level Embeddings

Graph-level embeddings represent an entire graph G as a fixed-dimensional vector ψ(G) ∈ ℝ^d, typically obtained by aggregating the embeddings of its nodes to capture global structural and feature information for tasks such as graph classification or similarity computation.[41] Common aggregation techniques include mean pooling, which averages node vectors; sum pooling, which accumulates them; and max pooling, which selects the element-wise maximum, each preserving different aspects of the graph's collective representation.[42] Graph Neural Networks (GNNs) advance graph-level embeddings through iterative message passing, in which node representations are refined by exchanging information with neighbors, and a graph-level vector is then produced by a readout aggregation.[41] GraphSAGE, introduced in 2017, exemplifies this by sampling a fixed-size set of neighbors for each node during training, supporting inductive learning on previously unseen graphs and addressing scalability in large networks.[41] The core update rule in GraphSAGE is:
\mathbf{h}_v^{(k)} = \sigma \left( \mathbf{W}^{(k)} \cdot \mathrm{CONCAT} \left( \mathbf{h}_v^{(k-1)}, \mathrm{AGG} \left( \{ \mathbf{h}_u^{(k-1)} \mid u \in \mathcal{N}(v) \} \right) \right) \right)
where h_v^{(k)} is the embedding of node v at layer k, σ is a nonlinear activation (e.g., ReLU), W^{(k)} is a weight matrix, CONCAT concatenates the node's prior embedding with the aggregated neighbor embeddings, N(v) denotes v's neighborhood, and AGG is a permutation-invariant function such as mean, LSTM, or pooling.[41] For graph-level tasks, GraphSAGE applies global mean pooling over all final node embeddings to yield ψ(G).[41] The Graph Isomorphism Network (GIN), proposed in 2019, enhances expressiveness by employing multi-layer perceptrons (MLPs) within its aggregation steps, matching the discriminative power of the Weisfeiler-Lehman (WL) graph isomorphism test to better distinguish non-isomorphic graphs.[43] This makes GIN particularly effective for graph classification, where its injective aggregation, such as summation followed by a learnable transformation, outperforms simpler mean-based methods on benchmarks involving molecular and synthetic graphs.[43]

Spectral methods offer an alternative foundation for graph-level embeddings by decomposing the graph Laplacian matrix and using its eigenvectors to encode global harmonic properties. Laplacian Eigenmaps, introduced in 2003, provide a seminal approach: embeddings are computed from the eigenvectors associated with the smallest non-trivial eigenvalues of the normalized Laplacian, yielding a dimensionality reduction that reflects the graph's intrinsic geometry while preserving local neighborhoods, though contemporary GNNs often incorporate spectral convolutions for hybrid efficiency. These embeddings find key applications in molecule classification, such as predicting quantum mechanical properties (e.g., the HOMO-LUMO gap) on the QM9 dataset of roughly 134,000 small organic molecules, where GNN-based models achieve low mean absolute errors on several targets.
In social network analysis, they support graph-level tasks like classifying communities or detecting anomalous subgraphs by aggregating relational patterns into compact vectors.[41] Challenges in graph-level embeddings include over-smoothing, where deep GNN layers cause node representations to converge indistinguishably, degrading global aggregation quality after 3–4 layers in dense graphs, and scalability on massive graphs, mitigated by neighbor sampling in methods like GraphSAGE.[44][41] For example, embeddings of protein structures—modeled as graphs of residues or atoms—facilitate fold prediction by classifying these representations to identify structural motifs.[45]
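The GraphSAGE update rule above, followed by a mean-pooling readout, can be sketched with plain NumPy. This is a simplified illustration under assumed shapes (mean aggregation, full neighborhoods rather than sampled ones, L2-normalized layer outputs); the function names and the toy graph are not from any particular library:

```python
import numpy as np

def graphsage_layer(H, adj, W):
    """One GraphSAGE layer with mean aggregation:
    h_v <- ReLU(W @ concat(h_v, mean of neighbour embeddings)),
    followed by L2 normalization. W has shape (d_out, 2 * d_in)."""
    n, d = H.shape
    agg = np.stack([H[adj[v]].mean(axis=0) if adj[v] else np.zeros(d)
                    for v in range(n)])
    Z = np.concatenate([H, agg], axis=1) @ W.T
    Z = np.maximum(Z, 0.0)                        # ReLU nonlinearity
    return Z / np.clip(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12, None)

def graph_embedding(X, adj, weights):
    """Apply stacked layers, then mean-pool node embeddings into psi(G)."""
    H = X
    for W in weights:
        H = graphsage_layer(H, adj, W)
    return H.mean(axis=0)                         # readout: global mean pooling

rng = np.random.default_rng(0)
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]            # a triangle with a pendant vertex
X = rng.normal(size=(4, 5))                       # 5 input features per node
weights = [rng.normal(size=(8, 10)), rng.normal(size=(4, 16))]
psi = graph_embedding(X, adj, weights)
print(psi.shape)  # (4,)
```

Because both the mean aggregator and the mean-pooling readout are permutation-invariant, relabeling the nodes (and permuting the feature rows and adjacency lists accordingly) leaves ψ(G) unchanged, which is the defining requirement for a graph-level embedding.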

References
