Graph partition

from Wikipedia

In mathematics, a graph partition is the reduction of a graph to a smaller graph by partitioning its set of nodes into mutually exclusive groups. Edges of the original graph that cross between the groups will produce edges in the partitioned graph. If the number of resulting edges is small compared to the original graph, then the partitioned graph may be better suited for analysis and problem-solving than the original. Finding a partition that simplifies graph analysis is a hard problem, but one that has applications to scientific computing, VLSI circuit design, and task scheduling in multiprocessor computers, among others.[1] Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see Buluc et al. (2013).[2] Two common examples of graph partitioning are minimum cut and maximum cut problems.

Problem complexity

Typically, graph partition problems fall under the category of NP-hard problems. Solutions to these problems are generally derived using heuristics and approximation algorithms.[3] However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor.[1] Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist,[4] unless P=NP. Grids are a particularly interesting case since they model the graphs resulting from Finite Element Model (FEM) simulations. When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs.[4]

Problem

Consider a graph G = (V, E), where V denotes the set of n vertices and E the set of edges. For a (k,v) balanced partition problem, the objective is to partition G into k components of at most size v · (n/k), while minimizing the capacity of the edges between separate components.[1] Also, given G and an integer k > 1, partition V into k parts (subsets) V1, V2, ..., Vk such that the parts are disjoint and have equal size, and the number of edges with endpoints in different parts is minimized. Such partition problems have been discussed in literature as bicriteria-approximation or resource augmentation approaches. A common extension is to hypergraphs, where an edge can connect more than two vertices. A hyperedge is not cut if all vertices are in one partition, and cut exactly once otherwise, no matter how many vertices are on each side. This usage is common in electronic design automation.
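
The hyperedge cut rule above differs from the plain edge cut. The following minimal Python sketch illustrates both counting rules; the graph, parts, and helper names are illustrative, not from the article.

    def edge_cut(edges, part):
        """Number of graph edges whose endpoints lie in different parts."""
        return sum(1 for u, v in edges if part[u] != part[v])

    def hyperedge_cut(hyperedges, part):
        """Number of hyperedges spanning more than one part (each counted once)."""
        return sum(1 for e in hyperedges if len({part[v] for v in e}) > 1)

    part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}        # three parts of two vertices
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    hyperedges = [{0, 1, 2}, {2, 3}, {4, 5}]
    print(edge_cut(edges, part))            # 2  (edges 1-2 and 3-4 cross parts)
    print(hyperedge_cut(hyperedges, part))  # 1  (only {0, 1, 2} spans two parts)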

Analysis

For a specific (k, 1 + ε) balanced partition problem, we seek to find a minimum cost partition of G into k components with each component containing a maximum of (1 + ε)·(n/k) nodes. We compare the cost of this approximation algorithm to the cost of a (k,1) cut, wherein each of the k components must have the same size of (n/k) nodes each, thus being a more restricted problem. Thus, each component must satisfy max_i |V_i| ≤ (1 + ε)·⌈n/k⌉.

We already know that the (2,1) cut is the minimum bisection problem and that it is NP-complete.[5] Next, consider an instance of the 3-partition problem with n = 3k, a problem that is strongly NP-complete.[1] Suppose we had an approximation algorithm with a finite approximation factor for (k,1)-balanced partitioning. From the 3-partition instance, construct a graph G such that: if the 3-partition instance is solvable, then the (k,1)-balanced partitioning problem in G can be solved without cutting any edge; otherwise, the optimum (k,1)-balanced partitioning in G must cut at least one edge. An approximation algorithm with a finite approximation factor has to distinguish between these two cases, and hence it could decide the 3-partition problem in polynomial time, which is impossible unless P = NP. Thus, the (k,1)-balanced partitioning problem has no polynomial-time approximation algorithm with a finite approximation factor unless P = NP.[1]

The planar separator theorem states that any n-vertex planar graph can be partitioned into roughly equal parts by the removal of O(√n) vertices. This is not a partition in the sense described above, because the partition set consists of vertices rather than edges. However, the same result also implies that every planar graph of bounded degree has a balanced cut with O(√n) edges.

Graph partition methods

Since graph partitioning is a hard problem, practical solutions are based on heuristics. There are two broad categories of methods, local and global. Well-known local methods are the Kernighan–Lin algorithm and the Fiduccia–Mattheyses algorithm, which were the first effective 2-way cut heuristics based on local search. Their major drawback is the arbitrary initial partitioning of the vertex set, which can affect the final solution quality. Global approaches rely on properties of the entire graph and do not depend on an arbitrary initial partition. The most common example is spectral partitioning, where a partition is derived from approximate eigenvectors of the adjacency matrix, or spectral clustering, which groups graph vertices using the eigendecomposition of the graph Laplacian matrix.
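
As a rough illustration of local-search refinement, the sketch below runs a single greedy pass of improving swaps on a 2-way cut. It is not the full Kernighan–Lin or Fiduccia–Mattheyses algorithm (no gain buckets or vertex locking), and the adjacency-dictionary representation is an assumption of this example.

    def cut_size(adj, side):
        """Edges with endpoints on different sides (adj maps vertex -> set)."""
        return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])

    def greedy_swap_pass(adj, side):
        """Repeatedly swap the cross-partition pair of vertices that most
        reduces the cut, stopping when no swap helps (keeps the balance)."""
        improved = True
        while improved:
            improved = False
            base = cut_size(adj, side)
            best = None
            left = [v for v in adj if side[v] == 0]
            right = [v for v in adj if side[v] == 1]
            for a in left:
                for b in right:
                    side[a], side[b] = 1, 0          # tentative swap
                    gain = base - cut_size(adj, side)
                    side[a], side[b] = 0, 1          # undo
                    if gain > 0 and (best is None or gain > best[0]):
                        best = (gain, a, b)
            if best is not None:
                _, a, b = best
                side[a], side[b] = 1, 0
                improved = True
        return side

    # Two triangles joined by the edge 2-3, started from an arbitrary split.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    side = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
    greedy_swap_pass(adj, side)
    print(cut_size(adj, side))   # 1: the triangles end up in separate parts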

Multi-level methods

A multi-level graph partitioning algorithm works by applying one or more stages. Each stage reduces the size of the graph by collapsing vertices and edges, partitions the smaller graph, then maps the partition back and refines it on the original graph.[6] A wide variety of partitioning and refinement methods can be applied within the overall multi-level scheme. In many cases, this approach gives both fast execution times and very high quality results. One widely used example of such an approach is METIS,[7] a graph partitioner, and hMETIS, the corresponding partitioner for hypergraphs.[8] An alternative approach, originating from [9] and implemented, for example, in scikit-learn, is spectral clustering with the partitioning determined from eigenvectors of the graph Laplacian matrix of the original graph, computed by the LOBPCG solver with multigrid preconditioning.
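
A toy sketch of the multilevel scheme follows: coarsen by a greedy edge matching, split the coarsest graph, and project the result back level by level. It omits the refinement pass and vertex weights that real partitioners such as METIS use, so balance can drift; all names are illustrative.

    def coarsen(adj):
        """Greedy matching: merge each unmatched vertex with one unmatched
        neighbour. Returns the coarse adjacency and a fine-to-coarse map."""
        matched, fine_to_coarse, next_id = set(), {}, 0
        for u in adj:
            if u in matched:
                continue
            partner = next((v for v in adj[u] if v not in matched and v != u), None)
            for x in ([u] if partner is None else [u, partner]):
                matched.add(x)
                fine_to_coarse[x] = next_id
            next_id += 1
        coarse_adj = {c: set() for c in range(next_id)}
        for u in adj:
            for v in adj[u]:
                cu, cv = fine_to_coarse[u], fine_to_coarse[v]
                if cu != cv:
                    coarse_adj[cu].add(cv)
                    coarse_adj[cv].add(cu)
        return coarse_adj, fine_to_coarse

    def naive_split(adj):
        """Placeholder initial partitioner: first half / second half."""
        order = sorted(adj)
        half = len(order) // 2
        return {v: (0 if i < half else 1) for i, v in enumerate(order)}

    def multilevel_bisect(adj, coarsest=4):
        if len(adj) <= coarsest:
            return naive_split(adj)
        coarse_adj, fine_to_coarse = coarsen(adj)
        if len(coarse_adj) == len(adj):       # nothing could be contracted
            return naive_split(adj)
        coarse_side = multilevel_bisect(coarse_adj, coarsest)
        # Uncoarsening: each fine vertex inherits its coarse vertex's side.
        # A real partitioner would run a refinement pass (e.g. KL/FM) here.
        return {v: coarse_side[fine_to_coarse[v]] for v in adj}

    path = {i: {i - 1, i + 1} & set(range(10)) for i in range(10)}
    print(multilevel_bisect(path))   # splits the path; balance can drift without vertex weights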

Spectral partitioning and spectral bisection

Given a graph G = (V, E) with adjacency matrix A, where an entry A_ij implies an edge between nodes i and j, and degree matrix D, a diagonal matrix in which each diagonal entry d_ii of row i holds the degree of node i, the Laplacian matrix L is defined as L = D - A. A ratio-cut partition of G is defined as a partition of V into disjoint subsets U and W, minimizing the ratio

|E(U, W)| / (|U| · |W|)

of the number of edges that actually cross this cut to the number of pairs of vertices that could support such edges. Spectral graph partitioning can be motivated[10] by analogy with partitioning of a vibrating string or a mass-spring system and can similarly be extended to the case of negative weights of the graph.[11]

Fiedler eigenvalue and eigenvector

In such a scenario, the second smallest eigenvalue (λ2) of L yields a lower bound on the optimal cost (c) of the ratio-cut partition, with c ≥ λ2/n. The eigenvector corresponding to λ2, called the Fiedler vector, bisects the graph into two communities based on the sign of the corresponding vector entry. Division into a larger number of communities can be achieved by repeated bisection or by using multiple eigenvectors corresponding to the smallest eigenvalues.[12] The examples in Figures 1 and 2 illustrate the spectral bisection approach.

Figure 1: The graph G = (5,4) is analysed for spectral bisection. The linear combination of the two smallest eigenvectors leads to [1 1 1 1 1]′ with eigenvalue 0.
Figure 2: The graph G = (5,5) illustrates that the Fiedler vector in red bisects the graph into two communities, one with vertices {1,2,3} having positive entries in the vector and the other with vertices {4,5} having negative entries.
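
As a small illustration of sign-based spectral bisection (loosely mirroring the graph of Figure 2, with vertices renumbered from 0), a NumPy sketch might look as follows; the adjacency matrix and variable names are assumptions of this example.

    import numpy as np

    # Adjacency matrix of a 5-vertex, 5-edge graph: a triangle {0, 1, 2}
    # with a tail 2-3-4.
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)

    D = np.diag(A.sum(axis=1))            # degree matrix
    L = D - A                             # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler_value = eigvals[1]            # algebraic connectivity (lambda_2)
    fiedler_vector = eigvecs[:, 1]
    side = fiedler_vector >= 0            # the sign of each entry picks a side
    print(fiedler_value)
    print(side)                           # {0, 1, 2} separate from {3, 4}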

Modularity and ratio-cut

Minimum-cut partitioning fails, however, when the number of communities to be found or the sizes of the parts are unknown. For instance, optimizing the cut size for free group sizes puts all vertices in the same community. Additionally, cut size may be the wrong quantity to minimize, since a good division is not merely one with a small number of edges between communities. This motivated the use of modularity (Q)[13] as a metric to optimize a balanced graph partition. The example in Figure 3 illustrates two instances of the same graph such that in (a) modularity (Q) is the partitioning metric and in (b) ratio-cut is the partitioning metric.

Figure 3: Weighted graph G may be partitioned to maximize Q in (a) or to minimize the ratio-cut in (b). We see that (a) is a better balanced partition, thus motivating the importance of modularity in graph partitioning problems.
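
A brief sketch of evaluating modularity for two candidate partitions, assuming NetworkX is available; the barbell graph here is illustrative and is not the graph of Figure 3.

    import networkx as nx
    from networkx.algorithms.community import modularity

    G = nx.barbell_graph(4, 0)                  # two 4-cliques joined by one edge
    good = [set(range(4)), set(range(4, 8))]    # respects the two cliques
    bad = [{0, 4}, {1, 2, 3, 5, 6, 7}]          # ignores the structure
    print(modularity(G, good))                  # higher Q
    print(modularity(G, bad))                   # lower Q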

Applications

Conductance

Another objective function used for graph partitioning is conductance, which is the ratio between the number of cut edges and the volume of the smallest part. Conductance is related to electrical flows and random walks. The Cheeger bound guarantees that spectral bisection provides partitions with nearly optimal conductance. The quality of this approximation depends on the second smallest eigenvalue of the Laplacian, λ2.
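
For reference, a common quantitative form of this guarantee is the Cheeger inequality, stated here (as an added note) for the second smallest eigenvalue λ2 of the normalized Laplacian and the graph conductance φ(G):

    \frac{\lambda_2}{2} \;\le\; \phi(G) \;\le\; \sqrt{2\lambda_2},
    \qquad \text{where } \phi(G) = \min_{S \subseteq V}
    \frac{\mathrm{cut}(S, V \setminus S)}{\min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))}.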

Immunization

Graph partition can be useful for identifying the minimal set of nodes or links that should be immunized in order to stop epidemics.[14]

Other graph partition methods

Spin models have been used for clustering of multivariate data wherein similarities are translated into coupling strengths.[15] The properties of the ground-state spin configuration can be directly interpreted as communities. Thus, a graph is partitioned to minimize the Hamiltonian of the partitioned graph. The Hamiltonian (H) is derived by assigning the following partition rewards and penalties (a sketch follows the list below).

  • Reward internal edges between nodes of same group (same spin)
  • Penalize missing edges in same group
  • Penalize existing edges between different groups
  • Reward non-links between different groups.
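
A minimal sketch of such a Hamiltonian, taking all four reward and penalty coefficients equal to 1 (the unit coefficients and function names are assumptions of this example; published spin-model formulations weight these terms, for example against a null model):

    from itertools import combinations

    def hamiltonian(nodes, edges, spin):
        """Potts-style energy under the four rules above, with unit weights.
        Lower H corresponds to a better partition."""
        edge_set = {frozenset(e) for e in edges}
        H = 0
        for u, v in combinations(nodes, 2):
            linked = frozenset((u, v)) in edge_set
            same_group = spin[u] == spin[v]
            if same_group and linked:
                H -= 1        # reward internal edges
            elif same_group and not linked:
                H += 1        # penalize missing edges inside a group
            elif not same_group and linked:
                H += 1        # penalize edges between groups
            else:
                H -= 1        # reward non-links between groups
        return H

    nodes = [0, 1, 2, 3]
    edges = [(0, 1), (2, 3)]
    print(hamiltonian(nodes, edges, {0: "a", 1: "a", 2: "b", 3: "b"}))   # -6
    print(hamiltonian(nodes, edges, {0: "a", 1: "b", 2: "a", 3: "b"}))   #  2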

Additionally, kernel-PCA-based spectral clustering takes the form of a least-squares support-vector-machine framework, and hence it becomes possible to project the data entries onto a kernel-induced feature space that has maximal variance, implying high separation between the projected communities.[16]

Some methods express graph partitioning as a multi-criteria optimization problem which can be solved using local methods expressed in a game theoretic framework where each node makes a decision on the partition it chooses.[17]

For very large-scale distributed graphs, classical partition methods (e.g., spectral partitioning, METIS[7]) might not apply, since they require full access to the graph data in order to perform global operations. For such large-scale scenarios, distributed graph partitioning is used to perform the partitioning through asynchronous local operations only.

Software tools

scikit-learn implements spectral clustering with the partitioning determined from eigenvectors of the graph Laplacian matrix for the original graph computed by ARPACK, or by LOBPCG solver with multigrid preconditioning.[9]
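
A usage sketch, assuming scikit-learn is installed ("amg" additionally requires the pyamg package); the small two-cluster adjacency matrix is illustrative.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    # Two triangles joined by one edge, given as a precomputed affinity matrix.
    adjacency = np.array([[0, 1, 1, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0],
                          [1, 1, 0, 1, 0, 0],
                          [0, 0, 1, 0, 1, 1],
                          [0, 0, 0, 1, 0, 1],
                          [0, 0, 0, 1, 1, 0]], dtype=float)

    sc = SpectralClustering(n_clusters=2,
                            affinity="precomputed",  # treat the matrix as a graph
                            eigen_solver="lobpcg",   # or "arpack"; "amg" needs pyamg
                            assign_labels="kmeans",
                            random_state=0)
    labels = sc.fit_predict(adjacency)
    print(labels)                                    # e.g. [0 0 0 1 1 1]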

METIS[7] is a family of graph partitioning programs by Karypis and Kumar. Within this family, kMetis aims at greater partitioning speed, hMetis[8] applies to hypergraphs and aims at partition quality, and ParMetis[7] is a parallel implementation of the METIS graph partitioning algorithm.
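
A usage sketch assuming the PyMetis Python bindings to METIS; the adjacency-list call shown follows that package's part_graph interface as an assumption, and the example graph is illustrative.

    import pymetis

    # Two triangles joined by a single edge (vertex 2 -- vertex 3).
    adjacency = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
    cut_count, membership = pymetis.part_graph(2, adjacency=adjacency)
    print(cut_count, membership)    # e.g. 1 [0, 0, 0, 1, 1, 1]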

KaHyPar[18][19][20] is a multilevel hypergraph partitioning framework providing direct k-way and recursive bisection based partitioning algorithms. It instantiates the multilevel approach in its most extreme version, removing only a single vertex in every level of the hierarchy. By using this very fine grained n-level approach combined with strong local search heuristics, it computes solutions of very high quality.

Scotch[21] is a graph partitioning framework by Pellegrini. It uses recursive multilevel bisection and includes sequential as well as parallel partitioning techniques.

Jostle[22] is a sequential and parallel graph partitioning solver developed by Chris Walshaw. The commercialized version of this partitioner is known as NetWorks.

Party[23] implements the Bubble/shape-optimized framework and the Helpful Sets algorithm.

The software packages DibaP[24] and its MPI-parallel variant PDibaP[25] by Meyerhenke implement the Bubble framework using diffusion; DibaP also uses AMG-based techniques for coarsening and solving linear systems arising in the diffusive approach.

Sanders and Schulz released a graph partitioning package KaHIP[26] (Karlsruhe High Quality Partitioning) that implements for example flow-based methods, more-localized local searches and several parallel and sequential meta-heuristics.

The tools Parkway[27] by Trifunovic and Knottenbelt as well as Zoltan[28] by Devine et al. focus on hypergraph partitioning.

References

from Grokipedia
Graph partitioning is a fundamental problem in computer science and mathematics, involving the division of a graph's vertex set into a specified number of pairwise disjoint subsets, typically with the goals of balancing the sizes of the subsets (e.g., equal node weights) and minimizing the total weight of edges crossing between subsets, known as the cut. This problem, which generalizes to k-way partitioning for arbitrary k (with bipartitioning as the case k = 2), is NP-hard. Due to its intractability for exact solutions on large graphs, practical approaches rely on heuristics and approximation algorithms, such as the Kernighan–Lin algorithm (which iteratively swaps vertices to reduce cut size) and multilevel partitioning methods that coarsen the graph, partition it, and refine the result. Notable variants include the max-cut problem, which maximizes crossing edges, and balanced graph partitioning, which enforces strict size constraints on subsets. Graph partitioning has broad applications across domains, including very-large-scale integration (VLSI) design for optimizing chip layouts by minimizing wire lengths, parallel computing for load balancing across processors in distributed systems, and sparse matrix computations in scientific simulations to enhance efficiency. It also arises in finite element methods for meshing in engineering simulations, quadratic assignment problems, and community detection in networks. The field's historical roots trace back to early work on max-cut, with ongoing research exploring spectral methods, genetic algorithms, and extensions to directed graphs and dynamic scenarios.

Introduction

Definition and Basic Concepts

Graph partitioning is the problem of dividing the vertices of a graph into a set of disjoint subsets such that the number of edges connecting vertices in different subsets, known as the edge cut, is minimized. This process aims to create subgraphs that are as disconnected as possible from one another while preserving the overall structure of the graph. The problem is fundamental in computer science and arises in various applications, such as optimizing data distribution in distributed computing environments. Graph partitioning typically applies to undirected graphs, where edges represent symmetric connections between vertices without direction. In unweighted graphs, all edges are considered equal, with each contributing equally to the cut size if it crosses subsets. Weighted graphs extend this by assigning non-negative weights to edges (and sometimes vertices), where the cut size is the sum of weights of crossing edges, allowing for more nuanced representations of connection strengths, such as communication costs in networks. Formally, consider an undirected graph G = (V, E), where V is the set of vertices and E is the set of edges. A partition P = {V1, V2, ..., Vk} divides V into k disjoint subsets such that V1 ∪ V2 ∪ ... ∪ Vk = V and Vi ∩ Vj = ∅ for i ≠ j. For a balanced k-way partition, the sizes of the subsets are approximately equal, satisfying |Vi| ≈ |V|/k (or a weighted analog using vertex weights). As a simple illustrative example, consider an unweighted undirected graph with four vertices {v1, v2, v3, v4} forming a cycle: edges connect v1–v2, v2–v3, v3–v4, and v4–v1. A balanced 2-way partition might assign V1 = {v1, v2} and V2 = {v3, v4}, resulting in a cut size of 2 (the edges v2–v3 and v4–v1). The alternative partition V1 = {v2, v3} and V2 = {v4, v1} likewise yields a cut size of 2, while the "opposite corners" partition {v1, v3} / {v2, v4} cuts all four edges, so the minimum balanced cut for this cycle is 2.
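
A short sketch reproducing the cycle example above and confirming the cut sizes (vertex names and the helper function are illustrative):

    edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v1")]

    def cut_size(edges, part):
        return sum(1 for u, v in edges if part[u] != part[v])

    adjacent_split = {"v1": 0, "v2": 0, "v3": 1, "v4": 1}
    other_adjacent = {"v2": 0, "v3": 0, "v4": 1, "v1": 1}
    opposite_split = {"v1": 0, "v3": 0, "v2": 1, "v4": 1}
    print(cut_size(edges, adjacent_split))   # 2
    print(cut_size(edges, other_adjacent))   # 2
    print(cut_size(edges, opposite_split))   # 4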

Historical Development

The origins of graph partitioning can be traced to the early 1970s, when Brian W. Kernighan and Shen Lin developed a heuristic specifically for circuit layout applications, aiming to divide graphs into subsets while minimizing the number of edges crossing between them. This Kernighan–Lin algorithm, published in 1970, marked a pivotal milestone by providing an iterative swapping method that improved upon initial random partitions, influencing subsequent work in very-large-scale integration (VLSI) design. In the mid-1970s, foundational theoretical advances laid the groundwork for spectral partitioning methods, with Miroslav Fiedler introducing the concept of algebraic connectivity, the second smallest eigenvalue of the graph Laplacian, in 1973, which quantifies graph cohesion and later proved instrumental in eigenvector-based partitioning. These ideas gained practical traction in the 1980s and 1990s, building on Donath and Hoffman's 1972–1973 proposal to use Laplacian eigenvectors for cuts and on key implementations such as Pothen, Simon, and Liou's 1990 algorithm for sparse matrix partitioning via spectral bisection. Spectral methods became widely adopted for their ability to reveal global structure, though computational costs limited scalability until refinements such as Hendrickson and Leland's improvements in the mid-1990s. The 1990s also saw the rise of multilevel partitioning techniques to address ever-larger graphs in scientific computing and sparse matrix ordering, with George Karypis and Vipin Kumar introducing the METIS software package around 1995 as an open-source tool implementing recursive bisection through coarsening, initial partitioning, and refinement phases. METIS's efficiency on unstructured meshes and graphs set a benchmark for parallel implementations, enabling partitions of millions of vertices and inspiring tools like ParMETIS for distributed environments. Following 2010, graph partitioning evolved to incorporate hypergraph models for capturing n-way interactions in applications like VLSI and social networks, with algorithms extending multilevel approaches to hyperedges for more accurate cut minimization. Concurrently, machine learning integration emerged, using techniques like graph neural networks to predict optimal partitions or select heuristics based on graph features, enhancing adaptability for massive, dynamic datasets, as highlighted in recent surveys. More recent advancements as of 2025 include the application of large language models to graph partitioning tasks and specialized techniques for large-scale graphs.

Problem Formulation

Formal Definition

The graph partitioning problem, in its standard formulation, involves dividing the vertex set V of an undirected graph G = (V, E) into k disjoint subsets V1, V2, ..., Vk such that V1 ∪ V2 ∪ ... ∪ Vk = V and Vi ∩ Vj = ∅ for all i ≠ j, while minimizing the total weight of edges that connect vertices in different subsets and satisfying balance constraints on the subset sizes. For a weighted graph where each edge e = (u, v) ∈ E has a non-negative weight w(e), the objective is to minimize the cut size cut(P) = Σ w(e), summed over all edges e = (u, v) with P(u) ≠ P(v), where P denotes the partition and P(u) is the subset containing vertex u. In the unweighted case, w(e) = 1 for all edges, so the cut size reduces to the number of crossing edges. For the specific case of 2-way partitioning (bisection), the cut is given by cut(V1, V2) = Σ_{u ∈ V1, v ∈ V2} w(u, v). To ensure computational balance, the partition must satisfy size constraints: for each i, |Vi| ≤ (1 + ε)·|V|/k, where ε ≥ 0 is a small imbalance parameter that allows slight deviations from perfect equality (typically ε < 0.05 in practice). Vertex weights can be incorporated similarly, requiring the sum of weights in each Vi to be at most (1 + ε) times the average. This constraint prevents trivial solutions like placing all vertices in one subset. The problem can be approached via direct k-way partitioning, which optimizes the subsets simultaneously, or recursive bisection, which repeatedly applies 2-way partitioning to subdivide parts until k subsets are obtained (requiring log2 k levels for k a power of 2). Recursive bisection is computationally simpler but may yield suboptimal cuts compared to direct methods due to accumulated errors across levels.
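
A minimal sketch of recursive bisection for k a power of two, parameterized over an arbitrary 2-way partitioner; the naive splitter used here is a placeholder, not a real bisection heuristic.

    def recursive_partition(vertices, edges, k, bisect):
        """Split into k parts by repeatedly applying a 2-way partitioner."""
        if k == 1:
            return [set(vertices)]
        left, right = bisect(vertices, edges)
        restrict = lambda part: [(u, v) for u, v in edges if u in part and v in part]
        return (recursive_partition(left, restrict(left), k // 2, bisect) +
                recursive_partition(right, restrict(right), k // 2, bisect))

    # Trivial stand-in bisector: split the sorted vertex list in half.
    def naive_bisect(vertices, edges):
        order = sorted(vertices)
        half = len(order) // 2
        return set(order[:half]), set(order[half:])

    parts = recursive_partition(range(8), [(i, i + 1) for i in range(7)], 4, naive_bisect)
    print(parts)   # four parts of two vertices each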

Common Variants

In graph partitioning, the unbalanced variant removes size constraints on the partitions, allowing one side to be arbitrarily small while focusing solely on minimizing the edge cut size. This formulation, often simply called the minimum cut problem, arises in applications like network reliability analysis where identifying the weakest disconnection point is key, regardless of partition balance. The problem is NP-hard in general graphs when the number of partitions is part of the input. Multi-objective variants extend the standard problem by optimizing multiple criteria simultaneously, such as minimizing the cut size while maximizing modularity to enhance community detection or minimizing the diameter of induced subgraphs for low-latency parallel processing. For instance, modularity optimization in partitioning seeks to maximize the density of intra-partition edges relative to a null model, which is particularly useful in social network analysis. Similarly, low-diameter partitioning prioritizes compact subgraphs to reduce communication delays in distributed systems. These approaches often use evolutionary algorithms or Pareto optimization to balance trade-offs. Directed graph partitioning adapts the problem to digraphs by considering edge directions, typically minimizing flow-based cuts that respect arc orientations, such as in traffic networks or dependency graphs where one-way connections matter. Here, the cut is defined by the total capacity of arcs from one partition to another, often modeled via multicommodity flows to capture multiple source-sink pairs. This variant supports applications like load balancing in directed communication topologies. Hypergraph partitioning generalizes the graph model by allowing hyperedges to connect multiple vertices, providing a more accurate representation of multi-way interactions, such as in VLSI design where nets link several components or in sparse matrix computations. In this setting, the objective is to partition vertices into balanced parts while minimizing the number of hyperedges spanning multiple parts, with the connectivity metric adjusted to account for hyperedge cuts. Unlike pairwise graph edges, this captures higher-order dependencies without bipartite expansions. A specific example of an unbalanced variant is the minimum k-cut problem, which seeks to remove the minimum-weight set of edges to disconnect the graph into exactly k components, without imposing size balance on the parts. This is useful in fault-tolerant system design, where isolating k failure modes is prioritized over even distribution. The problem remains NP-hard for variable k.

Quality Metrics

Edge Cut Measures

In graph partitioning, the edge cut size serves as a fundamental measure of the connectivity between subsets in a partition. For a graph G = (V, E) with edge weights w(e) ≥ 0, and a partition (S, T) where S ∪ T = V and S ∩ T = ∅, the edge cut size is the total weight of edges crossing from S to T, formally cut(S, T) = Σ w(e), summed over all edges e = {u, v} ∈ E with u ∈ S and v ∈ T. This metric quantifies the communication cost or boundary strength in applications like parallel computing, where minimizing the cut size reduces inter-processor data transfer. To normalize for graph scale and imbalance, variants such as expansion and conductance adjust the cut size relative to subset properties. The expansion of a cut (S, T) is given by cut(S, T) / min(|S|, |T|), which measures the boundary density per vertex in the smaller subset and is central to expander graph theory for assessing expansion properties. Conductance provides a degree-weighted normalization, defining φ(S) = cut(S, V \ S) / min(vol(S), vol(V \ S)), where vol(S) = Σ_{v ∈ S} deg(v) is the volume of S. This formulation, prominent in spectral graph theory, better captures expansion in non-regular graphs by accounting for vertex degrees. These measures relate to the discrete isoperimetric problem, which seeks partitions minimizing the normalized cut relative to the boundary size, analogous to geometric isoperimetry, where perimeter is minimized for a given area. The graph's isoperimetric constant is the minimum such value over balanced cuts, providing a global connectivity benchmark. In spectral partitioning, low-conductance cuts approximate solutions to this problem via eigenvector analysis.
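
A small sketch computing cut size, expansion, and conductance for a 2-way cut of an unweighted graph; the edge list, vertex sets, and function name are illustrative.

    from collections import Counter

    def cut_expansion_conductance(edges, S, T):
        cut = sum(1 for u, v in edges if (u in S) != (v in S))
        deg = Counter()
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        vol = lambda X: sum(deg[x] for x in X)
        expansion = cut / min(len(S), len(T))
        conductance = cut / min(vol(S), vol(T))
        return cut, expansion, conductance

    # Two triangles joined by a single bridge edge.
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
    print(cut_expansion_conductance(edges, {0, 1, 2}, {3, 4, 5}))
    # (1, 0.333..., 0.1428...): one cut edge, expansion 1/3, conductance 1/7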

Balance and Fairness Constraints

In graph partitioning, vertex balance constraints ensure that the subgraphs induced by the partitions have approximately equal numbers of vertices, promoting equitable load distribution across computational units. For a graph G = (V, E) with |V| = n vertices partitioned into k parts V1, ..., Vk, each part must satisfy |Vi| ≤ (1 + ε)·(n/k) for some tolerance ε < 1, where ε quantifies the allowable imbalance. This formulation, often termed (k, 1 + ε)-balanced partitioning, prevents any single partition from dominating the others in size, which is crucial for parallel processing efficiency. A common value for ε in practical implementations is 0.03, corresponding to a 3% imbalance tolerance, as used in widely adopted tools like METIS for high-quality partitions. For instance, in a balanced bisection (k = 2) of a graph with n vertices and ε = 0.03, the size of one part |V1| is constrained to lie between approximately 0.485n and 0.515n, ensuring neither part exceeds n/2 by more than 3%. These constraints contribute to the NP-hardness of balanced graph partitioning, as even exact balance (ε = 0) yields inapproximability results unless P = NP. For edge-weighted graphs or edge-partitioning variants, edge balance constraints extend this equity to the distribution of edge weights or counts across partitions. Here, if the graph has total edge weight m, each partition i must satisfy Σ_{e ∈ Ei} w(e) ≤ (1 + ν)·(m/k) for a load tolerance ν ≥ 0, balancing computational load in distributed systems like vertex programs. This is particularly relevant in scenarios minimizing communication overhead, where unbalanced edge loads can lead to processing disparities. In multi-objective graph partitioning, especially for heterogeneous graphs with node attributes or labels, fairness constraints enforce proportional representation to avoid bias in subgroup allocations. Proportional fairness requires that the fraction of nodes with a specific label s in partition i, |Cs ∩ Vi| / |Vi|, approximates the global fraction |Cs| / n, where Cs is the set of nodes with label s and n = |V|. This is quantified via a fairness score B_f = -(1/(2v)) · Σ_{i=1..v} Σ_{s=1..h} | |Cs ∩ Vi| / |Vi| - |Cs| / n |, where v is the number of parts and h the number of labels; values closer to 0 indicate a fairer partition.
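
A brief sketch computing the imbalance parameter and the fairness score B_f defined above for a labelled 2-way partition; the example labels are illustrative.

    def imbalance(parts, n, k):
        """Excess of the largest part over the ideal size n/k, as a fraction."""
        return max(len(p) for p in parts) / (n / k) - 1

    def fairness_score(parts, labels):
        """B_f = -(1/(2v)) * sum over parts i and labels s of
        | |Cs ∩ Vi| / |Vi| - |Cs| / n |; 0 means perfectly proportional."""
        n = sum(len(p) for p in parts)
        label_set = set(labels.values())
        total = 0.0
        for part in parts:
            for s in label_set:
                in_part = sum(1 for v in part if labels[v] == s) / len(part)
                overall = sum(1 for v in labels if labels[v] == s) / n
                total += abs(in_part - overall)
        return -total / (2 * len(parts))

    parts = [{0, 1, 2}, {3, 4, 5}]
    labels = {0: "a", 1: "a", 2: "b", 3: "a", 4: "b", 5: "b"}
    print(imbalance(parts, 6, 2))        # 0.0 (perfectly balanced)
    print(fairness_score(parts, labels)) # about -0.167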