Hopcroft–Karp algorithm
| Class | Graph algorithm |
|---|---|
| Data structure | Graph |
| Worst-case performance | O(|E|√|V|) |
| Worst-case space complexity | O(|V|) |
In computer science, the Hopcroft–Karp algorithm (sometimes more accurately called the Hopcroft–Karp–Karzanov algorithm)[1] is an algorithm that takes a bipartite graph as input and produces a maximum-cardinality matching as output — a set of as many edges as possible with the property that no two edges share an endpoint. It runs in O(|E|√|V|) time in the worst case, where E is the set of edges in the graph, V is the set of vertices of the graph, and it is assumed that |E| = Ω(|V|). In the case of dense graphs the time bound becomes O(|V|^2.5), and for sparse random graphs it runs in O(|E| log |V|) time with high probability.[2]
The algorithm was discovered by John Hopcroft and Richard Karp (1973) and independently by Alexander Karzanov (1973).[3] As in previous methods for matching such as the Hungarian algorithm and the work of Edmonds (1965), the Hopcroft–Karp algorithm repeatedly increases the size of a partial matching by finding augmenting paths. These paths are sequences of edges of the graph, which alternate between edges in the matching and edges out of the partial matching, and where the initial and final edge are not in the partial matching. Finding an augmenting path allows us to increment the size of the partial matching, by simply toggling the edges of the augmenting path (putting in the partial matching those that were not, and vice versa). Simpler algorithms for bipartite matching, such as the Ford–Fulkerson algorithm, find one augmenting path per iteration: the Hopcroft–Karp algorithm instead finds a maximal set of shortest augmenting paths, so as to ensure that only O(√|V|) iterations are needed instead of O(|V|) iterations. The same performance of O(|E|√|V|) can be achieved to find maximum-cardinality matchings in arbitrary graphs, with the more complicated algorithm of Micali and Vazirani.[4]
The Hopcroft–Karp algorithm can be seen as a special case of Dinic's algorithm for the maximum-flow problem.[5]
Augmenting paths
A vertex that is not the endpoint of an edge in some partial matching is called a free vertex. The basic concept that the algorithm relies on is that of an augmenting path, a path that starts at a free vertex, ends at a free vertex, and alternates between unmatched and matched edges within the path. It follows from this definition that, except for the endpoints, all other vertices (if any) in an augmenting path must be non-free vertices. An augmenting path could consist of only two vertices (both free) and a single unmatched edge between them.
If M is a matching, and P is an augmenting path relative to M, then the symmetric difference of the two sets of edges, M ⊕ P, would form a matching with size |M| + 1. Thus, by finding augmenting paths, an algorithm may increase the size of the matching.
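The toggle step can be illustrated concretely. In the sketch below (a minimal illustration; the vertex names and the `augment` helper are assumptions, not from the original), a matching and a path are stored as sets of edge tuples, so the symmetric difference is a single set operation:

```python
def augment(matching, path_edges):
    """Toggle the edges of an augmenting path: the symmetric difference
    of a matching M and an augmenting path P is a matching of size |M| + 1."""
    return matching ^ path_edges  # ^ is set symmetric difference

# Matching M = {(u2, v2)}; augmenting path P = u1–v2, v2–u2, u2–v1,
# whose endpoints u1 and v1 are free. Edges are stored as (u, v) pairs.
M = {("u2", "v2")}
P = {("u1", "v2"), ("u2", "v2"), ("u2", "v1")}
M2 = augment(M, P)
print(sorted(M2))  # [('u1', 'v2'), ('u2', 'v1')] — the matching grew from 1 to 2
```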
Conversely, suppose that a matching M is not optimal, and let P be the symmetric difference M ⊕ M*, where M* is an optimal matching. Because M and M* are both matchings, every vertex has degree at most 2 in P. So P must form a collection of disjoint cycles, of paths with an equal number of matched and unmatched edges in M, of augmenting paths for M, and of augmenting paths for M*; but the latter is impossible because M* is optimal. Now, the cycles and the paths with equal numbers of matched and unmatched edges do not contribute to the difference in size between M and M*, so this difference is equal to the number of augmenting paths for M in P. Thus, whenever there exists a matching M* larger than the current matching M, there must also exist an augmenting path. If no augmenting path can be found, an algorithm may safely terminate, since in this case M must be optimal.
An augmenting path in a matching problem is closely related to the augmenting paths arising in maximum flow problems, paths along which one may increase the amount of flow between the terminals of the flow. It is possible to transform the bipartite matching problem into a maximum flow instance, such that the alternating paths of the matching problem become augmenting paths of the flow problem. It suffices to insert two vertices, source and sink, and insert edges of unit capacity from the source to each vertex in U, and from each vertex in V to the sink; and let edges from U to V have unit capacity.[6] A generalization of the technique used in the Hopcroft–Karp algorithm to find maximum flow in an arbitrary network is known as Dinic's algorithm.
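The reduction to maximum flow amounts to adding a source, a sink, and unit capacities. A minimal sketch of the construction (the function name, the `"s"`/`"t"` labels, and the dict-of-capacities representation are illustrative assumptions):

```python
def matching_to_flow(U, V, edges):
    """Build a unit-capacity flow network whose maximum s–t flow equals
    the maximum matching of the bipartite graph (U, V, edges)."""
    capacity = {}
    for u in U:
        capacity[("s", u)] = 1   # source -> every left vertex
    for v in V:
        capacity[(v, "t")] = 1   # every right vertex -> sink
    for (u, v) in edges:
        capacity[(u, v)] = 1     # original edges, directed U -> V
    return capacity

net = matching_to_flow({"u1", "u2"}, {"v1", "v2"},
                       {("u1", "v1"), ("u2", "v1"), ("u2", "v2")})
print(len(net))  # 7: two source edges, two sink edges, three graph edges
```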
Algorithm
The algorithm may be expressed in the following pseudocode.

- Input: Bipartite graph G(U ∪ V, E)
- Output: Matching M ⊆ E
- M ← ∅
- repeat
- P ← {P1, P2, …, Pk} maximal set of vertex-disjoint shortest augmenting paths
- M ← M ⊕ (P1 ∪ P2 ∪ … ∪ Pk)
- until P = ∅
In more detail, let U and V be the two sets in the bipartition of G, and let the matching from U to V at any time be represented as the set M. The algorithm is run in phases. Each phase consists of the following steps.
- A breadth-first search partitions the vertices of the graph into layers. The free vertices in U are used as the starting vertices of this search and form the first layer of the partitioning. At the first level of the search, there are only unmatched edges, since the free vertices in U are by definition not adjacent to any matched edges. At subsequent levels of the search, the traversed edges are required to alternate between matched and unmatched. That is, when searching for successors from a vertex in U, only unmatched edges may be traversed, while from a vertex in V only matched edges may be traversed. The search terminates at the first layer k where one or more free vertices in V are reached.
- All free vertices in V at layer k are collected into a set F. That is, a vertex v is put into F if and only if it ends a shortest augmenting path.
- The algorithm finds a maximal set of vertex-disjoint augmenting paths of length k. (Maximal means that no more such paths can be added. This is different from finding the maximum number of such paths, which would be harder to do. Fortunately, it is sufficient here to find a maximal set of paths.) This set may be computed by depth-first search (DFS) from F to the free vertices in U, using the breadth-first layering to guide the search: the DFS is only allowed to follow edges that lead to an unused vertex in the previous layer, and paths in the DFS tree must alternate between matched and unmatched edges. Once an augmenting path is found that involves one of the vertices in F, the DFS is continued from the next starting vertex. Any vertex encountered during the DFS can immediately be marked as used, since if there is no path from it to U at the current point in the DFS, then that vertex can't be used to reach U at any other point in the DFS. This ensures O(|E|) running time for the DFS. It is also possible to work in the other direction, from free vertices in U to those in F, which is the variant used in the pseudocode.
- Every one of the paths found in this way is used to enlarge M.
The algorithm terminates when no more augmenting paths are found in the breadth-first search part of one of the phases.
Analysis
Each phase consists of a single breadth-first search and a single depth-first search. Thus, a single phase may be implemented in O(|E|) time. Therefore, the first √|V| phases, in a graph with |V| vertices and |E| edges, take time O(|E|√|V|).
Each phase increases the length of the shortest augmenting path by at least one: the phase finds a maximal set of augmenting paths of the given length, so any remaining augmenting path must be longer. Therefore, once the initial √|V| phases of the algorithm are complete, the shortest remaining augmenting path has at least √|V| edges in it. However, the symmetric difference of the eventual optimal matching and of the partial matching M found by the initial phases forms a collection of vertex-disjoint augmenting paths and alternating cycles. If each of the paths in this collection has length at least √|V|, there can be at most √|V| paths in the collection, and the size of the optimal matching can differ from the size of M by at most √|V| edges. Since each phase of the algorithm increases the size of the matching by at least one, there can be at most √|V| additional phases before the algorithm terminates.
Since the algorithm performs a total of at most 2√|V| phases, it takes a total time of O(|E|√|V|) in the worst case.
In many instances, however, the time taken by the algorithm may be even faster than this worst-case analysis indicates. For instance, in the average case for sparse bipartite random graphs, Bast et al. (2006) (improving a previous result of Motwani 1994) showed that with high probability all non-optimal matchings have augmenting paths of logarithmic length. As a consequence, for these graphs, the Hopcroft–Karp algorithm takes O(log |V|) phases and O(|E| log |V|) total time.
Comparison with other bipartite matching algorithms
For sparse graphs, the Hopcroft–Karp algorithm continues to have the best known worst-case performance, but for dense graphs (|E| = Ω(|V|²)) a more recent algorithm by Alt et al. (1991) achieves a slightly better time bound, O(|V|^1.5 √(|E| / log |V|)). Their algorithm is based on using a push-relabel maximum flow algorithm and then, when the matching created by this algorithm becomes close to optimum, switching to the Hopcroft–Karp method.
Several authors have performed experimental comparisons of bipartite matching algorithms. Their results in general tend to show that the Hopcroft–Karp method is not as good in practice as it is in theory: it is outperformed both by simpler breadth-first and depth-first strategies for finding augmenting paths, and by push-relabel techniques.[7]
Non-bipartite graphs
The same idea of finding a maximal set of shortest augmenting paths works also for finding maximum-cardinality matchings in non-bipartite graphs, and for the same reasons the algorithms based on this idea take O(√|V|) phases. However, for non-bipartite graphs, the task of finding the augmenting paths within each phase is more difficult. Building on the work of several slower predecessors, Micali & Vazirani (1980) showed how to implement a phase in linear time, resulting in a non-bipartite matching algorithm with the same time bound as the Hopcroft–Karp algorithm for bipartite graphs. The Micali–Vazirani technique is complex, and its authors did not provide full proofs of their results; subsequently, a "clear exposition" was published by Peterson & Loui (1988) and alternative methods were described by other authors.[8] In 2012, Vazirani offered a new simplified proof of the Micali–Vazirani algorithm.[9]
Pseudocode
/*
 G = U ∪ V ∪ {NIL}
 where U and V are the left and right sides of the bipartite graph and NIL is a special null vertex
*/

function BFS() is
    for each u in U do
        if Pair_U[u] = NIL then
            Dist[u] := 0
            Enqueue(Q, u)
        else
            Dist[u] := ∞
    Dist[NIL] := ∞
    while Empty(Q) = false do
        u := Dequeue(Q)
        if Dist[u] < Dist[NIL] then
            for each v in Adj[u] do
                if Dist[Pair_V[v]] = ∞ then
                    Dist[Pair_V[v]] := Dist[u] + 1
                    Enqueue(Q, Pair_V[v])
    return Dist[NIL] ≠ ∞

function DFS(u) is
    if u ≠ NIL then
        for each v in Adj[u] do
            if Dist[Pair_V[v]] = Dist[u] + 1 then
                if DFS(Pair_V[v]) = true then
                    Pair_V[v] := u
                    Pair_U[u] := v
                    return true
        Dist[u] := ∞
        return false
    return true

function Hopcroft–Karp is
    for each u in U do
        Pair_U[u] := NIL
    for each v in V do
        Pair_V[v] := NIL
    matching := 0
    while BFS() = true do
        for each u in U do
            if Pair_U[u] = NIL then
                if DFS(u) = true then
                    matching := matching + 1
    return matching
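The pseudocode above translates almost line for line into a runnable implementation. The following Python sketch mirrors it, with a `None` value playing the role of NIL and the adjacency given as a dict from left vertices to lists of right-side neighbors (the function name and the example graph are illustrative, not from the original):

```python
from collections import deque

INF = float("inf")

def hopcroft_karp(adj, U, V):
    """Maximum-cardinality bipartite matching; returns the matching size.
    adj maps each left vertex u in U to its neighbors in V."""
    pair_u = {u: None for u in U}   # None plays the role of NIL
    pair_v = {v: None for v in V}
    dist = {}

    def bfs():
        # Layer the graph starting from the free vertices in U.
        queue = deque()
        for u in U:
            if pair_u[u] is None:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        dist[None] = INF
        while queue:
            u = queue.popleft()
            if dist[u] < dist[None]:
                for v in adj[u]:
                    if dist[pair_v[v]] == INF:
                        dist[pair_v[v]] = dist[u] + 1
                        if pair_v[v] is not None:
                            queue.append(pair_v[v])
        return dist[None] != INF    # a free vertex in V was reached

    def dfs(u):
        # Follow only layer-increasing edges; toggle the path on success.
        if u is None:
            return True
        for v in adj[u]:
            if dist[pair_v[v]] == dist[u] + 1 and dfs(pair_v[v]):
                pair_v[v] = u
                pair_u[u] = v
                return True
        dist[u] = INF               # mark u as used up for this phase
        return False

    matching = 0
    while bfs():
        for u in U:
            if pair_u[u] is None and dfs(u):
                matching += 1
    return matching

adj = {"u1": ["v1"], "u2": ["v1", "v2"], "u3": ["v2"]}
print(hopcroft_karp(adj, ["u1", "u2", "u3"], ["v1", "v2"]))  # 2
```

The only deviation from the pseudocode is that NIL is never enqueued; since a dequeued NIL would fail the `Dist[u] < Dist[NIL]` test anyway, the behavior is identical.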

Explanation
Let the vertices of our graph be partitioned into U and V, and consider a partial matching, as indicated by the Pair_U and Pair_V tables that contain the one vertex to which each vertex of U and of V is matched, or NIL for unmatched vertices. The key idea is to add two dummy vertices on each side of the graph: uDummy connected to all unmatched vertices in U and vDummy connected to all unmatched vertices in V. Now, if we run a breadth-first search (BFS) from uDummy to vDummy then we can get the paths of minimal length that connect currently unmatched vertices in U to currently unmatched vertices in V. Note that, as the graph is bipartite, these paths always alternate between vertices in U and vertices in V, and we require in our BFS that when going from V to U, we always select a matched edge. If we reach an unmatched vertex of V, then we end at vDummy and the search for paths in the BFS terminates. To summarize, the BFS starts at unmatched vertices in U, goes to all their neighbors in V; if all are matched then it goes back to the vertices in U to which all these vertices are matched (and which were not visited before), then it goes to all the neighbors of these vertices, and so on, until one of the vertices reached in V is unmatched.
Observe in particular that BFS marks the unmatched nodes of U with distance 0, then increments the distance every time it comes back to U. This guarantees that the paths considered in the BFS are of minimal length to connect unmatched vertices of U to unmatched vertices of V while always going back from V to U on edges that are currently part of the matching. In particular, the special NIL vertex, which corresponds to vDummy, then gets assigned a finite distance, so the BFS function returns true if and only if some path has been found. If no path has been found, then there are no augmenting paths left and the matching is maximum.
If BFS returns true, then we can go ahead and update the pairing for vertices on the minimal-length paths found from U to V: we do so using a depth-first search (DFS). Note that each vertex in V on such a path, except for the last one, is currently matched. So we can explore with the DFS, making sure that the paths that we follow correspond to the distances computed in the BFS. We update along every such path by removing from the matching all edges of the path that are currently in the matching, and adding to the matching all edges of the path that are currently not in the matching: as this is an augmenting path (the first and last edges of the path were not part of the matching, and the path alternated between matched and unmatched edges), this increases the number of edges in the matching. This is the same as replacing the current matching by the symmetric difference between the current matching and the entire path.
Note that the code ensures that all augmenting paths that we consider are vertex-disjoint. Indeed, after doing the symmetric difference for a path, none of its vertices could be considered again in the DFS, because the Dist[Pair_V[v]] will not be equal to Dist[u] + 1 (it would be exactly Dist[u]).
Also observe that the DFS does not visit the same vertex multiple times. This is thanks to the following lines:
Dist[u] := ∞
return false
When we were not able to find any shortest augmenting path from a vertex u, then the DFS marks vertex u by setting Dist[u] to infinity, so that these vertices are not visited again.
One last observation is that we actually don't need uDummy: its role is simply to put all unmatched vertices of U in the queue when we start the BFS. As for vDummy, it is denoted as NIL in the pseudocode above.
See also
- Maximum cardinality matching, the problem solved by the algorithm, and its generalization to non-bipartite graphs
- Assignment problem, a generalization of this problem on weighted graphs, solved e.g. by the Hungarian algorithm
- Edmonds–Karp algorithm for finding maximum flow, a generalization of the Hopcroft–Karp algorithm
Notes
- ^ Gabow (2017); Annamalai (2018)
- ^ Bast et al. (2006).
- ^ Dinitz (2006).
- ^ Peterson & Loui (1988).
- ^ Tarjan (1983), p. 102.
- ^ Ahuja, Magnanti & Orlin (1993), section 12.3, bipartite cardinality matching problem, pp. 469–470.
- ^ Chang & McCormick (1990); Darby-Dowman (1980); Setubal (1993); Setubal (1996).
- ^ Gabow & Tarjan (1991).
- ^ Vazirani (2012)
References
- Ahuja, Ravindra K.; Magnanti, Thomas L.; Orlin, James B. (1993), Network Flows: Theory, Algorithms and Applications, Prentice-Hall.
- Alt, H.; Blum, N.; Mehlhorn, K.; Paul, M. (1991), "Computing a maximum cardinality matching in a bipartite graph in time O(n^1.5 √(m/log n))", Information Processing Letters, 37 (4): 237–240, doi:10.1016/0020-0190(91)90195-N.
- Annamalai, Chidambaram (2018), "Finding perfect matchings in bipartite hypergraphs", Combinatorica, 38 (6): 1285–1307, arXiv:1509.07007, doi:10.1007/s00493-017-3567-2, MR 3910876, S2CID 1997334
- Bast, Holger; Mehlhorn, Kurt; Schäfer, Guido; Tamaki, Hisao (2006), "Matching algorithms are fast in sparse random graphs", Theory of Computing Systems, 39 (1): 3–14, CiteSeerX 10.1.1.395.6643, doi:10.1007/s00224-005-1254-y, MR 2189556, S2CID 9321036
- Chang, S. Frank; McCormick, S. Thomas (1990), A faster implementation of a bipartite cardinality matching algorithm, Tech. Rep. 90-MSC-005, Faculty of Commerce and Business Administration, Univ. of British Columbia. As cited by Setubal (1996).
- Darby-Dowman, Kenneth (1980), The exploitation of sparsity in large scale linear programming problems – Data structures and restructuring algorithms, Ph.D. thesis, Brunel University. As cited by Setubal (1996).
- Dinitz, Yefim (2006), "Dinitz' Algorithm: The Original Version and Even's Version", in Goldreich, Oded; Rosenberg, Arnold L.; Selman, Alan L. (eds.), Theoretical Computer Science: Essays in Memory of Shimon Even (PDF), Lecture Notes in Computer Science, vol. 3895, Berlin and Heidelberg: Springer, pp. 218–240, doi:10.1007/11685654_10, ISBN 978-3-540-32880-3.
- Edmonds, Jack (1965), "Paths, Trees and Flowers", Canadian Journal of Mathematics, 17: 449–467, doi:10.4153/CJM-1965-045-4, MR 0177907, S2CID 18909734.
- Gabow, Harold N. (2017), "The weighted matching approach to maximum cardinality matching", Fundamenta Informaticae, 154 (1–4): 109–130, arXiv:1703.03998, doi:10.3233/FI-2017-1555, MR 3690573, S2CID 386509
- Gabow, Harold N.; Tarjan, Robert E. (1991), "Faster scaling algorithms for general graph matching problems", Journal of the ACM, 38 (4): 815–853, doi:10.1145/115234.115366, S2CID 18350108.
- Hopcroft, John E.; Karp, Richard M. (1973), "An n5/2 algorithm for maximum matchings in bipartite graphs", SIAM Journal on Computing, 2 (4): 225–231, doi:10.1137/0202019. Previously announced at the 12th Annual Symposium on Switching and Automata Theory, 1971.
- Karzanov, A. V. (1973), "An exact estimate of an algorithm for finding a maximum flow, applied to the problem on representatives", Problems in Cybernetics, 5: 66–70. Previously announced at the Seminar on Combinatorial Mathematics (Moscow, 1971).
- Micali, S.; Vazirani, V. V. (1980), "An algorithm for finding maximum matching in general graphs", Proc. 21st IEEE Symp. Foundations of Computer Science, pp. 17–27, doi:10.1109/SFCS.1980.12, S2CID 27467816.
- Peterson, Paul A.; Loui, Michael C. (November 1988), "The general maximum matching algorithm of Micali and Vazirani", Algorithmica, 3 (1–4): 511–533, CiteSeerX 10.1.1.228.9625, doi:10.1007/BF01762129, ISSN 1432-0541, S2CID 16820.
- Motwani, Rajeev (1994), "Average-case analysis of algorithms for matchings and related problems", Journal of the ACM, 41 (6): 1329–1356, doi:10.1145/195613.195663, S2CID 2968208.
- Setubal, João C. (1993), "New experimental results for bipartite matching", Proc. Netflow93, Dept. of Informatics, Univ. of Pisa, pp. 211–216. As cited by Setubal (1996).
- Setubal, João C. (1996), Sequential and parallel experimental results with bipartite matching algorithms, Tech. Rep. IC-96-09, Inst. of Computing, Univ. of Campinas, CiteSeerX 10.1.1.48.3539.
- Tarjan, Robert Endre (1983). Data Structures and Network Algorithms. CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611970265. ISBN 978-0-89871-187-5.
- Vazirani, Vijay (2012), An Improved Definition of Blossoms and a Simpler Proof of the MV Matching Algorithm, CoRR abs/1210.4594, arXiv:1210.4594, Bibcode:2012arXiv1210.4594V.
Hopcroft–Karp algorithm
View on GrokipediaBackground
Bipartite Graphs and Matching
A bipartite graph is an undirected graph whose vertices can be partitioned into two disjoint sets such that no two graph vertices in the same set are adjacent.[3] This structure ensures that every edge connects a vertex from one set to a vertex from the other set, making bipartite graphs useful for modeling relationships between two distinct categories.[3] In a bipartite graph with vertex partitions and , a matching is a set of edges such that no two edges share a common vertex.[4] A maximum matching is a matching of largest possible cardinality, representing the maximum number of pairwise disjoint edges that can be selected.[4] If the maximum matching covers all vertices in both partitions (assuming ), it is called a perfect matching.[4] Hall's marriage theorem provides a necessary and sufficient condition for the existence of a perfect matching in a bipartite graph.[5] Specifically, for partitions and with , a perfect matching exists if and only if for every subset , the size of the neighborhood (the set of vertices in adjacent to at least one vertex in ) satisfies .[5] This condition, originally formulated in combinatorial terms, ensures balanced connectivity between the partitions.[5] A classic example of bipartite matching arises in assigning applicants to jobs, where one partition represents applicants and the other represents job openings, with edges indicating qualifications or preferences.[6] A maximum matching in this graph corresponds to the largest possible set of assignments where no applicant or job is assigned more than once.[6] Augmenting paths serve as a conceptual tool to extend existing matchings toward maximality.[4]Maximum Bipartite Matching Problem
The maximum bipartite matching problem seeks to identify the largest possible set of edges in a bipartite graph such that no two edges share a common vertex. Formally, given a bipartite graph with disjoint vertex sets and and edge set connecting vertices only between and , the objective is to find a matching that maximizes the cardinality , where a matching is a subset of edges with no two incident to the same vertex.[7][8] Naive strategies, such as greedily selecting edges one by one without backtracking, often fail to yield an optimal solution because they may commit to suboptimal choices early, blocking larger matchings. For instance, in a graph where an initial greedy pick isolates multiple vertices that could otherwise be matched through alternative paths, the resulting matching size falls short of the maximum.[9][10] These limitations underscore the computational challenge of ensuring global optimality in polynomial time, motivating the development of systematic algorithms that explore augmenting structures. A key insight for solving this problem lies in the concept of augmenting paths, which enable iterative improvement of a matching. According to Berge's lemma, a matching is maximum if and only if no -augmenting path exists in the graph—an augmenting path being a simple path that starts and ends with unmatched vertices and alternates between edges not in and edges in , allowing the matching to be enlarged by flipping along this path.[11] This characterization provides a foundation for algorithms that repeatedly search for such paths until none remain, thereby guaranteeing maximality. The problem traces its roots to early 20th-century combinatorics, where it emerged in studies of graph decompositions and set systems, with foundational contributions from Dénes Kőnig on vertex covers and matchings in bipartite graphs. Early algorithmic solutions appeared in the mid-20th century, notably Harold W. 
Kuhn's 1955 Hungarian method, which efficiently computes maximum matchings by iteratively adjusting potentials in weighted variants, laying groundwork for unweighted cases.[12][13]Augmenting Paths in Matching
In the context of bipartite matching, an augmenting path with respect to a matching is a simple path in the graph that alternates between edges in and edges not in , beginning and ending at vertices that are unmatched (exposed) by .[8] Such paths are crucial because they allow the matching to be extended iteratively toward a maximum cardinality matching.[8] When an augmenting path is identified, the symmetric difference —defined as the set of edges in exactly one of or —forms a new matching that includes all edges of except those in , plus the edges of not in . Since starts and ends with unmatched vertices, it contains one more edge not in than edges in , thereby increasing the size of the matching by exactly 1.[8] Berge's lemma provides a foundational characterization: a matching in a graph is maximum if and only if no augmenting path exists with respect to . This equivalence implies that algorithms for maximum bipartite matching can proceed by repeatedly searching for and augmenting along such paths until none remain. To illustrate, consider a simple bipartite graph with partitions and , and edges , , . Start with the initial matching , which leaves and unmatched. An augmenting path is , alternating between a non-matching edge (), a matching edge (), and another non-matching edge (). Augmenting along this path yields , a maximum matching of size 2 with no further augmenting paths.[8]Algorithm Description
High-Level Overview
The Hopcroft–Karp algorithm was developed by John E. Hopcroft and Richard M. Karp in 1973 to compute maximum cardinality matchings in bipartite graphs more efficiently than prior approaches that identify and augment a single path per iteration.[1] The algorithm's core innovation is its phased structure, which in each phase discovers a maximal collection of shortest vertex-disjoint augmenting paths: it begins with a breadth-first search from all free vertices on one side of the bipartition to build a layered graph capturing the shortest possible augmenting paths, followed by multiple depth-first searches within this layering to extract disjoint paths for simultaneous augmentation.[14] Augmenting paths form the essential mechanism for enlarging the matching size. These phases continue iteratively until no augmenting paths exist, yielding the maximum matching in O(E √V) time, where E denotes the number of edges and V the number of vertices.[1] This method contrasts with Ford–Fulkerson-style algorithms for bipartite matching, which can require up to O(V) separate augmentations and thus O(VE) time overall, by instead computing a blocking flow—a saturating set of shortest disjoint augmenting paths—per phase, limiting the total phases to O(√V) and enhancing efficiency.[2]Phase Structure
The Hopcroft–Karp algorithm operates through a series of iterative phases that systematically expand the matching in a bipartite graph. Each phase begins with a breadth-first search (BFS) to construct a layered graph based on the shortest distances from free vertices in one partite set, followed by multiple depth-first searches (DFS) to identify and augment along a maximal set of vertex-disjoint augmenting paths confined to these layers. This structure allows the algorithm to find multiple shortest augmenting paths simultaneously in each phase, distinguishing it from earlier methods that augment along single paths.[1] A phase terminates once no additional vertex-disjoint augmenting paths of the current shortest length remain available within the layered graph. This condition ensures that the phase exhausts all possible augmentations at the minimal distance level before proceeding, preventing redundant searches and maintaining efficiency. By design, each phase guarantees at least one augmentation to the matching, as the layered graph always contains at least one such path if the matching is not maximum.[1] The phased approach ensures monotonic progress toward a maximum matching by increasing the length of the shortest augmenting paths across successive phases. Specifically, after a phase completes, any remaining augmenting paths must be longer than those just saturated, as the layering reflects the minimal distances in the residual graph. This strict increase in shortest-path length precludes revisiting shorter paths in future phases. The number of phases is bounded by , where is the number of vertices.[1]BFS for Layering
The breadth-first search (BFS) phase in the Hopcroft–Karp algorithm constructs a layered representation of the residual graph, starting from all free vertices in the left partition , to identify the shortest possible augmenting paths in a bipartite graph . This layering ensures that subsequent searches for multiple disjoint augmenting paths operate on a structured subgraph where paths are of minimal length and do not cross between layers.[1] The BFS begins by initializing layer with all unmatched (free) vertices in , assigning them a distance of 0 in the residual graph. The residual graph is constructed with directed edges: forward edges from to for unmatched edges (where is the current matching), and backward edges from to for matched edges . From each vertex in the current layer , the BFS explores adjacent vertices in the residual graph that have not yet been visited, assigning them to layer with distance . This process continues until no further vertices can be reached or a free vertex in is encountered, at which point the layering terminates; the BFS does not explore beyond the minimal distance to free vertices in .[1] Layering adheres to strict parity rules to maintain the bipartite structure: even-numbered layers () consist exclusively of vertices from , while odd-numbered layers () contain vertices from . Edges in the residual graph are only traversed between consecutive layers, ensuring that all paths from to any layer have exactly length and alternate properly between and . Matched edges are handled via backward directions to allow alternation in augmenting paths, but only if they connect a visited vertex in an odd layer to an unvisited matched partner in an even layer. This directed traversal prevents revisiting vertices and enforces shortest-path distances.[1] The output of the BFS is a layered subgraph, often called the level graph, comprising layers (where is the length of the shortest augmenting path) and the edges between consecutive layers. 
In this structure, all shortest augmenting paths from free vertices in to free vertices in have length , allowing the subsequent DFS to efficiently discover a maximal set of vertex-disjoint such paths. If no free vertex in is reachable, the BFS indicates that the current matching is maximum.[1]DFS for Path Finding
In the Hopcroft–Karp algorithm, the depth-first search (DFS) phase operates on the directed level graph produced by the breadth-first search layering, enabling the discovery of multiple vertex-disjoint shortest augmenting paths in a single iteration. Each such path begins at an unmatched vertex in the left partition U (layer 0) and ends at an unmatched vertex in the right partition V (the final odd-numbered layer), alternating between non-matching edges from U to V and matching edges from V to U, strictly increasing the layer number by 1 at each step.[15] This restriction to layer-increasing edges ensures that all found paths are of minimal length, avoiding longer detours that would violate the phase's goal of a maximal blocking flow. The DFS procedure is typically implemented recursively, starting from each unmatched u ∈ U in layer 0 that has not yet been explored in the current phase. From u, the search iterates over adjacent vertices v whose level is exactly one greater than that of u and which have not been visited. If v is unmatched, the path to v is immediately augmenting, and the matching edges are flipped along the accumulated path by updating the pairing: set the mate of u to v and the mate of v to u. If v is matched to some u′ with level[u′] = level[v] + 1, the search recurses on u′ (which resides in the next even layer), attempting to find an augmenting subpath from u′; success propagates back, flipping edges incrementally through the recursion stack.[16] Backtracking occurs when no viable neighbor yields a successful recursion, causing the function to return false and unwind to the previous call.[15] To guarantee vertex-disjointness across multiple DFS invocations in the same phase, a visited flag is maintained for vertices in V, reset at the phase's start and set upon attempting a branch from any u to v.
This prevents subsequent DFS starts from reusing a vertex v that is already part of a found path, ensuring no overlap in the right partition and thus producing a maximal set of disjoint augmenting paths. Similarly, explored starting vertices in U (layer 0) are marked to avoid redundant searches, though the primary disjointness enforcement is via the V-side flags. Upon finding and augmenting along a path, the matching size increases by one, and the process repeats for the remaining free vertices in layer 0 until no more augmenting paths exist within the current layering.[16] Collectively, this mechanism augments the matching by the size of the blocking flow in the level graph.[15]
Implementation Details
Pseudocode
The Hopcroft–Karp algorithm is typically implemented using breadth-first search (BFS) to construct level layers in the residual graph and multiple depth-first searches (DFS) to find and augment along disjoint shortest augmenting paths in each phase. The following pseudocode assumes a bipartite graph with left partition U, right partition V, and adjacency lists adj[u] for each u ∈ U containing neighbors in V. NIL is represented as 0 and INF as a sufficiently large integer. Arrays pairU and pairV track the current matching, dist stores level distances (with dist[0] serving as the distance of NIL), and a global visited array may be used per phase for DFS efficiency (though it is omitted in the simplest recursive forms).
Initialization
pairU ← array of size n+1, initialized to 0 // pairU[u] = v if u matched to v, else 0
pairV ← array of size m+1, initialized to 0 // pairV[v] = u if v matched to u, else 0
dist ← array of size n+1, initialized to 0 // dist[0] for NIL
BFS Procedure (Builds Levels and Checks for Augmenting Paths)
function BFS():
Q ← empty queue
for u = 1 to n:
if pairU[u] == 0:
dist[u] ← 0
Q.enqueue(u)
else:
dist[u] ← INF
dist[0] ← INF
while Q not empty:
u ← Q.dequeue()
if dist[u] < dist[0]: // Explore u only while its level is below that of the nearest free vertex in V
for each v in adj[u]:
if dist[pairV[v]] == INF:
dist[pairV[v]] ← dist[u] + 1
Q.enqueue(pairV[v])
return dist[0] != INF // True if augmenting paths exist
DFS Procedure (Finds and Augments a Single Shortest Path)
function DFS(u):
for each v in adj[u]: // Can use iterator to resume from last tried edge for efficiency
if dist[pairV[v]] == dist[u] + 1: // Follow only level-increasing edges
if pairV[v] == 0 or DFS(pairV[v]): // Free v or recurse to matched u'
pairU[u] ← v
pairV[v] ← u
return true
dist[u] ← INF // Block this u for the rest of the phase (needed for the O(E) per-phase bound)
return false
Main Algorithm (Phases of BFS and Multiple DFS)
matching ← 0
while BFS():
// Reset visited if using per-DFS visited for V (optional for disjointness)
for each free u in U (pairU[u] == 0):
if DFS(u):
matching ← matching + 1
return matching // Or the pairU/pairV arrays for the matching
Step-by-Step Execution Example
Consider a sample bipartite graph with partite sets U = {u1, u2, u3, u4} and V = {v1, v2, v3, v4} (labels here are illustrative) whose edges admit a perfect matching. The initial matching is empty.[1] In phase 1, the BFS begins from all free vertices in U at level 0. Using non-matching edges (all edges, since the matching is empty), vertices of V are reached at level 1. All vertices in V are free, so the shortest augmenting path length is 1, and the level graph consists of edges from level 0 to level 1. The DFS is then performed repeatedly from free vertices in U to find vertex-disjoint augmenting paths of length 1 in the level graph, marking used vertices to ensure disjointness. For example, a DFS from u1 may reach a free vertex of V at level 1, yielding a single-edge augmenting path that is added to the matching; a DFS from u2 may then find a second, disjoint single-edge path. Suppose no further length-1 paths are found from the remaining free vertices in this phase because their neighbors are already used in the level graph; the matching then has size 2, with two free vertices left in U and two in V. (In a full execution, additional paths may be found depending on order, but this illustrates finding multiple disjoint paths.[1]) In phase 2, a new BFS is performed from the remaining free vertices of U at level 0. A non-matching edge reaches a matched vertex of V at level 1, its matching edge leads to its partner in U at level 2, and a further non-matching edge reaches a free vertex of V at level 3, so the shortest augmenting path length is now 3. The DFS finds such a length-3 path, and augmentation flips its edges: matched edges become unmatched, and unmatched edges become matched, increasing the matching size by one.[1] Repeating until no augmenting path remains yields the final maximum matching of size 4.
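The edge-toggling step of augmentation amounts to a symmetric difference of edge sets. A small stdlib-only sketch makes this concrete; the path and matching below are illustrative and not tied to any particular graph:

```python
def augment(matching, path_edges):
    """Toggle the edges of an augmenting path: symmetric difference of edge sets."""
    return matching ^ path_edges

# Matching {u2-v1}; the augmenting path u1 -> v1 -> u2 -> v2 alternates
# non-matching, matching, non-matching edges and starts/ends at free vertices.
matching = {('u2', 'v1')}
path = {('u1', 'v1'), ('u2', 'v1'), ('u2', 'v2')}
new_matching = augment(matching, path)
print(sorted(new_matching))  # [('u1', 'v1'), ('u2', 'v2')]
```

Toggling removes the one matched edge on the path and adds the two unmatched ones, so the matching grows by exactly one edge.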
A new BFS in a subsequent phase reaches no free vertex in V, verifying that no augmenting paths remain.[1]
Theoretical Analysis
Time Complexity Proof
The time complexity of the Hopcroft–Karp algorithm is derived by analyzing the cost per phase and bounding the total number of phases. Each phase consists of a breadth-first search (BFS) to construct the layering of the graph, followed by multiple depth-first searches (DFS) to find a maximal set of vertex-disjoint augmenting paths within that layering. The BFS traverses each edge and vertex at most once, taking O(|E|) time, where |E| is the number of edges. For the DFS portion, the layering induces a directed acyclic graph whose edges only go from layer i to layer i + 1, ensuring that across all DFS calls in a phase, each edge is examined at most a constant number of times (a vertex whose search fails is blocked for the rest of the phase). Thus, the total time for all DFS in a phase is also O(|E|), yielding an overall per-phase cost of O(|E|). The number of phases is at most O(√|V|), where |V| is the total number of vertices in the bipartite graph. Each phase strictly increases the length of the shortest augmenting path relative to the current matching; specifically, after a phase that saturates a maximal set of shortest augmenting paths of length k, no augmenting path of length k remains, so the next shortest length is at least k + 2 (since path lengths are odd in the standard formulation). Consequently, there are at most √|V| "early" phases in which the shortest path length is below √|V|, as the possible odd lengths below √|V| number at most √|V|/2. After these early phases, if the matching M is not maximum, let M* be a maximum matching with d = |M*| − |M|. By the theory of alternating paths, there exist at least d vertex-disjoint augmenting paths relative to M. Each such path now has length at least √|V|. Since these paths are vertex-disjoint, they use at least d·√|V| distinct vertices, which cannot exceed |V|, implying d ≤ √|V|. In the remaining "late" phases, each phase augments the matching by at least 1, so there are at most √|V| such phases. Therefore, the total number of phases is at most O(√|V|).
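The phase-counting argument can be restated compactly (a restatement of the bound above, not an additional result):

```latex
% After O(\sqrt{|V|}) early phases, every remaining augmenting path P_i has
% length at least \sqrt{|V|}. Vertex-disjointness of the d = |M^*| - |M|
% remaining augmenting paths then gives
d \cdot \sqrt{|V|} \;\le\; \sum_{i=1}^{d} |P_i| \;\le\; |V|
\quad\Longrightarrow\quad d \;\le\; \sqrt{|V|},
% so there are O(\sqrt{|V|}) phases in total and the running time is
T \;=\; O(\sqrt{|V|}) \cdot O(|E|) \;=\; O\!\left(|E|\sqrt{|V|}\right).
```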
Combining these bounds, the total running time satisfies T = O(√|V|) phases × O(|E|) per phase = O(|E|√|V|). This improves upon the O(|V||E|) time of simpler augmenting-path methods by efficiently finding multiple disjoint paths per phase.
Correctness and Termination
The correctness of the Hopcroft–Karp algorithm is established through its iterative augmentation along maximal sets of vertex-disjoint shortest augmenting paths, ensuring that the final matching admits no augmenting paths relative to it. By Berge's lemma, a matching in a bipartite graph is maximum if and only if no augmenting path exists; an augmenting path would allow symmetric difference with the current matching to yield a larger one, while the absence of such paths implies maximality.[2] The algorithm simulates the effect of repeated single-path augmentations by finding multiple disjoint paths in each phase, preserving the invariant that the matching grows until no augmenting paths remain, as verified by induction on the phases: assuming prior augmentations yield a valid partial matching, the current phase's paths are augmenting due to their disjointness and construction from the layered graph.[17] The focus on shortest augmenting paths, identified via BFS layering from all free vertices in one partite set, ensures completeness in path discovery: the BFS constructs levels where edges connect consecutive layers, guaranteeing that any augmenting path of minimal length is captured, and subsequent DFS explorations within these layers find a maximal disjoint set without overlap. This layering prevents the oversight of viable paths, as all shortest paths share the same length and are exhaustively searched from the free vertices, maintaining the matching's validity after augmentation.[2][16] Termination follows from the strict progress in each phase: if augmenting paths exist, at least one (in fact, a maximal set) is found and augmented, increasing the matching size by at least one; otherwise, the algorithm halts with no augmenting paths, yielding a maximum matching. The matching size is bounded above by min(|U|, |V|), where U and V are the partite sets, so at most this many augmentations can occur before saturation.
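Berge's criterion can be checked directly: a matching is maximum exactly when no alternating path joins a free left vertex to a free right vertex. The following stdlib-only sketch illustrates this test (the function name and data layout are assumptions for illustration, not taken from any library):

```python
from collections import deque

def has_augmenting_path(adj, pair_u, pair_v):
    """Check, per Berge's lemma, whether a bipartite matching admits an augmenting path.

    adj maps each left vertex u to its set of right neighbors; pair_u / pair_v
    map matched vertices to their partners (absent key = free vertex).
    Returns True iff the matching is NOT maximum.
    """
    # BFS over alternating paths, starting from every free left vertex.
    queue = deque(u for u in adj if u not in pair_u)
    seen = set(queue)
    while queue:
        u = queue.popleft()
        for v in adj[u]:                 # non-matching edge u -> v
            if v not in pair_v:          # v is free: augmenting path found
                return True
            w = pair_v[v]                # matching edge v -> w back to the left side
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

adj = {1: {'a', 'b'}, 2: {'a'}}
# The matching {1-b, 2-a} saturates the right side: no augmenting path.
print(has_augmenting_path(adj, {1: 'b', 2: 'a'}, {'b': 1, 'a': 2}))  # False
# The smaller matching {1-a} leaves vertex 2 free and is not maximum.
print(has_augmenting_path(adj, {1: 'a'}, {'a': 1}))  # True
```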
Additionally, after each successful phase, the length of the shortest remaining augmenting path (if any) strictly increases by at least 2, preventing cycles in the process. The BFS from all free vertices implicitly handles disconnected components by exploring all reachable structure simultaneously.[2][17]
Comparisons and Extensions
Versus Other Bipartite Matching Algorithms
The Hopcroft–Karp algorithm represents a significant improvement over earlier bipartite matching methods, such as Kuhn's algorithm (commonly known as the Hungarian algorithm), which relies on repeated depth-first searches to find single augmenting paths until a maximum matching is achieved. This approach results in a time complexity of O(|V|·|E|) in the worst case, where |V| is the number of vertices and |E| is the number of edges, making it less efficient for dense graphs or large-scale instances where many augmentations are needed.[18][19] In contrast, the Hopcroft–Karp algorithm phases its augmentations by identifying multiple shortest disjoint augmenting paths simultaneously via breadth-first search layering, achieving a time complexity of O(|E|√|V|), which is particularly advantageous for sparse graphs common in real-world applications.[18] Another notable alternative is the adaptation of Dinic's algorithm, originally designed for general maximum flow problems, to the bipartite matching setting by modeling it as a unit-capacity flow network. Dinic's method builds level graphs iteratively and finds blocking flows using depth-first searches, yielding a general time complexity of O(|V|²|E|), though it matches the Hopcroft–Karp bound of O(|E|√|V|) for unit-capacity cases like bipartite matching.[20] The Hopcroft–Karp algorithm can be viewed as a specialized instance of Dinic's framework tailored exclusively to bipartite graphs, incorporating similar layering but with optimizations that eliminate unnecessary generality, leading to lower constant factors and simpler implementation in practice.[20][18] In terms of practical selection, the Hopcroft–Karp algorithm is preferred for pure maximum cardinality bipartite matching tasks, especially on sparse graphs where its phase-based multiple-path finding yields substantial speedups over the single-path augmentations of Kuhn's method.
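For contrast with the phase-based approach, here is a minimal sketch of Kuhn's single-path augmentation: one DFS per free left vertex, O(|V||E|) overall. The data layout (adjacency dict, 0-indexed left vertices) is an assumption for illustration:

```python
def kuhn_max_matching(adj, n_left):
    """Kuhn's algorithm: repeatedly augment along one alternating path at a time.

    adj[u] lists the right-side neighbors of left vertex u (0-indexed).
    Returns the size of a maximum matching.
    """
    match_right = {}  # right vertex -> matched left vertex

    def try_augment(u, visited):
        # DFS over right neighbors, rerouting existing matches along the way.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be rematched elsewhere.
            if v not in match_right or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    matching = 0
    for u in range(n_left):         # one augmentation attempt per left vertex
        if try_augment(u, set()):
            matching += 1
    return matching

# Left vertices {0, 1, 2}, right vertices {'a', 'b'}: the right side limits
# the matching to size 2.
adj = {0: ['a', 'b'], 1: ['a'], 2: ['b']}
print(kuhn_max_matching(adj, 3))  # 2
```

Hopcroft–Karp improves on this by batching many such augmentations per BFS layering instead of one DFS per augmentation.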
Dinic's adaptation, while versatile for broader flow problems, may introduce overhead from handling general capacities and thus is better suited when the problem extends beyond strict bipartite cardinality matching. Overall, these advantages position Hopcroft–Karp as the go-to method for efficient bipartite matching in theoretical and applied contexts.[20][18]
Adaptations for Non-Bipartite Graphs
While the Hopcroft–Karp algorithm efficiently finds multiple shortest augmenting paths in bipartite graphs, adapting these ideas to non-bipartite graphs faces significant challenges due to the presence of odd cycles, which create blossoms—cyclic structures in the alternating graph that can cause augmenting paths to intersect themselves or form loops, preventing simple disjoint path searches.[21] These blossoms require special handling to avoid redundant explorations and ensure correctness, as the standard breadth-first search layering used in Hopcroft–Karp may not suffice without modifications for odd-length alternating cycles.[22] The foundational counterpart for general graphs is Edmonds' matching algorithm from 1965, which extends the augmenting path theorem to non-bipartite settings by identifying and shrinking blossoms into single vertices, allowing recursive searches for paths in the contracted graph.[23] This approach, while conceptually similar to Hopcroft–Karp's path-finding phases, involves more complex data structures for blossom detection and contraction, leading to an original time complexity of O(|V|⁴), later refined to O(|V|³) through more efficient implementations.[24] Direct adaptations of Hopcroft–Karp's phase-based strategy to general graphs appear in the Micali–Vazirani algorithm (1980), which achieves O(|E|√|V|) time by finding a maximal set of shortest vertex-disjoint augmenting paths per phase, while implicitly managing blossoms through careful layering and orientation of edges in the auxiliary graph, without explicit shrinking.[25] This matches the efficiency of Hopcroft–Karp for bipartite cases but requires additional bookkeeping for potential odd cycles.
Partial adaptations leverage Hopcroft–Karp phases within blossom-free subgraphs, such as those induced after initial blossom contractions in hybrid Edmonds-based frameworks, enabling faster local augmentations before global refinements.[26] Overall, these extensions highlight the trade-off: while the O(|E|√|V|) bound is preserved asymptotically, the added complexity from blossoms increases constant factors and implementation difficulty compared to the bipartite case.[27]
Applications and Implementations
Real-World Uses
The Hopcroft–Karp algorithm finds extensive application in assignment problems, particularly in job scheduling and resource allocation within bipartite graph frameworks. In job scheduling scenarios, such as dynamic task distribution in cloud computing environments, the algorithm efficiently computes maximum matchings between tasks and processing resources like virtual machines, reducing allocation time and optimizing workload balance.[28] For resource allocation, it supports multiagent systems where agents compete for limited items, maximizing the likelihood of successful assignments while minimizing compromises, as demonstrated in auction-based mechanisms for distributed systems.[29] In bioinformatics, the Hopcroft–Karp algorithm is employed to analyze protein-protein interaction (PPI) networks modeled as bipartite graphs, aiding in controllability studies and antiviral drug target identification. For example, in examining influenza virus-host PPI networks, it identifies driver nodes by computing maximum matchings, revealing minimal intervention points to control network dynamics and prioritize therapeutic targets.[30] Similarly, it facilitates the discovery of input nodes for structural controllability in larger biological networks, such as those derived from high-throughput omics data, enhancing understanding of disease pathways.[31] The algorithm plays a key role in computer vision tasks involving feature matching, where it aligns descriptors like SIFT keypoints between images to support applications such as relative pose estimation and image classification. 
In pose estimation pipelines, Hopcroft–Karp computes optimal correspondences in putative matches, improving accuracy in scenarios with unknown point-to-point associations, as seen in robust estimation for camera calibration.[32] It also accelerates maximum cardinality matching for hepatic lesion classification, where matching metrics between image features and reference patterns enable efficient diagnostic systems.[33] Since the 2000s, the Hopcroft–Karp algorithm has been integrated into recommendation systems, treating user-item interactions as bipartite graphs to generate personalized suggestions via maximum matching. In web discovery platforms, it constructs recommendation subgraphs by finding disjoint matchings in user-query or user-content graphs, boosting relevance and scalability in collaborative filtering setups.[34] This approach extends to machine learning pipelines for content recommendation, where it optimizes pairings in sparse interaction networks to enhance prediction accuracy without exhaustive computation.[35] In VLSI design, the algorithm addresses gate assignment and routing challenges by solving bipartite matching problems for track and net allocations, minimizing bends and congestion in layout optimization. For initial detailed routing in modern chip designs, it processes layered graphs to assign paths efficiently, supporting track-structured architectures in high-density circuits.[36] Its use in lithography mask decomposition further aids in slicing complex polygons into rectangles for fabrication, ensuring manufacturable designs with reduced complexity.[37]
Software Libraries and Code Examples
The Hopcroft–Karp algorithm is implemented in several established software libraries for graph algorithms, facilitating its use in various programming languages. In Python, the NetworkX library provides the hopcroft_karp_matching function, which computes the maximum cardinality matching in a bipartite graph represented as a NetworkX Graph object.[38] This function requires an undirected bipartite graph and optionally a container specifying one partition (top_nodes) if the graph is disconnected, returning a dictionary mapping matched nodes to their partners.[38] NetworkX, a community-maintained open-source project, integrates efficiently with other graph tools and is suitable for both small prototypes and larger analyses.
For Java, the JGraphT library includes the HopcroftKarpMaximumCardinalityBipartiteMatching class, which implements the algorithm for undirected bipartite graphs, accepting self-loops and multiple edges while running in O(|E| √|V|) time.[39] The constructor takes the graph and the two vertex partitions as sets, producing a matching without verifying bipartiteness (users should confirm via GraphTests.isBipartite).[39] JGraphT, an open-source project under Apache License 2.0, supports generic vertex and edge types, making it versatile for enterprise applications.
In C++, while the Boost Graph Library offers general maximum matching via Edmonds' blossom algorithm, dedicated Hopcroft–Karp implementations are available in open-source repositories such as TheAlgorithms/C++, which provides a self-contained file (hopcroft_karp.cpp) using adjacency lists for bipartite graphs up to moderate sizes.[40] This implementation follows the standard BFS-DFS layering approach and is part of a community-maintained collection of algorithms, licensed under MIT. For custom needs, developers can implement the algorithm directly from pseudocode using adjacency lists; supporting data structures are available in libraries such as Apache Commons Collections for Java, though the full algorithm logic requires additional coding.
Code Examples
A simple Python example using NetworkX demonstrates integration with adjacency-based graph construction. The input is a bipartite graph with nodes partitioned by sets, edges added explicitly, and the output a dictionary of matches:
import networkx as nx
# Create bipartite graph
G = nx.Graph()
G.add_nodes_from([1, 2], bipartite=0) # Left partition
G.add_nodes_from(['a', 'b'], bipartite=1) # Right partition
G.add_edges_from([(1, 'a'), (1, 'b'), (2, 'a')]) # Edges
# Compute matching
matching = nx.bipartite.hopcroft_karp_matching(G, top_nodes=[1, 2])
print(matching) # e.g. {1: 'b', 'b': 1, 2: 'a', 'a': 2} (key order may vary)
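Because hopcroft_karp_matching returns each matched pair in both directions, a small helper (a stdlib-only convenience, not part of NetworkX) can extract just the left-to-right edges:

```python
def left_edges(matching, left_nodes):
    """Reduce a both-direction matching dict to sorted (left, right) edge pairs."""
    return sorted((u, v) for u, v in matching.items() if u in left_nodes)

# The dict form returned for the graph above:
matching = {1: 'b', 'b': 1, 2: 'a', 'a': 2}
print(left_edges(matching, {1, 2}))  # [(1, 'b'), (2, 'a')]
```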
A standalone implementation following the BFS/DFS pseudocode directly (using 0 as NIL, so that dist[0] serves as the sentinel distance) can be written as:
from collections import deque

class BipGraph:
    def __init__(self, m, n):
        self.m = m  # Left vertices, numbered 1..m
        self.n = n  # Right vertices, numbered 1..n
        self.graph = [[] for _ in range(m + 1)]  # Adjacency lists (1-indexed)
        self.pairU = [0] * (m + 1)  # pairU[u] = v if u matched to v, else 0 (NIL)
        self.pairV = [0] * (n + 1)  # pairV[v] = u if v matched to u, else 0 (NIL)
        self.dist = [0] * (m + 1)   # dist[0] is the NIL sentinel

    def addEdge(self, u, v):
        self.graph[u].append(v)

    def bfs(self):
        queue = deque()
        for u in range(1, self.m + 1):
            if self.pairU[u] == 0:
                self.dist[u] = 0
                queue.append(u)
            else:
                self.dist[u] = float('inf')
        self.dist[0] = float('inf')
        while queue:
            u = queue.popleft()
            if self.dist[u] < self.dist[0]:
                for v in self.graph[u]:
                    if self.dist[self.pairV[v]] == float('inf'):
                        self.dist[self.pairV[v]] = self.dist[u] + 1
                        queue.append(self.pairV[v])
        return self.dist[0] != float('inf')

    def dfs(self, u):
        for v in self.graph[u]:
            if self.dist[self.pairV[v]] == self.dist[u] + 1:
                if self.pairV[v] == 0 or self.dfs(self.pairV[v]):
                    self.pairU[u] = v
                    self.pairV[v] = u
                    return True
        self.dist[u] = float('inf')  # Block u for the rest of this phase
        return False

    def hopcroftKarp(self):
        matching = 0
        while self.bfs():
            for u in range(1, self.m + 1):
                if self.pairU[u] == 0 and self.dfs(u):
                    matching += 1
        return matching

# Example usage
g = BipGraph(4, 4)
g.addEdge(1, 2); g.addEdge(1, 3)
g.addEdge(2, 1)
g.addEdge(3, 2)
g.addEdge(4, 2); g.addEdge(4, 4)
print(g.hopcroftKarp())  # Output: 4
Edges are added with addEdge(u, v), where u is a left vertex and v a right vertex, and hopcroftKarp() computes the matching size (4 for the given edges).[41]
Implementations like these scale to graphs with up to 10^5 vertices and millions of edges on standard hardware, thanks to the O(E √V) complexity and optimizations such as adjacency lists for sparse representations, which minimize memory and traversal overhead.[42] Open-source repositories on GitHub, such as sofiatolaosebikan/hopcroftkarp for Python, provide standalone packages installable via pip for quick deployment.[43]