Kosaraju's algorithm
In computer science, Kosaraju-Sharir's algorithm (also known as Kosaraju's algorithm) is a linear time algorithm to find the strongly connected components of a directed graph. Aho, Hopcroft and Ullman credit it to S. Rao Kosaraju and Micha Sharir.[1][2] Kosaraju suggested it in 1978 but did not publish it, while Sharir independently discovered it and published it in 1981.[3] It makes use of the fact that the transpose graph (the same graph with the direction of every edge reversed) has exactly the same strongly connected components as the original graph.
The algorithm
The primitive graph operations that the algorithm uses are to enumerate the vertices of the graph, to store data per vertex (if not in the graph data structure itself, then in some table that can use vertices as indices), to enumerate the out-neighbours of a vertex (traverse edges in the forward direction), and to enumerate the in-neighbours of a vertex (traverse edges in the backward direction); the last can however be dispensed with, at the price of constructing a representation of the transpose graph during the forward traversal phase. The only additional data structure needed by the algorithm is an ordered list L of graph vertices, that will grow to contain each vertex once.
If strong components are to be represented by appointing a separate root vertex for each component, and assigning to each vertex the root vertex of its component, then Kosaraju's algorithm can be stated as follows.
- For each vertex u of the graph, mark u as unvisited. Let L be empty.
- For each vertex u of the graph, do Visit(u), where Visit(u) is the recursive subroutine:
  - If u is unvisited then:
    - Mark u as visited.
    - For each out-neighbour v of u, do Visit(v).
    - Prepend u to L.
  - Otherwise do nothing.
- For each element u of L in order, do Assign(u, u), where Assign(u, root) is the recursive subroutine:
  - If u has not been assigned to a component then:
    - Assign u as belonging to the component whose root is root.
    - For each in-neighbour v of u, do Assign(v, root).
  - Otherwise do nothing.
Trivial variations are to instead assign a component number to each vertex, or to construct per-component lists of the vertices that belong to it. The unvisited/visited indication may share storage location with the final assignment of root for a vertex.
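The Visit/Assign formulation above translates almost line-for-line into Python. The following is a sketch only (graph supplied as a vertex list plus an out-neighbour dict; all names are ours, not part of the article): prepending to L is replaced by the equivalent append-then-reverse, and the recursive subroutines can hit Python's recursion limit on large graphs.

```python
from collections import defaultdict

def kosaraju_roots(vertices, out_edges):
    # Build in-neighbour lists so Assign can traverse edges backwards.
    in_edges = defaultdict(list)
    for u in vertices:
        for v in out_edges.get(u, []):
            in_edges[v].append(u)

    visited = set()
    order = []  # the list L; appending in post-order then reversing == prepending

    def visit(u):
        if u in visited:
            return          # "otherwise do nothing"
        visited.add(u)
        for v in out_edges.get(u, []):
            visit(v)
        order.append(u)      # post-order append

    for u in vertices:
        visit(u)
    order.reverse()          # order now plays the role of L

    root = {}

    def assign(u, r):
        if u in root:
            return          # already assigned to a component
        root[u] = r
        for v in in_edges[u]:
            assign(v, r)

    for u in order:
        assign(u, u)
    return root              # maps each vertex to the root of its strong component
```

For example, on the graph with edges 1→2, 2→3, 3→1 and 3→4, vertices 1, 2, 3 share one root while 4 is its own component.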
The key point of the algorithm is that during the first (forward) traversal of the graph edges, vertices are prepended to the list L in post-order relative to the search tree being explored. This means it does not matter whether a vertex v was first visited because it appeared in the enumeration of all vertices or because it was the out-neighbour of another vertex u that got visited; either way v will be prepended to L before u is, so if there is a forward path from u to v then u will appear before v on the final list L (unless u and v both belong to the same strong component, in which case their relative order in L is arbitrary).
This means that each element n of the list can be made to correspond to a block L[i_{n−1} : i_n], where the block consists of all the vertices reachable from vertex n using only outward edges at each step of the path. It is important to note that no vertex in the block beginning at n has an inward link from any of the blocks beginning at some vertex to its right, i.e., the blocks corresponding to vertices i_n, i_{n+1}, …, N in the list. This is so because otherwise the vertex having the inward link (say from the block beginning at n′ ≥ i_{n+1}) would already have been visited and prepended to L in the block of n′, which is a contradiction. On the other hand, vertices in the block starting at n can have edges pointing into the blocks starting at some vertex in {i_n, i_{n+1}, …, N}.
Step 3 of the algorithm starts from L[0] and assigns to every vertex that points to it the same component as L[0]. Note that these vertices can only lie in the block beginning at L[0], as higher blocks cannot have links pointing into the block of L[0]. Let the set of all vertices that point to L[0] be In(L[0]). Subsequently, all the vertices pointing to these vertices, In(In(L[0])), are added too, and so on until no more vertices can be added.
There is a path to L[0] from every vertex added to the component containing L[0]. And there is a path from L[0] to every vertex added, as all of them lie in the block beginning at L[0] (which contains all the vertices reachable from L[0] following outward edges at each step of the path). Hence all of these form a single strongly connected component. Moreover, no vertex is missed, because to be in this strongly connected component a vertex must be reachable from L[0] and must be able to reach L[0]. All vertices able to reach L[0], if any, lie in the first block only, and all vertices in the first block are reachable from L[0]. So the algorithm selects exactly the vertices in the strongly connected component of L[0].
When we reach a vertex v = L[i] in the loop of step 3 and v has not been assigned to any component, we can be sure that all the vertices to its left have formed their connected components, that v does not belong to any of those components, and that v does not point to any of the vertices to its left. Also, since no edge from higher blocks into v's block exists, the same argument applies to each subsequent unassigned vertex.
As given above, the algorithm for simplicity employs depth-first search, but it could just as well use breadth-first search as long as the post-order property is preserved.
The algorithm can be understood as identifying the strong component of a vertex u as the set of vertices which are reachable from u both by backwards and forwards traversal. Writing F(u) for the set of vertices reachable from u by forward traversal, B(u) for the set of vertices reachable from u by backwards traversal, and P(u) for the set of vertices which appear strictly before u on the list L after phase 2 of the algorithm, the strong component containing a vertex u appointed as root is
F(u) ∩ B(u) = B(u) ∖ (B(u) ∖ F(u)).
Set intersection is computationally costly, but it is logically equivalent to a double set difference, and since B(u) ∖ F(u) ⊆ P(u), it becomes sufficient to test whether a newly encountered element of B(u) has already been assigned to a component or not.
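This reachability characterization can be checked directly, if inefficiently. The sketch below (function names are ours) computes the strong component of u as the intersection of two reachability searches, one on the graph and one on its transpose; it is a quadratic-time illustration of the definition, not the linear-time algorithm itself.

```python
def reachable(adj, start):
    # Iterative DFS: all vertices reachable from start via directed edges.
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strong_component_of(u, adj, adj_t):
    # F(u) ∩ B(u): forward reachability in G, backward reachability via G^T.
    return reachable(adj, u) & reachable(adj_t, u)
```

On the graph 1⇄2, 2→3 (transpose 2→1, 1→2, 3→2), the strong component of vertex 1 is {1, 2}, while vertex 3 is a singleton.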
Complexity
Provided the graph is described using an adjacency list, Kosaraju's algorithm performs two complete traversals of the graph and so runs in Θ(V + E) (linear) time, which is asymptotically optimal because there is a matching lower bound (any algorithm must examine all vertices and edges). It is the conceptually simplest efficient algorithm, but is not as efficient in practice as Tarjan's strongly connected components algorithm and the path-based strong component algorithm, which perform only one traversal of the graph.
If the graph is represented as an adjacency matrix, the algorithm requires Ο(V²) time.
References
- Aho, Alfred V.; Hopcroft, John E.; Ullman, Jeffrey D. (1999). Data Structures and Algorithms. Addison-Wesley series in computer science and information processing (Repr. with corrections ed.). Reading, Mass.: Addison-Wesley. ISBN 978-0-201-00023-8.
- Cormen, Thomas H.; Leiserson, Charles Eric; Rivest, Ronald Linn; Stein, Clifford (2009). Introduction to Algorithms (3rd ed.). Cambridge, Massachusetts; London, England: MIT Press. ISBN 978-0-262-03384-8.
- Sharir, M. (1981). "A strong-connectivity algorithm and its applications in data flow analysis". Computers & Mathematics with Applications. 7 (1): 67–72. doi:10.1016/0898-1221(81)90008-0.
Kosaraju's algorithm
Introduction
Overview and Purpose
Kosaraju's algorithm is a linear-time method for identifying all strongly connected components (SCCs) in a directed graph.[5] A strongly connected component is defined as a maximal subgraph where every pair of distinct vertices is mutually reachable via directed paths.[5] The algorithm was invented by S. Rao Kosaraju in the 1970s during a series of lectures at Johns Hopkins University, with the ideas remaining unpublished until a 1978 manuscript.[6] It was independently discovered and published by Micha Sharir in 1981.[6] At a high level, the algorithm relies on two passes of depth-first search (DFS): the first pass traverses the original graph to record the finishing times of vertices, providing a reverse postorder that corresponds to a topological ordering of the SCCs; the second pass traverses the transpose graph—obtained by reversing all edge directions—in the order of decreasing finishing times, with each DFS tree identifying an SCC.[5] By partitioning the graph into its SCCs, Kosaraju's algorithm enables the condensation of the original directed graph into a directed acyclic graph (DAG), where each SCC is contracted to a single vertex; this structure is essential for analyzing graph connectivity, resolving reachability queries, and addressing dependency relations in applications such as program flow analysis.[6]
Historical Development
Kosaraju's algorithm traces its origins to an unpublished manuscript by S. Rao Kosaraju, a computer scientist at Johns Hopkins University, who first described the method in lecture notes for a 1978 course there. These notes outlined a two-pass depth-first search approach for identifying strongly connected components in directed graphs, but they were not formally published at the time, limiting early dissemination.[7] Independently, Israeli computer scientist Micha Sharir developed and published a nearly identical algorithm in 1981, presenting it in the paper "A strong-connectivity algorithm and its applications in data flow analysis" in Computers and Mathematics with Applications. Sharir's work applied the algorithm to data flow analysis in compilers, highlighting its efficiency for program optimization tasks.[8] Owing to the parallel discoveries, the algorithm is frequently attributed to both researchers and termed the Kosaraju-Sharir algorithm in academic literature. Aho, Hopcroft, and Ullman's influential 1983 textbook Data Structures and Algorithms provided one of the earliest printed acknowledgments, crediting Kosaraju's unpublished contribution alongside Sharir's publication and thereby bringing wider attention to the method despite its pre-publication obscurity.[7] Kosaraju's version gained particular prominence in computer science education for its straightforward implementation using basic depth-first search traversals, contrasting with more complex alternatives such as Tarjan's algorithm from the same era. The development occurred amid a surge in graph algorithm research during the 1970s and 1980s, building on foundational advances such as Edsger Dijkstra's 1959 shortest-path algorithm and Robert Floyd and Stephen Warshall's 1962 all-pairs shortest-path method, which had spurred interest in efficient graph traversals and connectivity problems.
Later, the algorithm received broader recognition through its inclusion in Cormen, Leiserson, Rivest, and Stein's Introduction to Algorithms (third edition, 2009), where it is detailed as a standard technique for computing strongly connected components.
Background Concepts
Strongly Connected Components
In a directed graph $ G = (V, E) $, a strongly connected component (SCC) is a maximal subgraph $ H $ such that for every pair of distinct vertices $ u, v \in V(H) $, there exists a directed path from $ u $ to $ v $ in $ H $ and a directed path from $ v $ to $ u $ in $ H $.[9] This definition captures mutual reachability within the component, ensuring it cannot be extended by adding another vertex while preserving the property. Trivial SCCs consist of single vertices with no self-loops or cycles involving them. The SCCs of a directed graph partition its vertex set, with each vertex belonging to exactly one SCC.[9] Contracting each SCC into a single vertex, while preserving edges between different components (and removing self-loops), yields the condensation graph, which is always a directed acyclic graph (DAG).[9] This structure highlights the hierarchical organization of reachability in the original graph, where edges in the condensation represent one-way dependencies between components. For example, in a directed cycle where each vertex points to the next, forming a loop back to the start, the entire graph constitutes a single SCC due to the cyclic paths enabling mutual reachability.[10] Conversely, an isolated vertex with no incoming or outgoing edges forms a trivial SCC of size one.[11] SCCs reveal the cyclic or feedback structures in directed graphs, which are vital for analyzing dependencies, such as in workflow systems or network topologies where mutual accessibility indicates tightly coupled elements.[12] They provide insights into reachability patterns that undirected analyses overlook. Unlike weakly connected components, which treat the graph as undirected by ignoring edge directions and focus solely on undirected path existence, SCCs emphasize directed paths and thus better model asymmetric relationships in real-world directed networks.[9] Kosaraju's algorithm computes these components efficiently for large graphs.
Graph Transpose and DFS
The transpose of a directed graph $ G = (V, E) $, denoted $ G^T $, is the directed graph with the same vertex set $ V $ but with the edge set $ E^T = \{(v, u) \mid (u, v) \in E\} $, which reverses the direction of every edge in $ G $.[13] This construction preserves the vertices while flipping the orientation of all arcs, making $ G^T $ a mirror image of $ G $ in terms of edge directions.[14] A fundamental property of the transpose graph is the reversal of reachability directions: if there exists a directed path from vertex $ u $ to vertex $ v $ in $ G $, then there exists a directed path from $ v $ to $ u $ in $ G^T $.[14] This holds because reversing all edges in a path from $ u $ to $ v $ yields a valid path in the opposite direction in $ G^T $, and the absence of such a path in one graph implies its absence in the transpose.[12] Depth-first search (DFS) is a recursive algorithm for traversing or searching tree or graph data structures, starting from a source vertex and exploring as far as possible along each branch before backtracking to explore alternative paths.[15] In a directed graph, DFS systematically visits all vertices reachable from the starting vertex by following directed edges, marking vertices as discovered to avoid cycles.[16] It assigns timestamps to each vertex: the discovery time records when the vertex is first visited, and the finishing time records when the recursive exploration of its subtree completes and the vertex is fully processed.[15] These timestamps help analyze the order of visitation and the structure of the graph.
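The transpose construction above can be sketched in a few lines of Python (adjacency-dict representation and names are ours): reversing every edge touches each vertex and edge exactly once, so it runs in linear time.

```python
def transpose(adj):
    # adj maps each vertex to its list of out-neighbours in G.
    gt = {u: [] for u in adj}               # keep every vertex, even sinks
    for u, nbrs in adj.items():
        for v in nbrs:
            gt.setdefault(v, []).append(u)  # edge (u, v) becomes (v, u)
    return gt
```

For instance, transposing {'a': ['b'], 'b': []} gives {'a': [], 'b': ['a']}.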
The DFS traversal organizes the visited vertices into a DFS forest, a collection of DFS trees where each tree spans the vertices reachable from a root in a connected component (considering directed paths).[17] Relative to this forest, graph edges are classified into four types based on the timestamps and tree structure: tree edges, which form the branches of the DFS trees and connect a vertex to an unvisited neighbor; back edges, which point from a vertex to one of its ancestors in the same tree (indicating cycles); forward edges, which point from a vertex to a non-direct descendant in the same tree; and cross edges, which connect vertices belonging to different trees or non-ancestor-descendant pairs within the same tree.[18] This edge classification provides insights into the graph's connectivity and is useful for detecting cycles or ordering vertices.[17] When implemented with adjacency lists, DFS runs in $ O(|V| + |E|) $ time, visiting each vertex and edge exactly once across the entire graph (potentially requiring multiple calls from unvisited roots to cover all components).[16] This linear-time efficiency makes DFS a cornerstone for many graph algorithms, including those for finding strongly connected components.
Algorithm Description
First Pass: DFS on Original Graph
The first pass of Kosaraju's algorithm performs a depth-first search (DFS) traversal on the original directed graph $ G $ to compute the finishing times of its vertices, ensuring comprehensive coverage of all vertices regardless of the graph's connectivity. This traversal begins by iterating over all vertices in an arbitrary order and initiating DFS from each unvisited vertex to handle multiple weakly connected components.[19][20] In this DFS, each vertex is marked as visited upon entry, and the algorithm recursively explores all unvisited outgoing neighbors before recording the vertex's finishing time. Finishing times are captured in post-order: a vertex is added to the output list $ L $ (or pushed onto a stack) only after all its descendants in the DFS forest have been fully explored. This produces a reverse post-order sequence in $ L $, which is a topological order when $ G $ is acyclic and serves to order vertices for the subsequent pass on the graph's transpose.[19][20] The following pseudocode illustrates the first pass, where $ L $ is built by prepending (or by pushing onto a stack, which yields the same reverse post-order):

DFS-FirstPass(G):
    marked ← array of |V| booleans, initialized to false
    L ← empty list
    for each vertex v ∈ V[G]:
        if not marked[v]:
            DFS-Visit(G, v, marked, L)
    return L    // vertices in reverse post-order

DFS-Visit(G, v, marked, L):
    marked[v] ← true
    for each neighbor w of v in G:
        if not marked[w]:
            DFS-Visit(G, w, marked, L)
    prepend v to L    // or push v onto a stack
This implementation ensures linear-time execution relative to the graph size and directly supports the algorithm's overall correctness.[19]
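A Python rendering of this first pass may help make it concrete (function names are ours; an explicit stack of vertex/iterator pairs replaces recursion, which can overflow on deep graphs, and the adjacency dict is assumed to contain every vertex as a key):

```python
def dfs_first_pass(adj):
    # Iterative DFS producing vertices in reverse post-order (the list L).
    marked, order = set(), []
    for s in adj:
        if s in marked:
            continue
        marked.add(s)
        stack = [(s, iter(adj[s]))]          # (vertex, its neighbour iterator)
        while stack:
            u, it = stack[-1]
            pushed = False
            for w in it:                      # resume where we left off
                if w not in marked:
                    marked.add(w)
                    stack.append((w, iter(adj[w])))
                    pushed = True
                    break
            if not pushed:                    # all descendants finished
                order.append(u)               # post-order append
                stack.pop()
    order.reverse()                           # prepend semantics: reverse post-order
    return order
```

On the chain 1 → 2 → 3 this returns [1, 2, 3]; on the graph with only the edge 2 → 1 it returns [2, 1], since 2 must appear before the vertex it can reach.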
Second Pass: DFS on Transpose Graph
In the second pass of Kosaraju's algorithm, the directed graph $ G $ is first transposed to form $ G^T $, where every edge $ (u, v) $ is reversed to become $ (v, u) $. This transposition preserves the structure of strongly connected components while reversing the direction of reachability.[21] The vertices are then processed in the order given by the list $ L $ from the first pass, traversed from the last finished vertex to the first (i.e., in decreasing order of finishing times). For each vertex $ v $, if $ v $ has not yet been visited, a depth-first search (DFS) is initiated from $ v $ on $ G^T $. This DFS explores and marks all vertices reachable from $ v $ in $ G^T $, collecting them into a set. These reachable vertices in $ G^T $ correspond precisely to the set of vertices in the original graph from which $ v $ is reachable, forming a tree in the DFS forest.[22][23] Processing vertices in decreasing finishing time order ensures that each new DFS starts at a vertex that acts as the "source" in the remaining portion of the graph's condensation DAG (the directed acyclic graph formed by contracting each strongly connected component to a single vertex). This order guarantees that the starting vertex for each DFS is a root of an arborescence in the transpose graph corresponding to a source component in the condensation DAG of the original graph (equivalently, a sink component of the transpose's condensation), preventing overlap and ensuring complete coverage without revisiting components.[22] The following pseudocode outlines the second pass, assuming a DFS procedure DFS-Assign(G^T, v) that visits and collects the vertices reachable from $ v $ while marking them as visited and assigning them to the current component:
SecondPass(G^T, L):
    index = 1
    visited = empty set
    for each v in L:
        if v not in visited:
            component = DFS-Assign(G^T, v)
            assign component vertices to SCC index
            index = index + 1
Each invocation of DFS in this pass discovers exactly one strongly connected component, as the ordering and transposition together isolate the mutual reachability groups.[21][23]
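A matching Python sketch of the second pass (names ours) takes the transpose as an adjacency dict and the list L from the first pass, and labels each vertex with a component id:

```python
def second_pass(gt, order):
    # gt: transpose adjacency dict; order: reverse post-order list L.
    comp = {}        # vertex -> component id
    current = 0
    for s in order:
        if s in comp:
            continue
        # One iterative DFS on G^T collects exactly one SCC.
        comp[s] = current
        stack = [s]
        while stack:
            u = stack.pop()
            for v in gt.get(u, []):
                if v not in comp:
                    comp[v] = current
                    stack.append(v)
        current += 1
    return comp
```

For the graph with edges 1 → 2, 2 → 1, 2 → 3 (transpose 2 → 1, 1 → 2, 3 → 2) and first-pass order [1, 2, 3], vertices 1 and 2 receive the same id and vertex 3 a different one.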
Assigning Strongly Connected Components
In Kosaraju's algorithm, the assignment of strongly connected components (SCCs) occurs during the second pass over the transpose graph, where each depth-first search (DFS) traversal identifies a distinct SCC by grouping all vertices reachable from the starting vertex in that traversal. Specifically, for each unvisited vertex selected from the stack (ordered by decreasing finishing times from the first pass), the DFS explores the transpose graph and collects all visited vertices into a single component; these vertices form an SCC, with the starting vertex often serving as its representative or assigned a unique identifier for labeling purposes. This process ensures that vertices within the same SCC are mutually reachable, as the finishing times guarantee that no cross-component reachability disrupts the grouping.[20] The output of the algorithm is a partitioning of the graph's vertices into disjoint SCCs, typically represented as a list of sets or lists (one per SCC) or an array labeling each vertex with its component ID. For implementation, the transpose graph is constructed by reversing the adjacency lists of the original graph, which can be done in linear time by iterating over all edges and swapping directions. The following high-level pseudocode integrates both passes, assuming adjacency list representation and a stack for finishing times:

function Kosaraju(Graph G):
    visited = empty set
    stack = empty stack
    for each vertex v in G:
        if v not in visited:
            DFS(v, G, visited, stack)       // push v to stack upon finishing
    GT = transpose(G)                       // reverse all adjacency lists
    visited = empty set
    SCCs = empty list
    while stack is not empty:
        v = stack.pop()
        if v not in visited:
            component = empty list
            DFS(v, GT, visited, component)  // append visited vertices to component
            append component to SCCs
    return SCCs    // list of SCCs, in topological order of condensation DAG
Here, the DFS in the first pass pushes vertices to the stack post-visit, and the second-pass DFS populates the component list with reachable vertices.[22][20]
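Translated into Python under the same assumptions (an adjacency dict mapping every vertex to its out-neighbour list; names are ours), the combined pseudocode might look like the sketch below. It uses recursive DFS for brevity, so very deep graphs may need an iterative rewrite or a raised recursion limit.

```python
def kosaraju(adj):
    # Returns the SCCs as a list of lists, in topological order of the
    # condensation DAG.
    visited, stack = set(), []

    def dfs1(u):                     # first pass on G
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs1(v)
        stack.append(u)              # push on finish

    for u in adj:
        if u not in visited:
            dfs1(u)

    gt = {u: [] for u in adj}        # transpose: reverse every edge
    for u in adj:
        for v in adj[u]:
            gt[v].append(u)

    visited.clear()
    sccs = []

    def dfs2(u, comp):               # second pass on G^T
        visited.add(u)
        comp.append(u)
        for v in gt[u]:
            if v not in visited:
                dfs2(v, comp)

    while stack:
        u = stack.pop()
        if u not in visited:
            comp = []
            dfs2(u, comp)
            sccs.append(comp)
    return sccs
```

Calling kosaraju({1: [2], 2: [1]}) returns a single component containing both vertices.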
The algorithm handles edge cases gracefully: for an empty graph (no vertices), it returns an empty list of SCCs; for a graph with a single vertex (no edges), that vertex forms a singleton SCC; and for a fully connected graph where all vertices are mutually reachable, the entire vertex set is assigned to one SCC during the single second-pass DFS invocation. Notably, the SCCs are output in topological order corresponding to the condensation graph (a directed acyclic graph where each SCC is contracted to a single vertex), facilitating further analysis like topological sorting on the components.[20]
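Once each vertex carries a component ID, the condensation mentioned above follows in one more linear scan over the edges. A sketch (function and parameter names are ours; the labeling could come from Kosaraju's algorithm, but here it is simply supplied):

```python
def condensation(adj, comp):
    # adj: adjacency dict of G; comp: vertex -> SCC id for every vertex.
    dag = {c: set() for c in comp.values()}
    for u, nbrs in adj.items():
        for v in nbrs:
            if comp[u] != comp[v]:          # drop intra-component edges
                dag[comp[u]].add(comp[v])   # contracted edge between SCC ids
    return dag
```

For the graph 1 ⇄ 2, 2 → 3 with labeling {1: 0, 2: 0, 3: 1}, the result is a two-node DAG with the single edge 0 → 1.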
Illustrative Example
Sample Graph Setup
To illustrate Kosaraju's algorithm, consider a directed graph with five vertices labeled A, B, C, D, and E, and the following eight directed edges: A → B, A → E, B → C, B → D, C → A, C → D, D → E, and E → D. This graph features two cycles—A → B → C → A and D → E → D—connected by directed bridges from the first cycle to the second (via B → D and C → D), resulting in two strongly connected components that the algorithm will identify. The graph can be represented in adjacency list format as follows:
A: B, E
B: C, D
C: A, D
D: E
E: D
The transpose graph, obtained by reversing the direction of every edge, has the edges: B → A, E → A, C → B, D → B, A → C, D → C, E → D, and D → E. Its adjacency list is:
A: C
B: A
C: B
D: B, C, E
E: A, D
At the initial state, all vertices (A through E) are marked as unvisited, with no prior depth-first search traversals or component assignments performed.
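To make the example concrete, the short self-contained Python sketch below (all names are ours) runs the two passes on this graph; it should recover the component {A, B, C} followed by {D, E}, matching the topological order of the condensation.

```python
graph = {
    'A': ['B', 'E'], 'B': ['C', 'D'], 'C': ['A', 'D'],
    'D': ['E'], 'E': ['D'],
}

def sccs(adj):
    # First pass: DFS on the original graph, pushing vertices on finish.
    visited, stack = set(), []

    def dfs1(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs1(v)
        stack.append(u)

    for u in adj:
        if u not in visited:
            dfs1(u)

    # Transpose: reverse every edge.
    gt = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            gt[v].append(u)

    # Second pass: DFS on the transpose in decreasing finishing-time order.
    visited.clear()
    out = []
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        comp, work = [], [u]
        visited.add(u)
        while work:
            x = work.pop()
            comp.append(x)
            for v in gt[x]:
                if v not in visited:
                    visited.add(v)
                    work.append(v)
        out.append(sorted(comp))
    return out
```

Running sccs(graph) yields [['A', 'B', 'C'], ['D', 'E']]: the cycle A → B → C → A forms one component, the cycle D → E → D the other, and the bridge edges B → D and C → D appear only in the condensation.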