Prim's algorithm
View on Wikipedia
In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.
The algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník[1] and later rediscovered and republished by computer scientists Robert C. Prim in 1957[2] and Edsger W. Dijkstra in 1959.[3] Therefore, it is also sometimes called the Jarník's algorithm,[4] Prim–Jarník algorithm,[5] Prim–Dijkstra algorithm[6] or the DJP algorithm.[7]
Other well-known algorithms for this problem include Kruskal's algorithm and Borůvka's algorithm.[8] These algorithms find the minimum spanning forest in a possibly disconnected graph; in contrast, the most basic form of Prim's algorithm only finds minimum spanning trees in connected graphs. However, running Prim's algorithm separately for each connected component of the graph, it can also be used to find the minimum spanning forest.[9] In terms of their asymptotic time complexity, these three algorithms are equally fast for sparse graphs, but slower than other more sophisticated algorithms.[7][6] However, for graphs that are sufficiently dense, Prim's algorithm can be made to run in linear time, meeting or improving the time bounds for other algorithms.[10]

Description
[edit]The algorithm may informally be described as performing the following steps:
- Initialize a tree with a single vertex, chosen arbitrarily from the graph.
- Grow the tree by one edge: Of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
- Repeat step 2 (until all vertices are in the tree).
In more detail, it may be implemented following the pseudocode below.
function Prim(vertices, edges) is
for each vertex in vertices do
cheapestCost[vertex] ← ∞
cheapestEdge[vertex] ← null
explored ← empty set
unexplored ← set containing all vertices
startVertex ← any element of vertices
cheapestCost[startVertex] ← 0
while unexplored is not empty do
// Select vertex in unexplored with minimum cost
currentVertex ← vertex in unexplored with minimum cheapestCost[vertex]
unexplored.remove(currentVertex)
explored.add(currentVertex)
for each edge (currentVertex, neighbor) in edges do
if neighbor in unexplored and weight(currentVertex, neighbor) < cheapestCost[neighbor] THEN
cheapestCost[neighbor] ← weight(currentVertex, neighbor)
cheapestEdge[neighbor] ← (currentVertex, neighbor)
resultEdges ← empty list
for each vertex in vertices do
if cheapestEdge[vertex] ≠ null THEN
resultEdges.append(cheapestEdge[vertex])
return resultEdges
As described above, the starting vertex for the algorithm will be chosen arbitrarily, because the first iteration of the main loop of the algorithm will have a set of vertices in Q that all have equal weights, and the algorithm will automatically start a new tree in F when it completes a spanning tree of each connected component of the input graph. The algorithm may be modified to start with any particular vertex s by setting C[s] to be a number smaller than the other values of C (for instance, zero), and it may be modified to only find a single spanning tree rather than an entire spanning forest (matching more closely the informal description) by stopping whenever it encounters another vertex flagged as having no associated edge.
Different variations of the algorithm differ from each other in how the set Q is implemented: as a simple linked list or array of vertices, or as a more complicated priority queue data structure. This choice leads to differences in the time complexity of the algorithm. In general, a priority queue will be quicker at finding the vertex v with minimum cost, but will entail more expensive updates when the value of C[w] changes.
Time complexity
[edit]The time complexity of Prim's algorithm depends on the data structures used for the graph and for ordering the edges by weight, which can be done using a priority queue. The following table shows the typical choices:
| Minimum edge weight data structure | Time complexity (total) |
|---|---|
| adjacency matrix, searching | |
| binary heap and adjacency list | |
| Fibonacci heap and adjacency list |
A simple implementation of Prim's, using an adjacency matrix or an adjacency list graph representation and linearly searching an array of weights to find the minimum weight edge to add, requires O(|V|2) running time. However, this running time can be greatly improved by using heaps to implement finding minimum weight edges in the algorithm's inner loop.
A first improved version uses a heap to store all edges of the input graph, ordered by their weight. This leads to an O(|E| log |E|) worst-case running time. But storing vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed minimum spanning tree (MST) (or infinity if no such edge exists). Every time a vertex v is chosen and added to the MST, a decrease-key operation is performed on all vertices w outside the partial MST such that v is connected to w, setting the key to the minimum of its previous value and the edge cost of (v,w).
Using a simple binary heap data structure, Prim's algorithm can now be shown to run in time O(|E| log |V|) where |E| is the number of edges and |V| is the number of vertices. Using a more sophisticated Fibonacci heap, this can be brought down to O(|E| + |V| log |V|), which is asymptotically faster when the graph is dense enough that |E| is ω(|V|), and linear time when |E| is at least |V| log |V|. For graphs of even greater density (having at least |V|c edges for some c > 1), Prim's algorithm can be made to run in linear time even more simply, by using a d-ary heap in place of a Fibonacci heap.[10][11]

Proof of correctness
[edit]Let P be a connected, weighted graph. At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph. Since P is connected, there will always be a path to every vertex. The output Y of Prim's algorithm is a tree, because the edge and vertex added to tree Y are connected.
Let Y1 be a minimum spanning tree of graph P. If Y1=Y then Y is a minimum spanning tree. Otherwise, let e be the first edge added during the construction of tree Y that is not in tree Y1, and V be the set of vertices connected by the edges added before edge e. Then one endpoint of edge e is in set V and the other is not. Since tree Y1 is a spanning tree of graph P, there is a path in tree Y1 joining the two endpoints. As one travels along the path, one must encounter an edge f joining a vertex in set V to one that is not in set V. Now, at the iteration when edge e was added to tree Y, edge f could also have been added and it would be added instead of edge e if its weight was less than e, and since edge f was not added, we conclude that
Let tree Y2 be the graph obtained by removing edge f from and adding edge e to tree Y1. It is easy to show that tree Y2 is connected, has the same number of edges as tree Y1, and the total weights of its edges is not larger than that of tree Y1, therefore it is also a minimum spanning tree of graph P and it contains edge e and all the edges added before it during the construction of set V. Repeat the steps above and we will eventually obtain a minimum spanning tree of graph P that is identical to tree Y. This shows Y is a minimum spanning tree. The minimum spanning tree allows for the first subset of the sub-region to be expanded into a larger subset X, which we assume to be the minimum.
Parallel algorithm
[edit]
The main loop of Prim's algorithm is inherently sequential and thus not parallelizable. However, the inner loop, which determines the next edge of minimum weight that does not form a cycle, can be parallelized by dividing the vertices and edges between the available processors.[12] The following pseudocode demonstrates this.
- Assign each processor a set of consecutive vertices of length .
- Create C, E, F, and Q as in the sequential algorithm and divide C, E, as well as the graph between all processors such that each processor holds the incoming edges to its set of vertices. Let , denote the parts of C, E stored on processor .
- Repeat the following steps until Q is empty:
- On every processor: find the vertex having the minimum value in [] (local solution).
- Min-reduce the local solutions to find the vertex v having the minimum possible value of C[v] (global solution).
- Broadcast the selected node to every processor.
- Add v to F and, if E[v] is not the special flag value, also add E[v] to F.
- On every processor: update and as in the sequential algorithm.
- Return F
This algorithm can generally be implemented on distributed machines[12] as well as on shared memory machines.[13] The running time is , assuming that the reduce and broadcast operations can be performed in .[12] A variant of Prim's algorithm for shared memory machines, in which Prim's sequential algorithm is being run in parallel, starting from different vertices, has also been explored.[14] It should, however, be noted that more sophisticated algorithms exist to solve the distributed minimum spanning tree problem in a more efficient manner.
See also
[edit]- Dijkstra's algorithm, a very similar algorithm for the shortest path problem
- Greedoids offer a general way to understand the correctness of Prim's algorithm
References
[edit]- ^ Jarník, V. (1930), "O jistém problému minimálním" [About a certain minimal problem], Práce Moravské Přírodovědecké Společnosti (in Czech), 6 (4): 57–63, hdl:10338.dmlcz/500726.
- ^ Prim, R. C. (November 1957), "Shortest connection networks And some generalizations", Bell System Technical Journal, 36 (6): 1389–1401, Bibcode:1957BSTJ...36.1389P, doi:10.1002/j.1538-7305.1957.tb01515.x.
- ^ Dijkstra, E. W. (December 1959), "A note on two problems in connexion with graphs" (PDF), Numerische Mathematik, 1 (1): 269–271, CiteSeerX 10.1.1.165.7577, doi:10.1007/BF01386390, S2CID 123284777.
- ^ Sedgewick, Robert; Wayne, Kevin Daniel (2011), Algorithms (4th ed.), Addison-Wesley, p. 628, ISBN 978-0-321-57351-3.
- ^ Rosen, Kenneth (2011), Discrete Mathematics and Its Applications (7th ed.), McGraw-Hill Science, p. 798.
- ^ a b Cheriton, David; Tarjan, Robert Endre (1976), "Finding minimum spanning trees", SIAM Journal on Computing, 5 (4): 724–742, doi:10.1137/0205051, MR 0446458.
- ^ a b Pettie, Seth; Ramachandran, Vijaya (January 2002), "An optimal minimum spanning tree algorithm" (PDF), Journal of the ACM, 49 (1): 16–34, CiteSeerX 10.1.1.110.7670, doi:10.1145/505241.505243, MR 2148431, S2CID 5362916.
- ^ Tarjan, Robert Endre (1983), "Chapter 6. Minimum spanning trees. 6.2. Three classical algorithms", Data Structures and Network Algorithms, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 44, Society for Industrial and Applied Mathematics, pp. 72–77.
- ^ Kepner, Jeremy; Gilbert, John (2011), Graph Algorithms in the Language of Linear Algebra, Software, Environments, and Tools, vol. 22, Society for Industrial and Applied Mathematics, p. 55, ISBN 9780898719901.
- ^ a b Tarjan (1983), p. 77.
- ^ Johnson, Donald B. (December 1975), "Priority queues with update and finding minimum spanning trees", Information Processing Letters, 4 (3): 53–57, doi:10.1016/0020-0190(75)90001-0.
- ^ a b c Grama, Ananth; Gupta, Anshul; Karypis, George; Kumar, Vipin (2003), Introduction to Parallel Computing, Addison-Wesley, pp. 444–446, ISBN 978-0201648652
- ^ Quinn, Michael J.; Deo, Narsingh (1984), "Parallel graph algorithms", ACM Computing Surveys, 16 (3): 319–348, doi:10.1145/2514.2515, S2CID 6833839
- ^ Setia, Rohit (2009), "A new parallel algorithm for minimum spanning tree problem" (PDF), Proc. International Conference on High Performance Computing (HiPC)
External links
[edit]- Prim's Algorithm progress on randomly distributed points
Media related to Prim's algorithm at Wikimedia Commons
Prim's algorithm
View on GrokipediaOverview
Definition and Purpose
Prim's algorithm is a greedy algorithm designed to find a minimum spanning tree (MST) in a connected, undirected graph with weighted edges. It begins with an arbitrary starting vertex and constructs the MST by iteratively incorporating the edge of minimum weight that connects a vertex already in the tree to one that is not, thereby growing a single tree until all vertices are included.[4] The primary purpose of Prim's algorithm is to solve the MST problem by identifying a spanning tree that minimizes the sum of the edge weights, which is essential for optimizing networks where edges represent costs, distances, or other quantifiable resources, such as in telecommunications or infrastructure planning. This minimization ensures efficient connectivity without cycles, providing a foundational solution for various graph-based optimization challenges.[5] At a high level, the algorithm maintains a set of vertices in the partial tree and, at each iteration, selects the cheapest connecting edge to expand it, leveraging the greedy choice property to guarantee an optimal MST. For instance, in a simple graph with vertices A, B, C, and D, and edges AB (weight 2), AC (weight 3), AD (weight 1), BC (weight 1), BD (weight 4), and CD (weight 2), Prim's algorithm starting from A yields an MST consisting of edges AD (1), AB (2), and BC (1), with a total weight of 4.[6]Historical Background
Prim's algorithm was developed by Robert C. Prim, a researcher at Bell Laboratories, and published in 1957 under the title "Shortest connection networks and some generalizations" in the Bell System Technical Journal.[7] The work originated from efforts to optimize connection networks for telephone systems, addressing the need for minimal-cost wiring configurations in communication infrastructure.[8] The algorithm represents an independent rediscovery of a method first described by Czech mathematician Vojtěch Jarník in 1930, in his paper "O jistém problému minimálním" published in Práce moravské přírodovědecké společnosti.[9] Jarník's approach, which simplified earlier ideas from Otakar Borůvka's 1926 work on minimum spanning trees, focused on constructing optimal graphs through iterative edge selection. Due to its rediscovery by Prim, the algorithm is sometimes referred to as Jarník's algorithm or the Prim-Jarník algorithm. Although commonly attributed to Prim, the algorithm emerged amid broader advancements in the minimum spanning tree (MST) field, including Joseph Kruskal's 1956 publication "On the shortest spanning subtree of a graph and the traveling salesman problem" in the Proceedings of the American Mathematical Society. Kruskal's greedy method complemented Prim's vertex-centric strategy, both contributing to efficient solutions for network optimization problems. Prim's formulation at Bell Labs laid foundational groundwork that influenced subsequent developments in graph theory and computational algorithms for connectivity problems.[10]Prerequisites
Graph Theory Basics
In graph theory, a graph is formally defined as an ordered pair $ G = (V, E) $, where $ V $ is a finite set of vertices (also called nodes) and $ E $ is a set of edges connecting pairs of vertices.[11] The number of vertices is denoted by $ n = |V| $, and the number of edges by $ m = |E| $.[11] This notation provides a standard framework for analyzing graph structures in computational problems. An undirected graph is a specific type of graph where edges have no direction, meaning each edge $ e \in E $ connects two vertices $ u, v \in V $ bidirectionally without implying an order.[12] In such graphs, edges are often represented as unordered pairs $ {u, v} $, and they model symmetric relationships, such as mutual connections in a network.[13] A weighted undirected graph extends this by assigning a real-valued weight $ w(e) $ to each edge $ e $, where the weights can represent quantities like distances, costs, or capacities.[14] A graph is connected if, for every pair of distinct vertices $ u, v \in V $, there exists at least one path—a sequence of edges—between them.[15] This property ensures the graph forms a single component without isolated parts, which is essential for problems involving traversal or optimization across all vertices. A spanning tree of a connected undirected graph $ G = (V, E) $ is a subgraph that includes all vertices in $ V $ and a subset of edges forming a tree: it is connected and acyclic, containing exactly $ n-1 $ edges.[16] Such trees provide a minimal structure that connects the entire graph without redundancy. A minimum spanning tree, which minimizes the sum of edge weights, builds on this concept and is explored in subsequent sections.Minimum Spanning Tree Concept
In graph theory, a minimum spanning tree (MST) of a connected, undirected, weighted graph is defined as a spanning tree whose edges have the minimum possible total weight among all spanning trees of the graph.[17][18] This structure connects every vertex in the graph without forming cycles, ensuring the sum of the weights of its edges is minimized.[19] The minimum spanning tree problem involves, given such a graph, identifying an MST that satisfies this optimality condition. The problem assumes the graph is connected (meaning there exists a path between every pair of vertices) and that edge weights are real numbers.[17][20] Formally, for a spanning tree , the total weight is given byAlgorithm Description
Core Steps
Prim's algorithm operates on a connected, undirected graph with weighted edges to construct a minimum spanning tree (MST) through a greedy process that grows a subtree incrementally. The algorithm initializes by selecting an arbitrary vertex $ v $ from the vertex set $ V $, forming the initial spanning tree $ T = {v} $, and starting with an empty fringe set $ F $, which tracks candidate edges connecting $ T $ to the remaining vertices $ V \setminus T $. This setup establishes a single-vertex tree as the foundation for expansion.[23] In the iterative phase, while the size of $ T $ is less than the total number of vertices $ n $, the algorithm scans for the minimum-weight edge $ (u, w) $ such that $ u \in T $ and $ w \notin T $. Upon selection, vertex $ w $ is added to $ T $, the edge $ (u, w) $ is incorporated into the growing tree, and the fringe $ F $ is updated to reflect new potential connections from the enlarged $ T $ to vertices outside it. This step-by-step extension ensures the subtree remains connected and acyclic.[23] The greedy mechanism drives the selection: at every iteration, the lowest-cost edge bridging the current tree to an external vertex is chosen, prioritizing immediate optimality to build toward the global minimum total weight without revisiting prior decisions. This approach leverages the cut property of MSTs, where the minimum edge across the cut between $ T $ and $ V \setminus T $ is safe to include.[23] The process terminates when $ |T| = n $, at which point $ T $ spans all vertices and the collected edges form the MST with minimal total weight. No further additions are needed, as the graph is fully connected by the tree.[23] To illustrate, consider a 4-vertex graph with vertices A, B, C, D and edges A-B (weight 2), A-C (3), A-D (7), B-C (1), B-D (5), C-D (4). Starting with T = {A}, the fringe includes edges A-B (2), A-C (3), A-D (7); the minimum is A-B, so add B to T (now {A, B}) and edge A-B. Next, fringe updates to A-C (3), A-D (7), B-C (1), B-D (5); minimum is B-C (1), add C to T (now {A, B, C}) and edge B-C. Fringe now A-D (7), B-D (5), C-D (4); minimum is C-D (4), add D to T (now {A, B, C, D}) and edge C-D. The MST edges are A-B, B-C, C-D with total weight 7, demonstrating the tree's growth from a single vertex to full span.[1]Pseudocode
Prim's algorithm can be formally described using pseudocode that abstracts the greedy selection process through key values and parent pointers. The procedure initializes provisional minimum edge weights (keys) for all vertices and selects a starting vertex, then iteratively adds the vertex with the smallest key to the spanning tree while updating keys for adjacent vertices. This abstract representation relies on a priority queue data structure supporting extract-minimum and decrease-key (or update-key) operations, without specifying their concrete implementations. The following pseudocode, adapted from standard algorithmic descriptions, outlines the core procedure:MST-PRIM(G, w, r)
for each vertex u in G.V
key[u] ← ∞
parent[u] ← NIL
key[r] ← 0
create a priority queue Q containing all vertices in G.V
while Q ≠ ∅
u ← EXTRACT-MIN(Q)
for each vertex v adjacent to u in G
if w(u, v) < key[v]
key[v] ← w(u, v)
parent[v] ← u
DECREASE-KEY(Q, v, key[v])
The input to the algorithm is an undirected, connected, weighted graph $ G = (V, E) $ with vertex set $ V $, edge set $ E $, and weight function $ w: E \to \mathbb{R} $, along with a starting vertex $ r \in V $. The output is the parent array, which defines the edges of the minimum spanning tree by connecting each vertex to its parent. Upon completion, the key array holds the weights of the edges in the MST, with key[r] = 0 and all others representing the minimum weight to connect to the growing tree.
The choice of starting vertex $ r $ is arbitrary, as the algorithm produces a minimum spanning tree regardless of the initial selection in a connected graph. The pseudocode assumes the graph is connected; if disconnected, it can be modified to produce a minimum spanning forest by running on each component separately, though the standard version targets a single tree.
Implementations
Naive Version
The naive version of Prim's algorithm employs basic arrays to maintain the minimum key values (tentative edge weights to the spanning tree) and a visited status for each vertex, avoiding any specialized data structures like priority queues. It initializes all keys to infinity except for an arbitrary starting vertex set to zero, marks the starting vertex as visited, updates the keys of remaining unvisited vertices by checking the edge weights from the starting vertex, and proceeds in |V| - 1 iterations. In each iteration, it linearly scans all unvisited vertices to select the one with the smallest key value, adds that vertex to the tree, marks it visited, and then updates the keys of remaining unvisited vertices by checking the edge weights from the newly added vertex, retaining the minimum for each. This process builds the minimum spanning tree by greedily extending the connected component with the cheapest available connection at every step.[4] The key operations rely on full scans: selecting the minimum requires examining up to |V| vertices each time, and updating keys involves iterating over all unvisited vertices (up to |V|) while accessing edge weights, typically via an adjacency matrix for O(1) lookups in dense graphs. This makes the implementation straightforward, as no complex insertions or extractions are needed beyond array traversals. The following pseudocode illustrates this approach, assuming a graph G = (V, E) with weight function w and starting vertex r:PRIM-NAIVE(G, w, r)
n ← |V|
key ← array of size n, initialized to ∞
π ← array of size n, initialized to NIL
visited ← array of size n, initialized to false
key[r] ← 0
visited[r] ← true
// Update keys for all unvisited vertices adjacent to r
for each vertex v in V
if not visited[v] and w(r, v) < key[v]
key[v] ← w(r, v)
π[v] ← r
for i ← 1 to n - 1
// Find unvisited vertex u with minimum key[u]
u ← NIL
min_key ← ∞
for each vertex v in V
if not visited[v] and key[v] < min_key
min_key ← key[v]
u ← v
if u = NIL
return "Graph not connected"
visited[u] ← true
// Update keys for all unvisited vertices adjacent to u
for each vertex v in V
if not visited[v] and w(u, v) < key[v]
key[v] ← w(u, v)
π[v] ← u
This pseudocode captures the essence of the algorithm as originally conceived, where edge weight access w(u, v) is assumed efficient (e.g., via adjacency matrix).[24]
The naive implementation is particularly well-suited for small graphs (e.g., |V| ≤ 100) or dense graphs where the edge density approaches |V|^2, as the quadratic operations align with the input size and simplify coding without additional overhead. It prioritizes clarity and ease of verification over efficiency, serving as an accessible entry point for understanding the greedy strategy before exploring optimizations.[25]
