Prim's algorithm

from Wikipedia
A demo for Prim's algorithm based on Euclidean distance

In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

The algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník[1] and later rediscovered and republished by computer scientists Robert C. Prim in 1957[2] and Edsger W. Dijkstra in 1959.[3] Therefore, it is also sometimes known as Jarník's algorithm,[4] the Prim–Jarník algorithm,[5] the Prim–Dijkstra algorithm[6] or the DJP algorithm.[7]

Other well-known algorithms for this problem include Kruskal's algorithm and Borůvka's algorithm.[8] These algorithms find the minimum spanning forest in a possibly disconnected graph; in contrast, the most basic form of Prim's algorithm only finds minimum spanning trees in connected graphs. However, by running Prim's algorithm separately for each connected component of the graph, it can also be used to find the minimum spanning forest.[9] In terms of their asymptotic time complexity, these three algorithms are equally fast for sparse graphs, but slower than other more sophisticated algorithms.[7][6] However, for graphs that are sufficiently dense, Prim's algorithm can be made to run in linear time, meeting or improving the time bounds for other algorithms.[10]

Prim's algorithm starting at vertex A. In the third step, edges BD and AB both have weight 2, so BD is chosen arbitrarily. After that step, AB is no longer a candidate for addition to the tree because it links two nodes that are already in the tree.

Description


The algorithm may informally be described as performing the following steps:

  1. Initialize a tree with a single vertex, chosen arbitrarily from the graph.
  2. Grow the tree by one edge: Of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
  3. Repeat step 2 (until all vertices are in the tree).

In more detail, it may be implemented following the pseudocode below.

function Prim(vertices, edges) is
    for each vertex in vertices do
        cheapestCost[vertex] ← ∞
        cheapestEdge[vertex] ← null

    explored ← empty set
    unexplored ← set containing all vertices

    startVertex ← any element of vertices
    cheapestCost[startVertex] ← 0

    while unexplored is not empty do
        // Select vertex in unexplored with minimum cost
        currentVertex ← vertex in unexplored with minimum cheapestCost[vertex]
        unexplored.remove(currentVertex)
        explored.add(currentVertex)

        for each edge (currentVertex, neighbor) in edges do
            if neighbor in unexplored and weight(currentVertex, neighbor) < cheapestCost[neighbor] then
                cheapestCost[neighbor] ← weight(currentVertex, neighbor)
                cheapestEdge[neighbor] ← (currentVertex, neighbor)

    resultEdges ← empty list
    for each vertex in vertices do
        if cheapestEdge[vertex] ≠ null then
            resultEdges.append(cheapestEdge[vertex])

    return resultEdges
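As a runnable counterpart to the pseudocode above, the following Python sketch (function and variable names are my own, not part of the article) keeps the linear scans and the spanning-forest behavior:

```python
import math

def prim_forest(vertices, edges):
    """Minimum spanning forest via the array-scanning Prim variant.

    vertices: list of hashable labels; edges: list of (u, v, weight)
    tuples for an undirected graph. Returns the chosen tree edges.
    """
    cheapest_cost = {v: math.inf for v in vertices}
    cheapest_edge = {v: None for v in vertices}
    unexplored = set(vertices)

    start = vertices[0]                 # arbitrary starting vertex
    cheapest_cost[start] = 0

    while unexplored:
        # Linear scan: unexplored vertex with the cheapest connection.
        current = min(unexplored, key=cheapest_cost.__getitem__)
        unexplored.remove(current)

        # Relax each undirected edge incident to current.
        for u, v, w in edges:
            if u == current and v in unexplored and w < cheapest_cost[v]:
                cheapest_cost[v] = w
                cheapest_edge[v] = (u, v)
            elif v == current and u in unexplored and w < cheapest_cost[u]:
                cheapest_cost[u] = w
                cheapest_edge[u] = (v, u)

    return [e for e in cheapest_edge.values() if e is not None]
```

On a disconnected graph the scan simply picks a vertex whose cost is still infinite and starts a new tree there, so the result is a minimum spanning forest.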

As written, the starting vertex is chosen arbitrarily: any choice yields a minimum spanning tree, and setting cheapestCost[startVertex] to zero merely guarantees that the chosen vertex is selected first. Because every other vertex begins with infinite cost, the algorithm automatically starts a new tree whenever it has completed a spanning tree of one connected component of the input graph, so it computes a minimum spanning forest. It may be modified to find only a single spanning tree (matching the informal description more closely) by stopping as soon as the selected currentVertex still has infinite cost, meaning no edge connects it to the tree built so far.

Different variations of the algorithm differ from each other in how the set of unexplored vertices is implemented: as a simple linked list or array of vertices, or as a more complicated priority queue data structure. This choice leads to differences in the time complexity of the algorithm. In general, a priority queue will be quicker at finding the vertex with minimum cost, but will entail more expensive updates when cheapestCost[w] changes.

Time complexity

Prim's algorithm has many applications, such as in the generation of this maze, which applies Prim's algorithm to a randomly weighted grid graph.

The time complexity of Prim's algorithm depends on the data structures used for the graph and for ordering the edges by weight, which can be done using a priority queue. The following table shows the typical choices:

Minimum edge weight data structure | Time complexity (total)
adjacency matrix, searching        | O(|V|²)
binary heap and adjacency list     | O((|V| + |E|) log |V|) = O(|E| log |V|)
Fibonacci heap and adjacency list  | O(|E| + |V| log |V|)

A simple implementation of Prim's, using an adjacency matrix or an adjacency list graph representation and linearly searching an array of weights to find the minimum weight edge to add, requires O(|V|²) running time. However, this running time can be greatly improved by using heaps to implement finding minimum weight edges in the algorithm's inner loop.

A first improved version uses a heap to store all edges of the input graph, ordered by their weight. This leads to an O(|E| log |E|) worst-case running time. But storing vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed minimum spanning tree (MST) (or infinity if no such edge exists). Every time a vertex v is chosen and added to the MST, a decrease-key operation is performed on all vertices w outside the partial MST such that v is connected to w, setting the key to the minimum of its previous value and the edge cost of (v,w).

Using a simple binary heap data structure, Prim's algorithm can now be shown to run in time O(|E| log |V|) where |E| is the number of edges and |V| is the number of vertices. Using a more sophisticated Fibonacci heap, this can be brought down to O(|E| + |V| log |V|), which is asymptotically faster when the graph is dense enough that |E| is ω(|V|), and linear time when |E| is at least |V| log |V|. For graphs of even greater density (having at least |V|^c edges for some c > 1), Prim's algorithm can be made to run in linear time even more simply, by using a d-ary heap in place of a Fibonacci heap.[10][11]
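A compact way to realize the binary-heap bound in Python is `heapq` with lazy deletion standing in for an explicit decrease-key (a common substitution, not the article's exact formulation; names are illustrative):

```python
import heapq

def prim_mst_weight(adj, start):
    """Prim's algorithm with a binary heap, O(|E| log |V|).

    adj: dict mapping each vertex to a list of (neighbor, weight) pairs
    for an undirected graph. Stale heap entries are skipped on pop
    instead of being decreased in place.
    """
    in_tree = set()
    total = 0
    heap = [(0, start)]          # (best known connection cost, vertex)
    while heap:
        cost, u = heapq.heappop(heap)
        if u in in_tree:
            continue             # stale entry: u was reached more cheaply
        in_tree.add(u)
        total += cost
        for v, w in adj[u]:
            if v not in in_tree:
                heapq.heappush(heap, (w, v))
    return total
```

Each edge can enter the heap once per endpoint, so the heap holds O(|E|) entries and every push or pop costs O(log |E|) = O(log |V|), preserving the O(|E| log |V|) bound at the price of a larger heap than a decrease-key implementation would keep.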

Demonstration of proof. In this case, the graph Y1 = Yf + e is already equal to Y. In general, the process may need to be repeated.

Proof of correctness


Let P be a connected, weighted graph. At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph. Since P is connected, there will always be a path to every vertex. The output Y of Prim's algorithm is a tree, because the edge and vertex added to tree Y are connected.

Let Y1 be a minimum spanning tree of graph P. If Y1 = Y then Y is a minimum spanning tree. Otherwise, let e be the first edge added during the construction of tree Y that is not in tree Y1, and V be the set of vertices connected by the edges added before edge e. Then one endpoint of edge e is in set V and the other is not. Since tree Y1 is a spanning tree of graph P, there is a path in tree Y1 joining the two endpoints. As one travels along the path, one must encounter an edge f joining a vertex in set V to one that is not in set V. Now, at the iteration when edge e was added to tree Y, edge f could also have been added, and it would have been added instead of edge e if its weight were less than that of e; since edge f was not added, we conclude that w(f) ≥ w(e).

Let tree Y2 be the graph obtained by removing edge f from, and adding edge e to, tree Y1. It is easy to show that tree Y2 is connected, has the same number of edges as tree Y1, and that the total weight of its edges is not larger than that of tree Y1 (because w(e) ≤ w(f)); therefore it is also a minimum spanning tree of graph P, and it contains edge e and all the edges added before it during the construction of set V. Repeating the steps above, we eventually obtain a minimum spanning tree of graph P that is identical to tree Y. This shows Y is a minimum spanning tree.

Parallel algorithm

The adjacency matrix distributed between multiple processors for parallel Prim's algorithm. In each iteration of the algorithm, every processor updates its part of C by inspecting the row of the newly inserted vertex in its set of columns in the adjacency matrix. The results are then collected and the next vertex to include in the MST is selected globally.

The main loop of Prim's algorithm is inherently sequential and thus not parallelizable. However, the inner loop, which determines the next edge of minimum weight that does not form a cycle, can be parallelized by dividing the vertices and edges between the available processors.[12] The following pseudocode demonstrates this.

  1. Assign each processor P_i a set of consecutive vertices of length ⌈n/p⌉, where p is the number of processors.
  2. Create the cost array C, edge array E, forest F, and vertex set Q as in the sequential algorithm, and divide C and E, as well as the graph, between all processors such that each processor holds the incoming edges to its set of vertices. Let C_i, E_i denote the parts of C, E stored on processor P_i.
  3. Repeat the following steps until Q is empty:
    1. On every processor: find the vertex v_i having the minimum value in C_i (local solution).
    2. Min-reduce the local solutions to find the vertex v having the minimum possible value of C[v] (global solution).
    3. Broadcast the selected node to every processor.
    4. Add v to F and, if E[v] is not the special flag value, also add E[v] to F.
    5. On every processor: update C_i and E_i as in the sequential algorithm.
  4. Return F

This algorithm can generally be implemented on distributed machines[12] as well as on shared memory machines.[13] The running time is O(n²/p + n log p), assuming that the reduce and broadcast operations can be performed in O(log p) time.[12] A variant of Prim's algorithm for shared memory machines, in which Prim's sequential algorithm is run in parallel starting from different vertices, has also been explored.[14] Note, however, that more sophisticated algorithms exist to solve the distributed minimum spanning tree problem more efficiently.
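The partitioned scan, min-reduce, and update steps can be simulated sequentially; the sketch below is a simplification with illustrative names (no real message passing), partitioning the cost array among p "processors" over an adjacency matrix:

```python
import math

def parallel_prim_weight(W, p=2):
    """Simulate the partitioned Prim iteration: each 'processor' owns a
    contiguous slice of the cost array C, computes a local minimum, and
    a min-reduce selects the global winner. W is an adjacency matrix
    using math.inf for absent edges; returns the MST's total weight."""
    n = len(W)
    chunk = math.ceil(n / p)
    slices = [range(i, min(i + chunk, n)) for i in range(0, n, chunk)]
    C = [math.inf] * n          # cheapest known connection cost per vertex
    C[0] = 0                    # arbitrary start vertex
    in_tree = [False] * n
    total = 0
    for _ in range(n):
        # Local solutions: each slice scans only its own vertices.
        local = [min((v for v in sl if not in_tree[v]),
                     key=C.__getitem__, default=None) for sl in slices]
        # Global solution: min-reduce over the local candidates.
        v = min((u for u in local if u is not None), key=C.__getitem__)
        in_tree[v] = True
        total += C[v]
        # Each slice updates its part of C from row v of the matrix.
        for sl in slices:
            for u in sl:
                if not in_tree[u] and W[v][u] < C[u]:
                    C[u] = W[v][u]
    return total
```

In a real implementation the local-minimum list comprehension and the row update would run concurrently on separate processors, with the global `min` replaced by a min-reduce and a broadcast.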

from Grokipedia
Prim's algorithm is a greedy algorithm that finds a minimum spanning tree (MST) for a connected, undirected graph with weighted edges, selecting a subset of edges that connects all vertices while minimizing the total edge weight.[1] Developed by Robert C. Prim and published in 1957, the algorithm begins with an arbitrary starting vertex and iteratively adds the minimum-weight edge that connects a vertex in the current tree to one outside it, continuing until all vertices are included, resulting in a tree with exactly V-1 edges for a graph with V vertices.[1] The correctness of Prim's algorithm stems from the cut property of MSTs, which states that for any cut in the graph (a partition of vertices into two sets), the minimum-weight edge crossing the cut can be included in some MST, allowing the greedy selection to build an optimal tree.[1]

Unlike Kruskal's algorithm, which sorts all edges and adds them in order while avoiding cycles, Prim's grows a single tree from a starting point, making it particularly efficient for dense graphs.[1] Implementations often use a binary heap for the priority queue to track candidate edges, yielding a time complexity of O(E log V) in the standard version, where E is the number of edges and V is the number of vertices; more advanced Fibonacci heap implementations can reduce this to O(E + V log V).[1][2]

Prim's algorithm finds broad applications in network design problems, such as constructing minimum-cost communication networks where edges represent wiring or connection costs between nodes, ensuring all components are linked with the lowest total expense.[2] It also serves as a building block in approximation algorithms for problems like the traveling salesman problem and in clustering analyses within data structures.[2] Although first described in a similar form by Vojtěch Jarník in 1930, Prim's version gained prominence in computer science for its practical implementation in graph algorithms.[3]

Overview

Definition and Purpose

Prim's algorithm is a greedy algorithm designed to find a minimum spanning tree (MST) in a connected, undirected graph with weighted edges. It begins with an arbitrary starting vertex and constructs the MST by iteratively incorporating the edge of minimum weight that connects a vertex already in the tree to one that is not, thereby growing a single tree until all vertices are included.[4]

The primary purpose of Prim's algorithm is to solve the MST problem by identifying a spanning tree that minimizes the sum of the edge weights, which is essential for optimizing networks where edges represent costs, distances, or other quantifiable resources, such as in telecommunications or infrastructure planning. This minimization ensures efficient connectivity without cycles, providing a foundational solution for various graph-based optimization challenges.[5]

At a high level, the algorithm maintains a set of vertices in the partial tree and, at each iteration, selects the cheapest connecting edge to expand it, leveraging the greedy choice property to guarantee an optimal MST. For instance, in a simple graph with vertices A, B, C, and D, and edges AB (weight 2), AC (weight 3), AD (weight 1), BC (weight 1), BD (weight 4), and CD (weight 2), Prim's algorithm starting from A yields an MST consisting of edges AD (1), AB (2), and BC (1), with a total weight of 4.[6]
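The worked example can be cross-checked by brute force. The sketch below (helper name is my own) enumerates every set of |V| - 1 edges and keeps the lightest one that connects all vertices:

```python
from itertools import combinations

def mst_weight_bruteforce(vertices, edges):
    """Try every subset of |V|-1 edges; keep the lightest subset that
    connects all vertices (such a subset is necessarily a tree)."""
    n = len(vertices)
    best = None
    for subset in combinations(edges, n - 1):
        adj = {v: [] for v in vertices}
        for u, v, _ in subset:
            adj[u].append(v)
            adj[v].append(u)
        # Flood fill to test whether the candidate spans all vertices.
        seen, stack = {vertices[0]}, [vertices[0]]
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(seen) == n:
            weight = sum(w for _, _, w in subset)
            best = weight if best is None else min(best, weight)
    return best

# Edges from the example graph in the text.
example = [("A", "B", 2), ("A", "C", 3), ("A", "D", 1),
           ("B", "C", 1), ("B", "D", 4), ("C", "D", 2)]
```

For this graph the function returns 4, agreeing with the MST AD, AB, BC given above.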

Historical Background

Prim's algorithm was developed by Robert C. Prim, a researcher at Bell Laboratories, and published in 1957 under the title "Shortest connection networks and some generalizations" in the Bell System Technical Journal.[7] The work originated from efforts to optimize connection networks for telephone systems, addressing the need for minimal-cost wiring configurations in communication infrastructure.[8]

The algorithm represents an independent rediscovery of a method first described by Czech mathematician Vojtěch Jarník in 1930, in his paper "O jistém problému minimálním" published in Práce moravské přírodovědecké společnosti.[9] Jarník's approach, which simplified earlier ideas from Otakar Borůvka's 1926 work on minimum spanning trees, focused on constructing optimal graphs through iterative edge selection. Due to its rediscovery by Prim, the algorithm is sometimes referred to as Jarník's algorithm or the Prim-Jarník algorithm.

Although commonly attributed to Prim, the algorithm emerged amid broader advancements in the minimum spanning tree (MST) field, including Joseph Kruskal's 1956 publication "On the shortest spanning subtree of a graph and the traveling salesman problem" in the Proceedings of the American Mathematical Society. Kruskal's greedy method complemented Prim's vertex-centric strategy, both contributing to efficient solutions for network optimization problems. Prim's formulation at Bell Labs laid foundational groundwork that influenced subsequent developments in graph theory and computational algorithms for connectivity problems.[10]

Prerequisites

Graph Theory Basics

In graph theory, a graph is formally defined as an ordered pair $ G = (V, E) $, where $ V $ is a finite set of vertices (also called nodes) and $ E $ is a set of edges connecting pairs of vertices.[11] The number of vertices is denoted by $ n = |V| $, and the number of edges by $ m = |E| $.[11] This notation provides a standard framework for analyzing graph structures in computational problems.

An undirected graph is a specific type of graph where edges have no direction, meaning each edge $ e \in E $ connects two vertices $ u, v \in V $ bidirectionally without implying an order.[12] In such graphs, edges are often represented as unordered pairs $ \{u, v\} $, and they model symmetric relationships, such as mutual connections in a network.[13] A weighted undirected graph extends this by assigning a real-valued weight $ w(e) $ to each edge $ e $, where the weights can represent quantities like distances, costs, or capacities.[14]

A graph is connected if, for every pair of distinct vertices $ u, v \in V $, there exists at least one path—a sequence of edges—between them.[15] This property ensures the graph forms a single component without isolated parts, which is essential for problems involving traversal or optimization across all vertices. A spanning tree of a connected undirected graph $ G = (V, E) $ is a subgraph that includes all vertices in $ V $ and a subset of edges forming a tree: it is connected and acyclic, containing exactly $ n-1 $ edges.[16] Such trees provide a minimal structure that connects the entire graph without redundancy. A minimum spanning tree, which minimizes the sum of edge weights, builds on this concept and is explored in subsequent sections.

Minimum Spanning Tree Concept

In graph theory, a minimum spanning tree (MST) of a connected, undirected, weighted graph is defined as a spanning tree whose edges have the minimum possible total weight among all spanning trees of the graph.[17][18] This structure connects every vertex in the graph without forming cycles, ensuring the sum of the weights of its edges is minimized.[19] The minimum spanning tree problem involves, given such a graph, identifying an MST that satisfies this optimality condition. The problem assumes the graph is connected (meaning there exists a path between every pair of vertices) and that edge weights are real numbers.[17][20] Formally, for a spanning tree $ T $, the total weight is given by

$ \sum_{e \in T} w(e), $

where $ w(e) $ denotes the weight of edge $ e $, and the MST is the tree $ T $ that minimizes this sum over all possible spanning trees.[19]

The MST is unique if all edge weights in the graph are distinct, as no two spanning trees can then achieve the same minimum weight; however, if some weights are equal, multiple distinct MSTs may exist with the same total weight.[21] This uniqueness property simplifies analysis in cases with generic weights but requires careful handling of ties in general implementations.[22]

To illustrate, consider a simple connected undirected graph with three vertices $ A $, $ B $, and $ C $, and edges weighted as follows: $ AB $ with weight 1, $ AC $ with weight 2, and $ BC $ with weight 3. The possible spanning trees are $ \{AB, AC\} $ (total weight 3), $ \{AB, BC\} $ (total weight 4), and $ \{AC, BC\} $ (total weight 5). Here, $ \{AB, AC\} $ is the unique MST, as it has the minimum total weight.[18]
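Tabulating the three spanning trees from this example takes only a few lines (the dictionary keys are my own labels):

```python
# Edge weights from the example: AB = 1, AC = 2, BC = 3.
w = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 3}

# The three possible spanning trees and their total weights.
trees = {
    "AB+AC": w[("A", "B")] + w[("A", "C")],
    "AB+BC": w[("A", "B")] + w[("B", "C")],
    "AC+BC": w[("A", "C")] + w[("B", "C")],
}
mst = min(trees, key=trees.get)   # the unique minimum, "AB+AC"
```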

Algorithm Description

Core Steps

Prim's algorithm operates on a connected, undirected graph with weighted edges to construct a minimum spanning tree (MST) through a greedy process that grows a subtree incrementally. The algorithm initializes by selecting an arbitrary vertex $ v $ from the vertex set $ V $, forming the initial spanning tree $ T = \{v\} $, and starting with an empty fringe set $ F $, which tracks candidate edges connecting $ T $ to the remaining vertices $ V \setminus T $. This setup establishes a single-vertex tree as the foundation for expansion.[23]

In the iterative phase, while the size of $ T $ is less than the total number of vertices $ n $, the algorithm scans for the minimum-weight edge $ (u, w) $ such that $ u \in T $ and $ w \notin T $. Upon selection, vertex $ w $ is added to $ T $, the edge $ (u, w) $ is incorporated into the growing tree, and the fringe $ F $ is updated to reflect new potential connections from the enlarged $ T $ to vertices outside it. This step-by-step extension ensures the subtree remains connected and acyclic.[23]

The greedy mechanism drives the selection: at every iteration, the lowest-cost edge bridging the current tree to an external vertex is chosen, prioritizing immediate optimality to build toward the global minimum total weight without revisiting prior decisions. This approach leverages the cut property of MSTs, where the minimum edge across the cut between $ T $ and $ V \setminus T $ is safe to include.[23] The process terminates when $ |T| = n $, at which point $ T $ spans all vertices and the collected edges form the MST with minimal total weight. No further additions are needed, as the graph is fully connected by the tree.[23]

To illustrate, consider a 4-vertex graph with vertices A, B, C, D and edges A-B (weight 2), A-C (3), A-D (7), B-C (1), B-D (5), C-D (4). Starting with T = {A}, the fringe includes edges A-B (2), A-C (3), A-D (7); the minimum is A-B, so add B to T (now {A, B}) and edge A-B. Next, the fringe updates to A-C (3), A-D (7), B-C (1), B-D (5); the minimum is B-C (1), so add C to T (now {A, B, C}) and edge B-C. The fringe is now A-D (7), B-D (5), C-D (4); the minimum is C-D (4), so add D to T (now {A, B, C, D}) and edge C-D. The MST edges are A-B, B-C, C-D with total weight 7, demonstrating the tree's growth from a single vertex to full span.[1]

Pseudocode

Prim's algorithm can be formally described using pseudocode that abstracts the greedy selection process through key values and parent pointers. The procedure initializes provisional minimum edge weights (keys) for all vertices and selects a starting vertex, then iteratively adds the vertex with the smallest key to the spanning tree while updating keys for adjacent vertices. This abstract representation relies on a priority queue data structure supporting extract-minimum and decrease-key (or update-key) operations, without specifying their concrete implementations. The following pseudocode, adapted from standard algorithmic descriptions, outlines the core procedure:
MST-PRIM(G, w, r)
    for each vertex u in G.V
        key[u] ← ∞
        parent[u] ← NIL
    key[r] ← 0
    create a priority queue Q containing all vertices in G.V
    while Q ≠ ∅
        u ← EXTRACT-MIN(Q)
        for each vertex v adjacent to u in G
            if w(u, v) < key[v]
                key[v] ← w(u, v)
                parent[v] ← u
                DECREASE-KEY(Q, v, key[v])
The input to the algorithm is an undirected, connected, weighted graph $ G = (V, E) $ with vertex set $ V $, edge set $ E $, and weight function $ w: E \to \mathbb{R} $, along with a starting vertex $ r \in V $. The output is the parent array, which defines the edges of the minimum spanning tree by connecting each vertex to its parent. Upon completion, the key array holds the weights of the edges in the MST, with key[r] = 0 and all others representing the minimum weight to connect to the growing tree.

The choice of starting vertex $ r $ is arbitrary, as the algorithm produces a minimum spanning tree regardless of the initial selection in a connected graph. The pseudocode assumes the graph is connected; if disconnected, it can be modified to produce a minimum spanning forest by running on each component separately, though the standard version targets a single tree.

Implementations

Naive Version

The naive version of Prim's algorithm employs basic arrays to maintain the minimum key values (tentative edge weights to the spanning tree) and a visited status for each vertex, avoiding any specialized data structures like priority queues. It initializes all keys to infinity except for an arbitrary starting vertex set to zero, marks the starting vertex as visited, updates the keys of remaining unvisited vertices by checking the edge weights from the starting vertex, and proceeds in |V| - 1 iterations. In each iteration, it linearly scans all unvisited vertices to select the one with the smallest key value, adds that vertex to the tree, marks it visited, and then updates the keys of remaining unvisited vertices by checking the edge weights from the newly added vertex, retaining the minimum for each. This process builds the minimum spanning tree by greedily extending the connected component with the cheapest available connection at every step.[4]

The key operations rely on full scans: selecting the minimum requires examining up to |V| vertices each time, and updating keys involves iterating over all unvisited vertices (up to |V|) while accessing edge weights, typically via an adjacency matrix for O(1) lookups in dense graphs. This makes the implementation straightforward, as no complex insertions or extractions are needed beyond array traversals.

The following pseudocode illustrates this approach, assuming a graph G = (V, E) with weight function w and starting vertex r:
PRIM-NAIVE(G, w, r)
    n ← |V|
    key ← array of size n, initialized to ∞
    π ← array of size n, initialized to NIL
    visited ← array of size n, initialized to false
    key[r] ← 0
    visited[r] ← true
    // Update keys for all unvisited vertices adjacent to r
    for each vertex v in V
        if not visited[v] and w(r, v) < key[v]
            key[v] ← w(r, v)
            π[v] ← r

    for i ← 1 to n - 1
        // Find unvisited vertex u with minimum key[u]
        u ← NIL
        min_key ← ∞
        for each vertex v in V
            if not visited[v] and key[v] < min_key
                min_key ← key[v]
                u ← v
        if u = NIL
            return "Graph not connected"

        visited[u] ← true
        // Update keys for all unvisited vertices adjacent to u
        for each vertex v in V
            if not visited[v] and w(u, v) < key[v]
                key[v] ← w(u, v)
                π[v] ← u
This pseudocode captures the essence of the algorithm as originally conceived, where edge weight access w(u, v) is assumed efficient (e.g., via adjacency matrix).[24]

The naive implementation is particularly well-suited for small graphs (e.g., |V| ≤ 100) or dense graphs where the edge density approaches |V|^2, as the quadratic operations align with the input size and simplify coding without additional overhead. It prioritizes clarity and ease of verification over efficiency, serving as an accessible entry point for understanding the greedy strategy before exploring optimizations.[25]
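The naive pseudocode maps directly onto a short Python sketch over an adjacency matrix (names are illustrative; `None` marks an absent edge):

```python
import math

def prim_naive(W, r=0):
    """O(|V|^2) Prim over an adjacency matrix W (None = no edge).
    Returns (parent, total_weight); raises if the graph is disconnected."""
    n = len(W)
    key = [math.inf] * n
    parent = [None] * n
    visited = [False] * n
    key[r] = 0
    total = 0
    for _ in range(n):
        # Linear scan for the unvisited vertex with minimum key.
        u = min((v for v in range(n) if not visited[v]),
                key=key.__getitem__)
        if key[u] == math.inf:
            raise ValueError("graph not connected")
        visited[u] = True
        total += key[u]
        # Update keys of unvisited vertices adjacent to u.
        for v in range(n):
            if not visited[v] and W[u][v] is not None and W[u][v] < key[v]:
                key[v] = W[u][v]
                parent[v] = u
    return parent, total
```

Unlike the pseudocode, the separate initialization pass from the start vertex is folded into the first loop iteration, which is equivalent because key[r] = 0 guarantees r is selected first.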

Priority Queue Optimizations

To achieve greater efficiency in Prim's algorithm beyond naive implementations, priority queues are employed to manage the selection of the minimum-weight edge connecting the growing minimum spanning tree (MST) to an unselected vertex. In the binary heap variant, a min-heap stores the key values (minimum edge weights) for all vertices not yet in the MST, with each vertex's key initialized to infinity except for the starting vertex, which is set to 0. The extract-min operation, which selects and removes the vertex with the smallest key to add to the MST, runs in O(log n) time, where n is the number of vertices. Similarly, the decrease-key operation, used to update a vertex's key when a lighter edge is discovered from the newly added vertex to an adjacent unselected vertex, also takes O(log n) time. With n extract-min operations and up to m decrease-key operations (where m is the number of edges), the total time complexity becomes O((n + m) log n).[26]

A more advanced optimization uses Fibonacci heaps, which were specifically designed to accelerate algorithms like Prim's that rely on frequent decrease-key operations. In this setup, the Fibonacci heap maintains the same key structure but supports decrease-key in amortized O(1) time due to its lazy merging and linking mechanisms, while extract-min remains O(log n) amortized. This yields an overall time complexity of O(m + n log n), making it asymptotically optimal for dense graphs where m is close to n². The structure achieves this through a collection of heap-ordered trees with additional degree and mark fields to control amortization during consolidations.[27]

For practical implementation, the graph is represented with adjacency lists to allow efficient traversal of edges incident to each vertex, typically O(degree(v)) time per vertex v. During execution, the priority queue tracks the "fringe" vertices—those adjacent to the current MST but not yet selected—and key updates are applied only to these fringe vertices when scanning edges from a newly added vertex. Parent pointers are maintained alongside keys to reconstruct the MST edges once all vertices are included. This adjacency list approach ensures the algorithm scales well for sparse graphs, as edge examinations total O(m) across all iterations.[28]

Binary heaps strike a balance of simplicity and performance for most applications, as their straightforward array-based structure facilitates easy coding and debugging, though the O(log n) decrease-key cost can accumulate in graphs with high connectivity. In contrast, Fibonacci heaps deliver superior asymptotic speed, especially when decrease-key dominates, but their intricate operations—such as cascading cuts and linking—introduce significant implementation complexity and potential constant-factor overhead in practice.[27]

Analysis

Time and Space Complexity

The naive implementation of Prim's algorithm, which scans all vertices in each of the $ n $ iterations to find the minimum key value, achieves a time complexity of $ O(n^2) $, where $ n $ is the number of vertices.[18] This approach is suitable for dense graphs but becomes inefficient for sparse ones. The space complexity for this version is $ O(n) $, primarily for storing key values and parent pointers.[29]

When using a binary heap as the priority queue, the time complexity improves to $ O((n + m) \log n) $, where $ m $ is the number of edges. This arises from $ n $ extract-min operations each taking $ O(\log n) $ time and up to $ m $ decrease-key operations each also $ O(\log n) $.[24] The precise bound can be expressed as

$ T = n \log n + \sum_{v \in V} \deg(v) \log n \approx (n + m) \log n, $

assuming an adjacency list representation of the graph.[30] Space usage is $ O(n + m) $ to accommodate the heap, keys, parents, and graph structure.[31]

Employing a Fibonacci heap further optimizes the time complexity to $ O(m + n \log n) $, leveraging amortized $ O(1) $ time for decrease-key operations and $ O(\log n) $ for extract-min.[27] The space complexity remains $ O(n + m) $, consistent with the binary heap version.[31] Overall, the choice of implementation depends on graph density, as the naive version excels when $ m \approx n^2 $, while heap-based variants are preferable for sparser graphs where $ m \ll n^2 $.[32]

Performance Characteristics

Prim's algorithm exhibits varying practical efficiency depending on graph density. For dense graphs where the number of edges $m$ approaches $n^2$ (with $n$ vertices), the naive implementation, which scans all vertices to find the minimum edge weight at each step, performs comparably to optimized versions using priority queues, as the overhead of heap operations becomes negligible relative to the quadratic scanning cost.[32] In contrast, for sparse graphs, priority queue-based implementations are preferred to avoid the full $O(n^2)$ scan.[33] Implementation choices significantly affect constant factors in practice. Binary heaps offer lower constants and better cache performance for sparse graphs due to their simpler structure and predictable access patterns, making them the default choice in many libraries.[33] Fibonacci heaps, while providing the superior amortized bound $O(m + n \log n)$, often underperform in real-world scenarios because of the high overhead of their complex linking and lazy deletions, leading to slower execution despite theoretical advantages.[34] The algorithm's performance also differs markedly between average and worst cases.
On random graphs with uniformly distributed edge weights, Prim's algorithm runs efficiently, with expected time close to linearithmic in practice even with binary heaps, since relatively few decrease-key operations are triggered.[35] However, pathological cases involving frequent updates to priority queue keys, such as graphs with edges ordered to maximize decrease-key calls, can degrade performance, particularly in non-lazy implementations where each update requires an explicit heap adjustment.[36] Empirically, Prim's algorithm often outperforms Kruskal's on dense graphs, as it avoids the overhead of sorting all edges and performing numerous union-find operations, which become costly when $m$ is large; tests on very dense graphs show Prim's running up to three times faster.[37] A key limitation is that the standard Prim's algorithm assumes a connected graph and produces a minimum spanning tree only for the component containing the starting vertex; for disconnected graphs, modifications such as running the algorithm from multiple starting vertices are required to obtain a minimum spanning forest.[38]
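For dense graphs, the quadratic scan described above needs no heap at all. A sketch of the naive $O(n^2)$ version, taking the graph as a weight matrix:

```python
def prim_dense(W):
    """Naive O(n^2) Prim's for a dense graph given as an n x n
    weight matrix W, with W[u][v] = float('inf') if no edge exists."""
    n = len(W)
    INF = float('inf')
    key = [INF] * n            # cheapest known edge into the tree
    parent = [-1] * n
    in_tree = [False] * n
    key[0] = 0
    total = 0
    for _ in range(n):
        # O(n) linear scan replaces the priority queue entirely
        u = min((v for v in range(n) if not in_tree[v]),
                key=lambda v: key[v])
        in_tree[u] = True
        total += key[u]
        for v in range(n):
            if not in_tree[v] and W[u][v] < key[v]:
                key[v] = W[u][v]
                parent[v] = u
    return total, parent
```

When $m \approx n^2$, the scan costs no more asymptotically than reading the adjacency structure, which is why this version is competitive on dense inputs.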

Correctness

Invariant-Based Proof

The correctness of Prim's algorithm relies on establishing that it constructs a minimum spanning tree (MST) by repeatedly adding edges that satisfy fundamental properties of MSTs. A key lemma supporting this is the cut property, which states that for any cut $(S, V \setminus S)$ of the vertex set $V$ in a connected, undirected graph $G = (V, E)$, where $S \subset V$ is nonempty and proper, the minimum-weight edge crossing the cut belongs to some MST of $G$.[39] This property ensures that selecting the lightest crossing edge does not preclude forming an optimal spanning tree. Prim's algorithm applies this greedy choice at each step: starting from an arbitrary vertex, it grows a tree $T$ by adding the minimum-weight edge connecting a vertex in $T$ to one outside $T$, which is precisely the lightest edge crossing the cut $(V(T), V \setminus V(T))$, where $V(T)$ is the set of vertices in $T$. By the cut property, this edge must belong to some MST of $G$.[40] To prove the algorithm's overall correctness, an inductive invariant is maintained: at every stage, after $k$ edges have been added (for $0 \leq k < |V| - 1$), the partial tree $T$ with $k+1$ vertices is a subtree of some MST of $G$.[6] The proof assumes distinct edge weights, which makes the MST unique and yields strict inequality in the exchange argument. For the base case ($k = 0$), $T$ consists of a single vertex, which is trivially contained in any MST of $G$, as every spanning tree includes all vertices.[6] For the inductive step, assume the invariant holds after $k$ edges, so $T$ is contained in some MST $T'$ of $G$. In the next iteration, Prim's algorithm selects the minimum-weight edge $e = (u, v)$ crossing the cut $(V(T), V \setminus V(T))$. If $e \in T'$, then $T \cup \{e\} \subseteq T'$ and the invariant is preserved.
Otherwise, $T' \cup \{e\}$ contains a cycle, and this cycle must include another edge $f \in T'$ that crosses the cut $(V(T), V \setminus V(T))$. As $e$ is the lightest crossing edge and weights are distinct, $\omega(e) < \omega(f)$, where $\omega$ denotes edge weight. Then $(T' \setminus \{f\}) \cup \{e\}$ is a spanning tree of strictly lower weight than $T'$, contradicting the optimality of $T'$; hence $e \in T'$ after all, and the invariant holds for $k+1$ edges.[40][6] Upon termination, $T$ is a spanning tree contained in some MST; since a spanning tree cannot be a proper subgraph of another spanning tree, $T$ is itself an MST.
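The invariant can also be checked mechanically on small instances by comparing Prim's result against a brute-force minimum taken over all spanning trees. A small illustrative Python sanity check (not part of the proof, purely a demonstration):

```python
from itertools import combinations
import heapq

def prim_weight(n, edges):
    """MST weight via lazy Prim's; edges is a list of (u, v, w)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v)); adj[v].append((w, u))
    seen, heap, total = [False] * n, [(0, 0)], 0
    while heap:
        w, u = heapq.heappop(heap)
        if seen[u]:
            continue
        seen[u] = True
        total += w
        for wv, v in adj[u]:
            if not seen[v]:
                heapq.heappush(heap, (wv, v))
    return total

def brute_force_weight(n, edges):
    """Minimum weight over every subset of n-1 edges that forms a
    spanning tree (acyclic check via union-find)."""
    best = float('inf')
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        acyclic = True
        for u, v, _ in subset:
            ru, rv = find(u), find(v)
            if ru == rv:
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            best = min(best, sum(w for _, _, w in subset))
    return best
```

On any connected graph small enough to enumerate, the two values agree, as the proof above guarantees.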

Key Properties

Prim's algorithm leverages the cycle property of minimum spanning trees (MSTs), which states that for any cycle in the graph, the edge of maximum weight on that cycle cannot belong to any MST. The contrapositive of this property supports the greedy avoidance of heavy edges during the algorithm's execution, ensuring that the selected edges do not compromise optimality by forming suboptimal cycles.[41] A core aspect of Prim's correctness is the safe-edge addition principle: given a subset $A$ of edges contained in some MST, and a cut $(S, V \setminus S)$ that respects $A$ (no edge in $A$ crosses the cut), any light edge $e$ (of minimum weight among edges crossing the cut) is safe to add to $A$, meaning $A \cup \{e\}$ is also contained in some MST. In Prim's algorithm, each added edge is precisely such a safe light edge connecting the current tree to an unvisited vertex, guaranteeing progress toward an optimal MST.[42] Under the assumption of distinct edge weights, Prim's algorithm yields the unique MST of the graph, since with distinct weights no two different spanning trees can both be minimum.[1] Prim's algorithm fits within the generic MST framework, a greedy procedure that iteratively adds safe edges to grow a forest until it spans the graph; this framework unifies Prim's (growing a single tree) with variants like Kruskal's (growing a forest by components) and Borůvka's (simultaneously growing multiple trees in phases), all of which produce an MST by the cut property.[1]

Comparisons and Variants

Versus Kruskal's Algorithm

Kruskal's algorithm constructs a minimum spanning tree by first sorting all edges of the graph in non-decreasing order of their weights. It then processes the edges sequentially, adding each one to the spanning forest if it connects two distinct components (i.e., does not form a cycle), using a union-find data structure to efficiently track and merge components.[43] Unlike Prim's algorithm, which adopts a vertex-centric approach by starting from an arbitrary vertex and iteratively expanding a single connected component via the minimum-weight edge to an external vertex, Kruskal's is edge-centric and operates globally by considering edges independently of vertex positions. This distinction makes Prim's more localized and incremental, while Kruskal's relies on a complete initial sorting. Prim's tends to perform better on dense graphs, where the edge set is large (approaching $O(V^2)$), as it avoids sorting all edges upfront and leverages adjacency-based access; conversely, Kruskal's is more efficient for sparse graphs with fewer edges, benefiting from near-linear-time union-find operations after the initial sort.[44][6] Both algorithms achieve the same asymptotic time complexity of $O(E \log V)$ when implemented with binary heaps for Prim's priority queue and union-find with path compression and union by rank for Kruskal's, though Fibonacci heaps can reduce Prim's to $O(E + V \log V)$. In practice, Kruskal's union-find structure offers advantages in sparse graphs by enabling near-constant-time cycle checks, minimizing overhead compared to Prim's repeated priority queue updates.[44][6] Prim's is often selected for graphs represented as adjacency lists, where efficiently scanning neighbors supports its growth strategy without global preprocessing, whereas Kruskal's suits scenarios requiring fewer dynamic data structure modifications after edge sorting, such as in networks with limited connectivity.
Despite these differences, both are greedy algorithms that yield the same minimum spanning tree for connected, undirected graphs assuming distinct edge weights.[44][43]
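For concreteness, Kruskal's edge-centric procedure described above can be sketched as follows, with union-find using path compression and union by rank:

```python
def kruskal_mst(n, edges):
    """Kruskal's algorithm: sort edges by weight, then add each edge
    that joins two distinct components, tracked with union-find."""
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False                    # edge would close a cycle
        if rank[rx] < rank[ry]:             # union by rank
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1
        return True

    total, tree = 0, []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if union(u, v):
            total += w
            tree.append((u, v, w))
    return total, tree
```

The contrast with Prim's is visible in the structure: there is no growing frontier, only a globally sorted edge stream filtered by cycle checks.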

Other MST Algorithms

Besides Prim's and Kruskal's algorithms, which independently solved the minimum spanning tree (MST) problem in 1957 and 1956, respectively, several other approaches have been developed for computing MSTs, often highlighting different trade-offs in efficiency, parallelism, or adaptability to specific settings. Borůvka's algorithm, originally proposed in 1926, operates in phases by repeatedly contracting components through their minimum-weight outgoing edges, achieving an O(m log n) time complexity with appropriate data structures.[45][46] This phase-based contraction makes it inherently parallel-friendly, as each phase can process multiple components simultaneously, unlike the sequential growth in Prim's algorithm.[47] Prim's algorithm can be viewed as a specialized single-source variant of Borůvka's, where growth is confined to edges incident to a single growing component rather than global contractions across all components.[47] The reverse-delete algorithm provides a complementary greedy strategy, starting with the full connected graph and iteratively removing the highest-weight edge whose removal does not disconnect the graph, until an MST remains.[48] This approach mirrors Kruskal's in its edge-sorting but in reverse order, making it particularly suitable for dynamic graphs where edges may be added or removed over time, in contrast to Prim's focus on incremental addition from a seed vertex.[48] In distributed computing environments, the Gallager-Humblet-Spira (GHS) algorithm extends core ideas from Prim's and Kruskal's by combining Borůvka-style phases with local MST computations to construct a global MST across networked nodes. It achieves O(n log n) message complexity in synchronous settings, adapting Prim's greedy selection to decentralized coordination without a central authority. 
Recent advances address massive or streaming graphs, where exact MST computation via Prim's becomes infeasible due to memory constraints; approximate MST algorithms, such as those using quadtree embeddings for Euclidean instances, provide Õ(log n)-approximations in sublinear space.[49] Recent work has also focused on dynamic MST algorithms that update the tree under edge insertions and deletions efficiently.[50]
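Borůvka's phase-based contraction can be sketched as below; as noted, each phase lets every component pick its minimum-weight outgoing edge, and all selected edges are added at once. This simplified version assumes distinct edge weights (with ties, a consistent tie-breaking rule such as comparing edge indices is needed for correctness):

```python
def boruvka_mst(n, edges):
    """Borůvka's algorithm: repeat phases in which every component
    selects its minimum-weight outgoing edge; assumes distinct weights."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, components = 0, n
    while components > 1:
        cheapest = {}                     # component root -> best outgoing edge
        for u, v, w in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][2]:
                    cheapest[r] = (u, v, w)
        progressed = False
        for u, v, w in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:                  # may already be merged this phase
                parent[ru] = rv
                total += w
                components -= 1
                progressed = True
        if not progressed:                # graph is disconnected
            break
    return total
```

Because every component acts independently within a phase, the inner loops are the part that parallel and distributed MST algorithms execute concurrently.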

Applications

Network Design

Prim's algorithm finds wide application in telecommunication network design, where it minimizes the total cabling costs required to connect a set of terminals, such as telephone exchanges or fiber-optic nodes, while ensuring full connectivity. Originally developed at Bell Labs, the algorithm addresses the problem of interconnecting terminals with the shortest possible network of direct links, which directly translates to reducing material and installation expenses in telephone systems.[18] This approach was particularly suited for early computing environments, allowing efficient computation of minimal wiring layouts for expansive communication infrastructures.[51] In transportation planning, Prim's algorithm optimizes the layout of road or rail networks by identifying the minimum spanning tree that connects cities or hubs with the least total length or construction cost. For instance, consider a graph where vertices represent cities and edge weights denote distances or estimated building costs between them; applying Prim's algorithm starts from an arbitrary city and iteratively adds the lowest-cost connection to an unconnected city until all are linked, yielding the cheapest highway spanning tree.[52] This method ensures efficient resource allocation for infrastructure projects, such as rural road extensions or urban transit expansions, by prioritizing essential links without redundant paths.[53] The algorithm also plays a key role in power grid design, where it connects substations or generation points with the minimal total wire length or cost, maintaining network connectivity and reliability. 
By modeling the grid as a weighted graph—with nodes as substations and edges as potential transmission lines weighted by installation expenses—Prim's algorithm constructs a spanning tree that avoids cycles while minimizing overall infrastructure outlay.[54] This application is critical for balancing load distribution and reducing energy losses in electrical systems.[55] Furthermore, Prim's algorithm scales effectively to very-large-scale integration (VLSI) design for chip routing, where it computes minimal weighted connections between components on a semiconductor layout. In global routing phases, the algorithm treats circuit pins as vertices and possible wire paths as weighted edges, generating a spanning tree that optimizes signal propagation with reduced crosstalk and power consumption.[56] Its greedy nature makes it computationally feasible for dense, high-pin-count chips, supporting the scalability demands of modern integrated circuits.[57]
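As a toy illustration of the cabling scenario, the following sketch runs Prim's algorithm over a handful of hypothetical sites; the site names and link costs are invented for the example:

```python
import heapq

def min_cost_network(costs, start):
    """Given a dict {(a, b): cost} of candidate links between named
    sites, return the links chosen by Prim's algorithm and their
    total cost. Illustrative only: the figures below are made up."""
    adj = {}
    for (a, b), w in costs.items():
        adj.setdefault(a, []).append((w, b))
        adj.setdefault(b, []).append((w, a))
    seen, heap, total, links = set(), [(0, start, None)], 0, []
    while heap:
        w, site, prev = heapq.heappop(heap)
        if site in seen:
            continue
        seen.add(site)
        if prev is not None:
            total += w
            links.append((prev, site))
        for wv, nxt in adj[site]:
            if nxt not in seen:
                heapq.heappush(heap, (wv, nxt, site))
    return total, links

# Hypothetical cabling costs (arbitrary units) between four exchanges
costs = {("A", "B"): 4, ("A", "C"): 3, ("B", "C"): 2,
         ("B", "D"): 5, ("C", "D"): 6}
```

Starting from exchange "A", the algorithm picks the links A-C, C-B, and B-D, connecting all four sites at the minimum total cost of 10.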

Clustering and Approximation

Prim's algorithm plays a pivotal role in hierarchical clustering by enabling the construction of a minimum spanning tree (MST) on a similarity graph, which serves as the backbone for single-linkage clustering. In single-linkage hierarchical clustering, vertices represent data points, and edge weights reflect dissimilarity measures; the MST computed via Prim's connects all points with minimal total dissimilarity. To form clusters, edges in this MST are progressively removed based on a threshold, yielding connected components that correspond to clusters at different granularity levels. This approach, equivalent to single-linkage agglomerative clustering, efficiently captures cluster structures without requiring exhaustive pairwise merging. The MST generated by Prim's also underpins approximation algorithms for problems like the k-minimum spanning tree (k-MST), where the goal is to find a minimum-cost tree spanning exactly k vertices in a graph. One effective strategy computes the full MST using Prim's and then extracts a near-optimal subtree covering k vertices by pruning higher-cost branches, achieving a constant-factor approximation. Similarly, for the Steiner tree problem, which seeks a minimum tree interconnecting a subset of terminals possibly via additional vertices, Prim's MST on a metric closure of the terminals provides a 2-approximation when combined with shortcutting techniques, offering subtrees that are provably close to optimal. These methods leverage Prim's greedy growth to efficiently build foundational trees for these NP-hard extensions.[58] In computer vision, Prim's algorithm facilitates image segmentation by constructing an MST on a graph where pixels are nodes and edges are weighted by similarity metrics, such as intensity differences or color gradients. 
The resulting MST connects pixels into a tree structure representing region relationships; segmenting involves cutting long edges to merge similar regions while separating dissimilar ones, enabling efficient region-based partitioning. This technique excels in handling textured images, producing boundaries that align with perceptual edges without over-segmentation. A representative implementation uses Prim's to build the MST incrementally from pixel neighborhoods, followed by thresholding for final regions.[59] For big data scenarios, adaptations of Prim's algorithm support streaming approximations by computing partial MSTs on evolving graphs, avoiding the need to store the full edge set. These variants grow the tree greedily on sampled or batched edges, providing constant-factor approximations to the true MST cost in one or a few passes, which is critical for real-time analysis in distributed environments.[60] As an illustrative example, Prim's MST can cluster documents based on dissimilarity weights derived from cosine distances of TF-IDF vectors in a complete graph. The algorithm starts from an arbitrary document and incrementally adds the most similar unconnected document, forming an MST that links all documents with minimal total dissimilarity. Cutting edges whose dissimilarity exceeds a chosen threshold then partitions the tree into clusters of topically coherent documents, effectively handling high-dimensional text data for applications like information retrieval. This method, explored in MST-based clustering schemes, demonstrates Prim's utility in deriving hierarchical structures from similarity matrices.[61]
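The threshold-cutting scheme described above can be sketched as follows: build the MST of the complete dissimilarity graph with Prim's algorithm, drop edges above the threshold, and report the resulting components as clusters. A minimal sketch, using 1-D points and absolute difference as the dissimilarity for brevity:

```python
import heapq

def mst_clusters(points, dist, threshold):
    """Single-linkage clustering via Prim's MST: build the MST of the
    complete dissimilarity graph, drop edges above `threshold`, and
    return the resulting connected components as clusters."""
    n = len(points)
    in_tree = [False] * n
    mst_edges = []
    heap = [(0.0, 0, -1)]                  # (dissimilarity, vertex, parent)
    while heap:
        w, u, p = heapq.heappop(heap)
        if in_tree[u]:
            continue
        in_tree[u] = True
        if p != -1:
            mst_edges.append((w, p, u))
        for v in range(n):
            if not in_tree[v]:
                heapq.heappush(heap, (dist(points[u], points[v]), v, u))
    # keep only edges at or below the threshold, then collect components
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for w, u, v in mst_edges:
        if w <= threshold:
            parent[find(u)] = find(v)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Varying the threshold sweeps through the full single-linkage dendrogram: a low threshold yields many small clusters, a high one merges everything.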

Advanced Topics

Parallel Implementations

Parallel implementations of Prim's algorithm adapt the sequential greedy strategy to multi-processor environments, particularly in shared-memory models like the PRAM, to accelerate the selection of minimum-weight edges connecting the growing spanning tree to unvisited vertices. In the PRAM model, multiple processors collaborate on greedy selection by partitioning the graph's adjacency structure and performing parallel scans to identify candidate minimum edges across the cut defined by the current tree and remaining components; a parallel reduction operation then computes the global minimum in O(log n) time using O(m) processors, where m is the number of edges.[62] This enables efficient handling of dense graphs but requires careful synchronization to avoid concurrent writes during edge updates. A prominent phase-based approach combines elements of Borůvka's and Prim's algorithms in a hybrid manner, dividing the computation into O(log n) phases where each phase contracts multiple components simultaneously by selecting minimum edges in parallel across all cuts, akin to Borůvka's step, followed by localized Prim-like growth within supervertices. With n processors, this achieves O(log² n) time complexity by leveraging parallel minimum-finding in each phase and reducing the number of components exponentially.[63] Such hybrids improve load distribution over pure Prim's by allowing concurrent tree growth from multiple starting points before merging. Implementations often employ parallel priority queues to manage edge weights efficiently, where each processor maintains a local queue for its subgraph partition, and global synchronization via reductions (e.g., MPI_Allreduce) extracts the next minimum edge without full queue merging. For updates to key values after edge additions, techniques like pointer doubling can accelerate propagation across components by iteratively linking representatives in O(log n) steps, ensuring consistent distances in the growing forest. 
Parallel Prim's variants are available in libraries supporting multi-processor and accelerator environments; for instance, custom CUDA implementations accelerate dense graphs on GPUs by mapping vertices to threads for parallel min-heap operations, achieving approximately 2x speedup over CPU versions on graphs with up to 16,384 vertices.[64] Message-passing libraries like MPI facilitate distributed shared-memory simulations for larger scales. Despite these advances, parallel implementations face limitations in load balancing, particularly on irregular graphs where vertices have varying degrees, leading to uneven processor workloads during edge scans and queue operations that can degrade speedup beyond moderate core counts.[63]
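The partition-and-reduce pattern can be sketched in miniature as below. The per-partition scans here run sequentially for clarity; in a real implementation each partition would be handled by its own thread, GPU block, or MPI rank, with the global minimum obtained by a reduction such as MPI_Allreduce:

```python
def parallel_prim_step(W, key, parent, in_tree, n_workers=4):
    """One iteration of data-parallel Prim's on a dense weight matrix W.
    Each 'worker' scans its own vertex partition for the minimum-key
    vertex (simulated sequentially here); the results are then combined
    by a reduction, and each partition relaxes its own keys."""
    n = len(W)
    chunk = (n + n_workers - 1) // n_workers
    # map: each partition reports its local minimum-key vertex
    local_minima = []
    for w_id in range(n_workers):
        lo, hi = w_id * chunk, min((w_id + 1) * chunk, n)
        best = min((v for v in range(lo, hi) if not in_tree[v]),
                   key=lambda v: key[v], default=None)
        if best is not None:
            local_minima.append(best)
    # reduce: global minimum over the per-partition candidates
    u = min(local_minima, key=lambda v: key[v])
    in_tree[u] = True
    # relax step: in parallel code, each worker updates its own rows
    for v in range(n):
        if not in_tree[v] and W[u][v] < key[v]:
            key[v], parent[v] = W[u][v], u
    return u
```

Repeating this step $n$ times yields the MST; the sequential bottleneck is the single reduction per iteration, which is what the phase-based Borůvka hybrids described above amortize across many components.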

Distributed Computing Adaptations

One prominent distributed adaptation of Prim's algorithm is the Gallager-Humblet-Spira (GHS) algorithm, which constructs a minimum spanning tree (MST) in a message-passing network by maintaining fragments—subtrees that are themselves MSTs of their induced subgraphs—and merging them iteratively. Influenced by Prim's sequential growth of a single tree from an initial fragment, GHS distributes this process across multiple fragments, each starting as a single node, and uses convergecast operations to identify and accept the minimum-weight outgoing edge connecting fragments, ensuring the invariant that added edges preserve the MST property. This approach avoids a central coordinator by electing fragment leaders to coordinate merges via broadcasts and directed message floods along fragment trees. The protocol begins with each node as its own fragment and leader, initiating local growth akin to Prim's by exchanging edge weights with neighbors to test for acceptance into larger fragments. Synchronization occurs through broadcasts to propagate accept and reject messages, while convergecasts aggregate minimum edge information up the fragment tree to the leader, enabling decisions on merges without global knowledge. In the asynchronous CONGEST model, the algorithm proceeds in phases of increasing fragment levels, with each merge reducing the number of fragments until a single MST remains. The original GHS algorithm exhibits a message complexity of O(n^2) in the worst case for dense graphs, primarily due to the repeated broadcasts and convergecasts across fragments. Improvements via pipelining, as in the controlled GHS variant, overlap phase executions to achieve O(m + n log^2 n) messages while maintaining correctness, reducing overhead in networks with moderate edge density. 
In wireless sensor networks (WSNs), these adaptations construct minimal energy spanning trees for efficient data aggregation and routing, where edge weights represent transmission costs to minimize total power consumption.[65] For instance, the dynamic GHS extension handles node failures by locally repairing fragments through re-initiation of merge phases, preserving connectivity without full recomputation.[65] Key challenges in decentralized settings include managing asynchrony, where message delays can violate phase ordering, addressed in GHS by edge classification rules (blue for intra-fragment, red for inter-fragment tests) to ensure safe acceptance. Failure handling requires extensions like dynamic leader re-election and fragment rebuilding, increasing message overhead but enabling resilience in unreliable environments such as WSNs.[65]

References
