Recent from talks
Nothing was collected or created yet.
Persistent homology
View on WikipediaIn topological data analysis, persistent homology is a method for computing topological features of a space at different spatial resolutions. More persistent features are detected over a wide range of spatial scales and are deemed more likely to represent true features of the underlying space rather than artifacts of sampling, noise, or particular choice of parameters.[1]
To find the persistent homology of a space, the space must first be represented as a simplicial complex. A distance function on the underlying space corresponds to a filtration of the simplicial complex, that is a nested sequence of increasing subsets. One common method of doing this is via taking the sublevel filtration of the distance to a point cloud, or equivalently, the offset filtration on the point cloud and taking its nerve in order to get the simplicial filtration known as Čech filtration.[2] A similar construction uses a nested sequence of Vietoris–Rips complexes known as the Vietoris–Rips filtration.[3]
Definition
[edit]Formally, consider a real-valued function on a simplicial complex that is non-decreasing on increasing sequences of faces, so whenever is a face of in . Then for every the sublevel set is a subcomplex of K, and the ordering of the values of on the simplices in (which is in practice always finite) induces an ordering on the sublevel complexes that defines a filtration
When , the inclusion induces a homomorphism on the simplicial homology groups for each dimension . The persistent homology groups are the images of these homomorphisms, and the persistent Betti numbers are the ranks of those groups.[4] Persistent Betti numbers for coincide with the size function, a predecessor of persistent homology.[5]
Any filtered complex over a field can be brought by a linear transformation preserving the filtration to so called canonical form, a canonically defined direct sum of filtered complexes of two types: one-dimensional complexes with trivial differential and two-dimensional complexes with trivial homology .[6]
A persistence module over a partially ordered set is a set of vector spaces indexed by , with a linear map whenever , with equal to the identity and for . Equivalently, we may consider it as a functor from considered as a category to the category of vector spaces (or -modules). There is a classification of persistence modules over a field indexed by : Multiplication by corresponds to moving forward one step in the persistence module. Intuitively, the free parts on the right side correspond to the homology generators that appear at filtration level and never disappear, while the torsion parts correspond to those that appear at filtration level and last for steps of the filtration (or equivalently, disappear at filtration level ).[7][6]
Each of these two theorems allows us to uniquely represent the persistent homology of a filtered simplicial complex with a persistence barcode or persistence diagram. A barcode represents each persistent generator with a horizontal line beginning at the first filtration level where it appears, and ending at the filtration level where it disappears, while a persistence diagram plots a point for each generator with its x-coordinate the birth time and its y-coordinate the death time. Equivalently the same data is represented by Barannikov's canonical form,[6] where each generator is represented by a segment connecting the birth and the death values plotted on separate lines for each .
Stability
[edit]Persistent homology is stable in a precise sense, which provides robustness against noise. The bottleneck distance is a natural metric on the space of persistence diagrams given by where ranges over bijections. A small perturbation in the input filtration leads to a small perturbation of its persistence diagram in the bottleneck distance. For concreteness, consider a filtration on a space homeomorphic to a simplicial complex determined by the sublevel sets of a continuous tame function . The map taking to the persistence diagram of its th homology is 1-Lipschitz with respect to the -metric on functions and the bottleneck distance on persistence diagrams. That is, .[8]
Computation
[edit]The principal algorithm is based on the bringing of the filtered complex to its canonical form by upper-triangular matrices and runs in worst-case cubical complexity in the number of simplices.[6] The fastest known algorithm for computing persistent homology runs in matrix multiplication time.[9]
Since the number of simplices is highly relevant for computation time, finding filtered simplicial complexes with few simplexes is an active research area. Several approaches have been proposed to reduce the number of simplices in a filtered simplicial complex in order to approximate persistent homology.[10][11][12][13]
Software
[edit]There are various software packages for computing persistence intervals of a finite filtration.[14]
| Software package | Creator | Latest release | Release date | Software license[15] | Open source | Programming language | Features |
|---|---|---|---|---|---|---|---|
| OpenPH | Rodrigo Mendoza-Smith, Jared Tanner | 0.0.1 | 25 April 2019 | Apache 2.0 | Yes | Matlab, CUDA | GPU acceleration |
| javaPlex | Andrew Tausz, Mikael Vejdemo-Johansson, Henry Adams | 4.2.5 | 14 March 2016 | Custom | Yes | Java, Matlab | |
| Dionysus | Dmitriy Morozov | 2.0.10 | 27 Jun 2023 | Modified BSD | Yes | C++, Python bindings | |
| Perseus | Vidit Nanda | 4.0 beta | GPL | Yes | C++ | ||
| PHAT [16] | Ulrich Bauer, Michael Kerber, Jan Reininghaus | 1.4.1 | Yes | C++ | |||
| DIPHA | Jan Reininghaus | Yes | C++ | ||||
| Gudhi [17] | INRIA | 3.10.1 | 1 July 2024 | MIT/GPLv3 | Yes | C++, Python bindings | |
| CTL | Ryan Lewis | 0.2 | BSD | Yes | C++ | ||
| phom | Andrew Tausz | Yes | R | ||||
| TDA | Brittany T. Fasy, Jisu Kim, Fabrizio Lecci, Clement Maria, Vincent Rouvreau | 1.5 | 16 June 2016 | Yes | R | Provides R interface for GUDHI, Dionysus and PHAT | |
| Eirene | Gregory Henselman | 1.0.1 | 9 March 2019 | GPLv3 | Yes | Julia | |
| Ripser | Ulrich Bauer | 1.0.1 | 15 September 2016 | MIT | Yes | C++ | |
| Cubicle | Hubert Wagner | v0.8 beta | May 2018 | GPL | Yes | C++ | Handles large 3D and 2D grayscale images (scalar voxel data) |
| the Topology ToolKit | Julien Tierny, Guillaume Favelier, Joshua Levine, Charles Gueunet, Michael Michaux | 0.9.8 | 29 July 2019 | BSD | Yes | C++, VTK and Python bindings | |
| libstick | Stefan Huber | 0.2 | 27 November 2014 | MIT | Yes | C++ | |
| Ripser++ | Simon Zhang, Mengbai Xiao, and Hao Wang | 1.0 | March 2020 | MIT | Yes | CUDA, C++, Python bindings | GPU acceleration |
See also
[edit]References
[edit]- ^ Carlsson, Gunnar (2009). "Topology and data". Bulletin of the American Mathematical Society. 46 (2): 255–308. doi:10.1090/S0273-0979-09-01249-X.
- ^ Kerber, Michael; Sharathkumar, R. (2013). "Approximate Čech Complex in Low and High Dimensions". In Cai, Leizhen; Cheng, Siu-Wing; Lam, Tak-Wah (eds.). Algorithms and Computation. Lecture Notes in Computer Science. Vol. 8283. Berlin, Heidelberg: Springer. pp. 666–676. doi:10.1007/978-3-642-45030-3_62. ISBN 978-3-642-45030-3. S2CID 5770506.
- ^ Dey, Tamal K.; Shi, Dayu; Wang, Yusu (2019-01-30). "SimBa: An Efficient Tool for Approximating Rips-filtration Persistence via Simplicial Batch Collapse". ACM Journal of Experimental Algorithmics. 24: 1.5:1–1.5:16. doi:10.1145/3284360. ISSN 1084-6654. S2CID 216028146.
- ^ Edelsbrunner, H and Harer, J (2010). Computational Topology: An Introduction. American Mathematical Society.
- ^ Verri, A.; Uras, C.; Frosini, P.; Ferri, M. (1993). "On the use of size functions for shape analysis". Biological Cybernetics. 70 (2): 99–107. doi:10.1007/BF00200823. S2CID 39065932.
- ^ a b c d Barannikov, Sergey (1994). "Framed Morse complex and its invariants". Advances in Soviet Mathematics. 21: 93–115.
- ^ Zomorodian, Afra; Carlsson, Gunnar (2004-11-19). "Computing Persistent Homology". Discrete & Computational Geometry. 33 (2): 249–274. doi:10.1007/s00454-004-1146-y. ISSN 0179-5376.
- ^ Cohen-Steiner, David; Edelsbrunner, Herbert; Harer, John (2006-12-12). "Stability of Persistence Diagrams". Discrete & Computational Geometry. 37 (1): 103–120. doi:10.1007/s00454-006-1276-5. ISSN 0179-5376.
- ^ Morozov, Dmitriy; Skraba, Primoz (2024). "Persistent (Co)Homology in Matrix Multiplication Time". arXiv:2412.02591 [math.AT].
- ^ Sheehy, Donald R. (June 2013). "Linear-Size Approximations to the Vietoris–Rips Filtration". Discrete & Computational Geometry. 49 (4): 778–796. arXiv:1203.6786. doi:10.1007/s00454-013-9513-1.
- ^ Botnan, Magnus Bakke; Spreemann, Gard (March 2015). "Approximating persistent homology in Euclidean space through collapses". Applicable Algebra in Engineering, Communication and Computing. 26 (1–2): 73–101. arXiv:1403.0533. doi:10.1007/s00200-014-0247-y.
- ^ Choudhary, Aruni; Kerber, Michael; Raghvendra, Sharath (2021). "Improved approximate Rips filtrations with shifted integer lattices and cubical complexes". Journal of Applied and Computational Topology. 5 (3): 425–458. doi:10.1007/s41468-021-00072-4. MR 4298670. PMC 8549989. Preliminary version at ESA 2017, doi:10.4230/LIPIcs.ESA.2017.28.
- ^ Brun, Morten; Blaser, Nello (June 2019). "Sparse Dowker nerves". Journal of Applied and Computational Topology. 3 (1–2): 1–28. arXiv:1802.03655. doi:10.1007/s41468-019-00028-9.
- ^ Otter, Nina; Porter, Mason A; Tillmann, Ulrike; et al. (2017-08-09). "A roadmap for the computation of persistent homology". EPJ Data Science. 6 (1). Springer: 17. doi:10.1140/epjds/s13688-017-0109-5. ISSN 2193-1127. PMC 6979512. PMID 32025466.
- ^ Licenses here are a summary, and are not taken to be complete statements of the licenses. Some packages may use libraries under different licenses.
- ^ Bauer, Ulrich; Kerber, Michael; Reininghaus, Jan; Wagner, Hubert (2014). "PHAT – Persistent Homology Algorithms Toolbox". Mathematical Software – ICMS 2014. Springer Berlin Heidelberg. pp. 137–143. doi:10.1007/978-3-662-44199-2_24. ISBN 978-3-662-44198-5. ISSN 0302-9743.
- ^ Maria, Clément; Boissonnat, Jean-Daniel; Glisse, Marc; et al. (2014). "The Gudhi Library: Simplicial Complexes and Persistent Homology". Mathematical Software – ICMS 2014 (PDF). Berlin, Heidelberg: Springer. pp. 167–174. doi:10.1007/978-3-662-44199-2_28. ISBN 978-3-662-44198-5. ISSN 0302-9743. S2CID 17810678.
Persistent homology
View on GrokipediaIntroduction
Definition
Persistent homology is a method in topological data analysis that captures the evolution of topological features, such as connected components, loops, and voids, across multiple scales in data. It achieves this by computing a sequence of homology groups associated with a filtered topological space, where features are tracked by their "birth" and "death" times as the filtration parameter increases. This allows for distinguishing persistent, significant structures from transient noise in noisy or high-dimensional data.[4][5] Formally, persistent homology takes as input a filtered simplicial complex—a nested sequence of simplicial complexes , where each is a subcomplex of the final complex —and produces a persistence module. The persistence module consists of the homology groups for each dimension and filtration index , connected by induced homomorphisms from the inclusions for . These modules encode the multi-scale topology by quantifying how long individual homology classes persist under the filtration.[4][5] A common application is to point cloud data, where the Vietoris-Rips filtration constructs simplicial complexes by connecting points within a varying radius : at small , isolated points form 0-dimensional features (clusters as connected components); as increases, these merge, and higher-dimensional features like 1-dimensional loops emerge when points outline cycles. This process detects clusters at fine scales and loops at coarser scales, providing insight into the underlying shape of the data.[6] The results are often visualized using persistence barcodes, where each topological feature corresponds to a horizontal bar whose left endpoint marks its birth time and right endpoint its death time in the filtration parameter. Long bars represent persistent features of interest, such as stable clusters or loops, while short bars indicate noise that appears and disappears quickly.[4]Historical Context
The roots of persistent homology trace back to the 1990s, emerging from advancements in algebraic topology and computational geometry aimed at analyzing shapes and data at multiple scales. Early precursors included size functions introduced by Patrizio Frosini and Massimo Ferri in 1992, which captured topological changes in shapes under deformations, and experimental topology inference methods explored by Vanessa Robins in 1999 for point cloud data.[7][8] These developments laid the groundwork for handling noisy, multi-scale data structures in computational settings. A pivotal milestone occurred in 2000, when Herbert Edelsbrunner, David Letscher, and Afra Zomorodian introduced persistent homology in their seminal work, providing an algebraic framework to track topological features across scales in filtrations of simplicial complexes.[9] This innovation, formalized in their 2002 journal publication, enabled the computation of persistent topological invariants like holes and voids that persist over varying resolutions, revolutionizing topological data analysis (TDA). Influential figures such as Edelsbrunner, Zomorodian, and later Gunnar Carlsson drove this field forward; Carlsson, in particular, integrated persistent homology into the broader TDA framework through his 2009 survey, emphasizing its role in extracting robust topological signals from complex datasets.[10] Post-2010, the field experienced rapid growth in applications across sciences, fueled by stability theorems that ensured robustness to noise, with Carlsson and collaborators advancing persistence modules and multidimensional extensions around 2005–2009. By the mid-2010s, persistent homology had become a cornerstone of TDA, with widespread adoption in areas like materials science and biology. As of 2025, recent advancements include extensions to multiparameter persistence, enabling analysis of data with multiple filtering parameters, and integrations with machine learning, such as persistence-based feature representations for neural networks.[11] These developments, exemplified by tools like the multipers package for multiparameter computations in ML pipelines, continue to expand the method's applicability to high-dimensional, real-world data challenges.[12]Mathematical Foundations
Homology Basics
A simplicial complex is a topological space constructed by gluing together simplices—such as points (0-simplices), line segments (1-simplices), triangles (2-simplices), and higher-dimensional analogues—along their faces, ensuring that the intersection of any two simplices is either empty or a common face.[13] More formally, in the context of Δ-complexes, a simplicial complex consists of a collection of simplices with an ordering of vertices and a characteristic map σ_α: Δ^n → X for each n-simplex, where Δ^n denotes the standard n-simplex, satisfying compatibility conditions on the faces.[13] This structure provides a combinatorial framework for studying the topology of spaces like polyhedra or manifolds via algebraic invariants.[13] To compute homology, one first forms the associated chain complex. The k-chains C_k form a free abelian group generated by the oriented k-simplices of the complex, and the boundary operator ∂k: C_k → C{k-1} is a group homomorphism defined on a generator σ = [v_0, v_1, ..., v_k] by where the hat denotes omission of the i-th vertex, extended linearly to all chains.[13] These operators satisfy ∂_{k-1} ∘ ∂_k = 0 for all k, making (..., C_k, ∂_k, ...) a chain complex.[13] The homology groups are then defined as H_k = ker(∂k) / im(∂{k+1}), where ker(∂k) consists of k-cycles (chains with zero boundary) and im(∂{k+1}) consists of k-boundaries (boundaries of (k+1)-chains).[13] The Betti numbers β_k = rank(H_k) quantify the number of independent k-dimensional "holes" in the space, with β_0 counting connected components and higher β_k detecting voids.[13] For example, consider the circle S^1 modeled as a Δ-complex with one 0-simplex v and one 1-simplex e looping from v to itself, so ∂_1(e) = v - v = 0.[13] Then C_0 = ℤ⟨v⟩ ≅ ℤ, C_1 = ℤ⟨e⟩ ≅ ℤ, and higher C_k = 0, yielding H_0(S^1) ≅ ℤ (one component) and H_1(S^1) ≅ ℤ (one loop), with H_k(S^1) = 0 for k > 1, so β_0 = 1 and β_1 = 1.[13] In contrast, the disk D^2, being contractible, has H_0(D^2) ≅ ℤ and H_k(D^2) = 0 for k > 0, reflecting the absence of holes.[13] Homology is functorial: a continuous map f: X → Y between spaces induces a chain map on their chain complexes, which in turn yields a group homomorphism f_: H_k(X) → H_k(Y) for each k, preserving composition (f ∘ g)_ = f_* ∘ g_* and the identity id_* = id.[13] This property ensures that homotopy equivalent spaces have isomorphic homology groups, making homology a topological invariant.[13]Persistent Homology Construction
Persistent homology is constructed by extending classical homology to filtered simplicial complexes, capturing the evolution of topological features across scales. A filtered simplicial complex consists of a nested sequence of subcomplexes for , where parameterizes the filtration, often derived from a distance threshold in point cloud data. For instance, the Vietoris-Rips complex builds simplices from points within distance , ensuring the filtration is increasing as grows.[5][14] The core structure is the persistence module, a family of vector spaces indexed by the filtration parameter , equipped with induced homomorphisms for , which are inclusions preserving homology classes that survive in the filtration. These modules formalize how topological features, such as connected components or loops, persist or emerge as the complex grows. Over a field, the persistence module decomposes uniquely into a direct sum of interval modules, each corresponding to a feature that "births" at parameter and "dies" at , represented as shifted appropriately.[14] The rank invariant provides a way to track feature persistence, defined as , which counts the number of -dimensional features alive at scale that persist to scale . This invariant fully characterizes the module and enables comparison of topological changes without explicit decomposition.[5][14] A representative example is the persistent homology of a point cloud sampled from a figure-eight curve, which has two 1-dimensional loops at small scales. As the filtration parameter increases via the Vietoris-Rips complex, each loop births separately around the parameter corresponding to the curve's radius; further increase connects the loops at the crossing point, causing one to die while the other persists, reflecting the merger into a single effective loop.[14]Computational Methods
Algorithms for Computation
The standard algorithm for computing persistent homology relies on reducing the boundary matrix of a filtered simplicial complex to identify persistent topological features. The boundary operator from dimension to is represented as an matrix over a prime field, where rows and columns correspond to simplices ordered by increasing filtration values. Column reduction via Gaussian elimination transforms into a column-echelon form, where each pivot column marks the birth of a homology class at its filtration index, and the index of the column that reduces it to zero indicates its death. This dualization of the boundary matrix—processing columns from left to right—enables tracking of persistence intervals (birth-death pairs) without computing the full Smith normal form, as non-pivot columns correspond to cycles that persist until killed by later boundaries. To enhance efficiency, the reduction incorporates "clear" and "optimize" steps. The clear step eliminates entries above pivots in non-pivot columns using row additions, removing unnecessary dependencies and preventing fill-in during subsequent reductions. The optimize step further reduces entries below and to the right of pivots in the same column, minimizing matrix density and avoiding redundant operations in higher dimensions. These optimizations handle low-high cancellations (where a low-index simplex kills a high-index one) and ensure the algorithm runs in time for a complex with simplices, with practical performance improved by sparse matrix storage. In practice, persistent homology is often computed on Vietoris-Rips complexes derived from point cloud data, where the filtration is parameterized by a radius . Simplices are added incrementally: vertices at , edges when the distance between endpoints is at most , triangles when all pairwise distances are at most , and so on for higher dimensions. This construction uses the pairwise distance matrix to identify cliques, with sparse representations (e.g., adjacency lists) to store only non-zero boundary entries, avoiding dense matrices for large point sets. The process builds the complex as a sequence of nested subcomplexes, allowing the boundary matrix to be assembled and reduced on-the-fly.[15] For fixed topological dimension , the computation achieves linear time in the number of simplices , as the boundary matrices have bounded bandwidth and the reduction can exploit this structure for operations per dimension. Specifically, for dimension 0 homology (connected components), union-find data structures track merges during edge additions in nearly linear time , where is the inverse Ackermann function and is the number of points, providing a fast baseline before higher-dimensional reductions.[16] As an illustrative example, consider a point cloud of three points , , and in forming an equilateral triangle with side length 1. The Vietoris-Rips filtration begins at with three 0-simplices (vertices), yielding three components in born at 0. As increases to 0.5 (half the side length), edges , , and enter simultaneously, merging components: union-find first connects and (killing one class at birth 0, death 0.5), then and (killing another at birth 0, death 0.5), leaving one persistent component. At the same , the three edges create a 1-cycle, birthing an class, but the 2-simplex enters immediately after (in the ordering of same-filtration simplices by increasing dimension), and its boundary—the sum of the three edges over —kills this cycle. Thus, the class births and dies at 0.5, resulting in zero persistence (a transient feature typically filtered as noise). To compute this, the boundary matrix reduction for dimension 1 incorporates columns for both the edges (to vertices) and the 2-simplex (to edges); full reduction pairs the cycle with the 2-simplex, yielding three pivots and no unpaired persistent cycle, confirming the absence of a long-lived 1-dimensional feature.[15]Persistence Diagrams and Barcodes
Persistence diagrams serve as a fundamental representation of the results from persistent homology, encoding the birth and death times of topological features across a filtration. Formally, a persistence diagram is a multiset of points in the plane consisting of off-diagonal points where , each representing a homological class that is born at filtration parameter and dies at , along with the diagonal line itself, which accounts for infinitely many points of zero persistence. Essential features that persist indefinitely are represented by points . This multiset structure arises directly from the algebraic decomposition of the persistence module into interval modules, providing a complete invariant of the module over a field.[4][17] Barcodes offer an intuitive graphical alternative to persistence diagrams, visualizing the same information as a collection of horizontal line segments for each persistent feature, typically grouped vertically by homological dimension. In a barcode for dimension , each segment's length quantifies the persistence of a -dimensional hole or component, with longer bars indicating robust topological signals amid noise. Like diagrams, barcodes derive from the interval decomposition of the persistence module, establishing a direct bijection: each interval module corresponds to a unique bar or point. This representation facilitates visual inspection of feature lifetimes without altering the underlying topological content.[18][17] The conversion between persistence diagrams and barcodes is straightforward and lossless, as both are isomorphic encodings of the persistence module's canonical form over a principal ideal domain like a field. Specifically, the multiset of birth-death pairs in the diagram maps one-to-one with the set of interval endpoints defining the barcode segments. This equivalence ensures that analyses performed on one representation translate seamlessly to the other.[17] Persistence diagrams exhibit stability as representations, remaining unchanged under basis changes in the homology vector spaces, since they capture the unique graded module structure invariant to such transformations. This invariance stems from the algebraic classification of persistence modules into direct sums of interval modules, independent of choices in homological bases. For example, consider a point cloud sampled from a noisy circle; the barcode in dimension 1 reveals a prominent long bar corresponding to the central hole's birth near the minimal sampling scale and death only at a much larger scale where the noise merges the boundary, while shorter bars reflect transient artifacts.[17][19][18]Stability and Robustness
Stability Theorems
Stability theorems in persistent homology establish the robustness of topological features extracted from data under small perturbations, ensuring that the method reliably captures persistent structures rather than transient noise. A key result is the bottleneck stability theorem for persistence diagrams derived from continuous tame functions on a triangulable topological space . Specifically, for two such functions , the bottleneck distance between their persistence diagrams satisfies where is the bottleneck distance, defined as the infimum over all bijections between the diagrams of the supremum of for points in the diagrams (including the diagonal projections). For point cloud data, this stability extends to Vietoris-Rips filtrations, which are commonly used to approximate the topology of metric spaces. If and are finite subsets of a metric space with Hausdorff distance , the bottleneck distance between the persistence diagrams of their Vietoris-Rips complexes is bounded by . This bound arises because small perturbations in the point cloud induce interleavings in the corresponding persistence modules, with the factor of 2 stemming from the scaling properties of the Vietoris-Rips construction.[20] At a more abstract level, algebraic stability theorems address the stability of persistence modules themselves. For two tame persistence modules and over that are strongly -interleaved—meaning there exist module morphisms and that compose to within of the identity—the bottleneck distance between their persistence diagrams satisfies . This result unifies the geometric stability theorems by showing that interleaving distance on modules directly controls diagram perturbations.[21] These theorems have profound implications for topological data analysis: unlike single-scale homology, which can drastically change under noise, persistent homology identifies features that endure across scales, with perturbations causing only controlled shifts in birth and death times. This robustness enables the detection of true underlying topology in noisy datasets. Proofs of these stability results typically proceed by constructing near-isometries between the filtrations of the original and perturbed inputs. For instance, in the algebraic setting, -interleaving induces a chain of module maps that preserve homology ranks up to shifts, allowing the application of the algebraic isometry theorem, which equates the bottleneck distance to the minimal interleaving distance. Similar ideas underpin the geometric cases, where perturbations translate to bounded shifts in simplex inclusions across the filtration.[21]Distance Measures
In persistent homology, distance measures quantify the similarity between persistence diagrams, enabling the comparison of topological features across datasets. These metrics are essential for tasks such as shape classification, data perturbation analysis, and statistical inference in topological data analysis. Common distances account for the multi-set nature of diagrams by allowing partial matchings to the diagonal (representing ephemeral features) and full bijections between off-diagonal points. The bottleneck distance is a fundamental metric defined as , where and are persistence diagrams, ranges over all partial bijections (matchings) between the points of and (with unmatched points paired to their projections on the diagonal), and is the supremum norm in the plane.[22] This distance captures the largest deviation in a optimal matching, making it sensitive to prominent topological differences but robust to minor noise. It is computed efficiently using the Hungarian algorithm adapted for the bottleneck assignment problem, which finds the minimum maximum-cost matching in time for diagrams with points. The Wasserstein distance, also known as the earth mover's distance in this context, generalizes the bottleneck by minimizing the total transport cost: for , , where the infimum is again over partial bijections and unmatched points are transported to the diagonal at cost proportional to their persistence.[23] As , converges to the bottleneck distance. This metric, rooted in optimal transport theory, better reflects cumulative differences across all features and is particularly useful for averaging diagrams or handling varying feature densities. Computation involves solving a linear program formulation of the transportation problem, typically in time using simplex or interior-point methods, though approximations accelerate large-scale applications. To facilitate statistical analysis and machine learning integration, persistence diagrams can be summarized as one-dimensional functions, such as persistence landscapes and silhouettes, which map to Hilbert or Banach spaces for straightforward distance computations like or norms. A persistence landscape is constructed by ordering the "tent" functions (piecewise-linear peaks) derived from each birth-death interval in the diagram and stacking them as , where is the rank invariant and indexes the landscape layer; the full landscape is the sequence .[24] Similarly, a silhouette is a weighted average of these tent functions, with weights emphasizing persistent features, where are the tents and controls the weighting.[25] These summaries preserve stability properties and allow easy comparison via functional norms, often outperforming direct diagram distances in high-dimensional or noisy settings. For instance, consider two point clouds: one forming a clean circle (yielding a prominent 1D loop in its persistence diagram) and a perturbed version with added noise (introducing short-lived features). The bottleneck distance would highlight the preservation of the main loop's birth-death coordinates, remaining small if perturbations are minor, while the 2-Wasserstein distance would quantify the total displacement of noise-induced points to the diagonal, providing a more nuanced measure of overall topological fidelity. These distances underpin stability theorems, ensuring that small input perturbations yield bounded changes in diagrams.[22]Applications
Topological Data Analysis
Topological data analysis (TDA) leverages persistent homology to uncover robust, multi-scale topological features in datasets, providing insights into the underlying shape that traditional methods may overlook. The standard TDA workflow starts with a point cloud derived from the data, such as coordinates from sensors or pixels in an image. A filtration is then constructed, typically via the Vietoris-Rips complex, where simplices are added based on increasing distance thresholds, creating a nested sequence of topological spaces. Persistent homology is computed across this filtration, tracking the birth and death of homology classes—such as 0-dimensional connected components, 1-dimensional loops, and higher-dimensional voids—resulting in persistence diagrams that encode these lifetimes. Analysis of these diagrams enables shape inference by identifying persistent features that survive noise and deformations, distinguishing signal from artifact.[26][27] Feature detection in TDA often employs summaries of persistence diagrams, such as Betti curves and persistence landscapes, to facilitate tasks like clustering and anomaly detection. Betti curves plot the persistent Betti numbers—counting the number of k-dimensional holes at each filtration scale—offering a simple visualization of topological evolution that highlights clusters (as short-lived 0-dimensional features) or loops (persistent 1-dimensional features) for grouping similar data points. Persistence landscapes transform diagrams into piecewise-linear functions in a normed vector space, enabling stable averaging across samples and detection of anomalies by measuring deviations in landscape norms from expected topologies. These representations integrate seamlessly with machine learning pipelines, where, for example, landscape-based distances can cluster datasets by topological similarity or flag outliers via elevated persistence in rare features.[26][24] A practical case study of persistent homology in TDA arises in sensor networks, where point clouds of sensor positions are filtered to compute homology, revealing voids (2-dimensional holes) that indicate coverage gaps and connectivity (0- and 1-dimensional features) essential for network reliability. In this approach, the persistence of higher-dimensional holes quantifies uncovered regions, guiding sensor placement to minimize topological defects while ensuring robust communication paths. Similarly, for image analysis, pixel intensities or edge points form point clouds subjected to filtration; persistent homology then detects voids and connectivity patterns, such as separating foreground objects from background noise, even in grayscale images with added perturbations. These applications demonstrate how diagram analysis infers global structure, like hole persistence corresponding to enclosed regions in the image.[28][29] Integration of persistent homology with statistics in TDA supports rigorous inference through hypothesis testing on persistence diagrams, often via vectorized summaries. For example, two-sample tests compare vectorized diagrams—such as those from persistence landscapes—using permutation procedures to assess whether topological features differ significantly between groups, with p-values derived from distribution under the null hypothesis of identical shapes. This framework, stable under perturbations, enables applications like testing for topological changes in evolving datasets, where landscape averages serve as test statistics. Such methods extend classical statistics by incorporating topological invariants, providing p-values that quantify evidence for alternative hypotheses like differing connectivity.[24][30] A key limitation of persistent homology in TDA is the curse of dimensionality, where high-dimensional point clouds lead to unreliable persistence diagrams due to increased noise sensitivity and proliferation of short-lived, spurious features that obscure true topology. As ambient dimension grows, the expected persistence of genuine holes diminishes, exacerbating computational costs in filtration construction and homology computation, often rendering analysis impractical without preprocessing. Strategies like random projections or manifold learning are employed to reduce dimensions while preserving essential topological signals, though they introduce trade-offs in feature fidelity.[31]Broader Uses in Science and Engineering
In biology, persistent homology has been instrumental in analyzing protein structures by identifying topological features such as loops and cavities that persist across different scales of molecular data. For instance, it enables the detection of flexible regions in proteins by tracking the birth and death of homology classes in point cloud representations of atomic coordinates, revealing insights into folding dynamics and functional motifs. This approach, advanced in the 2010s, has improved the characterization of biomolecular flexibility compared to traditional geometric methods, with applications in predicting protein-ligand interactions.[32][33] In materials science, persistent homology characterizes the topology of porous structures and detects phase transitions in complex materials by quantifying the persistence of voids and connectivity patterns. Applied to 3D imaging data of porous media, it provides robust descriptors of pore geometry that correlate with fluid flow properties, outperforming conventional porosity metrics in heterogeneous samples. For phase transitions, it identifies critical points where topological features like loops emerge or vanish, offering a scale-invariant way to distinguish ordered from disordered states in spin models and alloys.[34][35] Neuroscience leverages persistent homology to map brain connectivity from fMRI data, capturing higher-order topological structures in functional networks that reveal synchronization loops and community overlaps. By computing persistence diagrams on correlation matrices, it quantifies the robustness of network motifs across subjects, aiding in the differentiation of healthy brains from those with mild cognitive impairment. This method highlights persistent cycles in resting-state data, providing interpretable features for understanding information flow in neural circuits.[36][37] In engineering, persistent homology supports path planning in robotics by analyzing sensor data to identify persistent voids in uncertain environments, enabling safer navigation through probabilistic occupancy maps. It detects cycles and holes in graph representations of sensor readings, guiding trajectory optimization that avoids obstacles over multiple scales. Recent advancements as of 2025 extend persistent homology to quantum data analysis, where it classifies topological phases in many-body systems via quantum barcodes that track persistent features in wavefunction filtrations.[38] In climate modeling, multiparameter persistence analyzes weather regimes by incorporating spatial scales via bifiltrations, identifying persistent structures in atmospheric data related to blocking patterns and regime transitions.[39]Implementations
Software Libraries
Several open-source software libraries have been developed to facilitate the computation of persistent homology, providing efficient implementations for various simplicial complexes and filtration methods. These libraries are primarily written in C++ for performance, often with Python bindings for accessibility, and support key algorithms such as matrix reduction for persistence computation.[40][41][42] The GUDHI (Geometry Understanding in Higher Dimensions) library is a comprehensive C++ framework with a Python interface, designed for topological data analysis. It supports the construction of Vietoris-Rips complexes from point clouds or distance matrices and alpha complexes via Delaunay triangulations, enabling persistent homology computations up to moderate dimensions. GUDHI is particularly efficient for large datasets, leveraging optimized data structures like simplex trees for filtration and persistence calculations.[43][44] Ripser is a lightweight C++ library with Python bindings, specializing in the efficient computation of Vietoris-Rips persistence barcodes through an optimized matrix reduction algorithm. Released in 2017, it excels in moderate dimensions, particularly for dimensions 0 and 1 (H_0 and H_1), where it achieves high speed on point clouds with thousands of points by minimizing memory usage and avoiding explicit simplex enumeration.[45][46] Perseus is a versatile program for computational topology, including persistent homology, developed for platforms like Linux, Windows, and macOS. It implements algorithms for constructing simplicial complexes from point clouds (e.g., Vietoris-Rips or alpha shapes) and computes persistence diagrams, with a focus on robustness for moderate-sized datasets and support for higher dimensions. Perseus is particularly useful for exploratory analysis due to its command-line interface and built-in visualization options.[47] Dionysus is a modular C++ library for persistent homology, offering flexibility for custom filtrations through its chain complex and boundary matrix representations. It supports a range of input types, including point clouds for Vietoris-Rips or alpha complexes, and includes tools for computing persistence diagrams and related summaries like persistence landscapes from the output. This modularity makes it suitable for extending to non-standard filtrations in research settings.[41][48] Comparisons of these libraries highlight trade-offs in performance and features: GUDHI offers the most versatility for diverse complex types and large-scale applications, while Ripser provides superior speed for low-dimensional Vietoris-Rips computations on H_0 and H_1, often outperforming others on benchmarks with up to 10,000 points. Dionysus and Perseus stand out for their extensibility in custom scenarios but may lag in raw speed for very large inputs compared to Ripser or GUDHI.[15][49] GUDHI can be installed via pip with the commandpip install gudhi, requiring a C++ compiler for the underlying bindings. A basic usage example for computing persistent homology on a point cloud involves creating a Rips complex and extracting the persistence pairs, as shown below:
import gudhi as gd
import numpy as np
# Example point cloud in 2D
points = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
# Create Vietoris-Rips complex
rc = gd.RipsComplex(points=points, max_edge_length=2.0)
st = rc.create_simplex_tree(max_dimension=1)
# Compute [persistence](/page/The_Persistence)
persistence = st.persistence()
# Output: list of (dimension, (birth, death)) pairs
print(persistence)
import gudhi as gd
import numpy as np
# Example point cloud in 2D
points = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
# Create Vietoris-Rips complex
rc = gd.RipsComplex(points=points, max_edge_length=2.0)
st = rc.create_simplex_tree(max_dimension=1)
# Compute [persistence](/page/The_Persistence)
persistence = st.persistence()
# Output: list of (dimension, (birth, death)) pairs
print(persistence)
