Persistent Homology
Tracking topological features across scales — the workhorse of topological data analysis
Overview & Motivation
Consider two point clouds with identical first- and second-order statistics — same mean, same variance, same covariance. One is sampled from an annulus (a ring), the other from a Gaussian cluster. Every classical statistical test says they’re the same. But one has a hole and the other doesn’t, and that difference can be the signal you need — whether you’re detecting anomalies in sensor networks, classifying dynamical regimes in time series, or finding structure in high-dimensional embeddings.
The problem is: what does “has a hole” even mean for a finite point cloud? Points are discrete; they don’t have holes in any obvious sense. The Vietoris-Rips construction (see Simplicial Complexes) gives us a way to build a continuous-ish structure from discrete data, but it introduces a new problem: the result depends on a scale parameter $\varepsilon$. Choose $\varepsilon$ too small and you see only isolated points. Too large and everything collapses into a single blob.
Persistent homology is the answer: instead of choosing a single scale, track the topology at every scale simultaneously. Features that persist across a wide range of $\varepsilon$ are real structure; features that flicker in and out are noise. This insight — that persistence is a proxy for significance — is what makes TDA useful for machine learning.
Not all complexes are created equal: the Vietoris-Rips complex can introduce phantom cycles — topological artifacts that do not correspond to real features of the data. The Čech complex avoids this via the Nerve Theorem.
Formal Framework
Filtrations
Definition 1.
A filtration of a simplicial complex $K$ is a nested sequence of subcomplexes:
$$\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n = K$$
Each $K_i$ is a simplicial complex, and each inclusion $K_i \hookrightarrow K_{i+1}$ adds one or more simplices.
For the Vietoris-Rips complex, the filtration is parameterized by $\varepsilon$: as $\varepsilon$ increases from $0$ to $\infty$, simplices enter the complex at their birth time (the minimum $\varepsilon$ at which they appear). An edge $[u, v]$ is born at $\varepsilon = d(u, v)$. A triangle $[u, v, w]$ is born at $\varepsilon = \max\{d(u,v),\, d(u,w),\, d(v,w)\}$.
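These birth times can be computed directly from the distance matrix. A minimal sketch, using three hypothetical points on a line (the coordinates are illustrative, not from the text):

```python
import numpy as np

# Three points on a line at coordinates 0, 1, 3
points = np.array([[0.0], [1.0], [3.0]])
dist = np.abs(points - points.T)  # 3x3 pairwise distance matrix

# An edge [u, v] is born when epsilon reaches d(u, v)
edge_birth = {(u, v): dist[u, v] for u in range(3) for v in range(u + 1, 3)}

# A triangle [u, v, w] is born at the max of its three edge lengths
tri_birth = max(dist[0, 1], dist[0, 2], dist[1, 2])

print(edge_birth)  # {(0, 1): 1.0, (0, 2): 3.0, (1, 2): 2.0}
print(tri_birth)   # 3.0
```

The same rule generalizes to higher simplices: a $k$-simplex is born at the largest pairwise distance among its vertices.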
Chain Groups and the Boundary Operator
To detect “holes,” we need algebra. The key idea: a loop is a collection of edges that forms a cycle, and a hole is a loop that isn’t the boundary of something filled in.
Definition 2.
The $k$-chain group $C_k(K)$ of a simplicial complex $K$ is the free abelian group (or $\mathbb{Z}_2$-vector space) generated by the $k$-simplices of $K$. A $k$-chain is a formal sum of $k$-simplices:
$$c = \sum_i a_i \sigma_i, \qquad a_i \in \mathbb{Z}_2.$$
Remark.
Working over $\mathbb{Z}_2$ (coefficients are $0$ or $1$, with $1 + 1 = 0$) simplifies everything: we don’t need to worry about orientations. Most TDA software uses $\mathbb{Z}_2$ coefficients. Over $\mathbb{Z}$, you get torsion information (e.g., distinguishing a Möbius band from a cylinder), but this is rarely needed in data analysis.
Definition 3.
The boundary operator $\partial_k \colon C_k \to C_{k-1}$ maps each $k$-simplex to the alternating sum of its $(k-1)$-dimensional faces:
$$\partial_k [v_0, \ldots, v_k] = \sum_{i=0}^{k} (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_k]$$
where $\hat{v}_i$ denotes omission. Over $\mathbb{Z}_2$, the signs disappear and this reduces to the sum of all $(k-1)$-faces.
Example 1.
For the triangle with vertices $v_0, v_1, v_2$, edges $[v_0,v_1], [v_0,v_2], [v_1,v_2]$, and the filled face $[v_0,v_1,v_2]$:
$\partial_2$: The boundary of the triangle is the sum of its edges: $\partial_2 [v_0,v_1,v_2] = [v_1,v_2] - [v_0,v_2] + [v_0,v_1]$
$\partial_1$: The boundary of each edge is the difference of its endpoints: $\partial_1 [v_i, v_j] = v_j - v_i$
Written as matrices (with $\mathbb{Z}_2$ coefficients):
$$\partial_1 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad \partial_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
The rows of $\partial_1$ correspond to vertices $v_0, v_1, v_2$; columns to edges $[v_0,v_1], [v_0,v_2], [v_1,v_2]$.
The following property is the engine of homology:
Theorem 1 (Fundamental Property of the Boundary Operator).
For all $k$:
$$\partial_{k-1} \circ \partial_k = 0$$
That is, the boundary of a boundary is always zero.
Proof.
It suffices to check on generators. Let $\sigma = [v_0, \ldots, v_k]$ be a $k$-simplex. We compute $\partial_{k-1}(\partial_k \sigma)$ by expanding both operators:
$$\partial_k \sigma = \sum_{i=0}^{k} (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_k]$$
Applying $\partial_{k-1}$ to each term, we omit a second vertex $v_j$ with $j \neq i$. For each term $(-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_k]$, the boundary is:
$$\sum_{j \neq i} (-1)^i (-1)^{j'} [v_0, \ldots, \hat{v}_i, \ldots, \hat{v}_j, \ldots, v_k]$$
where the exponent $j'$ depends on the position of $v_j$ in the face after $v_i$ has been removed: $j' = j$ if $j < i$, and $j' = j - 1$ if $j > i$.
Now consider a specific $(k-2)$-face $[v_0, \ldots, \hat{v}_p, \ldots, \hat{v}_q, \ldots, v_k]$ with $p < q$. It appears exactly twice in the double sum:
- First omit $v_p$, then omit $v_q$: sign is $(-1)^p (-1)^{q-1}$
- First omit $v_q$, then omit $v_p$: sign is $(-1)^q (-1)^{p}$
The sum of these two signs is:
$$(-1)^{p+q-1} + (-1)^{p+q} = 0$$
Every $(k-2)$-face appears exactly twice with opposite signs, so the entire sum vanishes. Over $\mathbb{Z}_2$, the argument is even simpler: each face appears twice, and $1 + 1 = 0$.
∎
Example 2.
We can verify this directly for the triangle. From Example 1:
$$\partial_1(\partial_2 [v_0,v_1,v_2]) = \partial_1([v_1,v_2] - [v_0,v_2] + [v_0,v_1]) = (v_2 - v_1) - (v_2 - v_0) + (v_1 - v_0) = 0$$
The boundary of the triangle is the cycle $[v_1,v_2] - [v_0,v_2] + [v_0,v_1]$, and the boundary of that cycle is zero — exactly because it closes up.
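The identity $\partial_1 \circ \partial_2 = 0$ can also be checked mechanically with the $\mathbb{Z}_2$ boundary matrices of the filled triangle. A minimal numpy sketch:

```python
import numpy as np

# Boundary matrices for the filled triangle over Z2.
# d1: rows = vertices v0, v1, v2; columns = edges [v0,v1], [v0,v2], [v1,v2]
d1 = np.array([[1, 1, 0],
               [1, 0, 1],
               [0, 1, 1]])
# d2: rows = edges (same order); single column = the face [v0,v1,v2]
d2 = np.array([[1], [1], [1]])

# The composite boundary must vanish mod 2
composite = (d1 @ d2) % 2
print(composite.ravel())  # [0 0 0]
```

Each entry of the product counts, mod 2, how often a vertex appears in the boundaries of the face's edges: always twice, hence zero.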
Homology Groups
The property $\partial_{k-1} \circ \partial_k = 0$ gives us two important subgroups:
Definition 4.
The cycle group is $Z_k = \ker \partial_k$ — chains with zero boundary (they “close up”).
The boundary group is $B_k = \operatorname{im} \partial_{k+1}$ — chains that are the boundary of something.
Since $\partial \circ \partial = 0$, every boundary is a cycle: $B_k \subseteq Z_k$.
Definition 5.
The $k$-th homology group of $K$ is the quotient:
$$H_k(K) = Z_k / B_k = \ker \partial_k / \operatorname{im} \partial_{k+1}$$
The $k$-th Betti number is $\beta_k = \dim H_k(K)$.
Homology counts holes that are not filled in:
- $\beta_0$ = number of connected components
- $\beta_1$ = number of loops (independent 1-cycles not bounding a filled region)
- $\beta_2$ = number of voids (enclosed cavities)
Example 3.
For the filled triangle $K$:
- $Z_1$ has rank 1 (one independent cycle: the loop around the triangle)
- $B_1$ has rank 1 (that loop is the boundary of the filled face)
- $\beta_1 = 0$ — no holes. The apparent loop is a boundary.
- $\beta_0 = 1$ — one connected component.
Now remove the 2-simplex (just the wireframe triangle, no fill):
- $Z_1$ still has rank 1
- $B_1$ now has rank 0 (no 2-simplices to bound anything)
- $\beta_1 = 1$ — one hole. The loop is no longer a boundary.
This is the key insight: the same cycle can be a hole or not, depending on whether there’s something to fill it.
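This comparison can be automated: over $\mathbb{Z}_2$, $\beta_k = \dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$, so computing Betti numbers reduces to matrix rank over GF(2). A sketch for the two triangles above (the `gf2_rank` helper is ours, not from any library; note that real-valued rank routines give the wrong answer over GF(2)):

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix over Z2, by Gaussian elimination mod 2."""
    A = (np.array(M) % 2).astype(int).copy()
    rank = 0
    rows, cols = A.shape
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if A[r, c] == 1), None)
        if pivot is None:
            continue
        A[[rank, pivot]] = A[[pivot, rank]]   # swap pivot row into place
        for r in range(rows):
            if r != rank and A[r, c] == 1:    # clear the rest of the column
                A[r] = (A[r] + A[rank]) % 2
        rank += 1
    return rank

# d1: vertices x edges; d2: edges x faces (filled triangle)
d1 = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
d2 = np.array([[1], [1], [1]])

# beta_1 = dim Z_1 - dim B_1 = (3 - rank d1) - rank d2
beta1_filled = (3 - gf2_rank(d1)) - gf2_rank(d2)
# Wireframe: no 2-simplices, so the B_1 term is 0
beta1_wire = (3 - gf2_rank(d1)) - 0
print(beta1_filled, beta1_wire)  # 0 1
```

The filled triangle has no hole; deleting the face creates one, exactly as in Example 3.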
Persistent Homology
Now we combine filtrations with homology. As $\varepsilon$ increases, new simplices enter the complex. Each new simplex either:
- Creates a topological feature (a new cycle appears that isn’t a boundary), or
- Destroys a topological feature (a cycle that was a hole gets filled in)
Definition 6.
Given a filtration $K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n$, persistent homology tracks, for each topological feature:
- Its birth time $b$: the filtration index where the feature first appears
- Its death time $d$: the filtration index where the feature is destroyed
- Its persistence: $d - b$, measuring how long the feature survives
A feature with $d = \infty$ (never destroyed) is called an essential feature.
Definition 7.
The persistence diagram is the multiset of points $(b, d) \in \mathbb{R}^2$ corresponding to birth-death pairs, together with infinitely many copies of every point on the diagonal $\{(x, x) : x \in \mathbb{R}\}$.
Equivalently, the persistence barcode represents each feature as a horizontal interval $[b, d)$.
Points far from the diagonal have high persistence — they represent features that survive across a wide range of scales. Points near the diagonal have low persistence — they are topological “noise,” features that appear and immediately disappear. The diagonal points are a technical convenience that makes the space of persistence diagrams well-behaved.
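In code, this signal-versus-noise split is a one-liner on the diagram. A sketch with a hypothetical $H_1$ diagram and an assumed cutoff of 0.1 (both purely illustrative):

```python
import numpy as np

# A hypothetical H1 persistence diagram: rows are (birth, death) pairs
dgm = np.array([
    [0.70, 1.60],   # long-lived loop: likely real structure
    [0.30, 0.34],   # short-lived: noise near the diagonal
    [0.55, 0.58],
    [0.90, 0.97],
])

pers = dgm[:, 1] - dgm[:, 0]   # lifetime of each feature
threshold = 0.1                # assumed noise cutoff for this example
signal = dgm[pers > threshold]

print(f"{len(signal)} significant feature(s) out of {len(dgm)}")  # 1 out of 4
```

In practice the cutoff is chosen from the gap in the persistence values, or replaced by the statistical machinery discussed on the Statistical TDA page.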
Remark.
There is a deep algebraic structure here that we’re glossing over: the inclusions $K_i \hookrightarrow K_j$ induce maps on homology $H_k(K_i) \to H_k(K_j)$. The persistence module is the entire sequence of these maps, and the Structure Theorem for persistence modules (a consequence of the classification of finitely generated modules over a PID) guarantees that this module decomposes into interval summands — which are exactly the bars in the barcode. This decomposition is unique and computable via matrix reduction.
The Matrix Reduction Algorithm
Persistent homology is defined algebraically, but it is computed via matrix reduction — a variant of Gaussian elimination applied to the boundary matrix. Understanding this algorithm demystifies what ripser and gudhi are actually doing.
Definition 9.
The filtered boundary matrix is the boundary matrix of the full complex $K_n$, with columns ordered by filtration time. Each column $j$ corresponds to a simplex $\sigma_j$, and the column entries encode the boundary $\partial \sigma_j$ over $\mathbb{Z}_2$. If $\sigma_j$ enters the filtration at time $t_j$, columns are ordered so that $t_1 \le t_2 \le \cdots \le t_m$.
The key operation is column reduction: for each column $j$ (processed left to right), find the lowest nonzero entry $\operatorname{low}(j)$ (the “pivot” or “low” of column $j$). If an earlier column $i < j$ has the same pivot row, add column $i$ to column $j$ (over $\mathbb{Z}_2$, addition is XOR). Repeat until no two columns share the same pivot.
Theorem 3 (Persistence from Reduced Matrix).
After column reduction of the filtered boundary matrix $D$ to its reduced form $R$:
- If column $j$ has pivot in row $i$ (i.e., $\operatorname{low}(j) = i$), then the simplex $\sigma_i$ is a creator and $\sigma_j$ is a destroyer. The pair $(\sigma_i, \sigma_j)$ records a topological feature born at time $t_i$ and dying at time $t_j$.
- If column $j$ is entirely zero after reduction, the simplex $\sigma_j$ creates a feature that is never destroyed — an essential feature with birth time $t_j$ and death time $\infty$.
- If row $i$ is never a pivot (never appears as $\operatorname{low}(j)$ for any $j$), then $\sigma_i$ does not participate in any birth-death pair as a creator — it is “invisible” to persistence.
Example 4.
Consider three points $v_0, v_1, v_2$ with distances $d(v_0, v_1) = 1$, $d(v_0, v_2) = 2$, $d(v_1, v_2) = 3$. The filtration adds simplices in order of birth time:
- $v_0, v_1, v_2$ at $t = 0$ (three vertices)
- $[v_0, v_1]$ at $t = 1$
- $[v_0, v_2]$ at $t = 2$
- $[v_1, v_2]$ at $t = 3$
The boundary matrix (rows = vertices $v_0, v_1, v_2$; columns = edges in filtration order):
$$\partial_1 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$
Reducing: Column 1 has low = row 1 ($v_1$). Column 2 has low = row 2 ($v_2$; different from column 1, so no reduction needed). Column 3 has low = row 2, conflicting with column 2. Add column 2 to column 3: new column 3 = $v_0 + v_1$, which has low = row 1, conflicting with column 1. Add column 1: the result vanishes. Column 3 is now zero.
Reading the persistence pairs:
- Column 1 (edge $[v_0, v_1]$, born at $t = 1$) has low = row 1 ($v_1$, born at $t = 0$): the edge merges $v_1$ into $v_0$’s component. This gives the $H_0$ pair $(0, 1)$.
- Column 2 (edge $[v_0, v_2]$, born at $t = 2$) has low = row 2 ($v_2$, born at $t = 0$): the edge merges $v_2$. This gives the $H_0$ pair $(0, 2)$.
- Column 3 (edge $[v_1, v_2]$, born at $t = 3$) is zero after reduction: this edge creates a 1-cycle that is never destroyed (no 2-simplices in our complex). This gives an essential $H_1$ feature born at $t = 3$.
- $v_0$ (row 0) is never a pivot target: it is the surviving component, essential with pair $(0, \infty)$.
Proposition 1 (Uniqueness of Persistence Pairing).
The persistence pairing produced by column reduction is unique: it does not depend on the order in which column operations are performed (as long as we only add earlier columns to later ones). This is because the set of pivot positions in the reduced matrix is an invariant of the persistence module, not of the particular reduction.
Remark.
The standard algorithm has complexity $O(m^3)$ where $m$ is the number of simplices. Ripser achieves dramatic speedups via four key optimizations: (1) implicit matrix representation (never store the full boundary matrix — generate columns on demand from the distance matrix), (2) the clearing optimization (when a column creates a death, clear its partner column without processing it), (3) computing cohomology instead of homology (the coboundary matrix is sparser for VR complexes), and (4) the apparent pairs shortcut (many birth-death pairs can be read off from the filtration structure without any matrix reduction). These reduce the practical running time to near-linear in the number of features detected.
Matrix Reduction Animator
Interactive visualization coming soon — step through the column reduction algorithm and watch birth-death pairs emerge.
Visual Intuition
The visualization below is the core interactive experience. The top panel shows a Vietoris-Rips complex built from points sampled near a circle. The bottom panels show the persistence diagram and barcode.
Drag the slider and watch all three panels update together:
Persistence Diagram
Persistence Barcode
What to notice:
- $H_0$ features (teal): At $\varepsilon = 0$, every point is its own connected component — 15 features are born simultaneously. As $\varepsilon$ increases, components merge. Each merge event kills a component: you see teal points appear in the persistence diagram with increasing death values. One component survives to $\varepsilon = \infty$ (the essential feature).
- $H_1$ features (purple): At a critical $\varepsilon$ around 0.7-0.9, the edges form a cycle around the point cloud but the interior hasn’t filled in yet — a loop is born. As $\varepsilon$ increases further, 2-simplices fill the interior, and the loop dies. The persistence (death minus birth) of this $H_1$ feature is the key topological signal: it tells you “this point cloud has a circular structure.”
- Signal vs. noise: The long-lived $H_1$ bar stands out from the short bars near the diagonal. That gap between persistent and ephemeral features is what makes TDA work in practice — it’s a built-in notion of statistical significance, grounded in geometry rather than distributional assumptions.
The Stability Theorem
The practical utility of persistent homology rests on a single theorem: small perturbations of the input data produce small changes in the persistence diagram. Without this, the method would be useless — any noise would scramble the output.
Definition 8.
The bottleneck distance between two persistence diagrams $D_1$ and $D_2$ is:
$$d_B(D_1, D_2) = \inf_{\gamma} \sup_{x \in D_1} \| x - \gamma(x) \|_\infty$$
where the infimum is over all bijections $\gamma \colon D_1 \to D_2$ (including diagonal points).
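For diagrams with a single off-diagonal point each, the infimum in Definition 8 can be evaluated by hand: either match the two points to each other, or send each to the diagonal at cost half its persistence. A numpy sketch of that two-case calculation (the specific points are illustrative; `persim.bottleneck` handles general diagrams):

```python
import numpy as np

# Two toy diagrams with one off-diagonal point each
p = np.array([0.2, 1.0])   # (birth, death) in D1
q = np.array([0.3, 0.9])   # (birth, death) in D2

# Option 1: match p to q directly (L-infinity cost)
cost_match = np.max(np.abs(p - q))
# Option 2: send each point to its nearest diagonal point, cost (d - b) / 2
cost_diag = max((p[1] - p[0]) / 2, (q[1] - q[0]) / 2)

d_bottleneck = min(cost_match, cost_diag)
print(f"d_B = {d_bottleneck:.2f}")   # 0.10: direct matching wins here
```

The diagonal copies in Definition 7 are exactly what make the second option a legal bijection.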
Theorem 2 (Stability Theorem (Cohen-Steiner, Edelsbrunner, Harer 2007)).
Let $f, g \colon X \to \mathbb{R}$ be tame functions on a topological space $X$. Then:
$$d_B(\mathrm{Dgm}(f), \mathrm{Dgm}(g)) \le \| f - g \|_\infty$$
The bottleneck distance between persistence diagrams is bounded by the $L^\infty$ distance between the functions that generate them.
Proof.
The full proof uses the theory of interleavings of persistence modules and is beyond our scope. The essential idea: if $\| f - g \|_\infty \le \varepsilon$, then the sublevel set filtrations of $f$ and $g$ are $\varepsilon$-interleaved — and vice versa — which forces a matching between their persistence diagrams that moves no point by more than $\varepsilon$. See Cohen-Steiner, Edelsbrunner & Harer (2007) for the complete argument.
∎
For point cloud data, the relevant corollary is:
Corollary 1.
Let $P$ and $Q$ be finite point clouds with $d_H(P, Q) \le \varepsilon$ (Hausdorff distance). Then:
$$d_B(\mathrm{Dgm}(P), \mathrm{Dgm}(Q)) \le \varepsilon$$
In plain language: if you jiggle each point by at most $\varepsilon$, the persistence diagram moves by at most $\varepsilon$. This is what makes persistent homology safe to use on noisy real-world data.
Extended Persistence
Ordinary persistent homology tracks features born and killed in a sublevel set filtration: as the scale parameter increases from $0$ to $\infty$, simplices enter and topological features appear and disappear. But some features are essential — they are born and never die. The most important essential feature is the single surviving connected component (the $H_0$ class with death $\infty$), but for spaces with nontrivial higher homology, essential features carry real topological information that ordinary persistence cannot capture.
Extended persistence (Cohen-Steiner, Edelsbrunner & Harer, 2009) solves this by appending a superlevel set filtration to the sublevel set filtration, creating a complete picture where every feature has a finite death time.
Definition 10.
Given a function $f \colon K \to \mathbb{R}$ on a simplicial complex $K$, the extended filtration consists of two phases:
- Sublevel phase: $K_t = f^{-1}(-\infty, t]$ for $t$ increasing from $-\infty$ to $+\infty$. Simplices enter as $t$ increases.
- Superlevel phase: the pairs $(K, K^t)$ with $K^t = f^{-1}[t, +\infty)$ for $t$ decreasing from $+\infty$ to $-\infty$. The complex is “peeled back” from above.
The two phases are concatenated to form a single filtration that begins with the empty set and ends with the empty set.
Definition 11.
The birth-death pairs in extended persistence fall into three classes:
- Ordinary pairs: the feature is born and killed during the sublevel phase. These are the same as standard persistent homology.
- Relative pairs: the feature is born during the superlevel phase and killed later in the superlevel phase. These capture topology “from above.”
- Extended pairs: the feature is born during the sublevel phase and killed during the superlevel phase. These are the features that are essential in ordinary persistence — extended persistence gives them finite death times.
Example 5.
Consider the height function $f(x, y, z) = z$ on a torus embedded upright in $\mathbb{R}^3$. The torus has $\beta_0 = 1$, $\beta_1 = 2$ (two independent loops — meridional and longitudinal), and $\beta_2 = 1$ (an enclosed void).
Sublevel phase: As $z$ increases from the bottom of the torus, we see the following: one component is born at the minimum (essential $H_0$). One $H_1$ feature is born at each of the two saddles as the handle forms; since the full torus has $\beta_1 = 2$, both loops are essential. The $H_2$ void, born at the maximum, is also essential.
Extended persistence kills all essential features during the superlevel phase: the surviving component, the two surviving loops, and the void each receive a finite death time. The result is a complete set of finite birth-death pairs capturing the full topology of the torus.
Extended Persistence Diagram
Interactive visualization coming soon — explore ordinary, relative, and extended persistence pairs on a torus.
Remark.
Extended persistence is particularly useful for Morse-theoretic analysis of functions on manifolds. The extended persistence diagram of a Morse function $f$ provides a complete topological summary of $f$ — every critical point of $f$ appears in exactly one persistence pair, and the persistence (death minus birth) measures the “importance” of that critical point.
Persistence Modules
The algebraic structure underlying persistent homology is the persistence module — a one-parameter family of vector spaces connected by linear maps. Understanding this algebraic framework clarifies why barcodes exist, why stability holds, and why multi-parameter persistence is fundamentally harder.
Definition 12.
A persistence module over a field $\mathbb{F}$ is a functor from the poset $(\mathbb{R}, \le)$ to the category of $\mathbb{F}$-vector spaces. Concretely, it consists of:
- A family of vector spaces $\{V_t\}_{t \in \mathbb{R}}$
- Linear maps $\varphi_{s,t} \colon V_s \to V_t$ for all $s \le t$
satisfying $\varphi_{t,t} = \mathrm{id}$ and $\varphi_{s,u} = \varphi_{t,u} \circ \varphi_{s,t}$ for $s \le t \le u$.
The persistence module arising from a filtration is the family of homology groups $V_t = H_k(K_t)$, with the maps induced by the inclusions $K_s \hookrightarrow K_t$. The maps tell us which features at scale $s$ “survive to” scale $t$.
Theorem 4 (Structure Theorem for Persistence Modules).
Every pointwise finite-dimensional persistence module $V$ over a field decomposes uniquely (up to reordering) into a direct sum of interval modules:
$$V \cong \bigoplus_j I_{[b_j, d_j)}$$
where $I_{[b,d)}$ is the module that is $\mathbb{F}$ for $t \in [b, d)$ and $0$ elsewhere. Each interval corresponds to one bar in the persistence barcode.
This is the theorem that makes barcodes well-defined. Without it, the barcode would be an arbitrary visualization choice; with it, the barcode is the unique decomposition of the persistence module into indecomposable summands. The proof relies on the classification of finitely generated modules over a principal ideal domain — the same algebraic machinery that gives the Jordan normal form for matrices.
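One concrete consequence of the decomposition: the rank of any structure map $\varphi_{s,t}$ can be read off the barcode as the number of bars containing the whole interval $[s, t]$. A sketch with a hypothetical barcode (`rank_invariant` is our own helper, not a library function):

```python
import numpy as np

def rank_invariant(bars, s, t):
    """Rank of the structure map V_s -> V_t for a barcode.

    By the Structure Theorem this equals the number of interval
    summands [b, d) with b <= s and t < d, i.e. bars alive on all of [s, t].
    """
    assert s <= t
    return sum(1 for (b, d) in bars if b <= s and t < d)

# Hypothetical barcode: one essential bar (death = inf) and two finite bars
bars = [(0.0, np.inf), (0.2, 1.5), (0.4, 0.5)]

print(rank_invariant(bars, 0.3, 1.0))   # 2: essential bar and [0.2, 1.5)
print(rank_invariant(bars, 0.45, 0.6))  # 2: the short bar dies before t = 0.6
```

This "rank invariant" is exactly the quantity that survives in the multi-parameter setting discussed below, where a full barcode no longer exists.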
Remark.
The Structure Theorem fails for multi-parameter persistence. When the filtration is indexed by $\mathbb{R}^2$ (or any poset more complex than a total order), persistence modules do not decompose into interval summands. There is no barcode. This is the fundamental obstacle to multi-parameter persistence theory, and a major open area in applied algebraic topology. Partial invariants exist (rank invariants, persistence landscapes for multi-parameter data), but no complete discrete invariant comparable to the barcode.
For the metric structure on the space of persistence modules — including the interleaving distance, the Isometry Theorem, and the stability framework — see the Barcodes & Bottleneck Distance page.
Working Code
Computing Persistence with Ripser
import numpy as np
from ripser import ripser
from persim import plot_diagrams
# Generate an annulus (has a hole) and a Gaussian cluster (no hole)
np.random.seed(42)
n = 200
theta = np.random.uniform(0, 2 * np.pi, n)
r = 1.0 + np.random.normal(0, 0.1, n)
annulus = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
cluster = np.random.randn(n, 2) * np.std(annulus, axis=0) + np.mean(annulus, axis=0)
# Compute persistence (H₀ and H₁)
result_annulus = ripser(annulus, maxdim=1)
result_cluster = ripser(cluster, maxdim=1)
# The annulus has a high-persistence H₁ feature (the loop)
dgm1 = result_annulus['dgms'][1]
max_pers = np.max(dgm1[:, 1] - dgm1[:, 0])
print(f"Annulus max H₁ persistence: {max_pers:.4f}")
# The cluster has only low-persistence H₁ noise
dgm1_c = result_cluster['dgms'][1]
max_pers_c = np.max(dgm1_c[:, 1] - dgm1_c[:, 0])
print(f"Cluster max H₁ persistence: {max_pers_c:.4f}")
Verifying the Stability Theorem Empirically
from persim import bottleneck
# Clean circle
n_pts = 100
theta_base = np.linspace(0, 2 * np.pi, n_pts, endpoint=False)
base_cloud = np.column_stack([np.cos(theta_base), np.sin(theta_base)])
base_dgm = ripser(base_cloud, maxdim=1)['dgms'][1]
# Add increasing noise and measure diagram movement
for noise_std in [0.0, 0.1, 0.2, 0.3, 0.5]:
    noise = noise_std * np.random.randn(n_pts, 2)
    noisy_cloud = base_cloud + noise
    noisy_dgm = ripser(noisy_cloud, maxdim=1)['dgms'][1]
    bn = bottleneck(base_dgm, noisy_dgm)
    # Hausdorff distance is at most the largest per-point Euclidean displacement
    hausdorff = np.max(np.linalg.norm(noise, axis=1)) if noise_std > 0 else 0
    print(f"noise σ={noise_std:.1f}: ||perturbation||={hausdorff:.3f}, d_B={bn:.3f}, "
          f"bound holds: {bn <= hausdorff + 1e-10 or noise_std == 0}")
Vectorization: Persistence Images for ML
Persistence diagrams aren’t fixed-size vectors, so we can’t feed them directly to most classifiers. Persistence images (Adams et al., 2017) solve this by placing a Gaussian at each point and evaluating on a grid:
from persim import PersistenceImager
# Create a persistence image from H₁
pimgr = PersistenceImager(pixel_size=0.05, birth_range=(0, 1.5), pers_range=(0, 1.0))
pimgr.fit([result_annulus['dgms'][1]])
img = pimgr.transform(result_annulus['dgms'][1])
print(f"Feature vector shape: {img.flatten().shape}")
# This fixed-size vector goes into any sklearn classifier
Full Pipeline: Time Series Classification
The Takens delay embedding theorem connects time series analysis to topology:
Periodic signals produce loops in the embedding; chaotic signals do not.
- 1D time series x(t)
- Takens delay embedding dim d, delay τ
- Point cloud in ℝᵈ
- Persistent homology Ripser
- Feature vector summary stats of persistence
- Classifier Random Forest, SVM, …
- Regime label
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
def takens_embedding(ts, dim=2, delay=10):
    """Takens delay embedding of a 1D time series."""
    n = len(ts) - (dim - 1) * delay
    return np.array([ts[i:i + dim * delay:delay] for i in range(n)])

def topological_features(ts):
    """Extract topological feature vector from a time series."""
    embedded = takens_embedding(ts, dim=2, delay=10)
    result = ripser(embedded, maxdim=1)
    features = []
    for d in range(2):  # H₀ and H₁
        dgm = result['dgms'][d]
        finite = dgm[np.isfinite(dgm[:, 1])]
        if len(finite) > 0:
            pers = finite[:, 1] - finite[:, 0]
            features.extend([
                np.max(pers),        # Max persistence
                np.mean(pers),       # Mean persistence
                np.std(pers),        # Std of persistence
                np.sum(pers > 0.1),  # Count of significant features
                np.sum(pers ** 2),   # Total persistence (L² norm)
            ])
        else:
            features.extend([0, 0, 0, 0, 0])
    return np.array(features)
# Generate labeled time series from 3 dynamical regimes
X, y = [], []
for regime in range(3):
    for _ in range(50):
        t = np.linspace(0, 6 * np.pi, 300)
        noise = 0.05 * np.random.randn(300)
        if regime == 0:  # Periodic
            ts = np.sin(t) + noise
        elif regime == 1:  # Quasi-periodic
            ts = np.sin(t) + 0.5 * np.sin(np.sqrt(2) * t) + noise
        else:  # Chaotic (logistic map)
            ts = np.zeros(300)
            # Random initial condition, so the 50 samples differ
            ts[0] = np.random.uniform(0.1, 0.9)
            for i in range(1, 300):
                ts[i] = 3.9 * ts[i-1] * (1 - ts[i-1])
            ts = ts + noise
        X.append(topological_features(ts))
        y.append(regime)
X, y = np.array(X), np.array(y)
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42),
                         X, y, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
This pipeline — time series → Takens embedding → persistent homology → feature vector → classifier — is one of the most successful applications of TDA in practice. It works because the persistence features are invariant to reparameterization of the time series (you can stretch or compress time and the topology doesn’t change) and stable under noise (by the Stability Theorem).
Matrix Reduction from Scratch
The standard algorithm for computing persistence, implemented directly from Theorem 3:
import numpy as np
def reduce_boundary_matrix(D):
    """Column-reduce a boundary matrix over Z₂.

    Parameters
    ----------
    D : ndarray of shape (m, m) — the filtered boundary matrix (Z₂ entries)

    Returns
    -------
    R : ndarray — the reduced matrix
    pairs : list of (creator_idx, destroyer_idx) or (creator_idx, None)
    """
    R = D.copy() % 2
    m = R.shape[1]
    low = {}  # row_index -> col_index of column that has this row as its pivot
    pairs = []
    for j in range(m):
        # Find the lowest 1 in column j
        nonzero = np.where(R[:, j] == 1)[0]
        while len(nonzero) > 0 and nonzero[-1] in low:
            # Another column i has the same low — add it (Z₂: XOR)
            i = low[nonzero[-1]]
            R[:, j] = (R[:, j] + R[:, i]) % 2
            nonzero = np.where(R[:, j] == 1)[0]
        if len(nonzero) > 0:
            pivot = nonzero[-1]
            low[pivot] = j
            pairs.append((pivot, j))  # pivot creates, j destroys
        # else: column j is zero — creates an essential feature
    # Identify essential features (creators not paired as destroyers)
    all_destroyers = set(j for _, j in pairs)
    all_creators = set(i for i, _ in pairs)
    for j in range(m):
        if j not in all_destroyers and j not in all_creators:
            pairs.append((j, None))  # essential feature
    return R, pairs
# Example: three points with edges at distances 1, 2, 3
# Filtration order: v0, v1, v2, [v0,v1], [v0,v2], [v1,v2]
D = np.array([
# v0 v1 v2 e01 e02 e12
[0, 0, 0, 1, 1, 0], # v0
[0, 0, 0, 1, 0, 1], # v1
[0, 0, 0, 0, 1, 1], # v2
[0, 0, 0, 0, 0, 0], # e01
[0, 0, 0, 0, 0, 0], # e02
[0, 0, 0, 0, 0, 0], # e12
])
R, pairs = reduce_boundary_matrix(D)
print("Persistence pairs (creator_idx, destroyer_idx):")
for c, d in pairs:
    print(f"  σ_{c} paired with {'σ_' + str(d) if d is not None else '∞'}")
Extended Persistence with GUDHI
import gudhi
import numpy as np
# Height function on a sampled torus
n = 500
theta = np.random.uniform(0, 2 * np.pi, n)
phi = np.random.uniform(0, 2 * np.pi, n)
R, r = 2.0, 1.0
torus = np.column_stack([
(R + r * np.cos(phi)) * np.cos(theta),
(R + r * np.cos(phi)) * np.sin(theta),
r * np.sin(phi)
])
# Build alpha complex
alpha = gudhi.AlphaComplex(points=torus)
stree = alpha.create_simplex_tree()
# Extended persistence of the height function (z-coordinate):
# extend_filtration() only uses the filtration values of the vertices,
# so we first assign each vertex its z-coordinate
for simplex, _ in stree.get_skeleton(0):
    stree.assign_filtration(simplex, alpha.get_point(simplex[0])[2])
stree.extend_filtration()
dgms = stree.extended_persistence()
print("Extended persistence pairs by type:")
for name, dgm in zip(['Ordinary', 'Relative', 'Extended+', 'Extended-'], dgms):
    print(f"  {name}: {len(dgm)} pairs")
    for dim, (b, d) in dgm[:3]:
        print(f"    H_{dim}: [{b:.2f}, {d:.2f})")
Connections & Cross-Track Bridges
Persistent homology sits at the center of a growing web of mathematical and applied techniques, connecting deeply to several other areas on formalML.
Within the Topology & TDA track:
- Simplicial Complexes provides the combinatorial substrate: the VR filtration is a nested sequence of simplicial complexes, and the boundary operator defined there is the algebraic engine of homology.
- Barcodes & Bottleneck Distance develops the metric structure on persistence diagrams. The Structure Theorem (Theorem 4 above) guarantees that barcodes are well-defined; the bottleneck and Wasserstein distances turn the space of diagrams into a metric space where stability makes sense.
- Statistical TDA treats persistence diagrams as random objects — statistical estimators of the true homology of an underlying distribution. Bootstrap confidence sets separate topological signal from sampling noise.
- The Mapper Algorithm provides a complementary geometric/visual summary. Where persistent homology produces algebraic invariants (barcodes), Mapper produces an interpretable graph. PH on the Mapper graph quantifies its topological features; multiscale Mapper uses PH to select optimal parameters.
Across tracks:
- Graph Laplacians & Spectrum: The 0th persistent homology of a VR filtration is equivalent to single-linkage clustering — the threshold at which two points first become connected in $\mathrm{VR}_\varepsilon$ equals the single-linkage distance. The number of connected components at scale $\varepsilon$ equals the number of zero eigenvalues of the graph Laplacian of the $\varepsilon$-neighborhood graph.
- Measure-Theoretic Probability: The convergence of empirical persistence diagrams to the population diagram relies on measure-theoretic concentration bounds. The convergence rate exhibits the curse of dimensionality — the same dimension-dependent rate that appears in density estimation and minimax theory.
- Concentration Inequalities: The stability of persistence diagrams under noise is a Lipschitz condition, and the convergence rate of empirical diagrams involves Hausdorff concentration bounds that parallel McDiarmid-type arguments.
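The single-linkage claim above is easy to check numerically: the merge heights of single-linkage clustering coincide with the edge weights of the minimum spanning tree, which are exactly the $H_0$ death times of the VR filtration (for vertices born at 0). A scipy sketch of the first equality, on a random point cloud:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

# Random point cloud
rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 2))

# Single-linkage merge heights (column 2 of the linkage matrix)
merge_heights = np.sort(linkage(pdist(pts), method='single')[:, 2])

# MST edge weights of the complete distance graph
mst = minimum_spanning_tree(squareform(pdist(pts)))
mst_weights = np.sort(mst.data)

# The two sorted multisets coincide
print(np.allclose(merge_heights, mst_weights))  # True
```

Running ripser on the same cloud and sorting the finite $H_0$ deaths would produce the same list, which is also the content of Exercise 4 below.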
Applications: Persistence features have been used for protein classification (Cang & Wei, 2017), materials science (Hiraoka et al., 2016), cosmology (Xu et al., 2019), neuroscience (Giusti et al., 2015), and financial market analysis. The Takens delay embedding connects persistent homology to time series classification — see the working code above for the full pipeline.
Exercises
Foundational
Exercise 1. Consider four points at the corners of a unit square in $\mathbb{R}^2$: $(0,0)$, $(1,0)$, $(1,1)$, $(0,1)$. Write out the filtered boundary matrix for the VR filtration (vertices born at 0, four side-edges born at 1, two diagonal edges born at $\sqrt{2}$). Perform column reduction by hand and read off all birth-death pairs for $H_0$ and $H_1$.
Exercise 2. Verify $\partial_1 \circ \partial_2 = 0$ for the boundary of a tetrahedron over $\mathbb{Z}_2$. Write out the $\partial_2$ matrix (rows = edges, columns = triangular faces) and the $\partial_1$ matrix (rows = vertices, columns = edges). Compute the product $\partial_1 \partial_2$ and verify every entry is 0 mod 2.
Intermediate
Exercise 3. Prove that the longest $H_0$ bar in any VR persistence barcode always has death time $\infty$. Hint: The last surviving connected component is never merged into anything.
Exercise 4. The total persistence of a persistence diagram $D$ is $\mathrm{Pers}_p(D) = \sum_{(b,d) \in D} (d - b)^p$ for some power $p \ge 1$. Show that for $H_0$ persistence of $n$ points, the total persistence (with $p = 1$) equals the total weight of the minimum spanning tree of the point cloud. Hint: Each $H_0$ death corresponds to an MST edge.
Advanced
Exercise 5. Implement the full Takens embedding pipeline from the Working Code section on a real dataset: download daily closing prices for a stock index (e.g., S&P 500), compute the delay embedding, run persistent homology, and classify 60-day rolling windows into “high-volatility” vs “low-volatility” regimes using persistence features. Report cross-validation accuracy.
Exercise 6. Prove that persistent homology is invariant under bijective reparameterization of the filtration. That is, if $\eta \colon \mathbb{R} \to \mathbb{R}$ is a monotonically increasing bijection and $\{K_{\eta(t)}\}$ is a reparameterized filtration, then its persistence diagram is related to that of $\{K_t\}$ by the coordinate change $(b, d) \mapsto (\eta^{-1}(b), \eta^{-1}(d))$. Hint: Reparameterization preserves the inclusion order of simplices, hence the matrix reduction is identical.
Connections
- builds on simplicial-complexes
- is closely related to cech-complexes
- The Structure Theorem for persistence modules gives barcodes their mathematical foundation; bottleneck and Wasserstein distances turn the space of persistence diagrams into a metric space. barcodes-bottleneck
- Statistical TDA treats persistence diagrams as random objects — statistical estimators of the true homology of an underlying distribution. statistical-tda
- The 0th persistent homology of a VR filtration is equivalent to single-linkage clustering; the threshold at which two components merge equals the single-linkage distance, connecting persistence to the spectral gap of the graph Laplacian. graph-laplacians
- Convergence of empirical persistence diagrams relies on measure-theoretic concentration bounds; the convergence rate exhibits the curse of dimensionality. measure-theoretic-probability
References & Further Reading
- paper Topology and Data — Gunnar Carlsson (2009) The survey that defined TDA as a field — essential reading
- paper Stability of Persistence Diagrams — David Cohen-Steiner, Herbert Edelsbrunner & John Harer (2007) The foundational stability result; Theorem 2 in this topic
- book Computational Topology: An Introduction — Herbert Edelsbrunner & John Harer (2010) Chapters 4-7 cover persistent homology comprehensively
- paper Persistence Images: A Stable Vector Representation of Persistent Homology — Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta & Lori Ziegelmeier (2017) The standard vectorization method for ML pipelines
- paper Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis — Jose A. Perea & John Harer (2015) Time series → point cloud → topology via Takens embedding
- paper Ripser: Efficient Computation of Vietoris-Rips Persistence Barcodes — Ulrich Bauer (2021) State-of-the-art algorithm — what powers the ripser library
- book Elementary Applied Topology — Robert Ghrist (2014) The most visual introduction to applied algebraic topology