
Persistent Homology

Tracking topological features across scales — the workhorse of topological data analysis

Prerequisites: Simplicial Complexes

Overview & Motivation

Consider two point clouds with identical first- and second-order statistics — same mean, same variance, same covariance. One is sampled from an annulus (a ring), the other from a Gaussian cluster. Every classical statistical test says they’re the same. But one has a hole and the other doesn’t, and that difference can be the signal you need — whether you’re detecting anomalies in sensor networks, classifying dynamical regimes in time series, or finding structure in high-dimensional embeddings.

The problem is: what does “has a hole” even mean for a finite point cloud? Points are discrete; they don’t have holes in any obvious sense. The Vietoris-Rips construction (see Simplicial Complexes) gives us a way to build a continuous-ish structure from discrete data, but it introduces a new problem: the result depends on a scale parameter $\varepsilon$. Choose $\varepsilon$ too small and you see only isolated points. Too large and everything collapses into a single blob.

Persistent homology is the answer: instead of choosing a single scale, track the topology at every scale simultaneously. Features that persist across a wide range of $\varepsilon$ are real structure; features that flicker in and out are noise. This insight — that persistence is a proxy for significance — is what makes TDA useful for machine learning.


Formal Framework

Filtrations

Definition 1.

A filtration of a simplicial complex $K$ is a nested sequence of subcomplexes:

$$\emptyset = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_m = K$$

Each $K_i$ is a simplicial complex, and the inclusion $K_i \hookrightarrow K_{i+1}$ adds one or more simplices.

For the Vietoris-Rips complex, the filtration is parameterized by $\varepsilon$: as $\varepsilon$ increases from 0 to $\infty$, simplices enter the complex at their birth time (the minimum $\varepsilon$ at which they appear). An edge $[v_i, v_j]$ is born at $\varepsilon = d(v_i, v_j)$. A triangle $[v_i, v_j, v_k]$ is born at $\varepsilon = \max(d(v_i, v_j), d(v_i, v_k), d(v_j, v_k))$.
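These birth times are straightforward to compute from pairwise distances. A minimal sketch (the helper `rips_birth_times` is our own illustration, not a library function): each simplex is born at the largest pairwise distance among its vertices.

```python
import numpy as np
from itertools import combinations

def rips_birth_times(points, max_dim=2):
    """Birth time of each Vietoris-Rips simplex: the maximum
    pairwise distance among its vertices."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    births = {}
    for k in range(1, max_dim + 1):  # k = simplex dimension
        for simplex in combinations(range(n), k + 1):
            births[simplex] = max(dist[i, j] for i, j in combinations(simplex, 2))
    return births

# Three points forming an isosceles triangle
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 2.0]])
births = rips_birth_times(pts)
print(births[(0, 1)])     # edge born at d(v0, v1) = 1.0
print(births[(0, 1, 2)])  # triangle born at its longest edge, sqrt(4.25)
```

Enumerating every simplex this way is exponential in the number of points; production implementations (Ripser, GUDHI) prune aggressively, but the birth-time rule is exactly this.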

Chain Groups and the Boundary Operator

To detect “holes,” we need algebra. The key idea: a loop is a collection of edges that forms a cycle, and a hole is a loop that isn’t the boundary of something filled in.

Definition 2.

The $k$-chain group $C_k(K)$ of a simplicial complex $K$ is the free abelian group (or $\mathbb{Z}_2$-vector space) generated by the $k$-simplices of $K$. A $k$-chain is a formal sum of $k$-simplices:

$$c = \sum_i a_i \sigma_i, \quad a_i \in \mathbb{Z}_2, \quad \sigma_i \in K \text{ with } \dim(\sigma_i) = k$$

Remark.

Working over $\mathbb{Z}_2$ (coefficients are 0 or 1, with $1 + 1 = 0$) simplifies everything: we don’t need to worry about orientations. Most TDA software uses $\mathbb{Z}_2$ coefficients. Over $\mathbb{Z}$, you get torsion information (e.g., distinguishing a Möbius band from a cylinder), but this is rarely needed in data analysis.

Definition 3.

The boundary operator $\partial_k : C_k(K) \to C_{k-1}(K)$ maps each $k$-simplex to the alternating sum of its $(k-1)$-dimensional faces:

$$\partial_k [v_0, v_1, \ldots, v_k] = \sum_{i=0}^{k} (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_k]$$

where $\hat{v}_i$ denotes omission. Over $\mathbb{Z}_2$, the signs disappear and this reduces to the sum of all faces.

Example 1.

For the triangle $[v_0, v_1, v_2]$ with vertices, edges, and the filled face:

$\partial_2$: The boundary of the triangle is the signed sum of its edges: $\partial_2 [v_0, v_1, v_2] = [v_1, v_2] - [v_0, v_2] + [v_0, v_1]$

$\partial_1$: The boundary of each edge is the difference of its endpoints: $\partial_1 [v_i, v_j] = [v_j] - [v_i]$

Written as matrices (with $\mathbb{Z}$ coefficients):

$$\partial_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \quad \partial_1 = \begin{pmatrix} -1 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix}$$

The rows of $\partial_1$ correspond to vertices $v_0, v_1, v_2$; columns to edges $[v_0,v_1], [v_0,v_2], [v_1,v_2]$.

The following property is the engine of homology:

Theorem 1 (Fundamental Property of the Boundary Operator).

For all $k \geq 1$:

$$\partial_{k-1} \circ \partial_k = 0$$

That is, the boundary of a boundary is always zero.

Proof.

It suffices to check on generators. Let $\sigma = [v_0, v_1, \ldots, v_k]$ be a $k$-simplex. We compute $\partial_{k-1}(\partial_k(\sigma))$ by expanding both operators:

$$\partial_k(\sigma) = \sum_{i=0}^{k} (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_k]$$

Applying $\partial_{k-1}$ to each term, we omit a second vertex $v_j$. For each term $(-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_k]$, the boundary is:

$$\partial_{k-1}\bigl((-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_k]\bigr) = (-1)^i \sum_{\substack{j=0 \\ j \neq i}}^{k} (-1)^{j'} [v_0, \ldots, \hat{v}_i, \ldots, \hat{v}_j, \ldots, v_k]$$

where the exponent $j'$ depends on the position of $v_j$ in the face after $v_i$ has been removed: $j' = j$ if $j < i$, and $j' = j - 1$ if $j > i$.

Now consider a specific $(k-2)$-face $[v_0, \ldots, \hat{v}_i, \ldots, \hat{v}_j, \ldots, v_k]$ with $i < j$. It appears exactly twice in the double sum:

  1. First omit $v_i$, then omit $v_j$: sign is $(-1)^i \cdot (-1)^{j-1}$
  2. First omit $v_j$, then omit $v_i$: sign is $(-1)^j \cdot (-1)^{i}$

The sum of these two signs is:

$$(-1)^{i+j-1} + (-1)^{i+j} = (-1)^{i+j-1}\bigl(1 + (-1)\bigr) = 0$$

Every $(k-2)$-face appears exactly twice with opposite signs, so the entire sum vanishes. Over $\mathbb{Z}_2$, the argument is even simpler: each face appears twice, and $1 + 1 = 0$.

Example 2.

We can verify this directly for the triangle. From Example 1:

$$\partial_1 \circ \partial_2 = \begin{pmatrix} -1 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

The boundary of the triangle is the cycle $[v_0,v_1] - [v_0,v_2] + [v_1,v_2]$, and the boundary of that cycle is zero — exactly because it closes up.
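The same matrix check takes two lines of NumPy — a quick sketch using the matrices from Example 1:

```python
import numpy as np

# Boundary matrices from Example 1 (rows of d1: vertices v0, v1, v2;
# columns of d1: edges [v0,v1], [v0,v2], [v1,v2])
d2 = np.array([[1], [-1], [1]])
d1 = np.array([[-1, -1, 0],
               [ 1,  0, -1],
               [ 0,  1,  1]])

composite = d1 @ d2
print(composite.ravel())  # [0 0 0]: the boundary of a boundary vanishes

# Over Z/2 the same identity holds with the signs dropped
print((np.abs(d1) @ np.abs(d2)) % 2)
```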

Homology Groups

The $\partial \circ \partial = 0$ property gives us two important subgroups:

Definition 4.

The cycle group is $Z_k = \ker(\partial_k)$ — chains with zero boundary (they “close up”).

The boundary group is $B_k = \text{im}(\partial_{k+1})$ — chains that are the boundary of something.

Since $\partial_k \circ \partial_{k+1} = 0$, every boundary is a cycle: $B_k \subseteq Z_k$.

Definition 5.

The $k$-th homology group of $K$ is the quotient:

$$H_k(K) = Z_k / B_k = \ker(\partial_k) / \text{im}(\partial_{k+1})$$

The $k$-th Betti number is $\beta_k = \text{rank}(H_k)$.

Homology counts holes that are not filled in:

  • $\beta_0$ = number of connected components
  • $\beta_1$ = number of loops (independent 1-cycles not bounding a filled region)
  • $\beta_2$ = number of voids (enclosed cavities)

Example 3.

For the filled triangle $K = \{[v_0,v_1,v_2], [v_0,v_1], [v_0,v_2], [v_1,v_2], [v_0], [v_1], [v_2]\}$:

  • $\ker(\partial_1)$ has rank 1 (one independent cycle: the loop around the triangle)
  • $\text{im}(\partial_2)$ has rank 1 (that loop is the boundary of the filled face)
  • $\beta_1 = 1 - 1 = 0$ — no holes. The apparent loop is a boundary.
  • $\beta_0 = 1$ — one connected component.

Now remove the 2-simplex (just the wireframe triangle, no fill):

  • $\ker(\partial_1)$ still has rank 1
  • $\text{im}(\partial_2)$ now has rank 0 (no 2-simplices to bound anything)
  • $\beta_1 = 1 - 0 = 1$ — one hole. The loop is no longer a boundary.

This is the key insight: the same cycle can be a hole or not, depending on whether there’s something to fill it.
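Example 3’s rank bookkeeping can be automated. A small sketch over rational coefficients, using $\beta_k = (n_k - \text{rank}\,\partial_k) - \text{rank}\,\partial_{k+1}$ (the helper `betti` is our own; rational ranks suffice here because the triangle has no torsion):

```python
import numpy as np

d1 = np.array([[-1, -1, 0],
               [ 1,  0, -1],
               [ 0,  1,  1]])   # boundary matrix of the three edges
d2 = np.array([[1], [-1], [1]]) # boundary matrix of the filled face

def betti(rank_dk, rank_dk1, n_k):
    """beta_k = dim ker(d_k) - rank(d_{k+1}) = (n_k - rank d_k) - rank d_{k+1}."""
    return n_k - rank_dk - rank_dk1

r1 = np.linalg.matrix_rank(d1)  # rank 2
r2 = np.linalg.matrix_rank(d2)  # rank 1

# Filled triangle: 3 vertices, 3 edges, 1 face (d_0 = 0, so its rank is 0)
b0 = betti(0, r1, 3)          # 3 - 0 - 2 = 1 component
b1_filled = betti(r1, r2, 3)  # (3 - 2) - 1 = 0 holes

# Wireframe triangle: remove the 2-simplex, so rank(d_2) = 0
b1_wire = betti(r1, 0, 3)     # (3 - 2) - 0 = 1 hole
print(b0, b1_filled, b1_wire)  # 1 0 1
```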

Persistent Homology

Now we combine filtrations with homology. As $\varepsilon$ increases, new simplices enter the complex. Each new simplex either:

  1. Creates a topological feature (a new cycle appears that isn’t a boundary), or
  2. Destroys a topological feature (a cycle that was a hole gets filled in)

Definition 6.

Given a filtration $K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m$, persistent homology tracks, for each topological feature:

  • Its birth time $b$: the filtration index where the feature first appears
  • Its death time $d$: the filtration index where the feature is destroyed
  • Its persistence: $d - b$, measuring how long the feature survives

A feature with $d = \infty$ (never destroyed) is called an essential feature.

Definition 7.

The persistence diagram $\text{Dgm}(K_\bullet)$ is the multiset of points $(b_i, d_i) \in \mathbb{R}^2$ corresponding to birth-death pairs, together with infinitely many copies of every point on the diagonal $\{(x,x) \mid x \in \mathbb{R}\}$.

Equivalently, the persistence barcode represents each feature as a horizontal interval $[b_i, d_i)$.

Points far from the diagonal have high persistence — they represent features that survive across a wide range of scales. Points near the diagonal have low persistence — they are topological “noise,” features that appear and immediately disappear. The diagonal points are a technical convenience that makes the space of persistence diagrams well-behaved.

Remark.

There is a deep algebraic structure here that we’re glossing over: the inclusions $K_i \hookrightarrow K_j$ induce maps on homology $H_k(K_i) \to H_k(K_j)$. The persistence module is the entire sequence of these maps, and the Structure Theorem for persistence modules (a consequence of the classification of finitely generated modules over a PID) guarantees that this module decomposes into interval summands — which are exactly the bars in the barcode. This decomposition is unique and computable via matrix reduction.
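The matrix reduction mentioned here is short enough to sketch. Below is the standard column reduction over $\mathbb{Z}_2$ (our own minimal implementation, nothing like Ripser’s optimized version): columns hold the boundaries of simplices in filtration order, and each reduced column’s lowest nonzero row pairs a creator simplex with its destroyer.

```python
import numpy as np

def reduce_boundary(D):
    """Standard persistence reduction over Z/2.
    D[i, j] = 1 if simplex i is a codimension-1 face of simplex j;
    rows and columns are in filtration order. Returns (birth, death) pairs."""
    D = D.copy() % 2
    lowest = lambda col: (int(np.flatnonzero(col)[-1]) if col.any() else None)
    low_to_col = {}
    pairs = []
    for j in range(D.shape[1]):
        l = lowest(D[:, j])
        while l is not None and l in low_to_col:
            # Conflict: add the earlier column with the same lowest row
            D[:, j] = (D[:, j] + D[:, low_to_col[l]]) % 2
            l = lowest(D[:, j])
        if l is not None:
            low_to_col[l] = j
            pairs.append((l, j))  # simplex l creates a class, simplex j kills it
    return pairs

# Filtration order: v0, v1, v2, [v0,v1], [v1,v2], [v0,v2], [v0,v1,v2]
D = np.zeros((7, 7), dtype=int)
D[[0, 1], 3] = 1      # boundary of [v0,v1]
D[[1, 2], 4] = 1      # boundary of [v1,v2]
D[[0, 2], 5] = 1      # boundary of [v0,v2]
D[[3, 4, 5], 6] = 1   # boundary of [v0,v1,v2]

print(reduce_boundary(D))  # [(1, 3), (2, 4), (5, 6)]
```

Here (1, 3) and (2, 4) are H₀ pairs (components merged by edges), and (5, 6) is the H₁ pair: the loop closed by edge [v0,v2] is filled by the triangle. Unpaired simplices (here v0) carry the essential classes.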


Visual Intuition

The visualization below is the core interactive experience. The top panel shows a Vietoris-Rips complex built from points sampled near a circle. The bottom panels show the persistence diagram and barcode.

Drag the $\varepsilon$ slider and watch all three panels update together:

[Interactive panels: Persistence Diagram · Persistence Barcode]

What to notice:

  1. $H_0$ features (teal): At $\varepsilon = 0$, every point is its own connected component — 15 features are born simultaneously. As $\varepsilon$ increases, components merge. Each merge event kills a component: you see teal points appear in the persistence diagram with increasing death values. One component survives to $\infty$ (the essential $H_0$ feature).

  2. $H_1$ features (purple): At a critical $\varepsilon$ around 0.7–0.9, the edges form a cycle around the point cloud but the interior hasn’t filled in yet — a loop is born. As $\varepsilon$ increases further, 2-simplices fill the interior, and the loop dies. The persistence (death minus birth) of this $H_1$ feature is the key topological signal: it tells you “this point cloud has a circular structure.”

  3. Signal vs. noise: The long-lived $H_1$ bar stands out from the short bars near the diagonal. That gap between persistent and ephemeral features is what makes TDA work in practice — it’s a built-in notion of statistical significance, grounded in geometry rather than distributional assumptions.


The Stability Theorem

The practical utility of persistent homology rests on a single theorem: small perturbations of the input data produce small changes in the persistence diagram. Without this, the method would be useless — any noise would scramble the output.

Definition 8.

The bottleneck distance between two persistence diagrams $D$ and $D'$ is:

$$d_B(D, D') = \inf_\gamma \sup_{p \in D} \| p - \gamma(p) \|_\infty$$

where the infimum is over all bijections $\gamma: D \to D'$ (including diagonal points).
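For diagrams with a single off-diagonal point each, the bottleneck distance can be computed by hand: either match the two points to each other, or match each to its nearest diagonal point, which costs half its persistence in the $L^\infty$ norm. A toy illustration (`bottleneck_1pt` is our own helper, not a library call):

```python
def bottleneck_1pt(p, q):
    """Bottleneck distance between the one-point diagrams {p} and {q},
    where p = (birth, death). Only two matchings are possible."""
    match_points = max(abs(p[0] - q[0]), abs(p[1] - q[1]))      # p <-> q
    match_diagonal = max((p[1] - p[0]) / 2, (q[1] - q[0]) / 2)  # both to diagonal
    return min(match_points, match_diagonal)

print(bottleneck_1pt((0.0, 1.0), (0.0, 1.2)))  # 0.2: matching the points beats the diagonal
```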

Theorem 2 (Stability Theorem; Cohen-Steiner, Edelsbrunner & Harer, 2007).

Let $f, g: X \to \mathbb{R}$ be tame functions on a topological space $X$. Then:

$$d_B(\text{Dgm}(f), \text{Dgm}(g)) \leq \| f - g \|_\infty$$

The bottleneck distance between persistence diagrams is bounded by the $L^\infty$ distance between the functions that generate them.

Proof.

The full proof uses the theory of interleavings of persistence modules and is beyond our scope. The essential idea: if $\|f - g\|_\infty \leq \delta$, then the sublevel set filtrations of $f$ and $g$ are $\delta$-interleaved — $f^{-1}(-\infty, t] \subseteq g^{-1}(-\infty, t + \delta]$ and vice versa — which forces a matching between their persistence diagrams that moves no point by more than $\delta$. See Cohen-Steiner, Edelsbrunner & Harer (2007) for the complete argument.

For point cloud data, the relevant corollary is:

Corollary 1.

Let $X$ and $Y$ be finite point clouds with $d_H(X, Y) \leq \delta$ (Hausdorff distance). Then:

$$d_B(\text{Dgm}(\text{VR}(X)), \text{Dgm}(\text{VR}(Y))) \leq 2\delta$$

In plain language: if you jiggle each point by at most $\delta$, the persistence diagram moves by at most $2\delta$. This is what makes persistent homology safe to use on noisy real-world data.


Working Code

Computing Persistence with Ripser

import numpy as np
from ripser import ripser
from persim import plot_diagrams

# Generate an annulus (has a hole) and a Gaussian cluster (no hole)
np.random.seed(42)
n = 200

theta = np.random.uniform(0, 2 * np.pi, n)
r = 1.0 + np.random.normal(0, 0.1, n)
annulus = np.column_stack([r * np.cos(theta), r * np.sin(theta)])

cluster = np.random.randn(n, 2) * np.std(annulus, axis=0) + np.mean(annulus, axis=0)

# Compute persistence (H₀ and H₁)
result_annulus = ripser(annulus, maxdim=1)
result_cluster = ripser(cluster, maxdim=1)

# The annulus has a high-persistence H₁ feature (the loop)
dgm1 = result_annulus['dgms'][1]
max_pers = np.max(dgm1[:, 1] - dgm1[:, 0])
print(f"Annulus max H₁ persistence: {max_pers:.4f}")

# The cluster has only low-persistence H₁ noise
dgm1_c = result_cluster['dgms'][1]
max_pers_c = np.max(dgm1_c[:, 1] - dgm1_c[:, 0])
print(f"Cluster max H₁ persistence: {max_pers_c:.4f}")

Verifying the Stability Theorem Empirically

from persim import bottleneck

# Clean circle
n_pts = 100
theta_base = np.linspace(0, 2 * np.pi, n_pts, endpoint=False)
base_cloud = np.column_stack([np.cos(theta_base), np.sin(theta_base)])
base_dgm = ripser(base_cloud, maxdim=1)['dgms'][1]

# Add increasing noise and measure diagram movement
for noise_std in [0.0, 0.1, 0.2, 0.3, 0.5]:
    noise = noise_std * np.random.randn(n_pts, 2)
    noisy_cloud = base_cloud + noise
    noisy_dgm = ripser(noisy_cloud, maxdim=1)['dgms'][1]

    bn = bottleneck(base_dgm, noisy_dgm)
    # Hausdorff distance between the clouds is at most the largest point displacement
    delta = np.max(np.linalg.norm(noise, axis=1))
    print(f"noise σ={noise_std:.1f}: d_H≤{delta:.3f}, d_B={bn:.3f}, "
          f"bound holds: {bn <= 2 * delta + 1e-10}")

Vectorization: Persistence Images for ML

Persistence diagrams aren’t fixed-size vectors, so we can’t feed them directly to most classifiers. Persistence images (Adams et al., 2017) solve this by placing a Gaussian at each point $(b, d-b)$ and evaluating on a grid:

from persim import PersistenceImager

# Create a persistence image from H₁
pimgr = PersistenceImager(pixel_size=0.05, birth_range=(0, 1.5), pers_range=(0, 1.0))
pimgr.fit([result_annulus['dgms'][1]])

img = pimgr.transform(result_annulus['dgms'][1])
print(f"Feature vector shape: {img.flatten().shape}")
# This fixed-size vector goes into any sklearn classifier

Full Pipeline: Time Series Classification

The Takens delay embedding theorem connects time series analysis to topology:

$$x(t) \mapsto \bigl(x(t),\; x(t+\tau),\; x(t+2\tau),\; \ldots,\; x(t+(d-1)\tau)\bigr)$$

Periodic signals produce loops in the embedding; chaotic signals do not.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def takens_embedding(ts, dim=2, delay=10):
    """Takens delay embedding of a 1D time series."""
    n = len(ts) - (dim - 1) * delay
    return np.array([ts[i:i + dim * delay:delay] for i in range(n)])

def topological_features(ts):
    """Extract topological feature vector from a time series."""
    embedded = takens_embedding(ts, dim=2, delay=10)
    result = ripser(embedded, maxdim=1)
    features = []
    for d in range(2):  # H₀ and H₁
        dgm = result['dgms'][d]
        finite = dgm[np.isfinite(dgm[:, 1])]
        if len(finite) > 0:
            pers = finite[:, 1] - finite[:, 0]
            features.extend([
                np.max(pers),        # Max persistence
                np.mean(pers),       # Mean persistence
                np.std(pers),        # Std of persistence
                np.sum(pers > 0.1),  # Count of significant features
                np.sum(pers ** 2),   # Total persistence (L² norm)
            ])
        else:
            features.extend([0, 0, 0, 0, 0])
    return np.array(features)

# Generate labeled time series from 3 dynamical regimes
X, y = [], []
for regime in range(3):
    for _ in range(50):
        t = np.linspace(0, 6 * np.pi, 300)
        noise = 0.05 * np.random.randn(300)
        if regime == 0:    # Periodic
            ts = np.sin(t) + noise
        elif regime == 1:  # Quasi-periodic
            ts = np.sin(t) + 0.5 * np.sin(np.sqrt(2) * t) + noise
        else:              # Chaotic (logistic map)
            ts = np.zeros(300)
            ts[0] = np.random.uniform(0.1, 0.9)  # random initial condition so samples differ
            for i in range(1, 300):
                ts[i] = 3.9 * ts[i-1] * (1 - ts[i-1])
            ts = ts + noise
        X.append(topological_features(ts))
        y.append(regime)

X, y = np.array(X), np.array(y)
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42),
                         X, y, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")

This pipeline — time series → Takens embedding → persistent homology → feature vector → classifier — is one of the most successful applications of TDA in practice. It works because the persistence features are invariant to reparameterization of the time series (you can stretch or compress time and the topology doesn’t change) and stable under noise (by the Stability Theorem).


Connections & Applications

Persistent homology sits at the center of a growing web of mathematical and applied techniques:

  • Topological machine learning: Persistence features have been used for protein classification (Cang & Wei, 2017), materials science (Hiraoka et al., 2016), cosmology (Xu et al., 2019), and neuroscience (Giusti et al., 2015). The common thread: data has shape, and shape carries information that statistics alone miss.

  • Persistence landscapes (Bubenik, 2015) and persistence images (Adams et al., 2017) transform diagrams into Banach space elements or fixed-size vectors, enabling statistical inference and ML integration.

  • Extended persistence and zigzag persistence generalize the one-parameter filtration to more complex parameter spaces, capturing richer topological information.

  • The Stability Theorem extends to other metrics: Wasserstein distances between diagrams, interleaving distances between persistence modules, and Gromov-Hausdorff distances between metric spaces. This creates a rich metric geometry on the space of shapes.

  • Takens embedding connects persistent homology to dynamical systems and time series analysis — see Perea & Harer (2015) for the theoretical foundation and the working code above for the practical pipeline.

