foundational topology 18 min read

Simplicial Complexes

The combinatorial scaffolding that turns point clouds into topology

Overview & Motivation

Take 200 points sampled from a ring (an annulus) and 200 points sampled from a Gaussian blob. Match their means and variances exactly. Now try to tell them apart: a tt-test sees nothing, PCA sees nothing, and the covariance matrices are indistinguishable. Yet the ring has a hole through its center and the blob doesn’t — and that topological difference can be the signal that matters, whether you’re classifying dynamical regimes, detecting sensor network anomalies, or finding structure in high-dimensional embeddings.

The problem is that point clouds are discrete — they don’t have “holes” in any obvious sense. To detect shape, we need to build a continuous structure on top of the raw data. That structure is a simplicial complex: a systematic way of connecting nearby points with edges, triangles, and their higher-dimensional analogues to create a combinatorial scaffolding that does have well-defined topology.

Once you have a simplicial complex, the full machinery of algebraic topology becomes available — you can count connected components (β0\beta_0), detect loops (β1\beta_1), find enclosed voids (β2\beta_2), and quantify how these features persist across scales. If you’ve used ripser, gudhi, or scikit-tda, you’ve been building simplicial complexes without thinking about it. This topic makes the implicit explicit.


Formal Framework

The Building Blocks: Simplices

Definition 1.

A kk-simplex σ\sigma is the convex hull of k+1k+1 affinely independent points v0,v1,,vkv_0, v_1, \ldots, v_k in Rn\mathbb{R}^n:

σ=[v0,v1,,vk]={i=0ktivi  |  ti0,  i=0kti=1}\sigma = [v_0, v_1, \ldots, v_k] = \left\{ \sum_{i=0}^{k} t_i v_i \;\middle|\; t_i \geq 0, \; \sum_{i=0}^{k} t_i = 1 \right\}

The number kk is the dimension of the simplex. The points v0,,vkv_0, \ldots, v_k are its vertices.

In concrete terms:

  • A 0-simplex is a point
  • A 1-simplex is an edge (line segment between two points)
  • A 2-simplex is a filled triangle
  • A 3-simplex is a filled tetrahedron
  • A kk-simplex for k4k \geq 4 follows the same pattern — it’s the “solid” version of the shape, not just the boundary

Definition 2.

A face of a simplex σ=[v0,,vk]\sigma = [v_0, \ldots, v_k] is any simplex formed by a non-empty subset of its vertices. If τ\tau is a face of σ\sigma, we write τσ\tau \leq \sigma.

A proper face satisfies τ<σ\tau < \sigma (i.e., τσ\tau \neq \sigma).

Example 1.

The triangle [v0,v1,v2][v_0, v_1, v_2] has seven faces:

  • Three 0-faces (vertices): [v0][v_0], [v1][v_1], [v2][v_2]
  • Three 1-faces (edges): [v0,v1][v_0, v_1], [v0,v2][v_0, v_2], [v1,v2][v_1, v_2]
  • One 2-face (itself): [v0,v1,v2][v_0, v_1, v_2]

In general, a kk-simplex has (k+1j+1)\binom{k+1}{j+1} faces of dimension jj, for a total of 2k+112^{k+1} - 1 faces.

From Simplices to Complexes

Definition 3.

A simplicial complex KK is a finite collection of simplices satisfying two properties:

  1. Face closure: If σK\sigma \in K and τσ\tau \leq \sigma, then τK\tau \in K.
  2. Intersection property: If σ,τK\sigma, \tau \in K, then στ\sigma \cap \tau is either empty or a face of both σ\sigma and τ\tau.

The first condition says “if you include a triangle, you must include all its edges and vertices.” The second says “two simplices can only intersect along a shared face — no partial overlaps.” Together, these ensure the complex is combinatorially well-behaved.

Definition 4.

The dimension of a simplicial complex KK is the maximum dimension of any simplex in KK:

dim(K)=maxσKdim(σ)\dim(K) = \max_{\sigma \in K} \dim(\sigma)

Remark.

In practice, TDA pipelines rarely compute simplices beyond dimension 2 or 3. Computing the full Vietoris-Rips complex is combinatorially explosive — for nn points, the number of potential kk-simplices is (nk+1)\binom{n}{k+1}. Algorithms like Ripser (Bauer, 2021) use clever algebraic shortcuts to compute persistent homology without ever explicitly constructing high-dimensional simplices.

Abstract vs. Geometric

Definition 5.

An abstract simplicial complex on a vertex set VV is a collection A\mathcal{A} of finite subsets of VV such that if σA\sigma \in \mathcal{A} and τσ\tau \subseteq \sigma, then τA\tau \in \mathcal{A}.

The distinction matters: a geometric simplicial complex lives in Rn\mathbb{R}^n with coordinates and convex hulls. An abstract simplicial complex is purely combinatorial — it’s a list of which vertex subsets are “connected.” For TDA, we work with abstract complexes (defined by distance thresholds) and use geometric realizations only for visualization.

The Vietoris-Rips Construction

This is the construction that matters for data analysis. Given a point cloud and a scale parameter, it builds a simplicial complex by connecting points that are “close enough.”

Definition 6.

Let (X,d)(X, d) be a finite metric space and ε0\varepsilon \geq 0. The Vietoris-Rips complex VR(X,ε)\text{VR}(X, \varepsilon) is the abstract simplicial complex whose simplices are:

VR(X,ε)={σXd(x,y)ε for all x,yσ}\text{VR}(X, \varepsilon) = \{ \sigma \subseteq X \mid d(x, y) \leq \varepsilon \text{ for all } x, y \in \sigma \}

Equivalently, a subset σ\sigma of k+1k+1 points forms a kk-simplex if and only if every pair of points in σ\sigma is within distance ε\varepsilon.

Remark.

Čech vs. Vietoris-Rips. The pairwise condition in Definition 6 is what makes VR complexes computationally tractable — you only need the distance matrix. The theoretically “correct” construction is the Čech complex Cˇ(X,ε)\check{C}(X, \varepsilon), where a subset σX\sigma \subseteq X forms a simplex if and only if the ε/2\varepsilon/2-balls centered at its points have a common intersection:

Cˇ(X,ε)={σX  |  xσB(x,ε/2)}\check{C}(X, \varepsilon) = \left\{ \sigma \subseteq X \;\middle|\; \bigcap_{x \in \sigma} B(x, \varepsilon/2) \neq \emptyset \right\}

The Čech complex satisfies the Nerve Theorem: its geometric realization is homotopy equivalent to the union of balls xXB(x,ε/2)\bigcup_{x \in X} B(x, \varepsilon/2), which means it captures the “true” topology of the thickened point cloud. The VR complex doesn’t have this guarantee — it can introduce spurious high-dimensional simplices. However, the two complexes bracket each other:

Cˇ(X,ε)VR(X,ε)Cˇ(X,2ε)\check{C}(X, \varepsilon) \subseteq \text{VR}(X, \varepsilon) \subseteq \check{C}(X, \sqrt{2}\,\varepsilon)

In practice, this means VR and Čech give the same persistent homology up to a constant factor in the scale parameter, and for H0H_0 and H1H_1 in typical datasets they agree exactly. Since the Čech complex requires computing higher-order ball intersections (expensive in high dimensions) while VR needs only pairwise distances, every major TDA library — ripser, gudhi, dionysus — defaults to Vietoris-Rips.

Example 2.

Consider five points in R2\mathbb{R}^2: three forming an equilateral triangle with side length 1.0, and two more at distance 0.6 from each other but far from the triangle.

  • At ε=0.3\varepsilon = 0.3: Only vertices (5 isolated points). No edges.
  • At ε=0.8\varepsilon = 0.8: The distant pair connects (one edge). The triangle vertices are still isolated.
  • At ε=1.1\varepsilon = 1.1: The triangle’s edges appear (three edges forming a cycle). The pair is connected.
  • At ε=1.6\varepsilon = 1.6: The triangle fills in (one 2-simplex). All pairwise distances in the triangle are ε\leq \varepsilon, so the face closure condition is satisfied automatically.

This progression — vertices → edges → triangles appearing at increasing scales — is the filtration that persistent homology tracks.


Visual Intuition

The visualization below shows a Vietoris-Rips complex built from points sampled near a circle. Drag the ε\varepsilon slider to watch the complex form:

  • At small ε\varepsilon: only isolated vertices — the complex captures no structure
  • At moderate ε\varepsilon: edges connect nearby points, and you can see the circular arrangement emerge
  • At large ε\varepsilon: triangles fill in, eventually collapsing the loop into a solid disk

What to notice: There’s a “sweet spot” for ε\varepsilon where the edges form a cycle around the circle but the interior hasn’t filled in yet. At that scale, the complex has a one-dimensional hole — this is the H1H_1 feature that persistent homology will detect. Finding that sweet spot automatically, across all possible scales simultaneously, is exactly what persistent homology does.


Working Code

Building the Vietoris-Rips complex from scratch in Python:

import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist, squareform

def vr_complex(points, epsilon):
    """Build Vietoris-Rips complex at scale epsilon.

    Parameters
    ----------
    points : ndarray of shape (n, d)
        Point cloud in R^d.
    epsilon : float
        Scale parameter. Points within this distance are connected.

    Returns
    -------
    vertices : list of int
    edges : list of (int, int)
    triangles : list of (int, int, int)
    """
    dist_matrix = squareform(pdist(points))
    n = len(points)
    vertices = list(range(n))
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist_matrix[i, j] <= epsilon]
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if all(dist_matrix[a, b] <= epsilon
                        for a, b in combinations([i, j, k], 2))]
    return vertices, edges, triangles

# Example: 12 points on a circle
theta = np.linspace(0, 2 * np.pi, 12, endpoint=False)
points = np.column_stack([np.cos(theta), np.sin(theta)])

for eps in [0.3, 0.8, 1.2, 1.8]:
    v, e, t = vr_complex(points, eps)
    print(f"ε={eps:.1f}: {len(v)} vertices, {len(e)} edges, {len(t)} triangles")
ε=0.3: 12 vertices, 0 edges, 0 triangles
ε=0.8: 12 vertices, 12 edges, 0 triangles
ε=1.2: 12 vertices, 24 edges, 12 triangles
ε=1.8: 12 vertices, 42 edges, 64 triangles

At ε=0.8\varepsilon = 0.8, we get exactly 12 edges forming a cycle — the complex “sees” the circle. By ε=1.8\varepsilon = 1.8, the complex is nearly complete (66 possible edges, 220 possible triangles) and the topological signal is lost in combinatorial noise. This is why we need persistent homology: it tracks which features are real (they persist across a range of scales) and which are artifacts.

For real-world analysis, use ripser instead of this brute-force construction:

from ripser import ripser

# ripser builds the VR filtration and computes persistence in one step
result = ripser(points, maxdim=1)
print(result['dgms'][0][:5])  # H₀ intervals (connected components)
print(result['dgms'][1][:5])  # H₁ intervals (loops)

Connections & Applications

Simplicial complexes are the combinatorial substrate for nearly everything in topological data analysis:

  • Persistent homology tracks how the topology of VR(X,ε)\text{VR}(X, \varepsilon) changes as ε\varepsilon increases — see Persistent Homology
  • Mapper (Singh, Mémoli, Carlsson 2007) builds simplicial complexes via a different construction — nerve of a cover — but the underlying idea is the same: connect data points that are “close” in some sense
  • Witness complexes offer a sparser alternative to VR when the point cloud is large: landmark points define the complex, and data points “witness” simplices
  • In graph neural networks, the message-passing framework can be generalized from graphs (1-complexes) to simplicial complexes, passing messages along triangles and higher simplices — this is the domain of simplicial neural networks

Connections

References & Further Reading