
Expander Graphs

Sparse graphs with paradoxically strong connectivity — from the Expander Mixing Lemma to Ramanujan optimality and O(log n) mixing

Overview & Motivation

The complete graph KnK_n is well-connected in every sense: every pair of vertices shares an edge, every subset has a massive boundary, and a random walk mixes in a single step. But KnK_n has (n2)\binom{n}{2} edges — a quadratic cost that becomes prohibitive for large nn. Can we achieve the same qualitative connectivity with far fewer edges?

Expander graphs say yes. An expander is a graph that is simultaneously sparse (each vertex has bounded degree dd, independent of nn, so the total edge count is O(n)O(n)) and well-connected (every vertex subset SS with Sn/2|S| \leq n/2 has a boundary at least proportional to S|S|). This combination sounds paradoxical: how can a graph with only O(n)O(n) edges avoid having bottlenecks? And yet expander families exist, can be constructed explicitly, and turn out to be optimal in a precise spectral sense.

Three equivalent perspectives capture the “no bottleneck” property:

  1. Vertex expansion: Every set SS has many neighbors outside SS — the boundary N(S)SN(S) \setminus S is large relative to S|S|.

  2. Edge expansion (the Cheeger constant): Every set SS has many edges crossing to the complement Sˉ\bar{S} — the edge boundary E(S,Sˉ)|E(S, \bar{S})| is large relative to S|S|.

  3. Spectral expansion: The second-largest eigenvalue λ\lambda of the adjacency matrix (in absolute value) is bounded away from the degree dd — the spectral gap dλd - \lambda is large.

Cheeger’s inequality links perspectives 2 and 3, showing they are quantitatively equivalent. The Expander Mixing Lemma makes expansion concrete: it controls the number of edges between any two vertex subsets in terms of λ\lambda, making the (n,d,λ)(n, d, \lambda)-expander formalism a workhorse for combinatorics and computer science. The Alon-Boppana bound establishes λ2d1o(1)\lambda \geq 2\sqrt{d-1} - o(1) as a universal floor, and Ramanujan graphs — which achieve λ2d1\lambda \leq 2\sqrt{d-1} — are the provably optimal expanders.

These results connect directly to the spectral theory we developed in the prerequisite topics. In Graph Laplacians & Spectrum, Cheeger’s inequality linked the Fiedler value λ2(L)\lambda_2(L) to the minimum edge cut. In Random Walks & Mixing, the spectral gap γ=1λ2(P)\gamma = 1 - \lambda_2(P) controlled the mixing time. Expanders are the graphs where both quantities are bounded away from zero uniformly in nn — the mixing time is O(logn)O(\log n) regardless of the graph’s size, and the minimum cut grows linearly with the subset size.

Why should ML practitioners care? Expanders appear throughout theoretical computer science and increasingly in machine learning:

  • Derandomization: The expander walk sampling theorem shows that a random walk on an expander produces nearly independent samples, reducing the randomness needed by algorithms from O(tlogn)O(t \log n) to O(logn+tlogd)O(\log n + t \log d) bits.
  • Error-correcting codes: Bipartite expanders yield linear-time decodable codes (Sipser-Spielman expander codes).
  • Network design: Communication networks, sensor networks, and distributed systems benefit from expander-like connectivity.
  • Graph neural networks: Expansion controls information flow in message-passing architectures — on an expander, O(\log n) layers suffice to propagate information across the entire graph, though the same rapid mixing also drives over-smoothing.

Roadmap. We define the three expansion notions and compute them on named graphs (§1-2), prove their equivalence via Cheeger’s inequality (§3), derive the Expander Mixing Lemma with a full spectral proof (§4), establish the Alon-Boppana lower bound and define Ramanujan graphs (§5), construct explicit expanders from Cayley graphs and number theory (§6), prove the O(logn)O(\log n) mixing time bound (§7), and survey applications to CS and ML (§8-9).


1. Three Notions of Expansion

The central idea of expansion is that every small-to-medium vertex subset has a large boundary. Different notions formalize “boundary” differently. We will define all three, compute them on familiar graphs, and then prove they are equivalent for families of regular graphs.

1.1 Vertex Expansion

The most direct notion counts how many new vertices a set SS can reach in one step.

Definition 1 (Vertex Expansion).

For a graph G=(V,E)G = (V, E) and a vertex subset SVS \subseteq V, the vertex boundary of SS is V(S)=N(S)S\partial_V(S) = N(S) \setminus S, where N(S)={vV:uS,{u,v}E}N(S) = \{v \in V : \exists u \in S, \{u,v\} \in E\} is the neighborhood of SS. The vertex expansion ratio of GG is:

hV(G)=minSV0<Sn/2V(S)Sh_V(G) = \min_{\substack{S \subseteq V \\ 0 < |S| \leq n/2}} \frac{|\partial_V(S)|}{|S|}

A graph has good vertex expansion if hV(G)ch_V(G) \geq c for some constant c>0c > 0 independent of nn. This means every set SS of size at most n/2n/2 has at least cSc|S| neighbors outside SS — the graph has no isolated clusters.

1.2 Edge Expansion (the Cheeger Constant)

Instead of counting boundary vertices, we can count boundary edges.

Definition 2 (Edge Expansion (Cheeger Constant)).

For a dd-regular graph G=(V,E)G = (V, E) and a vertex subset SVS \subseteq V, the edge boundary is E(S,Sˉ)={{u,v}E:uS,vS}E(S, \bar{S}) = \{\{u,v\} \in E : u \in S, v \notin S\}. The edge expansion ratio (or Cheeger constant) is:

h(G)=minSV0<Sn/2E(S,Sˉ)Sh(G) = \min_{\substack{S \subseteq V \\ 0 < |S| \leq n/2}} \frac{|E(S, \bar{S})|}{|S|}

The Cheeger constant measures the minimum “surface-to-volume ratio” of any subset — a direct analogy with isoperimetric inequalities in geometry. A set with small edge boundary relative to its volume is a bottleneck: information (or random walkers) trapped inside SS must pass through a narrow gate to reach Sˉ\bar{S}. Expanders are graphs with no such bottlenecks.
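Both definitions can be checked by brute force on a small graph. The following is a minimal sketch (assuming numpy is available) that computes h_V and h for the 3-dimensional hypercube Q_3, enumerating every subset with 0 < |S| \leq n/2:

```python
import itertools

import numpy as np

# Q_3: vertices are 3-bit integers, edges join integers differing in one bit.
n = 8
A = np.zeros((n, n))
for u in range(n):
    for v in range(n):
        if bin(u ^ v).count("1") == 1:
            A[u, v] = 1

h_V = h = float("inf")
for k in range(1, n // 2 + 1):
    for S in itertools.combinations(range(n), k):
        mask = np.zeros(n, dtype=bool)
        mask[list(S)] = True
        # vertex boundary: vertices outside S with at least one neighbor in S
        vertex_boundary = np.count_nonzero(A[list(S)].sum(axis=0)[~mask] > 0)
        # edge boundary: edges from S to its complement
        edge_boundary = A[list(S)][:, ~mask].sum()
        h_V = min(h_V, vertex_boundary / k)
        h = min(h, edge_boundary / k)

print(h_V, h)  # h_V(Q_3) = 3/4, h(Q_3) = 1
```

Note that h(Q_3)/d = 1/3 \leq h_V(Q_3) = 3/4 \leq h(Q_3) = 1, a preview of the general sandwich proved in §2.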

1.3 Spectral Expansion

The third perspective replaces the combinatorial min over subsets with a single algebraic quantity: the second-largest eigenvalue.

Definition 3 (Spectral Expansion and (n, d, λ)-Expanders).

Let GG be a dd-regular graph on nn vertices with adjacency matrix AA. Since GG is dd-regular, the largest eigenvalue is λ1=d\lambda_1 = d with eigenvector 1\mathbf{1}. The spectral expansion parameter is:

λ(G)=maxi2λi(A)=max(λ2(A),λn(A))\lambda(G) = \max_{i \geq 2} |\lambda_i(A)| = \max\bigl(|\lambda_2(A)|, |\lambda_n(A)|\bigr)

We call GG an (n,d,λ)(n, d, \lambda)-expander if it is dd-regular, has nn vertices, and λ(G)λ\lambda(G) \leq \lambda.

The quantity λ(G)\lambda(G) is the largest eigenvalue in absolute value after the trivial eigenvalue dd. The smaller λ\lambda is, the better the expansion: the gap between dd and λ\lambda — the spectral gap dλd - \lambda — measures how far the graph’s spectrum deviates from the rank-1 pattern of the complete graph.

Why does the spectral gap control expansion? Recall from Graph Laplacians & Spectrum that the Laplacian eigenvalues μi\mu_i of a dd-regular graph satisfy μi=dλi(A)\mu_i = d - \lambda_i(A). So λ2(L)=dλ2(A)\lambda_2(L) = d - \lambda_2(A), and a large spectral gap in AA means a large Fiedler value λ2(L)\lambda_2(L) — the graph is hard to cut. The connection to Random Walks & Mixing is equally direct: the transition matrix P=A/dP = A/d has eigenvalues λi(P)=λi(A)/d\lambda_i(P) = \lambda_i(A)/d, so the spectral gap of the walk is γ=1λ(G)/d\gamma = 1 - \lambda(G)/d.
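These correspondences are easy to verify numerically. A minimal sketch (assuming numpy; K_4 is just a convenient small 3-regular example):

```python
import numpy as np

# K_4 is 3-regular: check mu_i = d - lambda_i(A) and lambda_i(P) = lambda_i(A)/d.
A = np.ones((4, 4)) - np.eye(4)
d = 3
L = d * np.eye(4) - A      # Laplacian of a d-regular graph
P = A / d                  # random-walk transition matrix

eig_A = np.sort(np.linalg.eigvalsh(A))   # [-1, -1, -1, 3]
eig_L = np.sort(np.linalg.eigvalsh(L))   # [0, 4, 4, 4]
eig_P = np.sort(np.linalg.eigvalsh(P))   # [-1/3, -1/3, -1/3, 1]
print(eig_A, eig_L, eig_P)
```

Sorting ascending reverses the correspondence for L (the largest adjacency eigenvalue pairs with the smallest Laplacian eigenvalue), which is why the check below flips one array.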

1.4 Examples: Expansion on Named Graphs

Let us compute all three expansion parameters for familiar graphs.

Complete graph KnK_n (d=n1d = n-1). Every vertex is adjacent to every other, so for any SS with S=kn/2|S| = k \leq n/2:

  • V(S)=VS\partial_V(S) = V \setminus S, giving hV(Kn)=(nk)/k1h_V(K_n) = (n-k)/k \geq 1.
  • E(S,Sˉ)=k(nk)|E(S, \bar{S})| = k(n-k), giving h(Kn)=nkn/2h(K_n) = n - k \geq n/2.
  • The adjacency eigenvalues are n1n-1 (once) and 1-1 (with multiplicity n1n-1), so λ(Kn)=1\lambda(K_n) = 1.

KnK_n is an excellent expander spectrally (λ=1d=n1\lambda = 1 \ll d = n-1), but it is not sparse — the degree grows with nn.

Cycle CnC_n (d=2d = 2). The cycle is a 2-regular graph. Taking SS to be a contiguous arc of n/2\lfloor n/2 \rfloor vertices:

  • V(S)=2|\partial_V(S)| = 2 (the two endpoints of the arc), so hV(Cn)=2/n/20h_V(C_n) = 2/\lfloor n/2 \rfloor \to 0.
  • E(S,Sˉ)=2|E(S, \bar{S})| = 2, so h(Cn)=2/n/20h(C_n) = 2/\lfloor n/2 \rfloor \to 0.
  • The adjacency eigenvalues are 2cos(2πk/n)2\cos(2\pi k/n) for k=0,,n1k = 0, \ldots, n-1, so λ2=2cos(2π/n)=2O(1/n2)\lambda_2 = 2\cos(2\pi/n) = 2 - O(1/n^2) and λ(Cn)=2O(1/n2)\lambda(C_n) = 2 - O(1/n^2).

The cycle is not an expander in any sense: all three parameters degrade as nn \to \infty.

Petersen graph (n=10n = 10, d=3d = 3). This remarkable 3-regular graph has:

  • h_V(G) = 4/5: exhaustive search shows the minimum is attained by certain 5-vertex sets with only four external neighbors (e.g. one outer vertex together with four inner vertices).
  • h(G) = 1, attained by the outer 5-cycle, whose only outgoing edges are the five spokes; exhaustive search confirms no subset does better.
  • Adjacency eigenvalues: 33 (once), 11 (with multiplicity 5), 2-2 (with multiplicity 4). So λ=2\lambda = 2.

The Petersen graph is a good expander for its size. We will see shortly that λ=2\lambda = 2 actually meets the Ramanujan bound 2d1=222.832\sqrt{d-1} = 2\sqrt{2} \approx 2.83 — it is a Ramanujan graph.

Hypercube Q_k (n = 2^k, d = k). The k-dimensional hypercube has vertices \{0,1\}^k with edges between strings differing in one bit. The adjacency eigenvalues are k - 2j for j = 0, \ldots, k, with multiplicity \binom{k}{j}. So \lambda_2 = k - 2; and since Q_k is bipartite, \lambda_n = -k, so the nontrivial spectral parameter (excluding the trivial eigenvalues \pm k) is k - 2.

The gap d - \lambda_2 = 2 is constant, but the degree d = k = \log_2 n grows with n, so the hypercube is not a bounded-degree expander family. Its edge expansion is h(Q_k) \geq 1 (the edge-isoperimetric inequality for the cube), and the mixing time of a lazy random walk is \Theta(k \log k).

Barbell graph (n=2mn = 2m, irregular). Two complete graphs KmK_m joined by a single edge. The edge connecting them is a severe bottleneck: h(G)=1/m0h(G) = 1/m \to 0. The barbell is the canonical non-expander — a single edge cut isolates half the graph.
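The spectral claims in these examples can be confirmed directly. A short sketch assuming numpy, with Q_4 as the example, checks that the eigenvalue k - 2j appears with multiplicity \binom{k}{j}:

```python
from collections import Counter
from itertools import product
from math import comb

import numpy as np

k = 4
verts = list(product([0, 1], repeat=k))     # vertices of Q_4 as bit-tuples
n = len(verts)
A = np.zeros((n, n))
for i, u in enumerate(verts):
    for j, v in enumerate(verts):
        if sum(a != b for a, b in zip(u, v)) == 1:   # differ in exactly one bit
            A[i, j] = 1

# round to integers: the exact spectrum is {k - 2j with multiplicity C(k, j)}
spectrum = Counter(int(round(x)) for x in np.linalg.eigvalsh(A))
print(dict(spectrum))
```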

| Graph | n | d | h_V(G) | h(G) | \lambda(G) | Expander? |
| --- | --- | --- | --- | --- | --- | --- |
| K_n | n | n-1 | \geq 1 | \geq n/2 | 1 | Yes (not sparse) |
| C_n | n | 2 | \to 0 | \to 0 | 2 - O(1/n^2) | No |
| Petersen | 10 | 3 | 4/5 | 1 | 2 | Yes |
| Q_k | 2^k | k | \geq 1 | \geq 1 | k-2 (nontrivial) | Yes (degree grows) |
| Barbell | 2m | varies | \to 0 | \to 0 | \to d | No |

Expansion comparison across named graphs — vertex expansion, edge expansion, and spectral gap shown side by side


2. The Relationship Between Expansion Notions

Before proving the deep equivalence via Cheeger’s inequality, we establish a direct relationship between vertex and edge expansion.

Proposition 1 (Vertex vs. Edge Expansion).

For any dd-regular graph GG:

h(G)dhV(G)h(G)\frac{h(G)}{d} \leq h_V(G) \leq h(G)

Proof.

Upper bound (hV(G)h(G)h_V(G) \leq h(G)). Let SS be any subset with Sn/2|S| \leq n/2. Every vertex in V(S)\partial_V(S) is connected to at least one vertex in SS by an edge, and each such edge contributes to E(S,Sˉ)E(S, \bar{S}). Therefore V(S)E(S,Sˉ)|\partial_V(S)| \leq |E(S, \bar{S})|. Dividing both sides by S|S| and taking the minimum over SS gives hV(G)h(G)h_V(G) \leq h(G).

Lower bound (h(G)/dhV(G)h(G)/d \leq h_V(G)). Each vertex in V(S)\partial_V(S) has degree dd, so it contributes at most dd edges to E(S,Sˉ)E(S, \bar{S}). (Some edges from a boundary vertex may go to other vertices in Sˉ\bar{S}, but at most dd go anywhere.) Therefore E(S,Sˉ)dV(S)|E(S, \bar{S})| \leq d \cdot |\partial_V(S)|. Dividing by dSd \cdot |S| and taking the minimum gives h(G)/dhV(G)h(G)/d \leq h_V(G).

This tells us that vertex and edge expansion differ by at most a factor of dd. For constant-degree graphs (the setting we care about for expander families), the two notions are equivalent up to a constant.


3. Cheeger’s Inequality: Linking Spectrum to Combinatorics

The crown jewel of spectral graph theory is Cheeger’s inequality, which asserts that the spectral gap and the Cheeger constant determine each other up to polynomial factors. We proved a version of this in Graph Laplacians & Spectrum; here we state and prove the version tailored to the adjacency matrix of regular graphs.

Theorem 1 (Cheeger's Inequality for Regular Graphs).

For a dd-regular graph GG with second-largest eigenvalue λ2=λ2(A)\lambda_2 = \lambda_2(A):

dλ22h(G)2d(dλ2)\frac{d - \lambda_2}{2} \leq h(G) \leq \sqrt{2d(d - \lambda_2)}

Proof.

We prove both directions.

Lower bound (“easy direction”): h(G)(dλ2)/2h(G) \geq (d - \lambda_2)/2.

Let SS achieve the minimum in h(G)h(G), so E(S,Sˉ)/S=h(G)|E(S, \bar{S})|/|S| = h(G). Define the vector fRn\mathbf{f} \in \mathbb{R}^n by:

fi={1S/nif iSS/nif iSf_i = \begin{cases} 1 - |S|/n & \text{if } i \in S \\ -|S|/n & \text{if } i \notin S \end{cases}

This vector is orthogonal to the all-ones vector 1\mathbf{1} (the eigenvector for λ1=d\lambda_1 = d):

ifi=S(1S/n)+(nS)(S/n)=SS2/nS+S2/n=0\sum_i f_i = |S|(1 - |S|/n) + (n - |S|)(-|S|/n) = |S| - |S|^2/n - |S| + |S|^2/n = 0

By the Rayleigh quotient characterization of λ2\lambda_2:

λ2=maxx1xTAxxTxfTAffTf\lambda_2 = \max_{\mathbf{x} \perp \mathbf{1}} \frac{\mathbf{x}^T A \mathbf{x}}{\mathbf{x}^T \mathbf{x}} \geq \frac{\mathbf{f}^T A \mathbf{f}}{\mathbf{f}^T \mathbf{f}}

We compute each piece. For the denominator:

fTf=S(1Sn)2+(nS)(Sn)2=S(nS)n\mathbf{f}^T \mathbf{f} = |S|\Bigl(1 - \frac{|S|}{n}\Bigr)^2 + (n - |S|)\Bigl(\frac{|S|}{n}\Bigr)^2 = \frac{|S|(n - |S|)}{n}

For the numerator, use the Laplacian quadratic form: writing L = dI - A, we have \mathbf{f}^T A \mathbf{f} = d\|\mathbf{f}\|^2 - \mathbf{f}^T L \mathbf{f}, where \mathbf{f}^T L \mathbf{f} = \sum_{\{i,j\} \in E}(f_i - f_j)^2. The difference f_i - f_j is nonzero only on edges crossing the cut E(S, \bar{S}), so:

fTLf=E(S,Sˉ)12=E(S,Sˉ)\mathbf{f}^T L \mathbf{f} = |E(S, \bar{S})| \cdot 1^2 = |E(S, \bar{S})|

since each cut edge contributes (1S/n(S/n))2=1(1 - |S|/n - (-|S|/n))^2 = 1. Hence:

λ2dS(nS)nE(S,Sˉ)S(nS)n\lambda_2 \geq \frac{d \cdot \frac{|S|(n-|S|)}{n} - |E(S,\bar{S})|}{\frac{|S|(n-|S|)}{n}}

Rearranging:

E(S,Sˉ)(dλ2)S(nS)n(dλ2)S2|E(S, \bar{S})| \geq (d - \lambda_2) \cdot \frac{|S|(n-|S|)}{n} \geq (d - \lambda_2) \cdot \frac{|S|}{2}

where the last step uses Sn/2|S| \leq n/2, so (nS)/n1/2(n - |S|)/n \geq 1/2. Dividing by S|S|:

h(G)=E(S,Sˉ)Sdλ22h(G) = \frac{|E(S, \bar{S})|}{|S|} \geq \frac{d - \lambda_2}{2}

Upper bound (“hard direction”): h(G)2d(dλ2)h(G) \leq \sqrt{2d(d - \lambda_2)}.

Let \mathbf{v} be the eigenvector of A corresponding to \lambda_2, with \mathbf{v} \perp \mathbf{1} and \|\mathbf{v}\| = 1. We use \mathbf{v} to construct a "sweep cut." Sort the vertices so that v_1 \leq v_2 \leq \cdots \leq v_n and consider the threshold sets S_t = \{i : v_i \leq t\} for varying t. We claim that at least one of these sets (or its complement, whichever has at most n/2 vertices) satisfies \phi(S_t) \leq \sqrt{2d(d - \lambda_2)}, which bounds h(G).

Define ϕ(S)=E(S,Sˉ)/S\phi(S) = |E(S, \bar{S})|/|S| for Sn/2|S| \leq n/2. For the sweep cut at threshold tt:

E(St,Sˉt)={i,j}Evit<vj1|E(S_t, \bar{S}_t)| = \sum_{\substack{\{i,j\} \in E \\ v_i \leq t < v_j}} 1

Using a Cauchy-Schwarz argument on the Rayleigh quotient:

dλ2=vTLvvTv={i,j}E(vivj)2d - \lambda_2 = \frac{\mathbf{v}^T L \mathbf{v}}{\mathbf{v}^T \mathbf{v}} = \sum_{\{i,j\} \in E} (v_i - v_j)^2

The key insight is that by averaging over thresholds (a technique called the “sweep cut” analysis), we can show there exists a threshold tt^* such that:

E(St,Sˉt)St2d{i,j}E(vivj)2=2d(dλ2)\frac{|E(S_{t^*}, \bar{S}_{t^*})|}{|S_{t^*}|} \leq \sqrt{2d \sum_{\{i,j\} \in E}(v_i - v_j)^2} = \sqrt{2d(d - \lambda_2)}

The detailed argument proceeds as follows. For each edge \{i,j\} with v_i < v_j, the edge crosses the cut S_t for all t \in [v_i, v_j), so integrating over thresholds gives the coarea identity:

\int_{-\infty}^{\infty} |E(S_t, \bar{S}_t)| \, dt = \sum_{\{i,j\} \in E} |v_i - v_j|

Meanwhile, the integral of |S_t| over thresholds recovers (a shifted version of) \|\mathbf{v}\|^2. Applying the Cauchy-Schwarz inequality to bound \sum_{\{i,j\} \in E} |v_i - v_j| in terms of \sum_{\{i,j\} \in E}(v_i - v_j)^2 = d - \lambda_2, and comparing the two integrals, yields a threshold t^* with:

h(G)ϕ(St)2d(dλ2)h(G) \leq \phi(S_{t^*}) \leq \sqrt{2d(d - \lambda_2)}

Remark (Cheeger's Inequality is Tight).

Both sides of Cheeger’s inequality are achievable. The lower bound is tight for the complete bipartite graph Kn/2,n/2K_{n/2, n/2}, where λ2=0\lambda_2 = 0, dλ2=dd - \lambda_2 = d, and h=d/2=(dλ2)/2h = d/2 = (d - \lambda_2)/2. The upper bound is tight for the cycle CnC_n, where dλ2=22cos(2π/n)2π2/n2d - \lambda_2 = 2 - 2\cos(2\pi/n) \sim 2\pi^2/n^2 and h(Cn)=4/n222π2/n2=2π2/nh(C_n) = 4/n \sim \sqrt{2 \cdot 2 \cdot 2\pi^2/n^2} = 2\pi\sqrt{2}/n up to constants.
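Both bounds are easy to sanity-check numerically. A minimal sketch on C_{12}, assuming numpy (the value h = 1/3 is the contiguous-arc cut computed in §1.4):

```python
import numpy as np

# Cycle C_12: d = 2, lambda_2 = 2*cos(2*pi/12), h = 2 / 6 = 1/3.
n, d = 12, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

lam2 = np.sort(np.linalg.eigvalsh(A))[-2]   # second-largest eigenvalue
h = 2 / (n // 2)                            # minimizing cut: a contiguous half-arc
lower = (d - lam2) / 2
upper = np.sqrt(2 * d * (d - lam2))
print(lower, h, upper)                      # ~0.134 <= 0.333... <= ~1.035
```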

Corollary 1 (Equivalence of Expansion Notions).

For a family of dd-regular graphs {Gn}\{G_n\} with dd fixed and nn \to \infty, the following are equivalent:

  1. hV(Gn)c1h_V(G_n) \geq c_1 for some constant c1>0c_1 > 0 (vertex expansion).
  2. h(Gn)c2h(G_n) \geq c_2 for some constant c2>0c_2 > 0 (edge expansion).
  3. λ(Gn)dc3\lambda(G_n) \leq d - c_3 for some constant c3>0c_3 > 0 (spectral expansion).

Specifically, any one of these conditions implies the other two, with the constants related by Cheeger’s inequality and Proposition 1.

Proof.

(3)(2)(3) \Rightarrow (2): If λ(Gn)dc3\lambda(G_n) \leq d - c_3, then λ2(A)dc3\lambda_2(A) \leq d - c_3, so by the lower bound of Cheeger’s inequality, h(Gn)(dλ2)/2c3/2h(G_n) \geq (d - \lambda_2)/2 \geq c_3/2.

(2)(1)(2) \Rightarrow (1): By Proposition 1, hV(Gn)h(Gn)/dc2/dh_V(G_n) \geq h(G_n)/d \geq c_2/d.

(1)(2)(1) \Rightarrow (2): By Proposition 1, h(G_n) \geq h_V(G_n) \geq c_1, since each boundary vertex contributes at least one cut edge.

(2)(3)(2) \Rightarrow (3): If h(Gn)c2h(G_n) \geq c_2, then by the upper bound of Cheeger’s inequality, h(Gn)2d(dλ2)h(G_n) \leq \sqrt{2d(d - \lambda_2)}, so c22d(dλ2)c_2 \leq \sqrt{2d(d-\lambda_2)}, giving dλ2c22/(2d)d - \lambda_2 \geq c_2^2/(2d), hence λ2dc22/(2d)\lambda_2 \leq d - c_2^2/(2d).

For the full spectral parameter \lambda(G) = \max(|\lambda_2|, |\lambda_n|), we must also control |\lambda_n| — and edge expansion alone cannot do this: a bipartite graph can have h(G) bounded away from zero while \lambda_n = -d. In fact, \lambda_n = -d exactly when some connected component of G is bipartite, and the gap d + \lambda_n measures how far G is from bipartite (Trevisan's "dual Cheeger" inequality makes this quantitative). A standard remedy is to pass to the lazy walk P = \tfrac{1}{2}(I + A/d), whose eigenvalues are nonnegative, so that the single gap d - \lambda_2 controls mixing and expansion alike.

This equivalence is why we can speak of “expander families” without specifying which notion of expansion — for constant-degree regular graphs, all three notions agree up to polynomial transformations of the expansion parameter.

Cheeger scatter plot — spectral gap vs. edge expansion for random 5-regular graphs, with Cheeger's inequality bounds shown


4. The Expander Mixing Lemma

The Expander Mixing Lemma (EML) is the most quantitatively useful consequence of spectral expansion. While the Cheeger constant controls cuts for the worst-case set SS, the EML controls the number of edges between any two sets SS and TT, making it a powerful tool for combinatorial arguments.

In a random dd-regular graph, the expected number of edges between two disjoint sets SS and TT is about dST/nd|S||T|/n — each of the dSd|S| edge endpoints in SS lands in TT with probability roughly T/n|T|/n. The EML says that in an (n,d,λ)(n, d, \lambda)-expander, the actual count deviates from this benchmark by at most λST\lambda\sqrt{|S||T|}.

Theorem 2 (Expander Mixing Lemma).

Let GG be an (n,d,λ)(n, d, \lambda)-expander. For any two vertex subsets S,TVS, T \subseteq V:

E(S,T)dSTnλST\left|E(S, T) - \frac{d|S||T|}{n}\right| \leq \lambda \sqrt{|S||T|}

where E(S,T)={(u,v):uS,vT,{u,v}E}E(S, T) = |\{(u, v) : u \in S, v \in T, \{u,v\} \in E\}| counts ordered pairs (so self-loops count once and edges within STS \cap T count twice).

Proof.

The proof decomposes the indicator vectors of SS and TT in the eigenbasis of AA and applies Cauchy-Schwarz. The Spectral Theorem guarantees that AA has an orthonormal eigenbasis {v1,,vn}\{\mathbf{v}_1, \ldots, \mathbf{v}_n\} with real eigenvalues λ1λ2λn\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n.

Step 1: Set up the eigenbasis decomposition.

Since GG is dd-regular, λ1=d\lambda_1 = d and v1=1/n\mathbf{v}_1 = \mathbf{1}/\sqrt{n}. Let 1S\mathbf{1}_S and 1T\mathbf{1}_T denote the indicator (characteristic) vectors of SS and TT. Expand them in the eigenbasis:

1S=i=1nαivi,1T=i=1nβivi\mathbf{1}_S = \sum_{i=1}^n \alpha_i \mathbf{v}_i, \qquad \mathbf{1}_T = \sum_{i=1}^n \beta_i \mathbf{v}_i

where αi=1S,vi\alpha_i = \langle \mathbf{1}_S, \mathbf{v}_i \rangle and βi=1T,vi\beta_i = \langle \mathbf{1}_T, \mathbf{v}_i \rangle.

Step 2: Compute the first coefficients.

The coefficient for v1=1/n\mathbf{v}_1 = \mathbf{1}/\sqrt{n} is:

α1=1S,1/n=Sn,β1=Tn\alpha_1 = \langle \mathbf{1}_S, \mathbf{1}/\sqrt{n} \rangle = \frac{|S|}{\sqrt{n}}, \qquad \beta_1 = \frac{|T|}{\sqrt{n}}

Step 3: Express E(S,T)E(S, T) in terms of eigenvalues.

The key observation: E(S,T)=1STA1TE(S, T) = \mathbf{1}_S^T A \, \mathbf{1}_T. Since Avi=λiviA\mathbf{v}_i = \lambda_i \mathbf{v}_i and the eigenvectors are orthonormal:

E(S,T)=1STA1T=i=1nλiαiβiE(S,T) = \mathbf{1}_S^T A \, \mathbf{1}_T = \sum_{i=1}^n \lambda_i \alpha_i \beta_i

Step 4: Separate the first term.

E(S,T)=λ1α1β1+i=2nλiαiβi=dSnTn+i=2nλiαiβi=dSTn+i=2nλiαiβiE(S,T) = \lambda_1 \alpha_1 \beta_1 + \sum_{i=2}^n \lambda_i \alpha_i \beta_i = d \cdot \frac{|S|}{\sqrt{n}} \cdot \frac{|T|}{\sqrt{n}} + \sum_{i=2}^n \lambda_i \alpha_i \beta_i = \frac{d|S||T|}{n} + \sum_{i=2}^n \lambda_i \alpha_i \beta_i

So the deviation from the expected value is:

E(S,T)dSTn=i=2nλiαiβiE(S,T) - \frac{d|S||T|}{n} = \sum_{i=2}^n \lambda_i \alpha_i \beta_i

Step 5: Apply the triangle inequality and Cauchy-Schwarz.

i=2nλiαiβii=2nλiαiβiλi=2nαiβi\left|\sum_{i=2}^n \lambda_i \alpha_i \beta_i\right| \leq \sum_{i=2}^n |\lambda_i| \cdot |\alpha_i| \cdot |\beta_i| \leq \lambda \sum_{i=2}^n |\alpha_i| \cdot |\beta_i|

where λ=maxi2λi=λ(G)\lambda = \max_{i \geq 2} |\lambda_i| = \lambda(G). Now apply Cauchy-Schwarz:

i=2nαiβii=2nαi2i=2nβi2\sum_{i=2}^n |\alpha_i| \cdot |\beta_i| \leq \sqrt{\sum_{i=2}^n \alpha_i^2} \cdot \sqrt{\sum_{i=2}^n \beta_i^2}

Step 6: Bound the norms using Parseval’s identity.

By Parseval’s identity (since {vi}\{\mathbf{v}_i\} is orthonormal):

i=1nαi2=1S2=S\sum_{i=1}^n \alpha_i^2 = \|\mathbf{1}_S\|^2 = |S|

Therefore:

i=2nαi2=Sα12=SS2n=S(1Sn)S\sum_{i=2}^n \alpha_i^2 = |S| - \alpha_1^2 = |S| - \frac{|S|^2}{n} = |S|\left(1 - \frac{|S|}{n}\right) \leq |S|

and similarly i=2nβi2T\sum_{i=2}^n \beta_i^2 \leq |T|.

Step 7: Combine.

E(S,T)dSTnλST\left|E(S,T) - \frac{d|S||T|}{n}\right| \leq \lambda \sqrt{|S| \cdot |T|}

This completes the proof.

The EML is a multiplicative guarantee when SS and TT are not too small. If S=T=αn|S| = |T| = \alpha n for some constant α\alpha, the expected edge count is dα2nd\alpha^2 n and the error bound is λαn\lambda \alpha n. The relative error is λ/(dα)\lambda/(d\alpha), which is O(λ/d)O(\lambda/d) — small when λd\lambda \ll d.
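The lemma can also be stress-tested numerically. A sketch assuming numpy: sample random subset pairs of the Petersen graph and check that no pair violates the bound:

```python
import numpy as np

# Petersen graph: outer 5-cycle, spokes, inner pentagram (n = 10, d = 3).
n, d = 10, 3
edges = ([(i, (i + 1) % 5) for i in range(5)]
         + [(i, i + 5) for i in range(5)]
         + [(5 + i, 5 + (i + 2) % 5) for i in range(5)])
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1

eig = np.sort(np.linalg.eigvalsh(A))
lam = max(abs(eig[0]), abs(eig[-2]))        # lambda(G) = 2 for Petersen

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(500):
    S = rng.random(n) < rng.random()        # random vertex subsets
    T = rng.random(n) < rng.random()
    s, t = int(S.sum()), int(T.sum())
    E_ST = A[np.ix_(S, T)].sum()            # ordered pairs u in S, v in T
    if s and t:
        worst = max(worst, abs(E_ST - d * s * t / n) / np.sqrt(s * t))
print(lam, worst)    # the normalized deviation never exceeds lambda
```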

Remark (Tightness of the Expander Mixing Lemma).

The EML bound is tight up to constants. For any (n, d, \lambda)-expander, there exist sets S and T with |E(S,T) - d|S||T|/n| = \Omega(\lambda\sqrt{|S||T|}), obtained by taking S and T correlated with an eigenvector whose eigenvalue attains |\lambda_i| = \lambda(G). In particular, for bipartite graphs (where \lambda_n = -d), taking S = T to be one side of the bipartition gives E(S, S) = 0 against an expected d|S|^2/n, a deviation of order \lambda\sqrt{|S||T|}.

4.1 Consequences of the EML

The EML has immediate combinatorial consequences:

Edge density. Setting T=ST = S: the number of edges within SS satisfies E(S,S)dS2/nλS|E(S,S) - d|S|^2/n| \leq \lambda|S|. The term E(S,S)E(S,S) counts twice each edge inside SS, so the actual edge count is e(S)=E(S,S)/2e(S) = E(S,S)/2, and:

e(S)dS22nλS2\left|e(S) - \frac{d|S|^2}{2n}\right| \leq \frac{\lambda |S|}{2}

This says internal edge density is close to dS/(2n)d|S|/(2n) per vertex — nearly what you would expect from a random graph.

Independence number. An independent set SS has e(S)=0e(S) = 0, so dS2/(2n)λS/2d|S|^2/(2n) \leq \lambda|S|/2, giving:

Sλnd|S| \leq \frac{\lambda n}{d}

For an (n,d,λ)(n, d, \lambda)-expander with λ/d\lambda/d small, the independence number is at most a λ/d\lambda/d fraction of nn. This is a non-trivial bound: it says expansion forces edges to spread evenly, making large independent sets impossible.

Chromatic number. Since every color class is an independent set, χ(G)n/α(G)d/λ\chi(G) \geq n/\alpha(G) \geq d/\lambda — expanders need many colors.
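A quick numerical check of the independence bound on the Petersen graph (a brute-force sketch, assuming numpy):

```python
import itertools

import numpy as np

# Petersen graph: outer 5-cycle, spokes, inner pentagram (n = 10, d = 3).
n, d = 10, 3
edges = ([(i, (i + 1) % 5) for i in range(5)]
         + [(i, i + 5) for i in range(5)]
         + [(5 + i, 5 + (i + 2) % 5) for i in range(5)])
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1

eig = np.sort(np.linalg.eigvalsh(A))
lam = max(abs(eig[0]), abs(eig[-2]))        # lambda(G) = 2

# independence number by brute force: largest S with no internal edges
alpha = max(k for k in range(1, n + 1)
            for S in itertools.combinations(range(n), k)
            if A[np.ix_(S, S)].sum() == 0)
print(alpha, lam * n / d)                   # alpha = 4, bound = 20/3
```

The bound |S| \leq \lambda n / d = 20/3 \approx 6.67 is not tight here (\alpha = 4), but it is nontrivial: it rules out independent sets of size 7 or more without any search.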


Expander Mixing Lemma histograms — observed edge counts vs. the d|S||T|/n prediction for random subset pairs, with λ√(|S||T|) error band


5. Ramanujan Graphs and the Alon-Boppana Bound

We now ask: how small can λ(G)\lambda(G) be for a dd-regular graph? A smaller λ\lambda means better expansion, tighter EML bounds, and faster mixing. Is there a limit?

5.1 The Alon-Boppana Lower Bound

The answer is yes: there is a universal floor that no infinite family of dd-regular graphs can beat.

Theorem 3 (Alon-Boppana Bound).

For any family of dd-regular graphs {Gn}\{G_n\} with V(Gn)|V(G_n)| \to \infty:

lim infnλ(Gn)2d1\liminf_{n \to \infty} \lambda(G_n) \geq 2\sqrt{d-1}

More precisely, for any dd-regular graph GG on nn vertices:

λ2(A)2d1O(1logd1n)\lambda_2(A) \geq 2\sqrt{d-1} - O\left(\frac{1}{\lfloor \log_{d-1} n \rfloor}\right)

Proof.

We give a proof sketch that captures the essential idea. The bound arises from the spectral theory of the infinite dd-regular tree TdT_d, which is the “universal cover” of all dd-regular graphs.

The infinite tree as the limit. The dd-regular tree TdT_d has spectrum [2d1,2d1][-2\sqrt{d-1}, 2\sqrt{d-1}] (as a continuous spectrum, since TdT_d is infinite). The key fact is that any finite dd-regular graph GG on nn vertices “looks locally like TdT_d” for vertices whose rr-neighborhood is a tree, where rr can be as large as logd1(n)\lfloor \log_{d-1}(n) \rfloor.

Test vector construction. For a vertex vGv \in G, define a vector f\mathbf{f} supported on the ball Br(v)B_r(v) of radius rr around vv:

fu={(d1)dist(v,u)/2if dist(v,u)r0otherwisef_u = \begin{cases} (d-1)^{-\mathrm{dist}(v,u)/2} & \text{if } \mathrm{dist}(v,u) \leq r \\ 0 & \text{otherwise} \end{cases}

This vector mimics the eigenfunction of TdT_d at the spectral edge 2d12\sqrt{d-1}. If the rr-ball around vv is a tree (which happens for r<girth(G)/2r < \mathrm{girth}(G)/2), then:

fTAffTf=2d1O(1/r)\frac{\mathbf{f}^T A \mathbf{f}}{\mathbf{f}^T \mathbf{f}} = 2\sqrt{d-1} - O(1/r)

From Rayleigh quotient to eigenvalue. We need fv1=1/n\mathbf{f} \perp \mathbf{v}_1 = \mathbf{1}/\sqrt{n}. Modifying f\mathbf{f} to be orthogonal to 1\mathbf{1} changes the Rayleigh quotient by at most O(supp(f)/n)=O((d1)r/n)O(|\mathrm{supp}(\mathbf{f})|/n) = O((d-1)^r/n). Since we can take rlogd1(n)r \approx \log_{d-1}(n) while keeping the support smaller than nn, the correction vanishes, and:

λ2fTAffTf2d1O(1/r)\lambda_2 \geq \frac{\mathbf{f}_\perp^T A \mathbf{f}_\perp}{\mathbf{f}_\perp^T \mathbf{f}_\perp} \geq 2\sqrt{d-1} - O(1/r)

Taking nn \to \infty gives lim infλ22d1\liminf \lambda_2 \geq 2\sqrt{d-1}.

For the full spectral parameter λ(G)=max(λ2,λn)\lambda(G) = \max(|\lambda_2|, |\lambda_n|), the same argument applies to λn|\lambda_n| by considering the vector (1)dist(v,u)fu(-1)^{\mathrm{dist}(v,u)} f_u, which tests the bottom of the spectrum.

The Alon-Boppana bound tells us that 2d12\sqrt{d-1} is an impassable barrier: no family of dd-regular graphs can have λ<2d1\lambda < 2\sqrt{d-1} for all but finitely many members. This leads to the definition of optimal expanders.
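The test-vector calculation can be checked numerically. The sketch below (assuming numpy; the breadth-first construction is illustrative) builds the radius-r ball of the 3-regular tree and evaluates the Rayleigh quotient of f_u = (d-1)^{-\mathrm{dist}(v,u)/2}, which approaches 2\sqrt{d-1} \approx 2.828 from below as r grows:

```python
import numpy as np

d, r = 3, 8
levels = [[0]]           # vertices of the tree ball, grouped by distance from root
edges = []
nxt = 1
for ell in range(r):
    new = []
    for v in levels[ell]:
        for _ in range(d if ell == 0 else d - 1):  # root has d children, others d-1
            edges.append((v, nxt))
            new.append(nxt)
            nxt += 1
    levels.append(new)

f = np.zeros(nxt)
for ell, vs in enumerate(levels):
    f[vs] = (d - 1) ** (-ell / 2)                  # the Alon-Boppana test vector

# Rayleigh quotient f^T A f / f^T f, computed edge-by-edge
rayleigh = sum(2 * f[u] * f[v] for u, v in edges) / (f @ f)
print(rayleigh)          # ~2.61 for r = 8, against the limit 2*sqrt(2) = 2.828...
```

For d = 3 the quotient works out to 3\sqrt{2}\,r / (1 + 3r/2), which is the 2\sqrt{d-1} - O(1/r) behavior claimed in the proof sketch.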

5.2 Ramanujan Graphs

Definition 4 (Ramanujan Graph).

A dd-regular graph GG is a Ramanujan graph if every eigenvalue λi\lambda_i of the adjacency matrix satisfying λi<d|\lambda_i| < d also satisfies:

λi2d1|\lambda_i| \leq 2\sqrt{d-1}

Equivalently, λ(G)2d1\lambda(G) \leq 2\sqrt{d-1}.

Ramanujan graphs are the best possible expanders for their degree: they meet the Alon-Boppana bound with equality. The name honors the Indian mathematician Srinivasa Ramanujan, whose conjectures about modular forms (proved by Deligne) are the key ingredient in the original constructions.

5.3 Examples of Ramanujan Graphs

Petersen graph (d = 3, n = 10). The eigenvalues are \{3, 1, 1, 1, 1, 1, -2, -2, -2, -2\}, so \lambda(G) = \max(|1|, |-2|) = 2. The Ramanujan bound is 2\sqrt{3-1} = 2\sqrt{2} \approx 2.83. Since 2 \leq 2.83, the Petersen graph is Ramanujan.

Complete graph KnK_n (d=n1d = n - 1). The non-trivial eigenvalues are all 1-1, so λ(Kn)=1\lambda(K_n) = 1. The Ramanujan bound is 2n22\sqrt{n-2}. For n3n \geq 3, we have 12n21 \leq 2\sqrt{n-2}, so every KnK_n is Ramanujan. (This is expected — the complete graph is the best possible expander, but it is not sparse.)

Cycle CnC_n (d=2d = 2). The Ramanujan bound is 21=22\sqrt{1} = 2. The second-largest eigenvalue is λ2=2cos(2π/n)<2\lambda_2 = 2\cos(2\pi/n) < 2 for n3n \geq 3, so every cycle is technically Ramanujan. This may seem surprising, but recall that the cycle is not a good expander — the Cheeger constant h(Cn)0h(C_n) \to 0. Ramanujan-ness at d=2d = 2 is vacuous because 2d1=2=d2\sqrt{d-1} = 2 = d, so the condition imposes no constraint beyond regularity.

Complete bipartite graph Kn/2,n/2K_{n/2, n/2} (d=n/2d = n/2). The eigenvalues are n/2n/2, 00 (with multiplicity n2n-2), and n/2-n/2. Here λn=d\lambda_n = -d, which means λn=d|\lambda_n| = d. Under our definition, λ(G)=maxi2λi=d\lambda(G) = \max_{i \geq 2}|\lambda_i| = d, so λ(G)=d\lambda(G) = d. However, the Ramanujan condition for bipartite graphs uses the nontrivial spectral parameter

λnt(G):=max{λi(A):λi<d}\lambda_{\mathrm{nt}}(G) := \max\{|\lambda_i(A)| : |\lambda_i| < d\}

which excludes both the trivial eigenvalues +d+d and d-d. With this convention, λnt(Kn/2,n/2)=0\lambda_{\mathrm{nt}}(K_{n/2, n/2}) = 0, and Kn/2,n/2K_{n/2, n/2} is trivially Ramanujan (but again, not sparse). We use λnt(G)\lambda_{\mathrm{nt}}(G) in the table below and in the Ramanujan check of the interactive visualizations.

| Graph | d | \lambda_{\mathrm{nt}}(G) | 2\sqrt{d-1} | Ramanujan? |
| --- | --- | --- | --- | --- |
| Petersen | 3 | 2 | \approx 2.83 | Yes |
| K_5 | 4 | 1 | \approx 3.46 | Yes |
| K_6 | 5 | 1 | 4.00 | Yes |
| C_n | 2 | 2\cos(2\pi/n) | 2 | Yes (vacuous) |
| Hypercube Q_4 | 4 | 2 | \approx 3.46 | Yes |
| Random 3-regular | 3 | \approx 2\sqrt{2} | \approx 2.83 | Nearly |
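The Ramanujan check is mechanical once \lambda_{\mathrm{nt}} is computed. A sketch assuming numpy (the helper names are illustrative):

```python
import numpy as np

def lambda_nt(A, tol=1e-8):
    """Largest |eigenvalue| of A strictly below the degree d (excludes +d and -d)."""
    d = A.sum(axis=1)[0]
    return max(abs(x) for x in np.linalg.eigvalsh(A) if abs(x) < d - tol)

def is_ramanujan(A):
    d = A.sum(axis=1)[0]
    return lambda_nt(A) <= 2 * np.sqrt(d - 1) + 1e-8

K5 = np.ones((5, 5)) - np.eye(5)     # d = 4, nontrivial eigenvalues all -1
# C_8 as a circulant: shift matrix plus its transpose (d = 2)
C8 = np.roll(np.eye(8), 1, axis=1) + np.roll(np.eye(8), -1, axis=1)

print(is_ramanujan(K5), is_ramanujan(C8))   # True True
```

For C_8, which is bipartite, \lambda_{\mathrm{nt}} = \sqrt{2}: the trivial eigenvalues \pm 2 are excluded, matching the convention of §5.3.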

Theorem 4 (Existence of Ramanujan Graphs (Lubotzky-Phillips-Sarnak)).

For every pair of distinct primes p, q \equiv 1 \pmod{4}, there is an explicit (p+1)-regular Ramanujan graph X^{p,q}: a Cayley graph of \mathrm{PSL}(2, \mathbb{F}_q) on n = q(q^2-1)/2 vertices when p is a quadratic residue mod q, and of \mathrm{PGL}(2, \mathbb{F}_q) on n = q(q^2-1) vertices otherwise. The p+1 generators are defined via quaternion arithmetic.

Remark (Friedman's Theorem and Random Regular Graphs).

While the LPS construction gives Ramanujan graphs for specific degrees, a beautiful result of Friedman (2008) shows that random dd-regular graphs are “nearly Ramanujan”: for every ε>0\varepsilon > 0,

Pr[λ(Gn)2d1+ε]1 as n\Pr[\lambda(G_n) \leq 2\sqrt{d-1} + \varepsilon] \to 1 \text{ as } n \to \infty

So almost every large random dd-regular graph has λ\lambda close to the Alon-Boppana bound. The gap between λ\lambda and 2d12\sqrt{d-1} vanishes in the limit — random graphs are essentially optimal expanders. This was conjectured by Alon in 1986 and took over 20 years to prove.

More recently, Marcus, Spielman, and Srivastava (2015) proved that bipartite Ramanujan graphs of every degree exist, resolving a major open problem. Their proof used the method of interlacing families and connected the existence of Ramanujan graphs to the Kadison-Singer problem.


Ramanujan bound visualization — eigenvalue distributions for random d-regular graphs with the 2√(d-1) threshold marked


6. Explicit Constructions

One of the remarkable features of expander graphs is that they can be constructed explicitly — we do not need randomness to build them. This section describes three constructions: the general algebraic framework of Cayley graphs, the Margulis-Gabber-Galil construction, and the number-theoretic LPS construction.

6.1 Cayley Graphs

Definition 5 (Cayley Graph).

Let Γ\Gamma be a finite group and SΓS \subseteq \Gamma a symmetric generating set (S=S1S = S^{-1}, eSe \notin S). The Cayley graph Cay(Γ,S)\mathrm{Cay}(\Gamma, S) is the graph with vertex set Γ\Gamma and edges {g,gs}\{g, gs\} for all gΓg \in \Gamma and sSs \in S. The graph is S|S|-regular.

Cayley graphs are a natural source of highly structured regular graphs. Their spectral properties are determined by the representation theory of the group Γ\Gamma — and for abelian groups, the eigenvalues have an explicit formula.

Proposition 2 (Spectrum of Abelian Cayley Graphs).

Let Γ=Zn\Gamma = \mathbb{Z}_n and SZnS \subseteq \mathbb{Z}_n be a symmetric generating set with S=d|S| = d. The eigenvalues of Cay(Zn,S)\mathrm{Cay}(\mathbb{Z}_n, S) are:

λk=sScos(2πksn),k=0,1,,n1\lambda_k = \sum_{s \in S} \cos\left(\frac{2\pi k s}{n}\right), \qquad k = 0, 1, \ldots, n-1

with corresponding eigenvectors (vk)j=e2πijk/n/n(\mathbf{v}_k)_j = e^{2\pi i jk/n} / \sqrt{n}.

Proof.

The adjacency matrix of Cay(Zn,S)\mathrm{Cay}(\mathbb{Z}_n, S) is a circulant matrix: Aij=1A_{ij} = 1 if ji(modn)Sj - i \pmod{n} \in S. Circulant matrices are diagonalized by the discrete Fourier transform. The DFT basis vectors are (vk)j=e2πijk/n/n(\mathbf{v}_k)_j = e^{2\pi ijk/n}/\sqrt{n}, and the eigenvalue corresponding to vk\mathbf{v}_k is:

λk=sSe2πiks/n\lambda_k = \sum_{s \in S} e^{2\pi iks/n}

Since SS is symmetric (sSsSs \in S \Rightarrow -s \in S), the imaginary parts cancel and:

λk=sScos(2πksn)\lambda_k = \sum_{s \in S} \cos\left(\frac{2\pi ks}{n}\right)

The largest eigenvalue is λ0=S=d\lambda_0 = |S| = d (all cosines equal 1).
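Proposition 2 is easy to check numerically: build the circulant adjacency matrix for a small cyclic Cayley graph and compare `eigvalsh` against the cosine formula. A quick sketch (the inline `cayley_cyclic` helper is a stand-in for the construction code in Section 6.4):

```python
import numpy as np

def cayley_cyclic(n, S):
    """Adjacency matrix of Cay(Z_n, S) for a symmetric generating set S."""
    A = np.zeros((n, n))
    for v in range(n):
        for s in S:
            A[v, (v + s) % n] = 1
    return A

n, S = 12, [1, -1, 3, -3]          # symmetric (S = -S), so the graph is 4-regular
A = cayley_cyclic(n, S)

# Eigenvalues predicted by Proposition 2: lambda_k = sum_s cos(2*pi*k*s/n)
predicted = np.array([sum(np.cos(2 * np.pi * k * s / n) for s in S)
                      for k in range(n)])

# eigvalsh returns the spectrum in ascending order; compare sorted lists
numeric = np.linalg.eigvalsh(A)
print("max discrepancy:", np.max(np.abs(np.sort(predicted) - numeric)))
```

The two spectra agree to machine precision, and $\lambda_0 = |S| = 4$ as the proof predicts.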

6.2 The Margulis-Gabber-Galil Construction

The first explicit expander family was constructed by Margulis (1973), with the spectral gap proved by Gabber and Galil (1981). It produces 8-regular graphs on n2n^2 vertices.

Construction. Let nn be any positive integer. Define the graph GnG_n on the vertex set Zn×Zn\mathbb{Z}_n \times \mathbb{Z}_n (so V=n2|V| = n^2) with edges from (x,y)(x, y) to each of:

(x±1,y),(x,y±1),(x+y,y),(xy,y),(x,y+x),(x,yx)(x \pm 1, y), \quad (x, y \pm 1), \quad (x + y, y), \quad (x - y, y), \quad (x, y + x), \quad (x, y - x)

where all arithmetic is modulo $n$. Some of these eight neighbors may coincide (at $(0,0)$, for instance, all four shear moves return the vertex itself), so strictly speaking $G_n$ is 8-regular as a multigraph; most vertices have eight distinct neighbors.

Why is this an expander? The generating set SS includes both “translation” moves (adding ±1\pm 1 to a coordinate) and “shear” moves (adding one coordinate to the other). The translations provide local connectivity, while the shears — which act as hyperbolic rotations on Zn2\mathbb{Z}_n^2 — spread out any concentrated set of vertices rapidly. The spectral gap satisfies λ(Gn)527.07\lambda(G_n) \leq 5\sqrt{2} \approx 7.07, so the spectral gap 8λ8520.938 - \lambda \geq 8 - 5\sqrt{2} \approx 0.93 is bounded away from zero.

6.3 The LPS Construction

The Lubotzky-Phillips-Sarnak (LPS) construction produces (p+1)(p+1)-regular Ramanujan graphs for primes pp. The construction uses deep number theory — specifically, Jacobi’s four-square theorem and Ramanujan’s conjecture on modular forms (proved by Deligne).

Sketch of the construction. Fix two distinct odd primes pp and qq with p,q1(mod4)p, q \equiv 1 \pmod{4}. The LPS graph Xp,qX^{p,q} is the Cayley graph of PGL(2,Fq)\mathrm{PGL}(2, \mathbb{F}_q) — the projective general linear group of 2×22 \times 2 invertible matrices over the finite field Fq\mathbb{F}_q — with generating set determined by the (p+1)(p+1) representations of pp as a sum of four squares:

p=a02+a12+a22+a32p = a_0^2 + a_1^2 + a_2^2 + a_3^2

where a0>0a_0 > 0 is odd and a1,a2,a3a_1, a_2, a_3 are even. Each representation yields a matrix in PGL(2,Fq)\mathrm{PGL}(2, \mathbb{F}_q), and these matrices form a symmetric generating set of size p+1p+1.

The Ramanujan property λ(Xp,q)2p\lambda(X^{p,q}) \leq 2\sqrt{p} follows from deep results: the non-trivial eigenvalues of the Cayley graph are expressed in terms of Hecke eigenvalues of automorphic forms on GL(2)\mathrm{GL}(2), and Ramanujan’s conjecture (now Deligne’s theorem) bounds these eigenvalues.
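The count of $p+1$ four-square representations (Jacobi's theorem, restricted to $a_0 > 0$ odd and $a_1, a_2, a_3$ even, with sign patterns counted as distinct) can be verified by brute force for small primes. A sketch:

```python
from itertools import product

def four_square_reps(p):
    """All (a0, a1, a2, a3) with p = a0^2 + a1^2 + a2^2 + a3^2,
    a0 > 0 odd and a1, a2, a3 even (signs distinguish representations)."""
    r = int(p ** 0.5) + 1
    evens = [e for e in range(-r, r + 1) if e % 2 == 0]
    reps = []
    for a0 in range(1, r + 1, 2):            # positive odd values of a0
        rem = p - a0 * a0
        if rem < 0:
            break
        for a1, a2, a3 in product(evens, repeat=3):
            if a1 * a1 + a2 * a2 + a3 * a3 == rem:
                reps.append((a0, a1, a2, a3))
    return reps

for p in [5, 13, 17, 29]:                    # primes congruent to 1 mod 4
    print(f"p = {p:2d}: {len(four_square_reps(p))} representations; p + 1 = {p + 1}")
```

For each prime the enumeration finds exactly $p + 1$ representations — the size of the LPS generating set.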

Explicit constructions — the Margulis-Gabber-Galil graph on ℤ₇ × ℤ₇ and an LPS Ramanujan graph, showing vertex and edge structure

6.4 Constructing Cayley Graphs in Practice

Here is a Python implementation of Cayley graph construction for cyclic groups:

import numpy as np

def cayley_graph_cyclic(n, generators):
    """
    Build the adjacency matrix of Cay(Z_n, S)
    where S = generators ∪ {-g mod n : g in generators}.
    """
    # Symmetrize the generating set
    S = set()
    for g in generators:
        S.add(g % n)
        S.add((-g) % n)
    S.discard(0)  # Remove identity

    A = np.zeros((n, n), dtype=int)
    for v in range(n):
        for s in S:
            w = (v + s) % n
            A[v, w] = 1
    return A, sorted(S)

def margulis_gabber_galil(n):
    """
    Build the Margulis–Gabber–Galil 8-regular expander on Z_n × Z_n.
    Returns adjacency matrix of size n^2 × n^2.
    """
    N = n * n
    A = np.zeros((N, N), dtype=int)

    def idx(x, y):
        return (x % n) * n + (y % n)

    for x in range(n):
        for y in range(n):
            i = idx(x, y)
            neighbors = [
                idx(x + 1, y), idx(x - 1, y),
                idx(x, y + 1), idx(x, y - 1),
                idx(x + y, y), idx(x - y, y),
                idx(x, y + x), idx(x, y - x),
            ]
            for j in neighbors:
                A[i, j] = 1
    return A

7. Random Walks on Expanders

In Random Walks & Mixing, we proved that the mixing time of a random walk is controlled by the spectral gap: tmix=O(1/γlogn)t_{\mathrm{mix}} = O(1/\gamma \cdot \log n) where γ=1λ2(P)\gamma = 1 - \lambda_2(P). For expanders, γ\gamma is bounded away from zero, so the mixing time is O(logn)O(\log n) — logarithmic in the number of vertices.

7.1 Mixing Time on Expanders

Theorem 5 (Mixing Time on Expanders).

Let GG be a connected, non-bipartite (n,d,λ)(n, d, \lambda)-expander. The random walk on GG has mixing time:

tmix(ε)11λ/d(lnn+ln1ε)=ddλ(lnn+ln1ε)t_{\mathrm{mix}}(\varepsilon) \leq \frac{1}{1 - \lambda/d} \cdot \left(\ln n + \ln\frac{1}{\varepsilon}\right) = \frac{d}{d - \lambda}\left(\ln n + \ln\frac{1}{\varepsilon}\right)

In particular, if λdc\lambda \leq d - c for a constant c>0c > 0, then tmix(ε)=O(logn+log(1/ε))t_{\mathrm{mix}}(\varepsilon) = O(\log n + \log(1/\varepsilon)).

Proof.

Since GG is dd-regular, the transition matrix is P=A/dP = A/d and the stationary distribution is uniform: π=(1/n,,1/n)\pi = (1/n, \ldots, 1/n). The eigenvalues of PP are μi=λi(A)/d\mu_i = \lambda_i(A)/d, so μiλ/d|\mu_i| \leq \lambda/d for i2i \geq 2. Define ρ=λ/d<1\rho = \lambda/d < 1 (using non-bipartiteness and λ<d\lambda < d).

Step 1: Spectral decomposition of PtP^t.

By the spectral decomposition from Random Walks & Mixing:

Pt(x,y)=1n+i=2nμitvi(x)vi(y)P^t(x, y) = \frac{1}{n} + \sum_{i=2}^n \mu_i^t \, v_i(x) \, v_i(y)

where viv_i are the orthonormal eigenvectors of PP, and we used v1=1/nv_1 = \mathbf{1}/\sqrt{n} and μ1=1\mu_1 = 1.

Step 2: Total variation bound.

The total variation distance from stationarity is:

Pt(x,)πTV=12yVPt(x,y)1/n\|P^t(x, \cdot) - \pi\|_{\mathrm{TV}} = \frac{1}{2}\sum_{y \in V} |P^t(x,y) - 1/n|

Using the spectral decomposition:

Pt(x,y)1/n=i=2nμitvi(x)vi(y)P^t(x,y) - 1/n = \sum_{i=2}^n \mu_i^t \, v_i(x) \, v_i(y)

Squaring, summing over yy, and using orthonormality of the viv_i (the cross terms vanish):

yPt(x,y)1/n2=i=2nμi2tvi(x)2ρ2ti=2nvi(x)2ρ2t\sum_y \left|P^t(x,y) - 1/n\right|^2 = \sum_{i=2}^n \mu_i^{2t} v_i(x)^2 \leq \rho^{2t} \sum_{i=2}^n v_i(x)^2 \leq \rho^{2t}

The first inequality uses μiρ|\mu_i| \leq \rho for i2i \geq 2. The second uses i=2nvi(x)21\sum_{i=2}^n v_i(x)^2 \leq 1, which holds because the {vi}\{v_i\} form an orthonormal basis: the completeness identity i=1nvi(x)vi(y)=δxy\sum_{i=1}^n v_i(x)v_i(y) = \delta_{xy}, evaluated at y=xy = x, gives i=1nvi(x)2=1\sum_{i=1}^n v_i(x)^2 = 1.

By Cauchy-Schwarz between the sum over yy and the constant function:

(yPt(x,y)1/n)2nyPt(x,y)1/n2nρ2t\left(\sum_y |P^t(x,y) - 1/n|\right)^2 \leq n \sum_y |P^t(x,y) - 1/n|^2 \leq n \rho^{2t}

So:

Pt(x,)πTV=12yPt(x,y)1/nn2ρt\|P^t(x, \cdot) - \pi\|_{\mathrm{TV}} = \frac{1}{2}\sum_y |P^t(x,y) - 1/n| \leq \frac{\sqrt{n}}{2} \rho^t

Step 3: Solve for tt.

We want Pt(x,)πTVε\|P^t(x, \cdot) - \pi\|_{\mathrm{TV}} \leq \varepsilon. It suffices to have:

n2ρtε\frac{\sqrt{n}}{2} \rho^t \leq \varepsilon

Taking logarithms:

tln(1/ρ)ln(n/2)+ln(1/ε)t \cdot \ln(1/\rho) \geq \ln(\sqrt{n}/2) + \ln(1/\varepsilon)

Using the inequality ln(1/x)1x\ln(1/x) \geq 1 - x for x(0,1]x \in (0, 1], we have ln(1/ρ)1ρ=1λ/d=(dλ)/d\ln(1/\rho) \geq 1 - \rho = 1 - \lambda/d = (d - \lambda)/d.

So tddλ(12lnnln2+ln(1/ε))t \geq \frac{d}{d - \lambda}(\frac{1}{2}\ln n - \ln 2 + \ln(1/\varepsilon)) suffices. For the standard formulation with ε=1/4\varepsilon = 1/4:

tmixddλ(12lnn+ln2)ddλ(lnn+ln(1/ε))t_{\mathrm{mix}} \leq \frac{d}{d - \lambda}\left(\frac{1}{2}\ln n + \ln 2\right) \leq \frac{d}{d - \lambda}(\ln n + \ln(1/\varepsilon))

(absorbing constants). When dλcd - \lambda \geq c for a constant cc, this is O(logn)O(\log n).
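The intermediate bound $\|P^t(x,\cdot) - \pi\|_{\mathrm{TV}} \leq (\sqrt{n}/2)\rho^t$ from the proof can be checked numerically. A sketch on the Petersen graph, where $n = 10$, $d = 3$, $\lambda = 2$, so $\rho = 2/3$:

```python
import numpy as np

# Petersen graph: outer 5-cycle, spokes, inner pentagram
edges = [(0,1),(1,2),(2,3),(3,4),(4,0),
         (0,5),(1,6),(2,7),(3,8),(4,9),
         (5,7),(7,9),(9,6),(6,8),(8,5)]
A = np.zeros((10, 10))
for i, j in edges:
    A[i, j] = A[j, i] = 1

n, d, rho = 10, 3, 2 / 3
P = A / d                               # random walk transition matrix

Pt = np.eye(n)
for t in range(1, 11):
    Pt = Pt @ P
    # worst-case total variation distance over all starting vertices
    tv = 0.5 * np.max(np.sum(np.abs(Pt - 1 / n), axis=1))
    bound = np.sqrt(n) / 2 * rho ** t
    print(f"t={t:2d}  TV={tv:.5f}  bound={bound:.5f}")
    assert tv <= bound + 1e-12
```

The observed distance decays geometrically and stays below the proof's bound at every step.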

7.2 Comparison of Mixing Times

The following table compares mixing times across graph families, highlighting how expansion controls convergence speed:

| Graph Family | $n$ | $d$ | $\lambda(G)$ | Spectral gap $d - \lambda$ | $t_{\mathrm{mix}}$ |
|---|---|---|---|---|---|
| Path $P_n$ | $n$ | 2 | $2\cos(\pi/n)$ | $O(1/n^2)$ | $\Theta(n^2)$ |
| Cycle $C_n$ | $n$ | 2 | $2\cos(2\pi/n)$ | $O(1/n^2)$ | $\Theta(n^2)$ |
| Hypercube $Q_k$ | $2^k$ | $k$ | $k-2$ | 2 | $\Theta(k \log k)$ |
| $(n, d, \lambda)$-Expander | $n$ | $d$ | $\lambda$ | $d - \lambda = \Theta(1)$ | $\Theta(\log n)$ |
| Complete $K_n$ | $n$ | $n-1$ | 1 | $n-2$ | $\Theta(1)$ |

The path and cycle have vanishing spectral gap, so mixing takes polynomial time. The hypercube has a constant spectral gap (2) and mixes in $\Theta(k \log k)$ steps, where $k = \log_2 |V|$ is the dimension. Expanders achieve the optimal $O(\log |V|)$ mixing for sparse graphs.
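The first rows of the table can be checked directly: the cycle's adjacency gap $d - \lambda_2 = 2 - 2\cos(2\pi/n)$ scales like $4\pi^2/n^2$, while the hypercube's gap is exactly 2 in every dimension. A quick sketch:

```python
import numpy as np

def gap(A):
    """Spectral gap d - lambda_2 of a regular graph's adjacency matrix."""
    eig = np.sort(np.linalg.eigvalsh(A))[::-1]
    return eig[0] - eig[1]

def cycle(n):
    A = np.zeros((n, n))
    for v in range(n):
        A[v, (v + 1) % n] = A[v, (v - 1) % n] = 1
    return A

def hypercube(k):
    m = 2 ** k
    return np.array([[int(bin(i ^ j).count("1") == 1) for j in range(m)]
                     for i in range(m)], dtype=float)

# Cycle: gap ~ 4*pi^2 / n^2, vanishing as n grows
for n in [8, 16, 32]:
    g = gap(cycle(n))
    print(f"C_{n}:  gap = {g:.5f},  n^2 * gap = {n * n * g:.2f}")

# Hypercube: gap is exactly 2 in every dimension
for k in [3, 4, 5]:
    print(f"Q_{k}:  gap = {gap(hypercube(k)):.5f}")
```

The rescaled cycle gaps approach $4\pi^2 \approx 39.48$, confirming the $O(1/n^2)$ rate, while every hypercube prints a gap of 2.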

Corollary 2 (Rapid Mixing Implies Expansion).

If a family of dd-regular graphs {Gn}\{G_n\} has mixing time tmix(Gn)=O(logn)t_{\mathrm{mix}}(G_n) = O(\log n), then {Gn}\{G_n\} is an expander family (i.e., λ(Gn)dc\lambda(G_n) \leq d - c for some constant c>0c > 0).

Proof.

The mixing time satisfies tmix12(1ρ)ln(n/2)t_{\mathrm{mix}} \geq \frac{1}{2(1 - \rho)} \ln(n/2) where ρ=λ/d\rho = \lambda/d (this is the matching lower bound from Random Walks & Mixing). If tmix=O(logn)t_{\mathrm{mix}} = O(\log n), then:

Clognd2(dλ)ln(n/2)C \log n \geq \frac{d}{2(d - \lambda)} \ln(n/2)

for some constant CC. Rearranging gives dλdln(n/2)2Clognd - \lambda \geq \frac{d \ln(n/2)}{2C \log n}, and since ln(n/2)/logn\ln(n/2)/\log n is bounded below by a positive constant for large nn, the spectral gap dλd - \lambda is bounded below by a positive constant, as claimed.

Mixing comparison — total variation distance vs. time for random walks on a cycle, hypercube, Petersen graph, and a random 3-regular graph


8. Applications to Computer Science and Machine Learning

8.1 Expander Walk Sampling and Derandomization

The most celebrated application of expanders in theoretical computer science is derandomization: reducing the amount of randomness needed by a probabilistic algorithm. The key result is the expander walk sampling theorem.

Theorem 6 (Expander Walk Sampling Theorem).

Let GG be an (n,d,λ)(n, d, \lambda)-expander and let f:V[0,1]f : V \to [0, 1] be a function with mean μ=1nvVf(v)\mu = \frac{1}{n}\sum_{v \in V} f(v). Let v0,v1,,vt1v_0, v_1, \ldots, v_{t-1} be the vertices visited by a random walk of length tt starting from a uniformly random vertex v0v_0. Then for any ε>0\varepsilon > 0:

Pr[1ti=0t1f(vi)μ>ε]2exp((1λ/d)ε2t2)\Pr\left[\left|\frac{1}{t}\sum_{i=0}^{t-1} f(v_i) - \mu\right| > \varepsilon\right] \leq 2 \exp\left(-\frac{(1 - \lambda/d) \varepsilon^2 t}{2}\right)

This is a Chernoff-type concentration bound for dependent samples. In a standard Chernoff bound for tt independent samples, the exponent is ε2t/2-\varepsilon^2 t / 2. The expander walk version has an extra factor of 1λ/d1 - \lambda/d (the spectral gap of the walk) — a mild penalty for dependence.

Why is this useful for derandomization? To sample tt independent vertices from GG, you need tlognt \log n random bits. But a random walk of length tt on GG requires only logn\log n bits (for the starting vertex) plus tlogdt \log d bits (for the tt neighbor choices), totaling logn+tlogd\log n + t \log d. When dd is a constant and t=O(logn)t = O(\log n), this is O(logn)O(\log n) — exponentially fewer random bits than independent sampling. The expander walk sampling theorem guarantees that the walk’s samples are “almost as good” as independent samples.

This principle underlies the Ajtai-Komlós-Szemerédi (AKS) sorting network, the Impagliazzo-Zuckerman extractor, and many other constructions in derandomization and pseudorandomness.
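A small simulation illustrates the theorem. This is an illustrative sketch, not a tight check of the constant: the test function `f`, walk length, and trial count are arbitrary choices, and the small Petersen graph stands in for a large expander:

```python
import numpy as np

rng = np.random.default_rng(0)

# Petersen graph neighbor lists (outer cycle, spokes, inner pentagram)
edges = [(0,1),(1,2),(2,3),(3,4),(4,0),
         (0,5),(1,6),(2,7),(3,8),(4,9),
         (5,7),(7,9),(9,6),(6,8),(8,5)]
nbrs = [[] for _ in range(10)]
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

f = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)  # true mean 0.5
mu = f.mean()

def walk_average(t):
    """Average f over the vertices of a t-step walk from a uniform start."""
    v = int(rng.integers(10))
    total = 0.0
    for _ in range(t):
        total += f[v]
        v = nbrs[v][rng.integers(3)]   # each vertex has degree 3
    return total / t

ests = np.array([walk_average(1000) for _ in range(300)])
err = np.abs(ests - mu)
print(f"mean of estimates = {ests.mean():.3f} (true mean {mu})")
print(f"fraction with |error| > 0.1: {np.mean(err > 0.1):.3f}")
```

The walk averages concentrate tightly around the true mean, even though consecutive samples are highly dependent — each walk consumed only about $\log_2 10 + 1000\log_2 3$ random bits rather than $1000\log_2 10$.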

8.2 Error-Correcting Codes

Sipser and Spielman (1996) showed that bipartite expander graphs yield error-correcting codes with remarkable properties:

Construction. Start with a bipartite (c,d)(c, d)-regular expander G=(LR,E)G = (L \cup R, E) where L=n|L| = n, R=m|R| = m. Each right vertex rRr \in R imposes a parity check: the bits indexed by N(r)LN(r) \subseteq L must sum to 0 modulo 2. The resulting code has:

  • Block length nn
  • Rate 1m/n1 - m/n (approaches 1c/d1 - c/d for good expanders)
  • Minimum distance proportional to nn (linear distance, from the expansion property)
  • Linear-time decoding: the expansion guarantees a simple “flip” algorithm converges in O(n)O(n) steps

The expansion property is the key: if a codeword has δn\delta n errors for small enough δ\delta, the δn\delta n erroneous bits have at least (1ε)cδn(1 - \varepsilon)c \cdot \delta n unsatisfied check nodes (by vertex expansion), which is more than the εcδn\varepsilon c \cdot \delta n checks that could be satisfied by accident. The decoder identifies and corrects errors by examining local check violations.

8.3 Network Robustness

Expander graphs are optimally robust networks: removing a constant fraction of vertices or edges still leaves a connected graph (in fact, a graph that is itself an expander). Formally:

  • Vertex robustness: Removing any αn\alpha n vertices from an (n,d,λ)(n, d, \lambda)-expander leaves a connected graph, provided α\alpha is smaller than a threshold depending on dd and λ\lambda.
  • Edge robustness: Similarly, removing edges. The Cheeger constant lower bound guarantees that every cut has at least h(G)S(dλ)/2Sh(G) \cdot |S| \geq (d - \lambda)/2 \cdot |S| edges, so creating a disconnection requires removing Ω(n)\Omega(n) edges.

These properties make expanders ideal for communication networks, peer-to-peer systems, and distributed hash tables, where robustness to node failures is essential.
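A toy check of edge robustness, with the Petersen graph standing in for an expander: its edge connectivity is 3, so no deletion of two edges can disconnect it.

```python
from collections import deque
from itertools import combinations

# Petersen graph edge list
edges = [(0,1),(1,2),(2,3),(3,4),(4,0),
         (0,5),(1,6),(2,7),(3,8),(4,9),
         (5,7),(7,9),(9,6),(6,8),(8,5)]

def is_connected(edge_list, n=10):
    """BFS connectivity check from vertex 0."""
    nbrs = [[] for _ in range(n)]
    for i, j in edge_list:
        nbrs[i].append(j)
        nbrs[j].append(i)
    seen, queue = {0}, deque([0])
    while queue:
        v = queue.popleft()
        for w in nbrs[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n

# Try every way of deleting two edges: the graph always stays connected
survives = all(is_connected([e for e in edges if e not in pair])
               for pair in combinations(edges, 2))
print("connected after every 2-edge deletion:", survives)
```

All $\binom{15}{2} = 105$ two-edge deletions leave the graph connected, as the edge-connectivity bound predicts.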

8.4 Graph Neural Networks and Over-Smoothing

Remark (Expansion, GNNs, and Over-Smoothing).

In graph neural networks (GNNs), each message-passing layer aggregates information from a vertex’s neighbors — effectively performing one step of a (learned) random walk. After kk layers, vertex vv‘s representation depends on its kk-hop neighborhood.

On an expander, O(logn)O(\log n) layers suffice to propagate information from any vertex to any other — the “receptive field” covers the entire graph. This is the theoretical justification for using relatively few layers in GNN architectures.

However, rapid mixing is a double-edged sword. After O(logn)O(\log n) layers, every vertex’s representation converges toward the graph’s global average — the over-smoothing phenomenon. On expanders, over-smoothing happens faster than on non-expanders, precisely because the spectral gap is large. This creates a tension: expander-like connectivity is needed for global information flow, but it accelerates the loss of local signal.

Recent work addresses this by adding skip connections, using attention mechanisms, or designing architectures that interpolate between local and global aggregation. The spectral gap provides a quantitative framework for understanding these trade-offs.

For more on message passing and its spectral interpretation, see Message Passing & GNNs.
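A minimal sketch of the over-smoothing effect, assuming a GCN-style aggregation with self-loops, $P = (A+I)$ row-normalized (the self-loops avoid the parity obstruction on the bipartite even cycle): repeated aggregation flattens a localized feature much faster on the Petersen graph than on the 10-cycle.

```python
import numpy as np

def gcn_step(edges, n):
    """Row-normalized (A + I): one self-loop-augmented aggregation step."""
    A = np.eye(n)
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    return A / A.sum(axis=1, keepdims=True)

petersen = [(0,1),(1,2),(2,3),(3,4),(4,0),
            (0,5),(1,6),(2,7),(3,8),(4,9),
            (5,7),(7,9),(9,6),(6,8),(8,5)]
cycle10 = [(v, (v + 1) % 10) for v in range(10)]

P_exp = gcn_step(petersen, 10)
P_cyc = gcn_step(cycle10, 10)

x = np.zeros(10)
x[0] = 1.0                      # feature localized at one vertex; global mean 0.1
for k in [2, 5, 10]:
    r_exp = np.linalg.norm(np.linalg.matrix_power(P_exp, k) @ x - 0.1)
    r_cyc = np.linalg.norm(np.linalg.matrix_power(P_cyc, k) @ x - 0.1)
    print(f"k={k:2d} layers: residual expander={r_exp:.4f}, cycle={r_cyc:.4f}")
```

After 10 layers the Petersen residual is essentially gone (the local signal has been averaged away) while the cycle still retains most of it — the large spectral gap is exactly what accelerates the collapse.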

Applications of expander graphs — derandomization via walk sampling, expander codes with bipartite structure, and GNN message passing


9. Computational Notes

9.1 Computing Expansion Metrics

Here is a complete implementation for computing spectral expansion metrics and checking the Ramanujan property:

import numpy as np
from scipy import linalg

def expansion_metrics(A):
    """
    Compute spectral expansion metrics for a d-regular graph
    with adjacency matrix A.

    Returns a dictionary with degree, lambda (spectral expansion parameter),
    spectral gap, Ramanujan bound, and whether the graph is Ramanujan.
    """
    eigenvalues = np.sort(linalg.eigvalsh(A))[::-1]
    d = eigenvalues[0]

    # lambda = max |lambda_i| for i >= 2
    # For non-bipartite graphs, also exclude lambda_n = -d
    nontrivial = eigenvalues[1:]
    if np.isclose(nontrivial[-1], -d, atol=1e-8):
        # Bipartite: exclude -d as trivial
        nontrivial = nontrivial[:-1]

    lambda_val = np.max(np.abs(nontrivial))
    ram_bound = 2 * np.sqrt(d - 1)

    return {
        'degree': d,
        'lambda': lambda_val,
        'spectral_gap': d - eigenvalues[1],
        'ramanujan_bound': ram_bound,
        'is_ramanujan': lambda_val <= ram_bound + 1e-10,
        'eigenvalues': eigenvalues
    }

9.2 Verifying the Expander Mixing Lemma

def verify_eml(A, S_indices, T_indices):
    """
    Verify the Expander Mixing Lemma for subsets S and T.

    Returns the actual edge count E(S,T), the expected count d|S||T|/n,
    the EML bound lambda * sqrt(|S||T|), and whether the bound holds.
    """
    n = A.shape[0]
    metrics = expansion_metrics(A)
    d = metrics['degree']
    lam = metrics['lambda']

    # Count edges from S to T (ordered pairs)
    E_ST = sum(A[i, j] for i in S_indices for j in T_indices)

    expected = d * len(S_indices) * len(T_indices) / n
    bound = lam * np.sqrt(len(S_indices) * len(T_indices))
    deviation = abs(E_ST - expected)

    return {
        'E_ST': E_ST,
        'expected': expected,
        'deviation': deviation,
        'eml_bound': bound,
        'bound_holds': deviation <= bound + 1e-10
    }

# Example: Petersen graph
def petersen_adjacency():
    """Return the 10x10 adjacency matrix of the Petersen graph."""
    edges = [
        (0,1),(0,4),(0,5), (1,2),(1,6), (2,3),(2,7),
        (3,4),(3,8), (4,9), (5,7),(5,8), (6,8),(6,9), (7,9)
    ]
    A = np.zeros((10, 10), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    return A

A = petersen_adjacency()
metrics = expansion_metrics(A)
print(f"Petersen: d={metrics['degree']:.0f}, "
      f"λ={metrics['lambda']:.4f}, "
      f"Ramanujan bound={metrics['ramanujan_bound']:.4f}, "
      f"Ramanujan={metrics['is_ramanujan']}")
# Output: Petersen: d=3, λ=2.0000, Ramanujan bound=2.8284, Ramanujan=True

# Verify EML for S = {0,1,2}, T = {5,6,7,8,9}
result = verify_eml(A, [0,1,2], [5,6,7,8,9])
print(f"E(S,T)={result['E_ST']}, expected={result['expected']:.2f}, "
      f"deviation={result['deviation']:.2f}, bound={result['eml_bound']:.2f}, "
      f"holds={result['bound_holds']}")

9.3 Building and Analyzing Margulis-Gabber-Galil Expanders

def analyze_mgg_family(n_values):
    """
    Analyze the Margulis–Gabber–Galil expander family
    for several values of n, verifying the spectral gap
    stays bounded away from zero.
    """
    results = []
    for n in n_values:
        A = margulis_gabber_galil(n)
        metrics = expansion_metrics(A)
        results.append({
            'n': n,
            'vertices': n * n,
            'degree': metrics['degree'],
            'lambda': metrics['lambda'],
            'spectral_gap': metrics['spectral_gap'],
            'is_ramanujan': metrics['is_ramanujan']
        })
        print(f"MGG(n={n}): {n*n} vertices, d={metrics['degree']:.0f}, "
              f"λ={metrics['lambda']:.2f}, gap={metrics['spectral_gap']:.2f}")
    return results

# Example output (approximate):
# MGG(n=5): 25 vertices, d=8, λ=6.47, gap=1.53
# MGG(n=7): 49 vertices, d=8, λ=6.83, gap=1.17
# MGG(n=11): 121 vertices, d=8, λ=7.02, gap=0.98

For the full interactive analysis with additional graph families and larger experiments, see the companion notebook.


10. Connections and Further Reading

10.1 Connections to Other Topics

| Topic | Connection |
|---|---|
| Graph Laplacians & Spectrum | Cheeger’s inequality links the Laplacian spectral gap to edge expansion. The Fiedler vector provides a spectral approximation to the minimum cut, and expanders are precisely the graphs where this cut is large for every subset. |
| Random Walks & Mixing | The mixing time of a random walk on an expander is $O(\log n)$, because the spectral gap $\gamma = 1 - \lambda/d$ is bounded away from zero. The expander walk sampling theorem extends this to a Chernoff bound for dependent samples. |
| Spectral Theorem | The EML proof decomposes indicator vectors in the eigenbasis of $A$. The Spectral Theorem guarantees orthonormality of this basis — the Cauchy-Schwarz step relies on this structure. |
| Concentration Inequalities | The expander walk sampling theorem is a Chernoff bound where the spectral gap replaces independence. Correlations between consecutive walk samples decay exponentially in the gap. |
| Shannon Entropy | The entropy rate of a random walk on a $d$-regular expander approaches $\log d$ — maximum entropy. Expansion ensures the walk explores the graph uniformly. |
| Message Passing & GNNs | Message passing is iterated Laplacian smoothing. On expanders, $O(\log n)$ layers suffice for global information flow but cause over-smoothing. The spectral gap quantifies this trade-off. |

10.2 Notation Summary

| Symbol | Meaning |
|---|---|
| $G = (V, E)$ | Graph with vertex set $V$ and edge set $E$ |
| $n = \vert V\vert$ | Number of vertices |
| $d$ | Degree (for regular graphs) |
| $A$ | Adjacency matrix |
| $\lambda_i(A)$ | $i$-th eigenvalue of $A$ (ordered $\lambda_1 \geq \cdots \geq \lambda_n$) |
| $\lambda(G)$ | Spectral expansion parameter: $\max_{i \geq 2} \vert\lambda_i(A)\vert$ |
| $h(G)$ | Cheeger constant (edge expansion) |
| $h_V(G)$ | Vertex expansion ratio |
| $P = D^{-1}A$ | Random walk transition matrix |
| $\gamma = 1 - \lambda/d$ | Spectral gap of the random walk |
| $t_{\mathrm{mix}}$ | Mixing time in total variation |
| $E(S, T)$ | Number of (ordered) edges from $S$ to $T$ |
| $\partial_V(S)$ | Vertex boundary: $N(S) \setminus S$ |
| $2\sqrt{d-1}$ | Alon-Boppana / Ramanujan bound |


References & Further Reading