
Graph Laplacians & Spectrum

The eigenvalues and eigenvectors of graph Laplacians — from connectivity to clustering to graph neural networks

Overview & Motivation

A graph encodes pairwise relationships: who knows whom, which proteins interact, which cities are connected by flights. The adjacency matrix $A$ records these relationships, but it is the Laplacian $L = D - A$ that reveals the graph’s deeper structure.

Why the Laplacian and not the adjacency matrix? Because the Laplacian is a real symmetric positive semidefinite matrix, and the Spectral Theorem guarantees it has a complete orthonormal eigenbasis with non-negative eigenvalues. These eigenvalues encode global properties that are invisible to local inspection:

  • How many pieces? The multiplicity of the zero eigenvalue equals the number of connected components.
  • How well-connected? The second-smallest eigenvalue $\lambda_2$ — the Fiedler value or algebraic connectivity — quantifies the graph’s bottleneck.
  • Where to cut? The eigenvector corresponding to $\lambda_2$ — the Fiedler vector — identifies the optimal bipartition.
  • How regular? The entropy of the random walk’s stationary distribution, connected to the spectrum via Shannon Entropy, measures how uniformly the graph distributes flow.

Cheeger’s inequality makes the connection between algebra and combinatorics precise: $\frac{h(G)^2}{2} \leq \lambda_2 \leq 2h(G)$, where $h(G)$ is the Cheeger constant measuring the graph’s combinatorial bottleneck.

Spectral clustering exploits this theory: embed vertices into the eigenspace of the Laplacian, then cluster in the embedding. Points that are non-linearly separable in the original space become linearly separable in the spectral embedding — the same idea as PCA, but with the Laplacian replacing the covariance matrix.

What we cover:

  1. Graphs and matrices — adjacency, degree, and Laplacian matrices for named graph families.
  2. The graph Laplacian — definition, quadratic form, positive semidefiniteness, and the zero eigenvalue.
  3. Spectral properties — eigenvalue ordering, connectivity, the Fiedler value and vector.
  4. The normalized Laplacian — connection to random walks and transition matrices.
  5. Cheeger’s inequality — the spectral gap bounds the combinatorial bottleneck.
  6. Spectral clustering — from similarity graphs to Laplacian eigenmaps to $k$-means.
  7. Connections to ML — GNNs as Laplacian smoothing, graph signal processing.
  8. Computational notes — practical implementations in Python.

Graphs, Adjacency Matrices, and the Degree Matrix

We work with undirected, weighted graphs throughout. An undirected graph is the natural setting for the Laplacian: the resulting matrix is symmetric, which is exactly what the Spectral Theorem requires.

Definition 1 (Graph (Undirected, Weighted)).

An undirected weighted graph $G = (V, E, w)$ consists of:

  • A finite vertex set $V = \{1, 2, \ldots, n\}$,
  • An edge set $E \subseteq \binom{V}{2}$ of unordered pairs,
  • A weight function $w: E \to \mathbb{R}_{>0}$ assigning a positive weight to each edge.

For unweighted graphs, $w(e) = 1$ for all $e \in E$.

Definition 2 (Adjacency Matrix).

The adjacency matrix $A \in \mathbb{R}^{n \times n}$ of $G$ is defined by:

$$A_{ij} = \begin{cases} w(\{i, j\}) & \text{if } \{i, j\} \in E, \\ 0 & \text{otherwise.} \end{cases}$$

Since $G$ is undirected, $A$ is symmetric: $A_{ij} = A_{ji}$.

Definition 3 (Degree Matrix).

The degree matrix $D \in \mathbb{R}^{n \times n}$ is the diagonal matrix with entries:

$$D_{ii} = d_i = \sum_{j=1}^{n} A_{ij}$$

where $d_i$ is the degree (or weighted degree) of vertex $i$ — the sum of all edge weights incident to $i$.

Concrete examples. The topology of a graph shapes its matrix representations. Here are the standard families we will use throughout:

  • Path $P_n$: $n$ vertices in a line. $A$ is tridiagonal. Each interior vertex has degree 2; endpoints have degree 1.
  • Cycle $C_n$: $P_n$ with the endpoints connected. $A$ is tridiagonal plus corner entries. Every vertex has degree 2 (a $2$-regular graph).
  • Complete graph $K_n$: every pair connected. $A = \mathbf{1}\mathbf{1}^T - I$. Every vertex has degree $n - 1$.
  • Star $S_n$: one hub connected to $n - 1$ leaves. The hub has degree $n - 1$; every leaf has degree 1.
  • Barbell: two $K_k$ cliques joined by a single bridge edge. This is the canonical bottleneck graph.
  • Grid $m \times m$: vertices on a lattice. Interior vertices have degree 4; non-corner boundary vertices have degree 3; corners have degree 2.
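The degree claims above are easy to sanity-check by constructing the adjacency matrices directly. A minimal NumPy sketch (the helper names path, cycle, complete, star are our own):

```python
import numpy as np

def path(n):
    # Tridiagonal adjacency: vertex i connects to i+1
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1
    return A

def cycle(n):
    A = path(n)
    A[0, n - 1] = A[n - 1, 0] = 1  # close the loop
    return A

def complete(n):
    return np.ones((n, n)) - np.eye(n)

def star(n):
    A = np.zeros((n, n))
    A[0, 1:] = A[1:, 0] = 1  # vertex 0 is the hub
    return A

# Degree checks match the families described above
assert path(5).sum(axis=1).tolist() == [1, 2, 2, 2, 1]
assert set(cycle(6).sum(axis=1)) == {2.0}           # 2-regular
assert set(complete(4).sum(axis=1)) == {3.0}        # degree n - 1
assert star(5).sum(axis=1).tolist() == [4, 1, 1, 1, 1]
```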

Named graphs gallery showing path, cycle, complete, star, barbell, grid, Petersen, and Erdős–Rényi graphs


The Graph Laplacian

Definition 4 (Graph Laplacian (Unnormalized)).

The (unnormalized) graph Laplacian of $G$ is the matrix:

$$L = D - A$$

Equivalently, the entries of $L$ are:

$$L_{ij} = \begin{cases} d_i & \text{if } i = j, \\ -w(\{i, j\}) & \text{if } \{i, j\} \in E, \\ 0 & \text{otherwise.} \end{cases}$$

Every row and every column of $L$ sums to zero: $L \mathbf{1} = \mathbf{0}$.

The Laplacian looks like a minor rearrangement of $A$, but the sign flip on the off-diagonal entries is what makes it positive semidefinite. The key insight is the quadratic form.

Theorem 1 (Laplacian Quadratic Form).

For any vector $\mathbf{x} \in \mathbb{R}^n$:

$$\mathbf{x}^T L \mathbf{x} = \sum_{\{i, j\} \in E} w_{ij} (x_i - x_j)^2$$

The quadratic form measures the total variation of the signal $\mathbf{x}$ across the edges of the graph. A signal that is constant on each connected component has zero variation; a signal that differs wildly across edges has large variation. This is the Dirichlet energy of $\mathbf{x}$ on $G$.

Proof.

We expand $\mathbf{x}^T L \mathbf{x}$ directly:

$$\mathbf{x}^T L \mathbf{x} = \mathbf{x}^T (D - A) \mathbf{x} = \mathbf{x}^T D \mathbf{x} - \mathbf{x}^T A \mathbf{x}$$

The first term is $\mathbf{x}^T D \mathbf{x} = \sum_{i=1}^n d_i x_i^2$. The second is $\mathbf{x}^T A \mathbf{x} = \sum_{i=1}^n \sum_{j=1}^n A_{ij} x_i x_j = 2\sum_{\{i,j\} \in E} w_{ij} x_i x_j$ (the factor of 2 because each edge is counted twice in the double sum).

Meanwhile, $\sum_{i=1}^n d_i x_i^2 = \sum_{i=1}^n \sum_{j: \{i,j\} \in E} w_{ij} x_i^2 = \sum_{\{i,j\} \in E} w_{ij}(x_i^2 + x_j^2)$ (each edge contributes $w_{ij} x_i^2$ from vertex $i$ and $w_{ij} x_j^2$ from vertex $j$).

Combining:

$$\mathbf{x}^T L \mathbf{x} = \sum_{\{i,j\} \in E} w_{ij}(x_i^2 + x_j^2) - 2\sum_{\{i,j\} \in E} w_{ij} x_i x_j = \sum_{\{i,j\} \in E} w_{ij}(x_i - x_j)^2$$
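The identity is easy to verify numerically on a random weighted graph (a quick illustrative sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random symmetric weight matrix on 6 vertices, zero diagonal
W = np.triu(rng.random((6, 6)), 1)
A = W + W.T
L = np.diag(A.sum(axis=1)) - A

x = rng.standard_normal(6)
quad = x @ L @ x                                   # x^T L x
edge_sum = sum(A[i, j] * (x[i] - x[j])**2          # sum over edges {i, j}
               for i in range(6) for j in range(i + 1, 6))
assert np.isclose(quad, edge_sum)
```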

Theorem 2 (Positive Semidefiniteness of L).

The graph Laplacian $L$ is positive semidefinite: $\mathbf{x}^T L \mathbf{x} \geq 0$ for all $\mathbf{x} \in \mathbb{R}^n$. Moreover, $L \mathbf{1} = \mathbf{0}$, so $\lambda_1 = 0$ is always an eigenvalue with eigenvector $\mathbf{1} = (1, 1, \ldots, 1)^T$.

Proof.

From Theorem 1, $\mathbf{x}^T L \mathbf{x} = \sum_{\{i,j\} \in E} w_{ij}(x_i - x_j)^2 \geq 0$ since each term is non-negative (weights are positive, squares are non-negative). This is the definition of positive semidefiniteness.

For the zero eigenvalue: $(L\mathbf{1})_i = \sum_j L_{ij} = d_i - \sum_{j \neq i} A_{ij} = d_i - d_i = 0$, so $L\mathbf{1} = \mathbf{0}$.

Adjacency, degree, and Laplacian matrices for a 6-vertex example graph, with quadratic form verification

Remark (Graph Laplacian as Discrete Laplace Operator).

The graph Laplacian is the discrete analog of the continuous Laplace operator $\nabla^2$. For a function $f$ defined on a manifold, $(\nabla^2 f)(p)$ measures how much $f(p)$ deviates from the average of $f$ over a small neighborhood of $p$. Similarly, $(Lf)_i = d_i f_i - \sum_{j} A_{ij} f_j = \sum_{j: \{i,j\} \in E} w_{ij}(f_i - f_j)$ measures how much $f(i)$ deviates from its neighbors’ values. This connection deepens in the Differential Geometry track, where the Laplace–Beltrami operator on Riemannian manifolds generalizes both.

import numpy as np

# Laplacian construction from an adjacency matrix
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

print(f"L = D - A =\n{L}")
# [[ 2. -1. -1.  0.]
#  [-1.  3. -1. -1.]
#  [-1. -1.  2.  0.]
#  [ 0. -1.  0.  1.]]

Spectral Properties of the Laplacian

Since $L$ is real symmetric and PSD, the Spectral Theorem guarantees that its eigenvalues are real and non-negative. We order them:

$$0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$$

The first eigenvalue is always $\lambda_1 = 0$ (with eigenvector $\mathbf{1}$). The central question is: what does the rest of the spectrum tell us about the graph?

Theorem 3 (Zero Eigenvalue Multiplicity = Number of Connected Components).

The multiplicity of $\lambda = 0$ as an eigenvalue of $L$ equals the number of connected components of $G$.

Proof.

We characterize the null space of $L$. A vector $\mathbf{x}$ satisfies $L\mathbf{x} = \mathbf{0}$ if and only if $\mathbf{x}^T L \mathbf{x} = 0$ (since $L$ is PSD, $L\mathbf{x} = \mathbf{0} \Leftrightarrow \mathbf{x}^T L \mathbf{x} = 0$). By the quadratic form (Theorem 1):

$$\mathbf{x}^T L \mathbf{x} = \sum_{\{i,j\} \in E} w_{ij}(x_i - x_j)^2 = 0$$

Since every term is non-negative and $w_{ij} > 0$, each term must vanish: $x_i = x_j$ for every edge $\{i, j\} \in E$. By transitivity along paths, $x_i = x_j$ for every pair of vertices in the same connected component.

Therefore $\mathbf{x} \in \ker(L)$ if and only if $\mathbf{x}$ is constant on each connected component. If $G$ has $k$ connected components $C_1, \ldots, C_k$, then $\ker(L) = \mathrm{span}\{\mathbf{1}_{C_1}, \ldots, \mathbf{1}_{C_k}\}$, where $\mathbf{1}_{C_i}$ is the indicator vector of component $C_i$. These are linearly independent, so $\dim(\ker(L)) = k$.
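A small numerical check of the multiplicity count (the graph — a triangle plus a disjoint edge — is chosen for illustration):

```python
import numpy as np

# A triangle (vertices 0,1,2) and a disjoint edge (vertices 3,4): 2 components
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

eig = np.linalg.eigvalsh(L)
# Count near-zero eigenvalues (threshold handles floating-point noise)
n_components = int(np.sum(np.abs(eig) < 1e-10))
assert n_components == 2
```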

Corollary 1 (Disconnected Graphs Have λ₂ = 0).

$G$ is connected if and only if $\lambda_2 > 0$. Equivalently, $G$ is disconnected if and only if $\lambda_2 = 0$.

Definition 7 (Algebraic Connectivity (Fiedler Value)).

The algebraic connectivity of a connected graph $G$ is the second-smallest eigenvalue $\lambda_2(L)$, also called the Fiedler value (after Miroslav Fiedler, who introduced it in 1973). The corresponding eigenvector $\mathbf{v}_2$ is the Fiedler vector.

The Fiedler value is a spectral measure of “how connected” the graph is. A large $\lambda_2$ means the graph is well-connected with no bottleneck; a small $\lambda_2$ means there is a narrow bridge that nearly disconnects the graph.

Theorem 4 (Fiedler's Theorem (Spectral Bipartitioning)).

For a connected graph $G$, the Fiedler vector $\mathbf{v}_2$ provides a near-optimal bipartition. Specifically, the partition $S = \{i : v_{2,i} \geq 0\}$, $\bar{S} = \{i : v_{2,i} < 0\}$ approximates the minimum ratio cut.

The Fiedler value has the variational characterization:

$$\lambda_2 = \min_{\mathbf{x} \perp \mathbf{1}, \, \mathbf{x} \neq \mathbf{0}} \frac{\mathbf{x}^T L \mathbf{x}}{\mathbf{x}^T \mathbf{x}} = \min_{\mathbf{x} \perp \mathbf{1}, \, \mathbf{x} \neq \mathbf{0}} \frac{\sum_{\{i,j\} \in E} w_{ij}(x_i - x_j)^2}{\sum_i x_i^2}$$

This is the Courant–Fischer characterization from the Spectral Theorem applied to $L$.

Proof.

The variational characterization follows directly from the Courant–Fischer theorem (see Spectral Theorem, Theorem 2): the $k$-th eigenvalue of a symmetric matrix equals the min-max of its Rayleigh quotient. For $k = 2$, we minimize over vectors orthogonal to the first eigenvector $\mathbf{1}$.

For the bipartitioning claim: the discrete optimization problem “find $S \subset V$ minimizing $\frac{\text{cut}(S, \bar{S})}{\min(|S|, |\bar{S}|)}$” is NP-hard. Relaxing the discrete constraint (indicator vectors $\mathbf{x} \in \{-1, +1\}^n$) to the continuous constraint ($\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x} \perp \mathbf{1}$, $\|\mathbf{x}\| = 1$) yields exactly the Fiedler value problem. The Fiedler vector solves this relaxation, and rounding its entries by sign gives a partition that approximates the discrete optimum. This is a convex relaxation of the combinatorial problem.

Corollary 2 (Complete Graph Spectrum).

The complete graph $K_n$ has Laplacian eigenvalues $\lambda_1 = 0$ (multiplicity 1) and $\lambda_2 = \cdots = \lambda_n = n$ (multiplicity $n - 1$). Its algebraic connectivity is $n$, the maximum possible for any graph on $n$ vertices — $K_n$ is as well-connected as a graph can be.

Proposition 1 (Spectra of Named Graphs).

The Laplacian eigenvalues of standard graph families have closed-form expressions:

  • Path $P_n$: $\lambda_k = 2 - 2\cos\!\left(\frac{(k-1)\pi}{n}\right)$ for $k = 1, \ldots, n$.
  • Cycle $C_n$: $\lambda_k = 2 - 2\cos\!\left(\frac{2(k-1)\pi}{n}\right)$ for $k = 1, \ldots, n$.
  • Star $S_n$: $\lambda_1 = 0$, $\lambda_2 = \cdots = \lambda_{n-1} = 1$, $\lambda_n = n$.
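These closed forms can be checked against a numerical eigendecomposition. A sketch for the cycle (writing the formula with $k = 0, \ldots, n-1$, which is the same set of values as the $k-1$ indexing above):

```python
import numpy as np

n = 8
# Cycle C_n adjacency matrix
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = np.diag(A.sum(axis=1)) - A

computed = np.sort(np.linalg.eigvalsh(L))
predicted = np.sort([2 - 2 * np.cos(2 * np.pi * k / n) for k in range(n)])
assert np.allclose(computed, predicted)
```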

Proposition 2 (Laplacian of d-regular Graphs).

If $G$ is $d$-regular (every vertex has degree $d$), then $L = dI - A$ and the Laplacian eigenvalues are $\lambda_i = d - \mu_i$, where $\mu_1 \geq \mu_2 \geq \cdots \geq \mu_n$ are the eigenvalues of $A$.

Proof.

Since $D = dI$, we have $L = dI - A$. If $A\mathbf{v} = \mu \mathbf{v}$, then $L\mathbf{v} = (dI - A)\mathbf{v} = (d - \mu)\mathbf{v}$. The eigenvalues of $L$ are $d - \mu_i$, sorted in ascending order (since $\mu_1 = d$ is the largest adjacency eigenvalue with eigenvector $\mathbf{1}$, giving $\lambda_1 = 0$).
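A quick check of the eigenvalue shift on the 2-regular cycle $C_6$ (illustrative sketch):

```python
import numpy as np

# Cycle C_6 is 2-regular, so L = 2I - A
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = 2 * np.eye(n) - A

mu = np.sort(np.linalg.eigvalsh(A))[::-1]   # adjacency eigenvalues, descending
lam = np.sort(np.linalg.eigvalsh(L))        # Laplacian eigenvalues, ascending
assert np.allclose(lam, 2 - mu)             # lambda_i = d - mu_i
```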

Laplacian spectra of named graphs and Fiedler vector coloring

# Eigendecomposition of the Laplacian
eigenvalues, eigenvectors = np.linalg.eigh(L)
print(f"Eigenvalues: {eigenvalues.round(4)}")
print(f"Fiedler value λ₂ = {eigenvalues[1]:.4f}")
print(f"Fiedler vector v₂ = {eigenvectors[:, 1].round(4)}")

The Normalized Laplacian

The unnormalized Laplacian $L$ treats all vertices equally, regardless of degree. For graphs with heterogeneous degree distributions, the normalized Laplacian accounts for the local connectivity of each vertex.

Definition 5 (Normalized Laplacian).

The symmetric normalized Laplacian is:

$$\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$$

where $D^{-1/2}$ is the diagonal matrix with entries $(D^{-1/2})_{ii} = 1/\sqrt{d_i}$ (defined only for vertices with $d_i > 0$).

Definition 6 (Random Walk Laplacian).

The random walk Laplacian is:

$$L_{\text{rw}} = D^{-1}L = I - D^{-1}A = I - P$$

where $P = D^{-1}A$ is the transition matrix of the random walk on $G$: $P_{ij} = A_{ij}/d_i$ is the probability of stepping from vertex $i$ to vertex $j$.

The three Laplacians are closely related. Since $L_{\text{rw}} = D^{-1/2} \mathcal{L} D^{1/2}$, if $\mathcal{L}\mathbf{u} = \lambda \mathbf{u}$, then $L_{\text{rw}}(D^{-1/2}\mathbf{u}) = \lambda (D^{-1/2}\mathbf{u})$, so $\mathcal{L}$ and $L_{\text{rw}}$ share eigenvalues. The eigenvectors differ by the transformation $D^{-1/2}$.
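The shared spectrum is easy to confirm numerically (a sketch reusing the 4-vertex example graph from earlier):

```python
import numpy as np

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
d = A.sum(axis=1)
L = np.diag(d) - A
D_inv_sqrt = np.diag(d ** -0.5)

L_sym = D_inv_sqrt @ L @ D_inv_sqrt      # symmetric normalized Laplacian
L_rw = np.diag(1.0 / d) @ L              # random walk Laplacian (not symmetric)

eig_sym = np.sort(np.linalg.eigvalsh(L_sym))
eig_rw = np.sort(np.linalg.eigvals(L_rw).real)   # general solver for L_rw
assert np.allclose(eig_sym, eig_rw)
```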

Theorem 5 (Normalized Laplacian Eigenvalue Range).

All eigenvalues of $\mathcal{L}$ lie in $[0, 2]$: $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n \leq 2$.

Proof.

The smallest eigenvalue is $\lambda_1 = 0$ with eigenvector $D^{1/2}\mathbf{1}$ (verify: $\mathcal{L} D^{1/2}\mathbf{1} = D^{-1/2}L\mathbf{1} = \mathbf{0}$).

For the upper bound, let $\mathbf{u}$ be any eigenvector with $\|\mathbf{u}\| = 1$ and eigenvalue $\lambda$. Substituting $\mathbf{x} = D^{-1/2}\mathbf{u}$ into the quadratic form of Theorem 1:

$$\lambda = \mathbf{u}^T \mathcal{L} \mathbf{u} = (D^{-1/2}\mathbf{u})^T L (D^{-1/2}\mathbf{u}) = \sum_{\{i,j\} \in E} w_{ij} \left( \frac{u_i}{\sqrt{d_i}} - \frac{u_j}{\sqrt{d_j}} \right)^2$$

By the elementary inequality $(a - b)^2 \leq 2a^2 + 2b^2$:

$$\lambda \leq 2 \sum_{\{i,j\} \in E} w_{ij} \left( \frac{u_i^2}{d_i} + \frac{u_j^2}{d_j} \right) = 2 \sum_{i} \frac{u_i^2}{d_i} \sum_{j: \{i,j\} \in E} w_{ij} = 2 \sum_i u_i^2 = 2$$

since $\sum_{j: \{i,j\} \in E} w_{ij} = d_i$ and $\|\mathbf{u}\| = 1$. Therefore $\lambda \in [0, 2]$.

Theorem 7 (Bipartiteness and λₙ = 2).

$\lambda_n(\mathcal{L}) = 2$ if and only if $G$ has a bipartite connected component. Equivalently, the largest normalized Laplacian eigenvalue equals 2 precisely when some component of the graph contains no odd cycle.

Proof.

$(\Leftarrow)$ Suppose $G$ is bipartite with vertex partition $V = V_+ \cup V_-$. Define $\mathbf{u}$ by $u_i = +\sqrt{d_i}$ for $i \in V_+$ and $u_i = -\sqrt{d_i}$ for $i \in V_-$. Since every edge connects $V_+$ to $V_-$, every neighbor’s sign is flipped: $(D^{-1/2}AD^{-1/2}\mathbf{u})_i = \frac{1}{\sqrt{d_i}}\sum_j A_{ij}\frac{u_j}{\sqrt{d_j}} = \mp\frac{1}{\sqrt{d_i}}\sum_j A_{ij} = -u_i$. Hence $\mathcal{L}\mathbf{u} = \mathbf{u} - D^{-1/2}AD^{-1/2}\mathbf{u} = 2\mathbf{u}$.

$(\Rightarrow)$ If $\lambda_n = 2$, then a corresponding unit eigenvector $\mathbf{u}$ satisfies $\mathbf{u}^T(I - D^{-1/2}AD^{-1/2})\mathbf{u} = 2$, which forces $\mathbf{u}^T D^{-1/2}AD^{-1/2}\mathbf{u} = -1$. This means every edge contributes maximally negative correlation: for each edge $\{i,j\}$, $u_i$ and $u_j$ have opposite signs. Splitting the support of $\mathbf{u}$ by sign is precisely the bipartition condition.
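A numerical illustration: even cycles are bipartite and hit the upper bound, while odd cycles do not (sketch; the helpers norm_lap_eigs and cycle are our own):

```python
import numpy as np

def cycle(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

def norm_lap_eigs(A):
    # Eigenvalues of I - D^{-1/2} A D^{-1/2}, ascending
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.sort(np.linalg.eigvalsh(np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt))

# C_6 is bipartite: largest eigenvalue is exactly 2
assert np.isclose(norm_lap_eigs(cycle(6))[-1], 2.0)
# C_5 has an odd cycle: largest eigenvalue stays strictly below 2
assert norm_lap_eigs(cycle(5))[-1] < 2.0 - 1e-6
```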

Connection to entropy. The random walk on $G$ has stationary distribution $\pi_i = d_i / (2m)$, where $m = \sum_{\{i,j\} \in E} w_{ij}$ is the total edge weight. The Shannon entropy of $\boldsymbol{\pi}$ is $H(\boldsymbol{\pi}) = -\sum_i \pi_i \log_2 \pi_i$. For a $d$-regular graph, $\boldsymbol{\pi}$ is uniform and $H(\boldsymbol{\pi}) = \log_2 n$ — the maximum possible entropy. The spectral gap $1 - \lambda_2(P) = \lambda_2(\mathcal{L})$ governs how quickly the walk’s distribution converges to $\boldsymbol{\pi}$, i.e., how fast entropy increases toward equilibrium.
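A short sketch verifying the stationary distribution on the running 4-vertex example — $\pi_i = d_i/2m$ is checked directly via $\boldsymbol{\pi} P = \boldsymbol{\pi}$:

```python
import numpy as np

A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
d = A.sum(axis=1)
pi = d / d.sum()              # stationary distribution d_i / 2m
P = A / d[:, None]            # transition matrix D^{-1} A

assert np.allclose(pi @ P, pi)            # pi is stationary: pi P = pi
H = -np.sum(pi * np.log2(pi))             # Shannon entropy of pi
assert H <= np.log2(len(A)) + 1e-12       # uniform pi would give log2 n
```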

Normalized vs unnormalized Laplacian eigenvalue comparison across graph families

# Normalized Laplacian construction
d = A.sum(axis=1)                     # degree vector (A from the earlier example)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.where(d > 0, d, 1)))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt  # = I - D^{-1/2} A D^{-1/2}

eigenvalues_norm = np.linalg.eigh(L_norm)[0]
print(f"Normalized eigenvalues: {eigenvalues_norm.round(4)}")
print(f"All in [0, 2]: {np.all(eigenvalues_norm >= -1e-10) and np.all(eigenvalues_norm <= 2 + 1e-10)}")

Cheeger’s Inequality

The Fiedler value $\lambda_2$ is an algebraic quantity. The Cheeger constant $h(G)$ is a combinatorial quantity. Cheeger’s inequality links them, providing a bridge between linear algebra and graph theory.

Definition 8 (Cheeger Constant (Isoperimetric Number)).

The Cheeger constant (or isoperimetric number) of a graph $G$ is:

$$h(G) = \min_{\emptyset \neq S \subset V} \frac{|E(S, \bar{S})|}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$$

where $|E(S, \bar{S})| = \sum_{i \in S, j \in \bar{S}} w_{ij}$ is the total weight of edges crossing the cut, and $\mathrm{vol}(S) = \sum_{i \in S} d_i$ is the volume of $S$ (the sum of degrees in $S$).

The Cheeger constant measures the graph’s worst bottleneck: the smallest ratio of “edges crossing a cut” to “the smaller side’s connectivity.” A large $h(G)$ means every cut has many crossing edges relative to the smaller side — the graph is well-connected everywhere. A small $h(G)$ means there is a narrow bridge.

Theorem 6 (Cheeger's Inequality).

For any graph $G$:

$$\frac{h(G)^2}{2} \leq \lambda_2(\mathcal{L}) \leq 2h(G)$$

The spectral gap $\lambda_2$ of the normalized Laplacian is sandwiched between $h(G)^2/2$ and $2h(G)$.

Proof.

Easy direction ($\lambda_2 \leq 2h$). We exhibit a test vector that achieves a small Rayleigh quotient.

Let $S$ be the subset achieving the Cheeger constant: $h(G) = |E(S, \bar{S})| / \mathrm{vol}(S)$ with $\mathrm{vol}(S) \leq \mathrm{vol}(\bar{S})$. Define the test vector $\mathbf{f} \in \mathbb{R}^n$ by:

$$f_i = \begin{cases} 1/\mathrm{vol}(S) & \text{if } i \in S, \\ -1/\mathrm{vol}(\bar{S}) & \text{if } i \in \bar{S}. \end{cases}$$

Then $D^{1/2}\mathbf{f}$ is orthogonal to $D^{1/2}\mathbf{1}$ (the first eigenvector of $\mathcal{L}$), since $\sum_i d_i f_i = 1 - 1 = 0$, and the Rayleigh quotient of $D^{1/2}\mathbf{f}$ gives:

$$\lambda_2 \leq \frac{\sum_{\{i,j\} \in E} w_{ij}(f_i - f_j)^2}{\sum_i d_i f_i^2}$$

The numerator is $|E(S, \bar{S})| \cdot (1/\mathrm{vol}(S) + 1/\mathrm{vol}(\bar{S}))^2$ and the denominator is $1/\mathrm{vol}(S) + 1/\mathrm{vol}(\bar{S})$. Working through the algebra:

$$\lambda_2 \leq |E(S, \bar{S})| \cdot \frac{\mathrm{vol}(S) + \mathrm{vol}(\bar{S})}{\mathrm{vol}(S) \cdot \mathrm{vol}(\bar{S})} \leq \frac{2 |E(S, \bar{S})|}{\mathrm{vol}(S)} = 2h(G)$$

where the last inequality uses $\mathrm{vol}(S) + \mathrm{vol}(\bar{S}) \leq 2\,\mathrm{vol}(\bar{S})$.

Hard direction ($h^2/2 \leq \lambda_2$) — sketch. This is the deeper inequality. The idea is: given the Fiedler vector $\mathbf{v}_2$, construct a threshold cut by sorting vertices by $v_{2,i}$ and sweeping a threshold. By a careful Cauchy–Schwarz argument, the best threshold cut $S$ achieves $h(S) \leq \sqrt{2\lambda_2}$, which gives $h(G) \leq \sqrt{2\lambda_2}$ and hence $h(G)^2/2 \leq \lambda_2$. The full proof uses the co-area formula on graphs and can be found in Chung (1997), Chapter 2.

Interpretation. Cheeger’s inequality says that the spectral gap is a faithful proxy for the combinatorial bottleneck:

  • Large $\lambda_2$ $\Rightarrow$ large $h(G)$ $\Rightarrow$ every cut crosses many edges $\Rightarrow$ the graph is an expander (the subject of Expander Graphs).
  • Small $\lambda_2$ $\Rightarrow$ small $h(G)$ $\Rightarrow$ there exists a narrow bottleneck $\Rightarrow$ the graph is nearly disconnected.
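Both bounds can be checked end-to-end on the barbell graph described earlier — a brute-force sketch, feasible only for small $n$:

```python
import numpy as np
from itertools import combinations

# Barbell: two K_4 cliques joined by a single bridge edge
n = 8
A = np.zeros((n, n))
for i, j in combinations(range(4), 2):
    A[i, j] = A[j, i] = 1
for i, j in combinations(range(4, 8), 2):
    A[i, j] = A[j, i] = 1
A[3, 4] = A[4, 3] = 1      # the bridge

d = A.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
lam2 = np.sort(np.linalg.eigvalsh(np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt))[1]

# Brute-force Cheeger constant over all nontrivial cuts
h = min(A[np.ix_(S, Sb)].sum() / min(d[list(S)].sum(), d[Sb].sum())
        for r in range(1, n)
        for S in combinations(range(n), r)
        for Sb in [[v for v in range(n) if v not in S]])

assert h**2 / 2 <= lam2 <= 2 * h   # Cheeger's inequality holds
```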

Cheeger's inequality: spectral gap vs combinatorial bottleneck for several graph families


Spectral Clustering

Spectral clustering is the most celebrated application of graph Laplacian theory in machine learning. The idea: transform data using the eigenvectors of a graph Laplacian, then cluster in the transformed space. Points that are non-linearly separable in the original space become linearly separable in the spectral embedding — a separation that $k$-means in the original space cannot achieve.

Remark (Why 'Spectral' in Spectral Clustering).

The word “spectral” comes from functional analysis, where the set of eigenvalues of an operator is called its spectrum. Spectral clustering uses the eigenvalues and eigenvectors of the graph Laplacian — its spectrum — to find clusters.

The Algorithm

Unnormalized spectral clustering (Laplacian eigenmaps + $k$-means):

  1. Build a similarity graph. Given data points $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^d$, construct a $k$-nearest-neighbor graph or $\varepsilon$-ball graph. Weight edges with the Gaussian kernel:

$$w_{ij} = \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right)$$

where $\sigma$ is typically set to the median pairwise distance.

  2. Compute the Laplacian. Form $L = D - A$ (unnormalized) or $\mathcal{L} = I - D^{-1/2}AD^{-1/2}$ (normalized).

  3. Embed. Compute the $k$ smallest eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ of $L$ (or $\mathcal{L}$). Form the embedding matrix $U \in \mathbb{R}^{n \times k}$ whose columns are these eigenvectors. Each row of $U$ is a point in $\mathbb{R}^k$.

  4. Cluster. Run $k$-means on the rows of $U$.

Normalized variants (preferred in practice):

  • Shi–Malik (2000): Use $L_{\text{rw}}$ eigenvectors. Equivalent to rescaling the rows of $U$ before $k$-means.
  • Ng–Jordan–Weiss (2001): Use $\mathcal{L}$ eigenvectors, then normalize each row to unit length before $k$-means.
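The four steps above can be sketched from scratch in a few lines. This is illustrative only: it uses scikit-learn just for the k-NN graph and the final k-means, and the parameter choices (10 neighbors, 0/1 edge weights instead of the Gaussian kernel) are ours:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

X, y = make_moons(200, noise=0.05, random_state=0)

# Step 1: symmetrized k-NN similarity graph (0/1 weights)
A = kneighbors_graph(X, n_neighbors=10).toarray()
A = np.maximum(A, A.T)

# Step 2: unnormalized Laplacian
L = np.diag(A.sum(axis=1)) - A

# Step 3: embed with the k = 2 smallest eigenvectors
_, vecs = np.linalg.eigh(L)
U = vecs[:, :2]

# Step 4: k-means in the embedding
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(U)
acc = max(np.mean(labels == y), np.mean(labels != y))   # up to label permutation
```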

Why It Works

Consider a graph with $k$ ideal clusters — $k$ cliques with no edges between them. The Laplacian is block diagonal:

$$L = \begin{pmatrix} L_1 & & \\ & \ddots & \\ & & L_k \end{pmatrix}$$

Each block $L_i$ has a zero eigenvalue with eigenvector $\mathbf{1}_{C_i}$. The bottom $k$ eigenvectors of $L$ are exactly the indicator vectors of the clusters. The rows of $U$ are one-hot vectors, so $k$-means trivially separates them.

When the clusters are not perfectly separated — when there are a few edges between them — the eigenvectors are perturbed versions of the ideal indicators. The perturbation is small when the inter-cluster edges are few (relative to intra-cluster edges), and kk-means still succeeds.

This is why spectral clustering handles non-convex clusters that $k$-means in the original space cannot: the Laplacian embedding transforms the data from Euclidean proximity (which fails for non-convex shapes) to graph connectivity (which captures the manifold structure).
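A minimal sketch of the perturbation argument: two cliques joined by one weak edge, where the Fiedler vector’s sign pattern recovers the two clusters (the graph and the weight 0.01 are chosen for illustration):

```python
import numpy as np
from itertools import combinations

# Two K_4 cliques plus one weak edge between them
n = 8
A = np.zeros((n, n))
for i, j in combinations(range(4), 2):
    A[i, j] = A[j, i] = 1
for i, j in combinations(range(4, 8), 2):
    A[i, j] = A[j, i] = 1
A[0, 4] = A[4, 0] = 0.01   # small inter-cluster perturbation

L = np.diag(A.sum(axis=1)) - A
_, vecs = np.linalg.eigh(L)
v2 = vecs[:, 1]            # Fiedler vector

# Constant sign within each cluster, opposite signs across clusters
assert len(set(np.sign(v2[:4]))) == 1
assert len(set(np.sign(v2[4:]))) == 1
assert np.sign(v2[0]) != np.sign(v2[4])
```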

Spectral clustering pipeline: data → similarity graph → spectral embedding → clusters

import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Two moons: non-convex clusters that k-means cannot separate
X, y = make_moons(200, noise=0.06, random_state=42)

sc = SpectralClustering(
    n_clusters=2,
    affinity='nearest_neighbors',
    n_neighbors=10,
    random_state=42
)
labels = sc.fit_predict(X)
accuracy = max(np.mean(labels == y), np.mean(1 - labels == y))
print(f"Spectral clustering accuracy: {accuracy:.1%}")  # ~100%

Connections to Machine Learning

Graph Signal Processing

A graph signal is a function $\mathbf{f} \in \mathbb{R}^n$ that assigns a real value to each vertex. The eigenvectors of the Laplacian form an orthonormal basis — the graph Fourier basis — and the graph Fourier transform of $\mathbf{f}$ is:

$$\hat{\mathbf{f}} = U^T \mathbf{f}$$

where $U = [\mathbf{v}_1 \mid \cdots \mid \mathbf{v}_n]$ is the matrix of Laplacian eigenvectors. Low-eigenvalue components correspond to smooth signals (slowly varying across edges); high-eigenvalue components correspond to rough signals (rapidly varying). Filtering in the spectral domain — zeroing out high-frequency components — amounts to smoothing the signal on the graph.
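A low-pass filtering sketch on a path graph — the cutoff $\lambda > 1$ is an arbitrary illustrative choice:

```python
import numpy as np

# Path graph P_20: Fourier basis = Laplacian eigenvectors
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

# Smooth ramp signal corrupted by high-frequency noise
f = np.linspace(0, 1, n) + 0.3 * np.random.default_rng(0).standard_normal(n)

f_hat = U.T @ f              # graph Fourier transform
f_hat[lam > 1.0] = 0         # zero out high-frequency components
f_smooth = U @ f_hat         # inverse transform

# The filtered signal has lower Dirichlet energy, i.e. it is smoother
assert f_smooth @ L @ f_smooth < f @ L @ f
```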

GCNs as Laplacian Smoothing

The Graph Convolutional Network (GCN) of Kipf & Welling (2017) performs the update:

$$H^{(\ell+1)} = \sigma\!\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(\ell)} W^{(\ell)}\right)$$

where $\tilde{A} = A + I$ is the adjacency matrix with added self-loops and $\tilde{D}$ is its degree matrix. The matrix $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} = I - \tilde{\mathcal{L}}$ is the identity minus the normalized Laplacian of the self-loop-augmented graph. This means each GCN layer smooths the node features: every node’s representation is replaced by a weighted average of itself and its neighbors.

Remark (GCN as Laplacian Smoothing).

The renormalization trick $\tilde{A} = A + I$ avoids the need to explicitly add a skip connection. Without self-loops, $D^{-1/2}AD^{-1/2}$ averages over neighbors only; with self-loops, $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ averages over neighbors and the node itself. This is a low-pass filter on the graph: it amplifies low-frequency (smooth) components and attenuates high-frequency (rough) components. Stacking too many GCN layers causes over-smoothing — all node features converge to the same value, dominated by the eigenvector for $\lambda_1 = 0$.

Message Passing & GNNs develops this connection further, showing how message-passing neural networks generalize GCNs and how the Laplacian spectrum determines what a GNN can and cannot learn.

GCN as Laplacian smoothing: signal propagation over 7 steps on a community graph

import torch

# Single GCN layer: H' = σ(D̃^{-1/2} Ã D̃^{-1/2} H W)
n = 4                                       # The 4-vertex example graph from earlier
A = torch.tensor([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=torch.float32)
A_tilde = A + torch.eye(n)                  # Add self-loops
D_tilde = torch.diag(A_tilde.sum(dim=1))
D_inv_sqrt = torch.diag(1.0 / torch.sqrt(D_tilde.diagonal()))
S = D_inv_sqrt @ A_tilde @ D_inv_sqrt       # Normalized adjacency

H = torch.randn(n, 16)                      # Node features
W = torch.randn(16, 8)                       # Learnable weights
H_next = torch.relu(S @ H @ W)              # One GCN layer

Computational Notes

NumPy / SciPy

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian

# Dense Laplacian
A = np.array([[0,1,1,0],[1,0,1,1],[1,1,0,0],[0,1,0,0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A

# Eigendecomposition (use eigh for symmetric matrices — faster and more stable)
eigenvalues, eigenvectors = np.linalg.eigh(L)
fiedler_value = eigenvalues[1]
fiedler_vector = eigenvectors[:, 1]

# Sparse Laplacian (for large graphs)
A_sp = csr_matrix(A)
L_sp = laplacian(A_sp)

NetworkX

import networkx as nx

G = nx.from_numpy_array(A)
print(f"Spectrum: {np.sort(nx.laplacian_spectrum(G)).round(4)}")
print(f"Algebraic connectivity: {nx.algebraic_connectivity(G):.4f}")
print(f"Fiedler vector: {nx.fiedler_vector(G).round(4)}")

scikit-learn

from sklearn.cluster import SpectralClustering

sc = SpectralClustering(
    n_clusters=2,
    affinity='nearest_neighbors',
    n_neighbors=10,
    random_state=42
)
labels = sc.fit_predict(X)

Practical tips:

  • Always use np.linalg.eigh (not eig) for symmetric matrices — it is faster, more numerically stable, and guarantees real eigenvalues.
  • Threshold near-zero eigenvalues: treat $|\lambda| < 10^{-10}$ as zero when counting connected components.
  • For large sparse graphs, use scipy.sparse.linalg.eigsh(L, k=3, which='SM') to compute only the smallest $k$ eigenvalues.
  • Set the Gaussian kernel bandwidth $\sigma$ to the median pairwise distance — a robust default.
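For the sparse case, a shift-invert call (passing sigma) is often more robust than which='SM' for the smallest eigenvalues. A sketch on a path graph; sigma=-0.01 is our choice, and $L + 0.01 I$ is positive definite and factorizable:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Sparse Laplacian of the path graph P_100, built directly as a tridiagonal matrix
n = 100
offdiag = -np.ones(n - 1)
deg = np.concatenate(([1.0], 2 * np.ones(n - 2), [1.0]))
L = diags([offdiag, deg, offdiag], [-1, 0, 1], format='csc')

# Shift-invert around sigma < 0 targets the eigenvalues nearest zero
vals = np.sort(eigsh(L, k=3, sigma=-0.01, which='LM', return_eigenvectors=False))
assert abs(vals[0]) < 1e-8                             # lambda_1 = 0
assert np.isclose(vals[1], 2 - 2 * np.cos(np.pi / n))  # closed form for P_n
```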

Connections & Further Reading

  • The Spectral Theorem — The graph Laplacian is real symmetric. The Spectral Theorem guarantees a complete orthonormal eigenbasis — the foundation of spectral graph theory. The Courant–Fischer theorem underlies the Fiedler value characterization and Cheeger’s inequality.
  • Shannon Entropy & Mutual Information — The entropy of the random walk stationary distribution $H(\boldsymbol{\pi})$ measures graph regularity. Regular graphs have uniform $\boldsymbol{\pi}$ and maximum entropy $\log_2 n$. The spectral gap governs the mixing rate.
  • PCA & Low-Rank Approximation — Spectral clustering uses the bottom eigenvectors of the Laplacian as an embedding, analogous to PCA using the top eigenvectors of the covariance matrix. Both are eigenmap embeddings: PCA preserves variance, spectral clustering preserves graph connectivity.
  • Convex Analysis — The quadratic form $\mathbf{x}^T L \mathbf{x}$ is convex. Spectral clustering relaxes the NP-hard normalized cut to a continuous eigenvalue problem — a convex relaxation.
  • Random Walks & Mixing — The transition matrix $P = D^{-1}A$ has eigenvalues $\mu_i = 1 - \lambda_i(\mathcal{L})$, and the spectral gap $\gamma = \lambda_2(\mathcal{L})$ controls mixing time — how quickly the walk’s distribution converges to the stationary distribution.
  • Expander Graphs — Expander Graphs studies the graphs that maximize connectivity at fixed sparsity. The spectral gap — bounded below by Cheeger’s inequality — is the defining quantity: expanders have spectral gap bounded away from zero.
  • Message Passing & GNNs — GCN message passing is Laplacian smoothing. Over-smoothing occurs when too many layers act as a low-pass filter, collapsing all node features toward the dominant eigenvector.

Key Notation

  • $L = D - A$ — unnormalized graph Laplacian
  • $\mathcal{L} = D^{-1/2}LD^{-1/2}$ — normalized Laplacian
  • $L_{\text{rw}} = I - D^{-1}A$ — random walk Laplacian
  • $P = D^{-1}A$ — random walk transition matrix
  • $\lambda_2$ — Fiedler value (algebraic connectivity)
  • $\mathbf{v}_2$ — Fiedler vector
  • $h(G)$ — Cheeger constant
  • $\mathrm{vol}(S) = \sum_{i \in S} d_i$ — volume of vertex set $S$


References & Further Reading

  • Chung, F. (1997). Spectral Graph Theory. The foundational text — covers the normalized Laplacian, Cheeger’s inequality, and connections to random walks.
  • Godsil, C. & Royle, G. (2001). Algebraic Graph Theory. Chapters 8–9 cover the adjacency and Laplacian spectra with algebraic rigor.
  • von Luxburg, U. (2007). “A Tutorial on Spectral Clustering.” The standard reference for spectral clustering — bridges graph Laplacian theory to machine learning practice.
  • Kipf, T. & Welling, M. (2017). “Semi-Supervised Classification with Graph Convolutional Networks.” GCN = renormalized Laplacian smoothing — the paper that made graph Laplacians central to deep learning.
  • Mohar, B. (1991). “The Laplacian Spectrum of Graphs.” Survey of Laplacian eigenvalue bounds and their combinatorial interpretations.
  • Ortega, A., Frossard, P., Kovačević, J., Moura, J. M. F. & Vandergheynst, P. (2018). “Graph Signal Processing: Overview, Challenges, and Applications.” Graph Fourier transform = eigenvector expansion of the Laplacian.