
The Spectral Theorem

From eigenvalues to geometry — why symmetric matrices are the best-behaved objects in linear algebra

Overview & Motivation

Every machine learning algorithm that touches a covariance matrix, a kernel matrix, a graph Laplacian, or a Hessian is secretly relying on a single theorem: the Spectral Theorem.

The Spectral Theorem says that every real symmetric matrix A \in \mathbb{R}^{n \times n} can be diagonalized by an orthogonal matrix:

A = Q \Lambda Q^T

where Q is orthogonal (Q^T Q = I) and \Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n) is the diagonal matrix of real eigenvalues. This is not just a convenience — it’s a structural guarantee that underpins:

  • PCA: the sample covariance matrix \hat{\Sigma} = \frac{1}{n-1} X^T X is symmetric, so its eigenvectors are orthogonal principal directions.
  • Spectral clustering: the graph Laplacian L = D - W is symmetric positive semidefinite, so its smallest eigenvectors encode cluster structure.
  • Optimization: the Hessian \nabla^2 f is symmetric, and its eigenvalues determine convexity.
  • Kernel methods: kernel matrices K_{ij} = k(x_i, x_j) are symmetric PSD by construction.
  • The Sheaf Laplacian: from the Sheaf Theory topic on the Topology & TDA track, L_\mathcal{F} = \delta_0^T \delta_0 is symmetric PSD, and the Spectral Theorem is what makes its eigendecomposition well-defined and interpretable.

This topic develops the Spectral Theorem from first principles, proves it, implements it, and applies it to spectral clustering — building the algebraic foundation that the rest of the Linear Algebra track depends on.

What We Cover

  1. Eigenvalues & Eigenvectors — characteristic polynomials, eigenspaces, diagonalization.
  2. Symmetric Matrices & Orthogonality — real eigenvalues, orthogonal eigenvectors, the key lemmas.
  3. The Spectral Theorem — the full statement and proof, geometric interpretation.
  4. The Rayleigh Quotient — variational characterization of eigenvalues, the Courant–Fischer min-max theorem.
  5. Quadratic Forms & Definiteness — positive definiteness, Sylvester’s criterion.
  6. Spectral Decomposition in Practice — power iteration, the QR algorithm.
  7. Application: Spectral Clustering — from similarity graph to clusters via the graph Laplacian’s eigenvectors.
  8. Application: PCA Preview — the Spectral Theorem applied to the covariance matrix.

Eigenvalues & Eigenvectors

The Eigenvalue Problem

Given a square matrix A \in \mathbb{R}^{n \times n}, a scalar \lambda \in \mathbb{C} is an eigenvalue of A if there exists a nonzero vector v \in \mathbb{C}^n such that:

Av = \lambda v

The vector v is the eigenvector corresponding to \lambda. Geometrically, A acts on v by simply scaling it — v is a direction that A preserves.

Definition 1 (Eigenvalue and Eigenvector).

Let A \in \mathbb{R}^{n \times n}. A scalar \lambda is an eigenvalue of A if there exists a nonzero vector v such that Av = \lambda v. The vector v is an eigenvector corresponding to \lambda.

The Characteristic Polynomial

Rearranging Av = \lambda v gives (A - \lambda I)v = 0, which has a nonzero solution if and only if:

p(\lambda) = \det(A - \lambda I) = 0

Definition 2 (Characteristic Polynomial).

The characteristic polynomial of A is p(\lambda) = \det(A - \lambda I). For an n \times n matrix, it has degree n, so by the Fundamental Theorem of Algebra there are exactly n eigenvalues (counted with algebraic multiplicity) in \mathbb{C}.
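This definition can be checked numerically: np.poly returns the coefficients of \det(\lambda I - A), and its roots agree with the eigenvalues returned by np.linalg.eigvals. A minimal sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.poly returns the coefficients of det(lambda*I - A), highest degree first:
# here lambda^2 - 4*lambda + 3
coeffs = np.poly(A)
assert np.allclose(coeffs, [1.0, -4.0, 3.0])

# The roots of the characteristic polynomial are the eigenvalues
roots = np.sort(np.roots(coeffs))
eigs = np.sort(np.linalg.eigvals(A).real)
assert np.allclose(roots, eigs)   # both are [1, 3]
```

(In practice, eigenvalues are never computed via polynomial root-finding — it is numerically unstable for large n — but the agreement illustrates the definition.)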

Eigenspaces and Multiplicity

Definition 3 (Eigenspace, Algebraic and Geometric Multiplicity).

The eigenspace for eigenvalue \lambda is E_\lambda = \ker(A - \lambda I) — the set of all eigenvectors for \lambda plus the zero vector. The algebraic multiplicity a(\lambda) is the multiplicity of \lambda as a root of p(\lambda). The geometric multiplicity g(\lambda) = \dim E_\lambda is the dimension of the eigenspace. Always 1 \leq g(\lambda) \leq a(\lambda).

When g(\lambda) < a(\lambda) for some \lambda, the matrix is defective and cannot be diagonalized. A central result of this topic is that symmetric matrices are never defective.
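The classic defective matrix is the 2 \times 2 Jordan block, where the eigenvalue 1 has algebraic multiplicity 2 but only a one-dimensional eigenspace — a quick NumPy check:

```python
import numpy as np

# A 2x2 Jordan block: eigenvalue 1 with algebraic multiplicity 2
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigenvalues = np.linalg.eigvals(J)
assert np.allclose(eigenvalues, [1.0, 1.0])    # a(1) = 2

# Geometric multiplicity: dim ker(J - I) = n - rank(J - I)
g = 2 - np.linalg.matrix_rank(J - np.eye(2))
assert g == 1                                  # g(1) = 1 < a(1): defective
```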

Eigenvector geometry: the unit circle maps to an ellipse whose axes are eigenvectors


Symmetric Matrices & Orthogonality

A matrix A \in \mathbb{R}^{n \times n} is symmetric if A = A^T, meaning a_{ij} = a_{ji} for all i, j. Symmetric matrices appear everywhere in ML:

  • Covariance matrices: \hat{\Sigma} = \frac{1}{n-1} X^T X
  • Graph Laplacians: L = D - W
  • Hessians: \nabla^2 f(x) (by equality of mixed partials)
  • Kernel matrices: K_{ij} = k(x_i, x_j)
  • Gram matrices: G_{ij} = \langle v_i, v_j \rangle

The three lemmas below are the building blocks of the Spectral Theorem.

Real Eigenvalues

Lemma 1 (Real Eigenvalues of Symmetric Matrices).

Every eigenvalue of a real symmetric matrix is real.

Proof.

Let A = A^T and suppose Av = \lambda v with v \neq 0, where \lambda \in \mathbb{C} and v \in \mathbb{C}^n. Consider:

\bar{v}^T A v = \bar{v}^T (\lambda v) = \lambda \, \bar{v}^T v = \lambda \|v\|^2

Now take the conjugate transpose of the scalar \bar{v}^T A v. Since A is real, A\bar{v} = \overline{Av}, so:

\overline{\bar{v}^T A v} = v^T A \bar{v} = v^T \overline{Av} = v^T \overline{\lambda v} = \bar{\lambda} \, v^T \bar{v} = \bar{\lambda} \|v\|^2

But also \overline{\bar{v}^T A v} = \overline{\lambda \|v\|^2} = \bar{\lambda} \|v\|^2, giving \lambda \|v\|^2 = \bar{\lambda} \|v\|^2. Since \|v\|^2 > 0, we conclude \lambda = \bar{\lambda}, i.e., \lambda \in \mathbb{R}.

Orthogonal Eigenvectors

Lemma 2 (Orthogonal Eigenvectors for Distinct Eigenvalues).

Eigenvectors of a real symmetric matrix corresponding to distinct eigenvalues are orthogonal.

Proof.

Let Av_1 = \lambda_1 v_1 and Av_2 = \lambda_2 v_2 with \lambda_1 \neq \lambda_2. Then:

\lambda_1 (v_1 \cdot v_2) = (\lambda_1 v_1)^T v_2 = (Av_1)^T v_2 = v_1^T A^T v_2 = v_1^T A v_2 = v_1^T (\lambda_2 v_2) = \lambda_2 (v_1 \cdot v_2)

So (\lambda_1 - \lambda_2)(v_1 \cdot v_2) = 0. Since \lambda_1 \neq \lambda_2, we must have v_1 \cdot v_2 = 0.

Invariant Complements

Lemma 3 (Invariant Complements).

If A is real symmetric and W is an A-invariant subspace (meaning Aw \in W for all w \in W), then W^\perp is also A-invariant.

Proof.

Let u \in W^\perp and w \in W. Then \langle Au, w \rangle = \langle u, A^T w \rangle = \langle u, Aw \rangle = 0 since Aw \in W and u \perp W. So Au \perp w for all w \in W, meaning Au \in W^\perp.

This lemma is the engine of the inductive proof: once we find one eigenvector, we can restrict A to its orthogonal complement and repeat.
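Lemmas 1 and 2 are easy to sanity-check numerically on a random symmetric matrix — a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                     # symmetric by construction

# Lemma 1: even the general eigenvalue routine finds only real eigenvalues
eigenvalues = np.linalg.eigvals(A)
assert np.max(np.abs(np.imag(eigenvalues))) < 1e-10

# Lemma 2: eigenvectors returned by eigh are pairwise orthonormal
w, Q = np.linalg.eigh(A)
assert np.linalg.norm(Q.T @ Q - np.eye(5)) < 1e-10
```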


The Spectral Theorem

Statement

Theorem 1 (Spectral Theorem for Real Symmetric Matrices).

Let A \in \mathbb{R}^{n \times n} be symmetric. Then there exists an orthogonal matrix Q \in \mathbb{R}^{n \times n} (with Q^T Q = QQ^T = I) and a diagonal matrix \Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n) with \lambda_1 \leq \cdots \leq \lambda_n such that:

A = Q \Lambda Q^T = \sum_{i=1}^{n} \lambda_i \, q_i q_i^T

where q_1, \ldots, q_n are the columns of Q — an orthonormal basis of eigenvectors.

The second form — the spectral decomposition — writes A as a sum of rank-1 projection matrices q_i q_i^T, each scaled by its eigenvalue. This is the form that directly gives PCA its interpretation.

Proof

Proof.

By induction on nn.

Base case (n = 1). Every 1 \times 1 matrix [a] is trivially diagonalized: Q = [1], \Lambda = [a].

Inductive step. Assume the theorem holds for all symmetric matrices of size (n-1) \times (n-1). Let A be n \times n and symmetric.

Step 1. A has at least one eigenvalue \lambda_1 \in \mathbb{R}: the characteristic polynomial has degree n \geq 1, so it has at least one root in \mathbb{C}, and by Lemma 1 that root is real. Let q_1 be a corresponding unit eigenvector: Aq_1 = \lambda_1 q_1, \|q_1\| = 1.

Step 2. Let W = \text{span}(q_1). By Lemma 3, W^\perp is A-invariant. Choose any orthonormal basis for W^\perp and let U \in \mathbb{R}^{n \times (n-1)} be the matrix whose columns are this basis.

Step 3. Define B = U^T A U \in \mathbb{R}^{(n-1) \times (n-1)}. Then B is symmetric: B^T = U^T A^T U = U^T A U = B.

Step 4. By the inductive hypothesis, B = P \Lambda' P^T for some orthogonal P \in \mathbb{R}^{(n-1) \times (n-1)} and diagonal \Lambda'.

Step 5. Set Q = \begin{bmatrix} q_1 & UP \end{bmatrix}. Then Q is orthogonal, and:

Q^T A Q = \begin{bmatrix} q_1^T \\ P^T U^T \end{bmatrix} A \begin{bmatrix} q_1 & UP \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \Lambda' \end{bmatrix}

So A = Q \Lambda Q^T where \Lambda = \text{diag}(\lambda_1, \Lambda').

Geometric Interpretation

The Spectral Theorem says that every symmetric linear transformation is, in the right coordinate system, just a stretch along orthogonal axes. The eigenvectors q_i define the axes, and the eigenvalues \lambda_i define the stretch factors.

This is why:

  • Covariance matrices define ellipsoidal level sets (axes = principal components).
  • Graph Laplacians have smooth eigenvectors that partition the graph.
  • Hessians at critical points tell you whether you’re at a min, max, or saddle — by the signs of the eigenvalues along orthogonal curvature directions.

A concrete example: for

A = \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}

the eigenvalues are \lambda_1 = \frac{5 - \sqrt{5}}{2} \approx 1.382 and \lambda_2 = \frac{5 + \sqrt{5}}{2} \approx 3.618, with orthonormal eigenvectors q_1 \approx [-0.526, 0.851] and q_2 \approx [0.851, 0.526]. Both eigenvalues are positive, so A is positive definite, and A = Q \Lambda Q^T with Q = [q_1 \; q_2].

Spectral decomposition as a sum of rank-1 matrices

The Hermitian Extension

The Spectral Theorem extends to Hermitian (complex self-adjoint) matrices. A matrix A \in \mathbb{C}^{n \times n} with A = A^* = \bar{A}^T has:

A = U \Lambda U^*

where U is unitary (U^* U = I) and \Lambda is real diagonal. The proof is identical, replacing transposes with conjugate transposes. Every real symmetric matrix is Hermitian, so the real version is a special case.
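np.linalg.eigh also handles the Hermitian case directly — a minimal check, with the example matrix chosen so its eigenvalues come out to 1 and 4 (trace 5, determinant 4):

```python
import numpy as np

# A Hermitian matrix: A equals its conjugate transpose
A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])
assert np.allclose(A, A.conj().T)

w, U = np.linalg.eigh(A)              # eigh accepts complex Hermitian input
assert np.allclose(w, [1.0, 4.0])     # eigenvalues are real: (5 -/+ 3)/2

# U is unitary and A = U Lambda U*
assert np.allclose(U.conj().T @ U, np.eye(2))
assert np.allclose(A, U @ np.diag(w) @ U.conj().T)
```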

Numerical Verification

We can verify the Spectral Theorem in code. Here np.linalg.eigh is the dedicated function for symmetric/Hermitian matrices — it guarantees real eigenvalues and orthonormal eigenvectors:

import numpy as np

A = np.array([
    [5, 2, 0, 1],
    [2, 4, 1, 0],
    [0, 1, 3, 2],
    [1, 0, 2, 6]
], dtype=float)

# eigh is specifically for symmetric matrices
eigenvalues, Q = np.linalg.eigh(A)
Lambda = np.diag(eigenvalues)

# Verify A = Q Lambda Q^T
print(f"||A - Q Lambda Q^T||_F = {np.linalg.norm(A - Q @ Lambda @ Q.T):.2e}")
# Output: 1.01e-14

# Spectral decomposition: A = sum lambda_i q_i q_i^T
A_spectral = sum(eigenvalues[i] * np.outer(Q[:, i], Q[:, i]) for i in range(A.shape[0]))
print(f"||A - sum lambda_i q_i q_i^T||_F = {np.linalg.norm(A - A_spectral):.2e}")
# Output: 1.03e-14

The Rayleigh Quotient

Definition

Definition 4 (Rayleigh Quotient).

The Rayleigh quotient of a symmetric matrix A at a nonzero vector x is:

R(x) = \frac{x^T A x}{x^T x}

On the unit sphere \|x\| = 1, this simplifies to R(x) = x^T A x. Its critical points are the eigenvectors, and its critical values are the eigenvalues.

The Rayleigh quotient is the bridge between eigenvalues and optimization. Its value tells you “how much A stretches in the direction x” — and the eigenvectors are the directions where this stretch is extremal.

Extremal Characterization

Proposition 1 (Extremal Characterization of Eigenvalues).

For a symmetric matrix A with eigenvalues \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n:

\lambda_1 = \min_{\|x\|=1} x^T A x, \qquad \lambda_n = \max_{\|x\|=1} x^T A x

The minimum is attained at q_1 (the eigenvector for \lambda_1) and the maximum at q_n.

Proof.

Write x = \sum_{i=1}^n c_i q_i with \sum_i c_i^2 = 1 (since \{q_i\} is an ONB and \|x\| = 1). Then:

x^T A x = \left(\sum_i c_i q_i\right)^T \left(\sum_j c_j \lambda_j q_j\right) = \sum_i c_i^2 \lambda_i \geq \lambda_1 \sum_i c_i^2 = \lambda_1

Equality holds when c_1 = 1 and all other c_i = 0, i.e., x = q_1. The maximum case is analogous.
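The extremal characterization can be probed numerically by sampling random unit vectors: every sampled Rayleigh quotient lands in [\lambda_1, \lambda_n], and the sampled extremes approach the eigenvalue bounds. A minimal sketch:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
w, Q = np.linalg.eigh(A)                   # w is approximately [1.382, 3.618]

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 100_000))
X /= np.linalg.norm(X, axis=0)             # columns are random unit vectors
R = np.einsum('ij,ik,kj->j', X, A, X)      # Rayleigh quotients x^T A x

# Every sampled value lies in [lambda_1, lambda_n] ...
assert R.min() >= w[0] - 1e-9 and R.max() <= w[-1] + 1e-9
# ... and the sampled extremes approach the eigenvalue bounds
assert np.isclose(R.min(), w[0], atol=1e-3)
assert np.isclose(R.max(), w[-1], atol=1e-3)
```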

The visualization below shows the Rayleigh quotient R(\theta) = x^T A x as x traces the unit circle. The polar plot reveals how the quadratic form bulges in the eigenvector directions, and the Cartesian plot shows the extrema at \lambda_1 and \lambda_2:

For A = [[3, 1], [1, 2]], the Rayleigh quotient satisfies 1.38 \leq R(x) \leq 3.62.

Rayleigh quotient polar plot and R(θ) curve

The Courant–Fischer Min-Max Theorem

Theorem 2 (Courant–Fischer Min-Max Theorem).

For a symmetric matrix A with eigenvalues \lambda_1 \leq \cdots \leq \lambda_n:

\lambda_k = \min_{\dim V = k} \; \max_{x \in V, \|x\|=1} x^T A x = \max_{\dim V = n-k+1} \; \min_{x \in V, \|x\|=1} x^T A x

This characterizes the k-th eigenvalue as a min-max over subspaces. It is the theoretical foundation of PCA: the k-th principal component maximizes variance subject to orthogonality with the first k-1 components — exactly the Courant–Fischer characterization applied to the covariance matrix.


Quadratic Forms & Definiteness

Quadratic Forms

Definition 5 (Quadratic Form and Definiteness).

A quadratic form on \mathbb{R}^n is a function Q(x) = x^T A x for a symmetric matrix A. The definiteness of A is determined by the signs of its eigenvalues:

| Condition | Name | Eigenvalues | Geometry |
|---|---|---|---|
| x^T A x > 0 for all x \neq 0 | Positive definite (PD) | All \lambda_i > 0 | Bowl |
| x^T A x \geq 0 for all x | Positive semidefinite (PSD) | All \lambda_i \geq 0 | Bowl with flat directions |
| x^T A x < 0 for all x \neq 0 | Negative definite (ND) | All \lambda_i < 0 | Inverted bowl |
| x^T A x \leq 0 for all x | Negative semidefinite (NSD) | All \lambda_i \leq 0 | Inverted bowl with flat directions |
| Signs mixed | Indefinite | Some \lambda_i > 0, some < 0 | Saddle |

Why definiteness matters for ML: A covariance matrix is always PSD (x^T \hat{\Sigma} x = \text{Var}(\text{linear combination}) \geq 0). A graph Laplacian is PSD. A kernel matrix is PSD by Mercer’s theorem. The Hessian at a local minimum is PSD.

Sylvester’s Criterion

Theorem 3 (Sylvester's Criterion).

A symmetric matrix A is positive definite if and only if all its leading principal minors are positive:

a_{11} > 0, \quad \det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} > 0, \quad \det \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} > 0, \quad \ldots

This gives a determinant-based test that avoids computing eigenvalues — useful for theoretical arguments and small matrices.
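Sylvester's criterion translates directly into code — a minimal sketch (the helper is_positive_definite_sylvester is ours for illustration, not a library function):

```python
import numpy as np

def is_positive_definite_sylvester(A):
    """Sylvester's criterion: all leading principal minors positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

A_pd = np.array([[2.0, 1.0], [1.0, 2.0]])     # eigenvalues 1 and 3
A_indef = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues -1 and 3

assert is_positive_definite_sylvester(A_pd)
assert not is_positive_definite_sylvester(A_indef)   # 2nd minor: 1 - 4 < 0

# Agrees with the eigenvalue test
assert np.all(np.linalg.eigvalsh(A_pd) > 0)
assert not np.all(np.linalg.eigvalsh(A_indef) > 0)
```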

Quadratic form surfaces: PD (bowl), ND (inverted bowl), PSD (degenerate), indefinite (saddle)


Spectral Decomposition in Practice

eig vs. eigh

Two functions for eigendecomposition:

| Function | For | Algorithm | Guarantees |
|---|---|---|---|
| np.linalg.eig(A) | General square matrices | QR iteration (LAPACK dgeev) | May return complex values |
| np.linalg.eigh(A) | Symmetric/Hermitian | Divide-and-conquer (LAPACK dsyevd) | Real eigenvalues; orthonormal eigenvectors |

Always use eigh for symmetric matrices. It is faster (roughly half the work of the general routine, though both are O(n^3)), more numerically stable, and guarantees the Spectral Theorem’s output format.
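A minimal side-by-side check: both routines recover the same spectrum on a random symmetric matrix, but eigh additionally guarantees real, ascending eigenvalues and orthonormal eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # random symmetric matrix

w_eig, V = np.linalg.eig(A)             # general routine: unordered output
w_eigh, Q = np.linalg.eigh(A)           # symmetric routine: real, ascending

# Same spectrum, but eigh guarantees the ordering and real dtype
assert np.allclose(np.sort(w_eig.real), w_eigh)
# ... and orthonormal eigenvectors
assert np.allclose(Q.T @ Q, np.eye(4))
```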

Power Iteration

The power method is the simplest eigenvalue algorithm: repeatedly multiply by A and normalize. This converges to the dominant eigenvector (the one for the largest-magnitude eigenvalue) at a geometric rate governed by |\lambda_{n-1}/\lambda_n| — the ratio of the second-largest eigenvalue magnitude to the largest.

import numpy as np

def power_iteration(A, max_iter=100, tol=1e-10):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = A.shape[0]
    x = np.random.randn(n)
    x = x / np.linalg.norm(x)

    lam = 0.0
    for _ in range(max_iter):
        y = A @ x
        lam_new = x @ y  # Rayleigh quotient
        x_new = y / np.linalg.norm(y)

        if abs(lam_new - lam) < tol:
            lam = lam_new
            x = x_new
            break

        lam = lam_new
        x = x_new

    return lam, x

Power iteration convergence: geometric rate controlled by the eigenvalue ratio

The QR Algorithm

The industry-standard eigenvalue algorithm is the QR algorithm: repeatedly factor A_k = Q_k R_k and form A_{k+1} = R_k Q_k. This is a similarity transformation (A_{k+1} = Q_k^T A_k Q_k), so eigenvalues are preserved — but A_k converges to diagonal form.

def qr_algorithm(A, max_iter=200, tol=1e-12):
    """Basic QR algorithm (without shifts) for a symmetric matrix."""
    Ak = A.copy()

    for k in range(max_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q  # similarity transform: same eigenvalues

        off_diag = np.linalg.norm(Ak - np.diag(np.diag(Ak)))
        if off_diag < tol:
            break

    return np.sort(np.diag(Ak))

With shifts, deflation, and an initial Householder reduction to tridiagonal form, the production QR algorithm achieves O(n^3) complexity. The basic version above converges at a rate determined by the eigenvalue ratios — similar to power iteration, but for all eigenvalues simultaneously.

QR algorithm convergence: the off-diagonal norm decays to zero as A_k approaches diagonal form


Application: Spectral Clustering

Spectral clustering is the flagship application of the Spectral Theorem in unsupervised learning. The idea: transform data using the eigenvectors of a graph Laplacian, then cluster in the transformed space.

Why Spectral Clustering?

Standard k-means clustering fails on non-convex clusters (e.g., nested circles or interleaving moons). Spectral clustering handles these by:

  1. Building a similarity graph from the data.
  2. Computing the graph Laplacian (a symmetric PSD matrix).
  3. Using the Spectral Theorem to find its smallest eigenvectors.
  4. Clustering in the eigenvector embedding space.

The Graph Laplacian

Given a weighted adjacency matrix W (with W_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2)) for Gaussian similarity), the unnormalized graph Laplacian is:

L = D - W

where D = \text{diag}(d_1, \ldots, d_n) with d_i = \sum_j W_{ij}.

Key properties (all from the Spectral Theorem):

  • L is symmetric (since W is symmetric).
  • L is PSD: x^T L x = \frac{1}{2} \sum_{i,j} W_{ij}(x_i - x_j)^2 \geq 0.
  • The number of zero eigenvalues of L equals the number of connected components of the graph.
  • The eigenvectors for the smallest eigenvalues encode cluster structure.
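The zero-eigenvalue property can be seen directly on a tiny graph with two connected components — a minimal sketch:

```python
import numpy as np

# A graph with two connected components: {0, 1, 2} and {3, 4}
W = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    W[i, j] = W[j, i] = 1.0

L = np.diag(W.sum(axis=1)) - W
eigenvalues = np.linalg.eigvalsh(L)

assert eigenvalues.min() > -1e-10             # L is PSD
n_zero = int(np.sum(eigenvalues < 1e-10))
assert n_zero == 2                            # one zero eigenvalue per component
```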

The Algorithm

  1. Build the similarity matrix W (e.g., Gaussian kernel with bandwidth \sigma).
  2. Compute L = D - W.
  3. Compute the k smallest eigenvectors of L: v_1, \ldots, v_k.
  4. Form the matrix U \in \mathbb{R}^{n \times k} with columns v_1, \ldots, v_k.
  5. Run k-means on the rows of U.

Interactive Demo

Stepping through the spectral clustering pipeline on a two-moons dataset shows how the Laplacian’s eigenvectors transform a non-convex clustering problem into a linearly separable one.

Implementation

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Spectral clustering using the unnormalized graph Laplacian."""
    # Step 1: Similarity matrix (Gaussian kernel)
    dists = squareform(pdist(X, 'sqeuclidean'))
    W = np.exp(-dists / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    # Step 2: Graph Laplacian
    D = np.diag(W.sum(axis=1))
    L = D - W

    # Step 3: Smallest k eigenvectors (skip constant eigenvector)
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    U = eigenvectors[:, 1:k+1]

    # Step 4: Normalize rows
    row_norms = np.linalg.norm(U, axis=1, keepdims=True)
    U_norm = U / np.where(row_norms < 1e-10, 1, row_norms)

    # Step 5: K-means on the embedding
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(U_norm)
    return labels

The graph Laplacian is symmetric PSD, so the Spectral Theorem guarantees real eigenvalues, orthogonal eigenvectors, and a well-defined spectral gap — exactly the properties that make spectral clustering work.

Spectral clustering demo: two moons and nested circles


Application: PCA Preview

Principal Component Analysis is the Spectral Theorem applied to the sample covariance matrix. The full development lives on the PCA & Low-Rank Approximation topic page — here we establish the connection.

The Covariance Matrix Is Symmetric PSD

Given centered data X \in \mathbb{R}^{n \times d} (rows = observations, columns = features), the sample covariance matrix is:

\hat{\Sigma} = \frac{1}{n-1} X^T X

This is symmetric (\hat{\Sigma}^T = \hat{\Sigma}) and PSD (v^T \hat{\Sigma} v = \frac{1}{n-1} \|Xv\|^2 \geq 0).

PCA as the Spectral Theorem

By the Spectral Theorem, \hat{\Sigma} = Q \Lambda Q^T with Q orthogonal and \Lambda diagonal with non-negative entries. The columns of Q are the principal components — orthogonal directions of maximum variance. The eigenvalues are the variances along these directions.

# PCA via eigendecomposition of the covariance matrix
# (X is an (n, d) data array, rows = observations; k = number of components)
n = X.shape[0]
X_centered = X - X.mean(axis=0)
Sigma_hat = (X_centered.T @ X_centered) / (n - 1)

eigenvalues, Q = np.linalg.eigh(Sigma_hat)
# eigh returns ascending order; PCA wants descending
idx = np.argsort(eigenvalues)[::-1]
eigenvalues, Q = eigenvalues[idx], Q[:, idx]

# Project onto first k principal components
X_pca = X_centered @ Q[:, :k]

The Courant–Fischer theorem (Theorem 2) guarantees this is the optimal rank-k variance-preserving projection.

The SVD Connection

PCA is also connected to the Singular Value Decomposition (the next topic in this track). If X = U \Sigma V^T is the SVD of the centered data, then \hat{\Sigma} = \frac{1}{n-1} V \Sigma^2 V^T, so the right singular vectors of X are the eigenvectors of \hat{\Sigma}. The SVD generalizes the Spectral Theorem to rectangular matrices — the subject of the next topic.
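This connection can be verified in a few lines: eigendecomposition of \hat{\Sigma} and SVD of the centered data recover the same spectrum, and the same directions up to sign (assuming distinct eigenvalues, which holds generically for random data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.standard_normal((n, d))
X = X - X.mean(axis=0)                     # center the data

# Eigendecomposition of the covariance matrix (ascending eigenvalues)
Sigma_hat = X.T @ X / (n - 1)
w, V_eig = np.linalg.eigh(Sigma_hat)

# SVD of the centered data matrix (descending singular values)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigenvalues of Sigma_hat are squared singular values over (n - 1)
assert np.allclose(np.sort(s**2 / (n - 1)), w)

# Right singular vectors match the covariance eigenvectors up to sign
V_svd = Vt.T[:, ::-1]                      # reverse columns to ascending order
assert np.allclose(np.abs(V_svd), np.abs(V_eig))
```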

PCA preview: 3D data, projection onto principal components, and scree plot


Connections & Further Reading

The Spectral Theorem is the algebraic foundation that supports the entire Linear Algebra track and connects back to the Topology & TDA track.

| Topic | Connection |
|---|---|
| Singular Value Decomposition | The SVD generalizes the Spectral Theorem to rectangular matrices. For symmetric A, the SVD reduces to the spectral decomposition. |
| PCA & Low-Rank Approximation | PCA is the Spectral Theorem applied to \hat{\Sigma}. The Eckart–Young theorem follows from the spectral decomposition. |
| Tensor Decompositions | The CP decomposition generalizes eigendecomposition to tensors. Symmetric tensor decomposition is the higher-order analog of the Spectral Theorem. |
| Sheaf Theory | The Sheaf Laplacian L_\mathcal{F} = \delta_0^T \delta_0 is symmetric PSD. The Spectral Theorem guarantees its eigendecomposition. \ker(L_\mathcal{F}) = H^0 is read from the zero eigenvalues. |
| Convex Analysis | The second-order convexity condition \nabla^2 f \succeq 0 is verified through the eigendecomposition guaranteed by the Spectral Theorem. The PSD cone \mathbb{S}^n_+ is a fundamental convex set. |
| Graph Laplacians & Spectrum | The graph Laplacian L = D - A is a real symmetric matrix. The Spectral Theorem guarantees its complete orthonormal eigenbasis — the foundation of spectral graph theory, spectral clustering, and graph neural networks. |
| Categories & Functors | The category Vec of finite-dimensional vector spaces and linear maps is the primary running example in category theory. The Spectral Theorem guarantees that symmetric endomorphisms in Vec have complete eigenbases. \mathrm{Hom}(\mathbb{R}^m, \mathbb{R}^n) = \mathrm{Mat}(n \times m) illustrates the Hom functor concretely. |

The Linear Algebra Track

The Spectral Theorem is the root of the track: everything that follows either extends it (SVD → rectangular matrices, tensors → higher order) or applies it (PCA → optimal projection).

Spectral Theorem (this topic)
    ├── Singular Value Decomposition
    │       └── PCA & Low-Rank Approximation
    └── Tensor Decompositions

Connections

  • The SVD generalizes the Spectral Theorem to rectangular matrices; for symmetric A, the SVD and spectral decomposition coincide. (svd)
  • PCA is the Spectral Theorem applied to the sample covariance matrix. (pca-low-rank)
  • Symmetric tensor decomposition is the higher-order analog of the Spectral Theorem. (tensor-decompositions)
  • The Sheaf Laplacian is symmetric PSD; the Spectral Theorem guarantees its eigendecomposition. (sheaf-theory)
  • The second-order convexity condition requires the Hessian to be PSD — the Spectral Theorem provides the eigendecomposition that verifies this. (convex-analysis)
  • The tangent space at each point of a smooth manifold is a real vector space. A Riemannian metric makes curvature-related operators (such as the shape operator or curvature operator) symmetric, and the Spectral Theorem diagonalizes these maps so that their eigenvalues encode geometric invariants like principal or sectional curvatures. (smooth-manifolds)

References & Further Reading

  • Strang, Introduction to Linear Algebra (2016) — primary pedagogical reference.
  • Horn & Johnson, Matrix Analysis (2013) — rigorous proofs and spectral theory.
  • Trefethen & Bau, Numerical Linear Algebra (1997) — QR algorithm and numerical methods.
  • Axler, Linear Algebra Done Right (2024) — clean theoretical development.
  • von Luxburg, “A tutorial on spectral clustering” (2007) — spectral clustering application.
  • Shi & Malik, “Normalized cuts and image segmentation” (2000) — normalized graph Laplacian for clustering.
  • Ng, Jordan & Weiss, “On spectral clustering: Analysis and an algorithm” (2001) — normalized spectral clustering algorithm.
  • Hansen & Ghrist, “Toward a spectral theory of cellular sheaves” (2019) — Sheaf Laplacian spectral theory; cross-track connection.