intermediate category-theory 45 min read

Adjunctions

The free-forgetful paradigm, unit-counit pairs, and the universal optimization pattern that connects algebra, logic, and machine learning

Prerequisites: Natural Transformations

Overview & Motivation

Consider a problem we’ve already encountered concretely. You have a set S={a,b}S = \{a, b\} and a vector space V=R3V = \mathbb{R}^3. You want to define a linear map from “the vector space built from SS” to VV. The free vector space F(S)F(S) has basis {ea,eb}\{e_a, e_b\}, so F(S)R2F(S) \cong \mathbb{R}^2. A linear map T:R2R3T: \mathbb{R}^2 \to \mathbb{R}^3 is determined by the matrix [T(ea)T(eb)][T(e_a) \mid T(e_b)] — that is, by specifying where the two basis elements go. But specifying where basis elements go is just a function f:SR3f: S \to \mathbb{R}^3. We’ve established a bijection:

Linear maps F(S)V        Functions SU(V)\text{Linear maps } F(S) \to V \;\;\longleftrightarrow\;\; \text{Functions } S \to U(V)

where U(V)U(V) is the underlying set of VV (forgetting the vector space structure). This bijection is not a coincidence — it’s a natural isomorphism, and the pattern it instantiates is called an adjunction.

An adjunction FGF \dashv G between categories C\mathcal{C} and D\mathcal{D} says: for every ACA \in \mathcal{C} and BDB \in \mathcal{D}, there is a bijection

HomD(F(A),B)HomC(A,G(B))\mathrm{Hom}_\mathcal{D}(F(A), B) \cong \mathrm{Hom}_\mathcal{C}(A, G(B))

that is natural in both variables. The left adjoint FF “freely builds structure,” and the right adjoint GG “forgets structure.” This free-forgetful pattern appears everywhere:

  • Free groups: FU:GrpSetF \dashv U : \mathbf{Grp} \to \mathbf{Set} — a group homomorphism from F(S)F(S) is determined by where generators go.
  • Products: ΔΠ\Delta \dashv \Pi — a morphism into a product is a pair of morphisms.
  • Tensor-hom: (V)Hom(V,)(- \otimes V) \dashv \mathrm{Hom}(V, -) — bilinear maps from UVU \otimes V to WW correspond to linear maps UHom(V,W)U \to \mathrm{Hom}(V, W). This is currying.
  • Quantifiers: fff\exists_f \dashv f^* \dashv \forall_f — the existential and universal quantifiers are adjoints of substitution in logical syntax.

For machine learning, adjunctions formalize Lagrangian duality as a Galois connection between primal and dual optimization problems, the encoder-decoder paradigm as a pair of approximately inverse functors, and attention mechanisms as instances of the tensor-hom adjunction (currying).

Every adjunction FGF \dashv G also generates a monad T=GFT = GF — the round-trip composition that “freely builds and then forgets.” This connects adjunctions to the capstone of the Category Theory track.

Roadmap. We develop three equivalent definitions (Hom-set, unit-counit, universal morphism), prove their equivalence, then turn to properties: uniqueness of adjoints (via Yoneda), composition, and the RAPL theorem. Galois connections specialize adjunctions to posets, giving concrete examples from number theory, topology, and optimization. We close with the Adjoint Functor Theorem, ML connections, and a preview of monads.

Adjunctions — Hom-set isomorphism, free-forgetful example, unit and counit, triangle identities


Adjunctions: The Hom-Set Definition

We begin with the most symmetric formulation.

Definition 1 (Adjunction (Hom-Set)).

Let C\mathcal{C} and D\mathcal{D} be categories, and let F:CDF: \mathcal{C} \to \mathcal{D} and G:DCG: \mathcal{D} \to \mathcal{C} be functors. An adjunction FGF \dashv G (read ”FF is left adjoint to GG”) is a natural isomorphism

ΦA,B:HomD(F(A),B)    HomC(A,G(B))\Phi_{A,B}: \mathrm{Hom}_\mathcal{D}(F(A), B) \xrightarrow{\;\cong\;} \mathrm{Hom}_\mathcal{C}(A, G(B))

for all ACA \in \mathcal{C}, BDB \in \mathcal{D}, natural in both AA and BB.

We call FF the left adjoint and GG the right adjoint.

Naturality in both variables means: for any morphism h:AAh: A' \to A in C\mathcal{C} and k:BBk: B \to B' in D\mathcal{D}, the following diagrams commute:

ΦA,B(fˉF(h))=ΦA,B(fˉ)handΦA,B(kfˉ)=G(k)ΦA,B(fˉ)\Phi_{A',B}(\bar{f} \circ F(h)) = \Phi_{A,B}(\bar{f}) \circ h \qquad \text{and} \qquad \Phi_{A,B'}(k \circ \bar{f}) = G(k) \circ \Phi_{A,B}(\bar{f})

The bijection Φ\Phi gives every morphism fˉ:F(A)B\bar{f}: F(A) \to B in D\mathcal{D} a unique partner f:AG(B)f: A \to G(B) in C\mathcal{C}.

Definition 2 (Adjoint Transpose).

Given an adjunction FGF \dashv G with natural isomorphism Φ\Phi, the adjoint transpose of a morphism fˉ:F(A)B\bar{f}: F(A) \to B is

f=ΦA,B(fˉ):AG(B)f = \Phi_{A,B}(\bar{f}): A \to G(B)

Conversely, the adjoint transpose of f:AG(B)f: A \to G(B) is fˉ=ΦA,B1(f):F(A)B\bar{f} = \Phi^{-1}_{A,B}(f): F(A) \to B.

Example. In the free-forgetful adjunction FU:SetVecF \dashv U: \mathbf{Set} \to \mathbf{Vec}, if S={a,b}S = \{a, b\} and V=R3V = \mathbb{R}^3, then a function f:SU(V)f: S \to U(V) with f(a)=(1,0,2)f(a) = (1, 0, 2) and f(b)=(0,3,1)f(b) = (0, 3, 1) has adjoint transpose fˉ:R2R3\bar{f}: \mathbb{R}^2 \to \mathbb{R}^3 given by the matrix

fˉ=[100321]\bar{f} = \begin{bmatrix} 1 & 0 \\ 0 & 3 \\ 2 & 1 \end{bmatrix}

The columns are exactly f(a)f(a) and f(b)f(b) — the images of the basis elements. This is the free-forgetful bijection at work.

Description

Free ⊣ Forgetful (Set ↔ Vec): a linear map from F(S) is determined by where basis elements go — just a function from S.

Triangle Identities

εF ∘ Fη = idF :

Gε ∘ ηG = idG :


Unit, Counit, and the Triangle Identities

The Hom-set definition is elegant, but it hides the concrete data that makes an adjunction tick. The unit-counit formulation exposes that data.

Definition 3 (Unit of an Adjunction).

Given an adjunction FGF \dashv G with isomorphism Φ\Phi, the unit is the natural transformation η:IdCGF\eta: \mathrm{Id}_\mathcal{C} \Rightarrow GF defined by

ηA=ΦA,F(A)(idF(A)):AGF(A)\eta_A = \Phi_{A, F(A)}(\mathrm{id}_{F(A)}): A \to GF(A)

for each ACA \in \mathcal{C}. The unit “inserts AA into the round-trip GF(A)GF(A).”

Definition 4 (Counit of an Adjunction).

The counit is the natural transformation ε:FGIdD\varepsilon: FG \Rightarrow \mathrm{Id}_\mathcal{D} defined by

εB=ΦG(B),B1(idG(B)):FG(B)B\varepsilon_B = \Phi^{-1}_{G(B), B}(\mathrm{id}_{G(B)}): FG(B) \to B

for each BDB \in \mathcal{D}. The counit “evaluates the round-trip FG(B)FG(B) back to BB.”

Example. In the free-forgetful adjunction, the unit ηS:SUF(S)\eta_S: S \to UF(S) is the basis insertion: ηS(a)=ea\eta_S(a) = e_a. It takes each element of SS and maps it to the corresponding basis vector in the free vector space. The counit εV:FU(V)V\varepsilon_V: FU(V) \to V is the evaluation map: the free vector space on the underlying set of VV maps back to VV by extending the identity linearly.

The unit and counit are connected by a remarkable pair of equations.

Definition 5 (Triangle Identities (Zig-Zag Laws)).

The triangle identities for an adjunction FGF \dashv G with unit η\eta and counit ε\varepsilon are:

εF(A)F(ηA)=idF(A)for all AC\varepsilon_{F(A)} \circ F(\eta_A) = \mathrm{id}_{F(A)} \qquad \text{for all } A \in \mathcal{C}

G(εB)ηG(B)=idG(B)for all BDG(\varepsilon_B) \circ \eta_{G(B)} = \mathrm{id}_{G(B)} \qquad \text{for all } B \in \mathcal{D}

In string diagram notation, these are the “zig-zag” equations — each composition snakes up and back, then collapses to the straight line (the identity).

The first says: if we start at F(A)F(A), go up via F(ηA)F(\eta_A) to FGF(A)FGF(A), then come back down via εF(A)\varepsilon_{F(A)} to F(A)F(A), we end up where we started. The second is the dual statement for GG.

Definition 6 (Adjunction (Unit-Counit)).

An adjunction FGF \dashv G consists of functors F:CDF: \mathcal{C} \to \mathcal{D}, G:DCG: \mathcal{D} \to \mathcal{C} and natural transformations η:IdCGF\eta: \mathrm{Id}_\mathcal{C} \Rightarrow GF (the unit) and ε:FGIdD\varepsilon: FG \Rightarrow \mathrm{Id}_\mathcal{D} (the counit) satisfying the triangle identities.

1×
εF(A) ∘ F(ηA) = idF(A)
G(εB) ∘ ηG(B) = idG(B)

Adjunctions appear throughout mathematics once we know what to look for. The pattern is always the same: a “free” construction that builds structure from raw material, paired with a “forgetful” functor that strips structure away.

1. Free ⊣ Forgetful (Set ↔ Vec). F(S)=F(S) = free vector space on SS, U(V)=U(V) = underlying set of VV. A linear map from F(S)F(S) is determined by where the basis goes. This is our running example.

2. Free ⊣ Forgetful (Set ↔ Grp). F(S)=F(S) = free group on SS, U(G)=U(G) = underlying set of GG. A group homomorphism from F(S)F(S) is determined by where generators go. For S={a}S = \{a\}, we get F({a})=(Z,+)F(\{a\}) = (\mathbb{Z}, +), and every group homomorphism ZG\mathbb{Z} \to G is determined by the image of 11 — an arbitrary element of GG.

3. Diagonal ⊣ Product. Δ:CC×C\Delta: \mathcal{C} \to \mathcal{C} \times \mathcal{C} sends CC to (C,C)(C, C). Its right adjoint Π\Pi sends (A,B)(A, B) to the product A×BA \times B. The adjunction says:

Hom(Δ(C),(A,B))Hom(C,A×B)\mathrm{Hom}(\Delta(C), (A, B)) \cong \mathrm{Hom}(C, A \times B)

A morphism into a product is determined by a pair of morphisms — this is the universal property of products, repackaged as an adjunction.

4. Coproduct ⊣ Diagonal. Dually, :C×CC\coprod: \mathcal{C} \times \mathcal{C} \to \mathcal{C} sends (A,B)(A, B) to ABA \sqcup B, with Δ\Delta as right adjoint:

Hom(AB,C)Hom((A,B),Δ(C))\mathrm{Hom}(A \sqcup B, C) \cong \mathrm{Hom}((A, B), \Delta(C))

A morphism out of a coproduct is a pair of morphisms.

5. Tensor-Hom (Currying). In Vec\mathbf{Vec}, (V)Hom(V,)(- \otimes V) \dashv \mathrm{Hom}(V, -):

Hom(UV,W)Hom(U,Hom(V,W))\mathrm{Hom}(U \otimes V, W) \cong \mathrm{Hom}(U, \mathrm{Hom}(V, W))

A bilinear map from UVU \otimes V corresponds to a linear map from UU to the space of linear maps VWV \to W. This is the categorification of currying from functional programming — and it underlies the attention mechanism in transformers.

6. Galois Connections (Posets). When C\mathcal{C} and D\mathcal{D} are posets, an adjunction becomes f(p)q    pg(q)f(p) \leq q \iff p \leq g(q). Example: xn    xn\lfloor x \rfloor \leq n \iff x \leq n for the floor function and integer inclusion. We develop this in detail below.

7. Quantifiers as Adjoints. In logic, substitution ff^* along a function f:ABf: A \to B has both a left adjoint f\exists_f and a right adjoint f\forall_f:

fff\exists_f \dashv f^* \dashv \forall_f

The existential quantifier is left adjoint to substitution, and the universal quantifier is right adjoint. This explains why \exists preserves disjunctions (colimits) and \forall preserves conjunctions (limits).

Gallery of adjunctions — free-forgetful, tensor-hom, Galois connections, diagonal-product, quantifiers


Equivalence of Definitions

There are three equivalent ways to define an adjunction. We’ve seen two (Hom-set and unit-counit). The third uses universal morphisms.

Definition 7 (Universal Morphism).

Let G:DCG: \mathcal{D} \to \mathcal{C} be a functor and AA an object of C\mathcal{C}. A universal morphism from AA to GG is a pair (F(A),ηA:AG(F(A)))(F(A), \eta_A: A \to G(F(A))) such that for every morphism f:AG(B)f: A \to G(B), there exists a unique morphism fˉ:F(A)B\bar{f}: F(A) \to B with G(fˉ)ηA=fG(\bar{f}) \circ \eta_A = f.

The universal morphism says: ηA\eta_A is the “best” way to map AA into something of the form G(B)G(B), because every other such map factors through it uniquely. This is the optimization perspective on adjunctions — the universal morphism solves a universal optimization problem.

Proposition 1 (Hom-Set ⇔ Unit-Counit Equivalence).

The Hom-set definition and the unit-counit definition of an adjunction are equivalent.

Proof.

(Hom-set ⇒ Unit-Counit). Given the natural isomorphism Φ\Phi, define:

ηA=ΦA,F(A)(idF(A)):AGF(A)\eta_A = \Phi_{A, F(A)}(\mathrm{id}_{F(A)}): A \to GF(A)

εB=ΦG(B),B1(idG(B)):FG(B)B\varepsilon_B = \Phi^{-1}_{G(B), B}(\mathrm{id}_{G(B)}): FG(B) \to B

We must verify the triangle identities. For the first, we need εF(A)F(ηA)=idF(A)\varepsilon_{F(A)} \circ F(\eta_A) = \mathrm{id}_{F(A)}. By naturality of Φ\Phi in AA, for h=ηA:AGF(A)h = \eta_A: A \to GF(A):

ΦGF(A),F(A)(idF(A))ηA=ΦA,F(A)(idF(A))=ηA\Phi_{GF(A), F(A)}(\mathrm{id}_{F(A)}) \circ \eta_A = \Phi_{A, F(A)}(\mathrm{id}_{F(A)}) = \eta_A

Since ΦGF(A),F(A)(idF(A))=ηGF(A)\Phi_{GF(A), F(A)}(\mathrm{id}_{F(A)}) = \eta_{GF(A)}, we get that Φ1\Phi^{-1} applied to idGF(A)\mathrm{id}_{GF(A)} through the naturality in BB gives εF(A)F(ηA)=idF(A)\varepsilon_{F(A)} \circ F(\eta_A) = \mathrm{id}_{F(A)}. The second triangle identity follows by a dual argument.

(Unit-Counit ⇒ Hom-set). Given η\eta and ε\varepsilon satisfying the triangle identities, define:

ΦA,B(fˉ)=G(fˉ)ηAfor fˉ:F(A)B\Phi_{A,B}(\bar{f}) = G(\bar{f}) \circ \eta_A \qquad \text{for } \bar{f}: F(A) \to B

ΦA,B1(f)=εBF(f)for f:AG(B)\Phi^{-1}_{A,B}(f) = \varepsilon_B \circ F(f) \qquad \text{for } f: A \to G(B)

To verify these are inverses, compute Φ1(Φ(fˉ))=εBF(G(fˉ)ηA)=εBFG(fˉ)F(ηA)=fˉεF(A)F(ηA)=fˉidF(A)=fˉ\Phi^{-1}(\Phi(\bar{f})) = \varepsilon_B \circ F(G(\bar{f}) \circ \eta_A) = \varepsilon_B \circ FG(\bar{f}) \circ F(\eta_A) = \bar{f} \circ \varepsilon_{F(A)} \circ F(\eta_A) = \bar{f} \circ \mathrm{id}_{F(A)} = \bar{f}, where the last step uses the first triangle identity. The other direction uses the second triangle identity similarly. Naturality of Φ\Phi follows from naturality of η\eta and ε\varepsilon. \blacksquare

Proposition 2 (Universal Morphism ⇔ Unit-Counit Equivalence).

The universal morphism definition and the unit-counit definition of an adjunction are equivalent.

Proof.

(Unit-Counit ⇒ Universal Morphism). Given ηA:AGF(A)\eta_A: A \to GF(A), for any f:AG(B)f: A \to G(B) define fˉ=εBF(f):F(A)B\bar{f} = \varepsilon_B \circ F(f): F(A) \to B. Then G(fˉ)ηA=G(εB)GF(f)ηA=G(εB)ηG(B)f=idG(B)f=fG(\bar{f}) \circ \eta_A = G(\varepsilon_B) \circ GF(f) \circ \eta_A = G(\varepsilon_B) \circ \eta_{G(B)} \circ f = \mathrm{id}_{G(B)} \circ f = f by the second triangle identity and naturality of η\eta. Uniqueness: if fˉ\bar{f}' also satisfies G(fˉ)ηA=fG(\bar{f}') \circ \eta_A = f, then fˉ=εBFG(fˉ)F(ηA)=εBF(G(fˉ)ηA)=εBF(f)=fˉ\bar{f}' = \varepsilon_B \circ FG(\bar{f}') \circ F(\eta_A) = \varepsilon_B \circ F(G(\bar{f}') \circ \eta_A) = \varepsilon_B \circ F(f) = \bar{f}.

(Universal Morphism ⇒ Unit-Counit). If for each AA we have a universal morphism ηA:AG(F(A))\eta_A: A \to G(F(A)), the collection {ηA}\{\eta_A\} forms the unit. The counit components εB\varepsilon_B are constructed as the unique morphisms with G(εB)ηG(B)=idG(B)G(\varepsilon_B) \circ \eta_{G(B)} = \mathrm{id}_{G(B)}. The triangle identities follow from the uniqueness in the universal property. \blacksquare

Three equivalent definitions of adjunctions — Hom-set, unit-counit, universal morphisms


Properties of Adjunctions

Theorem 1 (Uniqueness of Adjoints).

If FGF \dashv G and FGF \dashv G', then GGG \cong G' (natural isomorphism). Dually, if FGF \dashv G and FGF' \dashv G, then FFF \cong F'.

Proof.

We have natural isomorphisms Hom(A,G(B))Hom(F(A),B)Hom(A,G(B))\mathrm{Hom}(A, G(B)) \cong \mathrm{Hom}(F(A), B) \cong \mathrm{Hom}(A, G'(B)) natural in AA. Fixing BB and varying AA, this gives a natural isomorphism Hom(,G(B))Hom(,G(B))\mathrm{Hom}(-, G(B)) \cong \mathrm{Hom}(-, G'(B)) of functors CopSet\mathcal{C}^{\mathrm{op}} \to \mathbf{Set}. By the Yoneda lemma, a natural isomorphism between representable functors implies G(B)G(B)G(B) \cong G'(B), and this isomorphism is natural in BB. \blacksquare

This is one of the most satisfying applications of Yoneda: adjoints are unique (up to natural isomorphism) when they exist.

Proposition 3 (Composition of Adjunctions).

If FG:CDF \dashv G: \mathcal{C} \rightleftarrows \mathcal{D} and FG:DEF' \dashv G': \mathcal{D} \rightleftarrows \mathcal{E}, then FFGG:CEF'F \dashv GG': \mathcal{C} \rightleftarrows \mathcal{E}.

Proof.

For any ACA \in \mathcal{C} and CEC \in \mathcal{E}:

HomE(FF(A),C)HomD(F(A),G(C))HomC(A,GG(C))\mathrm{Hom}_\mathcal{E}(F'F(A), C) \cong \mathrm{Hom}_\mathcal{D}(F(A), G'(C)) \cong \mathrm{Hom}_\mathcal{C}(A, GG'(C))

The first isomorphism uses FGF' \dashv G' and the second uses FGF \dashv G. The composite is natural in both AA and CC since each isomorphism is. \blacksquare

Theorem 2 (RAPL: Right Adjoints Preserve Limits).

If FGF \dashv G, then GG preserves all limits that exist in D\mathcal{D}.

Proof.

Let D:JDD: \mathcal{J} \to \mathcal{D} be a diagram with limit limD\lim D in D\mathcal{D}. We need to show G(limD)limGDG(\lim D) \cong \lim GD in C\mathcal{C}. For any ACA \in \mathcal{C}:

HomC(A,G(limD))HomD(F(A),limD)limjHomD(F(A),D(j))limjHomC(A,G(D(j)))\mathrm{Hom}_\mathcal{C}(A, G(\lim D)) \cong \mathrm{Hom}_\mathcal{D}(F(A), \lim D) \cong \lim_j \mathrm{Hom}_\mathcal{D}(F(A), D(j)) \cong \lim_j \mathrm{Hom}_\mathcal{C}(A, G(D(j)))

The first step uses the adjunction, the second uses the defining property of limits (Hom out of a fixed object into a limit is the limit of the Hom sets), and the third uses the adjunction again. The composite says Hom(A,G(limD))limjHom(A,GD(j))\mathrm{Hom}(A, G(\lim D)) \cong \lim_j \mathrm{Hom}(A, GD(j)), which is exactly the statement that G(limD)G(\lim D) is the limit of GDGD. \blacksquare

Remark (LAPC: Left Adjoints Preserve Colimits).

Dually, FF preserves all colimits that exist in C\mathcal{C}. The proof is identical, using HomD(colimFD,B)\mathrm{Hom}_\mathcal{D}(\mathrm{colim}\, FD, B) and the fact that Hom is contravariant in its first argument, turning colimits into limits.

Examples. The forgetful functor U:GrpSetU: \mathbf{Grp} \to \mathbf{Set} preserves products because it’s a right adjoint: U(G×H)U(G)×U(H)U(G \times H) \cong U(G) \times U(H). The free functor F:SetVecF: \mathbf{Set} \to \mathbf{Vec} preserves coproducts because it’s a left adjoint: F(ST)F(S)F(T)F(S \sqcup T) \cong F(S) \oplus F(T) — the free vector space on a disjoint union is the direct sum.

Proposition 4 (Galois Connections Yield Closure Operators).

Let fgf \dashv g be a Galois connection between posets (P,)(P, \leq) and (Q,)(Q, \leq). Then gf:PPg \circ f: P \to P is a closure operator: it is extensive (pgf(p)p \leq gf(p)), monotone, and idempotent (gfgf=gfgfgf = gf). Dually, fg:QQf \circ g: Q \to Q is a kernel (interior) operator.

Proof.

Extensive: From the adjunction condition with q=f(p)q = f(p): f(p)f(p)f(p) \leq f(p) is always true, so pg(f(p))p \leq g(f(p)).

Monotone: If ppp \leq p', then by extensiveness pgf(p)p' \leq gf(p'), so pgf(p)p \leq gf(p'). From the adjunction condition, f(p)f(p)f(p) \leq f(p'), and then gf(p)gf(p)gf(p) \leq gf(p').

Idempotent: We need gf(gf(p))=gf(p)gf(gf(p)) = gf(p). From extensiveness, gf(p)gfgf(p)gf(p) \leq gfgf(p). For the reverse, apply ff to the extensive inequality pgf(p)p \leq gf(p) to get f(p)fgf(p)f(p) \leq fgf(p) (monotonicity of ff). Then fgfgfgfgfg \leq fg by the counit condition fg(q)qfg(q) \leq q applied to q=f(p)q = f(p). So fgf(p)f(p)fgf(p) \leq f(p), giving gfgf(p)gf(p)gfgf(p) \leq gf(p) by monotonicity of gg. \blacksquare

Remark (Monads from Adjunctions (Preview)).

Every adjunction FGF \dashv G gives rise to a monad T=GF:CCT = GF: \mathcal{C} \to \mathcal{C} with unit η:IdCGF\eta: \mathrm{Id}_\mathcal{C} \Rightarrow GF and multiplication μ=GεF:GFGFGF\mu = G\varepsilon F: GFGF \Rightarrow GF. The triangle identities for the adjunction imply the monad laws: μTη=idT=μηT\mu \circ T\eta = \mathrm{id}_T = \mu \circ \eta T and μTμ=μμT\mu \circ T\mu = \mu \circ \mu T. Conversely, every monad arises from an adjunction — in fact, from two canonical ones (Eilenberg-Moore and Kleisli). Monads & Comonads completes the story: every adjunction FGF \dashv G gives rise to a monad T=GFT = GF with unit η\eta and multiplication μ=GεF\mu = G\varepsilon F. The Eilenberg-Moore and Kleisli categories provide canonical adjunctions that recover any monad.

Properties of adjunctions — uniqueness, composition, RAPL, and monads from adjunctions


Galois Connections

When both categories are posets, an adjunction becomes especially concrete.

Definition 8 (Galois Connection).

A Galois connection between posets (P,)(P, \leq) and (Q,)(Q, \leq) consists of monotone maps f:PQf: P \to Q (the left adjoint) and g:QPg: Q \to P (the right adjoint) such that

f(p)qpg(q)for all pP,qQf(p) \leq q \quad\Longleftrightarrow\quad p \leq g(q) \qquad \text{for all } p \in P, \, q \in Q

This is exactly FGF \dashv G when the categories are posets: the unique morphism pqp \to q exists if and only if pqp \leq q, so the Hom-set bijection Hom(f(p),q)Hom(p,g(q))\mathrm{Hom}(f(p), q) \cong \mathrm{Hom}(p, g(q)) reduces to the equivalence above (both Hom sets are either empty or singletons).

Example: Ceiling ⊣ Inclusion (and Inclusion ⊣ Floor). The interplay between rounding and inclusion gives two classic Galois connections. Let P=RP = \mathbb{R} with the usual order, Q=ZQ = \mathbb{Z} with the usual order, and let ι:ZR\iota: \mathbb{Z} \hookrightarrow \mathbb{R} be the inclusion. Then:

xnxι(n)for all xR,nZ\lceil x \rceil \leq n \quad\Longleftrightarrow\quad x \leq \iota(n) \qquad \text{for all } x \in \mathbb{R}, \, n \in \mathbb{Z}

This says the ceiling is left adjoint to inclusion: ι\lceil \cdot \rceil \dashv \iota. Check: 2.7=33\lceil 2.7 \rceil = 3 \leq 3 and 2.732.7 \leq 3 — both true. 2.7=32\lceil 2.7 \rceil = 3 \leq 2? No, and 2.722.7 \leq 2 is also false — both false. The biconditional holds.

There is a dual Galois connection where inclusion is the left adjoint and floor is the right adjoint:

ι(n)xnxfor all nZ,xR\iota(n) \leq x \quad\Longleftrightarrow\quad n \leq \lfloor x \rfloor \qquad \text{for all } n \in \mathbb{Z}, \, x \in \mathbb{R}

Check: 22.72 \leq 2.7 and 22.7=22 \leq \lfloor 2.7 \rfloor = 2 — both true. 32.73 \leq 2.7? No, and 323 \leq 2? Also no — consistent. The key insight: \lfloor \cdot \rfloor is right adjoint to inclusion, not left. Mixing these up is a common source of confusion.

Example: Image ⊣ Preimage. For a function f:XYf: X \to Y between sets, the direct image f:P(X)P(Y)f_*: \mathcal{P}(X) \to \mathcal{P}(Y) and the preimage f1:P(Y)P(X)f^{-1}: \mathcal{P}(Y) \to \mathcal{P}(X) form a Galois connection on the power set lattices:

f(A)BAf1(B)f_*(A) \subseteq B \quad\Longleftrightarrow\quad A \subseteq f^{-1}(B)

The closure operator f1ff^{-1} \circ f_* sends a subset AA to f1(f(A))f^{-1}(f(A)) — the preimage of the image, which is the “saturation” of AA with respect to ff.

Definition 9 (Closure Operator).

A closure operator on a poset (P,)(P, \leq) is a function c:PPc: P \to P that is:

  1. Extensive: pc(p)p \leq c(p) for all pp
  2. Monotone: pq    c(p)c(q)p \leq q \implies c(p) \leq c(q)
  3. Idempotent: c(c(p))=c(p)c(c(p)) = c(p) for all pp

An element pp with c(p)=pc(p) = p is called closed (or a fixed point of cc).

Remark (Lagrangian Duality as a Galois Connection).

Lagrangian duality is a Galois connection between the primal and dual optimization posets. The primal problem minxf0(x)\min_x f_0(x) subject to gi(x)0g_i(x) \leq 0 and the dual problem maxλ0infxL(x,λ)\max_{\lambda \geq 0} \inf_x \mathcal{L}(x, \lambda) are connected by:

  • Weak duality (dfd^* \leq f^*) is the counit condition f(g(q))qf(g(q)) \leq q.
  • Strong duality (d=fd^* = f^*) is when the unit and counit are isomorphisms — the adjunction is “tight.”
  • The duality gap fdf^* - d^* measures the obstruction to the unit being an isomorphism.
  • The KKT conditions characterize the fixed points of the closure operator.
Ceiling ⊣ Inclusion (Z ↪ Q)

Ceiling ⊣ Inclusion: ⌈x⌉ ≤ n ⟺ x ≤ n. The ceiling function is left adjoint to the inclusion of integers into rationals.

Verification

f(p) ≤ q ⟺ p ≤ g(q) : ✓ valid

Galois connections — poset adjunctions, closure-interior, image-preimage, floor-inclusion


Representability and the Adjoint Functor Theorem

The adjunction isomorphism Hom(F(A),B)Hom(A,G(B))\mathrm{Hom}(F(A), B) \cong \mathrm{Hom}(A, G(B)) has a representability interpretation. Fixing BB and varying AA, the functor HomD(F(),B):CopSet\mathrm{Hom}_\mathcal{D}(F(-), B): \mathcal{C}^{\mathrm{op}} \to \mathbf{Set} is representable, with representing object G(B)G(B). Conversely, fixing AA, the functor HomC(A,G()):DSet\mathrm{Hom}_\mathcal{C}(A, G(-)): \mathcal{D} \to \mathbf{Set} is representable by F(A)F(A).

This gives a representability criterion: FF has a right adjoint if and only if Hom(F(),B)\mathrm{Hom}(F(-), B) is representable for all BB.

The natural question is: when does a right adjoint exist? The Adjoint Functor Theorem gives sufficient conditions.

Theorem 3 (Adjoint Functor Theorem (Freyd)).

Let G:DCG: \mathcal{D} \to \mathcal{C} be a functor where D\mathcal{D} is locally small and complete (has all small limits). Then GG has a left adjoint if and only if:

  1. GG preserves all small limits, and
  2. For each ACA \in \mathcal{C}, the solution set condition holds: there exists a set SS of morphisms {fi:AG(Di)}iI\{f_i: A \to G(D_i)\}_{i \in I} such that every morphism f:AG(D)f: A \to G(D) factors through some fif_i — that is, there exists h:DiDh: D_i \to D with G(h)fi=fG(h) \circ f_i = f.

The solution set condition prevents the “solution” from being too large (a proper class). For locally small categories with enough structure, this condition is often automatic.

The theorem is important because it tells us when adjoints exist without having to construct them explicitly. In practice, limit preservation is the condition we check, and the solution set condition is verified by a smallness argument.

Representability from adjunctions and the Adjoint Functor Theorem


Adjunctions in Machine Learning

The adjunction pattern — a pair of functors in opposite directions, with the unit measuring “insertion cost” and the counit measuring “evaluation” — appears in several ML paradigms.

1. Lagrangian Duality as a Galois Connection. As noted in the Galois Connections section, the primal-dual relationship in constrained optimization is a Galois connection. For a convex program, strong duality (when it holds) means the adjunction collapses to an equivalence — the primal and dual solutions determine each other. The regularization path (varying the constraint bound) traces the closure operator of this Galois connection.

2. Encoder-Decoder as an Adjunction. An autoencoder consists of an encoder E:XZE: \mathcal{X} \to \mathcal{Z} and a decoder D:ZXD: \mathcal{Z} \to \mathcal{X}. The unit η=DE:idXDE\eta = D \circ E: \mathrm{id}_\mathcal{X} \Rightarrow DE measures reconstruction quality — if ηxx\eta_x \approx x, the round-trip is nearly lossless. The counit ε=ED:EDidZ\varepsilon = E \circ D: ED \Rightarrow \mathrm{id}_\mathcal{Z} projects the “reconstructed-then-encoded” latent code back to the original latent space. When the autoencoder is perfect (zero reconstruction error), η\eta is an isomorphism and we have an equivalence rather than a mere adjunction.

3. Tensor-Hom and Attention. The attention mechanism computes Attention(Q,K,V)=softmax(QK/dk)V\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^\top / \sqrt{d_k}) V. The bilinear form QKscoresQ \otimes K \to \text{scores} corresponds, via the tensor-hom adjunction, to the curried form QHom(K,scores)Q \to \mathrm{Hom}(K, \text{scores}). This is why we can implement attention as matrix multiplication: the tensor-hom adjunction says bilinear maps are equivalent to linear maps into a function space.

4. Regularization as Free Construction. Adding a regularization term λw2\lambda \|w\|^2 to a loss function can be viewed as the unit of a free-forgetful adjunction. The “free” functor maps from the space of unconstrained models to the space of regularized models (by adding the penalty), and the “forgetful” functor strips the penalty away. The regularization path — varying λ\lambda — explores the image of the unit, interpolating between the “freely constructed” constrained model (λ\lambda \to \infty) and the unconstrained model (λ=0\lambda = 0).

1.5

Convex program: min x² s.t. x ≥ b. Strong duality holds (convex QP). The Galois connection collapses to an equality: f* = d*.

Adjunctions in ML — Lagrangian duality, encoder-decoder, attention as tensor-hom, regularization


Computational Notes

A free-forgetful adjunction in Python. We verify the adjunction between Set\mathbf{Set} and Vec\mathbf{Vec} concretely. The set S={a,b}S = \{a, b\} maps to the free vector space F(S)R2F(S) \cong \mathbb{R}^2.

import numpy as np

S = ['a', 'b']
eta = {s: np.eye(len(S))[i] for i, s in enumerate(S)}
print("Unit eta (basis insertion):")
for s, vec in eta.items():
    print(f"  eta({s}) = {vec}")

# A function f: S -> R^3
f = {'a': np.array([1, 0, 2]), 'b': np.array([0, 3, 1])}

# Adjoint transpose: the 3x2 matrix whose columns are f(a), f(b)
f_bar = np.column_stack([f[s] for s in S])
print(f"\nAdjoint transpose f-bar (matrix):\n{f_bar}")

# Verify: f_bar . eta(s) = f(s)
for s in S:
    result = f_bar @ eta[s]
    print(f"  f_bar(eta({s})) = {result} == f({s}) = {f[s]}: {np.allclose(result, f[s])}")

Galois connection verification: ceiling ⊣ inclusion.

import numpy as np

test_pairs = [(2.7, 3), (2.7, 2), (3.0, 3), (-1.5, -1), (-1.5, -2)]
for x, n in test_pairs:
    lhs = (np.ceil(x) <= n)    # ceil(x) <= n
    rhs = (x <= n)             # x <= n
    print(f"  ceil({x}) = {int(np.ceil(x))} <= {n}: {lhs}  "
          f"  {x} <= {n}: {rhs}  "
          f"  Match: {lhs == rhs}")

Triangle identity verification.

# In the free-forgetful adjunction Set ↔ Vec:
# First triangle: epsilon_{F(S)} . F(eta_S) = id_{F(S)}
# F(eta_S) sends e_a -> e_{e_a} in F(U(F(S)))  (huge space!)
# epsilon_{F(S)} sends e_{e_a} -> e_a  (evaluation)
# Composition: e_a -> e_{e_a} -> e_a = id(e_a)  ✓

print("Triangle identity: epsilon_F . F(eta) = id_F")
print("  Verified: the zig-zag composition collapses to the identity")

RAPL verification: forgetful functor preserves products.

# U: Grp -> Set preserves products
# U(G × H) ≅ U(G) × U(H)
# The underlying set of a product group IS the product of underlying sets.
# This is automatic because U is a right adjoint (Free ⊣ U).

# Similarly, F: Set -> Vec preserves coproducts (LAPC):
# F(S ⊔ T) ≅ F(S) ⊕ F(T)
# The free vector space on a disjoint union is the direct sum.

print("RAPL: Forgetful functor preserves products")
print("  U(Z × Z/2) = U(Z) × U(Z/2) ✓")
print("\nLAPC: Free functor preserves coproducts")
print("  F({a}{b}) = F({a}) ⊕ F({b}) = R ⊕ R = R² ✓")

Connections & Further Reading

Where This Fits

TopicConnection
Categories & FunctorsAll category, functor, and morphism definitions assumed. Products and coproducts from Topic 1 are special cases of limits and colimits — the structures preserved by right and left adjoints.
Natural TransformationsThe unit and counit are natural transformations. The Hom-set definition requires naturality in both variables. The Yoneda lemma underlies the uniqueness-of-adjoints proof.
Lagrangian Duality & KKTLagrangian duality is a Galois connection between primal and dual optimization posets. Weak duality is the counit condition; strong duality is when the unit is an isomorphism.
The Spectral TheoremThe free-forgetful adjunction FUF \dashv U between Set and Vec is the primary running example. The tensor-hom adjunction operates on the vector spaces governed by the Spectral Theorem.
Measure-Theoretic ProbabilityThe Giry monad on Meas\mathbf{Meas} arises from an adjunction between measurable spaces and probability spaces, with unit xδxx \mapsto \delta_x.
Convex AnalysisConvex conjugation fff \mapsto f^* is a Galois connection. The Fenchel-Moreau theorem (f=ff^{**} = f for closed convex ff) says the unit is an isomorphism on the closed convex functions.
Monads & ComonadsEvery adjunction FGF \dashv G generates a monad T=GFT = GF with unit η\eta and multiplication μ=GεF\mu = G\varepsilon F. The Eilenberg-Moore and Kleisli categories provide canonical adjunctions that recover a monad — establishing the fundamental correspondence between adjunctions and monads.

Notation Summary

SymbolMeaning
FGF \dashv GFF is left adjoint to GG
η:IdCGF\eta: \mathrm{Id}_\mathcal{C} \Rightarrow GFUnit of the adjunction
ε:FGIdD\varepsilon: FG \Rightarrow \mathrm{Id}_\mathcal{D}Counit of the adjunction
ΦA,B\Phi_{A,B}Hom-set isomorphism Hom(FA,B)Hom(A,GB)\mathrm{Hom}(FA, B) \xrightarrow{\cong} \mathrm{Hom}(A, GB)
fˉ\bar{f}Adjoint transpose of ff
εFFη=idF\varepsilon_F \circ F\eta = \mathrm{id}_FFirst triangle identity
GεηG=idGG\varepsilon \circ \eta_G = \mathrm{id}_GSecond triangle identity
fgf \dashv g (posets)Galois connection: f(p)q    pg(q)f(p) \leq q \iff p \leq g(q)
T=GFT = GFMonad from adjunction
μ=GεF\mu = G\varepsilon FMonad multiplication

Connections

  • Direct prerequisite. The unit and counit of an adjunction are natural transformations. The Hom-set definition requires naturality in both variables. The Yoneda lemma underlies the proof that right adjoints are unique up to natural isomorphism. natural-transformations
  • Transitive prerequisite. All category, functor, and morphism definitions assumed. Products and coproducts from Topic 1 are special cases of limits and colimits — the structures that right and left adjoints preserve, respectively. categories-functors
  • Cross-track connection. Lagrangian duality between primal and dual optimization problems is a Galois connection — an adjunction between posets. Weak duality (d* ≤ f*) is the counit condition. Strong duality (d* = f*) is when the unit and counit are isomorphisms. The KKT conditions characterize when the adjunction is tight. lagrangian-duality
  • The free-forgetful adjunction Free ⊣ Forget between Set and Vec is the primary running example. The tensor-hom adjunction (- ⊗ V) ⊣ Hom(V, -) operates on the vector spaces where the Spectral Theorem applies. spectral-theorem
  • The Giry monad on Meas arises from an adjunction between measurable spaces and probability spaces. The unit is the Dirac delta embedding x ↦ δ_x, and the multiplication is given by the counit. This monad is the categorical foundation of Bayesian probability. measure-theoretic-probability
  • Convex conjugation f ↦ f* is a Galois connection on the poset of extended real-valued functions. The Fenchel-Moreau theorem (f** = f for closed convex f) is the statement that the unit of this adjunction is an isomorphism on closed convex functions. convex-analysis

References & Further Reading

  • book Categories for the Working Mathematician — Mac Lane (1998) Chapter IV covers adjunctions definitively — the three equivalent definitions, examples, and the relationship to limits
  • book Category Theory — Awodey (2010) Chapter 9 on adjunctions with accessible algebraic examples
  • book Category Theory in Context — Riehl (2016) Chapter 4 develops adjunctions with excellent motivation and proof of RAPL — freely available online
  • book An Invitation to Applied Category Theory: Seven Sketches in Compositionality — Fong & Spivak (2019) Galois connections as the entry point to adjunctions — applied examples from databases and optimization
  • paper Category Theory in Machine Learning — Shiebler, Gavranović & Wilson (2021) Sections on categorical optimization and functorial learning via adjunctions
  • book Topology: A Categorical Approach — Bradley, Bryson & Terilla (2020) Adjunctions in topology — discrete ⊣ forgetful ⊣ indiscrete, compactification as left adjoint