
Gibbs Overview

Gibbs is a Lean 4 formalization of statistical mechanics within a session-type choreography framework. It provides machine-checked proofs connecting Hamiltonian mechanics, population dynamics, continuum field theory, and distributed consensus through shared mathematical structures. The codebase depends on Mathlib for analysis and algebra, and on Telltale for session-type and effects infrastructure.

Gibbs Modules

The project is organized into four modules that share a common foundation in Session.lean, which defines session identifiers, roles, endpoints, and directed communication edges. Each module builds on these primitives to formalize a different physical domain.

graph TD
    S[Session] --> H[Hamiltonian]
    S --> M[MeanField]
    S --> C[ContinuumField]
    S --> Co[Consensus]
    H --> M
    H --> C
    H --> Co
    M --> Co
    H --> G[GlobalType]
    M --> G
    C --> G
    Co --> G

The diagram shows the dependencies between each module. Hamiltonian provides convex analysis and energy machinery used by MeanField, ContinuumField, and Consensus. MeanField provides order parameters and universality classes consumed by Consensus. Each module encodes its interactions as a Telltale GlobalType, the shared compilation target that connects physical dynamics to session-typed protocols.

The Hamiltonian module formalizes classical mechanics in finite-dimensional phase space. It covers separable convex Hamiltonians, damped and symplectic flows, Legendre and Bregman duality, the Fenchel-Moreau theorem, Gibbs ensembles, Shannon entropy, discrete channels, and stochastic dynamics. See Hamiltonian Mechanics and Convex Duality and Bregman Divergence for the theoretical background.

The MeanField module formalizes population dynamics over finite state spaces on the probability simplex. Drift functions are constructed compositionally from stoichiometric rules. ODE existence and uniqueness follow from Picard-Lindelöf, with Lyapunov and Hurwitz stability theory for equilibrium analysis. See Mean-Field Dynamics for details.

The ContinuumField module lifts the discrete framework to spatially extended systems with nonlocal integral kernels. The central result is that global and local kernel operators are definitionally equal, proved by rfl. Spatial and temporal bridges connect continuous evolution to discrete protocol steps, with coherence conditions ensuring that distributed local computations reproduce the global operator exactly. See Continuum Fields for details.

The Consensus module specializes the physics machinery to distributed agreement. Executions, decisions, adversary models, quorum thresholds, and energy gaps are defined in terms of partition functions and interactive distances. See Consensus as Statistical Mechanics for the theoretical synthesis.

Proof Completeness

Two statements remain as Lean axiom declarations, both components of Shannon’s noisy channel coding theorem. Their proofs require probabilistic combinatorics (random codebook generation, joint typicality, the method of types) that is beyond current Mathlib infrastructure. These axioms are collected in Gibbs/Axioms.lean and imported through the Hamiltonian facade, making them available to downstream modules. They are used only in Gibbs/Consensus/ChannelThreshold.lean to prove codingSafe_iff_positive_gap. Everything else is fully proven from Lean’s type theory and Mathlib.

channel_coding_achievability says that for any rate below capacity and any target error probability, codes achieving that rate with error at most that target exist at all sufficiently large blocklengths.

channel_coding_converse says that for any rate at or above capacity, every code sequence has error probability approaching 1 as blocklength grows (the strong converse).

Where to Start

The full dependency tree, with per-file API tables, can be found in Gibbs/CODE_MAP.md.

Getting Started

This document covers how to set up, build, and extend the Gibbs project. For an overview of the project structure, see Gibbs Overview.

Prerequisites

Gibbs requires Lean 4 (v4.26.0), managed through elan. The project uses nix and direnv for environment setup, and just as a task runner. Install elan from the Lean 4 documentation if you do not already have it.

Building

Run the following commands to initialize and build:

direnv allow
just build

The first command loads the nix environment, which provides elan, just, and other tools. The second runs lake build with LEAN_NUM_THREADS=3. There is no separate test suite. The build itself is the verification. If lake build succeeds, all proofs are valid.

To type-check a single file during development:

lake env lean Gibbs/Path/To/File.lean

This is faster than a full build when iterating on one module. Use just clean to remove .lake build artifacts and just update to refresh Lake dependencies.

Dependencies

Gibbs depends on two local path dependencies:

  • Mathlib from ../lean_common/mathlib4 (shared installation with pre-built artifacts)
  • Telltale from ../telltale/lean (effects and session-type spatial system)

Both must be checked out at the expected relative paths before building. The lakefile.lean declares them as require mathlib and require telltale.

Using Gibbs as a Dependency

To use Gibbs in your own Lean project, add it as a dependency in your lakefile.lean:

require gibbs from "../gibbs"

Then import the modules you need:

import Gibbs.Hamiltonian
import Gibbs.MeanField
import Gibbs.ContinuumField
import Gibbs.Consensus

Each of these is a facade file that re-exports all submodules within that directory. You can also import individual files for finer-grained control, for example import Gibbs.Hamiltonian.Stability.

Hamiltonian Mechanics

Physical systems evolve by trading energy between different forms. A pendulum converts potential energy into kinetic energy and back. A damped spring loses energy to friction until it comes to rest. A thermostat injects and extracts energy to maintain a target temperature. In each case, the total energy function governs the dynamics.

Gibbs takes this as its starting point. The Hamiltonian (total energy) defines a landscape over the system’s state space. Convexity of that landscape is the central structural assumption. It guarantees that energy has a well-defined minimum, that the Legendre transform connecting position and momentum descriptions is well-behaved, and that damped systems converge to equilibrium. These properties propagate through every module in the project: convex duality (Chapter 3) inherits the energy structure, mean-field dynamics (Chapter 4) uses it for stability, and consensus (Chapter 8) reinterprets energy gaps as fault tolerance thresholds.

This chapter covers the undamped and damped equations of motion, energy conservation and dissipation, Lyapunov stability, the Nosé-Hoover thermostat, Gibbs ensembles, stochastic (Langevin) dynamics, and a concrete case study in lattice electrodynamics.

For the convex analysis toolkit, see Convex Duality and Bregman Divergence. For how these dynamics connect to population models, see Mean-Field Dynamics.

Phase Space and Separable Hamiltonians

A system with $n$ degrees of freedom lives in a $2n$-dimensional phase space: $n$ position coordinates and $n$ momentum coordinates. The configuration space is Config n = EuclideanSpace ℝ (Fin n). A phase point bundles position $q$ and momentum $p$ into a pair. The Hamiltonian is the total energy:

$$H(q, p) = T(p) + V(q)$$

Separability means kinetic energy $T$ depends only on momentum and potential energy $V$ depends only on position. Both are assumed convex and differentiable everywhere. The canonical example is the harmonic oscillator with $T(p) = \frac{1}{2}|p|^2$ and $V(q) = \frac{1}{2}\omega^2|q|^2$.

The structure ConvexHamiltonian n bundles $T$, $V$, their convexity proofs, and their differentiability proofs.
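As a rough Lean sketch (the field names here are illustrative and may differ from the actual declaration; only the structure name and its convexity/differentiability content come from the text above):

```lean
import Mathlib

-- Sketch of the bundle described above; field names are illustrative
-- and may differ from the actual declaration in the Hamiltonian module.
structure ConvexHamiltonian (n : ℕ) where
  T : EuclideanSpace ℝ (Fin n) → ℝ   -- kinetic energy, function of momentum
  V : EuclideanSpace ℝ (Fin n) → ℝ   -- potential energy, function of position
  convexOn_T : ConvexOn ℝ Set.univ T
  convexOn_V : ConvexOn ℝ Set.univ V
  differentiable_T : Differentiable ℝ T
  differentiable_V : Differentiable ℝ V
```

The Hamiltonian itself is recovered as $H(q, p) = T(p) + V(q)$.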

Symplectic vs Damped Dynamics

The distinction between conservative and dissipative systems is fundamental. A frictionless pendulum swings forever. A pendulum in honey comes to rest. Both obey the same Hamiltonian, but damping changes the long-term behavior entirely.

The undamped (symplectic) equations of motion are:

$$\dot{q} = \nabla_p T, \quad \dot{p} = -\nabla_q V$$

Energy is exactly conserved along these trajectories. The proof in symplectic_energy_conserved shows $dH/dt = 0$ by inner-product cancellation.

Adding linear friction produces the damped system:

$$\dot{q} = \nabla_p T, \quad \dot{p} = -\nabla_q V - \gamma p$$

The damping coefficient $\gamma > 0$ introduces energy dissipation. Computing the time derivative of energy along damped trajectories gives the key identity:

$$\frac{dH}{dt} = -\gamma |p|^2 \leq 0$$

This inequality makes $H$ a Lyapunov function. Energy decreases monotonically and can only be zero when momentum vanishes. The theorem energy_dissipation proves this identity, and energy_decreasing establishes monotonicity.
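For quadratic kinetic energy $T(p) = \frac{1}{2}|p|^2$, the identity is a short chain-rule computation (a sketch of the argument behind energy_dissipation):

$$\frac{dH}{dt} = \langle \nabla_q V, \dot{q} \rangle + \langle \nabla_p T, \dot{p} \rangle = \langle \nabla_q V, \nabla_p T \rangle + \langle \nabla_p T, -\nabla_q V - \gamma p \rangle = -\gamma \langle \nabla_p T, p \rangle$$

The conservative terms cancel by symmetry of the inner product, and $\nabla_p T = p$ turns the surviving friction term into $-\gamma |p|^2$.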

The Lyapunov Connection

How do you prove a system converges to equilibrium without solving the equations of motion? The answer is to find a quantity that always decreases. If you can exhibit such a function, convergence follows without computing a single trajectory.

A Lyapunov function for a dynamical system satisfies three properties: it is nonnegative, zero at equilibrium, and decreasing along trajectories. The damped Hamiltonian satisfies all three when the potential $V$ has a unique minimum.

The LyapunovData and StrictLyapunovData structures in Stability.lean capture these conditions. A strict Lyapunov function additionally tends to zero along trajectories, which gives asymptotic stability. The theorem damped_asymptotically_stable packages this implication.

Exponential energy decay occurs when the system is strongly convex. The ExponentialEnergyDecay predicate and exponential_convergence theorem provide convergence rate bounds via the condition number $L/m$.

Nosé-Hoover Thermostat

Constant damping drives a system to zero energy. But in statistical mechanics, the goal is often to maintain a specific temperature, not to cool to absolute zero. The Nosé-Hoover thermostat achieves this with a feedback variable $\xi$ that speeds up or slows down particles to hold the average kinetic energy at a target $kT$:

$$\dot{q} = \nabla_p T, \quad \dot{p} = -\nabla_q V - \xi p, \quad \dot{\xi} = \frac{|p|^2 - n \cdot kT}{Q}$$

The thermostat mass $Q > 0$ controls the coupling timescale. When kinetic energy exceeds the target ($|p|^2 > n \cdot kT$), $\xi$ increases, extracting energy. When kinetic energy falls below the target, $\xi$ decreases, injecting energy. Equilibrium occurs at equipartition: $|p|^2 = n \cdot kT$.

The subsystem energy rate is $dH/dt = -\xi |p|^2$. Unlike constant damping, the sign of $\xi$ fluctuates. An extended Hamiltonian that includes the $\xi$ degree of freedom is exactly conserved, proved in extended_energy_conserved.

Gibbs Ensemble and Ergodicity

Rather than tracking a single trajectory, statistical mechanics describes the probability of finding the system in each configuration. At thermal equilibrium, low-energy configurations are exponentially more likely than high-energy ones.

The Gibbs (Boltzmann) distribution assigns probability proportional to $e^{-V(q)/kT}$ to each configuration. The partition function normalizes this weight:

$$Z = \int e^{-V(q)/kT} \, d\mu(q)$$

The gibbsDensity divides the Boltzmann weight by $Z$ to produce a probability density. The theorem gibbsDensity_integral_eq_one confirms normalization.
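A minimal sketch of these two definitions, with assumed signatures (the actual declarations in Gibbs may differ):

```lean
import Mathlib

open MeasureTheory

-- Sketch with assumed signatures; the actual declarations in Gibbs
-- may differ. The Boltzmann weight is normalized by the partition
-- function to yield a probability density.
noncomputable def partitionFunction {X : Type*} [MeasurableSpace X]
    (V : X → ℝ) (kT : ℝ) (μ : Measure X) : ℝ :=
  ∫ q, Real.exp (-V q / kT) ∂μ

noncomputable def gibbsDensity {X : Type*} [MeasurableSpace X]
    (V : X → ℝ) (kT : ℝ) (μ : Measure X) (q : X) : ℝ :=
  Real.exp (-V q / kT) / partitionFunction V kT μ
```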

The ergodic hypothesis asserts that time averages along trajectories equal ensemble averages against the Gibbs density. The IsErgodic predicate captures this equality. Time averages of constant functions are trivially ergodic, proved in ergodic_of_constant_process.

Stochastic Extension

Real physical systems are not deterministic at the microscopic level. Thermal fluctuations randomly kick particles, and the strength of these kicks is tied to the amount of damping. Langevin dynamics models this by adding noise to the damped system:

$$\dot{q} = \nabla_p T, \quad dp = (-\nabla_q V - \gamma p) \, dt + \sigma \, dW$$

The noise strength $\sigma$ satisfies the fluctuation-dissipation relation $\sigma^2 = 2\gamma kT$. This ensures the Gibbs density is the stationary distribution of the associated Fokker-Planck equation.
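To see why this tuning works, consider the momentum marginal with quadratic kinetic energy, $\rho(p) \propto e^{-|p|^2/2kT}$, so that $\nabla_p \rho = -(p/kT)\rho$. The dissipative part of the Fokker-Planck probability current is then (a one-degree-of-freedom sketch):

$$-\gamma p \, \rho - \frac{\sigma^2}{2} \nabla_p \rho = \left( \frac{\sigma^2}{2kT} - \gamma \right) p \, \rho$$

which vanishes identically exactly when $\sigma^2 = 2\gamma kT$.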

The theorem gibbs_is_stationary in ChannelNoise.lean proves stationarity by direct substitution of the Gibbs density into the Fokker-Planck operator and cancellation via $\sigma^2 = 2\gamma kT$. The LangevinParams structure bundles the potential, damping, and temperature with their positivity constraints.

The stochastic layer uses a constant-diffusion model. The Itô integral for constant diffusion matrices reduces to $\int_0^t A \, dW = A(W_t - W_0)$, implemented using Mathlib’s Bochner integral.

Case Study: Lattice Maxwell

Let’s use Gibbs to work with Maxwell’s equations. The electromagnetic field energy on a lattice is quadratic (and therefore convex):

$$H = \frac{1}{2} \sum_i \left( \varepsilon |E_i|^2 + \frac{1}{\mu} |B_i|^2 \right)$$

Adding conductivity $\sigma$ introduces Ohmic damping: $dH/dt = -\int \sigma |E|^2 \, dx \leq 0$. The Hamiltonian becomes a Lyapunov function, so a lattice Maxwell system with conductivity fits naturally into our framework. The discrete curl operator couples neighboring lattice sites, serving as the spatial kernel. To distribute the computation, domain decomposition partitions the lattice into regions, each assigned to a session role.

| Regime | Status |
|--------|--------|
| Lattice Maxwell + conductivity | Full support |
| Vacuum Maxwell ($\sigma = 0$) | Integrable, periodic solutions |
| Dispersive media | Memory effects for $\varepsilon(\omega)$ |
| Nonlinear optics | Loss of quadratic structure |

The implementation in LatticeMaxwell.lean provides lattice index types, quadratic field energy, zero-field minimizer proofs, curl stencils, ghost sets, and local/global coherence proofs for subdomains.

Convex Duality and Bregman Divergence

A recurring pattern in our framework is that energy functions come in dual pairs. The kinetic energy of a system has a dual description in terms of momentum. The entropy of a distribution has a dual description in terms of free energy. These dualities arise from the same underlying structure: convex conjugation.

This chapter develops the convex analysis toolkit that makes these connections precise. The Legendre transform converts between primal and dual descriptions of energy. The Bregman divergence measures how far a system is from equilibrium, providing Lyapunov functions for stability analysis. The Fenchel-Moreau theorem guarantees that the transform is invertible for well-behaved functions, so no information is lost in passing between descriptions.

These tools serve a specific role in Gibbs: they connect Hamiltonian energy landscapes (Chapter 2) to mean-field stability (Chapter 4). When a population evolves on the probability simplex, the natural measure of “distance to equilibrium” is a Bregman divergence generated by negative entropy, which turns out to be the KL divergence. This is why information-theoretic quantities appear naturally in the stability analysis of physical systems.

For the phase-space setting, see Hamiltonian Mechanics. For the application to population dynamics, see Mean-Field Dynamics.

Legendre Transform

Given a convex energy function, the Legendre transform produces its dual description. If $f$ describes energy as a function of position, $f^*$ describes it as a function of momentum, and vice versa.

The Legendre transform (convex conjugate) of a function $f$ maps each dual vector $p$ to the tightest affine upper bound on $\langle p, x \rangle - f(x)$:

$$f^*(p) = \sup_x \left( \langle p, x \rangle - f(x) \right)$$

For the quadratic $f(x) = \frac{1}{2}|x|^2$, the conjugate is $f^*(p) = \frac{1}{2}|p|^2$. The Legendre transform is self-dual in this case. For strongly convex functions, the conjugate has a Lipschitz gradient, and vice versa. This duality between strong convexity and smoothness underlies the condition number $L/m$ that controls convergence rates.
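The quadratic case is a one-line computation. The supremand $\langle p, x \rangle - \frac{1}{2}|x|^2$ is concave in $x$ with gradient $p - x$, so the supremum is attained at $x = p$:

$$f^*(p) = \langle p, p \rangle - \frac{1}{2}|p|^2 = \frac{1}{2}|p|^2$$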

The definition legendre in Legendre.lean computes the conjugate as a supremum. The Fenchel-Young inequality $\langle p, x \rangle \leq f(x) + f^*(p)$ follows immediately from the definition of supremum.

Bregman Divergence

The Bregman divergence measures asymmetric distance according to a convex generator $f$:

$$D_f(x, y) = f(x) - f(y) - \langle \nabla f(y), x - y \rangle$$

Geometrically, $D_f(x, y)$ is the gap between $f(x)$ and the first-order Taylor approximation of $f$ at $y$ evaluated at $x$. Convexity guarantees this gap is nonnegative. The theorem bregman_nonneg proves $D_f(x, y) \geq 0$ using convexity of restrictions to line segments.

Strict convexity sharpens the bound. The theorem bregman_eq_zero_iff shows $D_f(x, y) = 0$ if and only if $x = y$ when $f$ is strictly convex. This makes the Bregman divergence a valid candidate for Lyapunov functions. For the quadratic generator $f = \frac{1}{2}|{\cdot}|^2$, the Bregman divergence reduces to $\frac{1}{2}|x - y|^2$, proved in bregman_quadratic.
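The quadratic case makes the definition concrete. With $f = \frac{1}{2}|{\cdot}|^2$ and $\nabla f(y) = y$:

$$D_f(x, y) = \frac{1}{2}|x|^2 - \frac{1}{2}|y|^2 - \langle y, x - y \rangle = \frac{1}{2}|x|^2 - \langle y, x \rangle + \frac{1}{2}|y|^2 = \frac{1}{2}|x - y|^2$$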

Fenchel-Moreau Theorem

The Legendre transform is useful only if it is reversible. If you pass from position to momentum and back, you should recover the original function. The Fenchel-Moreau theorem guarantees this for well-behaved (convex, lower-semicontinuous) functions.

The theorem states that a lower-semicontinuous convex function equals its double conjugate:

$$f = f^{**}$$

The proof in FenchelMoreau.lean proceeds in two directions. The inequality $f^{**} \leq f$ follows from two applications of the Fenchel-Young inequality. The reverse inequality requires showing that every point $(x, f(x))$ on the graph of $f$ lies on or above the graph of $f^{**}$.

The reverse direction uses geometric Hahn-Banach separation. Because $f$ is convex and lower-semicontinuous, its epigraph $\{(x, t) : t \geq f(x)\}$ is a closed convex set. Any point strictly below the epigraph can be separated by a hyperplane, which yields a supporting affine function and hence a subgradient. The existence of subgradients everywhere implies $f \leq f^{**}$.

The hypotheses for fenchel_moreau are ConvexOn ℝ Set.univ f, LowerSemicontinuous f, HasFiniteConjugate f, and HasFiniteBiconjugate f. The finiteness conditions exclude degenerate cases where the conjugate is identically $+\infty$ or $-\infty$.

Entropy as a Bregman Generator

Negative Shannon entropy $f(x) = \sum_i x_i \log x_i$ is strictly convex on the probability simplex interior. Using it as a Bregman generator produces the KL divergence:

$$D_f(p, q) = \sum_i p_i \log \frac{p_i}{q_i} = D_{\mathrm{KL}}(p \| q)$$

The Legendre dual of negative entropy is the log-sum-exp function. Free energy $F = -\frac{1}{\beta} \log Z$ arises as a scaled Legendre dual. These connections are formalized in EntropyBregman.lean through axioms kl_eq_bregman_negEntropy and legendre_negEntropy_eq_logSumExp.

The softmax function computes the exponentially-weighted normalized distribution. The theorems softmax_nonneg and softmax_sum_one verify it produces a valid distribution.
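A plausible shape for the definition (the signature is an assumption; only the name softmax comes from the module):

```lean
import Mathlib

-- Sketch with an assumed signature; only the name comes from the text.
-- Exponential weights over a finite index set, normalized to sum to one.
noncomputable def softmax {n : ℕ} (x : Fin n → ℝ) : Fin n → ℝ :=
  fun i => Real.exp (x i) / ∑ j, Real.exp (x j)
```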

Connection to Stability

This is where convex duality meets dynamics. The Bregman divergence provides Lyapunov functions for mean-field dynamics. If $x^*$ is an equilibrium and $f$ is strictly convex, then $V(x) = D_f(x, x^*)$ satisfies the Lyapunov conditions: it is nonnegative, zero only at $x^*$, and (under appropriate drift conditions) decreasing along trajectories.

The bregman_lyapunov_data construction in BregmanBridge.lean packages this into a StrictLyapunovData structure. The bridge converts between Config n (Euclidean space representation) and Fin n → ℝ (simplex representation) using the EuclideanSpace.equiv round-trip identities toConfig_fromConfig and fromConfig_toConfig.

This construction is the formal link between Hamiltonian convex analysis and mean-field stability. Strong convexity of the generator translates directly to positivity of the Lyapunov function away from equilibrium.

Mean-Field Dynamics

Many systems in Gibbs reduce to populations of agents in a finite number of states. Molecules in a chemical reaction network switch between species. Spins in an Ising model flip between up and down. Nodes in a consensus protocol vote for different values. In each case, the macroscopic behavior depends not on individual agents but on the fraction of the population in each state.

This chapter formalizes how those fractions evolve over time. The state of a population lives on the probability simplex, the set of all valid distributions over a finite state space. Drift functions describe how the distribution changes. Rules (reaction rates, flip rates, vote-switching rates) compose into drift functions. The key results are that solutions always exist, always stay on the simplex, and converge to equilibrium under appropriate conditions.

The stability theory here connects back to convex duality (Chapter 3): the Bregman divergence to equilibrium serves as a Lyapunov function, providing a formal proof that the system converges. The Ising model serves as the concrete example that ties the module together, exhibiting a phase transition between disordered and ordered states that reappears in the consensus setting (Chapter 8).

For the convex analysis underpinning stability, see Convex Duality and Bregman Divergence. For the application to consensus, see Consensus as Statistical Mechanics.

The Probability Simplex

The probability simplex over a finite state space $Q$ is the set of nonnegative vectors that sum to one:

$$\text{Simplex}(Q) = \{ x : Q \to \mathbb{R} \mid x_q \geq 0 \text{ for all } q, \; \sum_q x_q = 1 \}$$

A PopulationState Q stores integer counts with a positive total. The empirical function normalizes these counts into a point on the simplex. The theorems empirical_nonneg and empirical_sum_one verify the normalization.

The simplex is compact (closed and bounded in finite dimensions). This compactness is essential for ODE existence: it prevents finite-time blowup and guarantees that bounded drifts produce globally defined solutions.

Drift Functions and Rules

The drift function specifies how the distribution changes at each instant. A DriftFunction Q maps a state $x : Q \to \mathbb{R}$ to a velocity $F(x) : Q \to \mathbb{R}$. Two constraints ensure the simplex is forward-invariant:

  1. Conservation: $\sum_q F(x)_q = 0$ for all $x$ on the simplex.
  2. Boundary: $F(x)_q \geq 0$ whenever $x_q = 0$.

Conservation keeps the total probability at one. The boundary condition prevents components from going negative. Together they guarantee that trajectories starting in the simplex remain there.

Drift functions are built compositionally from PopRule structures. Each rule specifies a stoichiometric update vector and a rate function. The driftFromRules function aggregates a list of rules by summing their contributions. Conservation and boundary invariance are proved by induction on the rule list in driftFromRules_conserves and driftFromRules_boundary_nonneg.
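The compositional construction can be sketched as follows; the field names and signatures are illustrative assumptions, not the actual Gibbs API:

```lean
import Mathlib

-- Illustrative sketch only: field names and signatures are assumptions,
-- not the actual Gibbs API. Each rule pairs a stoichiometric update
-- vector with a state-dependent rate; the drift is the rate-weighted sum.
structure PopRule (Q : Type*) [Fintype Q] where
  update : Q → ℝ           -- net change in each coordinate per firing
  rate   : (Q → ℝ) → ℝ     -- firing rate as a function of the state

noncomputable def driftFromRules {Q : Type*} [Fintype Q]
    (rules : List (PopRule Q)) : (Q → ℝ) → Q → ℝ :=
  fun x q => (rules.map (fun r => r.rate x * r.update q)).sum
```

In this shape, conservation and boundary invariance reduce to properties of each rule, which is why the proofs go through by induction on the rule list.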

ODE Existence and Uniqueness

Before analyzing where solutions go, we need to know they exist. The drift function is Lipschitz on the simplex by construction, but Mathlib’s Picard-Lindelöf theorem requires a globally Lipschitz function. The LipschitzBridge.lean module extends the drift from the simplex to all of $\mathbb{R}^Q$ while preserving the Lipschitz constant. The theorem extend_lipschitz establishes this.

Local existence follows from Picard-Lindelöf on a bounded ball containing the simplex. The local_ode_exists theorem in Existence.lean constructs the IsPicardLindelof instance with explicit parameters: time half-width, ball radius (simplex diameter), Lipschitz constant, and drift bound.

Global existence follows from compactness. Since the simplex is bounded and the drift is bounded on it, solutions cannot escape to infinity in finite time. The global_ode_exists theorem chains local solutions forward. Uniqueness follows from Gronwall’s inequality in ode_unique.

Stability Theory

An equilibrium $x^*$ satisfies $F(x^*) = 0$ and $x^* \in \text{Simplex}(Q)$. Stability is analyzed through two approaches.

The direct (Lyapunov) approach uses a function $V$ that is nonnegative, zero at $x^*$, and decreasing along trajectories. The theorem lyapunov_implies_stable gives stability. A strict Lyapunov function where $V \to 0$ along trajectories gives asymptotic stability via strict_lyapunov_implies_asymptotic.

The linearized (Hurwitz) approach examines the Jacobian $J = \partial F / \partial x$ at equilibrium. If all eigenvalues of $J$ have negative real part (the Hurwitz condition), a quadratic Lyapunov function exists. The chain hurwitz_implies_lyapunov_exists then linear_stable_implies_asymptotic completes the argument.

The Bregman divergence connects these approaches. For a strictly convex generator $f$, $D_f(x, x^*)$ provides a natural Lyapunov function. The bregman_lyapunov_data construction in BregmanBridge.lean packages this as a StrictLyapunovData.

The Ising Model

The simplest nontrivial mean-field system has two states: up and down. This is the Ising model, and it exhibits the core phenomenon that drives the consensus module. Below a critical temperature, the population spontaneously magnetizes. Above it, the population remains disordered. This phase transition is the prototype for the gapped/gapless distinction in consensus (Chapter 8).

The state space is TwoState with values up and down. The magnetization order parameter is $m = x_{\text{up}} - x_{\text{down}}$.

The drift function is:

$$\frac{dm}{dt} = \frac{1}{\tau}\left[\tanh(\beta(Jm + h)) - m\right]$$

The parameter $\beta$ is inverse temperature, $J$ is coupling strength, $h$ is external field, and $\tau$ is relaxation time. Equilibria satisfy the self-consistency equation $m = \tanh(\beta(Jm + h))$.

A phase transition occurs at $\beta J = 1$ when $h = 0$. Below the critical point ($\beta J < 1$), the unique equilibrium is $m = 0$ (paramagnetic phase). The proof in paramagnetic_unique_equilibrium uses strict sublinearity of $\tanh$: the theorem Real.tanh_lt_self shows $\tanh(x) < x$ for $x > 0$.

Above the critical point ($\beta J > 1$), two nonzero equilibria appear (ferromagnetic phase). The proof in ferromagnetic_bistable applies the intermediate value theorem to the residual $f(m) = m - \tanh(\beta J m)$, which changes sign on $(0, 1)$.
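The sign change is elementary to verify. Let $f(m) = m - \tanh(\beta J m)$ with $\beta J > 1$. Since $f(0) = 0$ and $f'(0) = 1 - \beta J < 0$, we have $f(m) < 0$ for small $m > 0$; at the other end, $f(1) = 1 - \tanh(\beta J) > 0$ because $\tanh$ is bounded below $1$. The intermediate value theorem then yields a root $m^* \in (0, 1)$, and oddness of $f$ gives the second equilibrium $-m^*$.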

Glauber dynamics provides the microscopic mechanism: individual spins flip at rates that depend on the local field. The spin-flip rates glauberAlpha and glauberGamma reproduce the macroscopic Ising drift when aggregated, proved in glauber_produces_isingDrift. This is the mean-field reduction in action. Individual stochastic transitions produce deterministic macroscopic flow.

Continuum Fields

The mean-field module describes populations where every agent interacts with every other agent equally. In many systems this is not realistic. Molecules in a fluid interact with nearby molecules, not distant ones. Electromagnetic fields couple neighboring lattice sites through curl operators. Flocking birds align with neighbors within a perceptual cone, not with the entire flock.

The ContinuumField module extends the discrete framework to spatially extended systems. Instead of a finite state space with uniform coupling, the state is a set of fields over continuous space, and interactions are mediated by nonlocal integral kernels. The central result is that projecting the global kernel to local displacement coordinates introduces zero approximation error. Global and local operators are definitionally equal, proved by rfl. This exactness guarantee means that distributing the computation across spatial roles loses nothing.

For the discrete population dynamics that this module generalizes, see Mean-Field Dynamics. For the session-type guarantees that govern the distributed computation, see The Session-Physics Correspondence.

Fields and Kernels

A FieldState X V W bundles three fields over a measurable space $X$: a density field $\rho(x)$, a polarization field $p(x)$ (vector-valued), and an optional spin field $\omega(x)$. These are the continuum analogues of population counts in the mean-field setting.

Interactions between spatial positions are encoded by a GlobalKernel X. This is a function $K(x, x')$ that specifies how strongly position $x'$ influences position $x$. The kernel is nonnegative, integrable, and normalized so that $\int K(x, x') \, dx' = 1$. The nonlocal alignment operator applies the kernel to the polarization field:

$$\int K(x, x') \left[ p(x') - p(x) \right] dx'$$

This integral averages the difference between the local polarization and the polarization at every other point, weighted by the kernel. It is the continuum generalization of the coupling terms in lattice models.

Exact Projection

The key result in the ContinuumField module is that the global kernel operator can be rewritten in local displacement coordinates without loss. Define the local kernel field at position $x$ by shifting to displacement coordinates:

$$K_x(\xi) := K(x, x + \xi)$$

Then the nonlocal operator becomes:

$$\int K_x(\xi) \left[ p(x + \xi) - p(x) \right] d\xi$$

The theorem nonlocal_exact proves that these two forms are equal by rfl. This is not an approximation. The global and local operators are definitionally the same computation expressed in different coordinates. The function projectKernelAt extracts the local kernel field for each spatial position.
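The mechanism behind the rfl proof can be demonstrated in a toy setting where the integral is abstracted into an arbitrary functional (Gibbs works with measure-theoretic integrals; the signatures below are simplified). Because the local kernel is defined by the coordinate shift, both operators unfold to the same term:

```lean
import Mathlib

-- Toy model of the rfl argument (the integral is abstracted into an
-- arbitrary functional `I`; signatures are simplified assumptions).
-- Because `projectKernelAt` is *defined* by the coordinate shift,
-- global and local operators unfold to the same term.
variable {X : Type*} [Add X]

def projectKernelAt (K : X → X → ℝ) (x : X) : X → ℝ :=
  fun ξ => K x (x + ξ)

noncomputable def globalOp (I : (X → ℝ) → ℝ) (K : X → X → ℝ)
    (p : X → ℝ) (x : X) : ℝ :=
  I (fun ξ => K x (x + ξ) * (p (x + ξ) - p x))

noncomputable def localOp (I : (X → ℝ) → ℝ) (K : X → X → ℝ)
    (p : X → ℝ) (x : X) : ℝ :=
  I (fun ξ => projectKernelAt K x ξ * (p (x + ξ) - p x))

example (I : (X → ℝ) → ℝ) (K : X → X → ℝ) (p : X → ℝ) (x : X) :
    globalOp I K p x = localOp I K p x := rfl
```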

Spatial + Temporal Bridges

Distributing a continuum computation requires assigning spatial regions to protocol roles. A RoleLoc map sends each role to a position in the continuous space. The Colocated predicate identifies roles at the same location, and Within checks that two roles are within a distance bound.

The temporal bridge connects discrete protocol steps to continuous time. A SamplingSchedule maps step indices to real-valued times. Sampling depends only on the sampler’s own step counter or a globally agreed schedule, never on a remote party’s local clock. The predicate ClockIndependent formalizes this constraint, and mkSampler_clockIndependent proves that schedule-based sampling satisfies it.

Coherence

When the global kernel is distributed across roles, each role receives a local kernel field. The KernelCoherent predicate asserts that every role’s local kernel matches what you get by projecting the global kernel to that role’s location. The theorem projection_sound closes the loop: when coherence holds, evaluating the local kernel at a role’s position reproduces the global operator exactly. This is the ContinuumField analogue of well-formedness in the session layer.

Adaptivity and Closure

Kernels can evolve with the field state. A KernelDependence structure specifies how the kernel depends on the current fields, with a Lipschitz bound ensuring regularity. A KernelDynamics structure adds a continuous-time drift for the kernel itself.

For analysis, the full kernel can be compressed into a KernelSummary that records effective range, anisotropy, and total mass. A ClosureSpec packages the compression and reconstruction with a uniform approximation bound. This is the continuum analogue of mean-field truncation: you track only the leading moments of the coupling function, with explicit control over the error introduced.

Information Theory and Channels

Every physical system and every distributed protocol faces the same problem: communicating reliably through a noisy medium. A sensor reading corrupted by thermal fluctuations, a message corrupted by a faulty network link, and a vote corrupted by a Byzantine node are all instances of the same mathematical situation. Information theory provides the universal language for reasoning about these problems.

This chapter develops that language for Gibbs. Entropy measures how much uncertainty a distribution carries. KL divergence measures how different one distribution is from another, connecting to the Bregman divergence from Chapter 3. Channel capacity quantifies how much reliable information can pass through a noisy channel. Error-correcting codes achieve that capacity by creating energy gaps between valid codewords, the same gaps that reappear as fault tolerance thresholds in consensus (Chapter 8).

The key insight is that coding theory is the non-interactive special case of consensus. A code has a single sender and one round of communication. Consensus adds interaction and adaptive adversaries, which tightens the achievable threshold from $1/2$ to $1/3$. But the underlying structure, energy gaps between macrostates, is identical.

For the convex duality perspective on entropy, see Convex Duality and Bregman Divergence. For the application to agreement protocols, see Consensus as Statistical Mechanics.

Entropy and KL Divergence

Shannon entropy measures uncertainty in a discrete distribution $p$ over a finite set:

$$H(p) = -\sum_i p_i \log p_i$$

Entropy is nonnegative (shannonEntropy_nonneg) and zero for deterministic distributions (shannonEntropy_deterministic). It is maximized by the uniform distribution.
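These three properties are easy to check numerically. The Python sketch below is illustrative only; the formal statements live in the Lean development.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum p_i log p_i, with the convention 0 log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25] * 4
skewed  = [0.7, 0.1, 0.1, 0.1]
point   = [1.0, 0.0, 0.0, 0.0]

print(shannon_entropy(point) == 0.0)                        # True: deterministic
print(shannon_entropy(skewed) < shannon_entropy(uniform))   # True: uniform maximizes
print(math.isclose(shannon_entropy(uniform), math.log(4)))  # True: max is log |X|
```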

The KL divergence measures how one distribution $p$ differs from another $q$:

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i}$$

The Gibbs inequality states $D_{\mathrm{KL}}(p \,\|\, q) \geq 0$, proved in klDivergence_nonneg using the bound $\log x \leq x - 1$. KL divergence equals zero if and only if $p = q$.
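A short numerical check, again an illustration in Python rather than the Lean proof, confirms the Gibbs inequality, the equality case, and the asymmetry of KL divergence:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum p_i log(p_i / q_i), with 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl_divergence(p, q) >= 0)                       # True: Gibbs inequality
print(kl_divergence(p, p))                            # 0.0: equality iff p = q
print(kl_divergence(p, q) != kl_divergence(q, p))     # True: not symmetric
```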

Mutual information $I(X; Y)$ quantifies shared information between two random variables. It decomposes as $I(X; Y) = H(X) - H(X|Y)$, where $H(X|Y)$ is conditional entropy. The theorem condEntropy_le_entropy shows conditioning cannot increase entropy. Mutual information is symmetric: mutualInfo_symm.

Discrete Memoryless Channels

A channel models any process that corrupts data in transit. The sender picks an input symbol, noise acts on it, and the receiver observes a (possibly different) output symbol.

A discrete memoryless channel (DMC) is a stochastic matrix $W(y|x)$ mapping input symbols to output symbols. The DMC structure stores this matrix with the constraint that each row sums to one.

Given an input distribution $p$ and channel $W$, the induced output distribution is $q(y) = \sum_x p(x) W(y|x)$. The joint distribution factorizes as $p(x) W(y|x)$. The theorems jointDist_marginalFst and jointDist_marginalSnd recover the input and output marginals.
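The Python sketch below, using an arbitrary 2×2 channel matrix chosen purely for illustration, checks the marginal-recovery identities and the nonnegativity of the resulting mutual information:

```python
import math

# W(y|x): row x is the output distribution given input x; each row sums to 1.
W = [[0.9, 0.1],
     [0.2, 0.8]]
p = [0.6, 0.4]                       # input distribution

# Induced output distribution q(y) = sum_x p(x) W(y|x)
q = [sum(p[x] * W[x][y] for x in range(2)) for y in range(2)]

# Joint distribution factorizes as p(x) W(y|x)
joint = [[p[x] * W[x][y] for y in range(2)] for x in range(2)]

# Marginals recover input and output distributions
# (jointDist_marginalFst / jointDist_marginalSnd analogues).
fst = [sum(joint[x][y] for y in range(2)) for x in range(2)]
snd = [sum(joint[x][y] for x in range(2)) for y in range(2)]
print(all(math.isclose(a, b) for a, b in zip(fst, p)))  # True
print(all(math.isclose(a, b) for a, b in zip(snd, q)))  # True

# Mutual information I(p; W) = D_KL(joint || product of marginals) >= 0
I = sum(joint[x][y] * math.log(joint[x][y] / (p[x] * q[y]))
        for x in range(2) for y in range(2))
print(I >= 0)                                           # True
```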

Channel capacity is the maximum mutual information over all input distributions:

$$C = \sup_p I(p; W)$$

Capacity is nonnegative (channelCapacity_nonneg). It is bounded above by $\log |\mathcal{X}|$ and $\log |\mathcal{Y}|$. The data processing inequality states that post-processing cannot increase mutual information. These bounds are stated as axioms in Entropy.lean.

The Binary Symmetric Channel

The BSC with crossover probability $\varepsilon$ flips each bit independently with probability $\varepsilon$. Its capacity is:

$$C_{\mathrm{BSC}} = 1 - H_2(\varepsilon)$$

where $H_2$ is binary entropy. At $\varepsilon = 0$, capacity is 1 (perfect channel). At $\varepsilon = 1/2$, capacity is 0 (the channel destroys all information). The axioms bsc_capacity and bsc_capacity_half formalize these.

The BSC is the channel-theoretic model for random bit corruption. In the consensus context, $\varepsilon$ plays the role of the corruption fraction. The zero-capacity point $\varepsilon = 1/2$ corresponds to the impossibility threshold.
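The capacity endpoints are easy to check numerically. The sketch below is an illustration in Python, not the Lean axioms bsc_capacity and bsc_capacity_half:

```python
import math

def h2(eps):
    """Binary entropy H_2(eps) in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def bsc_capacity(eps):
    """C_BSC = 1 - H_2(eps)."""
    return 1.0 - h2(eps)

print(bsc_capacity(0.0))                         # 1.0: perfect channel
print(bsc_capacity(0.5))                         # 0.0: all information destroyed
print(bsc_capacity(0.11) > bsc_capacity(0.25))   # True: capacity falls toward eps = 1/2
```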

The Coding Bridge

This section makes precise the connection between coding theory and consensus that the introduction previewed. Error-correcting codes are the static (non-interactive) case of the consensus framework. A code of length $N$ encodes messages into codewords. The minimum Hamming distance $d_{\min}$ between distinct codewords determines correction capability.

Unique decoding succeeds when the number of errors $t$ satisfies $2t < d_{\min}$. The theorem unique_decoding_of_minDistance in CodingDistance.lean formalizes this. Hamming distance instantiates the EnergyDistance class, and energyGap_singleton_eq_dist shows the energy gap between singleton codeword sets equals their distance.

The repetition code encodes one bit as $N$ identical copies. Majority vote decodes. The theorem repetition_corrects_up_to proves correction of up to $\lfloor(N-1)/2\rfloor$ errors. This is the simplest gapped phase: the distance between the two codewords is $N$, creating a macroscopic energy barrier.
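A minimal Python sketch of the repetition code (illustrative; the formal proof is repetition_corrects_up_to) shows majority decoding surviving $\lfloor(N-1)/2\rfloor$ bit flips:

```python
def encode(bit, N):
    """Repetition code: one bit becomes N identical copies."""
    return [bit] * N

def majority_decode(word):
    """Majority vote over the received word."""
    return 1 if 2 * sum(word) > len(word) else 0

N = 7
t = (N - 1) // 2            # corrects up to 3 errors when N = 7
word = encode(1, N)
for i in range(t):          # adversary flips t bits
    word[i] ^= 1

print(majority_decode(word))  # 1: recovered despite floor((N-1)/2) errors
```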

Gaussian Channels and Spatial Capacity

Continuous-valued signals face continuous-valued noise. The Gaussian channel adds noise with variance $\sigma^2$. Its capacity under power constraint $P$ is:

$$C = \frac{1}{2} \log\left(1 + \frac{P}{\sigma^2}\right)$$

Capacity is nonnegative for nonnegative power (gaussianCapacity_nonneg), monotone decreasing in noise variance (gaussianCapacity_antitone_variance), and monotone increasing in power (gaussianCapacity_monotone_power).

This is where the physics connection becomes concrete. Noise variance and inverse temperature are interchangeable through the mapping $\beta = 1/(2\sigma^2)$. The round-trip identity noiseToInvTemp_invTempToNoise confirms this bijection. Higher inverse temperature (lower noise) means higher capacity (capacity_monotone_invTemp).

For spatially embedded systems, channel capacity depends on the distance between communicating roles. The SpatialChannelModel structure specifies signal power and a distance-dependent noise function. The theorem spatialCapacity_antitone shows capacity decreases with distance. Colocated roles achieve maximum capacity (colocated_max_capacity).
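A numerical sketch in Python illustrates the capacity formula, the $\beta = 1/(2\sigma^2)$ round trip, and the distance monotonicity. The distance-dependent noise function here is invented for illustration, standing in for a SpatialChannelModel:

```python
import math

def gaussian_capacity(P, sigma2):
    """C = (1/2) log(1 + P / sigma^2), in nats."""
    return 0.5 * math.log(1 + P / sigma2)

# Noise variance <-> inverse temperature via beta = 1 / (2 sigma^2)
def noise_to_inv_temp(sigma2): return 1.0 / (2.0 * sigma2)
def inv_temp_to_noise(beta):   return 1.0 / (2.0 * beta)

sigma2 = 0.5
print(math.isclose(inv_temp_to_noise(noise_to_inv_temp(sigma2)), sigma2))  # True: round trip

# Hypothetical spatial model: noise grows linearly with distance,
# so capacity falls with distance (spatialCapacity_antitone analogue).
P = 1.0
def spatial_capacity(d):
    return gaussian_capacity(P, 0.1 + 0.2 * d)

print(spatial_capacity(0.0) > spatial_capacity(1.0) > spatial_capacity(5.0))  # True
```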

The Session-Physics Correspondence

Session types were designed to reason about message-passing programs. Physical dynamics were formalized to reason about coupled differential equations. These two formalisms turn out to describe the same thing: structured interaction between parts of a system. A session type specifying “role A sends position to role B, then B sends force back” encodes the same coupling structure as a pair of differential equations where A’s position determines the force on B. This chapter makes that correspondence precise, identifies where it holds, and maps out its boundaries.

For the physical foundations, see Hamiltonian Mechanics and Mean-Field Dynamics. For the application to consensus, see Consensus as Statistical Mechanics.

Why Session Types and Physics Correspond

Session types describe how distributed processes communicate: who sends what to whom, in what order, and with what guarantees. Physical systems describe how coupled components evolve: which degrees of freedom interact, through what forces, and subject to what conservation laws.

These descriptions align because both are fundamentally about structured interaction between parts. A session type says “role A sends a position to role B, then B sends a force back.” A coupled ODE says “subsystem A’s position determines the force on subsystem B, and vice versa.” The communication pattern is the coupling structure.

Session types provide three formal guarantees (progress, well-formedness, and erasure), each with a precise physical meaning. Progress means the system of equations always has a solution (no deadlock corresponds to no blowup). Well-formedness means conservation laws are preserved at every interaction (type compatibility corresponds to dimensional consistency). Erasure means uncoupled components evolve independently (non-participation corresponds to locality).

The power of this correspondence is that proofs in one domain transfer to the other. A session-type progress proof is a global existence theorem. A conservation law proof is a well-formedness check. The formal machinery of session types, which was designed for reasoning about message-passing programs, turns out to be equally natural for reasoning about physical dynamics.

The Three Guarantees

Session types provide three fundamental guarantees. Each maps to a physical property that Gibbs formalizes.

| Session Property | Physical Property | Gibbs Implementation |
|---|---|---|
| Progress | Existence of dynamics | global_ode_exists, simplex_forward_invariant |
| Well-formedness | Conservation laws | drift_sum_zero, KernelCoherent |
| Erasure | Locality | projectKernelAt, local kernel environments |

Progress means the system never deadlocks. Every send finds a matching receive. The physical analogue is that the coupled ODE or PDE system always has a forward solution. The progress guarantee is the global existence theorem.

Deadlock corresponds to finite-time blowup or an ill-posed system. An unresolved dependency corresponds to a singular coupling or undefined drift.

Well-formedness means every send has a matching receive with compatible types. The physical analogue is conservation law preservation. A type mismatch corresponds to dimensional inconsistency. A well-formed message corresponds to a consistent coupling term.

On the probability simplex, the drift sums to zero. For continuum fields, local kernels are coherent projections of the global kernel.

Erasure means that if a role does not appear in a sub-protocol, its state is completely unaffected. The physical analogue is locality. A non-participant corresponds to an uncoupled field component whose dynamics are independent of the interaction. Protocol projection corresponds to kernel restriction. The function projectKernelAt gives each role only its local slice of the global kernel.

Termination as Stability

Session termination, where all participants reach their end state, corresponds to asymptotic stability. The system converges to equilibrium.

A StrictLyapunovData structure captures this correspondence. The Lyapunov function $V$ is nonnegative, zero at equilibrium, and tends to zero along trajectories. The theorem strict_lyapunov_implies_asymptotic shows this gives asymptotic stability. Protocol completion corresponds to $V \to 0$.

In Hamiltonian systems, the energy function serves as the Lyapunov function when damping is present ($dH/dt = -\gamma |p|^2 \leq 0$). In mean-field systems, the Bregman divergence to equilibrium serves the same role. In consensus, the free-energy gap determines whether termination (finality) is achievable.
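A quick numerical sketch (forward Euler in Python, with illustrative parameters) shows the damped-oscillator energy acting as a strict Lyapunov function, since $dH/dt = -\gamma p^2 \leq 0$:

```python
# Damped oscillator: q' = p, p' = -q - gamma p, with unit mass and frequency.
# The energy H = (p^2 + q^2) / 2 decays monotonically toward the equilibrium.
gamma, dt = 0.5, 1e-3
q, p = 1.0, 0.0

def H(q, p):
    return 0.5 * (p * p + q * q)

E0 = H(q, p)
for _ in range(20000):                  # integrate to t = 20
    q, p = q + dt * p, p + dt * (-q - gamma * p)

print(H(q, p) < 1e-2 * E0)              # True: energy decayed toward equilibrium
```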

The Two-Level Architecture

Session types are Markovian by design. The protocol structure, sends, receives, and branches, carries no internal memory. Complex physics lives inside effect handlers attached to each role.

graph TD
    subgraph "Session Layer (Markovian)"
        P[Protocol structure]
        P --> |"progress, well-formedness, erasure"| P
    end
    subgraph "Role A"
        EA[Effect State]
    end
    subgraph "Role B"
        EB[Effect State]
    end
    P --> EA
    P --> EB

The session layer handles coordination. Effect handlers within each role carry memory, history, and non-Markovian internal dynamics. This separation keeps progress and deadlock proofs simple while allowing arbitrarily complex physics.

Memory scope is explicit. Each role declares what history it maintains. Approximations (discretization, mean-field closure, coarse-graining) are also explicit. Handlers declare truncation strategies and error bounds. Effects compose correctly across roles because the session layer enforces conservation and coherence at every interaction point.

Modeling via Effects

Non-Markovian dynamics, stochastic forcing, and multi-body interactions require effect handlers that maintain internal state beyond what the Markovian session layer provides.

| System | Memory Structure | Effect Representation |
|---|---|---|
| Viscoelasticity | Prony series | Memory (Fin n → ℝ) for n modes |
| Retarded potentials | Fixed delay $\tau$ | DelayLine τ with circular buffer |
| Fractional diffusion | Power-law kernel | FractionalDeriv α |
| Hysteresis | Path-dependent | Preisach distribution |

Stochastic forcing uses a Stochastic effect providing Brownian increments and Poisson jumps.

Semiclassical and Open Quantum Systems

A significant subset of quantum systems reduces to classical ODEs on finite-dimensional state spaces. The requirement is that quantum coherence either lives within a single role’s internal state or has been averaged out via a mean-field or semiclassical limit.

| Quantum System | State Space | Dissipation Source |
|---|---|---|
| Lindblad (n-level) | $n \times n$ density matrices | Environment coupling |
| Bloch equations | Bloch sphere $S^2$ | Decoherence $\gamma$ |
| Lattice Gross-Pitaevskii | Finite-dim $\psi$ | Imaginary time or reservoir |
| Quantum optimal control | Classical control parameters | Cost functional |

Quantum mechanics within a role is unrestricted. A role’s effect handler can maintain a density matrix, solve Schrödinger equations, or track entanglement internally. The restriction is on inter-role quantum correlations: the session layer is classical, so messages between roles carry classical data.

Scope and Boundaries

Not all physical systems fit naturally into this framework. The determining factor is whether the system can be decomposed into identifiable roles with local state and classical communication.

Systems with a natural fit include mean-field models, reaction networks, coupled oscillators, and gradient flows on probability simplices. These have finite-dimensional state spaces, Lipschitz drifts, and local interactions.

Convex Hamiltonians with damping bridge naturally to the framework, as described in Hamiltonian Mechanics. The mean-field drift is often a gradient flow of convex free energy: $dx/dt = -\nabla_W F(x)$. This is Hamiltonian structure in the Wasserstein geometry.

| System | Hamiltonian | Damping Source |
|---|---|---|
| Damped oscillator | $\frac{1}{2}(p^2 + \omega^2 q^2)$ | Friction $\gamma p$ |
| Molecular dynamics | $\sum \frac{1}{2} p_i^2/m + V(q)$ | Langevin thermostat |
| Heavy-ball optimization | $\frac{1}{2}|p|^2 + f(q)$ | Momentum decay |

Some systems require engineering effort but are not structurally excluded.

| System | Challenge | Approach |
|---|---|---|
| Gauge theories | Redundant degrees of freedom | Quotient types, gauge-fixing effects |
| n-body forces | Beyond pairwise | Multi-party sessions, sequential approximation |
| Integrable systems | Periodic, no convergence | Action-angle variables, periodic sessions |
| Unbounded memory | Infinite history | Truncation effects with error bounds |

Structural Limitations

Structural mismatches arise from hard constraints in the framework, not from engineering difficulty.

| Limitation | Issue |
|---|---|
| Infinite dimensions | Mean-field uses Fintype Q. True infinite-dim requires function spaces. |
| Indistinguishable particles | Session roles must be identifiable. Bosons and fermions have no role identity. |
| Lipschitz regularity | Existence requires bounded derivatives. This excludes shocks, blowup, and turbulence. |
| Inter-role entanglement | Bell’s theorem prohibits local hidden variables. Erasure is violated. |
| Holographic duality | Bulk and boundary are dual descriptions, not separable roles. |
| Topological field theories | No local degrees of freedom to assign to roles. |
| Critical phenomena | Fluctuations at all scales. Mean-field destroys structure. |
| Spin glasses | Exponentially many metastable states. Rugged landscapes. |
| Chaotic Hamiltonians | Non-convex, separatrices, Arnold diffusion. |
| Full QFT | Infinite-dimensional Hilbert space. Field quantization. |

Practical limitations (finite memory, numerical approximation) can be addressed with better engineering. Structural limitations (locality violation, infinite dimensions) cannot.

The Reverse Direction

The correspondence runs in both directions. Physical systems provide analogues for distributed systems concepts.

Consensus corresponds to spontaneous symmetry breaking. Above the critical temperature, no consensus forms ($m = 0$). Below it, consensus emerges spontaneously ($m \to \pm m^*$).

Agreement maps to spin alignment. Validity maps to the chosen direction lying in the allowed set $\{+1, -1\}$. Termination maps to reaching the ordered phase.

Metastability in physics corresponds to partial consensus or local minima in distributed systems. Leader election corresponds to nucleation, where the first supercritical nucleus determines the outcome.

Safety corresponds to conservation laws and constraint preservation. Liveness corresponds to equilibration, mixing, and ergodicity. Asymptotic stability captures both: the system reaches equilibrium (liveness) without violating constraints along the way (safety).

Eventual Consistency

Eventual consistency corresponds to diffusive relaxation. When updates stop, replicas converge like the heat equation $\partial T / \partial t = \kappa \nabla^2 T$ smooths temperature gradients. The convergence rate is $\sim e^{-\lambda_1 t}$ where $\lambda_1$ is the spectral gap of the diffusion operator.

The spectral gap is the consistency window. A larger gap means faster convergence to agreement. This connects to the stability theory in Mean-Field Dynamics where Hurwitz eigenvalue conditions control convergence rates.
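This decay is easy to observe numerically. The Python sketch below runs explicit-Euler diffusion on a ring of replicas (illustrative parameters, not part of the Lean development); disagreement shrinks at a rate set by the Laplacian's spectral gap while the mean is conserved:

```python
# Diffusion on a ring of n replicas: x' = -kappa L x, L the graph Laplacian.
# Disagreement decays like e^{-lambda_1 t}, lambda_1 the spectral gap.
n, kappa, dt = 8, 1.0, 1e-3
x = [float(i) for i in range(n)]          # divergent replica states

def step(x):
    return [x[i] + dt * kappa * (x[(i - 1) % n] - 2 * x[i] + x[(i + 1) % n])
            for i in range(n)]

mean0 = sum(x) / n
for _ in range(20000):                    # run to t = 20
    x = step(x)

spread = max(x) - min(x)
print(spread < 0.05 * (n - 1))                  # True: replicas converged
print(abs(sum(x) / n - mean0) < 1e-6)           # True: mean (mass) conserved
```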

Consistency Levels

Consistency levels map to physical regimes. Causal consistency corresponds to relativistic causality. Strong consistency requires infinite propagation speed, which is non-physical.

| Consistency Level | Physical Analogue |
|---|---|
| Strong consistency | Infinite propagation speed |
| Linearizability | Newtonian instantaneous action |
| Sequential consistency | Well-defined causal order |
| Causal consistency | Relativistic causality |
| Eventual consistency | Diffusive equilibration |

Causal consistency is the maximum achievable in physics. Information cannot travel faster than light.

The CAP Theorem in Physics

The CAP theorem has a physical analogue. You cannot simultaneously have instantaneous global equilibrium (C), continuous local dynamics (A), and finite propagation speed (P).

| Regime | Sacrifice | Keep |
|---|---|---|
| Thermodynamics (quasi-static) | A | C + P |
| Hydrodynamics (local equilibrium) | Global C | A + P |
| Newtonian gravity | P | C + A |
| Relativity | Instantaneous C | A + P |

Further Correspondences

Vector clocks correspond to light cone causal structure. Quorum intersection corresponds to percolation and connected clusters. Byzantine fault tolerance corresponds to topological protection, where the system’s resilience is a global property that cannot be defeated by local perturbations. Replication strategies correspond to statistical ensembles.

The deepest connection is that consensus is a phase transition. The decision emerges from local interactions just as magnetization emerges from spin flips. Protocol parameters (timeouts, quorum sizes) function like temperature and coupling strength. Tuning them determines whether consensus is achievable.

Open Questions

Several directions remain unexplored.

  1. Can physical stability proofs transfer to session liveness proofs?
  2. When interactions have stochastic rates, does progress become progress in expectation?
  3. What is the session analogue of non-equilibrium steady states?
  4. Do gauge invariances correspond to session polymorphism?
  5. Can renormalization group ideas inform choreographies at different scales?
  6. How do we verify that memory truncation preserves physical properties?
  7. Can engineered phase transitions guide the design of consensus protocols?
  8. Can consistency-latency tradeoffs be quantified using physical principles?

Consensus as Statistical Mechanics

This document presents the synthesis at the heart of Gibbs: consensus protocols are statistical-mechanical systems. For the physical foundations, see Hamiltonian Mechanics and Mean-Field Dynamics. For the information-theoretic bridge, see Information Theory and Channels.

The Analogy

Consider a group of people trying to agree on a decision. Each person has an opinion and can talk to their neighbors, influencing them. The question is whether the group converges to a shared decision.

Now consider a magnet cooling from high temperature. Each atom has a spin (up or down) and interacts with its neighbors. Thermal noise randomly flips spins. The question is whether the spins align into a coherent magnetization.

These problems are analogous. In both cases, local interactions between agents produce (or fail to produce) global order. Interactions are local (message passing or spin coupling). There is noise (faults or thermal fluctuations). And the central question is identical: does macroscopic order emerge from microscopic dynamics?

The mathematics is isomorphic. A consensus protocol defines an energy landscape over possible execution histories, just as a Hamiltonian defines an energy landscape over spin configurations. Agreement is a low-energy state. Disagreement is high-energy. The question of whether consensus is achievable becomes the question of whether the system has a phase transition from disorder to order.

Because of this parallel, well-established machinery from physics can be used to understand distributed systems, and vice versa. For instance, the fault-tolerance thresholds that appear in distributed consensus ($f < N/3$ for BFT, $f < N/2$ for static) represent phase boundaries. The distinction between deterministic and probabilistic finality maps to gapped vs. gapless phases. And the impossibility results of distributed computing correspond to thermodynamic laws.

Agreement as Macroscopic Order

A distributed system of $N$ processes, each supporting a value $\sigma_i \in \{-1, +1\}$, has a natural order parameter:

$$m = \frac{1}{N} \sum_{i=1}^N \sigma_i$$

When $|m| \approx 0$, there is no consensus (disordered phase). When $|m| \approx 1$, strong consensus has formed (ordered phase). This is the same order parameter as the Ising model. Consensus corresponds to spontaneous symmetry breaking: the system collectively selects one of the two equivalent ordered states.

The magnetization function in Consensus/OrderParameter.lean computes this quantity. The OrderParameter structure wraps it as a state observable.
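The quantity itself is elementary; the Python sketch below (illustrative, not the Lean definition in Consensus/OrderParameter.lean) shows the three regimes:

```python
def magnetization(spins):
    """m = (1/N) sum sigma_i, for spins in {-1, +1}."""
    return sum(spins) / len(spins)

print(magnetization([+1] * 10))            # 1.0: full consensus (ordered)
print(magnetization([+1, -1] * 5))         # 0.0: no consensus (disordered)
print(magnetization([+1] * 8 + [-1] * 2))  # 0.6: partial consensus
```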

| Consensus Property | Physical Analogue |
|---|---|
| Agreement | All spins align to same direction |
| Validity | Chosen direction is $\pm 1$ (from allowed set) |
| Termination | System reaches ordered phase |
| Metastability | Partial consensus or local minima |
| Leader election | Nucleation (first supercritical nucleus wins) |

Energy and the Partition Function

An effective Hamiltonian assigns an energy cost to each execution:

$$H(\omega) = H_{\text{conflict}}(\omega) + H_{\text{delay}}(\omega) + H_{\text{fault}}(\omega)$$

The conflict term penalizes disagreement. The delay term penalizes prolonged indecision. The fault term encodes adversarial behavior. Safety invariants correspond to forbidden regions where $H = +\infty$.

The ConsensusHamiltonian structure in Consensus/Hamiltonian.lean bundles these three components. The forbiddenEnergy function assigns $\infty$ to forbidden executions. The energyWeight function maps this to Boltzmann weights, with forbidden executions receiving weight zero.

The partition function sums Boltzmann weights over all admissible executions:

$$Z(\beta) = \sum_{\omega \in \Omega} e^{-\beta H(\omega)}$$

Free energy is $F = -\frac{1}{\beta} \log Z$. The parameter $\beta$ (inverse temperature) interpolates between analysis regimes. Taking $\beta \to \infty$ recovers worst-case (deterministic) analysis. Finite $\beta$ corresponds to average-case (randomized) analysis. The bounds freeEnergy_le_minEnergy and minEnergy_le_freeEnergy_add sandwich the free energy between $\min H$ and $\min H - (\log |\Omega|) / \beta$.

Gap vs No Gap

The central dichotomy is whether a free-energy gap separates ordered from disordered macrostates.

A protocol has a safety gap at fault budget $b$ when the interactive distance between any good macrostate and any bad macrostate exceeds $b$:

$$\min_{M \in \text{Good},\; M' \in \text{Bad}} \Delta_H(M, M') > b$$

The interactive distance $\Delta_H(M, M')$ is the minimum corruption effort needed to move from an execution realizing $M$ to one realizing $M'$. This generalizes minimum code distance to the interactive setting.

When a gap exists, reversal probability decays exponentially: $P(\text{reversal}) \sim e^{-\Delta F}$. In the thermodynamic limit ($N \to \infty$), a positive gap makes reversal probability vanish. This is finality.

When no gap exists ($\Delta F = 0$), competing histories coexist at comparable free energy. Reorganizations remain possible. Safety violations have nonzero probability. This is probabilistic finality, where additional confirmations reduce but never eliminate the reversal risk.

Universality Classes

Consensus protocols fall into three universality classes determined by their gap structure.

| Class | Gap | Finality | Examples |
|---|---|---|---|
| I: Gapless | No safety gap | Probabilistic only. Reversal probability decays geometrically with confirmation depth $k$ but never reaches zero. | Nakamoto |
| II: Gapped | Deterministic safety gap | Deterministic. Once a quorum certificate forms, alternative histories are forbidden. Gap scales with $N$. | BFT, PBFT, Tendermint |
| III: Hybrid | Fast path gapless, fallback gapped | Probabilistic fast path with deterministic BFT fallback. Rare violations possible in fast path. | Fast finality + BFT fallback |

The classOf function in Consensus/UniversalityClasses.lean classifies protocols based on gap and tunneling flags. Microstructural details differ across protocols within a class. Large-scale behavior does not.

Thresholds as Phase Boundaries

The fault thresholds $f < N/2$ (static) and $f < N/3$ (interactive) are not protocol artifacts. They are phase boundaries in the thermodynamic limit.

For static decoding (error-correcting codes), the repetition code of length $N$ has distance $d_{\min} = N$. Unique decoding of $f$ errors requires $2f < N$, giving $N \geq 2f + 1$. The theorem repetition_threshold formalizes this.

For interactive decoding (consensus), quorum intersection provides the additional constraint. With quorum size $q = 2f + 1$ and $N = 3f + 1$:

$$|Q \cap Q'| \geq 2q - N = f + 1$$

Any two quorums overlap in at least $f + 1$ processes, ensuring at least one honest process in the intersection. The theorem quorum_intersection_3f1 proves this. The gap collapses at $N = 3f$ (quorum_gap_collapse): the adversary can place all overlap inside Byzantine nodes.
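The overlap bound can be verified exhaustively for small $f$. The Python sketch below (illustrative, not the Lean proof of quorum_intersection_3f1) checks that the bound $2q - N = f + 1$ is attained at $f = 2$:

```python
from itertools import combinations

# Quorum intersection at N = 3f + 1 with quorum size q = 2f + 1:
# any two quorums overlap in at least 2q - N = f + 1 processes,
# so at least one honest process lies in every intersection.
f = 2
N, q = 3 * f + 1, 2 * f + 1
procs = range(N)

min_overlap = min(
    len(set(Q1) & set(Q2))
    for Q1 in combinations(procs, q)
    for Q2 in combinations(procs, q)
)
print(min_overlap == f + 1)   # True: the bound 2q - N = f + 1 is tight
```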

In the thermodynamic limit, the normalized interactive distance $\delta(\alpha) = \liminf_{N \to \infty} \frac{1}{N} \min \Delta_H(\text{Good}, \text{Bad})$ is positive for $\alpha < 1/3$ and zero for $\alpha \geq 1/3$. The corruption fraction $\alpha = 1/3$ is a phase boundary.

The Unifying Triangle

The deepest statement in the framework is that error-correcting codes, information-theoretic channels, and consensus protocols are three instantiations of the same mathematical structure.

graph TD
    C[Codes] --- I[Channels]
    I --- Co[Consensus]
    Co --- C
    C -->|"distance = gap"| G[Energy Gap]
    I -->|"capacity = threshold"| G
    Co -->|"interactive distance"| G

Codes are non-interactive consensus (single sender, one round). Channels add noise statistics and rate-reliability tradeoffs. Consensus adds interaction and adaptive adversaries, tightening the threshold from $1/2$ to $1/3$.

In all three cases, the energy gap determines whether the system can reliably distinguish between macrostates. A positive gap means unique decoding, reliable communication, or deterministic finality. Zero gap means ambiguity, capacity violation, or impossibility.

This perspective explains why the thresholds are sharp, why they recur across fields, and why additional confirmations in Nakamoto consensus provide probabilistic but never deterministic safety.