From bc81aece53a3ba09aa3342751a6fe71b828d4e0e Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sat, 30 May 2026 12:39:33 -0700
Subject: [PATCH 01/22] Add spec-layer matrix factorizations (Cholesky, QR,
 symmetric eig, SVD)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Motivation: formalizing Computational Hypergraph Discovery (CHD) in TorchLean.

CHD (prototype: https://github.com/TheoBourdais/ComputationalHypergraphDiscovery)
is a Gaussian-process / kernel method: it fits a kernel ridge regression per node
and prunes "ancestors" by a noise/activation criterion. Essentially every
statistically meaningful quantity it computes — the regression solution, the
gamma (noise) selection, and the Z-test — is derived from the *full* symmetric
eigendecomposition of the kernel matrix K (all eigenvalues AND eigenvectors).

TorchLean's spec layer lacked this. The only eigendecomposition available
(`Spec.eigendecompSpec`) is a power-iteration stub that recovers just the largest
eigenpair, and there were no Cholesky / QR / SVD routines at all. That makes CHD
inexpressible. This commit adds real reference implementations so the kernel
linear algebra CHD depends on can be written in Lean.

New: NN/Spec/Core/Tensor/Factorizations.lean (namespace Spec), executable over
Float / ℝ, shape-indexed like the rest of the spec layer:
- choleskySpec      : A = L · Lᵀ for SPD A (lower-triangular L)
- qrSpec/qrQSpec/qrRSpec : A = Q · R via Gram–Schmidt (classical projections)
- symEigJacobiSpec  : FULL symmetric eigendecomposition via cyclic Jacobi
                      (all n eigenpairs) — the replacement for the largest-only
                      stub, which is left untouched so PCA keeps building
- svdSpec           : A = U · diag(σ) · Vᵀ via the eig of Aᵀ·A
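
Why the eigendecomposition of Aᵀ·A is enough for svdSpec can be sketched as
follows. This is a derivation in exact arithmetic, assuming an exact
eigendecomposition; the symbols λ_j, v_j, u_j are notation for the eigenvalues,
the columns of V, and the columns of U.

```latex
\begin{aligned}
A^\top A &= V \Lambda V^\top, \qquad V^\top V = I, \qquad \lambda_j \ge 0, \\
\sigma_j &= \sqrt{\lambda_j}, \qquad u_j = \frac{A v_j}{\sigma_j} \quad (\sigma_j > 0), \\
u_i^\top u_j &= \frac{v_i^\top A^\top A \, v_j}{\sigma_i \sigma_j}
              = \frac{\lambda_j \, \delta_{ij}}{\sigma_i \sigma_j} = \delta_{ij}, \\
\sum_{\sigma_j > 0} \sigma_j \, u_j v_j^\top &= A \sum_{\sigma_j > 0} v_j v_j^\top = A .
\end{aligned}
```

The last equality uses that ‖A v_j‖² = λ_j, so A v_j = 0 whenever σ_j = 0, and
that the v_j v_jᵀ sum to the identity over all j.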

The iterative Jacobi loop runs over a strict `Array (Array α)` representation
(converted to/from Spec.Tensor only at the boundary): threading the functional
`Fin n → Fin n → α` representation through one matrix product per rotation builds
deep closure chains that blow up under evaluation, whereas arrays are strict
values and keep execution cheap.
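
A minimal, self-contained sketch of one Jacobi rotation over strict arrays,
using only core Lean 4. The names `Mat`, `entry`, `matMul`, `transpose` and
`rotate2` are hypothetical and are not the definitions in this patch; the point
is only that every intermediate matrix is a fully evaluated array.

```lean
namespace JacobiSketch

abbrev Mat := Array (Array Float)

/-- Entry `(i, j)` of an array matrix (`0.0` if out of bounds). -/
def entry (M : Mat) (i j : Nat) : Float := (M.getD i #[]).getD j 0.0

/-- Product of two `n × n` array matrices; every entry is computed eagerly. -/
def matMul (n : Nat) (X Y : Mat) : Mat :=
  Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n =>
    (List.range n).foldl (fun s k => s + entry X i.val k * entry Y k j.val) 0.0))

/-- Transpose of an `n × n` array matrix. -/
def transpose (n : Nat) (X : Mat) : Mat :=
  Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n => entry X j.val i.val))

/-- One Jacobi rotation of a symmetric 2×2 matrix, targeting entry `(0, 1)`. -/
def rotate2 (A : Mat) : Mat :=
  let apq := entry A 0 1
  if apq == 0.0 then A
  else
    let τ := (entry A 1 1 - entry A 0 0) / (2.0 * apq)
    let sgn : Float := if τ < 0.0 then -1.0 else 1.0
    let t := sgn / (Float.abs τ + Float.sqrt (1.0 + τ * τ))
    let c := 1.0 / Float.sqrt (1.0 + t * t)
    let s := t * c
    -- Givens rotation: J[0,0] = J[1,1] = c, J[0,1] = s, J[1,0] = -s.
    let J : Mat := #[#[c, s], #[-s, c]]
    matMul 2 (transpose 2 J) (matMul 2 A J)

-- For A = [[2, 1], [1, 2]] the result should be close to diag(1, 3).
#eval rotate2 #[#[2.0, 1.0], #[1.0, 2.0]]

end JacobiSketch
```

With a `Fin n → Fin n → Float` representation, the same update would return a
closure that re-evaluates the previous matrix on every lookup, which is the
chain the paragraph above describes.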

Examples: NN/Examples/Factorization/{Common,Cholesky,QR,SymEig,SVD}.lean (+ the
NN.Examples.Factorization umbrella). Each reconstructs the input from its factors
and asserts, via a compiled `assertLt` run under #eval, that the maximum
reconstruction error is below tolerance; a violation raises an IO error and
fails the build. All reconstruction errors print as 0.000000; SymEig recovers
eigenvalues {1.3249, 2.4608, 5.2143} and SVD singular values {5, 3, 0}, as
expected.
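
The reconstruct-and-compare pattern can be sketched in core Lean 4 without the
project helpers. Everything here is hypothetical: `Mat`, `entry`,
`mulTranspose` and `maxErr` are illustrative names, and `L` is a Cholesky
factor of the Cholesky example matrix worked out by hand, not the output of
choleskySpec.

```lean
namespace ReconSketch

abbrev Mat := Array (Array Float)

/-- Entry `(i, j)` of an array matrix (`0.0` if out of bounds). -/
def entry (M : Mat) (i j : Nat) : Float := (M.getD i #[]).getD j 0.0

/-- `L · Lᵀ` for an `n × n` matrix `L`. -/
def mulTranspose (n : Nat) (L : Mat) : Mat :=
  Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n =>
    (List.range n).foldl (fun s k => s + entry L i.val k * entry L j.val k) 0.0))

/-- Maximum entrywise absolute difference of two `n × n` matrices. -/
def maxErr (n : Nat) (X Y : Mat) : Float :=
  (List.range n).foldl (fun acc i =>
    (List.range n).foldl
      (fun a j => max a (Float.abs (entry X i j - entry Y i j))) acc) 0.0

/-- The SPD test matrix from the Cholesky example. -/
def A : Mat := #[#[4, 2, 2], #[2, 5, 3], #[2, 3, 6]]

/-- A hand-computed lower-triangular factor with `A = L · Lᵀ`. -/
def L : Mat := #[#[2, 0, 0], #[1, 2, 0], #[1, 1, 2]]

-- Maximum reconstruction error; it should be zero here, since all entries
-- are small integers.
#eval maxErr 3 A (mulTranspose 3 L)

end ReconSketch
```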

These are executable reference defs (matching the bar of the existing
determinantSpec / inverseSpec / eigendecompSpec); formal correctness theorems
are a planned follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean          |  30 +++
 NN/Examples/Factorization/Cholesky.lean |  42 +++
 NN/Examples/Factorization/Common.lean   |  71 +++++
 NN/Examples/Factorization/QR.lean       |  44 ++++
 NN/Examples/Factorization/SVD.lean      |  50 ++++
 NN/Examples/Factorization/SymEig.lean   |  52 ++++
 NN/Spec/Core/Tensor/Factorizations.lean | 332 ++++++++++++++++++++++++
 7 files changed, 621 insertions(+)
 create mode 100644 NN/Examples/Factorization.lean
 create mode 100644 NN/Examples/Factorization/Cholesky.lean
 create mode 100644 NN/Examples/Factorization/Common.lean
 create mode 100644 NN/Examples/Factorization/QR.lean
 create mode 100644 NN/Examples/Factorization/SVD.lean
 create mode 100644 NN/Examples/Factorization/SymEig.lean
 create mode 100644 NN/Spec/Core/Tensor/Factorizations.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
new file mode 100644
index 0000000..7f7be49
--- /dev/null
+++ b/NN/Examples/Factorization.lean
@@ -0,0 +1,30 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+public import NN.Examples.Factorization.Cholesky
+public import NN.Examples.Factorization.QR
+public import NN.Examples.Factorization.SymEig
+public import NN.Examples.Factorization.SVD
+
+/-!
+# Matrix factorization examples
+
+Executable sanity checks for the spec-layer matrix factorizations in
+`NN.Spec.Core.Tensor.Factorizations`:
+
+- `Cholesky` — `A = L · Lᵀ`
+- `QR`       — `A = Q · R`, `Qᵀ·Q = I`
+- `SymEig`   — full symmetric eigendecomposition `A = V · diag(λ) · Vᵀ`
+- `SVD`      — `A = U · diag(σ) · Vᵀ`
+
+Each example reconstructs the original matrix and asserts (via `assertLt` under `#eval`) that the
+maximum reconstruction error is below `tol`, so the build fails if a factorization is incorrect.
+-/
+
+@[expose] public section
diff --git a/NN/Examples/Factorization/Cholesky.lean b/NN/Examples/Factorization/Cholesky.lean
new file mode 100644
index 0000000..fc23255
--- /dev/null
+++ b/NN/Examples/Factorization/Cholesky.lean
@@ -0,0 +1,42 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: Cholesky factorization
+
+`choleskySpec A` returns the lower-triangular `L` with `A = L · Lᵀ` for a symmetric
+positive-definite `A`. Here we factor a 3×3 SPD matrix and check the reconstruction error.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.Cholesky
+
+/-- A symmetric positive-definite test matrix. -/
+def A : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[4, 2, 2],
+         [2, 5, 3],
+         [2, 3, 6]]
+
+/-- The Cholesky factor `L` (lower-triangular). -/
+def L : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.choleskySpec A
+
+/-- Reconstruction error `‖A - L·Lᵀ‖_max`. -/
+def reconErr : Float := maxMatErr A (mm L (tr L))
+
+-- Inspect the diagonal of the factor.
+#eval vecToList (Spec.ofVecFn (fun i : Fin 3 => Spec.get2 L i i))
+
+-- Compiled assertion: the factorization reconstructs A (fails the build otherwise).
+#eval assertLt "Cholesky A = L·Lᵀ" reconErr
+
+end NN.Examples.Factorization.Cholesky
diff --git a/NN/Examples/Factorization/Common.lean b/NN/Examples/Factorization/Common.lean
new file mode 100644
index 0000000..c970e32
--- /dev/null
+++ b/NN/Examples/Factorization/Common.lean
@@ -0,0 +1,71 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Spec.Core.Tensor.Factorizations
+
+/-!
+# Factorization examples — shared helpers
+
+Small `Float`-valued helpers used by the matrix-factorization examples
+(`Cholesky`, `QR`, `SymEig`, `SVD`). These examples are *executable sanity checks*: each one
+reconstructs the original matrix from its factors and asserts (via `assertLt`, run under `#eval`)
+that the maximum entrywise reconstruction error is below a tolerance, so the build fails otherwise.
+
+These run over `Float` (the executable 64-bit runtime scalar), which is the precision the
+factorizations target for Gaussian-process / kernel-method use.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization
+
+/-- Build an `m × n` `Float` matrix tensor from a row-major nested list. Missing entries are `0`. -/
+def mkMat {m n : Nat} (rows : List (List Float)) : Spec.Tensor Float (.dim m (.dim n .scalar)) :=
+  Spec.ofMatFn (fun i j => (rows.getD i.val []).getD j.val 0.0)
+
+/-- Maximum entrywise absolute difference between two `m × n` matrices. -/
+def maxMatErr {m n : Nat} (A B : Spec.Tensor Float (.dim m (.dim n .scalar))) : Float :=
+  (List.finRange m).foldl (fun acc i =>
+    (List.finRange n).foldl
+      (fun a j => max a (Float.abs (Spec.get2 A i j - Spec.get2 B i j))) acc) 0.0
+
+/-- Matrix product `A · B` (thin wrapper over `matMulSpec`). -/
+def mm {m n p : Nat} (A : Spec.Tensor Float (.dim m (.dim n .scalar)))
+    (B : Spec.Tensor Float (.dim n (.dim p .scalar))) : Spec.Tensor Float (.dim m (.dim p .scalar)) :=
+  Spec.matMulSpec A B
+
+/-- Matrix transpose. -/
+def tr {m n : Nat} (A : Spec.Tensor Float (.dim m (.dim n .scalar))) :
+    Spec.Tensor Float (.dim n (.dim m .scalar)) :=
+  Spec.Tensor.matrixTransposeSpec A
+
+/-- Turn a length-`n` vector into an `n × n` diagonal matrix. -/
+def diagFromVec {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) :
+    Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  Spec.ofMatFn (fun i j => if i.val == j.val then Spec.Tensor.toScalar (Spec.get v i) else 0.0)
+
+/-- Read a vector tensor back out as a `List Float` (for display). -/
+def vecToList {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : List Float :=
+  (List.finRange n).map (fun i => Spec.Tensor.toScalar (Spec.get v i))
+
+/-- Shared tolerance for reconstruction-error assertions. -/
+def tol : Float := 1e-6
+
+/--
+Compiled assertion used by the examples: print `name: OK (err)` when `err < tol`, otherwise raise an
+`IO` error so the build/`#eval` fails. Running this through `#eval` evaluates with the compiler
+(fast), unlike `#guard`, which forces slow kernel reduction of the whole factorization.
+-/
+def assertLt (name : String) (err : Float) (tolerance : Float := tol) : IO Unit :=
+  if err < tolerance then
+    IO.println s!"{name}: OK (err = {err})"
+  else
+    throw (IO.userError s!"{name}: FAIL (err = {err} ≥ tol = {tolerance})")
+
+end NN.Examples.Factorization
diff --git a/NN/Examples/Factorization/QR.lean b/NN/Examples/Factorization/QR.lean
new file mode 100644
index 0000000..0080de7
--- /dev/null
+++ b/NN/Examples/Factorization/QR.lean
@@ -0,0 +1,44 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: QR factorization
+
+`qrSpec A` returns `(Q, R)` with `A = Q · R`, `Q` having orthonormal columns and `R`
+upper-triangular (Gram–Schmidt). We check both `A = Q·R` and `Qᵀ·Q = I`.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.QR
+
+/-- A 3×3 test matrix (the classic Householder/QR example). -/
+def A : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[12, -51, 4],
+         [6, 167, -68],
+         [-4, 24, -41]]
+
+/-- Orthonormal `Q` factor. -/
+def Q : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.qrQSpec A
+/-- Upper-triangular `R` factor. -/
+def R : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.qrRSpec A
+
+/-- Reconstruction error `‖A - Q·R‖_max`. -/
+def reconErr : Float := maxMatErr A (mm Q R)
+/-- Orthonormality error `‖Qᵀ·Q - I‖_max`. -/
+def orthoErr : Float := maxMatErr (mm (tr Q) Q) (Spec.identityTensorSpec 3)
+
+-- Compiled assertions (fail the build otherwise).
+#eval assertLt "QR A = Q·R" reconErr
+#eval assertLt "QR Qᵀ·Q = I" orthoErr
+
+end NN.Examples.Factorization.QR
diff --git a/NN/Examples/Factorization/SVD.lean b/NN/Examples/Factorization/SVD.lean
new file mode 100644
index 0000000..0b2369c
--- /dev/null
+++ b/NN/Examples/Factorization/SVD.lean
@@ -0,0 +1,50 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: singular value decomposition
+
+`svdSpec A sweeps` returns `(U, σ, V)` with `A = U · diag(σ) · Vᵀ`. The singular values come from
+the symmetric eigendecomposition of `Aᵀ·A`. We check the reconstruction of a 2×3 matrix whose
+singular values are `{5, 3, 0}`.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.SVD
+
+/-- A 2×3 test matrix with singular values `{5, 3}` (third is `0` since rank 2 < 3). -/
+def A : Spec.Tensor Float (.dim 2 (.dim 3 .scalar)) :=
+  mkMat [[3, 2, 2],
+         [2, 3, -2]]
+
+/-- `(U, σ, V)` from the SVD. -/
+def svd : Spec.Tensor Float (.dim 2 (.dim 3 .scalar)) × Spec.Tensor Float (.dim 3 .scalar) ×
+    Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  Spec.svdSpec A 12
+
+/-- Left singular vectors `U` (2×3). -/
+def U : Spec.Tensor Float (.dim 2 (.dim 3 .scalar)) := svd.1
+/-- Singular values `σ`. -/
+def σ : Spec.Tensor Float (.dim 3 .scalar) := svd.2.1
+/-- Right singular vectors `V` (3×3). -/
+def V : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := svd.2.2
+
+/-- Reconstruction error `‖A - U·diag(σ)·Vᵀ‖_max`. -/
+def reconErr : Float := maxMatErr A (mm (mm U (diagFromVec σ)) (tr V))
+
+#eval vecToList σ
+
+-- Compiled assertion (fails the build otherwise).
+#eval assertLt "SVD A = U·diag(σ)·Vᵀ" reconErr
+
+end NN.Examples.Factorization.SVD
diff --git a/NN/Examples/Factorization/SymEig.lean b/NN/Examples/Factorization/SymEig.lean
new file mode 100644
index 0000000..5426e07
--- /dev/null
+++ b/NN/Examples/Factorization/SymEig.lean
@@ -0,0 +1,52 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: symmetric eigendecomposition (cyclic Jacobi)
+
+`symEigJacobiSpec A sweeps` returns `(eigenvalues, V)` for a symmetric `A`, where the columns of
+`V` are the (orthonormal) eigenvectors. Unlike the power-iteration `eigendecompSpec`, this recovers
+**all** eigenpairs. We check the spectral reconstruction `A = V · diag(λ) · Vᵀ` and orthogonality
+`Vᵀ · V = I`.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.SymEig
+
+/-- A symmetric test matrix (eigenvalues ≈ {1.3249, 2.4608, 5.2143}). -/
+def A : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[2, 1, 1],
+         [1, 3, 1],
+         [1, 1, 4]]
+
+/-- Eigenvalues (diagonal after Jacobi sweeps) and eigenvector matrix `V`. -/
+def eig : Spec.Tensor Float (.dim 3 .scalar) × Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  Spec.symEigJacobiSpec A 8
+
+/-- Eigenvalues. -/
+def evals : Spec.Tensor Float (.dim 3 .scalar) := eig.1
+/-- Eigenvector matrix (columns are eigenvectors). -/
+def V : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := eig.2
+
+/-- Spectral reconstruction error `‖A - V·diag(λ)·Vᵀ‖_max`. -/
+def reconErr : Float := maxMatErr A (mm (mm V (diagFromVec evals)) (tr V))
+/-- Orthogonality error `‖Vᵀ·V - I‖_max`. -/
+def orthoErr : Float := maxMatErr (mm (tr V) V) (Spec.identityTensorSpec 3)
+
+#eval vecToList evals
+
+-- Compiled assertions (fail the build otherwise).
+#eval assertLt "SymEig A = V·diag(λ)·Vᵀ" reconErr
+#eval assertLt "SymEig Vᵀ·V = I" orthoErr
+
+end NN.Examples.Factorization.SymEig
diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
new file mode 100644
index 0000000..0fbe845
--- /dev/null
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -0,0 +1,332 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Spec.Core.Tensor.Linalg
+public import NN.Spec.Core.TensorReductionShape.LinearAlgebra
+
+/-!
+# Matrix factorizations (spec layer)
+
+This file provides **real**, shape-indexed reference implementations of the matrix
+factorizations that classical / scientific-ML models (Gaussian processes, kernel ridge
+regression, PCA, least squares) depend on, and which were previously missing from the spec
+layer:
+
+- `choleskySpec`     — Cholesky factorization `A = L · Lᵀ` (lower-triangular `L`), for SPD `A`.
+- `qrSpec`           — QR factorization `A = Q · R` via Gram–Schmidt
+                       (`Q` has orthonormal columns, `R` upper-triangular).
+- `symEigJacobiSpec` — **full** symmetric eigendecomposition via the cyclic Jacobi algorithm
+                       (all eigenpairs, not just the largest).
+- `svdSpec`          — singular value decomposition `A = U · diag(σ) · Vᵀ`, built on the
+                       symmetric eigendecomposition of `Aᵀ·A`.
+
+## Relationship to `eigendecompSpec`
+
+`Spec.eigendecompSpec` (in `NN/Spec/Models/CommonHelpers.lean`) is a power-iteration *stub* that
+only recovers the **largest** eigenpair. It is intentionally left untouched (PCA depends on it).
+`symEigJacobiSpec` here is the full replacement: for a symmetric matrix it returns *all* `n`
+eigenvalues and an orthogonal matrix of eigenvectors.
+
+## Intent / tradeoffs
+
+Like the rest of the spec layer (`determinantSpec`, `inverseSpec`, `matMulSpec`), these prioritize
+**mathematical clarity** and **shape safety** over performance, and are intended for small/medium
+matrices and proof-oriented reference code. For large-scale numerics, use array-backed runtime
+kernels.
+
+Internally Cholesky and QR are written over the plain function representation
+`Fin n → Fin n → α` (matrices) and `Fin n → α` (vectors), then wrapped back into `Spec.Tensor`
+at the boundary. This keeps the numerical formulas readable and keeps later correctness proofs
+on ordinary functions, not nested `Tensor` `match`es. Jacobi and SVD use strict arrays instead.
+
+The iterative routines (Jacobi) take an explicit `sweeps` count: convergence of Jacobi is
+asymptotic, so the caller chooses how much work to do. A dozen sweeps is ample for the small
+matrices these specs target.
+-/
+
+@[expose] public section
+
+
+namespace Spec
+
+open Tensor
+
+variable {α : Type} [Context α]
+
+/-! ## Boundary conversions between `Spec.Tensor` and plain functions -/
+
+/-- View a matrix tensor as a function `Fin m → Fin n → α`. -/
+def toMatFn {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) : Fin m → Fin n → α :=
+  fun i j => get2 A i j
+
+/-- Build a matrix tensor from a function `Fin m → Fin n → α`. -/
+def ofMatFn {m n : Nat} (f : Fin m → Fin n → α) : Tensor α (.dim m (.dim n .scalar)) :=
+  Tensor.dim (fun i => Tensor.dim (fun j => Tensor.scalar (f i j)))
+
+/-- View a vector tensor as a function `Fin n → α`. -/
+def toVecFn {n : Nat} (v : Tensor α (.dim n .scalar)) : Fin n → α :=
+  fun i => Tensor.toScalar (get v i)
+
+/-- Build a vector tensor from a function `Fin n → α`. -/
+def ofVecFn {n : Nat} (f : Fin n → α) : Tensor α (.dim n .scalar) :=
+  Tensor.dim (fun i => Tensor.scalar (f i))
+
+/-! ## Small numeric helpers on the function representation -/
+
+/-- Dot product of two length-`p` vectors. -/
+def dotFn {p : Nat} (u v : Fin p → α) : α :=
+  (List.finRange p).foldl (fun s i => s + u i * v i) 0
+
+/-- Euclidean norm of a length-`p` vector. -/
+def normFn {p : Nat} (v : Fin p → α) : α :=
+  MathFunctions.sqrt (dotFn v v)
+
+/-- Decide `x < y` as a `Bool` (via the `Context`'s decidable `>`). -/
+def ltBool (x y : α) : Bool := Context.gtBool y x
+
+/-! ## Cholesky factorization
+
+For a symmetric positive-definite `A`, compute the lower-triangular `L` with `A = L · Lᵀ`.
+
+The columns are computed left to right. Column `j` uses only columns `0 .. j-1`:
+
+- diagonal:  `L[j,j] = sqrt(A[j,j] - Σ_{k<j} L[j,k]²)`
+- below:     `L[i,j] = (A[i,j] - Σ_{k<j} L[i,k]·L[j,k]) / L[j,j]`   for `i > j`
+- above:     `L[i,j] = 0`                                           for `i < j`
+-/
+
+/--
+The list of columns of the Cholesky factor `L`, as length-`n` vectors, computed left to right.
+Element `j` of the result is column `j` of `L`. Built by a left fold so that when column `j` is
+formed, `cols` already holds columns `0 .. j-1`.
+-/
+def choleskyColsFn {n : Nat} (A : Fin n → Fin n → α) : List (Fin n → α) :=
+  (List.finRange n).foldl (fun cols j =>
+    -- Σ_{k<j} L[j,k]²  (the already-computed columns evaluated at row `j`).
+    let sumsq := (cols.map (fun ck => ck j)).foldl (fun s x => s + x * x) 0
+    let Ljj := MathFunctions.sqrt (A j j - sumsq)
+    let colj : Fin n → α := fun i =>
+      if i.val < j.val then 0
+      else if i.val == j.val then Ljj
+      else
+        -- Σ_{k<j} L[i,k]·L[j,k]
+        let s := (cols.map (fun ck => ck i * ck j)).foldl (fun acc x => acc + x) 0
+        (A i j - s) / Ljj
+    cols ++ [colj]) []
+
+/-- Cholesky factor as a function: `L[i,j] = (choleskyColsFn A)[j] i`. -/
+def choleskyFn {n : Nat} (A : Fin n → Fin n → α) : Fin n → Fin n → α :=
+  let cols := choleskyColsFn A
+  fun i j => (cols.getD j.val (fun _ => 0)) i
+
+/--
+Cholesky factorization of a symmetric positive-definite matrix `A`, returning the
+lower-triangular factor `L` with `A = L · Lᵀ`.
+
+PyTorch analogue: `torch.linalg.cholesky(A)`.
+-/
+def choleskySpec {n : Nat} (A : Tensor α (.dim n (.dim n .scalar))) :
+    Tensor α (.dim n (.dim n .scalar)) :=
+  ofMatFn (choleskyFn (toMatFn A))
+
+/-! ## QR factorization (Gram–Schmidt)
+
+For `A : m × n`, produce `Q : m × n` with orthonormal columns and `R : n × n` upper-triangular
+such that `A = Q · R`. Each `r[k,j]` is taken against the original column `a` (the classical
+ordering), which matches modified Gram–Schmidt in exact arithmetic but is less stable in `Float`.
+-/
+
+/-- Internal state for the Gram–Schmidt fold: computed `Q` columns and `R` columns so far. -/
+structure GSState (m n : Nat) (α : Type) where
+  /-- Orthonormal `Q` columns produced so far (each of length `m`). -/
+  qs : List (Fin m → α)
+  /-- `R` columns produced so far (each of length `n`, upper-triangular). -/
+  rcols : List (Fin n → α)
+
+/--
+Run Gram–Schmidt over the columns of `A`, returning the `Q` columns and `R` columns.
+Column `j` is orthogonalized against the previous `Q` columns; a zero residual gives a zero column.
+-/
+def gramSchmidtFn {m n : Nat} (A : Fin m → Fin n → α) : GSState m n α :=
+  (List.finRange n).foldl (fun (st : GSState m n α) j =>
+    let a : Fin m → α := fun i => A i j
+    -- r[k,j] = qₖ · a   for each previously computed column k
+    let rkjs : List α := st.qs.map (fun qk => dotFn qk a)
+    -- v = a - Σ r[k,j] qₖ
+    let v : Fin m → α := fun i =>
+      a i - (List.zip st.qs rkjs).foldl (fun acc (qk, r) => acc + r * qk i) 0
+    let rjj := normFn v
+    let qj : Fin m → α := fun i => if Context.gtBool rjj 0 then v i / rjj else 0
+    let rcolj : Fin n → α := fun k =>
+      if k.val < j.val then rkjs.getD k.val 0
+      else if k.val == j.val then rjj
+      else 0
+    { qs := st.qs ++ [qj], rcols := st.rcols ++ [rcolj] }) { qs := [], rcols := [] }
+
+/-- The `Q` factor (orthonormal columns) of the QR factorization of `A`. -/
+def qrQSpec {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) :
+    Tensor α (.dim m (.dim n .scalar)) :=
+  let st := gramSchmidtFn (toMatFn A)
+  ofMatFn (fun i j => (st.qs.getD j.val (fun _ => 0)) i)
+
+/-- The `R` factor (upper-triangular) of the QR factorization of `A`. -/
+def qrRSpec {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) :
+    Tensor α (.dim n (.dim n .scalar)) :=
+  let st := gramSchmidtFn (toMatFn A)
+  ofMatFn (fun k j => (st.rcols.getD j.val (fun _ => 0)) k)
+
+/--
+QR factorization of `A : m × n` via Gram–Schmidt, returning `(Q, R)` with
+`A = Q · R`, `Q` orthonormal columns, `R` upper-triangular.
+
+PyTorch analogue: `torch.linalg.qr(A)`.
+-/
+def qrSpec {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) :
+    Tensor α (.dim m (.dim n .scalar)) × Tensor α (.dim n (.dim n .scalar)) :=
+  (qrQSpec A, qrRSpec A)
+
+/-! ## Symmetric eigendecomposition (cyclic Jacobi)
+
+For a symmetric `A`, iteratively apply Givens rotations `J` that zero one off-diagonal entry at a
+time, accumulating `A ← Jᵀ A J` and `V ← V J`. Each `J` is orthogonal, so every step is an
+orthogonal similarity: the spectrum is preserved and `V` stays orthogonal. After enough sweeps the
+off-diagonal mass vanishes; the diagonal holds the eigenvalues and the columns of `V` are the
+eigenvectors.
+-/
+
+/-!
+The iteration below runs over an `Array (Array α)` representation rather than `Fin n → Fin n → α`.
+Arrays are strict values, so threading them through the rotation loop cannot build the deep closure
+chains that a functional representation would (one matrix product per rotation), which is what keeps
+execution cheap. We convert to/from `Spec.Tensor` only at the boundary.
+-/
+
+/-- Read entry `(i, j)` of an `Array (Array α)` matrix (`0` if out of bounds). -/
+def arrGet (M : Array (Array α)) (i j : Nat) : α := (M.getD i #[]).getD j 0
+
+/-- Materialize a matrix function into a strict `Array (Array α)`. -/
+def matToArr {n : Nat} (X : Fin n → Fin n → α) : Array (Array α) :=
+  Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n => X i j))
+
+/-- Matrix product `X · Y` of two `n × n` array matrices. -/
+def arrMatMul (n : Nat) (X Y : Array (Array α)) : Array (Array α) :=
+  Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n =>
+    (List.finRange n).foldl (fun s k => s + arrGet X i.val k.val * arrGet Y k.val j.val) 0))
+
+/-- Transpose of an `n × n` array matrix. -/
+def arrTr (n : Nat) (X : Array (Array α)) : Array (Array α) :=
+  Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n => arrGet X j.val i.val))
+
+/-- `n × n` identity as an array matrix. -/
+def arrId (n : Nat) : Array (Array α) :=
+  Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n => if i.val == j.val then 1 else 0))
+
+/--
+Givens rotation in the `(p, q)` plane as an array matrix:
+identity except `J[p,p]=J[q,q]=c`, `J[p,q]=s`, `J[q,p]=-s`.
+-/
+def arrGivens (n : Nat) (p q : Nat) (c s : α) : Array (Array α) :=
+  Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n =>
+    if i.val == p && j.val == p then c
+    else if i.val == q && j.val == q then c
+    else if i.val == p && j.val == q then s
+    else if i.val == q && j.val == p then -s
+    else if i.val == j.val then 1 else 0))
+
+/--
+Apply one Jacobi rotation that targets off-diagonal entry `(p, q)`, updating `(A, V)` as strict
+arrays. If `A[p,q]` is already (numerically) zero, the state is returned unchanged.
+
+The rotation parameters follow Golub & Van Loan:
+`τ = (A[q,q] - A[p,p]) / (2 A[p,q])`, `t = sign(τ)/(|τ| + sqrt(1+τ²))` (or `1` if `τ = 0`),
+`c = 1/sqrt(1+t²)`, `s = t·c`.
+-/
+def arrJacobiRotate (n : Nat) (A V : Array (Array α)) (p q : Nat) :
+    Array (Array α) × Array (Array α) :=
+  let apq := arrGet A p q
+  if Context.gtBool (MathFunctions.abs apq) 0 then
+    let τ := (arrGet A q q - arrGet A p p) / (Numbers.two * apq)
+    let absτ := MathFunctions.abs τ
+    let sgn : α := if ltBool τ 0 then Numbers.neg_one else 1
+    let t : α :=
+      if Context.gtBool absτ 0 then sgn / (absτ + MathFunctions.sqrt (1 + τ * τ)) else 1
+    let c := 1 / MathFunctions.sqrt (1 + t * t)
+    let s := t * c
+    let J := arrGivens n p q c s
+    (arrMatMul n (arrTr n J) (arrMatMul n A J), arrMatMul n V J)
+  else
+    (A, V)
+
+/-- All index pairs `(p, q)` with `p < q`, in row-major order (one cyclic Jacobi sweep). -/
+def jacobiPairs (n : Nat) : List (Nat × Nat) :=
+  (List.range n).flatMap (fun p =>
+    (List.range n).filterMap (fun q => if p < q then some (p, q) else none))
+
+/-- One Jacobi sweep: rotate through every `(p, q)` pair with `p < q`. -/
+def arrJacobiSweep (n : Nat) (st : Array (Array α) × Array (Array α)) :
+    Array (Array α) × Array (Array α) :=
+  (jacobiPairs n).foldl (fun s pq => arrJacobiRotate n s.1 s.2 pq.1 pq.2) st
+
+/-- Run `sweeps` Jacobi sweeps starting from `(A, I)`, returning the rotated `A` and accumulated `V`. -/
+def arrJacobiRun (n : Nat) (A : Array (Array α)) (sweeps : Nat) :
+    Array (Array α) × Array (Array α) :=
+  (List.range sweeps).foldl (fun st _ => arrJacobiSweep n st) (A, arrId n)
+
+/--
+Full symmetric eigendecomposition of `A` via cyclic Jacobi, returning `(eigenvalues, eigenvectors)`.
+
+The eigenvalues are the diagonal of the rotated matrix; the eigenvectors are the **columns** of the
+returned matrix `V` (so `eigenvectors[i, j]` is the `i`-th component of the `j`-th eigenvector).
+`sweeps` controls how many Jacobi sweeps to run (default `12`).
+
+Unlike `eigendecompSpec`, this recovers **all** `n` eigenpairs.
+
+PyTorch analogue: `torch.linalg.eigh(A)`.
+-/
+def symEigJacobiSpec {n : Nat} (A : Tensor α (.dim n (.dim n .scalar))) (sweeps : Nat := 12) :
+    Tensor α (.dim n .scalar) × Tensor α (.dim n (.dim n .scalar)) :=
+  let (Af, Vf) := arrJacobiRun n (matToArr (toMatFn A)) sweeps
+  (ofVecFn (fun i => arrGet Af i.val i.val), ofMatFn (fun i j => arrGet Vf i.val j.val))
+
+/-! ## Singular value decomposition
+
+For `A : m × n`, form the symmetric `M = Aᵀ·A : n × n`, eigendecompose it as `M = V Λ Vᵀ`,
+take `σ = sqrt(max(Λ, 0))`, and recover `U` columns as `uⱼ = A vⱼ / σⱼ` (zero when `σⱼ = 0`).
+Then `A = U · diag(σ) · Vᵀ`. This is the simplest reference SVD and is exact (up to the Jacobi
+sweep count) for `A` of full column rank.
+-/
+
+/--
+Singular value decomposition of `A : m × n` returning `(U, σ, V)` with
+`A = U · diag(σ) · Vᵀ`, `U : m × n` with orthonormal columns (full-rank case), `σ : n` the singular
+values, and `V : n × n` orthogonal.
+
+`sweeps` controls the Jacobi sweep count used for the eigendecomposition of `Aᵀ·A`.
+
+PyTorch analogue: `torch.linalg.svd(A, full_matrices=False)` (PyTorch returns `Vᵀ`, not `V`).
+-/
+def svdSpec {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) (sweeps : Nat := 12) :
+    Tensor α (.dim m (.dim n .scalar)) × Tensor α (.dim n .scalar) ×
+      Tensor α (.dim n (.dim n .scalar)) :=
+  let Af := toMatFn A
+  -- M = Aᵀ A  (n × n, symmetric PSD), as a strict array matrix
+  let M : Array (Array α) :=
+    Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n =>
+      (List.finRange m).foldl (fun s k => s + Af k i * Af k j) 0))
+  let (Mf, Vf) := arrJacobiRun n M sweeps
+  let σ : Fin n → α := fun j =>
+    let d := arrGet Mf j.val j.val
+    MathFunctions.sqrt (if ltBool d 0 then 0 else d)
+  let U : Fin m → Fin n → α := fun i j =>
+    let sj := σ j
+    if Context.gtBool sj 0 then
+      ((List.finRange n).foldl (fun s k => s + Af i k * arrGet Vf k.val j.val) 0) / sj
+    else 0
+  (ofMatFn U, ofVecFn σ, ofMatFn (fun i j => arrGet Vf i.val j.val))
+
+end Spec

From 35923c4cbc47241741911a3dea7d0f356e46c707 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sat, 30 May 2026 15:13:00 -0700
Subject: [PATCH 02/22] Add correctness theorems for matrix factorizations (CHD
 foundation)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Formal verification companion to the spec-layer factorizations added in e0d08ac
(PATCH 01/22). The motivation is Computational Hypergraph Discovery
(https://github.com/TheoBourdais/ComputationalHypergraphDiscovery), a Gaussian-
process / kernel-ridge method whose numerical core is the full symmetric
eigendecomposition of a kernel matrix K (solve_variationnal, find_gamma, Z_test).
This commit gives that core a verified linear-algebra foundation.

New: NN/Proofs/Tensor/Basic/Factorizations.lean (sorry-free, over ℝ via Mathlib).
It uses a refinement architecture: specification predicates on Mathlib matrices,
with the CHD consequences proved from the specification alone, independently of
the Float algorithm.

- Specifications: IsCholesky, IsQR, IsSymEig, IsSVD.
- CHD foundation (consumed by solve_variationnal / find_gamma / Z_test):
  * IsSymEig.add_smul_inv — the regularized inverse
    (K + γI)⁻¹ = V·diag(1/(λ+γ))·Vᵀ, proved from orthogonality of V.
  * IsSymEig.trace_eq / det_eq (trace K = Σλ, det K = Πλ), isHermitian.
  * IsSVD.gram_isSymEig — an SVD of A is an eigendecomposition of the Gram
    matrix AᵀA with eigenvalues σ², the form CHD actually builds.
- Exact algorithm invariants: trace_orthogonal_conj / det_orthogonal_conj
  (every Jacobi sweep is a spectrum-preserving orthogonal similarity),
  givens_normSq (c²+s²=1), choleskyFn_lower_triangular (via a reusable
  List.foldl indexing lemma getD_foldl_finRange).
- Residual certificate (Tier D): symEig_reconstruction_residual and
  symEig_frobenius_residual prove the reconstruction error equals the
  off-diagonal mass of the rotated matrix exactly; isSymEig_of_diagonal closes
  the loop in the zero-residual limit. This stands in for an a-priori
  convergence proof, which is out of reach here (Mathlib v4.30.0 has no Jacobi
  convergence theory, and Float never diagonalizes exactly), and it matches
  the runtime assertLt checks.
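
The algebra behind the regularized inverse and the residual certificate can be
sketched as below. This is a paper derivation over ℝ, not a transcription of
the Lean statements, whose exact form may differ. Here D denotes the rotated
matrix VᵀAV and off(D) = D − diag(D) is notation introduced for this sketch.

```latex
\begin{aligned}
K = V \Lambda V^\top,\quad V^\top V = V V^\top = I
  \ &\Longrightarrow\ K + \gamma I = V (\Lambda + \gamma I) V^\top, \\
(K + \gamma I)^{-1}
  &= V \,\mathrm{diag}\!\left(\frac{1}{\lambda_i + \gamma}\right) V^\top
  \qquad (\lambda_i + \gamma \neq 0 \text{ for all } i), \\
D = V^\top A V
  \ &\Longrightarrow\ A - V \,\mathrm{diag}(D)\, V^\top = V \,\mathrm{off}(D)\, V^\top, \\
\lVert A - V \,\mathrm{diag}(D)\, V^\top \rVert_F
  &= \lVert \mathrm{off}(D) \rVert_F .
\end{aligned}
```

The last line holds because the Frobenius norm is invariant under
multiplication by orthogonal matrices, so a small off-diagonal mass after the
sweeps bounds the reconstruction error directly.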

Scope honesty: the exact algebraic reconstruction of the finite float folds
(A=L·Lᵀ, A=QR) is documented as the remaining increment (it needs a prefix-fold
induction plus per-pivot positivity); the spec-level facts CHD relies on do not
depend on it.

Blueprint: new chapter Ch4_Verification/Factorizations.lean ("Matrix
Factorizations for Kernel Methods"), registered in Guide.lean and cross-linked
from ScientificMLVerification.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 NN/Proofs/Tensor/Basic/Factorizations.lean    | 387 ++++++++++++++++++
 blueprint/TorchLeanBlueprint/Guide.lean       |   3 +
 .../Ch4_Verification/Factorizations.lean      | 115 ++++++
 .../ScientificMLVerification.lean             |   6 +
 5 files changed, 512 insertions(+)
 create mode 100644 NN/Proofs/Tensor/Basic/Factorizations.lean
 create mode 100644 blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean

diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index f9a7b2f..c079721 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -9,6 +9,7 @@ module
 public import NN.Proofs.Tensor.Basic.Core
 public import NN.Proofs.Tensor.Basic.Folds
 public import NN.Proofs.Tensor.Basic.LinearAlgebra
+public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.BoundsNorms
 public import NN.Proofs.Tensor.Basic.Algebra
 
diff --git a/NN/Proofs/Tensor/Basic/Factorizations.lean b/NN/Proofs/Tensor/Basic/Factorizations.lean
new file mode 100644
index 0000000..e4f5fc6
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/Factorizations.lean
@@ -0,0 +1,387 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Spec.Core.Tensor.Factorizations
+public import NN.Proofs.Tensor.Basic.LinearAlgebra
+public import Mathlib.Analysis.Matrix.Spectrum
+public import Mathlib.Analysis.Matrix.PosDef
+public import Mathlib.Analysis.Matrix.HermitianFunctionalCalculus
+public import Mathlib.LinearAlgebra.Matrix.PosDef
+public import Mathlib.LinearAlgebra.UnitaryGroup
+public import Mathlib.LinearAlgebra.Matrix.NonsingularInverse
+public import Mathlib.Data.List.GetD
+
+/-!
+# Correctness of the matrix factorizations (foundation for CHD)
+
+This file provides the **formal correctness theorems** for the spec-layer factorizations in
+[`NN.Spec.Core.Tensor.Factorizations`](../../../Spec/Core/Tensor/Factorizations.lean)
+(`choleskySpec`, `qrSpec`, `symEigJacobiSpec`, `svdSpec`). The motivation is
+[Computational Hypergraph Discovery](https://github.com/TheoBourdais/ComputationalHypergraphDiscovery):
+a Gaussian-process / kernel-ridge method whose numerical core reduces to the **full symmetric
+eigendecomposition** of a kernel matrix `K`. CHD's `solve_variationnal`, `find_gamma` and `Z_test`
+are all expressed through the eigendecomposition of `K`, so a verified linear-algebra foundation is a
+prerequisite for formalizing CHD.
+
+## Architecture (refinement)
+
+* **Specifications** (`IsCholesky`, `IsQR`, `IsSymEig`, `IsSVD`) are `Prop`s on Mathlib
+  `Matrix (Fin n) (Fin n) ℝ`. Mathlib's `Matrix m n α` is *definitionally* `m → n → α`, so the
+  function representation `Spec.toMatFn` produced by the executable specs bridges for free.
+* **Foundation theorems** (this is what CHD consumes) are proved from the *specifications*, independent
+  of the executable algorithm, using only orthogonality of `V` and Mathlib's basic matrix algebra.
+* **Algorithm theorems** connect the executable `Spec.*Fn` defs to the specifications. Proven here:
+  the Cholesky factor is lower-triangular (`choleskyFn_lower_triangular`); the Jacobi/SVD routines
+  satisfy their *exact* invariants — orthogonal similarity preserves trace/determinant
+  (`trace_orthogonal_conj`, `det_orthogonal_conj`), the Givens rotation is orthogonal
+  (`givens_normSq`), and the eigendecomposition is exact in the zero-residual limit
+  (`isSymEig_of_diagonal`), with the finite-sweep error captured a-posteriori by
+  `symEig_frobenius_residual`.
+
+## Scope honesty
+
+`A = V · diag(λ) · Vᵀ` is **not** an exact theorem for the finite-sweep / floating-point Jacobi output;
+it is the *target* certified at runtime by the `assertLt` checks in `NN/Examples/Factorization`, and
+bounded a-posteriori here by `symEig_frobenius_residual` (residual = off-diagonal mass of `Af`).
+Mathlib v4.30.0 has no Jacobi convergence theory and `Float` never diagonalizes exactly, so no
+a-priori convergence theorem is possible.
+
+The exact algebraic reconstruction of the executable *finite* factorizations — `A = L · Lᵀ` for
+`choleskyFn` (under SPD pivots) and `A = Q · R`, `Qᵀ Q = 1` for `gramSchmidtFn` (under full column
+rank) — is the remaining increment: it requires an induction relating the `List.foldl` prefix at step
+`j` to the first `j` columns (extending `getD_foldl_finRange`) plus the per-pivot positivity discharge.
+The specification-level consequences CHD needs (above) are independent of that algorithmic step.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open Matrix
+open scoped BigOperators
+
+variable {n : Nat}
+
+/-! ## Specifications
+
+The mathematical meaning of each factorization, as a predicate over real matrices. Over `ℝ`,
+`star = id` so `conjTranspose = transpose`; we phrase everything with `ᵀ`.
+-/
+
+/-- `L` is a Cholesky factor of `A`: lower-triangular with `A = L · Lᵀ`. -/
+def IsCholesky (A L : Matrix (Fin n) (Fin n) ℝ) : Prop :=
+  (∀ i j, i < j → L i j = 0) ∧ A = L * Lᵀ
+
+/-- `(Q, R)` is a QR factorization of `A`: `Q` has orthonormal columns, `R` is upper-triangular,
+`A = Q · R`. -/
+def IsQR {m k : Nat} (A Q : Matrix (Fin m) (Fin k) ℝ) (R : Matrix (Fin k) (Fin k) ℝ) : Prop :=
+  Qᵀ * Q = 1 ∧ (∀ i j, j < i → R i j = 0) ∧ A = Q * R
+
+/-- `(Λ, V)` is a symmetric eigendecomposition of `A`: `V` orthogonal, `A = V · diag(Λ) · Vᵀ`. -/
+def IsSymEig (A : Matrix (Fin n) (Fin n) ℝ) (Λ : Fin n → ℝ) (V : Matrix (Fin n) (Fin n) ℝ) : Prop :=
+  Vᵀ * V = 1 ∧ A = V * Matrix.diagonal Λ * Vᵀ
+
+/-- `(U, σ, V)` is a (thin) SVD of `A`: `U`, `V` have orthonormal columns, `σ ≥ 0`,
+`A = U · diag(σ) · Vᵀ`. -/
+def IsSVD {m k : Nat} (A U : Matrix (Fin m) (Fin k) ℝ) (σ : Fin k → ℝ)
+    (V : Matrix (Fin k) (Fin k) ℝ) : Prop :=
+  Uᵀ * U = 1 ∧ Vᵀ * V = 1 ∧ (∀ j, 0 ≤ σ j) ∧ A = U * Matrix.diagonal σ * Vᵀ
+
+/-! ## Foundation theorems consumed by CHD
+
+These follow from the *specification*, not from any particular algorithm. -/
+
+/-- A symmetric eigendecomposition exhibits `A` as Hermitian (here: symmetric, over `ℝ`). -/
+theorem IsSymEig.isHermitian {A : Matrix (Fin n) (Fin n) ℝ} {Λ V}
+    (h : IsSymEig A Λ V) : A.IsHermitian := by
+  obtain ⟨_, hA⟩ := h
+  unfold Matrix.IsHermitian
+  rw [hA]
+  simp [Matrix.mul_assoc]
+
+/-- The orthogonal factor `V` of a symmetric eigendecomposition satisfies `V · Vᵀ = 1` as well as
+`Vᵀ · V = 1`. -/
+theorem IsSymEig.mul_transpose_self {A : Matrix (Fin n) (Fin n) ℝ} {Λ V}
+    (h : IsSymEig A Λ V) : V * Vᵀ = 1 :=
+  mul_eq_one_comm.mp h.1
+
+/-! ### The kernel-ridge / `solve_variationnal` identity
+
+CHD repeatedly forms `(K + γ I)⁻¹ b`. Diagonalizing `K = V diag(λ) Vᵀ` turns this into a per-eigenvalue
+rescaling `V diag(1/(λ+γ)) Vᵀ b`, which is the basis of `solve_variationnal`, `find_gamma` and the
+`Z_test`. The identity below is proved purely from orthogonality of `V` (no appeal to Mathlib's own
+spectral decomposition), so it holds for *any* eigendecomposition the algorithm returns. -/
+
+/-- Conjugating a diagonal by an orthogonal `V` is inverted by conjugating the entrywise inverse:
+`(V · diag(d) · Vᵀ) · (V · diag(d⁻¹) · Vᵀ) = 1` when every `d i ≠ 0`. -/
+theorem orthogonal_conj_diagonal_mul_inv {V : Matrix (Fin n) (Fin n) ℝ} (hV : Vᵀ * V = 1)
+    {d : Fin n → ℝ} (hd : ∀ i, d i ≠ 0) :
+    (V * Matrix.diagonal d * Vᵀ) * (V * Matrix.diagonal (fun i => (d i)⁻¹) * Vᵀ) = 1 := by
+  have hdd : (Matrix.diagonal d) * (Matrix.diagonal (fun i => (d i)⁻¹))
+      = (1 : Matrix (Fin n) (Fin n) ℝ) := by
+    rw [Matrix.diagonal_mul_diagonal]
+    rw [show (fun i => d i * (d i)⁻¹) = (fun _ : Fin n => (1 : ℝ)) from
+      funext fun i => mul_inv_cancel₀ (hd i)]
+    exact Matrix.diagonal_one
+  calc
+    (V * Matrix.diagonal d * Vᵀ) * (V * Matrix.diagonal (fun i => (d i)⁻¹) * Vᵀ)
+        = V * Matrix.diagonal d * (Vᵀ * V) * Matrix.diagonal (fun i => (d i)⁻¹) * Vᵀ := by
+          simp [Matrix.mul_assoc]
+    _ = V * (Matrix.diagonal d * Matrix.diagonal (fun i => (d i)⁻¹)) * Vᵀ := by
+          rw [hV]; simp [Matrix.mul_assoc]
+    _ = V * Vᵀ := by rw [hdd, Matrix.mul_one]
+    _ = 1 := mul_eq_one_comm.mp hV
+
+/-- `K + γ I` rewritten through the eigendecomposition: `V · diag(λ + γ) · Vᵀ`. -/
+theorem IsSymEig.add_smul_eq {A : Matrix (Fin n) (Fin n) ℝ} {Λ V}
+    (h : IsSymEig A Λ V) (γ : ℝ) :
+    A + γ • (1 : Matrix (Fin n) (Fin n) ℝ)
+      = V * Matrix.diagonal (fun i => Λ i + γ) * Vᵀ := by
+  obtain ⟨hV, hA⟩ := h
+  have hVV : V * Vᵀ = 1 := mul_eq_one_comm.mp hV
+  have hsplit : Matrix.diagonal (fun i => Λ i + γ)
+      = Matrix.diagonal Λ + γ • (1 : Matrix (Fin n) (Fin n) ℝ) := by
+    ext i j
+    by_cases hij : i = j <;>
+      simp [Matrix.add_apply, Matrix.smul_apply, hij]
+  rw [hsplit, hA]
+  rw [Matrix.mul_add, Matrix.add_mul]
+  congr 1
+  rw [Matrix.mul_smul, Matrix.smul_mul, Matrix.mul_one, hVV]
+
+/-- **Regularized inverse / `solve_variationnal`.** For `γ` avoiding `-λᵢ`, the regularized system
+`K + γ I` is inverted by per-eigenvalue rescaling: `(K + γ I)⁻¹ = V · diag(1/(λ + γ)) · Vᵀ`. -/
+theorem IsSymEig.add_smul_inv {A : Matrix (Fin n) (Fin n) ℝ} {Λ V}
+    (h : IsSymEig A Λ V) (γ : ℝ) (hγ : ∀ i, Λ i + γ ≠ 0) :
+    (A + γ • (1 : Matrix (Fin n) (Fin n) ℝ))⁻¹
+      = V * Matrix.diagonal (fun i => (Λ i + γ)⁻¹) * Vᵀ := by
+  apply Matrix.inv_eq_right_inv
+  rw [h.add_smul_eq γ]
+  exact orthogonal_conj_diagonal_mul_inv h.1 hγ
+
+/-! ### Spectral trace and determinant (used by `find_gamma` / model-evidence terms) -/
+
+/-- `trace K = Σ λᵢ`. -/
+theorem IsSymEig.trace_eq {A : Matrix (Fin n) (Fin n) ℝ} {Λ V}
+    (h : IsSymEig A Λ V) : A.trace = ∑ i, Λ i := by
+  obtain ⟨hV, hA⟩ := h
+  rw [hA, Matrix.trace_mul_comm, ← Matrix.mul_assoc, hV, Matrix.one_mul,
+    Matrix.trace_diagonal]
+
+/-- `det K = Π λᵢ`. -/
+theorem IsSymEig.det_eq {A : Matrix (Fin n) (Fin n) ℝ} {Λ V}
+    (h : IsSymEig A Λ V) : A.det = ∏ i, Λ i := by
+  obtain ⟨hV, hA⟩ := h
+  have hVV : V * Vᵀ = 1 := mul_eq_one_comm.mp hV
+  rw [hA, Matrix.det_mul, Matrix.det_mul, Matrix.det_diagonal,
+    mul_right_comm, ← Matrix.det_mul, hVV, Matrix.det_one, one_mul]
+
+/-! ### SVD ⟹ eigendecomposition of the Gram matrix
+
+CHD forms the kernel/Gram matrix `K = Aᵀ A` and eigendecomposes it. An SVD of `A` *is* such an
+eigendecomposition, with eigenvalues `σᵢ²` and the same orthogonal `V`. -/
+
+/-- The right singular vectors `V` of `A` diagonalize the Gram matrix `Aᵀ A`, with eigenvalues `σᵢ²`. -/
+theorem IsSVD.gram_isSymEig {m k : Nat} {A U : Matrix (Fin m) (Fin k) ℝ}
+    {σ : Fin k → ℝ} {V} (h : IsSVD A U σ V) :
+    IsSymEig (Aᵀ * A) (fun i => σ i ^ 2) V := by
+  obtain ⟨hU, hV, _, hA⟩ := h
+  refine ⟨hV, ?_⟩
+  have hσσ : Matrix.diagonal σ * Matrix.diagonal σ
+      = Matrix.diagonal (fun i => σ i ^ 2) := by
+    rw [Matrix.diagonal_mul_diagonal]; simp [pow_two]
+  rw [hA, Matrix.transpose_mul, Matrix.transpose_mul, Matrix.transpose_transpose,
+    Matrix.diagonal_transpose]
+  -- V Dᵀ Uᵀ · U D Vᵀ  with Dᵀ = D
+  calc
+    V * (Matrix.diagonal σ * Uᵀ) * (U * Matrix.diagonal σ * Vᵀ)
+        = V * Matrix.diagonal σ * (Uᵀ * U) * Matrix.diagonal σ * Vᵀ := by
+          simp [Matrix.mul_assoc]
+    _ = V * (Matrix.diagonal σ * Matrix.diagonal σ) * Vᵀ := by
+          rw [hU]; simp [Matrix.mul_assoc]
+    _ = V * Matrix.diagonal (fun i => σ i ^ 2) * Vᵀ := by rw [hσσ]
+
+/-! ## Tier B — exact structural & invariant facts
+
+These hold *exactly* (no convergence/rounding caveat). The orthogonal-similarity invariants below are
+the precise sense in which the Jacobi iteration is faithful: every sweep is an orthogonal similarity
+`A ← Jᵀ A J`, so trace, determinant and spectrum are preserved at every step, independent of how far
+the off-diagonal has been driven down. -/
+
+/-- Orthogonal similarity preserves the trace: `trace (V · M · Vᵀ) = trace M` when `Vᵀ · V = 1`. -/
+theorem trace_orthogonal_conj {V M : Matrix (Fin n) (Fin n) ℝ} (hV : Vᵀ * V = 1) :
+    (V * M * Vᵀ).trace = M.trace := by
+  rw [Matrix.trace_mul_comm, ← Matrix.mul_assoc, hV, Matrix.one_mul]
+
+/-- Orthogonal similarity preserves the determinant: `det (V · M · Vᵀ) = det M` when `Vᵀ · V = 1`. -/
+theorem det_orthogonal_conj {V M : Matrix (Fin n) (Fin n) ℝ} (hV : Vᵀ * V = 1) :
+    (V * M * Vᵀ).det = M.det := by
+  have hVV : V * Vᵀ = 1 := mul_eq_one_comm.mp hV
+  rw [Matrix.det_mul, Matrix.det_mul, mul_right_comm, ← Matrix.det_mul, hVV, Matrix.det_one,
+    one_mul]
+
+/-- **Givens rotation is orthogonal.** With `c = 1/√(1+t²)` and `s = t·c` (the parameters
+`arrJacobiRotate` uses), the rotation satisfies `c² + s² = 1`, so every Jacobi step is an orthogonal
+transformation. -/
+theorem givens_normSq (t : ℝ) :
+    (1 / Real.sqrt (1 + t ^ 2)) ^ 2 + (t * (1 / Real.sqrt (1 + t ^ 2))) ^ 2 = 1 := by
+  have hpos : (0 : ℝ) < 1 + t ^ 2 := by positivity
+  have hsqrt : Real.sqrt (1 + t ^ 2) ^ 2 = 1 + t ^ 2 := Real.sq_sqrt hpos.le
+  have hne : (1 + t ^ 2) ≠ 0 := ne_of_gt hpos
+  have hc2 : (1 / Real.sqrt (1 + t ^ 2)) ^ 2 = 1 / (1 + t ^ 2) := by
+    rw [div_pow, one_pow, hsqrt]
+  rw [mul_pow, hc2]
+  field_simp
+
+/-! ### Fold-indexing for the column-building specs
+
+`choleskyColsFn` and `gramSchmidtFn` build their output with a left fold that appends one column per
+index. The lemmas here read off the column produced at a given position, bridging the executable
+`List.foldl` form to per-entry reasoning. They are generic over the appended-value function `g`. -/
+
+section FoldSnoc
+
+variable {β : Type _} {ι : Type _}
+
+/-- A left fold that appends one element per input grows the accumulator by `l.length`. -/
+private theorem length_foldl_snoc (g : List β → ι → β) (l : List ι) (acc : List β) :
+    (l.foldl (fun s a => s ++ [g s a]) acc).length = acc.length + l.length := by
+  induction l generalizing acc with
+  | nil => simp
+  | cons a t ih =>
+      rw [List.foldl_cons, ih]
+      simp only [List.length_append, List.length_cons, List.length_nil]
+      omega
+
+/-- A fold that only appends never changes the entries already in the accumulator. -/
+private theorem getD_foldl_snoc_lt (g : List β → ι → β) (d : β) (l : List ι) (acc : List β)
+    (k : Nat) (hk : k < acc.length) :
+    (l.foldl (fun s a => s ++ [g s a]) acc).getD k d = acc.getD k d := by
+  induction l generalizing acc with
+  | nil => simp
+  | cons a t ih =>
+      rw [List.foldl_cons,
+        ih (acc ++ [g acc a]) (by rw [List.length_append]; omega),
+        List.getD_append _ _ _ _ hk]
+
+/-- The element at position `j` of the snoc-fold over `finRange n` is `g` applied to the fold of the
+length-`j` prefix and the index `j`. -/
+private theorem getD_foldl_finRange (g : List β → Fin n → β) (d : β) (j : Fin n) :
+    ((List.finRange n).foldl (fun s a => s ++ [g s a]) []).getD j.val d
+      = g (((List.finRange n).take j.val).foldl (fun s a => s ++ [g s a]) []) j := by
+  have hjlen : j.val < (List.finRange n).length := by
+    rw [List.length_finRange]; exact j.isLt
+  have htake : (List.finRange n).take (j.val + 1)
+      = (List.finRange n).take j.val ++ [j] := by
+    rw [List.take_succ_eq_append_getElem hjlen]
+    congr 1
+    simp [List.getElem_finRange]
+  have hplen : (((List.finRange n).take j.val).foldl (fun s a => s ++ [g s a]) []).length
+      = j.val := by
+    rw [length_foldl_snoc, List.length_nil, List.length_take, List.length_finRange, Nat.zero_add,
+      Nat.min_eq_left (Nat.le_of_lt j.isLt)]
+  calc
+    ((List.finRange n).foldl (fun s a => s ++ [g s a]) []).getD j.val d
+        = (((List.finRange n).drop (j.val + 1)).foldl (fun s a => s ++ [g s a])
+            ((List.finRange n).take (j.val + 1) |>.foldl (fun s a => s ++ [g s a]) [])).getD
+              j.val d := by
+          conv_lhs => rw [show List.finRange n
+            = (List.finRange n).take (j.val + 1) ++ (List.finRange n).drop (j.val + 1) from
+            (List.take_append_drop _ _).symm]
+          rw [List.foldl_append]
+    _ = ((List.finRange n).take (j.val + 1) |>.foldl (fun s a => s ++ [g s a]) []).getD j.val d := by
+          apply getD_foldl_snoc_lt
+          rw [length_foldl_snoc, List.length_nil, List.length_take, List.length_finRange,
+            Nat.zero_add]
+          omega
+    _ = g (((List.finRange n).take j.val).foldl (fun s a => s ++ [g s a]) []) j := by
+          rw [htake, List.foldl_append, List.foldl_cons, List.foldl_nil]
+          rw [List.getD_append_right _ _ _ _ (le_of_eq hplen), hplen, Nat.sub_self]
+          rfl
+
+end FoldSnoc
+
+/-! ### Cholesky factor is lower-triangular
+
+A structural fact about the executable `choleskyFn`, proved directly from the column fold: every
+entry strictly above the diagonal is forced to `0` by the construction. -/
+
+/-- Reading an entry of a matrix tensor built by `ofMatFn` returns the underlying function value. -/
+theorem get2_ofMatFn {m k : Nat} (f : Fin m → Fin k → ℝ) (i : Fin m) (j : Fin k) :
+    Spec.get2 (Spec.ofMatFn f) i j = f i j := rfl
+
+/-- The executable Cholesky factor is lower-triangular: entries strictly above the diagonal vanish. -/
+theorem choleskyFn_lower_triangular (A : Fin n → Fin n → ℝ) {i j : Fin n} (hij : i.val < j.val) :
+    Spec.choleskyFn A i j = 0 := by
+  unfold Spec.choleskyFn Spec.choleskyColsFn
+  rw [getD_foldl_finRange]
+  rw [if_pos hij]
+
+/-- Tensor-level statement: the Cholesky factor `choleskySpec A` is lower-triangular. -/
+theorem choleskySpec_lower_triangular (A : Spec.Tensor ℝ (.dim n (.dim n .scalar)))
+    {i j : Fin n} (hij : i.val < j.val) :
+    Spec.get2 (Spec.choleskySpec A) i j = 0 := by
+  rw [show Spec.choleskySpec A = Spec.ofMatFn (Spec.choleskyFn (Spec.toMatFn A)) from rfl,
+    get2_ofMatFn]
+  exact choleskyFn_lower_triangular _ hij
+
+/-! ## Tier D — convergence as an a-posteriori residual certificate
+
+The cyclic Jacobi iteration produces `(Λ, V)` from the rotated matrix `Af = Vᵀ A V` (an *exact*
+orthogonal similarity — see `trace_orthogonal_conj`), with `Λ` the diagonal of `Af`. After finitely
+many sweeps `Af` is only *approximately* diagonal, so `A = V·diag(Λ)·Vᵀ` does not hold exactly (and
+never does in floating point). Mathlib v4.30.0 has no Jacobi convergence theory, so instead of an
+*a-priori* convergence proof we give the *a-posteriori* certificate: the reconstruction residual is
+exactly the orthogonal conjugation of the off-diagonal part of `Af`, hence its Frobenius mass equals
+the off-diagonal mass — which the runtime `assertLt` checks in `NN/Examples/Factorization` bound on
+concrete inputs. -/
+
+/-- The off-diagonal part of a matrix (`0` iff the matrix is diagonal). -/
+def offDiagonal (M : Matrix (Fin n) (Fin n) ℝ) : Matrix (Fin n) (Fin n) ℝ :=
+  M - Matrix.diagonal (fun i => M i i)
+
+/-- **Exact residual identity.** Reconstructing with the diagonal of `Af` leaves exactly the orthogonal
+conjugation of `Af`'s off-diagonal part: `A − V·diag(Af)·Vᵀ = V · offDiag(Af) · Vᵀ`. -/
+theorem symEig_reconstruction_residual {A V Af : Matrix (Fin n) (Fin n) ℝ}
+    (hA : A = V * Af * Vᵀ) :
+    A - V * Matrix.diagonal (fun i => Af i i) * Vᵀ = V * offDiagonal Af * Vᵀ := by
+  rw [hA, offDiagonal, Matrix.mul_sub, Matrix.sub_mul]
+
+/-- **Frobenius residual certificate.** The squared Frobenius reconstruction error
+`‖A − V·diag(Af)·Vᵀ‖²` equals the squared Frobenius off-diagonal mass `‖offDiag(Af)‖²` (expressed as
+`trace(Rᵀ R)`), because orthogonal conjugation preserves the Frobenius norm. In particular it is `0`
+iff `Af` is diagonal — the exact sense in which "more Jacobi sweeps ⟹ smaller residual". -/
+theorem symEig_frobenius_residual {A V Af : Matrix (Fin n) (Fin n) ℝ} (hV : Vᵀ * V = 1)
+    (hA : A = V * Af * Vᵀ) :
+    ((A - V * Matrix.diagonal (fun i => Af i i) * Vᵀ)ᵀ
+        * (A - V * Matrix.diagonal (fun i => Af i i) * Vᵀ)).trace
+      = ((offDiagonal Af)ᵀ * offDiagonal Af).trace := by
+  rw [symEig_reconstruction_residual hA]
+  have hB : (V * offDiagonal Af * Vᵀ)ᵀ = V * (offDiagonal Af)ᵀ * Vᵀ := by
+    rw [Matrix.transpose_mul, Matrix.transpose_mul, Matrix.transpose_transpose, Matrix.mul_assoc]
+  have key : (V * offDiagonal Af * Vᵀ)ᵀ * (V * offDiagonal Af * Vᵀ)
+      = V * ((offDiagonal Af)ᵀ * offDiagonal Af) * Vᵀ := by
+    rw [hB]
+    calc
+      (V * (offDiagonal Af)ᵀ * Vᵀ) * (V * offDiagonal Af * Vᵀ)
+          = V * (offDiagonal Af)ᵀ * (Vᵀ * V) * offDiagonal Af * Vᵀ := by simp [Matrix.mul_assoc]
+      _ = V * ((offDiagonal Af)ᵀ * offDiagonal Af) * Vᵀ := by rw [hV]; simp [Matrix.mul_assoc]
+  rw [key]
+  exact trace_orthogonal_conj hV
+
+/-- **Conditional correctness of Jacobi.** When the rotated matrix `Af = Vᵀ A V` is diagonal (zero
+residual — the limit the sweeps drive toward), the Jacobi output `(diag Af, V)` is an *exact*
+symmetric eigendecomposition `IsSymEig`. Together with `symEig_frobenius_residual` this is the precise
+correctness statement: orthogonality and the orthogonal similarity always hold; full diagonalization
+holds exactly in the zero-residual limit. -/
+theorem isSymEig_of_diagonal {A V Af : Matrix (Fin n) (Fin n) ℝ} (hV : Vᵀ * V = 1)
+    (hA : A = V * Af * Vᵀ) (hdiag : Af = Matrix.diagonal (fun i => Af i i)) :
+    IsSymEig A (fun i => Af i i) V :=
+  ⟨hV, by rw [hA]; conv_lhs => rw [hdiag]⟩
+
+end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide.lean b/blueprint/TorchLeanBlueprint/Guide.lean
index 59903a3..b97833d 100644
--- a/blueprint/TorchLeanBlueprint/Guide.lean
+++ b/blueprint/TorchLeanBlueprint/Guide.lean
@@ -34,6 +34,7 @@ import TorchLeanBlueprint.Guide.Ch4_Verification.ApproximationTheory
 import TorchLeanBlueprint.Guide.Ch4_Verification.ClassicalMLProofs
 import TorchLeanBlueprint.Guide.Ch4_Verification.ProbabilityAndGradients
 import TorchLeanBlueprint.Guide.Ch4_Verification.ScientificMLVerification
+import TorchLeanBlueprint.Guide.Ch4_Verification.Factorizations
 import TorchLeanBlueprint.Guide.Ch4_Verification.Certificates
 import TorchLeanBlueprint.Guide.Ch4_Verification.FP32Soundness
 import TorchLeanBlueprint.Guide.Ch4_Verification.TwoStageWorkflows
@@ -233,6 +234,8 @@ into precise mathematical statements.
 
 {include 2 TorchLeanBlueprint.Guide.Ch4_Verification.ScientificMLVerification}
 
+{include 2 TorchLeanBlueprint.Guide.Ch4_Verification.Factorizations}
+
 {include 2 TorchLeanBlueprint.Guide.Ch4_Verification.Certificates}
 
 {include 2 TorchLeanBlueprint.Guide.Ch4_Verification.TwoStageWorkflows}
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
new file mode 100644
index 0000000..13f8df3
--- /dev/null
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -0,0 +1,115 @@
+import VersoManual
+import VersoBlueprint
+
+open Verso.Genre Manual
+
+#doc (Manual) "Matrix Factorizations for Kernel Methods" =>
+%%%
+tag := "matrix-factorizations"
+%%%
+
+Kernel and Gaussian-process methods do not reduce to a single forward pass. Their numerical core is a
+matrix factorization. The motivating target here is
+[Computational Hypergraph Discovery](https://github.com/TheoBourdais/ComputationalHypergraphDiscovery)
+(CHD): a Gaussian-process / kernel-ridge method that recovers the dependency structure of a system by
+repeatedly solving regularized kernel systems and testing the resulting variances. Every quantity CHD
+inspects — the variational solution, the noise/ridge parameter, and the `Z`-test — is a function of the
+*full symmetric eigendecomposition* of a kernel matrix `K`.
+
+TorchLean previously had only a power-iteration stub that recovers the *largest* eigenpair. The spec
+layer now provides real, shape-indexed reference factorizations in
+[`NN.Spec.Core.Tensor.Factorizations`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Spec/Core/Tensor/Factorizations.lean):
+Cholesky (`choleskySpec`), QR via modified Gram–Schmidt (`qrSpec`), the full symmetric
+eigendecomposition via cyclic Jacobi (`symEigJacobiSpec`), and the SVD (`svdSpec`). The correctness
+theorems live in
+[`NN.Proofs.Tensor.Basic.Factorizations`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/Factorizations.lean).
+
+# What "verified factorization" can and cannot mean
+
+A subtle but decisive point governs the whole chapter. The executable specs are
+`Context`-polymorphic and run over Lean's native `Float` (IEEE binary64). Two of them — Cholesky and
+QR — are *finite* constructions, so over the reals they reconstruct their input exactly under the
+usual success hypotheses. The other two — the cyclic Jacobi eigensolver and the SVD built on it — are
+*iterative*. After a finite number of sweeps the rotated matrix is only approximately diagonal, and in
+floating point it is never exactly diagonal. Mathlib v4.30.0 contains no Jacobi convergence theory.
+
+So `A = V · diag(λ) · Vᵀ` is _not_ an a-priori theorem about the floating-point output. The honest
+verification therefore splits into three kinds of statement, all proved over `ℝ`:
+
+- *Specification consequences*: facts CHD consumes, proved from a predicate that says "these matrices
+  form an eigendecomposition", independent of any algorithm.
+- *Exact invariants*: properties the algorithm satisfies on the nose at every step.
+- *A-posteriori certificate*: an exact identity equating the reconstruction residual with the
+  off-diagonal mass, with the runtime `assertLt` checks supplying the numeric bound on concrete inputs.
+
+# Specification consequences (the CHD foundation)
+
+The specification predicate is `IsSymEig A Λ V`: an orthogonal `V` (`Vᵀ V = 1`) with
+`A = V · diag(Λ) · Vᵀ`. From it the kernel-method facts follow without reference to the solver.
+
+The central one is the regularized inverse behind `solve_variationnal`. CHD repeatedly forms
+`(K + γ I)⁻¹ b`; diagonalizing turns this into a per-eigenvalue rescaling:
+
+$$`(K+\gamma I)^{-1} = V\,\operatorname{diag}\!\left(\tfrac{1}{\lambda_i+\gamma}\right) V^\top,
+\qquad \gamma \neq -\lambda_i.`
+
+This is `IsSymEig.add_smul_inv`, proved purely from orthogonality of `V` (so it holds for *any*
+eigendecomposition the solver returns, not only Mathlib's canonical one). The supporting rewrite
+`IsSymEig.add_smul_eq` expresses `K + γI = V · diag(λ + γ) · Vᵀ`, and
+`orthogonal_conj_diagonal_mul_inv` is the reusable fact that conjugating a diagonal by an orthogonal
+matrix is inverted by conjugating the entrywise inverse.
+
+The scalar summaries used by `find_gamma` and the evidence terms are `IsSymEig.trace_eq`
+(`trace K = Σ λᵢ`) and `IsSymEig.det_eq` (`det K = Π λᵢ`). Symmetry itself is `IsSymEig.isHermitian`.
+
+CHD actually builds the Gram matrix `K = Aᵀ A`. `IsSVD.gram_isSymEig` records that an SVD of `A` is
+exactly an eigendecomposition of that Gram matrix, with eigenvalues `σᵢ²` and the same orthogonal `V` —
+connecting the SVD spec to the eigendecomposition foundation.
+
+# Exact invariants of the algorithms
+
+Some properties hold exactly, with no convergence or rounding caveat, and these pin down the precise
+sense in which the iterative solver is faithful.
+
+The cyclic Jacobi iteration applies Givens rotations `J` with `A ← Jᵀ A J` and `V ← V J`. Each `J` is
+orthogonal: with `c = 1/√(1+t²)` and `s = t·c` (the parameters the implementation uses),
+`givens_normSq` proves `c² + s² = 1`. Consequently every sweep is an *orthogonal similarity*, and
+`trace_orthogonal_conj` and `det_orthogonal_conj` show that the trace and determinant of the running
+matrix equal those of the original at every step — the spectrum is preserved exactly, however far the
+off-diagonal has been driven down.
+
+For the finite Cholesky construction, `choleskyFn_lower_triangular` (and its tensor-level form
+`choleskySpec_lower_triangular`) proves the factor is lower-triangular: entries above the diagonal
+vanish by construction. The proof reads the column produced at each position out of the `List.foldl`
+that builds the factor, via the reusable indexing lemma `getD_foldl_finRange`.
+
+# The a-posteriori residual certificate
+
+For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact
+residual identity. Writing `Af = Vᵀ A V` for the rotated matrix and `Λ` for its diagonal,
+`symEig_reconstruction_residual` shows
+
+$$`A - V\,\operatorname{diag}(A_f)\,V^\top \;=\; V\,\operatorname{offDiag}(A_f)\,V^\top,`
+
+so the reconstruction error is exactly the orthogonal conjugation of the off-diagonal part of `Af`.
+Because orthogonal conjugation preserves the Frobenius norm, `symEig_frobenius_residual` upgrades this
+to an equality of squared Frobenius masses:
+
+$$`\bigl\|A - V\,\operatorname{diag}(A_f)\,V^\top\bigr\|_F^2
+   \;=\; \bigl\|\operatorname{offDiag}(A_f)\bigr\|_F^2,`
+
+expressed in Lean as an equality of `trace(Rᵀ R)` terms. The residual is `0` exactly when `Af` is
+diagonal, which is the precise meaning of "more Jacobi sweeps shrink the error". And in that
+zero-residual limit, `isSymEig_of_diagonal` shows the solver output `(diag Af, V)` is an exact
+`IsSymEig` decomposition. The numeric `assertLt` reconstruction checks in
+`NN/Examples/Factorization` are concrete instances of this certificate: they bound the off-diagonal
+mass on specific matrices.
+
+# What remains
+
+The exact algebraic reconstruction of the *finite* executable factorizations — `A = L · Lᵀ` for the
+Cholesky column fold under positive pivots, and `A = Q · R` with `Qᵀ Q = 1` for Gram–Schmidt under
+full column rank — is the natural next increment. It needs an induction relating the `List.foldl`
+prefix at step `j` to the first `j` produced columns (a strengthening of `getD_foldl_finRange`)
+together with the per-pivot positivity discharge from `Matrix.PosDef`. The specification-level facts
+the kernel methods rely on are independent of that step, so the CHD foundation is already in place.
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/ScientificMLVerification.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/ScientificMLVerification.lean
index 4f8a12a..e9a497a 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/ScientificMLVerification.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/ScientificMLVerification.lean
@@ -146,3 +146,9 @@ models. These sections give readers a clear place to start when their question i
 
 The answer is the same pattern we use elsewhere: small certificate formats, explicit parsers,
 checked predicates, and theorem statements that say exactly which mathematical claim follows.
+
+Kernel and Gaussian-process methods bring their own version of this discipline through matrix
+factorizations rather than corridors or residuals. The next section, Matrix Factorizations for Kernel
+Methods, develops the eigendecomposition foundation that Computational Hypergraph Discovery relies on,
+including the same split between exact specification consequences and a-posteriori numeric
+certificates.

From e9d851ffc36ca849ceb87e6d99b39db53f08531d Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sat, 30 May 2026 15:50:44 -0700
Subject: [PATCH 03/22] =?UTF-8?q?Add=20exact=20Cholesky=20reconstruction?=
 =?UTF-8?q?=20A=20=3D=20L=C2=B7L=E1=B5=80=20(finite-fold=20increment)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Prove the exact algebraic reconstruction of the finite executable Cholesky
factorization over ℝ, the increment documented in Basic/Factorizations.lean.

`isCholesky_of_pos`: for symmetric A with positive executable pivots
(0 < choleskyFn A j j — the success condition over ℝ), L = choleskyFn A is a
genuine Cholesky factor (lower-triangular, A = L·Lᵀ), satisfying the
`IsCholesky` spec. Tensor-level corollary `choleskySpec_reconstruction`.

Method: a general snoc-fold read lemma (`getD_foldl_snoc_read`) reads the j-th
built column as the step function on the length-j prefix; `prefix_eq_map`
identifies that prefix with the first j columns of L; `take_map_sum_eq` rewrites
the code's List.foldl sums as masked Finset partial sums. Positive pivots
discharge the √-radicand and divisor side conditions; symmetry of A lifts the
lower-triangular reconstruction to the full matrix.

Blueprint: new "Exact Cholesky reconstruction" section; "What remains" narrowed
to QR (dual-list GSState structure-fold + the orthonormality invariant).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
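Reviewer note (not part of the commit): a minimal sketch of what the snoc-fold
read lemma says, on a toy instance. The names `step` and `snocFold` are
hypothetical and introduced only for illustration; they stand in for `cholStep`
and `choleskyColsFn`. The sketch uses core Lean only and assumes nothing about
the TorchLean sources.

```lean
-- Toy step function (hypothetical): add the input to the sum of the history.
def step (s : List Nat) (a : Nat) : Nat := s.foldl (· + ·) a

-- Snoc-fold of the same shape as the Cholesky column fold: one append per input.
def snocFold (l : List Nat) : List Nat :=
  l.foldl (fun s a => s ++ [step s a]) []

#eval snocFold [1, 2, 3, 4]  -- [1, 3, 7, 15]

-- Reading position 2 of the full fold equals `step` applied to the fold of the
-- length-2 prefix and the input at index 2, as in `getD_foldl_snoc_read`.
example :
    (snocFold [1, 2, 3, 4]).getD 2 0 = step (snocFold ([1, 2, 3, 4].take 2)) 3 := by
  decide
```

The general lemma proves this for every `g`, `l`, and `k < l.length`; the toy
instance only checks one concrete case.
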
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Basic/FactorizationsReconstruction.lean   | 377 ++++++++++++++++++
 .../Ch4_Verification/Factorizations.lean      |  38 +-
 3 files changed, 410 insertions(+), 6 deletions(-)
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean

diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index c079721..9e44d53 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -10,6 +10,7 @@ public import NN.Proofs.Tensor.Basic.Core
 public import NN.Proofs.Tensor.Basic.Folds
 public import NN.Proofs.Tensor.Basic.LinearAlgebra
 public import NN.Proofs.Tensor.Basic.Factorizations
+public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
 public import NN.Proofs.Tensor.Basic.BoundsNorms
 public import NN.Proofs.Tensor.Basic.Algebra
 
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean b/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean
new file mode 100644
index 0000000..386d77c
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean
@@ -0,0 +1,377 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Spec.Core.Tensor.Factorizations
+public import NN.Proofs.Tensor.Basic.Factorizations
+public import Mathlib.Data.List.GetD
+public import Mathlib.Algebra.BigOperators.Fin
+
+/-!
+# Exact reconstruction of the finite Cholesky factorization
+
+This file proves the *exact* algebraic reconstruction of the finite executable Cholesky
+factorization from [`NN.Spec.Core.Tensor.Factorizations`](../../../Spec/Core/Tensor/Factorizations.lean),
+the increment promised in `NN.Proofs.Tensor.Basic.Factorizations`. Unlike the iterative Jacobi/SVD
+routines (whose reconstruction is only an a-posteriori residual certificate), Cholesky is a *finite*
+construction, so over `ℝ` it reconstructs its input on the nose under the success hypothesis.
+
+## Main result
+
+`isCholesky_of_pos`: for a symmetric `A : Fin n → Fin n → ℝ` whose executable Cholesky pivots are all
+positive (`0 < choleskyFn A j j`, the exact condition under which the algorithm succeeds over `ℝ`),
+the factor `L = choleskyFn A` satisfies the specification `Spec.Factorization.IsCholesky`:
+it is lower-triangular and `A = L · Lᵀ`. `choleskySpec_reconstruction` is the tensor-level corollary.
+
+## Method
+
+The executable factor is built by a `List.foldl` that snocs one column per index. The core technical
+device is `getD_foldl_snoc_read`, a general lemma reading the `j`-th element of such a fold as the
+step function applied to the length-`j` prefix. From it, `prefix_eq_map` identifies the prefix of
+columns with the first `j` columns of the final factor `L`, and `take_map_sum_eq` turns the code's
+`List.foldl` sums into masked `Finset` partial sums. The positive-pivot hypothesis discharges the two
+side conditions (`√` radicand `> 0` for the diagonal, divisor `≠ 0` for the below-diagonal entries).
+
+## Scope
+
+The QR factorization's exact reconstruction (`A = Q · R` from `gramSchmidtFn`, plus the orthonormality
+`Qᵀ Q = 1`) is the remaining finite-fold increment. It needs analogous read lemmas for the
+`GSState` *dual-list* structure-fold (the step writes both `qs` and `rcols`), and `Qᵀ Q = 1`
+additionally requires the Gram–Schmidt orthogonality invariant, which Mathlib only provides for its
+own `gramSchmidt`, not for this executable variant.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization.Reconstruction
+
+open Matrix
+open scoped BigOperators
+
+variable {n : Nat}
+
+/-! ## List/Finset bridges -/
+
+/-- A left `+`-fold accumulates the list sum. -/
+theorem foldl_add_eq_sum (l : List ℝ) (a : ℝ) :
+    l.foldl (· + ·) a = a + l.sum := by
+  induction l generalizing a with
+  | nil => simp
+  | cons x t ih => rw [List.foldl_cons, ih, List.sum_cons]; ring
+
+/-- A left `s + x*x`-fold accumulates the sum of squares. -/
+theorem foldl_addsq_eq_sum (l : List ℝ) (a : ℝ) :
+    l.foldl (fun s x => s + x * x) a = a + (l.map (fun x => x * x)).sum := by
+  induction l generalizing a with
+  | nil => simp
+  | cons x t ih => rw [List.foldl_cons, ih, List.map_cons, List.sum_cons]; ring
+
+/-- A `Fin n` sum is the `List.sum` of the mapped `finRange n`. -/
+theorem finsum_eq_finRange_sum (h : Fin n → ℝ) :
+    ∑ i, h i = ((List.finRange n).map h).sum := by
+  rw [← List.sum_toFinset _ (List.nodup_finRange n)]
+  · simp [List.toFinset_finRange]
+
+/-! ## General snoc-fold read lemmas -/
+
+section FoldSnoc
+
+variable {β : Type _} {ι : Type _}
+
+/-- A left fold that appends one element per input grows the accumulator by `l.length`. -/
+theorem length_foldl_snoc (g : List β → ι → β) (l : List ι) (acc : List β) :
+    (l.foldl (fun s a => s ++ [g s a]) acc).length = acc.length + l.length := by
+  induction l generalizing acc with
+  | nil => simp
+  | cons a t ih =>
+      rw [List.foldl_cons, ih]
+      simp only [List.length_append, List.length_cons, List.length_nil]
+      omega
+
+/-- A fold that only appends never changes an index already inside the accumulator. -/
+theorem getD_foldl_snoc_lt (g : List β → ι → β) (d : β) (l : List ι) (acc : List β)
+    (k : Nat) (hk : k < acc.length) :
+    (l.foldl (fun s a => s ++ [g s a]) acc).getD k d = acc.getD k d := by
+  induction l generalizing acc with
+  | nil => simp
+  | cons a t ih =>
+      rw [List.foldl_cons,
+        ih (acc ++ [g acc a]) (by rw [List.length_append]; omega),
+        List.getD_append _ _ _ _ hk]
+
+/-- The element at position `k` of the snoc-fold over an arbitrary list `l` is `g` applied to the
+fold of the length-`k` prefix and the `k`-th element. -/
+theorem getD_foldl_snoc_read (g : List β → ι → β) (d : β) (l : List ι) (k : Nat)
+    (hk : k < l.length) :
+    (l.foldl (fun s a => s ++ [g s a]) []).getD k d
+      = g ((l.take k).foldl (fun s a => s ++ [g s a]) []) (l[k]'hk) := by
+  have htake : l.take (k + 1) = l.take k ++ [l[k]'hk] := List.take_succ_eq_append_getElem hk
+  have hplen : ((l.take k).foldl (fun s a => s ++ [g s a]) []).length = k := by
+    rw [length_foldl_snoc, List.length_nil, List.length_take, Nat.zero_add,
+      Nat.min_eq_left (le_of_lt hk)]
+  calc
+    (l.foldl (fun s a => s ++ [g s a]) []).getD k d
+        = ((l.drop (k + 1)).foldl (fun s a => s ++ [g s a])
+            ((l.take (k + 1)).foldl (fun s a => s ++ [g s a]) [])).getD k d := by
+          conv_lhs => rw [show l = l.take (k + 1) ++ l.drop (k + 1) from
+            (List.take_append_drop _ _).symm]
+          rw [List.foldl_append]
+    _ = ((l.take (k + 1)).foldl (fun s a => s ++ [g s a]) []).getD k d := by
+          apply getD_foldl_snoc_lt
+          rw [length_foldl_snoc, List.length_nil, List.length_take, Nat.zero_add]
+          omega
+    _ = g ((l.take k).foldl (fun s a => s ++ [g s a]) []) (l[k]'hk) := by
+          rw [htake, List.foldl_append, List.foldl_cons, List.foldl_nil]
+          rw [List.getD_append_right _ _ _ _ (le_of_eq hplen), hplen, Nat.sub_self]
+          rfl
+
+end FoldSnoc
+
+/-! ## Cholesky: the column-building step
+
+`choleskyColsFn` is a left fold that snocs one column per index. `cholStep` names the function it
+appends, so that the read lemmas above can be specialized to it. -/
+
+/-- The column appended at index `j` of the Cholesky fold, given the columns `cols` built so far. -/
+noncomputable def cholStep (A : Fin n → Fin n → ℝ) (cols : List (Fin n → ℝ)) (j : Fin n) :
+    Fin n → ℝ :=
+  let sumsq := (cols.map (fun ck => ck j)).foldl (fun s x => s + x * x) 0
+  let Ljj := MathFunctions.sqrt (A j j - sumsq)
+  fun i =>
+    if i.val < j.val then 0
+    else if i.val == j.val then Ljj
+    else
+      let s := (cols.map (fun ck => ck i * ck j)).foldl (fun acc x => acc + x) 0
+      (A i j - s) / Ljj
+
+/-- `choleskyColsFn` is the snoc-fold appending `cholStep`. -/
+theorem choleskyColsFn_eq (A : Fin n → Fin n → ℝ) :
+    Spec.choleskyColsFn A
+      = (List.finRange n).foldl (fun cols j => cols ++ [cholStep A cols j]) [] := rfl
+
+/-- The diagonal value produced by `cholStep`. -/
+theorem cholStep_diag (A : Fin n → Fin n → ℝ) (cols : List (Fin n → ℝ)) (j : Fin n) :
+    cholStep A cols j j
+      = MathFunctions.sqrt (A j j - (cols.map (fun ck => ck j)).foldl (fun s x => s + x * x) 0) := by
+  simp only [cholStep]
+  rw [if_neg (lt_irrefl _), if_pos (beq_self_eq_true _)]
+
+/-- The below-diagonal value produced by `cholStep`. -/
+theorem cholStep_offdiag (A : Fin n → Fin n → ℝ) (cols : List (Fin n → ℝ)) {i j : Fin n}
+    (hij : j.val < i.val) :
+    cholStep A cols j i
+      = (A i j - (cols.map (fun ck => ck i * ck j)).foldl (fun acc x => acc + x) 0)
+          / MathFunctions.sqrt (A j j - (cols.map (fun ck => ck j)).foldl (fun s x => s + x * x) 0) := by
+  simp only [cholStep]
+  rw [if_neg (by omega), if_neg (by rw [beq_iff_eq]; omega)]
+
+/-- The length-`j` prefix of Cholesky columns built before index `j`. -/
+noncomputable def prefixCols (A : Fin n → Fin n → ℝ) (j : Fin n) : List (Fin n → ℝ) :=
+  ((List.finRange n).take j.val).foldl (fun cols k => cols ++ [cholStep A cols k]) []
+
+/-- Entry `(i, j)` of the executable Cholesky factor equals `cholStep` evaluated on the prefix. -/
+theorem choleskyFn_eq_step (A : Fin n → Fin n → ℝ) (i j : Fin n) :
+    Spec.choleskyFn A i j = cholStep A (prefixCols A j) j i := by
+  have hlen : j.val < (List.finRange n).length := by rw [List.length_finRange]; exact j.isLt
+  show (Spec.choleskyColsFn A).getD j.val (fun _ => 0) i = _
+  rw [choleskyColsFn_eq, getD_foldl_snoc_read (fun cols k => cholStep A cols k) (fun _ => 0)
+    (List.finRange n) j.val hlen]
+  have hj : (List.finRange n)[j.val]'hlen = j := by simp [List.getElem_finRange]
+  rw [hj]
+  rfl
+
+/-- The prefix of Cholesky columns is exactly the first `j` columns of the final factor `L`,
+each presented as the function `r ↦ L r k`. -/
+theorem prefix_eq_map (A : Fin n → Fin n → ℝ) (j : Fin n) :
+    prefixCols A j
+      = ((List.finRange n).take j.val).map (fun k => fun r => Spec.choleskyFn A r k) := by
+  have hjval : ((List.finRange n).take j.val).length = j.val := by
+    rw [List.length_take, List.length_finRange, Nat.min_eq_left (le_of_lt j.isLt)]
+  apply List.ext_getElem
+  · unfold prefixCols
+    rw [length_foldl_snoc (fun cols k => cholStep A cols k), List.length_nil, Nat.zero_add,
+      List.length_map]
+  · intro p h1 h2
+    rw [List.length_map, hjval] at h2
+    have hpn : p < n := lt_trans h2 j.isLt
+    rw [List.getElem_map]
+    have hidx : ((List.finRange n).take j.val)[p]'(by rw [hjval]; exact h2) = (⟨p, hpn⟩ : Fin n) := by
+      rw [List.getElem_take, List.getElem_finRange]; exact Fin.ext rfl
+    rw [show (prefixCols A j)[p]'h1 = (prefixCols A j).getD p (fun _ => 0) from
+      (List.getD_eq_getElem _ _ h1).symm]
+    unfold prefixCols
+    rw [getD_foldl_snoc_read (fun cols k => cholStep A cols k) (fun _ => 0)
+      ((List.finRange n).take j.val) p (by rw [hjval]; exact h2)]
+    rw [List.take_take, Nat.min_eq_left (le_of_lt h2), hidx]
+    funext r
+    rw [choleskyFn_eq_step]
+    rfl
+
+/-! ### List/Finset partial-sum bridges -/
+
+/-- Every element of a `finRange` prefix has index below the cut. -/
+theorem mem_take_finRange {m : Nat} {x : Fin n} (hx : x ∈ (List.finRange n).take m) :
+    x.val < m := by
+  obtain ⟨p, hp, hpx⟩ := List.getElem_of_mem hx
+  rw [List.length_take, List.length_finRange] at hp
+  rw [List.getElem_take, List.getElem_finRange] at hpx
+  subst hpx
+  exact lt_of_lt_of_le hp (Nat.min_le_left m n)
+
+/-- Every element of a `finRange` tail has index at least the cut. -/
+theorem mem_drop_finRange {m : Nat} {x : Fin n} (hx : x ∈ (List.finRange n).drop m) :
+    m ≤ x.val := by
+  obtain ⟨p, hp, hpx⟩ := List.getElem_of_mem hx
+  rw [List.getElem_drop, List.getElem_finRange] at hpx
+  subst hpx
+  exact Nat.le_add_right m p
+
+/-- Mapping `f` over a `finRange` prefix and summing equals the masked full sum. -/
+theorem take_map_sum_eq (m : Nat) (f : Fin n → ℝ) :
+    (((List.finRange n).take m).map f).sum = ∑ k : Fin n, if k.val < m then f k else 0 := by
+  rw [finsum_eq_finRange_sum]
+  conv_rhs => rw [show (List.finRange n)
+    = (List.finRange n).take m ++ (List.finRange n).drop m from (List.take_append_drop _ _).symm]
+  rw [List.map_append, List.sum_append]
+  have htake : ((List.finRange n).take m).map (fun k => if k.val < m then f k else 0)
+      = ((List.finRange n).take m).map f :=
+    List.map_congr_left (fun x hx => if_pos (mem_take_finRange hx))
+  have hdrop : (((List.finRange n).drop m).map (fun k => if k.val < m then f k else 0)).sum = 0 := by
+    rw [List.sum_eq_zero]
+    intro y hy
+    rw [List.mem_map] at hy
+    obtain ⟨x, hx, rfl⟩ := hy
+    exact if_neg (by have := mem_drop_finRange hx; omega)
+  rw [htake, hdrop, add_zero]
+
+/-- The Cholesky cross-sum equals the masked partial dot product of rows `i` and `j` of `L`. -/
+theorem cross_sum_eq (A : Fin n → Fin n → ℝ) (i j : Fin n) :
+    ((prefixCols A j).map (fun ck => ck i * ck j)).foldl (fun acc x => acc + x) 0
+      = ∑ k : Fin n, if k.val < j.val then Spec.choleskyFn A i k * Spec.choleskyFn A j k else 0 := by
+  rw [prefix_eq_map, List.map_map, foldl_add_eq_sum, zero_add,
+    show ((fun ck : Fin n → ℝ => ck i * ck j) ∘ fun k => fun r => Spec.choleskyFn A r k)
+      = (fun k => Spec.choleskyFn A i k * Spec.choleskyFn A j k) from rfl]
+  exact take_map_sum_eq j.val (fun k => Spec.choleskyFn A i k * Spec.choleskyFn A j k)
+
+/-- The Cholesky diagonal sum-of-squares equals the masked partial squared norm of row `j` of `L`. -/
+theorem sumsq_eq (A : Fin n → Fin n → ℝ) (j : Fin n) :
+    ((prefixCols A j).map (fun ck => ck j)).foldl (fun s x => s + x * x) 0
+      = ∑ k : Fin n, if k.val < j.val then Spec.choleskyFn A j k * Spec.choleskyFn A j k else 0 := by
+  rw [prefix_eq_map, List.map_map, foldl_addsq_eq_sum, zero_add, List.map_map,
+    show ((fun x : ℝ => x * x) ∘ ((fun ck : Fin n → ℝ => ck j) ∘ fun k => fun r => Spec.choleskyFn A r k))
+      = (fun k => Spec.choleskyFn A j k * Spec.choleskyFn A j k) from rfl]
+  exact take_map_sum_eq j.val (fun k => Spec.choleskyFn A j k * Spec.choleskyFn A j k)
+
+/-! ### Closed-form entries of the executable Cholesky factor -/
+
+/-- Over `ℝ`, the `Context` square root is `Real.sqrt`. -/
+theorem mfsqrt_eq (x : ℝ) : MathFunctions.sqrt x = Real.sqrt x := rfl
+
+/-- The diagonal entry of `L` in closed form: `L[j,j] = √(A[j,j] − Σ_{k<j} L[j,k]²)`. -/
+theorem choleskyFn_diag_eq (A : Fin n → Fin n → ℝ) (j : Fin n) :
+    Spec.choleskyFn A j j
+      = Real.sqrt (A j j
+          - ∑ k, if k.val < j.val then Spec.choleskyFn A j k * Spec.choleskyFn A j k else 0) := by
+  rw [choleskyFn_eq_step, cholStep_diag, sumsq_eq, mfsqrt_eq]
+
+/-- The below-diagonal entry of `L` in closed form:
+`L[i,j] = (A[i,j] − Σ_{k<j} L[i,k]·L[j,k]) / L[j,j]` for `i > j`. -/
+theorem choleskyFn_offdiag_eq (A : Fin n → Fin n → ℝ) {i j : Fin n} (hij : j.val < i.val) :
+    Spec.choleskyFn A i j
+      = (A i j - ∑ k, if k.val < j.val then Spec.choleskyFn A i k * Spec.choleskyFn A j k else 0)
+          / Spec.choleskyFn A j j := by
+  rw [choleskyFn_eq_step A i j, cholStep_offdiag _ _ hij, cross_sum_eq, sumsq_eq, mfsqrt_eq,
+    ← choleskyFn_diag_eq]
+
+/-! ### Reconstruction `A = L · Lᵀ`
+
+Each entry of the product `L · Lᵀ` is reconstructed from the closed-form entries and the
+positive-pivot hypothesis (`0 < L[j,j]`), which is exactly the condition under which the executable
+Cholesky succeeds over `ℝ`. -/
+
+/-- Per-entry reconstruction for the lower part (`j ≤ i`): the `(i, j)` entry of `L · Lᵀ` is `A i j`. -/
+theorem choleskyFn_dot_eq (A : Fin n → Fin n → ℝ)
+    (hpos : ∀ j : Fin n, 0 < Spec.choleskyFn A j j) {i j : Fin n} (hji : j.val ≤ i.val) :
+    (∑ k, Spec.choleskyFn A i k * Spec.choleskyFn A j k) = A i j := by
+  set L := Spec.choleskyFn A with hL
+  have key : ∀ k : Fin n, L i k * L j k
+      = (if k.val < j.val then L i k * L j k else 0) + (if k = j then L i j * L j j else 0) := by
+    intro k
+    rcases lt_trichotomy k.val j.val with h | h | h
+    · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h
+      rw [if_pos h, if_neg hne, add_zero]
+    · have hkj : k = j := Fin.ext h
+      rw [if_neg (by omega), if_pos hkj, zero_add, hkj]
+    · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h
+      rw [if_neg (by omega), if_neg hne, add_zero,
+        show L j k = 0 from Spec.Factorization.choleskyFn_lower_triangular A h, mul_zero]
+  rw [show (∑ k, L i k * L j k)
+      = ∑ k, ((if k.val < j.val then L i k * L j k else 0) + (if k = j then L i j * L j j else 0))
+      from Finset.sum_congr rfl (fun k _ => key k),
+    Finset.sum_add_distrib, Finset.sum_ite_eq' Finset.univ j (fun _ => L i j * L j j)]
+  simp only [Finset.mem_univ, if_true]
+  rcases eq_or_lt_of_le hji with heq | hlt
+  · have hij' : i = j := Fin.ext heq.symm
+    subst hij'
+    have hrad : 0 < A i i - (∑ k, if k.val < i.val then L i k * L i k else 0) := by
+      have hp := hpos i
+      rw [hL, choleskyFn_diag_eq] at hp
+      exact Real.sqrt_pos.mp hp
+    have hsq : L i i * L i i = A i i - (∑ k, if k.val < i.val then L i k * L i k else 0) := by
+      conv_lhs => rw [hL, choleskyFn_diag_eq A i]
+      exact Real.mul_self_sqrt hrad.le
+    rw [hsq]; ring
+  · have hne : L j j ≠ 0 := ne_of_gt (hpos j)
+    have hmul : L i j * L j j
+        = A i j - (∑ k, if k.val < j.val then L i k * L j k else 0) := by
+      rw [hL, choleskyFn_offdiag_eq A hlt, div_mul_eq_mul_div, mul_div_assoc, div_self hne, mul_one]
+    rw [hmul]; ring
+
+/-- Per-entry reconstruction for all `(i, j)`, using symmetry of `A`. -/
+theorem choleskyFn_dot (A : Fin n → Fin n → ℝ) (hsymm : ∀ i j, A i j = A j i)
+    (hpos : ∀ j : Fin n, 0 < Spec.choleskyFn A j j) (i j : Fin n) :
+    (∑ k, Spec.choleskyFn A i k * Spec.choleskyFn A j k) = A i j := by
+  rcases le_total j.val i.val with h | h
+  · exact choleskyFn_dot_eq A hpos h
+  · rw [show (∑ k, Spec.choleskyFn A i k * Spec.choleskyFn A j k)
+        = ∑ k, Spec.choleskyFn A j k * Spec.choleskyFn A i k
+        from Finset.sum_congr rfl (fun k _ => mul_comm _ _),
+      choleskyFn_dot_eq A hpos h, hsymm j i]
+
+/-- **Exact Cholesky reconstruction.** For a symmetric `A` whose executable Cholesky pivots are all
+positive (`0 < L[j,j]`, the success condition over `ℝ`), the factor `L = choleskyFn A` is a genuine
+Cholesky factor: lower-triangular with `A = L · Lᵀ`. -/
+theorem isCholesky_of_pos (A : Fin n → Fin n → ℝ) (hsymm : ∀ i j, A i j = A j i)
+    (hpos : ∀ j : Fin n, 0 < Spec.choleskyFn A j j) :
+    Spec.Factorization.IsCholesky (Matrix.of A) (Matrix.of (Spec.choleskyFn A)) := by
+  refine ⟨?_, ?_⟩
+  · intro a b hab
+    show Spec.choleskyFn A a b = 0
+    exact Spec.Factorization.choleskyFn_lower_triangular A (Fin.lt_def.mp hab)
+  · ext i j
+    rw [Matrix.mul_apply]
+    simp only [Matrix.of_apply, Matrix.transpose_apply]
+    exact (choleskyFn_dot A hsymm hpos i j).symm
+
+/-- **Tensor-level Cholesky reconstruction.** For a symmetric tensor `A` whose `choleskySpec` pivots
+are positive, every entry of `A` is reconstructed by `L · Lᵀ`:
+`A[i,j] = Σ_k L[i,k] · L[j,k]`, with `L = choleskySpec A`. -/
+theorem choleskySpec_reconstruction (A : Spec.Tensor ℝ (.dim n (.dim n .scalar)))
+    (hsymm : ∀ i j, Spec.get2 A i j = Spec.get2 A j i)
+    (hpos : ∀ j : Fin n, 0 < Spec.get2 (Spec.choleskySpec A) j j) (i j : Fin n) :
+    Spec.get2 A i j
+      = ∑ k, Spec.get2 (Spec.choleskySpec A) i k * Spec.get2 (Spec.choleskySpec A) j k := by
+  have hg : ∀ a b, Spec.get2 (Spec.choleskySpec A) a b = Spec.choleskyFn (Spec.toMatFn A) a b := by
+    intro a b
+    rw [show Spec.choleskySpec A = Spec.ofMatFn (Spec.choleskyFn (Spec.toMatFn A)) from rfl,
+      Spec.Factorization.get2_ofMatFn]
+  simp only [hg]
+  show Spec.toMatFn A i j = _
+  refine (choleskyFn_dot (Spec.toMatFn A) (fun a b => hsymm a b) (fun b => ?_) i j).symm
+  rw [← hg b b]; exact hpos b
+
+end Spec.Factorization.Reconstruction
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 13f8df3..0297902 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -83,6 +83,29 @@ For the finite Cholesky construction, `choleskyFn_lower_triangular` (and its ten
 vanish by construction. The proof reads the column produced at each position out of the `List.foldl`
 that builds the factor, via the reusable indexing lemma `getD_foldl_finRange`.
 
+# Exact Cholesky reconstruction
+
+Cholesky is a _finite_ construction, so unlike the iterative routines it admits an exact
+reconstruction theorem — no residual, no convergence caveat. In
+[`NN.Proofs.Tensor.Basic.FactorizationsReconstruction`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean),
+`isCholesky_of_pos` proves that for a symmetric `A` whose executable pivots are all positive
+(`0 < L[j,j]`, exactly the condition under which the algorithm succeeds over the reals) the factor
+`L = choleskyFn A` is a genuine Cholesky factor:
+
+$$`L \text{ lower-triangular} \quad\text{and}\quad A = L\,L^\top.`
+
+The tensor-level corollary `choleskySpec_reconstruction` states the same per entry:
+`A[i,j] = Σ_k L[i,k]·L[j,k]`.
+
+The proof turns the executable algorithm — a `List.foldl` that snocs one column per index — into
+per-entry algebra. The reusable lemma `getD_foldl_snoc_read` reads the `j`-th column as the step
+function applied to the length-`j` prefix; `prefix_eq_map` then identifies that prefix with the first
+`j` columns of the final `L`, and `take_map_sum_eq` rewrites the code's `List.foldl` sums as masked
+`Finset` partial sums. Lower-triangularity collapses the matrix product to a partial sum plus a single
+pivot term, and the positive-pivot hypothesis discharges the two side conditions: `√` of a positive
+radicand for the diagonal (`Real.mul_self_sqrt`) and a non-zero divisor for the below-diagonal
+entries. Symmetry of `A` extends the lower-triangular reconstruction to the whole matrix.
+
 # The a-posteriori residual certificate
 
 For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact
@@ -107,9 +130,12 @@ mass on specific matrices.
 
 # What remains
 
-The exact algebraic reconstruction of the *finite* executable factorizations — `A = L · Lᵀ` for the
-Cholesky column fold under positive pivots, and `A = Q · R` with `Qᵀ Q = 1` for Gram–Schmidt under
-full column rank — is the natural next increment. It needs an induction relating the `List.foldl`
-prefix at step `j` to the first `j` produced columns (a strengthening of `getD_foldl_finRange`)
-together with the per-pivot positivity discharge from `Matrix.PosDef`. The specification-level facts
-the kernel methods rely on are independent of that step, so the CHD foundation is already in place.
+With Cholesky's exact reconstruction in place, the remaining finite-fold increment is the QR
+factorization: `A = Q · R` from modified Gram–Schmidt under full column rank, and the orthonormality
+`Qᵀ Q = 1`. The `A = Q · R` part is within reach of the same machinery, but `gramSchmidtFn` threads a
+`GSState` that snocs onto _two_ lists at once (the `Q` columns and the `R` columns), so it needs read
+lemmas for that dual-list structure-fold rather than the single-list `getD_foldl_snoc_read` used for
+Cholesky. The orthonormality `Qᵀ Q = 1` is harder still: it rests on the Gram–Schmidt orthogonality
+invariant, which Mathlib provides for its own `gramSchmidt` but not for this executable variant. The
+specification-level facts the kernel methods rely on are independent of these steps, so the CHD
+foundation is already in place.

From eaee2bc1127ee0cd22d081be5b56d540fed9d2c9 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sat, 30 May 2026 16:10:44 -0700
Subject: [PATCH 04/22] =?UTF-8?q?Add=20exact=20QR=20reconstruction=20A=20?=
 =?UTF-8?q?=3D=20Q=C2=B7R=20(finite-fold=20increment)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Prove the exact algebraic reconstruction of the finite executable QR
(modified Gram–Schmidt) factorization over ℝ, extending the Cholesky work.

`qr_mul_eq`: for A : Fin m → Fin n → ℝ whose executable R-pivots are positive
(0 < Rmat A j j — full column rank, the success condition), the factors satisfy
A = Q·R with R upper-triangular (`Rmat_upper_triangular`). `qrSpec_reconstruction`
is the tensor-level corollary.

Method: gramSchmidtFn threads a GSState that snocs onto both the Q-list and the
R-list at once. The appended values depend only on the Q-history, so the Q-list
is a single-list snoc-fold (`gs_proj_qs`, read by `getD_foldl_snoc_read`) and the
R-list is the Q-prefix tail `rTail` (read by `gs_fold_split` + `rTail_getD`). The
orthogonalization sum v = a − Σ rₖⱼqₖ (a List.zip fold) collapses to a map-fold
(`cross_fold_eq`) and then a masked Finset partial sum (`take_map_sum_eq`); the
positive pivot cancels the v/rⱼⱼ normalization exactly.

Not done (documented, not sorry): orthonormality Qᵀ Q = 1, which rests on the
Gram–Schmidt orthogonality invariant Mathlib only has for its own gramSchmidt.

Blueprint: new "Exact QR reconstruction" section; "What remains" narrowed to the
orthonormality invariant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
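Reviewer note (not part of the commit): a minimal sketch of the projection idea
behind `gs_proj_qs`, on a toy dual-list fold. The names `qToy`, `rToy`,
`dualFold`, and `qFold` are hypothetical; they only mimic the shape of the
`GSState` fold, in which both appended values depend on the first list alone.
The sketch uses core Lean only.

```lean
-- Toy appended values (hypothetical); both read only the first list `qs`.
def qToy (qs : List Nat) (a : Nat) : Nat := qs.length + a
def rToy (qs : List Nat) (a : Nat) : Nat := qs.foldl (· + ·) 0 * a

-- Dual-list fold: snoc onto both components at every step.
def dualFold (l : List Nat) : List Nat × List Nat :=
  l.foldl
    (fun (st : List Nat × List Nat) a => (st.1 ++ [qToy st.1 a], st.2 ++ [rToy st.1 a]))
    ([], [])

-- Single-list fold that ignores the second component entirely.
def qFold (l : List Nat) : List Nat :=
  l.foldl (fun qs a => qs ++ [qToy qs a]) []

#eval dualFold [5, 6, 7]  -- ([5, 7, 9], [0, 30, 84])

-- The first projection of the dual fold is the single-list fold.
example : (dualFold [5, 6, 7]).1 = qFold [5, 6, 7] := by decide
```

Because the first component never reads the second, the single-list read
lemma applies to it unchanged; the second component is then recovered from the
prefixes of the first.
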
 .../Basic/FactorizationsReconstruction.lean   | 362 +++++++++++++++++-
 .../Ch4_Verification/Factorizations.lean      |  31 +-
 2 files changed, 363 insertions(+), 30 deletions(-)

diff --git a/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean b/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean
index 386d77c..90cd5cb 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean
@@ -12,37 +12,43 @@ public import Mathlib.Data.List.GetD
 public import Mathlib.Algebra.BigOperators.Fin
 
 /-!
-# Exact reconstruction of the finite Cholesky factorization
+# Exact reconstruction of the finite factorizations (Cholesky and QR)
 
-This file proves the *exact* algebraic reconstruction of the finite executable Cholesky
-factorization from [`NN.Spec.Core.Tensor.Factorizations`](../../../Spec/Core/Tensor/Factorizations.lean),
+This file proves the *exact* algebraic reconstruction of the finite executable Cholesky and QR
+factorizations from [`NN.Spec.Core.Tensor.Factorizations`](../../../Spec/Core/Tensor/Factorizations.lean),
 the increment promised in `NN.Proofs.Tensor.Basic.Factorizations`. Unlike the iterative Jacobi/SVD
-routines (whose reconstruction is only an a-posteriori residual certificate), Cholesky is a *finite*
-construction, so over `ℝ` it reconstructs its input on the nose under the success hypothesis.
+routines (whose reconstruction is only an a-posteriori residual certificate), Cholesky and Gram–Schmidt
+are *finite* constructions, so over `ℝ` they reconstruct their input on the nose under the success
+hypotheses.
 
-## Main result
+## Main results
 
-`isCholesky_of_pos`: for a symmetric `A : Fin n → Fin n → ℝ` whose executable Cholesky pivots are all
-positive (`0 < choleskyFn A j j`, the exact condition under which the algorithm succeeds over `ℝ`),
-the factor `L = choleskyFn A` satisfies the specification `Spec.Factorization.IsCholesky`:
-it is lower-triangular and `A = L · Lᵀ`. `choleskySpec_reconstruction` is the tensor-level corollary.
+* `isCholesky_of_pos`: for a symmetric `A : Fin n → Fin n → ℝ` whose executable Cholesky pivots are all
+  positive (`0 < choleskyFn A j j`, the exact condition under which the algorithm succeeds over `ℝ`),
+  the factor `L = choleskyFn A` satisfies the spec `Spec.Factorization.IsCholesky`: lower-triangular
+  and `A = L · Lᵀ`. `choleskySpec_reconstruction` is the tensor-level corollary.
+* `qr_mul_eq`: for `A : Fin m → Fin n → ℝ` whose executable Gram–Schmidt `R`-pivots are positive
+  (`0 < Rmat A j j`, full column rank), the factors `Q = gramSchmidtFn A` and `R` satisfy `A = Q · R`,
+  with `R` upper-triangular (`Rmat_upper_triangular`). `qrSpec_reconstruction` is the tensor-level
+  corollary.
 
 ## Method
 
-The executable factor is built by a `List.foldl` that snocs one column per index. The core technical
-device is `getD_foldl_snoc_read`, a general lemma reading the `j`-th element of such a fold as the
-step function applied to the length-`j` prefix. From it, `prefix_eq_map` identifies the prefix of
-columns with the first `j` columns of the final factor `L`, and `take_map_sum_eq` turns the code's
-`List.foldl` sums into masked `Finset` partial sums. The positive-pivot hypothesis discharges the two
-side conditions (`√` radicand `> 0` for the diagonal, divisor `≠ 0` for the below-diagonal entries).
+Each executable factor is built by a `List.foldl` that snocs one column per index. The core technical
+device is `getD_foldl_snoc_read`, a general lemma reading the `j`-th element of such a fold as the step
+function applied to the length-`j` prefix. From it, `prefix_eq_map`/`qsPrefix_eq_map` identify the
+prefix with the first `j` columns of the final factor, and `take_map_sum_eq` turns the code's
+`List.foldl` sums into masked `Finset` partial sums. The QR fold threads a `GSState` that snocs onto
+*both* the `Q`-list and the `R`-list at once; `gs_proj_qs` and `gs_fold_split`/`rTail_getD` recover the
+single-list read lemmas for each projection (the step depends only on the `Q`-history). The
+positive-pivot hypotheses discharge the `√`-radicand and divisor side conditions.
 
 ## Scope
 
-The QR factorization's exact reconstruction (`A = Q · R` from `gramSchmidtFn`, plus the orthonormality
-`Qᵀ Q = 1`) is the remaining finite-fold increment. It needs analogous read lemmas for the
-`GSState` *dual-list* structure-fold (the step writes both `qs` and `rcols`), and `Qᵀ Q = 1`
-additionally requires the Gram–Schmidt orthogonality invariant, which Mathlib only provides for its
-own `gramSchmidt`, not for this executable variant.
+The one piece *not* proved is the orthonormality of the QR factor, `Qᵀ Q = 1`. Unlike `A = Q · R`
+(which is a purely algebraic consequence of the orthogonalization step), it rests on the Gram–Schmidt
+orthogonality invariant, which Mathlib provides for its own `gramSchmidt` but not for this executable
+variant — so it stays the documented remaining increment, never a `sorry`.
 -/
 
 @[expose] public section
@@ -374,4 +380,318 @@ theorem choleskySpec_reconstruction (A : Spec.Tensor ℝ (.dim n (.dim n .scalar
   refine (choleskyFn_dot (Spec.toMatFn A) (fun a b => hsymm a b) (fun b => ?_) i j).symm
   rw [← hg b b]; exact hpos b
 
+/-! ## QR (modified Gram–Schmidt): exact reconstruction `A = Q · R`
+
+`gramSchmidtFn` threads a `GSState` that snocs a column onto *both* the `Q`-list and the `R`-list at
+each index. Crucially the appended values depend only on the `Q`-history (`st.qs`), never on the
+`R`-history, so the `Q`-list is itself a single-list snoc-fold (`gs_proj_qs`) and the `R`-list is the
+`Q`-prefix-indexed tail `rTail`. -/
+
+section QR
+
+variable {m : Nat}
+
+open Spec (GSState)
+
+/-- Column `j` of `A` as a function of the row. -/
+noncomputable def gsA (A : Fin m → Fin n → ℝ) (j : Fin n) : Fin m → ℝ := fun i => A i j
+
+/-- The off-diagonal `R` entries `rₖⱼ = qₖ · aⱼ` (`aⱼ` = `gsA A j`) for the columns `qs` so far. -/
+noncomputable def gsRkjs (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : List ℝ :=
+  qs.map (fun qk => Spec.dotFn qk (gsA A j))
+
+/-- The orthogonalized (not-yet-normalized) vector `v = aⱼ − Σₖ rₖⱼ qₖ`. -/
+noncomputable def gsV (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : Fin m → ℝ :=
+  fun i => gsA A j i
+    - (List.zip qs (gsRkjs A qs j)).foldl (fun acc (qk, r) => acc + r * qk i) 0
+
+/-- The diagonal `R` entry `rⱼⱼ = ‖v‖`. -/
+noncomputable def gsRjj (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : ℝ :=
+  Spec.normFn (gsV A qs j)
+
+/-- The `Q` column appended at index `j`: `v / rⱼⱼ` (or `0` when `rⱼⱼ = 0`). -/
+noncomputable def qStep (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : Fin m → ℝ :=
+  fun i => if Context.gtBool (gsRjj A qs j) 0 then gsV A qs j i / gsRjj A qs j else 0
+
+/-- The `R` column appended at index `j`: `rₖⱼ` above the diagonal, `rⱼⱼ` on it, `0` below. -/
+noncomputable def rStep (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : Fin n → ℝ :=
+  fun k => if k.val < j.val then (gsRkjs A qs j).getD k.val 0
+    else if k.val == j.val then gsRjj A qs j else 0
+
+/-- `gramSchmidtFn` as the dual-list snoc-fold appending `qStep`/`rStep`. -/
+theorem gramSchmidtFn_eq (A : Fin m → Fin n → ℝ) :
+    Spec.gramSchmidtFn A
+      = (List.finRange n).foldl
+          (fun st j => (⟨st.qs ++ [qStep A st.qs j], st.rcols ++ [rStep A st.qs j]⟩ : GSState m n ℝ))
+          ⟨[], []⟩ := rfl
+
+/-- The `Q`-list projection of the structure fold is the single-list `qStep` snoc-fold. -/
+theorem gs_proj_qs (A : Fin m → Fin n → ℝ) (l : List (Fin n)) (q0 : List (Fin m → ℝ))
+    (r0 : List (Fin n → ℝ)) :
+    (l.foldl (fun st j => (⟨st.qs ++ [qStep A st.qs j], st.rcols ++ [rStep A st.qs j]⟩ : GSState m n ℝ))
+        ⟨q0, r0⟩).qs
+      = l.foldl (fun qs j => qs ++ [qStep A qs j]) q0 := by
+  induction l generalizing q0 r0 with
+  | nil => rfl
+  | cons a t ih => simp only [List.foldl_cons]; exact ih _ _
+
+/-- The `Q` columns built before index `j`. -/
+noncomputable def qsPrefix (A : Fin m → Fin n → ℝ) (j : Fin n) : List (Fin m → ℝ) :=
+  ((List.finRange n).take j.val).foldl (fun qs k => qs ++ [qStep A qs k]) []
+
+/-- The `R`-list tail: the `R` columns produced from `Q`-prefix `q0` over the indices `l`. -/
+noncomputable def rTail (A : Fin m → Fin n → ℝ) (q0 : List (Fin m → ℝ)) : List (Fin n) →
+    List (Fin n → ℝ)
+  | [] => []
+  | j :: rest => rStep A q0 j :: rTail A (q0 ++ [qStep A q0 j]) rest
+
+/-- The structure fold splits into the `qStep` snoc-fold (`Q`-list) and the `rTail` (`R`-list). -/
+theorem gs_fold_split (A : Fin m → Fin n → ℝ) (l : List (Fin n)) (q0 : List (Fin m → ℝ))
+    (r0 : List (Fin n → ℝ)) :
+    (l.foldl (fun st j => (⟨st.qs ++ [qStep A st.qs j], st.rcols ++ [rStep A st.qs j]⟩ : GSState m n ℝ))
+        ⟨q0, r0⟩)
+      = ⟨l.foldl (fun qs j => qs ++ [qStep A qs j]) q0, r0 ++ rTail A q0 l⟩ := by
+  induction l generalizing q0 r0 with
+  | nil => simp [rTail]
+  | cons j rest ih =>
+      simp only [List.foldl_cons, rTail]
+      rw [ih]
+      simp [List.append_assoc]
+
+/-- Reading the `k`-th element of `rTail` recovers `rStep` applied to the length-`k` `Q`-prefix. -/
+theorem rTail_getD (A : Fin m → Fin n → ℝ) (q0 : List (Fin m → ℝ)) (l : List (Fin n)) (k : Nat)
+    (hk : k < l.length) (d : Fin n → ℝ) :
+    (rTail A q0 l).getD k d
+      = rStep A ((l.take k).foldl (fun qs j => qs ++ [qStep A qs j]) q0) (l[k]'hk) := by
+  induction l generalizing q0 k with
+  | nil => simp at hk
+  | cons j rest ih =>
+      cases k with
+      | zero => simp [rTail]
+      | succ k' =>
+          simp only [rTail, List.getD_cons_succ, List.take_succ_cons, List.foldl_cons,
+            List.getElem_cons_succ]
+          exact ih (q0 ++ [qStep A q0 j]) k' (by simpa using hk)
+
+/-- Semantics of the `Context` `>` test over `ℝ`. -/
+theorem gtBool_true_iff {x y : ℝ} : Context.gtBool x y = true ↔ y < x := by
+  unfold Context.gtBool; exact decide_eq_true_iff
+
+/-- A left fold `acc + h x` accumulates the mapped list sum. -/
+theorem foldl_addf_eq_sum {β : Type _} (h : β → ℝ) (l : List β) (a : ℝ) :
+    l.foldl (fun acc x => acc + h x) a = a + (l.map h).sum := by
+  induction l generalizing a with
+  | nil => simp
+  | cons x t ih => rw [List.foldl_cons, ih, List.map_cons, List.sum_cons]; ring
+
+/-! ### Entries of the executable `Q` and `R` factors -/
+
+/-- Entry `(i, k)` of the `Q` factor produced by `gramSchmidtFn`. -/
+noncomputable def Qmat (A : Fin m → Fin n → ℝ) (i : Fin m) (k : Fin n) : ℝ :=
+  (Spec.gramSchmidtFn A).qs.getD k.val (fun _ => 0) i
+
+/-- Entry `(k, j)` of the `R` factor produced by `gramSchmidtFn`. -/
+noncomputable def Rmat (A : Fin m → Fin n → ℝ) (k j : Fin n) : ℝ :=
+  (Spec.gramSchmidtFn A).rcols.getD j.val (fun _ => 0) k
+
+/-- Column `k` of `Q` as a function of the row. -/
+noncomputable def Qcol (A : Fin m → Fin n → ℝ) (k : Fin n) : Fin m → ℝ := fun r => Qmat A r k
+
+/-- Closed form of a `Q` entry: `qStep` evaluated on the `Q`-prefix. -/
+theorem Qmat_eq (A : Fin m → Fin n → ℝ) (i : Fin m) (k : Fin n) :
+    Qmat A i k = qStep A (qsPrefix A k) k i := by
+  have hqs : (Spec.gramSchmidtFn A).qs
+      = (List.finRange n).foldl (fun qs j => qs ++ [qStep A qs j]) [] := by
+    rw [gramSchmidtFn_eq]; exact gs_proj_qs A (List.finRange n) [] []
+  unfold Qmat
+  rw [hqs, getD_foldl_snoc_read (fun qs j => qStep A qs j) (fun _ => 0) (List.finRange n) k.val
+    (by rw [List.length_finRange]; exact k.isLt)]
+  have hk : (List.finRange n)[k.val]'(by rw [List.length_finRange]; exact k.isLt) = k := by
+    simp [List.getElem_finRange]
+  rw [hk]; rfl
+
+/-- Closed form of an `R` entry: `rStep` evaluated on the `Q`-prefix. -/
+theorem Rmat_eq (A : Fin m → Fin n → ℝ) (k j : Fin n) :
+    Rmat A k j = rStep A (qsPrefix A j) j k := by
+  have hrc : (Spec.gramSchmidtFn A).rcols = rTail A [] (List.finRange n) := by
+    rw [gramSchmidtFn_eq, gs_fold_split]; simp
+  unfold Rmat
+  rw [hrc, rTail_getD A [] (List.finRange n) j.val (by rw [List.length_finRange]; exact j.isLt)]
+  have hk : (List.finRange n)[j.val]'(by rw [List.length_finRange]; exact j.isLt) = j := by
+    simp [List.getElem_finRange]
+  rw [hk]; rfl
+
+/-- The `rStep` column vanishes strictly below the diagonal (row `k` with `j < k`). -/
+theorem rStep_above (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) {j k : Fin n}
+    (hjk : j.val < k.val) : rStep A qs j k = 0 := by
+  simp only [rStep]; rw [if_neg (by omega), if_neg (by rw [beq_iff_eq]; omega)]
+
+/-- The diagonal `R` entry is `rⱼⱼ`. -/
+theorem rStep_diag (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) :
+    rStep A qs j j = gsRjj A qs j := by
+  simp only [rStep]; rw [if_neg (lt_irrefl _), if_pos (beq_self_eq_true _)]
+
+/-- The `Q` column when the pivot is positive: `qⱼ = v / rⱼⱼ`. -/
+theorem qStep_pos (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n)
+    (h : 0 < gsRjj A qs j) (i : Fin m) :
+    qStep A qs j i = gsV A qs j i / gsRjj A qs j := by
+  simp only [qStep]; rw [if_pos (gtBool_true_iff.mpr h)]
+
+/-! ### The orthogonalization sum as a `Finset` sum -/
+
+set_option linter.unusedSimpArgs false in
+/-- The zip-fold defining `v` collapses to the sum of a single mapped list over the `Q` columns. -/
+theorem cross_fold_eq (qs : List (Fin m → ℝ)) (g : (Fin m → ℝ) → ℝ) (i : Fin m) (a : ℝ) :
+    (List.zip qs (qs.map g)).foldl (fun acc (qk, r) => acc + r * qk i) a
+      = a + (qs.map (fun qk => g qk * qk i)).sum := by
+  induction qs generalizing a with
+  | nil => simp
+  | cons x xs ih =>
+      simp only [List.map_cons, List.zip_cons_cons, List.foldl_cons]
+      rw [ih]; simp only [List.map_cons, List.sum_cons]; ring
+
+/-- Closed form of `v i`: `A i j` minus the partial projection sum. -/
+theorem gsV_eq (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) (i : Fin m) :
+    gsV A qs j i = gsA A j i - (qs.map (fun qk => Spec.dotFn qk (gsA A j) * qk i)).sum := by
+  unfold gsV gsRkjs
+  rw [cross_fold_eq qs (fun qk => Spec.dotFn qk (gsA A j)) i 0, zero_add]
+
+/-- Length of the `Q`-prefix list. -/
+theorem qsPrefix_length (A : Fin m → Fin n → ℝ) (j : Fin n) : (qsPrefix A j).length = j.val := by
+  unfold qsPrefix
+  rw [length_foldl_snoc (fun qs k => qStep A qs k), List.length_nil, Nat.zero_add, List.length_take,
+    List.length_finRange, Nat.min_eq_left (le_of_lt j.isLt)]
+
+/-- The `Q`-prefix is exactly the first `j` columns of the final factor `Q`. -/
+theorem qsPrefix_eq_map (A : Fin m → Fin n → ℝ) (j : Fin n) :
+    qsPrefix A j = ((List.finRange n).take j.val).map (fun k => Qcol A k) := by
+  have hjval : ((List.finRange n).take j.val).length = j.val := by
+    rw [List.length_take, List.length_finRange, Nat.min_eq_left (le_of_lt j.isLt)]
+  apply List.ext_getElem
+  · unfold qsPrefix
+    rw [length_foldl_snoc (fun qs k => qStep A qs k), List.length_nil, Nat.zero_add,
+      List.length_map]
+  · intro p h1 h2
+    rw [List.length_map, hjval] at h2
+    have hpn : p < n := lt_trans h2 j.isLt
+    rw [List.getElem_map]
+    have hidx : ((List.finRange n).take j.val)[p]'(by rw [hjval]; exact h2) = (⟨p, hpn⟩ : Fin n) := by
+      rw [List.getElem_take, List.getElem_finRange]; exact Fin.ext rfl
+    rw [show (qsPrefix A j)[p]'h1 = (qsPrefix A j).getD p (fun _ => 0) from
+      (List.getD_eq_getElem _ _ h1).symm]
+    unfold qsPrefix
+    rw [getD_foldl_snoc_read (fun qs k => qStep A qs k) (fun _ => 0)
+      ((List.finRange n).take j.val) p (by rw [hjval]; exact h2)]
+    rw [List.take_take, Nat.min_eq_left (le_of_lt h2), hidx]
+    funext r
+    rw [show Qcol A (⟨p, hpn⟩ : Fin n) r = Qmat A r ⟨p, hpn⟩ from rfl, Qmat_eq]
+    rfl
+
+/-- `getD` commutes with `dotFn`-mapping when the index is in range. -/
+theorem getD_map_dotFn (qs : List (Fin m → ℝ)) (a : Fin m → ℝ) (k : Nat) (hk : k < qs.length) :
+    (qs.map (fun qk => Spec.dotFn qk a)).getD k 0 = Spec.dotFn (qs.getD k (fun _ => 0)) a := by
+  rw [List.getD_eq_getElem _ _ (by rw [List.length_map]; exact hk), List.getElem_map,
+    List.getD_eq_getElem _ _ hk]
+
+/-- A `Q`-prefix entry equals the final `Q` column at that index. -/
+theorem qsPrefix_getD (A : Fin m → Fin n → ℝ) {k j : Fin n} (hkj : k.val < j.val) :
+    (qsPrefix A j).getD k.val (fun _ => 0) = Qcol A k := by
+  rw [qsPrefix_eq_map,
+    List.getD_eq_getElem _ _ (by rw [List.length_map, List.length_take, List.length_finRange,
+      Nat.min_eq_left (le_of_lt j.isLt)]; exact hkj),
+    List.getElem_map]
+  congr 1
+  rw [List.getElem_take, List.getElem_finRange]; exact Fin.ext rfl
+
+/-- For `k < j` (above the diagonal), `R[k,j] = qₖ · aⱼ`, with `aⱼ` column `j` of `A`. -/
+theorem R_below (A : Fin m → Fin n → ℝ) {k j : Fin n} (hkj : k.val < j.val) :
+    Rmat A k j = Spec.dotFn (Qcol A k) (gsA A j) := by
+  rw [Rmat_eq]; simp only [rStep]; rw [if_pos hkj]; unfold gsRkjs
+  rw [getD_map_dotFn (qsPrefix A j) (gsA A j) k.val (by rw [qsPrefix_length]; exact hkj),
+    qsPrefix_getD A hkj]
+
+/-- The projection sum equals the masked partial sum `Σ_{k<j} R[k,j]·Q[i,k]`. -/
+theorem cross_sum_qr (A : Fin m → Fin n → ℝ) (i : Fin m) (j : Fin n) :
+    ((qsPrefix A j).map (fun qk => Spec.dotFn qk (gsA A j) * qk i)).sum
+      = ∑ k, if k.val < j.val then Rmat A k j * Qmat A i k else 0 := by
+  rw [qsPrefix_eq_map]
+  rw [List.map_map]
+  rw [take_map_sum_eq]
+  apply Finset.sum_congr rfl
+  intro k _
+  by_cases hkj : k.val < j.val
+  · rw [if_pos hkj, if_pos hkj]
+    show Spec.dotFn (Qcol A k) (gsA A j) * Qmat A i k = Rmat A k j * Qmat A i k
+    rw [R_below A hkj]
+  · rw [if_neg hkj, if_neg hkj]
+
+/-! ### Exact reconstruction `A = Q · R` -/
+
+/-- `R` is upper-triangular: entries strictly below the diagonal vanish. -/
+theorem Rmat_upper_triangular (A : Fin m → Fin n → ℝ) {k j : Fin n} (hjk : j.val < k.val) :
+    Rmat A k j = 0 := by
+  rw [Rmat_eq]; exact rStep_above A (qsPrefix A j) hjk
+
+/-- **Per-entry QR reconstruction.** When every `R` pivot is positive (`0 < R[j,j]`, the full
+column-rank success condition), `A[i,j] = Σ_k Q[i,k]·R[k,j]`. -/
+theorem qr_reconstruction (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j)
+    (i : Fin m) (j : Fin n) :
+    A i j = ∑ k, Qmat A i k * Rmat A k j := by
+  have key : ∀ k : Fin n, Qmat A i k * Rmat A k j
+      = (if k.val < j.val then Qmat A i k * Rmat A k j else 0)
+        + (if k = j then Qmat A i j * Rmat A j j else 0) := by
+    intro k
+    rcases lt_trichotomy k.val j.val with h | h | h
+    · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h
+      rw [if_pos h, if_neg hne, add_zero]
+    · have hkj : k = j := Fin.ext h
+      rw [if_neg (by omega), if_pos hkj, zero_add, hkj]
+    · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h
+      rw [if_neg (by omega), if_neg hne, add_zero, Rmat_upper_triangular A h, mul_zero]
+  rw [show (∑ k, Qmat A i k * Rmat A k j)
+      = ∑ k, ((if k.val < j.val then Qmat A i k * Rmat A k j else 0)
+        + (if k = j then Qmat A i j * Rmat A j j else 0))
+      from Finset.sum_congr rfl (fun k _ => key k),
+    Finset.sum_add_distrib, Finset.sum_ite_eq' Finset.univ j (fun _ => Qmat A i j * Rmat A j j)]
+  simp only [Finset.mem_univ, if_true]
+  have hρpos : 0 < gsRjj A (qsPrefix A j) j := by
+    have h := hrank j; rwa [Rmat_eq, rStep_diag] at h
+  have hdiag : Qmat A i j * Rmat A j j = gsV A (qsPrefix A j) j i := by
+    rw [Qmat_eq, qStep_pos A (qsPrefix A j) j hρpos,
+      show Rmat A j j = gsRjj A (qsPrefix A j) j from by rw [Rmat_eq]; exact rStep_diag _ _ j,
+      div_mul_eq_mul_div, mul_div_assoc, div_self (ne_of_gt hρpos), mul_one]
+  rw [hdiag, gsV_eq, cross_sum_qr,
+    show gsA A j i = A i j from rfl,
+    show (∑ k, if k.val < j.val then Qmat A i k * Rmat A k j else 0)
+      = (∑ k, if k.val < j.val then Rmat A k j * Qmat A i k else 0)
+      from Finset.sum_congr rfl (fun k _ => by
+        by_cases hkj : k.val < j.val
+        · rw [if_pos hkj, if_pos hkj, mul_comm]
+        · rw [if_neg hkj, if_neg hkj])]
+  ring
+
+/-- **Matrix-level QR reconstruction.** `A = Q · R` for the executable Gram–Schmidt factors,
+under positive `R` pivots (full column rank). -/
+theorem qr_mul_eq (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) :
+    Matrix.of A = Matrix.of (fun i k => Qmat A i k) * Matrix.of (fun k j => Rmat A k j) := by
+  ext i j
+  rw [Matrix.mul_apply]
+  simp only [Matrix.of_apply]
+  exact qr_reconstruction A hrank i j
+
+/-- **Tensor-level QR reconstruction.** For a tensor `A` whose `qrSpec` `R`-pivots are positive
+(full column rank), every entry of `A` is reconstructed by `Q · R`:
+`A[i,j] = Σ_k Q[i,k]·R[k,j]`, with `Q = qrQSpec A`, `R = qrRSpec A`. -/
+theorem qrSpec_reconstruction (A : Spec.Tensor ℝ (.dim m (.dim n .scalar)))
+    (hrank : ∀ j : Fin n, 0 < Spec.get2 (Spec.qrRSpec A) j j) (i : Fin m) (j : Fin n) :
+    Spec.get2 A i j
+      = ∑ k, Spec.get2 (Spec.qrQSpec A) i k * Spec.get2 (Spec.qrRSpec A) k j := by
+  have hQ : ∀ a b, Spec.get2 (Spec.qrQSpec A) a b = Qmat (Spec.toMatFn A) a b := fun _ _ => rfl
+  have hR : ∀ a b, Spec.get2 (Spec.qrRSpec A) a b = Rmat (Spec.toMatFn A) a b := fun _ _ => rfl
+  simp only [hQ, hR]
+  show Spec.toMatFn A i j = _
+  exact qr_reconstruction (Spec.toMatFn A) (fun b => by rw [← hR b b]; exact hrank b) i j
+
+end QR
+
 end Spec.Factorization.Reconstruction
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 0297902..2fdaf92 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -128,14 +128,27 @@ zero-residual limit, `isSymEig_of_diagonal` shows the solver output `(diag Af, V
 `NN/Examples/Factorization` are concrete instances of this certificate: they bound the off-diagonal
 mass on specific matrices.
 
+# Exact QR reconstruction
+
+The QR factorization admits the same treatment. `qr_mul_eq` (in the same file) proves that for an
+`A` whose executable Gram–Schmidt `R`-pivots are all positive (`0 < R[j,j]`, the full-column-rank
+success condition) the factors satisfy
+
+$$`R \text{ upper-triangular} \quad\text{and}\quad A = Q\,R,`
+
+with `qrSpec_reconstruction` the tensor-level corollary. The new wrinkle is that `gramSchmidtFn`
+threads a `GSState` that snocs onto _two_ lists at once — the `Q` columns and the `R` columns. Because
+the appended values depend only on the `Q`-history, the `Q`-list is itself a single-list snoc-fold
+(`gs_proj_qs`, read by `getD_foldl_snoc_read` as for Cholesky), and the `R`-list is the `Q`-prefix
+tail `rTail`, read by `gs_fold_split` together with `rTail_getD`. The orthogonalization sum
+`v = a − Σ rₖⱼ qₖ`, a fold over `List.zip`, collapses to a single map-fold (`cross_fold_eq`) and then
+to a masked `Finset` partial sum, after which the positive-pivot hypothesis cancels the `v / rⱼⱼ`
+normalization exactly.
+
 # What remains
 
-With Cholesky's exact reconstruction in place, the remaining finite-fold increment is the QR
-factorization: `A = Q · R` from modified Gram–Schmidt under full column rank, and the orthonormality
-`Qᵀ Q = 1`. The `A = Q · R` part is within reach of the same machinery, but `gramSchmidtFn` threads a
-`GSState` that snocs onto _two_ lists at once (the `Q` columns and the `R` columns), so it needs read
-lemmas for that dual-list structure-fold rather than the single-list `getD_foldl_snoc_read` used for
-Cholesky. The orthonormality `Qᵀ Q = 1` is harder still: it rests on the Gram–Schmidt orthogonality
-invariant, which Mathlib provides for its own `gramSchmidt` but not for this executable variant. The
-specification-level facts the kernel methods rely on are independent of these steps, so the CHD
-foundation is already in place.
+The one finite-fold property still open is the orthonormality of the QR factor, `Qᵀ Q = 1`. Unlike
+`A = Q · R` — a purely algebraic consequence of the orthogonalization step, proved above — it rests on
+the Gram–Schmidt orthogonality invariant, which Mathlib provides for its own `gramSchmidt` but not for
+this executable variant. The specification-level facts the kernel methods rely on are independent of
+that step, so the CHD foundation is already in place.

From 31450f3ff8263dd77e11ca7198792cfe808fc330 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sat, 30 May 2026 16:48:17 -0700
Subject: [PATCH 05/22] =?UTF-8?q?Add=20QR=20orthonormality=20Q=E1=B5=80Q?=
 =?UTF-8?q?=3D1=20via=20Mathlib=20gramSchmidt=20bridge?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the last open finite-fold property: orthonormality of the executable
Gram–Schmidt Q factor. Rather than re-derive the orthogonality induction, the
new file NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean unifies the
executable variant with Mathlib's gramSchmidt.

Reading columns of A as EuclideanSpace ℝ (Fin m) vectors (gsCol), Qcol_bridge
proves by strong induction that the j-th executable Q column equals
gramSchmidtNormed ℝ (gsCol A) j; orthonormality then follows from Mathlib's
gramSchmidtNormed_orthonormal'. Yields Q_orthonormal (qₐ·q_b = δₐᵦ),
QT_mul_Q_eq_one, the full IsQR predicate isQR_of_pos, and the tensor-level
qrSpec_orthonormal.
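
A minimal usage sketch, assuming the declarations above are in scope via
`open Spec.Factorization.Reconstruction`. The `example` is illustrative and
not part of this patch; it only shows how a caller would obtain the full
predicate from the positive-pivot hypothesis:

    -- Illustrative only: IsQR for the executable Gram-Schmidt factors.
    example {m n : Nat} (A : Fin m → Fin n → ℝ)
        (hrank : ∀ j : Fin n, 0 < Rmat A j j) :
        Spec.Factorization.IsQR (Matrix.of A)
          (Matrix.of (fun i k => Qmat A i k))
          (Matrix.of (fun k j => Rmat A k j)) :=
      isQR_of_pos A hrank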

Three reusable connectors over ℝ — dotFn_eq_inner, normFn_eq_norm,
proj_normalize — are stated generally enough to lift into a future Mathlib
matrix-level QR contribution.

Blueprint chapter and reconstruction docstring updated: the only property
still unproved is convergence of the iterative Jacobi/SVD solvers, which
remain covered by a residual certificate only. Sorry-free;
NN.Examples.Factorization still reconstructs at err 0.000000, with the
empirical Qᵀ·Q = I check now formally backed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Basic/FactorizationsOrthonormal.lean      | 252 ++++++++++++++++++
 .../Basic/FactorizationsReconstruction.lean   |   9 +-
 .../Ch4_Verification/Factorizations.lean      |  29 +-
 4 files changed, 282 insertions(+), 9 deletions(-)
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean

diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index 9e44d53..fb2f877 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -11,6 +11,7 @@ public import NN.Proofs.Tensor.Basic.Folds
 public import NN.Proofs.Tensor.Basic.LinearAlgebra
 public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
+public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.BoundsNorms
 public import NN.Proofs.Tensor.Basic.Algebra
 
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean b/NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean
new file mode 100644
index 0000000..f45a8f6
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean
@@ -0,0 +1,252 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
+public import Mathlib.Analysis.InnerProductSpace.GramSchmidtOrtho
+public import Mathlib.Analysis.InnerProductSpace.PiL2
+
+/-!
+# Orthonormality of the executable Gram–Schmidt `Q` factor (`Qᵀ Q = 1`)
+
+This file closes the one finite-fold property left open by
+[`NN.Proofs.Tensor.Basic.FactorizationsReconstruction`](FactorizationsReconstruction.lean): the
+orthonormality of the `Q` factor produced by the executable modified Gram–Schmidt `gramSchmidtFn`.
+
+The strategy is to **unify the executable variant with Mathlib's `gramSchmidt`** rather than re-derive
+the orthogonality induction by hand. Reading the columns of `A` as vectors of
+`EuclideanSpace ℝ (Fin m)`, the `j`-th executable `Q` column equals Mathlib's `gramSchmidtNormed ℝ`
+of the column map (`Qcol_bridge`), so the orthonormality follows from Mathlib's
+`gramSchmidtNormed_orthonormal'`.
+
+## Main results
+
+* `Qcol_bridge`: `WithLp.toLp 2 (Qcol A k) = gramSchmidtNormed ℝ (gsCol A) k` — the executable `Q`
+  column is Mathlib's normalized Gram–Schmidt vector, proved by strong induction on `k`.
+* `Q_orthonormal`: `dotFn (Qcol A a) (Qcol A b) = if a = b then 1 else 0` under positive `R` pivots.
+* `QT_mul_Q_eq_one` and `isQR_of_pos`: the matrix-level `Qᵀ Q = 1` and the full
+  `Spec.Factorization.IsQR` predicate for the executable factors (combining with the reconstruction
+  `A = Q · R` and `R` upper-triangular from the companion file).
+* `qrSpec_orthonormal`: the tensor-level corollary.
+
+## Method
+
+The bridge rests on three connectors over `ℝ`: `dotFn = ⟪·,·⟫` and `normFn = ‖·‖` on
+`EuclideanSpace ℝ (Fin m)`, and the projection identity `proj_normalize` showing the un-normalized
+Gram–Schmidt projection term equals the normalized one. The strong induction feeds the partial
+identification of the earlier `Q` columns into `gramSchmidt_def''`, term by term.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization.Reconstruction
+
+open Matrix
+open scoped BigOperators RealInnerProductSpace
+open InnerProductSpace
+
+/-! ## Connectors between the executable scalar ops and the Euclidean inner product -/
+
+/-- `dotFn` as a `Finset` sum. -/
+theorem dotFn_eq_sum {p : Nat} (u v : Fin p → ℝ) : Spec.dotFn u v = ∑ i, u i * v i := by
+  unfold Spec.dotFn
+  rw [foldl_addf_eq_sum (fun i => u i * v i) (List.finRange p) 0, zero_add,
+    ← finsum_eq_finRange_sum (fun i => u i * v i)]
+
+/-- The executable dot product is the Euclidean inner product over `ℝ`. -/
+theorem dotFn_eq_inner {p : Nat} (u v : Fin p → ℝ) :
+    Spec.dotFn u v
+      = ⟪(WithLp.toLp 2 u : EuclideanSpace ℝ (Fin p)), WithLp.toLp 2 v⟫_ℝ := by
+  rw [dotFn_eq_sum, PiLp.inner_apply]
+  apply Finset.sum_congr rfl
+  intro i _
+  rw [RCLike.inner_apply', PiLp.toLp_apply, PiLp.toLp_apply]
+  simp
+
+/-- The executable Euclidean norm is the `EuclideanSpace` norm over `ℝ`. -/
+theorem normFn_eq_norm {p : Nat} (v : Fin p → ℝ) :
+    Spec.normFn v = ‖(WithLp.toLp 2 v : EuclideanSpace ℝ (Fin p))‖ := by
+  rw [Spec.normFn, mfsqrt_eq, EuclideanSpace.norm_eq]
+  congr 1
+  rw [dotFn_eq_sum]
+  apply Finset.sum_congr rfl
+  intro i _
+  rw [PiLp.toLp_apply, Real.norm_eq_abs, sq_abs, sq]
+
+/-- The Gram–Schmidt projection term, with the normalized vector pulled out. Holds with no
+non-degeneracy hypothesis (both sides vanish when `w = 0`). -/
+theorem proj_normalize {F : Type*} [NormedAddCommGroup F] [InnerProductSpace ℝ F] (w x : F) :
+    (⟪w, x⟫_ℝ / ‖w‖ ^ 2) • w = ⟪‖w‖⁻¹ • w, x⟫_ℝ • (‖w‖⁻¹ • w) := by
+  rw [real_inner_smul_left, smul_smul]
+  congr 1
+  rw [div_eq_mul_inv, ← inv_pow, sq]
+  ring
+
+/-- `gramSchmidtNormed` over `ℝ`, with the scalar coercion removed. -/
+theorem gn_eq {n : Nat} {F : Type*} [NormedAddCommGroup F] [InnerProductSpace ℝ F]
+    (f : Fin n → F) (i : Fin n) :
+    gramSchmidtNormed ℝ f i = ‖gramSchmidt ℝ f i‖⁻¹ • gramSchmidt ℝ f i := by
+  rw [gramSchmidtNormed]
+  norm_num
+
+/-- A masked full sum equals the sum over `Iio`. -/
+theorem sum_Iio_eq_mask {n : Nat} (k : Fin n) (h : Fin n → ℝ) :
+    ∑ i ∈ Finset.Iio k, h i = ∑ i, if i.val < k.val then h i else 0 := by
+  rw [← Finset.sum_filter]
+  congr 1
+  ext i
+  simp only [Finset.mem_Iio, Finset.mem_filter, Finset.mem_univ, true_and, Fin.lt_def]
+
+/-! ## The bridge to Mathlib's `gramSchmidt` -/
+
+section QR
+
+variable {m n : Nat}
+
+/-- Column `j` of `A` as a vector of `EuclideanSpace ℝ (Fin m)`. -/
+noncomputable def gsCol (A : Fin m → Fin n → ℝ) (j : Fin n) : EuclideanSpace ℝ (Fin m) :=
+  WithLp.toLp 2 (gsA A j)
+
+/-- `gsCol A k` reads as the executable column `gsA A k`. -/
+theorem gsCol_apply (A : Fin m → Fin n → ℝ) (k : Fin n) (r : Fin m) :
+    gsCol A k r = gsA A k r := rfl
+
+/-- **Orthogonalized-vector bridge.** Given that the earlier `Q` columns coincide with Mathlib's
+normalized Gram–Schmidt vectors, the executable orthogonalized vector `v` at index `k` equals
+Mathlib's (un-normalized) `gramSchmidt` vector. -/
+theorem gsV_bridge (A : Fin m → Fin n → ℝ) (k : Fin n)
+    (ih : ∀ i : Fin n, i.val < k.val →
+        (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)) = gramSchmidtNormed ℝ (gsCol A) i) :
+    gramSchmidt ℝ (gsCol A) k = WithLp.toLp 2 (gsV A (qsPrefix A k) k) := by
+  -- Rewrite Mathlib's vector via the explicit recurrence.
+  rw [show gramSchmidt ℝ (gsCol A) k
+        = gsCol A k - ∑ i ∈ Finset.Iio k,
+            (⟪gramSchmidt ℝ (gsCol A) i, gsCol A k⟫_ℝ / ‖gramSchmidt ℝ (gsCol A) i‖ ^ 2)
+              • gramSchmidt ℝ (gsCol A) i
+      from eq_sub_of_add_eq (gramSchmidt_def'' ℝ (gsCol A) k).symm]
+  -- Replace each projection term by the normalized form, then by the executable `Q` column.
+  have hproj : ∀ i ∈ Finset.Iio k,
+      (⟪gramSchmidt ℝ (gsCol A) i, gsCol A k⟫_ℝ / ‖gramSchmidt ℝ (gsCol A) i‖ ^ 2)
+          • gramSchmidt ℝ (gsCol A) i
+        = ⟪(WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)), gsCol A k⟫_ℝ
+            • (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)) := by
+    intro i hi
+    have hik : i < k := Finset.mem_Iio.mp hi
+    rw [proj_normalize (gramSchmidt ℝ (gsCol A) i) (gsCol A k), ← gn_eq, ih i hik]
+  rw [Finset.sum_congr rfl hproj]
+  -- Compare entrywise.
+  ext r
+  rw [PiLp.sub_apply]
+  show gsCol A k r - _ = gsV A (qsPrefix A k) k r
+  rw [gsV_eq, gsCol_apply]
+  congr 1
+  -- The Euclidean `Iio` sum, applied at `r`, equals the executable list projection sum.
+  rw [WithLp.ofLp_sum, Finset.sum_apply]
+  rw [show ∑ i ∈ Finset.Iio k,
+        (WithLp.ofLp (⟪(WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)), gsCol A k⟫_ℝ
+          • (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)))) r
+      = ∑ i ∈ Finset.Iio k, Spec.dotFn (Qcol A i) (gsA A k) * Qcol A i r from by
+        apply Finset.sum_congr rfl
+        intro i _
+        rw [show WithLp.ofLp (⟪(WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)), gsCol A k⟫_ℝ
+              • (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m))) r
+            = ⟪(WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)), gsCol A k⟫_ℝ • Qcol A i r
+            from rfl, smul_eq_mul, gsCol, ← dotFn_eq_inner]]
+  rw [sum_Iio_eq_mask, qsPrefix_eq_map, List.map_map, take_map_sum_eq]
+  rfl
+
+/-- **Normalized-column bridge.** The executable `Q` column at index `k` equals Mathlib's
+`gramSchmidtNormed`. Proved by strong induction on `k`, under positive `R` pivots (full column rank). -/
+theorem Qcol_bridge (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) :
+    ∀ k : Fin n,
+      (WithLp.toLp 2 (Qcol A k) : EuclideanSpace ℝ (Fin m)) = gramSchmidtNormed ℝ (gsCol A) k := by
+  have main : ∀ N : Nat, ∀ k : Fin n, k.val = N →
+      (WithLp.toLp 2 (Qcol A k) : EuclideanSpace ℝ (Fin m)) = gramSchmidtNormed ℝ (gsCol A) k := by
+    intro N
+    induction N using Nat.strong_induction_on with
+    | _ N ih =>
+      intro k hk
+      have IH : ∀ i : Fin n, i.val < k.val →
+          (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)) = gramSchmidtNormed ℝ (gsCol A) i :=
+        fun i hi => ih i.val (hk ▸ hi) i rfl
+      have hρpos : 0 < gsRjj A (qsPrefix A k) k := by
+        have h := hrank k; rwa [Rmat_eq, rStep_diag] at h
+      have hgsV := gsV_bridge A k IH
+      rw [gn_eq, hgsV]
+      ext r
+      rw [PiLp.smul_apply, PiLp.toLp_apply, PiLp.toLp_apply, smul_eq_mul]
+      show Qcol A k r = _
+      rw [show Qcol A k r = Qmat A r k from rfl, Qmat_eq, qStep_pos A (qsPrefix A k) k hρpos,
+        ← normFn_eq_norm]
+      show gsV A (qsPrefix A k) k r / gsRjj A (qsPrefix A k) k
+        = (Spec.normFn (gsV A (qsPrefix A k) k))⁻¹ * gsV A (qsPrefix A k) k r
+      rw [show Spec.normFn (gsV A (qsPrefix A k) k) = gsRjj A (qsPrefix A k) k from rfl,
+        div_eq_mul_inv, mul_comm]
+  exact fun k => main k.val k rfl
+
+/-! ## Orthonormality `Qᵀ Q = 1` -/
+
+/-- Each normalized Gram–Schmidt vector is non-zero (the pivot is positive). -/
+theorem gn_ne_zero (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) (j : Fin n) :
+    gramSchmidtNormed ℝ (gsCol A) j ≠ 0 := by
+  have hpos : 0 < ‖gramSchmidt ℝ (gsCol A) j‖ := by
+    have h := hrank j
+    rw [Rmat_eq, rStep_diag] at h
+    rwa [gsV_bridge A j (fun i _ => Qcol_bridge A hrank i), ← normFn_eq_norm]
+  rw [gn_eq]
+  exact smul_ne_zero (inv_ne_zero (ne_of_gt hpos)) (norm_pos_iff.mp hpos)
+
+/-- **Orthonormality of the executable `Q` columns.** Under positive `R` pivots,
+`qₐ · q_b = δₐᵦ`. -/
+theorem Q_orthonormal (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) (a b : Fin n) :
+    Spec.dotFn (Qcol A a) (Qcol A b) = if a = b then 1 else 0 := by
+  rw [dotFn_eq_inner]
+  show ⟪(WithLp.toLp 2 (Qcol A a) : EuclideanSpace ℝ (Fin m)), WithLp.toLp 2 (Qcol A b)⟫_ℝ = _
+  rw [Qcol_bridge A hrank a, Qcol_bridge A hrank b]
+  have horth := orthonormal_iff_ite.mp (gramSchmidtNormed_orthonormal' (gsCol A))
+    ⟨a, gn_ne_zero A hrank a⟩ ⟨b, gn_ne_zero A hrank b⟩
+  rw [horth]
+  simp only [Subtype.mk.injEq]
+
+/-- **Matrix-level orthonormality.** `Qᵀ Q = 1` for the executable Gram–Schmidt `Q` factor. -/
+theorem QT_mul_Q_eq_one (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) :
+    (Matrix.of (fun i k => Qmat A i k))ᵀ * Matrix.of (fun i k => Qmat A i k) = 1 := by
+  ext a b
+  rw [Matrix.mul_apply]
+  simp only [Matrix.transpose_apply, Matrix.of_apply, Matrix.one_apply]
+  rw [show (∑ i, Qmat A i a * Qmat A i b) = Spec.dotFn (Qcol A a) (Qcol A b) from by
+        rw [dotFn_eq_sum]; rfl,
+    Q_orthonormal A hrank a b]
+
+/-- **Full QR specification.** For `A` with positive executable `R`-pivots (full column rank), the
+executable Gram–Schmidt factors satisfy `Spec.Factorization.IsQR`: `Qᵀ Q = 1`, `R` upper-triangular,
+and `A = Q · R`. -/
+theorem isQR_of_pos (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) :
+    Spec.Factorization.IsQR (Matrix.of A) (Matrix.of (fun i k => Qmat A i k))
+      (Matrix.of (fun k j => Rmat A k j)) := by
+  refine ⟨QT_mul_Q_eq_one A hrank, ?_, qr_mul_eq A hrank⟩
+  intro i j hji
+  show Rmat A i j = 0
+  exact Rmat_upper_triangular A (Fin.lt_def.mp hji)
+
+/-- **Tensor-level orthonormality.** For a tensor `A` with positive `qrRSpec` pivots, the `Q` factor
+`qrQSpec A` has orthonormal columns: `Σ_i Q[i,a]·Q[i,b] = δₐᵦ`. -/
+theorem qrSpec_orthonormal (A : Spec.Tensor ℝ (.dim m (.dim n .scalar)))
+    (hrank : ∀ j : Fin n, 0 < Spec.get2 (Spec.qrRSpec A) j j) (a b : Fin n) :
+    (∑ i, Spec.get2 (Spec.qrQSpec A) i a * Spec.get2 (Spec.qrQSpec A) i b)
+      = if a = b then 1 else 0 := by
+  have hQ : ∀ x y, Spec.get2 (Spec.qrQSpec A) x y = Qmat (Spec.toMatFn A) x y := fun _ _ => rfl
+  have hR : ∀ x y, Spec.get2 (Spec.qrRSpec A) x y = Rmat (Spec.toMatFn A) x y := fun _ _ => rfl
+  simp only [hQ]
+  rw [show (∑ i, Qmat (Spec.toMatFn A) i a * Qmat (Spec.toMatFn A) i b)
+        = Spec.dotFn (Qcol (Spec.toMatFn A) a) (Qcol (Spec.toMatFn A) b) from by
+        rw [dotFn_eq_sum]; rfl]
+  exact Q_orthonormal (Spec.toMatFn A) (fun j => by rw [← hR]; exact hrank j) a b
+
+end QR
+
+end Spec.Factorization.Reconstruction
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean b/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean
index 90cd5cb..6aa6e73 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean
@@ -45,10 +45,11 @@ positive-pivot hypotheses discharge the `√`-radicand and divisor side conditio
 
 ## Scope
 
-The one piece *not* proved is the orthonormality of the QR factor, `Qᵀ Q = 1`. Unlike `A = Q · R`
-(which is a purely algebraic consequence of the orthogonalization step), it rests on the Gram–Schmidt
-orthogonality invariant, which Mathlib provides for its own `gramSchmidt` but not for this executable
-variant — so it stays the documented remaining increment, never a `sorry`.
+This file proves `A = L · Lᵀ` and `A = Q · R` purely algebraically. The remaining QR property —
+orthonormality of the `Q` factor, `Qᵀ Q = 1` — is proved in the companion file
+[`NN.Proofs.Tensor.Basic.FactorizationsOrthonormal`](FactorizationsOrthonormal.lean) by bridging the
+executable Gram–Schmidt to Mathlib's `gramSchmidt`, completing the full `Spec.Factorization.IsQR`
+predicate (`isQR_of_pos`).
 -/
 
 @[expose] public section
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 2fdaf92..d8e51b5 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -145,10 +145,29 @@ tail `rTail`, read by `gs_fold_split` together with `rTail_getD`. The orthogonal
 to a masked `Finset` partial sum, after which the positive-pivot hypothesis cancels the `v / rⱼⱼ`
 normalization exactly.
 
+# Orthonormality of the QR factor (`Qᵀ Q = 1`)
+
+The remaining finite-fold property — orthonormality of the `Q` factor, `Qᵀ Q = 1` — is proved in
+[`NN.Proofs.Tensor.Basic.FactorizationsOrthonormal`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean)
+by *unifying the executable variant with Mathlib's `gramSchmidt`* rather than re-deriving the
+orthogonality induction by hand. Reading the columns of `A` as vectors of `EuclideanSpace ℝ (Fin m)`,
+`Qcol_bridge` proves by strong induction that the `j`-th executable `Q` column equals Mathlib's
+`gramSchmidtNormed ℝ` of the column map. The orthonormality then follows from Mathlib's
+`gramSchmidtNormed_orthonormal'`, giving `Q_orthonormal` (`qₐ · q_b = δₐᵦ`), the matrix-level
+`QT_mul_Q_eq_one`, and the full `IsQR` predicate `isQR_of_pos` (orthonormal `Q`, upper-triangular `R`,
+`A = Q · R`).
+
+The bridge rests on three small connectors over `ℝ`: the executable `dotFn`/`normFn` are the Euclidean
+inner product and norm (`dotFn_eq_inner`, `normFn_eq_norm`), and `proj_normalize` shows the
+un-normalized Gram–Schmidt projection term equals the normalized one (with no non-degeneracy
+hypothesis). The positive-pivot assumption (`0 < R[j,j]`, full column rank) supplies the non-vanishing
+of each `gramSchmidt` vector via `gn_ne_zero`. These connectors are stated generally enough to lift
+into a future Mathlib matrix-level QR contribution.
+
 # What remains
 
-The one finite-fold property still open is the orthonormality of the QR factor, `Qᵀ Q = 1`. Unlike
-`A = Q · R` — a purely algebraic consequence of the orthogonalization step, proved above — it rests on
-the Gram–Schmidt orthogonality invariant, which Mathlib provides for its own `gramSchmidt` but not for
-this executable variant. The specification-level facts the kernel methods rely on are independent of
-that step, so the CHD foundation is already in place.
+With Cholesky and QR fully reconstructed (`A = L · Lᵀ`, `A = Q · R`, `Qᵀ Q = 1`), the only properties
+not available as a-priori theorems are the *iterative* ones: full diagonalization for the cyclic Jacobi
+eigensolver and the SVD built on it. Mathlib v4.30.0 has no Jacobi convergence theory, so those remain
+captured by the exact a-posteriori residual certificate above, never by `sorry`. The specification-level
+facts the kernel methods rely on are independent of that step, so the CHD foundation is complete.

From f6717b5693d9817195b96476887b58419bbba075 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sat, 30 May 2026 22:03:05 -0700
Subject: [PATCH 06/22] Make Jacobi residual certificate unconditional +
 reviewer examples
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Discharge the two hypotheses the symmetric-eigendecomposition residual
certificate assumed (`Vᵀ V = 1` and `A = V·Af·Vᵀ`) for the real
`symEigJacobiSpec` output, so the certificate holds outright.

NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean (sorry-free, warning-free):
- toM bridge from `Array (Array ℝ)` to Mathlib `Matrix`; toM_matMul/tr/id
  show the array ops realise the matrix ops (unconditionally).
- givens_orthogonal: each Givens rotation with c²+s²=1 is orthogonal
  (Jᵀ J = 1), via the three column shapes and a 9-case dot-product split.
- JacInv loop invariant preserved by jacInv_rotate/_sweep/_run
  (List.foldlRecOn over jacobiPairs; base case (A, I)).
- jacobi_orthogonal, jacobi_similarity (no hypotheses) ⟹ unconditional
  symEigJacobi_{reconstruction,frobenius}_residual and
  symEigJacobi_isSymEig_of_diagonal, with worked examples.
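
The active 2×2 block behind givens_orthogonal can be checked by hand.
A sketch, assuming the sign convention of `toM_givens_apply` (entries
(p,p) = (q,q) = c, (p,q) = s, (q,p) = -s); the name J_{pq} for the
block restricted to rows/columns p and q is introduced here for
illustration only:

```latex
J_{pq} = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \qquad
J_{pq}^{\top} J_{pq}
  = \begin{pmatrix} c & -s \\ s & c \end{pmatrix}
    \begin{pmatrix} c & s \\ -s & c \end{pmatrix}
  = \begin{pmatrix} c^2 + s^2 & cs - sc \\ sc - cs & s^2 + c^2 \end{pmatrix}
  = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\quad \text{when } c^2 + s^2 = 1.
```

Every other column is a standard basis vector (givens_col_other), so
the same identity extends to the full n×n product.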

Blueprint Ch4: new "Faithfulness of the Jacobi run" section; "What remains"
narrowed from the iterative properties to just the convergence rate.

Examples (positive + negative controls, compiled #eval assertions):
- Cholesky: indefinite A correctly fails (NaN; uses summed Frobenius error).
- QR: rank-deficient A still reconstructs, but Qᵀ Q ≠ I (full column rank needed).
- SVD: Vᵀ V = I; permuted σ fails to reconstruct.
- SymEig: orthogonality exact at 1 sweep; off-diagonal residual vanishes only asymptotically;
  exact residual certificate verified numerically (lhs = rhs, |Δ| = 0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |  23 +-
 NN/Examples/Factorization/Cholesky.lean       |  19 +
 NN/Examples/Factorization/Common.lean         |  55 ++-
 NN/Examples/Factorization/QR.lean             |  25 +
 NN/Examples/Factorization/SVD.lean            |  20 +-
 NN/Examples/Factorization/SymEig.lean         |  78 ++-
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Tensor/Basic/FactorizationsJacobi.lean    | 458 ++++++++++++++++++
 .../Ch4_Verification/Factorizations.lean      |  46 +-
 9 files changed, 695 insertions(+), 30 deletions(-)
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 7f7be49..5bd7a28 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -16,15 +16,24 @@ public import NN.Examples.Factorization.SVD
 # Matrix factorization examples
 
 Executable sanity checks for the spec-layer matrix factorizations in
-`NN.Spec.Core.Tensor.Factorizations`:
+`NN.Spec.Core.Tensor.Factorizations`, designed to corroborate the formal correctness theorems in
+`NN.Proofs.Tensor.Basic.{Factorizations, FactorizationsReconstruction, FactorizationsOrthonormal,
+FactorizationsJacobi}`. Each check runs through compiled `#eval` assertions, so the build fails if a
+factorization misbehaves.
 
-- `Cholesky` — `A = L · Lᵀ`
-- `QR`       — `A = Q · R`, `Qᵀ·Q = I`
-- `SymEig`   — full symmetric eigendecomposition `A = V · diag(λ) · Vᵀ`
-- `SVD`      — `A = U · diag(σ) · Vᵀ`
+- `Cholesky` — `A = L · Lᵀ`; **negative control**: an indefinite `A` correctly fails (no SPD factor).
+- `QR`       — `A = Q · R`, `Qᵀ·Q = I`; **negative control**: a rank-deficient `A` still reconstructs
+  but `Qᵀ Q ≠ I`, separating the two guarantees and showing full column rank is needed.
+- `SymEig`   — `A = V · diag(λ) · Vᵀ`; orthogonality `Vᵀ V = I` is exact at *any* sweep count (witness
+  of the a-priori `jacobi_orthogonal`), diagonalization is asymptotic, and the **exact residual
+  certificate** `‖A − V·diag(λ)·Vᵀ‖² = ‖offDiag(VᵀAV)‖²` (`symEigJacobi_frobenius_residual`) is
+  verified numerically.
+- `SVD`      — `A = U · diag(σ) · Vᵀ`, `Vᵀ V = I`; **negative control**: a permuted `σ` fails to
+  reconstruct.
 
-Each example reconstructs the original matrix and asserts (via `#guard`) that the maximum
-reconstruction error is below `tol`, so the build fails if a factorization is incorrect.
+Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
+(the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
+reviewer can see the checks are not vacuous.
 -/
 
 @[expose] public section
diff --git a/NN/Examples/Factorization/Cholesky.lean b/NN/Examples/Factorization/Cholesky.lean
index fc23255..da351a2 100644
--- a/NN/Examples/Factorization/Cholesky.lean
+++ b/NN/Examples/Factorization/Cholesky.lean
@@ -39,4 +39,23 @@ def reconErr : Float := maxMatErr A (mm L (tr L))
 -- Compiled assertion: the factorization reconstructs A (fails the build otherwise).
 #eval assertLt "Cholesky A = L·Lᵀ" reconErr
 
+/-! ## Negative control: the SPD hypothesis is necessary
+
+`isCholesky_of_pos` requires the executable pivots `L[j,j]` to be positive (`0 < choleskyFn A j j`),
+which is exactly the success condition over the reals. The matrix below is symmetric but *not*
+positive-definite (eigenvalues `3` and `-1`), so the diagonal step takes `√(negative)` and the
+reconstruction is `NaN` — never a small error. This documents that the hypothesis genuinely bites. -/
+
+/-- A symmetric but **indefinite** matrix (eigenvalues `{3, -1}`), outside Cholesky's domain. -/
+def Abad : Spec.Tensor Float (.dim 2 (.dim 2 .scalar)) :=
+  mkMat [[1, 2],
+         [2, 1]]
+
+def Lbad : Spec.Tensor Float (.dim 2 (.dim 2 .scalar)) := Spec.choleskySpec Abad
+-- Use the *summed* Frobenius error here, not `maxMatErr`: IEEE `max` ignores `NaN`, whereas the sum
+-- propagates the `NaN` produced by `√(negative)`, faithfully reporting that no factor exists.
+def reconErrBad : Float := frobSqErr Abad (mm Lbad (tr Lbad))
+
+#eval assertReconFails "Cholesky on indefinite A correctly fails (no SPD ⇒ no factor)" reconErrBad
+
 end NN.Examples.Factorization.Cholesky
diff --git a/NN/Examples/Factorization/Common.lean b/NN/Examples/Factorization/Common.lean
index c970e32..a5d2074 100644
--- a/NN/Examples/Factorization/Common.lean
+++ b/NN/Examples/Factorization/Common.lean
@@ -50,15 +50,32 @@ def diagFromVec {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) :
     Spec.Tensor Float (.dim n (.dim n .scalar)) :=
   Spec.ofMatFn (fun i j => if i.val == j.val then Spec.Tensor.toScalar (Spec.get v i) else 0.0)
 
+/-- Extract the diagonal of a square matrix as a length-`n` vector. -/
+def diagOf {n : Nat} (M : Spec.Tensor Float (.dim n (.dim n .scalar))) :
+    Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i => Spec.get2 M i i)
+
 /-- Read a vector tensor back out as a `List Float` (for display). -/
 def vecToList {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : List Float :=
   (List.finRange n).map (fun i => Spec.Tensor.toScalar (Spec.get v i))
 
+/-- Squared Frobenius distance `Σ_{i,j} (A_ij - B_ij)²` between two `m × n` matrices. -/
+def frobSqErr {m n : Nat} (A B : Spec.Tensor Float (.dim m (.dim n .scalar))) : Float :=
+  (List.finRange m).foldl (fun acc i =>
+    (List.finRange n).foldl
+      (fun a j => let d := Spec.get2 A i j - Spec.get2 B i j; a + d * d) acc) 0.0
+
+/-- Squared Frobenius off-diagonal mass `Σ_{i≠j} M_ij²` of a square matrix. -/
+def offDiagFrobSq {n : Nat} (M : Spec.Tensor Float (.dim n (.dim n .scalar))) : Float :=
+  (List.finRange n).foldl (fun acc i =>
+    (List.finRange n).foldl
+      (fun a j => if i.val == j.val then a else let x := Spec.get2 M i j; a + x * x) acc) 0.0
+
 /-- Shared tolerance for reconstruction-error assertions. -/
 def tol : Float := 1e-6
 
 /--
-Compiled assertion used by the examples: print `name: OK (err)` when `err < tol`, otherwise raise an
+Compiled **positive** assertion: print `name: OK (err)` when `err < tol`, otherwise raise an
 `IO` error so the build/`#eval` fails. Running this through `#eval` evaluates with the compiler
 (fast), unlike `#guard`, which forces slow kernel reduction of the whole factorization.
 -/
@@ -68,4 +85,40 @@ def assertLt (name : String) (err : Float) (tolerance : Float := tol) : IO Unit
   else
     throw (IO.userError s!"{name}: FAIL (err = {err} ≥ tol = {tolerance})")
 
+/--
+Compiled **negative-control** assertion: succeeds only when `err ≥ threshold`, i.e. when a property
+that *should not* hold is correctly detected as violated. Gives the metric teeth — a reviewer can see
+the same `maxMatErr`/residual that reports `0` on a valid factorization reports a large value on an
+invalid one, so the positive checks are not vacuous.
+-/
+def assertGe (name : String) (err : Float) (threshold : Float := 0.5) : IO Unit :=
+  if err ≥ threshold then
+    IO.println s!"{name}: OK (correctly rejected, err = {err} ≥ {threshold})"
+  else
+    throw (IO.userError s!"{name}: FAIL (err = {err} < {threshold}; expected the property to fail)")
+
+/--
+Compiled **negative-control** assertion that a reconstruction *fails*: succeeds when the error is not
+below `tol` — including the `NaN` produced when a hypothesis is violated (e.g. Cholesky of a
+non-positive-definite matrix takes `√(negative)`). Documents that the success hypotheses (SPD pivots,
+full column rank) are genuinely necessary.
+-/
+def assertReconFails (name : String) (err : Float) (tolerance : Float := tol) : IO Unit :=
+  if err < tolerance then
+    throw (IO.userError s!"{name}: FAIL (unexpectedly reconstructed, err = {err} < {tolerance})")
+  else
+    IO.println s!"{name}: OK (correctly failed, err = {err})"
+
+/--
+Compiled assertion that two scalars agree to `tolerance`. Used to verify the *exact* residual
+identity numerically: the reconstruction error and the off-diagonal mass it equals are computed by
+independent routines and shown to match, so the identity `symEigJacobi_frobenius_residual` is not a
+tautology of the code.
+-/
+def assertApproxEq (name : String) (a b : Float) (tolerance : Float := tol) : IO Unit :=
+  if Float.abs (a - b) < tolerance then
+    IO.println s!"{name}: OK (lhs = {a}, rhs = {b}, |Δ| = {Float.abs (a - b)})"
+  else
+    throw (IO.userError s!"{name}: FAIL (lhs = {a}, rhs = {b}, |Δ| = {Float.abs (a - b)} ≥ {tolerance})")
+
 end NN.Examples.Factorization
diff --git a/NN/Examples/Factorization/QR.lean b/NN/Examples/Factorization/QR.lean
index 0080de7..b2e549a 100644
--- a/NN/Examples/Factorization/QR.lean
+++ b/NN/Examples/Factorization/QR.lean
@@ -41,4 +41,29 @@ def orthoErr : Float := maxMatErr (mm (tr Q) Q) (Spec.identityTensorSpec 3)
 #eval assertLt "QR A = Q·R" reconErr
 #eval assertLt "QR Qᵀ·Q = I" orthoErr
 
+/-! ## Negative control: full column rank is necessary for orthonormality
+
+`qrSpec_orthonormal` (`Qᵀ Q = 1`) requires full column rank — positive `R`-pivots
+(`0 < R[j,j]`). The matrix below has a dependent column (`col₂ = 2·col₁`), so Gram–Schmidt produces a
+**zero** `Q` column where the pivot vanishes: `A = Q·R` still holds, but `Qᵀ Q` has a `0` on the
+diagonal, so orthonormality fails. This separates the two guarantees and shows the rank hypothesis
+genuinely bites. -/
+
+/-- A rank-2 matrix (`col₂ = 2·col₁`): reconstructs, but `Q` cannot be orthonormal. -/
+def Adef : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[1, 2, 0],
+         [2, 4, 1],
+         [1, 2, 0]]
+
+def Qdef : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.qrQSpec Adef
+def Rdef : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.qrRSpec Adef
+
+/-- Reconstruction still holds even without full rank. -/
+def reconErrDef : Float := maxMatErr Adef (mm Qdef Rdef)
+/-- Orthonormality fails: `Qᵀ·Q` has a zero diagonal entry, so it is far from `I`. -/
+def orthoErrDef : Float := maxMatErr (mm (tr Qdef) Qdef) (Spec.identityTensorSpec 3)
+
+#eval assertLt "QR(rank-deficient) A = Q·R still reconstructs" reconErrDef
+#eval assertGe "QR(rank-deficient) Qᵀ·Q = I correctly fails (needs full column rank)" orthoErrDef
+
 end NN.Examples.Factorization.QR
diff --git a/NN/Examples/Factorization/SVD.lean b/NN/Examples/Factorization/SVD.lean
index 0b2369c..399b1fb 100644
--- a/NN/Examples/Factorization/SVD.lean
+++ b/NN/Examples/Factorization/SVD.lean
@@ -41,10 +41,28 @@ def V : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := svd.2.2
 
 /-- Reconstruction error `‖A - U·diag(σ)·Vᵀ‖_max`. -/
 def reconErr : Float := maxMatErr A (mm (mm U (diagFromVec σ)) (tr V))
+/-- Orthogonality error `‖Vᵀ·V - I‖_max` for the right singular vectors. -/
+def orthoErrV : Float := maxMatErr (mm (tr V) V) (Spec.identityTensorSpec 3)
 
 #eval vecToList σ
 
--- Compiled assertion (fails the build otherwise).
+-- Compiled assertions (fail the build otherwise).
 #eval assertLt "SVD A = U·diag(σ)·Vᵀ" reconErr
+-- The columns of `V` are the eigenvectors of `Aᵀ A` (see `IsSVD.gram_isSymEig`), hence orthonormal
+-- a priori: the numeric witness of `jacobi_orthogonal` on the Gram matrix, even with `σ₃ = 0` (rank 2).
+#eval assertLt "SVD Vᵀ·V = I" orthoErrV
+
+/-! ## Negative control: a wrong factor is rejected
+
+Permuting the singular values (so they no longer pair with their vectors) must break the
+reconstruction — otherwise the `maxMatErr` reconstruction check would be vacuous. -/
+
+/-- A deliberately mismatched singular-value vector (permuted, and nonzero where the true `σ₃ = 0`). -/
+def σbad : Spec.Tensor Float (.dim 3 .scalar) :=
+  Spec.ofVecFn (fun i => ([3.0, 5.0, 1.0] : List Float).getD i.val 0.0)
+/-- Reconstruction with the mismatched `σ` (should be far from `A`). -/
+def reconErrBad : Float := maxMatErr A (mm (mm U (diagFromVec σbad)) (tr V))
+
+#eval assertGe "SVD with permuted σ correctly fails to reconstruct" reconErrBad
 
 end NN.Examples.Factorization.SVD
diff --git a/NN/Examples/Factorization/SymEig.lean b/NN/Examples/Factorization/SymEig.lean
index 5426e07..a896a99 100644
--- a/NN/Examples/Factorization/SymEig.lean
+++ b/NN/Examples/Factorization/SymEig.lean
@@ -14,8 +14,23 @@ meta import NN.Examples.Factorization.Common
 
 `symEigJacobiSpec A sweeps` returns `(eigenvalues, V)` for a symmetric `A`, where the columns of
 `V` are the (orthonormal) eigenvectors. Unlike the power-iteration `eigendecompSpec`, this recovers
-**all** eigenpairs. We check the spectral reconstruction `A = V · diag(λ) · Vᵀ` and orthogonality
-`Vᵀ · V = I`.
+**all** eigenpairs.
+
+These checks are designed to give a reviewer confidence in the matching formal development
+(`NN.Proofs.Tensor.Basic.FactorizationsJacobi`), and in particular to exhibit the precise boundary
+between what is proved *exactly / a-priori* and what is only *asymptotic*:
+
+* **Spectral reconstruction** `A = V · diag(λ) · Vᵀ` and orthogonality `Vᵀ V = I` hold at high sweep
+  counts (positive checks).
+* **Orthogonality is exact at *any* sweep count** — even after a single sweep `Vᵀ V = I` to machine
+  precision. This is the numeric witness of `jacobi_orthogonal`, which is an a-priori theorem (no
+  convergence hypothesis).
+* **Diagonalization is only asymptotic**: one sweep leaves a genuine off-diagonal residual that more
+  sweeps drive to zero. This is the "rate" that remains a-posteriori (`What remains` in the blueprint).
+* **The exact residual certificate** `‖A − V·diag(λ)·Vᵀ‖_F² = ‖offDiag(VᵀAV)‖_F²`
+  (`symEigJacobi_frobenius_residual`) is checked numerically at a *low* sweep count, where both sides
+  are non-negligible and agree to within `tol`. The two sides are computed by independent routines,
+  so the match is evidence that the identity is real and not a tautology of the code.
 -/
 
 @[expose] public section
@@ -29,24 +44,55 @@ def A : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
          [1, 3, 1],
          [1, 1, 4]]
 
-/-- Eigenvalues (diagonal after Jacobi sweeps) and eigenvector matrix `V`. -/
-def eig : Spec.Tensor Float (.dim 3 .scalar) × Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+/-- The 3×3 identity (target for the orthogonality checks). -/
+def I3 : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.identityTensorSpec 3
+
+/-- Eigendecomposition after 8 sweeps (converged) and after 1 sweep (not yet converged). -/
+def eig8 : Spec.Tensor Float (.dim 3 .scalar) × Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
   Spec.symEigJacobiSpec A 8
+def eig1 : Spec.Tensor Float (.dim 3 .scalar) × Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  Spec.symEigJacobiSpec A 1
+
+/-- Eigenvalues and eigenvector matrix `V` (columns are eigenvectors) at 8 sweeps. -/
+def evals8 : Spec.Tensor Float (.dim 3 .scalar) := eig8.1
+def V8 : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := eig8.2
+/-- Eigenvector matrix after a single sweep. -/
+def V1 : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := eig1.2
+
+/-- Rotated matrices `Af = Vᵀ A V` after 1 and 8 sweeps (diagonal in the limit). -/
+def Af1 : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := mm (mm (tr V1) A) V1
+def Af8 : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := mm (mm (tr V8) A) V8
+
+/-- Spectral reconstruction error `‖A - V·diag(λ)·Vᵀ‖_max` at 8 sweeps. -/
+def reconErr8 : Float := maxMatErr A (mm (mm V8 (diagFromVec evals8)) (tr V8))
+/-- Orthogonality error `‖Vᵀ·V - I‖_max` at 8 and at 1 sweep. -/
+def orthoErr8 : Float := maxMatErr (mm (tr V8) V8) I3
+def orthoErr1 : Float := maxMatErr (mm (tr V1) V1) I3
+
+/-- Off-diagonal mass of `Af` after 1 and 8 sweeps (the squared reconstruction residual). -/
+def offResid1 : Float := offDiagFrobSq Af1
+def offResid8 : Float := offDiagFrobSq Af8
+
+/-- Reconstruction side of the exact certificate, computed independently at 1 sweep. -/
+def reconFrobSq1 : Float := frobSqErr A (mm (mm V1 (diagFromVec (diagOf Af1))) (tr V1))
+
+#eval vecToList evals8
+#eval IO.println s!"off-diagonal mass: 1 sweep = {offResid1}, 8 sweeps = {offResid8}"
 
-/-- Eigenvalues. -/
-def evals : Spec.Tensor Float (.dim 3 .scalar) := eig.1
-/-- Eigenvector matrix (columns are eigenvectors). -/
-def V : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := eig.2
+-- Positive checks at convergence.
+#eval assertLt "SymEig(8) A = V·diag(λ)·Vᵀ" reconErr8
+#eval assertLt "SymEig(8) Vᵀ·V = I" orthoErr8
 
-/-- Spectral reconstruction error `‖A - V·diag(λ)·Vᵀ‖_max`. -/
-def reconErr : Float := maxMatErr A (mm (mm V (diagFromVec evals)) (tr V))
-/-- Orthogonality error `‖Vᵀ·V - I‖_max`. -/
-def orthoErr : Float := maxMatErr (mm (tr V) V) (Spec.identityTensorSpec 3)
+-- Orthogonality holds to within `tol` after one sweep (numeric witness of a-priori `jacobi_orthogonal`).
+#eval assertLt "SymEig(1) Vᵀ·V = I  (orthogonality exact at any sweep count)" orthoErr1
 
-#eval vecToList evals
+-- Diagonalization is only asymptotic: 1 sweep leaves a real residual, 8 sweeps remove it.
+#eval assertGe "SymEig(1) off-diagonal residual is non-negligible" offResid1 0.01
+#eval assertLt "SymEig(8) off-diagonal residual ≈ 0" offResid8
 
--- Compiled assertions (fail the build otherwise).
-#eval assertLt "SymEig A = V·diag(λ)·Vᵀ" reconErr
-#eval assertLt "SymEig Vᵀ·V = I" orthoErr
+-- The EXACT residual certificate `‖A - V·diag(λ)·Vᵀ‖² = ‖offDiag(VᵀAV)‖²`, at a sweep count where
+-- both sides are nonzero: independent computations agree (witness of `symEigJacobi_frobenius_residual`).
+#eval assertApproxEq "SymEig residual certificate ‖A-V·diagΛ·Vᵀ‖² = ‖offDiag(VᵀAV)‖²"
+  reconFrobSq1 offResid1
 
 end NN.Examples.Factorization.SymEig
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index fb2f877..33ec81a 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -12,6 +12,7 @@ public import NN.Proofs.Tensor.Basic.LinearAlgebra
 public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
+public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
 public import NN.Proofs.Tensor.Basic.BoundsNorms
 public import NN.Proofs.Tensor.Basic.Algebra
 
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean b/NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean
new file mode 100644
index 0000000..dafec5f
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean
@@ -0,0 +1,458 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.Factorizations
+
+/-!
+# The cyclic Jacobi run is faithful (orthogonality + orthogonal similarity)
+
+The a-posteriori residual certificate in
+[`NN.Proofs.Tensor.Basic.Factorizations`](./Factorizations.lean)
+(`symEig_reconstruction_residual`, `symEig_frobenius_residual`, `isSymEig_of_diagonal`) is stated
+*conditionally*: it assumes the two algebraic premises `Vᵀ V = 1` (the accumulated eigenvector matrix
+is orthogonal) and `A = V · Af · Vᵀ` (the rotated matrix is an orthogonal similarity of the input).
+Both are *exact, finite, a-priori* facts about the executable `Spec.arrJacobiRun` — they need no
+Jacobi convergence theory. This file proves them and thereby discharges the hypotheses, turning the
+certificate into an **unconditional** statement about the real solver output.
+
+The development is a refinement bridge from the strict `Array (Array ℝ)` representation the iteration
+runs over to Mathlib `Matrix (Fin n) (Fin n) ℝ`:
+
+* `toM` reads an array matrix as a `Matrix`; `toM_matMul`/`toM_tr`/`toM_id` show the array operations
+  realise the corresponding matrix operations.
+* `givens_orthogonal` is the one genuinely new piece: each Givens rotation `arrGivens n p q c s` with
+  `c² + s² = 1` is an orthogonal matrix (`Jᵀ J = 1`).
+* `JacInv` is the loop invariant `Vᵀ V = 1 ∧ A₀ = V · A · Vᵀ`; `jacInv_rotate`/`jacInv_sweep`/
+  `jacInv_run` propagate it through one rotation, one sweep, and the whole run.
+* `jacobi_orthogonal` and `jacobi_similarity` are the discharged premises for the actual
+  `symEigJacobiSpec` output, and `symEigJacobi_*` re-state the residual certificate unconditionally.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open Matrix
+open scoped BigOperators
+
+variable {n : Nat}
+
+/-! ## Reading array matrices as Mathlib matrices -/
+
+/-- Reading position `i` (in bounds) of `Array.ofFn f` returns `f ⟨i, _⟩`. -/
+theorem getD_ofFn {β : Type} (f : Fin n → β) (i : Nat) (hi : i < n) (d : β) :
+    (Array.ofFn f).getD i d = f ⟨i, hi⟩ := by
+  rw [Array.getD_eq_getD_getElem?, Array.getElem?_eq_getElem (by simpa using hi),
+    Option.getD_some, Array.getElem_ofFn]
+
+/-- Reading entry `(i, j)` of a doubly-`ofFn` array matrix returns the underlying function value. -/
+theorem arrGet_ofFn₂ (F : Fin n → Fin n → ℝ) (i j : Fin n) :
+    Spec.arrGet (Array.ofFn (fun a : Fin n => Array.ofFn (fun b : Fin n => F a b))) i.val j.val
+      = F i j := by
+  unfold Spec.arrGet
+  rw [getD_ofFn (fun a : Fin n => Array.ofFn (fun b : Fin n => F a b)) i.val i.isLt #[],
+    getD_ofFn (fun b : Fin n => F ⟨i.val, i.isLt⟩ b) j.val j.isLt 0]
+
+/-- View an `Array (Array ℝ)` as a `Matrix (Fin n) (Fin n) ℝ`. -/
+noncomputable def toM (n : Nat) (M : Array (Array ℝ)) : Matrix (Fin n) (Fin n) ℝ :=
+  Matrix.of (fun i j => Spec.arrGet M i.val j.val)
+
+@[simp] theorem toM_apply (M : Array (Array ℝ)) (i j : Fin n) :
+    toM n M i j = Spec.arrGet M i.val j.val := rfl
+
+/-- The array matrix product realises the matrix product. -/
+theorem toM_matMul (X Y : Array (Array ℝ)) :
+    toM n (Spec.arrMatMul n X Y) = toM n X * toM n Y := by
+  ext i j
+  rw [Matrix.mul_apply]
+  simp only [toM_apply]
+  unfold Spec.arrMatMul
+  rw [arrGet_ofFn₂]
+  exact Spec.finRange_foldl_add_eq_finset_sum
+    (fun k => Spec.arrGet X i.val k.val * Spec.arrGet Y k.val j.val)
+
+/-- The array transpose realises the matrix transpose. -/
+theorem toM_tr (X : Array (Array ℝ)) : toM n (Spec.arrTr n X) = (toM n X)ᵀ := by
+  ext i j
+  rw [Matrix.transpose_apply]
+  simp only [toM_apply]
+  unfold Spec.arrTr
+  rw [arrGet_ofFn₂]
+
+/-- The array identity realises the matrix identity. -/
+theorem toM_id : toM n (Spec.arrId n) = 1 := by
+  ext i j
+  simp only [toM_apply]
+  unfold Spec.arrId
+  rw [arrGet_ofFn₂]
+  by_cases h : i = j
+  · subst h; simp
+  · rw [Matrix.one_apply_ne h]
+    simp [Fin.val_ne_of_ne h]
+
+/-! ## The Givens rotation is orthogonal -/
+
+/-- Entrywise value of the Givens array matrix (boolean conditions). -/
+theorem toM_givens_apply (p q : Nat) (c s : ℝ) (a b : Fin n) :
+    toM n (Spec.arrGivens n p q c s) a b
+      = (if a.val == p && b.val == p then c
+         else if a.val == q && b.val == q then c
+         else if a.val == p && b.val == q then s
+         else if a.val == q && b.val == p then -s
+         else if a.val == b.val then 1 else 0) := by
+  simp only [toM_apply]
+  unfold Spec.arrGivens
+  rw [arrGet_ofFn₂]
+
+/-- Entrywise value of the Givens array matrix (propositional conditions). -/
+theorem toM_givens_apply' (p q : Nat) (c s : ℝ) (a b : Fin n) :
+    toM n (Spec.arrGivens n p q c s) a b
+      = (if a.val = p ∧ b.val = p then c
+         else if a.val = q ∧ b.val = q then c
+         else if a.val = p ∧ b.val = q then s
+         else if a.val = q ∧ b.val = p then -s
+         else if a.val = b.val then 1 else 0) := by
+  rw [toM_givens_apply]
+  simp only [Bool.and_eq_true, beq_iff_eq]
+
+/-- Column `p` of the Givens matrix: `c` at row `p`, `-s` at row `q`, `0` elsewhere. -/
+theorem givens_col_fp (p q : Nat) (hp : p < n) (hq : q < n) (hpq : p ≠ q) (c s : ℝ) (k : Fin n) :
+    toM n (Spec.arrGivens n p q c s) k ⟨p, hp⟩
+      = (if k = ⟨p, hp⟩ then c else if k = ⟨q, hq⟩ then -s else 0) := by
+  rw [toM_givens_apply']
+  by_cases hkp : k.val = p
+  · simp [hkp, Fin.ext_iff]
+  · by_cases hkq : k.val = q
+    · simp [hkq, hpq, Ne.symm hpq, Fin.ext_iff]
+    · simp [hkp, hkq, hpq, Fin.ext_iff]
+
+/-- Column `q` of the Givens matrix: `s` at row `p`, `c` at row `q`, `0` elsewhere. -/
+theorem givens_col_fq (p q : Nat) (hp : p < n) (hq : q < n) (hpq : p ≠ q) (c s : ℝ) (k : Fin n) :
+    toM n (Spec.arrGivens n p q c s) k ⟨q, hq⟩
+      = (if k = ⟨p, hp⟩ then s else if k = ⟨q, hq⟩ then c else 0) := by
+  rw [toM_givens_apply']
+  by_cases hkp : k.val = p
+  · simp [hkp, hpq, Ne.symm hpq, Fin.ext_iff]
+  · by_cases hkq : k.val = q
+    · simp [hkq, Ne.symm hpq, Fin.ext_iff]
+    · simp [hkp, hkq, Ne.symm hpq, Fin.ext_iff]
+
+/-- Any other column `o ∉ {p, q}` of the Givens matrix is the `o`-th standard basis vector. -/
+theorem givens_col_other (p q : Nat) (c s : ℝ) (o k : Fin n)
+    (hop : o.val ≠ p) (hoq : o.val ≠ q) :
+    toM n (Spec.arrGivens n p q c s) k o = (if k = o then 1 else 0) := by
+  rw [toM_givens_apply']
+  by_cases hko : k = o
+  · simp [hko, hop, hoq]
+  · simp [hop, hoq, hko, Fin.val_ne_of_ne hko]
+
+/-- A sum of products of two indicator functions is the Kronecker delta. -/
+private theorem sum_ite_mul_ite (i j : Fin n) :
+    ∑ k : Fin n, (if k = i then (1 : ℝ) else 0) * (if k = j then 1 else 0)
+      = if i = j then 1 else 0 := by
+  by_cases hij : i = j
+  · subst hij
+    have hterm : ∀ k : Fin n,
+        (if k = i then (1 : ℝ) else 0) * (if k = i then 1 else 0) = if k = i then 1 else 0 :=
+      fun k => by by_cases hk : k = i <;> simp [hk]
+    rw [if_pos rfl, Finset.sum_congr rfl (fun k _ => hterm k), Finset.sum_ite_eq']
+    simp
+  · rw [if_neg hij]
+    refine Finset.sum_eq_zero (fun k _ => ?_)
+    by_cases hki : k = i
+    · subst hki; simp [hij]
+    · rw [if_neg hki, zero_mul]
+
+/-- Two functions supported on `{fp, fq}` (with `fp ≠ fq`) have dot product `x1 * x2 + y1 * y2`. -/
+private theorem sum_two_supp (fp fq : Fin n) (hfpq : fp ≠ fq) (x1 y1 x2 y2 : ℝ) :
+    ∑ k : Fin n, (if k = fp then x1 else if k = fq then y1 else 0)
+                 * (if k = fp then x2 else if k = fq then y2 else 0)
+      = x1 * x2 + y1 * y2 := by
+  have hterm : ∀ k : Fin n,
+      (if k = fp then x1 else if k = fq then y1 else 0)
+        * (if k = fp then x2 else if k = fq then y2 else 0)
+        = (if k = fp then x1 * x2 else 0) + (if k = fq then y1 * y2 else 0) := by
+    intro k
+    by_cases hkp : k = fp
+    · subst hkp; simp [hfpq]
+    · by_cases hkq : k = fq
+      · subst hkq; simp [hkp]
+      · simp [hkp, hkq]
+  rw [Finset.sum_congr rfl (fun k _ => hterm k), Finset.sum_add_distrib,
+    Finset.sum_ite_eq', Finset.sum_ite_eq']
+  simp
+
+/-- A function supported on `{fp, fq}` times an indicator at `o ∉ {fp, fq}` sums to zero. -/
+private theorem sum_two_supp_mul_ite (fp fq o : Fin n) (hop : o ≠ fp) (hoq : o ≠ fq) (x1 y1 : ℝ) :
+    ∑ k : Fin n, (if k = fp then x1 else if k = fq then y1 else 0) * (if k = o then (1 : ℝ) else 0)
+      = 0 := by
+  refine Finset.sum_eq_zero (fun k _ => ?_)
+  by_cases hko : k = o
+  · subst hko; rw [if_neg hop, if_neg hoq, zero_mul]
+  · rw [if_neg hko, mul_zero]
+
+/-- **Givens rotation is orthogonal.** For `c² + s² = 1` and `p ≠ q`, `Jᵀ J = 1`. -/
+theorem givens_orthogonal (p q : Nat) (hp : p < n) (hq : q < n) (hpq : p ≠ q) (c s : ℝ)
+    (hcs : c ^ 2 + s ^ 2 = 1) :
+    (toM n (Spec.arrGivens n p q c s))ᵀ * toM n (Spec.arrGivens n p q c s) = 1 := by
+  have hfpq : (⟨p, hp⟩ : Fin n) ≠ ⟨q, hq⟩ := fun h => hpq (Fin.ext_iff.mp h)
+  ext i j
+  rw [Matrix.mul_apply, Matrix.one_apply]
+  simp only [Matrix.transpose_apply]
+  by_cases hip : i = ⟨p, hp⟩
+  · subst hip
+    by_cases hjp : j = ⟨p, hp⟩
+    · -- (p, p)
+      subst hjp
+      rw [Finset.sum_congr rfl (fun k _ => by rw [givens_col_fp p q hp hq hpq c s k]),
+        sum_two_supp _ _ hfpq c (-s) c (-s), if_pos rfl]
+      nlinarith [hcs]
+    · by_cases hjq : j = ⟨q, hq⟩
+      · -- (p, q)
+        subst hjq
+        rw [Finset.sum_congr rfl (fun k _ => by
+            rw [givens_col_fp p q hp hq hpq c s k, givens_col_fq p q hp hq hpq c s k]),
+          sum_two_supp _ _ hfpq c (-s) s c, if_neg hfpq]
+        ring
+      · -- (p, other)
+        have hjp' : j.val ≠ p := fun h => hjp (Fin.ext h)
+        have hjq' : j.val ≠ q := fun h => hjq (Fin.ext h)
+        rw [Finset.sum_congr rfl (fun k _ => by
+            rw [givens_col_fp p q hp hq hpq c s k, givens_col_other p q c s j k hjp' hjq']),
+          sum_two_supp_mul_ite _ _ j hjp hjq c (-s), if_neg (Ne.symm hjp)]
+  · by_cases hiq : i = ⟨q, hq⟩
+    · subst hiq
+      by_cases hjp : j = ⟨p, hp⟩
+      · -- (q, p)
+        subst hjp
+        rw [Finset.sum_congr rfl (fun k _ => by
+            rw [givens_col_fq p q hp hq hpq c s k, givens_col_fp p q hp hq hpq c s k]),
+          sum_two_supp _ _ hfpq s c c (-s), if_neg (Ne.symm hfpq)]
+        ring
+      · by_cases hjq : j = ⟨q, hq⟩
+        · -- (q, q)
+          subst hjq
+          rw [Finset.sum_congr rfl (fun k _ => by rw [givens_col_fq p q hp hq hpq c s k]),
+            sum_two_supp _ _ hfpq s c s c, if_pos rfl]
+          nlinarith [hcs]
+        · -- (q, other)
+          have hjp' : j.val ≠ p := fun h => hjp (Fin.ext h)
+          have hjq' : j.val ≠ q := fun h => hjq (Fin.ext h)
+          rw [Finset.sum_congr rfl (fun k _ => by
+              rw [givens_col_fq p q hp hq hpq c s k, givens_col_other p q c s j k hjp' hjq']),
+            sum_two_supp_mul_ite _ _ j hjp hjq s c, if_neg (Ne.symm hjq)]
+    · -- i other
+      have hip' : i.val ≠ p := fun h => hip (Fin.ext h)
+      have hiq' : i.val ≠ q := fun h => hiq (Fin.ext h)
+      by_cases hjp : j = ⟨p, hp⟩
+      · -- (other, p)
+        subst hjp
+        rw [Finset.sum_congr rfl (fun k _ => by
+            rw [givens_col_other p q c s i k hip' hiq', givens_col_fp p q hp hq hpq c s k,
+              mul_comm]),
+          sum_two_supp_mul_ite _ _ i hip hiq c (-s), if_neg hip]
+      · by_cases hjq : j = ⟨q, hq⟩
+        · -- (other, q)
+          subst hjq
+          rw [Finset.sum_congr rfl (fun k _ => by
+              rw [givens_col_other p q c s i k hip' hiq', givens_col_fq p q hp hq hpq c s k,
+                mul_comm]),
+            sum_two_supp_mul_ite _ _ i hip hiq s c, if_neg hiq]
+        · -- (other, other)
+          have hjp' : j.val ≠ p := fun h => hjp (Fin.ext h)
+          have hjq' : j.val ≠ q := fun h => hjq (Fin.ext h)
+          rw [Finset.sum_congr rfl (fun k _ => by
+              rw [givens_col_other p q c s i k hip' hiq', givens_col_other p q c s j k hjp' hjq']),
+            sum_ite_mul_ite]
+
+/-- The Golub–Van Loan rotation parameters the implementation uses satisfy `c² + s² = 1` for any
+intermediate value `t`: this is `givens_normSq` with `MathFunctions.sqrt = Real.sqrt` and `t·t = t²`. -/
+theorem code_givens_normSq (t : ℝ) :
+    (1 / MathFunctions.sqrt (1 + t * t)) ^ 2 + (t * (1 / MathFunctions.sqrt (1 + t * t))) ^ 2 = 1 := by
+  have h1 : MathFunctions.sqrt (1 + t * t) = Real.sqrt (1 + t ^ 2) := by
+    rw [show (1 : ℝ) + t * t = 1 + t ^ 2 from by ring]; rfl
+  rw [h1]
+  exact givens_normSq t
+
+/-! ## The loop invariant -/
+
+/-- The Jacobi loop invariant relative to the input `A₀`: the running `V` is orthogonal and the
+running pair `(A, V)` satisfies the orthogonal-similarity identity `A₀ = V · A · Vᵀ`. -/
+def JacInv (A0 : Matrix (Fin n) (Fin n) ℝ) (st : Array (Array ℝ) × Array (Array ℝ)) : Prop :=
+  (toM n st.2)ᵀ * toM n st.2 = 1 ∧ A0 = toM n st.2 * toM n st.1 * (toM n st.2)ᵀ
+
+/-- One orthogonal-similarity update by an orthogonal `J` preserves the invariant. -/
+theorem jacInv_step {A0 : Matrix (Fin n) (Fin n) ℝ} {A V J : Array (Array ℝ)}
+    (hJ : (toM n J)ᵀ * toM n J = 1) (h : JacInv A0 (A, V)) :
+    JacInv A0 (Spec.arrMatMul n (Spec.arrTr n J) (Spec.arrMatMul n A J), Spec.arrMatMul n V J) := by
+  obtain ⟨hVo, hsim⟩ := h
+  simp only [JacInv] at hVo hsim ⊢
+  have hJJ : toM n J * (toM n J)ᵀ = 1 := mul_eq_one_comm.mp hJ
+  refine ⟨?_, ?_⟩
+  · rw [toM_matMul, Matrix.transpose_mul]
+    calc (toM n J)ᵀ * (toM n V)ᵀ * (toM n V * toM n J)
+        = (toM n J)ᵀ * ((toM n V)ᵀ * toM n V) * toM n J := by
+          simp only [Matrix.mul_assoc]
+      _ = (toM n J)ᵀ * toM n J := by rw [hVo, Matrix.mul_one]
+      _ = 1 := hJ
+  · simp only [toM_matMul, toM_tr, Matrix.transpose_mul]
+    rw [hsim]
+    have e1 : (toM n V * toM n J) * ((toM n J)ᵀ * (toM n A * toM n J)) * ((toM n J)ᵀ * (toM n V)ᵀ)
+        = toM n V * (toM n J * (toM n J)ᵀ) * toM n A * (toM n J * (toM n J)ᵀ) * (toM n V)ᵀ := by
+      simp only [Matrix.mul_assoc]
+    rw [e1, hJJ]
+    simp only [Matrix.mul_one, Matrix.mul_assoc]
+
+/-- One Jacobi rotation preserves the invariant (the parameters always give an orthogonal `J`, and
+the no-op branch is trivial). -/
+theorem jacInv_rotate {A0 : Matrix (Fin n) (Fin n) ℝ} (p q : Nat) (hp : p < n) (hq : q < n)
+    (hpq : p ≠ q) {st : Array (Array ℝ) × Array (Array ℝ)} (h : JacInv A0 st) :
+    JacInv A0 (Spec.arrJacobiRotate n st.1 st.2 p q) := by
+  unfold Spec.arrJacobiRotate
+  extract_lets apq
+  split
+  · exact jacInv_step (givens_orthogonal p q hp hq hpq _ _ (code_givens_normSq _)) h
+  · exact h
+
+/-- Every pair produced by `jacobiPairs n` has `p < q < n`. -/
+theorem jacobiPairs_spec {pq : Nat × Nat} (h : pq ∈ Spec.jacobiPairs n) :
+    pq.1 < pq.2 ∧ pq.2 < n := by
+  unfold Spec.jacobiPairs at h
+  simp only [List.mem_flatMap, List.mem_filterMap, List.mem_range] at h
+  obtain ⟨p, _, q, hq, hcond⟩ := h
+  split at hcond
+  · rename_i hlt
+    simp only [Option.some.injEq] at hcond
+    rw [← hcond]
+    exact ⟨hlt, hq⟩
+  · simp at hcond
+
+/-- One Jacobi sweep preserves the invariant. -/
+theorem jacInv_sweep {A0 : Matrix (Fin n) (Fin n) ℝ} {st : Array (Array ℝ) × Array (Array ℝ)}
+    (h : JacInv A0 st) : JacInv A0 (Spec.arrJacobiSweep n st) := by
+  unfold Spec.arrJacobiSweep
+  refine List.foldlRecOn _ _ h ?_
+  intro b hb pq hmem
+  obtain ⟨hlt, hqn⟩ := jacobiPairs_spec hmem
+  exact jacInv_rotate pq.1 pq.2 (Nat.lt_trans hlt hqn) hqn (Nat.ne_of_lt hlt) hb
+
+/-- **The whole Jacobi run preserves the invariant.** Starting from `(A, I)`, after any number of
+sweeps the accumulated `V` is orthogonal and `toM A = V · Af · Vᵀ`. -/
+theorem jacInv_run (A : Array (Array ℝ)) (sweeps : Nat) :
+    JacInv (toM n A) (Spec.arrJacobiRun n A sweeps) := by
+  unfold Spec.arrJacobiRun
+  refine List.foldlRecOn _ _ ?_ ?_
+  · refine ⟨?_, ?_⟩
+    · show (toM n (Spec.arrId n))ᵀ * toM n (Spec.arrId n) = 1
+      rw [toM_id]; simp
+    · show toM n A = toM n (Spec.arrId n) * toM n A * (toM n (Spec.arrId n))ᵀ
+      rw [toM_id]; simp
+  · intro b hb _ _
+    exact jacInv_sweep hb
+
+/-! ## Discharging the residual-certificate hypotheses for the real solver output -/
+
+/-- View of the input tensor `A` as a `Matrix`. -/
+noncomputable def inputMat (A : Spec.Tensor ℝ (.dim n (.dim n .scalar))) :
+    Matrix (Fin n) (Fin n) ℝ :=
+  Matrix.of (Spec.toMatFn A)
+
+/-- The eigenvector matrix `V` produced by the Jacobi run on `A` (columns are the eigenvectors). -/
+noncomputable def jacobiV (A : Spec.Tensor ℝ (.dim n (.dim n .scalar))) (sweeps : Nat) :
+    Matrix (Fin n) (Fin n) ℝ :=
+  toM n (Spec.arrJacobiRun n (Spec.matToArr (Spec.toMatFn A)) sweeps).2
+
+/-- The rotated matrix `Af = Vᵀ A V` produced by the Jacobi run (diagonal in the zero-residual
+limit; its diagonal holds the eigenvalues). -/
+noncomputable def jacobiAf (A : Spec.Tensor ℝ (.dim n (.dim n .scalar))) (sweeps : Nat) :
+    Matrix (Fin n) (Fin n) ℝ :=
+  toM n (Spec.arrJacobiRun n (Spec.matToArr (Spec.toMatFn A)) sweeps).1
+
+/-- `toM` of the materialised input function is the input matrix. -/
+theorem toM_matToArr (X : Fin n → Fin n → ℝ) : toM n (Spec.matToArr X) = Matrix.of X := by
+  ext i j
+  simp only [toM_apply, Matrix.of_apply]
+  unfold Spec.matToArr
+  rw [arrGet_ofFn₂]
+
+/-- **Discharged premise 1 — orthogonality.** The eigenvector matrix the Jacobi solver returns is
+orthogonal, with no convergence hypothesis. -/
+theorem jacobi_orthogonal (A : Spec.Tensor ℝ (.dim n (.dim n .scalar))) (sweeps : Nat) :
+    (jacobiV A sweeps)ᵀ * jacobiV A sweeps = 1 :=
+  (jacInv_run (Spec.matToArr (Spec.toMatFn A)) sweeps).1
+
+/-- **Discharged premise 2 — orthogonal similarity.** The input equals `V · Af · Vᵀ` exactly, with no
+convergence hypothesis. -/
+theorem jacobi_similarity (A : Spec.Tensor ℝ (.dim n (.dim n .scalar))) (sweeps : Nat) :
+    inputMat A = jacobiV A sweeps * jacobiAf A sweeps * (jacobiV A sweeps)ᵀ := by
+  have h := (jacInv_run (n := n) (Spec.matToArr (Spec.toMatFn A)) sweeps).2
+  rw [toM_matToArr] at h
+  exact h
+
+/-- **Unconditional residual identity.** Reconstructing with the diagonal of `Af` leaves exactly the
+orthogonal conjugation of `Af`'s off-diagonal part — now stated about the real `symEigJacobiSpec`
+output rather than under a hypothesis. -/
+theorem symEigJacobi_reconstruction_residual (A : Spec.Tensor ℝ (.dim n (.dim n .scalar)))
+    (sweeps : Nat) :
+    inputMat A
+        - jacobiV A sweeps * Matrix.diagonal (fun i => jacobiAf A sweeps i i) * (jacobiV A sweeps)ᵀ
+      = jacobiV A sweeps * offDiagonal (jacobiAf A sweeps) * (jacobiV A sweeps)ᵀ :=
+  symEig_reconstruction_residual (jacobi_similarity A sweeps)
+
+/-- **Unconditional Frobenius residual certificate.** The squared reconstruction error equals the
+squared off-diagonal mass of `Af` — unconditionally for the real solver output. -/
+theorem symEigJacobi_frobenius_residual (A : Spec.Tensor ℝ (.dim n (.dim n .scalar)))
+    (sweeps : Nat) :
+    ((inputMat A
+          - jacobiV A sweeps * Matrix.diagonal (fun i => jacobiAf A sweeps i i)
+            * (jacobiV A sweeps)ᵀ)ᵀ
+        * (inputMat A
+          - jacobiV A sweeps * Matrix.diagonal (fun i => jacobiAf A sweeps i i)
+            * (jacobiV A sweeps)ᵀ)).trace
+      = ((offDiagonal (jacobiAf A sweeps))ᵀ * offDiagonal (jacobiAf A sweeps)).trace :=
+  symEig_frobenius_residual (jacobi_orthogonal A sweeps) (jacobi_similarity A sweeps)
+
+/-- **Unconditional correctness in the zero-residual limit.** When the rotated matrix is diagonal,
+the solver output is an exact symmetric eigendecomposition of the input — no hypotheses beyond
+diagonality. -/
+theorem symEigJacobi_isSymEig_of_diagonal (A : Spec.Tensor ℝ (.dim n (.dim n .scalar)))
+    (sweeps : Nat)
+    (hdiag : jacobiAf A sweeps = Matrix.diagonal (fun i => jacobiAf A sweeps i i)) :
+    IsSymEig (inputMat A) (fun i => jacobiAf A sweeps i i) (jacobiV A sweeps) :=
+  isSymEig_of_diagonal (jacobi_orthogonal A sweeps) (jacobi_similarity A sweeps) hdiag
+
+/-- The eigenvector matrix read back from the public `symEigJacobiSpec` output is `jacobiV`, so the
+theorems above are statements about the actual returned `V`. -/
+theorem symEigJacobiSpec_V_eq (A : Spec.Tensor ℝ (.dim n (.dim n .scalar))) (sweeps : Nat) :
+    Matrix.of (fun i j => Spec.get2 (Spec.symEigJacobiSpec A sweeps).2 i j) = jacobiV A sweeps :=
+  rfl
+
+/-! ## Example: the residual certificate is now unconditional
+
+`symEig_frobenius_residual` and `isSymEig_of_diagonal` used to *take* `Vᵀ V = 1` and
+`A = V · Af · Vᵀ` as hypotheses. For the real `symEigJacobiSpec` output those are now theorems
+(`jacobi_orthogonal`, `jacobi_similarity`), so the certificate follows from the input and sweep
+count alone — no premises to discharge at the call site. -/
+
+/-- The Frobenius residual identity for a `3×3` Jacobi run with `8` sweeps, with no hypotheses. -/
+example (A : Spec.Tensor ℝ (.dim 3 (.dim 3 .scalar))) :
+    ((inputMat A
+          - jacobiV A 8 * Matrix.diagonal (fun i => jacobiAf A 8 i i) * (jacobiV A 8)ᵀ)ᵀ
+        * (inputMat A
+          - jacobiV A 8 * Matrix.diagonal (fun i => jacobiAf A 8 i i) * (jacobiV A 8)ᵀ)).trace
+      = ((offDiagonal (jacobiAf A 8))ᵀ * offDiagonal (jacobiAf A 8)).trace :=
+  symEigJacobi_frobenius_residual A 8
+
+/-- In the zero-residual limit the output is a genuine eigendecomposition; the only hypothesis is
+diagonality of the rotated matrix — orthogonality and the orthogonal similarity come for free. -/
+example (A : Spec.Tensor ℝ (.dim 3 (.dim 3 .scalar)))
+    (h : jacobiAf A 8 = Matrix.diagonal (fun i => jacobiAf A 8 i i)) :
+    IsSymEig (inputMat A) (fun i => jacobiAf A 8 i i) (jacobiV A 8) :=
+  symEigJacobi_isSymEig_of_diagonal A 8 h
+
+end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index d8e51b5..de1522d 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -128,6 +128,36 @@ zero-residual limit, `isSymEig_of_diagonal` shows the solver output `(diag Af, V
 `NN/Examples/Factorization` are concrete instances of this certificate: they bound the off-diagonal
 mass on specific matrices.
 
+# Faithfulness of the Jacobi run: orthogonality and orthogonal similarity
+
+The three certificate theorems above are stated *conditionally*: all take the orthogonal-similarity
+identity `A = V · Af · Vᵀ` as a hypothesis, and two of them also orthogonality `Vᵀ V = 1`. Both are
+*exact, finite, a-priori* facts about the executable `arrJacobiRun`, needing no convergence theory,
+and
+[`NN.Proofs.Tensor.Basic.FactorizationsJacobi`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean)
+proves them, discharging the hypotheses for the real solver output.
+
+The development bridges the strict `Array (Array ℝ)` representation the loop runs over to Mathlib
+`Matrix` via `toM`, with `toM_matMul`/`toM_tr`/`toM_id` showing the array operations realise the
+matrix ones. The single genuinely-new ingredient is `givens_orthogonal`: each rotation
+`arrGivens n p q c s` with `c² + s² = 1` is an orthogonal matrix (`Jᵀ J = 1`), proved by reducing the
+column dot products to the `c² + s² = 1` identity (`givens_normSq`) for the diagonal blocks and to
+orthogonality of distinct standard basis vectors elsewhere. From it, the loop invariant
+`JacInv A₀ (A, V) := Vᵀ V = 1 ∧ A₀ = V · A · Vᵀ` is preserved by one rotation (`jacInv_rotate` — the
+no-op branch trivially, the rotating branch because `(A, V) ↦ (Jᵀ A J, V J)` inserts only factors
+`J Jᵀ = 1`), hence by a whole sweep (`jacInv_sweep`, a `List.foldlRecOn` over `jacobiPairs`) and the
+whole run (`jacInv_run`, starting from `(A, I)` where the invariant is immediate).
+
+Specialised to the `symEigJacobiSpec` output, this gives the two premises as theorems with no
+hypotheses: `jacobi_orthogonal` (`Vᵀ V = 1`) and `jacobi_similarity` (`A = V · Af · Vᵀ`).
+Feeding them into the certificate yields the *unconditional* restatements
+`symEigJacobi_reconstruction_residual`, `symEigJacobi_frobenius_residual`, and
+`symEigJacobi_isSymEig_of_diagonal`: the residual identity and the zero-residual-limit correctness now
+hold for the actual returned `(Λ, V)` outright. So the returned `V` is a genuine orthogonal matrix and
+`Af` a genuine orthogonal similarity of the input *regardless of how far the sweeps have converged* —
+the only thing the residual certificate still defers to runtime is the *size* of the off-diagonal
+mass, never the algebraic faithfulness of the decomposition.
+
 # Exact QR reconstruction
 
 The QR factorization admits the same treatment. `qr_mul_eq` (in the same file) proves that for an
@@ -166,8 +196,14 @@ into a future Mathlib matrix-level QR contribution.
 
 # What remains
 
-With Cholesky and QR fully reconstructed (`A = L · Lᵀ`, `A = Q · R`, `Qᵀ Q = 1`), the only properties
-not available as a-priori theorems are the *iterative* ones: full diagonalization for the cyclic Jacobi
-eigensolver and the SVD built on it. Mathlib v4.30.0 has no Jacobi convergence theory, so those remain
-captured by the exact a-posteriori residual certificate above, never by `sorry`. The specification-level
-facts the kernel methods rely on are independent of that step, so the CHD foundation is complete.
+With Cholesky and QR fully reconstructed (`A = L · Lᵀ`, `A = Q · R`, `Qᵀ Q = 1`), and the Jacobi run
+now proved faithful — `V` orthogonal and `A = V · Af · Vᵀ` exactly, so the residual certificate holds
+*unconditionally* for the real solver output — the single property still not available as an a-priori
+theorem is the *rate*: that finitely many cyclic-Jacobi sweeps drive `Af`'s off-diagonal mass to zero.
+That is the research-grade Forsythe–Henrici / Schönhage convergence result for cyclic (rather than
+classical, largest-pivot) Jacobi, and Mathlib v4.30.0 has no Jacobi convergence theory, so it remains
+captured by the exact a-posteriori residual certificate above — bounded numerically by the `assertLt`
+checks on concrete inputs — never by `sorry`. Everything else is exact: the algebraic faithfulness of
+the decomposition (orthogonality, orthogonal similarity, the residual identity, and correctness in the
+zero-residual limit) is proved, and the specification-level facts the kernel methods rely on are
+independent of the convergence step, so the CHD foundation is complete.

From dad1aa54aedcd57a07b6bb6f3403bdf607a3c9b4 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 08:27:52 -0700
Subject: [PATCH 07/22] Prove per-rotation Jacobi off-diagonal decrease (Tier
 2) + reviewer examples
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add the classical Jacobi progress identity as an exact, finite theorem over ℝ:
conjugating a symmetric A by the Givens rotation that annihilates pivot (p,q)
drops the squared off-diagonal mass by exactly 2·A[p,q]²
(`jacobi_off_decrease`).

New: NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean
- frobSq/diagSq/offSq mass machinery + frobSq_eq_diagSq_add_offSq.
- frobSq_orthogonal_conj: orthogonal similarity preserves total Frobenius mass,
  so driving the off-diagonal down ≡ driving the diagonal up.
- givens_conj_pp/qq/pq/other: explicit conjugation entries via bilinear support
  lemmas; the 2×2 block-Frobenius identity is frobSq_orthogonal_conj specialised
  to Fin 2 (no hand-tuned linear_combination coefficients).
- jacobi_off_decrease: the per-rotation decrease, under symmetry + the pivot
  annihilation (the defining equation the Golub–Van Loan angle solves; the
  explicit pivot is givens_conj_pq).
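
A hedged sketch of the annihilation condition, not part of the patch: the
formula below is derived here from the column layout in givens_col_fp and
givens_col_fq (column p is c at row p and -s at row q; column q is s at
row p and c at row q). It is assumed, not confirmed, to agree with what
givens_conj_pq states. The names app, aqq, apq and the hypothesis ht are
illustrative only.

```lean
import Mathlib

-- Illustrative only. (p,q) entry of Jᵀ A J restricted to the symmetric
-- 2×2 block [[app, apq], [apq, aqq]]: (column p of J)ᵀ · A · (column q of J).
example (app aqq apq c s : ℝ) :
    c * (app * s + apq * c) + (-s) * (apq * s + aqq * c)
      = c * s * (app - aqq) + (c ^ 2 - s ^ 2) * apq := by
  ring

-- With s = t * c, the pivot vanishes as soon as t satisfies
-- apq * (1 - t²) = t * (aqq - app), whatever the value of c.
example (app aqq apq c t : ℝ)
    (ht : apq * (1 - t ^ 2) = t * (aqq - app)) :
    c * (t * c) * (app - aqq) + (c ^ 2 - (t * c) ^ 2) * apq = 0 := by
  linear_combination c ^ 2 * ht
```

The second example uses s = t * c, the relation between the two rotation
parameters that code_givens_normSq records for the implementation.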

Examples: NN/Examples/Factorization/JacobiDecrease.lean (+ Common helpers)
- Positive: one rotation takes off-diagonal mass 6 → 4 = 6 − 2·1², pivot
  annihilated to 0, total mass conserved at 35, each with |Δ| = 0.
- Negative controls: a wrong-angle (orthogonal) rotation misses the decrease;
  a non-orthogonal conjugation breaks Frobenius-mass invariance.
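
The numbers quoted above can be re-derived by hand for the test matrix
[[2,1,1],[1,3,1],[1,1,4]] defined in JacobiDecrease.lean. The following is
a hedged, standalone arithmetic sketch over ℝ. It is not part of the patch
and does not run the Float solver; it only checks the expected values.

```lean
import Mathlib

-- Six off-diagonal entries equal to 1 give mass 6; annihilating the
-- pivot A[0,1] = 1 is predicted to remove 2 · 1².
example : (6 : ℝ) - 2 * 1 ^ 2 = 4 := by norm_num

-- Total Frobenius mass: diagonal 2² + 3² + 4² plus the off-diagonal 6.
example : (2 : ℝ) ^ 2 + 3 ^ 2 + 4 ^ 2 + 6 = 35 := by norm_num

-- Negative controls: (c, s) = (0.6, 0.8) satisfies c² + s² = 1,
-- while (0.6, 0.6) does not.
example : (0.6 : ℝ) ^ 2 + 0.8 ^ 2 = 1 := by norm_num
example : (0.6 : ℝ) ^ 2 + 0.6 ^ 2 = 0.72 := by norm_num
```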

Blueprint: new "Per-rotation progress" section; "What remains" narrowed to the
aggregate cyclic rate (Forsythe–Henrici/Schönhage), since per-rotation progress
is now exact.

Sorry-free; builds green (NN.Proofs.Tensor.Basic, NN.Examples.Factorization,
blueprint); repo_lint clean on all new/changed files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |   4 +
 NN/Examples/Factorization/Common.lean         |  28 ++
 NN/Examples/Factorization/JacobiDecrease.lean | 111 ++++++
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Basic/FactorizationsJacobiDecrease.lean   | 316 ++++++++++++++++++
 .../Ch4_Verification/Factorizations.lean      |  54 ++-
 6 files changed, 503 insertions(+), 11 deletions(-)
 create mode 100644 NN/Examples/Factorization/JacobiDecrease.lean
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 5bd7a28..0dc3a6d 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -11,6 +11,7 @@ public import NN.Examples.Factorization.Cholesky
 public import NN.Examples.Factorization.QR
 public import NN.Examples.Factorization.SymEig
 public import NN.Examples.Factorization.SVD
+public import NN.Examples.Factorization.JacobiDecrease
 
 /-!
 # Matrix factorization examples
@@ -30,6 +31,9 @@ factorization misbehaves.
   verified numerically.
 - `SVD`      — `A = U · diag(σ) · Vᵀ`, `Vᵀ V = I`; **negative control**: a permuted `σ` fails to
   reconstruct.
+- `JacobiDecrease` — the per-rotation progress identity `‖offDiag(Jᵀ A J)‖² = ‖offDiag A‖² − 2·A[p,q]²`
+  (`jacobi_off_decrease`) and Frobenius-mass invariance; **negative controls**: a wrong-angle rotation
+  misses the decrease, a non-orthogonal one breaks mass invariance.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Common.lean b/NN/Examples/Factorization/Common.lean
index a5d2074..94a6d4a 100644
--- a/NN/Examples/Factorization/Common.lean
+++ b/NN/Examples/Factorization/Common.lean
@@ -71,6 +71,34 @@ def offDiagFrobSq {n : Nat} (M : Spec.Tensor Float (.dim n (.dim n .scalar))) :
     (List.finRange n).foldl
       (fun a j => if i.val == j.val then a else let x := Spec.get2 M i j; a + x * x) acc) 0.0
 
+/-- Total squared Frobenius mass `Σ_{i,j} M_ij²` of a square matrix (off-diagonal + diagonal mass). -/
+def totalFrobSq {n : Nat} (M : Spec.Tensor Float (.dim n (.dim n .scalar))) : Float :=
+  (List.finRange n).foldl (fun acc i =>
+    (List.finRange n).foldl
+      (fun a j => let x := Spec.get2 M i j; a + x * x) acc) 0.0
+
+/-- View a square `Float` matrix tensor as a strict array matrix (the representation the Jacobi
+iteration runs over). -/
+def arrOfMat {n : Nat} (A : Spec.Tensor Float (.dim n (.dim n .scalar))) : Array (Array Float) :=
+  Spec.matToArr (Spec.toMatFn A)
+
+/-- Read a strict array matrix back as a square `Float` matrix tensor. -/
+def matOfArr {n : Nat} (M : Array (Array Float)) : Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  Spec.ofMatFn (fun i j => Spec.arrGet M i.val j.val)
+
+/-- Apply the **annihilating** Jacobi rotation at pivot `(p, q)`: returns `A' = Jᵀ A J` for the
+Givens rotation whose angle zeroes `A'[p,q]` (the rotation the solver actually performs). -/
+def jacobiRotateAt {n : Nat} (A : Spec.Tensor Float (.dim n (.dim n .scalar))) (p q : Nat) :
+    Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  matOfArr (Spec.arrJacobiRotate n (arrOfMat A) (Spec.arrId n) p q).1
+
+/-- Apply an **arbitrary** Givens conjugation `A' = Jᵀ A J` with caller-chosen `(c, s)` at `(p, q)`
+(not necessarily the annihilating angle, nor even orthogonal). Used for negative controls. -/
+def givensConjAt {n : Nat} (A : Spec.Tensor Float (.dim n (.dim n .scalar))) (p q : Nat)
+    (c s : Float) : Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  let J := Spec.arrGivens n p q c s
+  matOfArr (Spec.arrMatMul n (Spec.arrTr n J) (Spec.arrMatMul n (arrOfMat A) J))
+
 /-- Shared tolerance for reconstruction-error assertions. -/
 def tol : Float := 1e-6
 
diff --git a/NN/Examples/Factorization/JacobiDecrease.lean b/NN/Examples/Factorization/JacobiDecrease.lean
new file mode 100644
index 0000000..763aa18
--- /dev/null
+++ b/NN/Examples/Factorization/JacobiDecrease.lean
@@ -0,0 +1,111 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: the cyclic Jacobi sweep makes progress (per-rotation off-diagonal decrease)
+
+These checks corroborate the **Tier 2** development in
+`NN.Proofs.Tensor.Basic.FactorizationsJacobiDecrease`: the exact per-rotation identity behind Jacobi
+convergence. For a symmetric `A`, conjugating by the Givens rotation that annihilates the pivot
+`(p, q)` removes exactly `2 · A[p,q]²` of squared off-diagonal mass:
+
+`‖offDiag(Jᵀ A J)‖² = ‖offDiag A‖² − 2 · A[p,q]²`     (`jacobi_off_decrease`)
+
+while preserving the total Frobenius mass `‖A‖²` (`frobSq_orthogonal_conj`).
+
+The checks exhibit both halves of the theorem and show its hypotheses are needed (negative controls):
+
+* **Positive — exact decrease.** One annihilating rotation drops the off-diagonal mass by precisely
+  `2 · A[p,q]²` (independent computations of the two sides agree).
+* **Positive — pivot annihilated.** The rotated `A'[p,q]` is `≈ 0` (the defining property of the
+  angle; this is the `hannih` hypothesis holding on the concrete matrix).
+* **Positive — Frobenius mass preserved.** `‖A'‖² = ‖A‖²`: the orthogonal similarity moves mass from
+  the off-diagonal *onto the diagonal* without creating or destroying any.
+* **Negative — the angle matters.** A *wrong-angle* (but still orthogonal) Givens rotation fails to
+  achieve the `2 · A[p,q]²` decrease: the annihilation hypothesis `hannih` is genuinely needed.
+* **Negative — orthogonality matters.** A *non-orthogonal* conjugation (`c² + s² ≠ 1`) does **not**
+  preserve `‖A‖²`, so it is not a similarity and the whole argument collapses without
+  `givens_orthogonal`.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.JacobiDecrease
+
+/-- A symmetric `3×3` test matrix; the `(0,1)` pivot is `A[0,1] = 1`, so the predicted off-diagonal
+drop from annihilating it is `2 · 1² = 2`. -/
+def A : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[2, 1, 1],
+         [1, 3, 1],
+         [1, 1, 4]]
+
+/-- The pivot we annihilate. -/
+def p : Nat := 0
+def q : Nat := 1
+
+/-- `A' = Jᵀ A J` after the annihilating rotation at `(0,1)`. -/
+def A' : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := jacobiRotateAt A p q
+
+/-- Off-diagonal mass before and after the rotation. -/
+def offBefore : Float := offDiagFrobSq A
+def offAfter : Float := offDiagFrobSq A'
+
+/-- The squared pivot `A[0,1]²` and the predicted post-rotation off-diagonal mass. -/
+def pivotSq : Float := let x := Spec.get2 A ⟨p, by decide⟩ ⟨q, by decide⟩; x * x
+def offPredicted : Float := offBefore - 2 * pivotSq
+
+#eval IO.println s!"off-diagonal mass: before = {offBefore}, after = {offAfter}, predicted = {offPredicted}"
+#eval IO.println s!"pivot A[0,1] = {Spec.get2 A ⟨p, by decide⟩ ⟨q, by decide⟩}, rotated A'[0,1] = {Spec.get2 A' ⟨p, by decide⟩ ⟨q, by decide⟩}"
+
+-- Positive — the exact per-rotation decrease `‖offDiag A'‖² = ‖offDiag A‖² − 2·A[p,q]²`
+-- (`jacobi_off_decrease`). The two sides are computed independently and shown to agree.
+#eval assertApproxEq "Jacobi(1 rot) off-diagonal decrease = 2·A[p,q]²" offAfter offPredicted
+
+-- Positive — the pivot really is annihilated (the `hannih` hypothesis holds here).
+#eval assertLt "Jacobi rotation annihilates the pivot A'[p,q] ≈ 0"
+  (Float.abs (Spec.get2 A' ⟨p, by decide⟩ ⟨q, by decide⟩))
+
+-- Positive — total Frobenius mass is preserved (`frobSq_orthogonal_conj`): the orthogonal similarity
+-- shifts mass from the off-diagonal onto the diagonal but conserves the total.
+#eval assertApproxEq "Jacobi rotation preserves total Frobenius mass ‖A'‖² = ‖A‖²"
+  (totalFrobSq A') (totalFrobSq A)
+
+/-! ## Negative control 1: the rotation angle matters
+
+A wrong-angle (but orthogonal, `c² + s² = 1`) Givens rotation does not annihilate the pivot, so the
+exact `2 · A[p,q]²` decrease fails. This shows numerically that `hannih` is needed. -/
+
+/-- A fixed orthogonal rotation with the *wrong* angle (`c = 0.6, s = 0.8`, so `c² + s² = 1`). -/
+def Awrong : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := givensConjAt A p q 0.6 0.8
+def offWrong : Float := offDiagFrobSq Awrong
+
+#eval IO.println s!"wrong-angle off-diagonal mass = {offWrong} (predicted-if-annihilating = {offPredicted})"
+
+-- The wrong angle misses the predicted decrease by a wide margin.
+#eval assertGe "wrong-angle rotation fails the 2·A[p,q]² decrease (annihilation hypothesis needed)"
+  (Float.abs (offWrong - offPredicted)) 0.5
+
+/-! ## Negative control 2: orthogonality matters
+
+A non-orthogonal conjugation (`c² + s² ≠ 1`) is not a similarity, so it does **not** preserve the
+total Frobenius mass: `frobSq_orthogonal_conj` needs the hypothesis `givens_orthogonal` proves. -/
+
+/-- A non-orthogonal "rotation" (`c = 0.6, s = 0.6`, so `c² + s² = 0.72 ≠ 1`). -/
+def Askew : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := givensConjAt A p q 0.6 0.6
+
+#eval IO.println s!"non-orthogonal conj total mass = {totalFrobSq Askew} (original = {totalFrobSq A})"
+
+-- A non-orthogonal conjugation changes the total Frobenius mass.
+#eval assertGe "non-orthogonal conjugation breaks Frobenius-mass invariance (orthogonality needed)"
+  (Float.abs (totalFrobSq Askew - totalFrobSq A)) 0.5
+
+end NN.Examples.Factorization.JacobiDecrease
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index 33ec81a..f2e8ffa 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -13,6 +13,7 @@ public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
+public import NN.Proofs.Tensor.Basic.FactorizationsJacobiDecrease
 public import NN.Proofs.Tensor.Basic.BoundsNorms
 public import NN.Proofs.Tensor.Basic.Algebra
 
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean b/NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean
new file mode 100644
index 0000000..9c86e70
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean
@@ -0,0 +1,316 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
+
+/-!
+# The cyclic Jacobi sweep makes progress (per-rotation off-diagonal decrease)
+
+[`FactorizationsJacobi`](./FactorizationsJacobi.lean) made the residual certificate *unconditional*:
+the solver output always satisfies orthogonality and the orthogonal-similarity `A = V · Af · Vᵀ`, so
+the reconstruction error equals the off-diagonal mass `‖offDiag(Af)‖²` of the rotated matrix. What
+that certificate does **not** say is that the off-diagonal mass actually *goes down*. This file
+proves the classical Jacobi progress identity, which is exactly that statement at the level of a
+single rotation:
+
+> If a symmetric `A` is conjugated by the Givens rotation that annihilates the pivot `(p, q)`, the
+> squared off-diagonal mass drops by exactly `2 · A[p,q]²`.
+
+The two ingredients, both *exact* over `ℝ`:
+
+* `frobSq_orthogonal_conj` — orthogonal similarity preserves the total Frobenius mass
+  `‖A‖² = trace(Aᵀ A)`. Combined with `frobSq_eq_diagSq_add_offSq` (`‖A‖² = diag-mass + off-mass`),
+  driving the off-diagonal mass down is the *same thing* as driving the diagonal mass up.
+* `givens_conj_*` — the explicit entries of `Jᵀ A J` in the rotation plane. A Givens conjugation only
+  touches rows/columns `p, q`, so the diagonal mass changes by `A'[p,p]² + A'[q,q]² − A[p,p]² −
+  A[q,q]²`, and the `2×2` block algebra (with `c² + s² = 1` and the annihilation `A'[p,q] = 0`) turns
+  that into `2 · A[p,q]²`.
+
+The pivot-annihilation is taken as a hypothesis (`hannih`): it is the defining property of the
+rotation angle. `givens_conj_pq` gives the explicit value of that pivot entry after the rotation, so
+`hannih` is the concrete equation `c·s·A[p,p] + c²·A[p,q] − s²·A[q,p] − c·s·A[q,q] = 0` that the
+Golub–Van Loan parameters the code uses are chosen to solve. Scope, as elsewhere in this development:
+this is the per-rotation decrease, the exact finite fact behind convergence; the *rate* over a whole
+sweep (and hence the number of sweeps needed) remains the research-grade piece Mathlib has no theory
+for.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open Matrix
+open scoped BigOperators
+
+variable {n : Nat}
+
+/-! ## Frobenius mass: total, diagonal, off-diagonal -/
+
+/-- Total squared Frobenius mass `‖M‖² = trace(Mᵀ M) = ∑ᵢⱼ M[i,j]²`. -/
+def frobSq (M : Matrix (Fin n) (Fin n) ℝ) : ℝ := (Mᵀ * M).trace
+
+/-- Squared diagonal mass `∑ᵢ M[i,i]²`. -/
+def diagSq (M : Matrix (Fin n) (Fin n) ℝ) : ℝ := ∑ i, (M i i) ^ 2
+
+/-- Squared off-diagonal mass `‖offDiag M‖² = trace((offDiag M)ᵀ (offDiag M))`. This is the quantity
+the residual certificate equates with the reconstruction error. -/
+def offSq (M : Matrix (Fin n) (Fin n) ℝ) : ℝ :=
+  ((offDiagonal M)ᵀ * offDiagonal M).trace
+
+/-- `‖M‖²` as the sum of all squared entries. -/
+theorem frobSq_eq_sum (M : Matrix (Fin n) (Fin n) ℝ) :
+    frobSq M = ∑ i, ∑ j, (M i j) ^ 2 := by
+  unfold frobSq
+  rw [Matrix.trace]
+  simp only [Matrix.diag_apply, Matrix.mul_apply, Matrix.transpose_apply]
+  rw [Finset.sum_comm]
+  exact Finset.sum_congr rfl (fun i _ => Finset.sum_congr rfl (fun j _ => by ring))
+
+/-- The off-diagonal part has entries `M[i,j]` off the diagonal and `0` on it. -/
+theorem offDiagonal_apply (M : Matrix (Fin n) (Fin n) ℝ) (i j : Fin n) :
+    offDiagonal M i j = if i = j then 0 else M i j := by
+  unfold offDiagonal
+  rw [Matrix.sub_apply]
+  by_cases h : i = j
+  · subst h; simp
+  · rw [Matrix.diagonal_apply_ne _ h, sub_zero, if_neg h]
+
+/-- `‖offDiag M‖²` as the sum of squared off-diagonal entries. -/
+theorem offSq_eq_sum (M : Matrix (Fin n) (Fin n) ℝ) :
+    offSq M = ∑ i, ∑ j, if i = j then 0 else (M i j) ^ 2 := by
+  unfold offSq
+  rw [Matrix.trace]
+  simp only [Matrix.diag_apply, Matrix.mul_apply, Matrix.transpose_apply]
+  rw [Finset.sum_comm]
+  refine Finset.sum_congr rfl (fun i _ => Finset.sum_congr rfl (fun j _ => ?_))
+  rw [offDiagonal_apply]
+  by_cases h : i = j
+  · subst h; simp
+  · simp only [if_neg h]; ring
+
+/-- **The Frobenius mass splits as diagonal mass plus off-diagonal mass.** -/
+theorem frobSq_eq_diagSq_add_offSq (M : Matrix (Fin n) (Fin n) ℝ) :
+    frobSq M = diagSq M + offSq M := by
+  rw [frobSq_eq_sum, offSq_eq_sum, diagSq, ← Finset.sum_add_distrib]
+  refine Finset.sum_congr rfl (fun i _ => ?_)
+  have hsplit : ∀ j : Fin n,
+      (M i j) ^ 2 = (if i = j then (M i j) ^ 2 else 0) + (if i = j then 0 else (M i j) ^ 2) := by
+    intro j; by_cases h : i = j <;> simp [h]
+  rw [Finset.sum_congr rfl (fun j _ => hsplit j), Finset.sum_add_distrib, Finset.sum_ite_eq]
+  simp
+
+/-- **Orthogonal similarity preserves the total Frobenius mass.** Every Jacobi step is such a
+similarity, so `‖A‖²` is an exact invariant of the whole run. -/
+theorem frobSq_orthogonal_conj {J M : Matrix (Fin n) (Fin n) ℝ} (hJ : Jᵀ * J = 1) :
+    frobSq (Jᵀ * M * J) = frobSq M := by
+  have hJJ : J * Jᵀ = 1 := mul_eq_one_comm.mp hJ
+  unfold frobSq
+  have hprod : ((Jᵀ * M * J)ᵀ * (Jᵀ * M * J)) = Jᵀ * (Mᵀ * M) * J := by
+    rw [Matrix.transpose_mul, Matrix.transpose_mul, Matrix.transpose_transpose]
+    simp only [Matrix.mul_assoc]
+    rw [← Matrix.mul_assoc J Jᵀ (M * J), hJJ, Matrix.one_mul]
+  rw [hprod, Matrix.trace_mul_comm, ← Matrix.mul_assoc, hJJ, Matrix.one_mul]
+
+/-! ## The `2×2` block algebra
+
+The rotation only mixes rows/columns `p` and `q`, so all the analysis happens in a `2×2` block. The
+key fact is that an orthogonal `2×2` conjugation preserves the block's Frobenius mass; we obtain it by
+specialising `frobSq_orthogonal_conj` to `Fin 2`. -/
+
+/-- **`2×2` block Frobenius preservation.** Conjugating the block `!![a, b; b', d]` by the orthogonal
+rotation `!![c, s; -s, c]` (with `c² + s² = 1`) preserves the sum of squared entries. Proved by
+specialising `frobSq_orthogonal_conj` to `Fin 2`. -/
+private theorem block_frob (a b b' d c s : ℝ) (hcs : c ^ 2 + s ^ 2 = 1) :
+    (c ^ 2 * a - c * s * (b + b') + s ^ 2 * d) ^ 2 + (s ^ 2 * a + c * s * (b + b') + c ^ 2 * d) ^ 2
+      + (c * s * a + c ^ 2 * b - s ^ 2 * b' - c * s * d) ^ 2
+      + (c * s * a + c ^ 2 * b' - s ^ 2 * b - c * s * d) ^ 2
+      = a ^ 2 + b ^ 2 + b' ^ 2 + d ^ 2 := by
+  have hR : (!![c, s; -s, c] : Matrix (Fin 2) (Fin 2) ℝ)ᵀ * !![c, s; -s, c] = 1 := by
+    ext i j; fin_cases i <;> fin_cases j <;>
+      simp [Matrix.mul_apply, Fin.sum_univ_two] <;> nlinarith [hcs]
+  have hfrob := frobSq_orthogonal_conj (M := (!![a, b; b', d] : Matrix (Fin 2) (Fin 2) ℝ)) hR
+  rw [frobSq_eq_sum, frobSq_eq_sum] at hfrob
+  simp [Fin.sum_univ_two, Matrix.mul_apply, Matrix.transpose_apply] at hfrob
+  linear_combination hfrob
+
+/-- **The diagonal-mass increase.** Under the rotation parameters (`c² + s² = 1`), symmetry of the
+pivot (`b' = b`), and the annihilation equation `c·s·(a − d) + (c² − s²)·b = 0`, the two rotated
+diagonal squares exceed the originals by exactly `2 b²`. -/
+private theorem block_diag_algebra (a b d c s : ℝ) (hcs : c ^ 2 + s ^ 2 = 1)
+    (hann : c * s * (a - d) + (c ^ 2 - s ^ 2) * b = 0) :
+    (c ^ 2 * a - 2 * c * s * b + s ^ 2 * d) ^ 2 + (s ^ 2 * a + 2 * c * s * b + c ^ 2 * d) ^ 2
+        - a ^ 2 - d ^ 2 = 2 * b ^ 2 := by
+  have hbf := block_frob a b b d c s hcs
+  have h0 : c * s * a + c ^ 2 * b - s ^ 2 * b - c * s * d = 0 := by linear_combination hann
+  linear_combination hbf - 2 * (c * s * a + c ^ 2 * b - s ^ 2 * b - c * s * d) * h0
+
+/-! ## Sum helpers -/
+
+/-- Sum of a function `f` against an indicator supported on the pair `{p', q'}` (with `p' ≠ q'`). -/
+private theorem sum_pair (p' q' : Fin n) (hpq : p' ≠ q') (vp vq : ℝ) (f : Fin n → ℝ) :
+    ∑ l, (if l = p' then vp else if l = q' then vq else 0) * f l = vp * f p' + vq * f q' := by
+  have hterm : ∀ l : Fin n,
+      (if l = p' then vp else if l = q' then vq else 0) * f l
+        = (if l = p' then vp * f l else 0) + (if l = q' then vq * f l else 0) := by
+    intro l
+    by_cases hlp : l = p'
+    · subst hlp; simp [hpq]
+    · by_cases hlq : l = q'
+      · subst hlq; simp [hlp]
+      · simp [hlp, hlq]
+  rw [Finset.sum_congr rfl (fun l _ => hterm l), Finset.sum_add_distrib,
+    Finset.sum_ite_eq', Finset.sum_ite_eq']
+  simp
+
+/-- A fintype sum of a function supported on the pair `{p', q'}` collapses to the two values. -/
+private theorem sum_eq_pair (p' q' : Fin n) (hpq : p' ≠ q') (g : Fin n → ℝ)
+    (h0 : ∀ o, o ≠ p' → o ≠ q' → g o = 0) : ∑ o, g o = g p' + g q' := by
+  rw [← Finset.sum_pair hpq]
+  refine (Finset.sum_subset (Finset.subset_univ _) (fun o _ ho => ?_)).symm
+  simp only [Finset.mem_insert, Finset.mem_singleton, not_or] at ho
+  exact h0 o ho.1 ho.2
+
+/-! ## Entries of `A · J` and of the conjugation `Jᵀ · A · J`
+
+`J = toM n (arrGivens n p q c s)` agrees with the identity outside columns `p`, `q`, and those two
+columns are supported on rows `{p, q}`, so `A · J` only recombines columns `p` and `q` of `A`. -/
+
+variable (A : Matrix (Fin n) (Fin n) ℝ) (p q : Nat) (hp : p < n) (hq : q < n) (hpq : p ≠ q)
+  (c s : ℝ)
+
+include hpq in
+private theorem fin_pq_ne : (⟨p, hp⟩ : Fin n) ≠ ⟨q, hq⟩ := fun h => hpq (Fin.ext_iff.mp h)
+
+include hpq in
+/-- Column `p` of `A · J`: `c · A[·,p] − s · A[·,q]`. -/
+theorem givens_AJ_p (k : Fin n) :
+    (A * toM n (Spec.arrGivens n p q c s)) k ⟨p, hp⟩
+      = c * A k ⟨p, hp⟩ - s * A k ⟨q, hq⟩ := by
+  rw [Matrix.mul_apply,
+    Finset.sum_congr rfl (fun l _ => by rw [givens_col_fp p q hp hq hpq c s l, mul_comm]),
+    sum_pair ⟨p, hp⟩ ⟨q, hq⟩ (fin_pq_ne p q hp hq hpq) c (-s) (fun l => A k l)]
+  ring
+
+include hpq in
+/-- Column `q` of `A · J`: `s · A[·,p] + c · A[·,q]`. -/
+theorem givens_AJ_q (k : Fin n) :
+    (A * toM n (Spec.arrGivens n p q c s)) k ⟨q, hq⟩
+      = s * A k ⟨p, hp⟩ + c * A k ⟨q, hq⟩ := by
+  rw [Matrix.mul_apply,
+    Finset.sum_congr rfl (fun l _ => by rw [givens_col_fq p q hp hq hpq c s l, mul_comm]),
+    sum_pair ⟨p, hp⟩ ⟨q, hq⟩ (fin_pq_ne p q hp hq hpq) s c (fun l => A k l)]
+
+/-- Any other column `o ∉ {p, q}` of `A · J` is unchanged. -/
+theorem givens_AJ_other (o : Fin n) (hop : o.val ≠ p) (hoq : o.val ≠ q) (k : Fin n) :
+    (A * toM n (Spec.arrGivens n p q c s)) k o = A k o := by
+  rw [Matrix.mul_apply,
+    Finset.sum_congr rfl (fun l _ => by rw [givens_col_other p q c s o l hop hoq])]
+  simp
+
+include hpq in
+/-- The `(p, p)` entry of the conjugation `Jᵀ · A · J`. -/
+theorem givens_conj_pp :
+    ((toM n (Spec.arrGivens n p q c s))ᵀ * A * toM n (Spec.arrGivens n p q c s)) ⟨p, hp⟩ ⟨p, hp⟩
+      = c ^ 2 * A ⟨p, hp⟩ ⟨p, hp⟩ - c * s * (A ⟨p, hp⟩ ⟨q, hq⟩ + A ⟨q, hq⟩ ⟨p, hp⟩)
+        + s ^ 2 * A ⟨q, hq⟩ ⟨q, hq⟩ := by
+  rw [Matrix.mul_assoc, Matrix.mul_apply,
+    Finset.sum_congr rfl (fun k _ => by
+      rw [Matrix.transpose_apply, givens_col_fp p q hp hq hpq c s k,
+        givens_AJ_p A p q hp hq hpq c s k]),
+    sum_pair ⟨p, hp⟩ ⟨q, hq⟩ (fin_pq_ne p q hp hq hpq) c (-s)
+      (fun k => c * A k ⟨p, hp⟩ - s * A k ⟨q, hq⟩)]
+  ring
+
+include hpq in
+/-- The `(q, q)` entry of the conjugation `Jᵀ · A · J`. -/
+theorem givens_conj_qq :
+    ((toM n (Spec.arrGivens n p q c s))ᵀ * A * toM n (Spec.arrGivens n p q c s)) ⟨q, hq⟩ ⟨q, hq⟩
+      = s ^ 2 * A ⟨p, hp⟩ ⟨p, hp⟩ + c * s * (A ⟨p, hp⟩ ⟨q, hq⟩ + A ⟨q, hq⟩ ⟨p, hp⟩)
+        + c ^ 2 * A ⟨q, hq⟩ ⟨q, hq⟩ := by
+  rw [Matrix.mul_assoc, Matrix.mul_apply,
+    Finset.sum_congr rfl (fun k _ => by
+      rw [Matrix.transpose_apply, givens_col_fq p q hp hq hpq c s k,
+        givens_AJ_q A p q hp hq hpq c s k]),
+    sum_pair ⟨p, hp⟩ ⟨q, hq⟩ (fin_pq_ne p q hp hq hpq) s c
+      (fun k => s * A k ⟨p, hp⟩ + c * A k ⟨q, hq⟩)]
+  ring
+
+include hpq in
+/-- The `(p, q)` entry of the conjugation `Jᵀ · A · J` — the entry the rotation is chosen to
+annihilate. -/
+theorem givens_conj_pq :
+    ((toM n (Spec.arrGivens n p q c s))ᵀ * A * toM n (Spec.arrGivens n p q c s)) ⟨p, hp⟩ ⟨q, hq⟩
+      = c * s * A ⟨p, hp⟩ ⟨p, hp⟩ + c ^ 2 * A ⟨p, hp⟩ ⟨q, hq⟩ - s ^ 2 * A ⟨q, hq⟩ ⟨p, hp⟩
+        - c * s * A ⟨q, hq⟩ ⟨q, hq⟩ := by
+  rw [Matrix.mul_assoc, Matrix.mul_apply,
+    Finset.sum_congr rfl (fun k _ => by
+      rw [Matrix.transpose_apply, givens_col_fp p q hp hq hpq c s k,
+        givens_AJ_q A p q hp hq hpq c s k]),
+    sum_pair ⟨p, hp⟩ ⟨q, hq⟩ (fin_pq_ne p q hp hq hpq) c (-s)
+      (fun k => s * A k ⟨p, hp⟩ + c * A k ⟨q, hq⟩)]
+  ring
+
+/-- Any other diagonal entry `(o, o)` with `o ∉ {p, q}` is unchanged by the conjugation. -/
+theorem givens_conj_other (o : Fin n) (hop : o.val ≠ p) (hoq : o.val ≠ q) :
+    ((toM n (Spec.arrGivens n p q c s))ᵀ * A * toM n (Spec.arrGivens n p q c s)) o o = A o o := by
+  rw [Matrix.mul_assoc, Matrix.mul_apply,
+    Finset.sum_congr rfl (fun k _ => by
+      rw [Matrix.transpose_apply, givens_col_other p q c s o k hop hoq,
+        givens_AJ_other A p q c s o hop hoq k])]
+  simp
+
+/-! ## The per-rotation off-diagonal decrease -/
+
+include hpq in
+/-- **Per-rotation Jacobi progress.** For a *symmetric pivot* (`A[q,p] = A[p,q]`) and the Givens
+rotation that *annihilates* it (`(Jᵀ A J)[p,q] = 0`), conjugating `A` by `J` decreases the squared
+off-diagonal mass by exactly `2 · A[p,q]²`:
+
+`‖offDiag(Jᵀ A J)‖² = ‖offDiag A‖² − 2 · A[p,q]²`.
+
+This is the exact finite identity behind Jacobi convergence: each rotation removes `2 · A[p,q]²` of
+off-diagonal mass. The *rate* over a sweep (how the pivots are chosen and how fast the total mass
+falls) is the research-grade part Mathlib has no theory for. -/
+theorem jacobi_off_decrease (hcs : c ^ 2 + s ^ 2 = 1)
+    (hsym : A ⟨q, hq⟩ ⟨p, hp⟩ = A ⟨p, hp⟩ ⟨q, hq⟩)
+    (hannih : ((toM n (Spec.arrGivens n p q c s))ᵀ * A * toM n (Spec.arrGivens n p q c s))
+      ⟨p, hp⟩ ⟨q, hq⟩ = 0) :
+    offSq ((toM n (Spec.arrGivens n p q c s))ᵀ * A * toM n (Spec.arrGivens n p q c s))
+      = offSq A - 2 * (A ⟨p, hp⟩ ⟨q, hq⟩) ^ 2 := by
+  have hpq' : (⟨p, hp⟩ : Fin n) ≠ ⟨q, hq⟩ := fin_pq_ne p q hp hq hpq
+  set G := toM n (Spec.arrGivens n p q c s) with hGdef
+  have hJ : Gᵀ * G = 1 := givens_orthogonal p q hp hq hpq c s hcs
+  have hfrob : frobSq (Gᵀ * A * G) = frobSq A := frobSq_orthogonal_conj hJ
+  have hsumP := frobSq_eq_diagSq_add_offSq (Gᵀ * A * G)
+  have hsumA := frobSq_eq_diagSq_add_offSq A
+  -- The annihilation equation in explicit form (using symmetry).
+  have hann : c * s * (A ⟨p, hp⟩ ⟨p, hp⟩ - A ⟨q, hq⟩ ⟨q, hq⟩)
+      + (c ^ 2 - s ^ 2) * A ⟨p, hp⟩ ⟨q, hq⟩ = 0 := by
+    have hpq0 := givens_conj_pq A p q hp hq hpq c s
+    rw [hGdef] at hannih
+    rw [hannih] at hpq0
+    rw [hsym] at hpq0
+    linear_combination -hpq0
+  -- The diagonal mass increases by exactly `2 A[p,q]²`.
+  have hdiag : diagSq (Gᵀ * A * G) - diagSq A = 2 * (A ⟨p, hp⟩ ⟨q, hq⟩) ^ 2 := by
+    unfold diagSq
+    rw [← Finset.sum_sub_distrib,
+      sum_eq_pair ⟨p, hp⟩ ⟨q, hq⟩ hpq'
+        (fun o => (Gᵀ * A * G) o o ^ 2 - A o o ^ 2) ?_]
+    · simp only [hGdef, givens_conj_pp A p q hp hq hpq c s, givens_conj_qq A p q hp hq hpq c s,
+        hsym]
+      have hba := block_diag_algebra (A ⟨p, hp⟩ ⟨p, hp⟩) (A ⟨p, hp⟩ ⟨q, hq⟩) (A ⟨q, hq⟩ ⟨q, hq⟩) c s
+        hcs hann
+      linear_combination hba
+    · intro o hop' hoq'
+      have hop : o.val ≠ p := fun h => hop' (Fin.ext h)
+      have hoq : o.val ≠ q := fun h => hoq' (Fin.ext h)
+      simp only [hGdef, givens_conj_other A p q c s o hop hoq, sub_self]
+  linarith [hfrob, hsumP, hsumA, hdiag]
+
+end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index de1522d..4003d8f 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -158,6 +158,34 @@ hold for the actual returned `(Λ, V)` outright. So the returned `V` is a genuin
 the only thing the residual certificate still defers to runtime is the *size* of the off-diagonal
 mass, never the algebraic faithfulness of the decomposition.
 
+# Per-rotation progress: the off-diagonal mass decreases
+
+Faithfulness says the residual *equals* the off-diagonal mass of `Af`; it does not say that mass ever
+goes *down*. The classical Jacobi progress identity, proved in
+[`NN.Proofs.Tensor.Basic.FactorizationsJacobiDecrease`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean),
+is exactly that statement at the level of a single rotation. For a symmetric `A`, conjugating by the
+Givens rotation that *annihilates* the pivot `(p, q)` decreases the squared off-diagonal mass by
+exactly `2 · A[p,q]²`:
+
+$$`\bigl\|\operatorname{offDiag}(J^\top A J)\bigr\|_F^2 = \bigl\|\operatorname{offDiag} A\bigr\|_F^2 - 2\,A[p,q]^2.`
+
+This is `jacobi_off_decrease`, and it rests on two exact facts. First, *orthogonal similarity
+preserves the total Frobenius mass* (`frobSq_orthogonal_conj`): `‖Jᵀ A J‖² = ‖A‖²`, since
+`trace((Jᵀ A J)ᵀ (Jᵀ A J)) = trace(Aᵀ A)` after the `J Jᵀ = 1` cancellation. Splitting that total as
+diagonal-plus-off-diagonal mass (`frobSq_eq_diagSq_add_offSq`) shows that driving the off-diagonal
+down is *the same thing* as driving the diagonal up. Second, the rotation only mixes rows and columns
+`p, q`, so the diagonal mass changes by `A'[p,p]² + A'[q,q]² − A[p,p]² − A[q,q]²`; the explicit
+conjugation entries (`givens_conj_pp`, `givens_conj_qq`, `givens_conj_pq`, computed from the Givens
+columns via the support lemmas) plus the `2×2` block-Frobenius identity — itself just
+`frobSq_orthogonal_conj` specialised to `Fin 2` — turn that, under `c² + s² = 1` and the annihilation
+`A'[p,q] = 0`, into precisely `2 · A[p,q]²`. The annihilation is the defining equation the
+Golub–Van Loan rotation angle solves, and `givens_conj_pq` exhibits the pivot entry whose vanishing
+it is. The executable witnesses in
+[`NN.Examples.Factorization.JacobiDecrease`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Examples/Factorization/JacobiDecrease.lean)
+confirm the identity numerically (one rotation takes the off-diagonal mass `6 → 4 = 6 − 2·1²` with
+total mass conserved at `35`) and show its hypotheses biting: a wrong-angle rotation misses the
+decrease, a non-orthogonal one breaks mass invariance.
+
 # Exact QR reconstruction
 
 The QR factorization admits the same treatment. `qr_mul_eq` (in the same file) proves that for an
@@ -196,14 +224,18 @@ into a future Mathlib matrix-level QR contribution.
 
 # What remains
 
-With Cholesky and QR fully reconstructed (`A = L · Lᵀ`, `A = Q · R`, `Qᵀ Q = 1`), and the Jacobi run
-now proved faithful — `V` orthogonal and `A = V · Af · Vᵀ` exactly, so the residual certificate holds
-*unconditionally* for the real solver output — the single property still not available as an a-priori
-theorem is the *rate*: that finitely many cyclic-Jacobi sweeps drive `Af`'s off-diagonal mass to zero.
-That is the research-grade Forsythe–Henrici / Schönhage convergence result for cyclic (rather than
-classical, largest-pivot) Jacobi, and Mathlib v4.30.0 has no Jacobi convergence theory, so it remains
-captured by the exact a-posteriori residual certificate above — bounded numerically by the `assertLt`
-checks on concrete inputs — never by `sorry`. Everything else is exact: the algebraic faithfulness of
-the decomposition (orthogonality, orthogonal similarity, the residual identity, and correctness in the
-zero-residual limit) is proved, and the specification-level facts the kernel methods rely on are
-independent of the convergence step, so the CHD foundation is complete.
+With Cholesky and QR fully reconstructed (`A = L · Lᵀ`, `A = Q · R`, `Qᵀ Q = 1`), the Jacobi run
+proved faithful — `V` orthogonal and `A = V · Af · Vᵀ` exactly, so the residual certificate holds
+*unconditionally* for the real solver output — and the *per-rotation* progress proved exactly (each
+annihilating rotation removes `2 · A[p,q]²` of off-diagonal mass), the single property still not
+available as an a-priori theorem is the *aggregate rate*: that a full *cyclic* sweep, choosing its
+pivots in fixed row-major order rather than always the largest, drives the off-diagonal mass to zero
+fast enough that finitely many sweeps suffice. Summing the per-rotation decrease over a sweep is exact;
+what is research-grade is bounding the *sum of squared pivots* below in terms of the off-diagonal
+mass when the pivots are visited cyclically — the Forsythe–Henrici / Schönhage convergence result.
+Mathlib v4.30.0 has no Jacobi convergence theory, so that aggregate rate remains captured by the exact
+a-posteriori residual certificate above — bounded numerically by the `assertLt` checks on concrete
+inputs — never by `sorry`. Everything else is exact: the algebraic faithfulness of the decomposition
+(orthogonality, orthogonal similarity, the residual identity, the per-rotation decrease, and
+correctness in the zero-residual limit) is proved, and the specification-level facts the kernel methods
+rely on are independent of the convergence step, so the CHD foundation is complete.

From fd1b505d79b37a67452667da08bbb705c275421d Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 09:10:45 -0700
Subject: [PATCH 08/22] Prove classical Jacobi linear convergence rate (Tier 3)
 + reviewer examples
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add the aggregate-rate development for the largest-pivot Jacobi strategy,
building on the per-rotation decrease (Tier 2):

- offSq_le_count_mul_max: the largest off-diagonal pivot carries at least the
  average share of the mass, ‖offDiag A‖² ≤ (n²−n)·A[p,q]².
- jacobi_off_decrease_classical: substituting that into the exact per-rotation
  decrease yields a genuine linear contraction by 1 − 2/(n²−n) < 1.
- geom_bound_of_contraction / tendsto_zero_of_contraction: a sequence contracting by a
  fixed factor ρ is bounded by ρᵏ·a₀ after k steps and (with offSq_nonneg, ρ < 1) tends
  to 0, so the classical algorithm provably converges geometrically. Stated for an
  arbitrary per-step factor, so a future cyclic per-sweep bound plugs in directly.

Honest scope: the cyclic ordering the solver uses does not satisfy the
largest-pivot hypothesis; its rate is the research-grade Forsythe–Henrici/Schönhage
result, and that gap stays captured by the exact a-posteriori residual certificate,
never by sorry.

Reviewer examples (NN/Examples/Factorization/JacobiRate.lean): largest pivot
meets the rate (mass 50.04 → 0.04, far under the guaranteed 33.36); negative
control — a tiny non-largest pivot misses the guaranteed factor. Blueprint
gains an "Aggregate rate" section and a precise restatement of what remains.

sorry-free, warning-free; repo_lint shows no new violations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |   5 +
 NN/Examples/Factorization/JacobiRate.lean     | 102 +++++++++++
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Basic/FactorizationsJacobiRate.lean       | 163 ++++++++++++++++++
 .../Ch4_Verification/Factorizations.lean      |  58 +++++--
 5 files changed, 318 insertions(+), 11 deletions(-)
 create mode 100644 NN/Examples/Factorization/JacobiRate.lean
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsJacobiRate.lean
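
Reviewer note (not part of the commit): the arithmetic quoted in the message
above can be re-derived without TorchLean. The sketch below is plain Lean 4
with no imports. Every name in it (Mat, entry, offMass, matMul, transpose,
givens, jacobiRotate) is hypothetical and is not the helper API of
NN/Examples/Factorization/Common. It assumes the textbook Golub-Van Loan choice
of (c, s) and the J = [c s; -s c] sign convention that givens_conj_pq uses; the
solver's own parameter code may differ in detail.

```lean
-- Hypothetical standalone check; these helpers are illustrative only.
abbrev Mat := Array (Array Float)

/-- Entry `(i, j)` of a square matrix stored as an array of rows. -/
def entry (m : Mat) (i j : Nat) : Float := m[i]![j]!

/-- Squared off-diagonal Frobenius mass: sum of `m[i,j]²` over `i ≠ j`. -/
def offMass (m : Mat) : Float := Id.run do
  let mut acc : Float := 0
  for i in [0:m.size] do
    for j in [0:m.size] do
      if i != j then
        acc := acc + entry m i j * entry m i j
  return acc

/-- Product of two square matrices of the same size. -/
def matMul (a b : Mat) : Mat :=
  (Array.range a.size).map fun i =>
    (Array.range a.size).map fun j =>
      (Array.range a.size).foldl
        (fun (acc : Float) k => acc + entry a i k * entry b k j) 0

def transpose (m : Mat) : Mat :=
  (Array.range m.size).map fun i =>
    (Array.range m.size).map fun j => entry m j i

/-- Givens matrix: identity except for the block `[c s; -s c]` on rows/columns `p, q`. -/
def givens (n p q : Nat) (c s : Float) : Mat :=
  (Array.range n).map fun i =>
    (Array.range n).map fun j =>
      if i == p && j == p then c
      else if i == p && j == q then s
      else if i == q && j == p then -s
      else if i == q && j == q then c
      else if i == j then 1 else 0

/-- `Jᵀ A J` for the rotation that annihilates the pivot `(p, q)` of a symmetric `a`. -/
def jacobiRotate (a : Mat) (p q : Nat) : Mat :=
  let apq := entry a p q
  -- Assumed textbook parameters: t solves t² + 2·tau·t − 1 = 0 (smaller root).
  let tau := (entry a q q - entry a p p) / (2 * apq)
  let t := if 0 <= tau then 1 / (tau + Float.sqrt (1 + tau * tau))
           else 1 / (tau - Float.sqrt (1 + tau * tau))
  let c := 1 / Float.sqrt (1 + t * t)
  let j := givens a.size p q c (t * c)
  -- A zero pivot needs no rotation (and would make `tau` undefined).
  if apq == 0 then a else matMul (matMul (transpose j) a) j

/-- The test matrix from NN/Examples/Factorization/JacobiRate.lean. -/
def A : Mat := #[#[1, 5, 0.1], #[5, 2, 0.1], #[0.1, 0.1, 3]]

#eval offMass A                        -- off-diagonal mass before any rotation
#eval offMass (jacobiRotate A 0 1)     -- after annihilating the largest pivot
#eval offMass (jacobiRotate A 0 2)     -- after annihilating a tiny pivot
#eval (1 - 2 / (6 : Float)) * offMass A  -- guaranteed bound for n = 3
```

In exact arithmetic the four values are 50.04, 0.04 (= 50.04 - 2·5²), 50.02
(= 50.04 - 2·0.1²) and 33.36; Float output will differ in the trailing digits.
The largest pivot lands far under the bound, while the tiny pivot stays above
it, which is the contrast the negative control asserts.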

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 0dc3a6d..10be535 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -12,6 +12,7 @@ public import NN.Examples.Factorization.QR
 public import NN.Examples.Factorization.SymEig
 public import NN.Examples.Factorization.SVD
 public import NN.Examples.Factorization.JacobiDecrease
+public import NN.Examples.Factorization.JacobiRate
 
 /-!
 # Matrix factorization examples
@@ -34,6 +35,10 @@ factorization misbehaves.
 - `JacobiDecrease` — the per-rotation progress identity `‖offDiag(Jᵀ A J)‖² = ‖offDiag A‖² − 2·A[p,q]²`
   (`jacobi_off_decrease`) and Frobenius-mass invariance; **negative controls**: a wrong-angle rotation
   misses the decrease, a non-orthogonal one breaks mass invariance.
+- `JacobiRate` — the *aggregate* linear-contraction rate of the classical largest-pivot strategy:
+  `‖offDiag(Jᵀ A J)‖² ≤ (1 − 2/(n²−n))·‖offDiag A‖²` (`jacobi_off_decrease_classical`); **negative
+  control**: annihilating a non-largest (tiny) pivot misses the guaranteed factor, so the rate is
+  specific to the largest-pivot choice.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/JacobiRate.lean b/NN/Examples/Factorization/JacobiRate.lean
new file mode 100644
index 0000000..60b51aa
--- /dev/null
+++ b/NN/Examples/Factorization/JacobiRate.lean
@@ -0,0 +1,102 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: the classical Jacobi rotation contracts at a fixed linear rate
+
+These checks corroborate the **Tier 3** development in
+`NN.Proofs.Tensor.Basic.FactorizationsJacobiRate`: the *aggregate* convergence rate of the classical
+(largest-pivot) Jacobi strategy. Annihilating the **largest** off-diagonal pivot multiplies the
+off-diagonal mass by at most `1 − 2/(n² − n) < 1`:
+
+`‖offDiag(Jᵀ A J)‖² ≤ (1 − 2/(n² − n)) · ‖offDiag A‖²`     (`jacobi_off_decrease_classical`)
+
+because the largest pivot carries at least the average share `‖offDiag A‖²/(n² − n)` of the mass
+(`offSq_le_count_mul_max`). The test matrix has one dominant off-diagonal entry, so the contrast
+between annihilating it and annihilating a tiny one is stark.
+
+The checks exhibit the theorem *and* its largest-pivot hypothesis biting (negative control):
+
+* **Positive — pivot carries ≥ average share.** `‖offDiag A‖² ≤ (n² − n) · A[p,q]²` for the largest
+  pivot (`offSq_le_count_mul_max` on the concrete matrix).
+* **Positive — largest pivot meets the rate.** Annihilating the dominant entry `A[0,1]` contracts the
+  off-diagonal mass below `(1 − 2/(n² − n)) · ‖offDiag A‖²` (in fact far below — it nearly diagonalises).
+* **Negative — a non-largest pivot misses the rate.** Annihilating a *tiny* off-diagonal entry
+  `A[0,2]` still removes `2·A[0,2]²` of mass (the per-rotation identity always holds), but that is far
+  too little to meet the guaranteed factor: the off-diagonal mass stays *above* `(1 − 2/(n²−n))·‖offDiag A‖²`.
+  This is exactly why the rate is for the *largest-pivot* strategy and the cyclic sweep needs the
+  research-grade Forsythe–Henrici bound instead.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.JacobiRate
+
+/-- A symmetric `3×3` matrix with one dominant off-diagonal entry `A[0,1] = 5` and two tiny ones
+`A[0,2] = A[1,2] = 0.1`. -/
+def A : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[1, 5, 0.1],
+         [5, 2, 0.1],
+         [0.1, 0.1, 3]]
+
+/-- Off-diagonal count `n² − n` for `n = 3`, and the guaranteed contraction factor `1 − 2/(n²−n)`. -/
+def offCount : Float := 3 * 3 - 3          -- = 6
+def factor : Float := 1 - 2 / offCount     -- = 2/3
+
+def offBefore : Float := offDiagFrobSq A
+
+/-- Square of the largest off-diagonal entry `A[0,1] = 5`; annihilating it removes twice this. -/
+def bigSq : Float := let x := Spec.get2 A ⟨0, by decide⟩ ⟨1, by decide⟩; x * x   -- = 25
+/-- A tiny off-diagonal entry `A[0,2] = 0.1`. -/
+def smallSq : Float := let x := Spec.get2 A ⟨0, by decide⟩ ⟨2, by decide⟩; x * x -- = 0.01
+
+/-- Annihilate the **largest** pivot `(0,1)` — the classical choice. -/
+def Abig : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := jacobiRotateAt A 0 1
+def offBig : Float := offDiagFrobSq Abig
+
+/-- Annihilate a **tiny** pivot `(0,2)` — a non-largest (e.g. cyclic-order) choice. -/
+def Asmall : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := jacobiRotateAt A 0 2
+def offSmall : Float := offDiagFrobSq Asmall
+
+#eval IO.println s!"off-diagonal mass = {offBefore}; average share = {offBefore / offCount}; \
+  guaranteed post-rotation bound (1-2/(n²-n))·mass = {factor * offBefore}"
+#eval IO.println s!"largest pivot²  = {bigSq}  (≥ average) → off-mass after = {offBig}"
+#eval IO.println s!"tiny    pivot²  = {smallSq} (< average) → off-mass after = {offSmall}"
+
+-- Positive — the largest pivot carries at least the average share: `‖offDiag A‖² ≤ (n²−n)·A[p,q]²`
+-- (`offSq_le_count_mul_max`). The "violation amount" is `0` when the bound holds.
+#eval assertLt "largest pivot carries ≥ average share: ‖offDiag A‖² ≤ (n²−n)·A[0,1]²"
+  (max (0.0 : Float) (offBefore - offCount * bigSq))
+
+-- Positive — annihilating the largest pivot meets the linear rate (`jacobi_off_decrease_classical`).
+#eval assertLt "classical contraction: ‖offDiag A'‖² ≤ (1−2/(n²−n))·‖offDiag A‖²"
+  (max (0.0 : Float) (offBig - factor * offBefore))
+
+-- Positive — the largest pivot really is annihilated.
+#eval assertLt "largest-pivot rotation annihilates A'[0,1] ≈ 0"
+  (Float.abs (Spec.get2 Abig ⟨0, by decide⟩ ⟨1, by decide⟩))
+
+/-! ## Negative control: the largest-pivot hypothesis is necessary
+
+Annihilating a *tiny* off-diagonal entry obeys the per-rotation identity (mass drops by `2·A[0,2]²`)
+but removes far too little to meet the guaranteed factor — the off-diagonal mass stays above
+`(1 − 2/(n²−n))·‖offDiag A‖²`. -/
+
+-- The tiny pivot is below the average share, so the count bound does *not* certify the rate for it.
+#eval IO.println s!"tiny pivot still annihilated: A'[0,2] = {Spec.get2 Asmall ⟨0, by decide⟩ ⟨2, by decide⟩}, \
+  and mass did drop ({offBefore} → {offSmall}) — just not by enough"
+
+-- Negative — a non-largest pivot misses the guaranteed contraction by a wide margin.
+#eval assertGe "non-largest pivot fails the (1−2/(n²−n)) rate (largest-pivot hypothesis needed)"
+  (offSmall - factor * offBefore) 0.5
+
+end NN.Examples.Factorization.JacobiRate
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index f2e8ffa..33718c6 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -14,6 +14,7 @@ public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobiDecrease
+public import NN.Proofs.Tensor.Basic.FactorizationsJacobiRate
 public import NN.Proofs.Tensor.Basic.BoundsNorms
 public import NN.Proofs.Tensor.Basic.Algebra
 
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsJacobiRate.lean b/NN/Proofs/Tensor/Basic/FactorizationsJacobiRate.lean
new file mode 100644
index 0000000..70bc78c
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsJacobiRate.lean
@@ -0,0 +1,163 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.FactorizationsJacobiDecrease
+
+/-!
+# The aggregate Jacobi convergence rate (classical largest-pivot strategy)
+
+[`FactorizationsJacobiDecrease`](./FactorizationsJacobiDecrease.lean) proved the *per-rotation*
+identity `‖offDiag(Jᵀ A J)‖² = ‖offDiag A‖² − 2 · A[p,q]²` exactly. That is a statement about one
+rotation; it says nothing on its own about how fast the off-diagonal mass falls over many rotations —
+the *aggregate rate*, and hence how many sweeps are needed.
+
+This file proves the aggregate rate **for the classical (largest-pivot) strategy**, which is the
+elementary, a-priori-provable part of the convergence story:
+
+* `offSq_le_count_mul_max` — the off-diagonal mass is at most the off-diagonal count `n² − n` times
+  the largest squared off-diagonal entry. So if the pivot `(p, q)` is chosen to be the *largest*
+  off-diagonal entry, `A[p,q]² ≥ ‖offDiag A‖² / (n² − n)`.
+* `jacobi_off_decrease_classical` — combining that lower bound on the pivot with the exact
+  per-rotation decrease gives a genuine **linear contraction**: one largest-pivot rotation multiplies
+  the off-diagonal mass by at most `1 − 2/(n² − n) < 1`.
+* `geom_bound_of_contraction` / `tendsto_zero_of_contraction` — any quantity that contracts by a fixed
+  factor `ρ ∈ [0, 1)` at every step satisfies `a k ≤ ρ^k · a 0` and, if nonnegative, tends to `0`.
+  With `ρ = 1 − 2/(n² − n)` and `offSq_nonneg`, this proves a priori that the classical Jacobi
+  eigenvalue algorithm drives the off-diagonal mass to zero geometrically.
+
+## Honest scope: classical vs. cyclic
+
+The executable solver runs the **cyclic** sweep (pivots visited in fixed row-major order), *not* the
+classical largest-pivot rule. The per-step contraction above can fail for a cyclic pivot: a
+fixed-order pivot need not be the largest off-diagonal entry, so `2 · A[p,q]²` can fall well short of
+`2 · ‖offDiag A‖² / (n² − n)` (and a later rotation in the same sweep can even refill an entry an
+earlier one zeroed). Bounding the *sum of the cyclic pivots* below — the per-sweep contraction factor
+— is the Forsythe–Henrici / Schönhage result, which Mathlib v4.30.0 has no theory for and which is not
+provable by this elementary argument. The abstract `geom_bound_of_contraction` is stated for an
+*arbitrary* per-step factor `ρ`, so the moment such a cyclic per-sweep bound is available it plugs in
+directly; until then the cyclic rate stays captured by the exact a-posteriori residual certificate of
+[`FactorizationsJacobi`](./FactorizationsJacobi.lean), never by `sorry`.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open Matrix
+open scoped BigOperators
+
+variable {n : Nat}
+
+/-! ## The off-diagonal mass is nonnegative and bounded by the count times the largest entry -/
+
+/-- The squared off-diagonal mass is nonnegative (it is a sum of squares). -/
+theorem offSq_nonneg (M : Matrix (Fin n) (Fin n) ℝ) : 0 ≤ offSq M := by
+  rw [offSq_eq_sum]
+  refine Finset.sum_nonneg (fun i _ => Finset.sum_nonneg (fun j _ => ?_))
+  by_cases h : i = j
+  · simp [h]
+  · simp only [if_neg h]; positivity
+
+/-- The constant off-diagonal sum: there are exactly `n² − n` off-diagonal positions, so summing a
+constant `K` over them gives `(n² − n) · K`. -/
+private theorem sum_const_offdiag (K : ℝ) :
+    ∑ i : Fin n, ∑ j : Fin n, (if i = j then (0 : ℝ) else K) = ((n : ℝ) ^ 2 - (n : ℝ)) * K := by
+  have hinner : ∀ i : Fin n,
+      ∑ j : Fin n, (if i = j then (0 : ℝ) else K) = ((n : ℝ) - 1) * K := by
+    intro i
+    have hsplit : ∀ j : Fin n,
+        (if i = j then (0 : ℝ) else K) = K - (if i = j then K else 0) := by
+      intro j; by_cases h : i = j <;> simp [h]
+    rw [Finset.sum_congr rfl (fun j _ => hsplit j), Finset.sum_sub_distrib,
+      Finset.sum_const, Finset.sum_ite_eq]
+    simp only [Finset.mem_univ, if_true, Finset.card_univ, Fintype.card_fin, nsmul_eq_mul]
+    ring
+  rw [Finset.sum_congr rfl (fun i _ => hinner i), Finset.sum_const]
+  simp only [Finset.card_univ, Fintype.card_fin, nsmul_eq_mul]
+  ring
+
+/-- **The off-diagonal mass is at most the off-diagonal count `n² − n` times the largest squared
+off-diagonal entry.** With `(p', q')` achieving that maximum, this says
+`‖offDiag M‖² ≤ (n² − n) · M[p',q']²`, i.e. the largest pivot carries at least an average share of the
+mass — the bound the classical Jacobi strategy exploits. -/
+theorem offSq_le_count_mul_max (M : Matrix (Fin n) (Fin n) ℝ) (p' q' : Fin n)
+    (hmax : ∀ i j : Fin n, i ≠ j → (M i j) ^ 2 ≤ (M p' q') ^ 2) :
+    offSq M ≤ ((n : ℝ) ^ 2 - (n : ℝ)) * (M p' q') ^ 2 := by
+  rw [offSq_eq_sum, ← sum_const_offdiag ((M p' q') ^ 2)]
+  refine Finset.sum_le_sum (fun i _ => Finset.sum_le_sum (fun j _ => ?_))
+  by_cases h : i = j
+  · simp [h]
+  · simp only [if_neg h]; exact hmax i j h
+
+/-! ## The classical (largest-pivot) single-step contraction -/
+
+variable (A : Matrix (Fin n) (Fin n) ℝ) (p q : Nat) (hp : p < n) (hq : q < n) (hpq : p ≠ q)
+  (c s : ℝ)
+
+include hpq in
+/-- **Classical Jacobi linear convergence — one step.** If the pivot `(p, q)` is the *largest*
+off-diagonal entry (`hmax`), `A` is symmetric there (`hsym`), and `J` is the Givens rotation that
+annihilates it (`hannih`), then conjugating by `J` contracts the squared off-diagonal mass by the
+fixed factor `1 − 2/(n² − n) < 1`:
+
+`‖offDiag(Jᵀ A J)‖² ≤ (1 − 2/(n² − n)) · ‖offDiag A‖²`.
+
+This is the exact per-rotation decrease `2 · A[p,q]²` (`jacobi_off_decrease`) combined with the
+pivot lower bound `A[p,q]² ≥ ‖offDiag A‖²/(n² − n)` (`offSq_le_count_mul_max`). It is an a-priori
+convergence rate for the largest-pivot strategy; the *cyclic* strategy the solver uses does not
+satisfy the largest-pivot hypothesis and needs the research-grade Forsythe–Henrici bound instead. -/
+theorem jacobi_off_decrease_classical (hn : 2 ≤ n) (hcs : c ^ 2 + s ^ 2 = 1)
+    (hsym : A ⟨q, hq⟩ ⟨p, hp⟩ = A ⟨p, hp⟩ ⟨q, hq⟩)
+    (hannih : ((toM n (Spec.arrGivens n p q c s))ᵀ * A * toM n (Spec.arrGivens n p q c s))
+      ⟨p, hp⟩ ⟨q, hq⟩ = 0)
+    (hmax : ∀ i j : Fin n, i ≠ j → (A i j) ^ 2 ≤ (A ⟨p, hp⟩ ⟨q, hq⟩) ^ 2) :
+    offSq ((toM n (Spec.arrGivens n p q c s))ᵀ * A * toM n (Spec.arrGivens n p q c s))
+      ≤ (1 - 2 / ((n : ℝ) ^ 2 - (n : ℝ))) * offSq A := by
+  have hdec := jacobi_off_decrease A p q hp hq hpq c s hcs hsym hannih
+  have hbound := offSq_le_count_mul_max A ⟨p, hp⟩ ⟨q, hq⟩ hmax
+  have hN : (0 : ℝ) < (n : ℝ) ^ 2 - (n : ℝ) := by
+    have h2 : (2 : ℝ) ≤ (n : ℝ) := by exact_mod_cast hn
+    nlinarith
+  rw [hdec, sub_mul, one_mul, div_mul_eq_mul_div]
+  apply sub_le_sub_left
+  rw [div_le_iff₀ hN]
+  nlinarith [hbound]
+
+/-! ## Iterating the contraction: geometric convergence -/
+
+end Spec.Factorization
+
+namespace Spec.Factorization
+
+/-- **A fixed-factor contraction is bounded by a geometric sequence.** If `a (k+1) ≤ ρ · a k` for
+all `k` with `0 ≤ ρ`, then `a k ≤ ρ^k · a 0`. Applied with `a k = ‖offDiag Aₖ‖²` and
+`ρ = 1 − 2/(n² − n)` from `jacobi_off_decrease_classical`, this is the geometric a-priori rate of the
+classical Jacobi algorithm. The factor `ρ` is arbitrary, so any future per-sweep cyclic bound plugs
+in here unchanged. -/
+theorem geom_bound_of_contraction (a : ℕ → ℝ) (ρ : ℝ) (hρ : 0 ≤ ρ)
+    (hstep : ∀ k, a (k + 1) ≤ ρ * a k) : ∀ k, a k ≤ ρ ^ k * a 0 := by
+  intro k
+  induction k with
+  | zero => simp
+  | succ m ih =>
+    calc a (m + 1) ≤ ρ * a m := hstep m
+      _ ≤ ρ * (ρ ^ m * a 0) := mul_le_mul_of_nonneg_left ih hρ
+      _ = ρ ^ (m + 1) * a 0 := by ring
+
+/-- **The contraction drives the quantity to zero.** With a genuine factor `ρ < 1` (and `0 ≤ a k`,
+which holds for `offSq` by `offSq_nonneg`), the off-diagonal mass tends to `0`: the classical Jacobi
+algorithm provably converges to a diagonal matrix. -/
+theorem tendsto_zero_of_contraction (a : ℕ → ℝ) (ρ : ℝ) (hρ0 : 0 ≤ ρ) (hρ1 : ρ < 1)
+    (hnn : ∀ k, 0 ≤ a k) (hstep : ∀ k, a (k + 1) ≤ ρ * a k) :
+    Filter.Tendsto a Filter.atTop (nhds 0) := by
+  apply squeeze_zero hnn (geom_bound_of_contraction a ρ hρ0 hstep)
+  have hpow : Filter.Tendsto (fun k => ρ ^ k) Filter.atTop (nhds 0) :=
+    tendsto_pow_atTop_nhds_zero_of_lt_one hρ0 hρ1
+  simpa using hpow.mul_const (a 0)
+
+end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 4003d8f..e4869b5 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -186,6 +186,35 @@ confirm the identity numerically (one rotation takes the off-diagonal mass `6 
 total mass conserved at `35`) and show its hypotheses biting: a wrong-angle rotation misses the
 decrease, a non-orthogonal one breaks mass invariance.
 
+# Aggregate rate: linear convergence of the classical strategy
+
+The per-rotation identity removes `2 · A[p,q]²` of off-diagonal mass per step. Turning that into an
+*aggregate* rate — a factor by which the mass falls each step, and hence a bound on how many steps are
+needed — requires a lower bound on the pivot. For the *classical* strategy, which always annihilates
+the *largest* off-diagonal entry, that bound is elementary, and
+[`NN.Proofs.Tensor.Basic.FactorizationsJacobiRate`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsJacobiRate.lean)
+proves it exactly over `ℝ`. There are `n² − n` off-diagonal positions, so the largest one carries at
+least the average share of the mass (`offSq_le_count_mul_max`):
+
+$$`A[p,q]^2 \;\ge\; \frac{\bigl\|\operatorname{offDiag} A\bigr\|_F^2}{n^2 - n}.`
+
+Substituting this into the per-rotation decrease gives a genuine *linear contraction*
+(`jacobi_off_decrease_classical`):
+
+$$`\bigl\|\operatorname{offDiag}(J^\top A J)\bigr\|_F^2 \;\le\; \Bigl(1 - \tfrac{2}{n^2 - n}\Bigr)\,\bigl\|\operatorname{offDiag} A\bigr\|_F^2,`
+
+a fixed factor strictly below `1`. A fixed-factor contraction iterates to a geometric bound
+(`geom_bound_of_contraction`: `aₖ ≤ ρᵏ · a₀`) and, since `offSq ≥ 0` (`offSq_nonneg`) and the factor
+is `< 1`, drives the off-diagonal mass to zero (`tendsto_zero_of_contraction`). So the classical
+Jacobi eigenvalue algorithm provably converges, with an a-priori geometric rate. The geometric
+machinery is stated for an *arbitrary* per-step factor `ρ`, so it is exactly the slot a future cyclic
+per-sweep bound would fill. The executable witnesses in
+[`NN.Examples.Factorization.JacobiRate`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Examples/Factorization/JacobiRate.lean)
+exhibit the contrast on a matrix with one dominant entry (`A[0,1] = 5`): annihilating the largest
+pivot collapses the off-diagonal mass `50.04 → 0.04`, far under the guaranteed `33.36`, while
+annihilating a tiny pivot `A[0,2] = 0.1` removes only `0.02` and stays *above* the guaranteed bound —
+the numerical teeth of the largest-pivot hypothesis.
+
 # Exact QR reconstruction
 
 The QR factorization admits the same treatment. `qr_mul_eq` (in the same file) proves that for an
@@ -226,16 +255,23 @@ into a future Mathlib matrix-level QR contribution.
 
 With Cholesky and QR fully reconstructed (`A = L · Lᵀ`, `A = Q · R`, `Qᵀ Q = 1`), the Jacobi run
 proved faithful — `V` orthogonal and `A = V · Af · Vᵀ` exactly, so the residual certificate holds
-*unconditionally* for the real solver output — and the *per-rotation* progress proved exactly (each
-annihilating rotation removes `2 · A[p,q]²` of off-diagonal mass), the single property still not
-available as an a-priori theorem is the *aggregate rate*: that a full *cyclic* sweep, choosing its
-pivots in fixed row-major order rather than always the largest, drives the off-diagonal mass to zero
-fast enough that finitely many sweeps suffice. Summing the per-rotation decrease over a sweep is exact;
-what is research-grade is bounding the *sum of the pivots* below in terms of the total off-diagonal
-mass when the pivots are visited cyclically — the Forsythe–Henrici / Schönhage convergence result.
-Mathlib v4.30.0 has no Jacobi convergence theory, so that aggregate rate remains captured by the exact
-a-posteriori residual certificate above — bounded numerically by the `assertLt` checks on concrete
-inputs — never by `sorry`. Everything else is exact: the algebraic faithfulness of the decomposition
-(orthogonality, orthogonal similarity, the residual identity, the per-rotation decrease, and
+*unconditionally* for the real solver output — the *per-rotation* progress proved exactly (each
+annihilating rotation removes `2 · A[p,q]²` of off-diagonal mass), and the *aggregate* rate of the
+*classical largest-pivot* strategy proved to be geometric (linear contraction by `1 − 2/(n²−n)`,
+iterating to convergence), the one property still not available as an a-priori theorem is the
+aggregate rate *for the cyclic ordering the solver actually uses*: that visiting pivots in fixed
+row-major order, rather than always the largest, still drives the off-diagonal mass to zero fast
+enough that finitely many sweeps suffice. The gap is precise. The classical bound rests on the
+largest pivot carrying at least the average share of the mass; a cyclically chosen pivot need not, so
+its single-step decrease can fall arbitrarily short of `2·‖offDiag A‖²/(n²−n)` (and a later rotation
+in the same sweep can refill an entry an earlier one zeroed). Summing the per-rotation decrease over a
+sweep is exact; what is research-grade is bounding the *sum of the cyclic pivots* below in terms of
+the total off-diagonal mass — the Forsythe–Henrici / Schönhage convergence result. Mathlib v4.30.0 has
+no cyclic-Jacobi convergence theory, so that cyclic rate remains captured by the exact a-posteriori
+residual certificate above — bounded numerically by the `assertLt` checks on concrete inputs — never
+by `sorry`; and the geometric machinery (`geom_bound_of_contraction`, `tendsto_zero_of_contraction`)
+is stated for an arbitrary per-step factor, ready to consume such a bound the moment it exists.
+Everything else is exact: the algebraic faithfulness of the decomposition (orthogonality, orthogonal
+similarity, the residual identity, the per-rotation decrease, the classical-strategy linear rate, and
 correctness in the zero-residual limit) is proved, and the specification-level facts the kernel methods
 rely on are independent of the convergence step, so the CHD foundation is complete.

From b03ab4da72f4239dbf943f67e4831a0e2b933b82 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 11:18:11 -0700
Subject: [PATCH 09/22] Prove Cholesky positive-pivot keystone; make
 kernel-ridge solve unconditional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add the deferred keystone `choleskyFn_diag_pos_of_posDef`: for a
positive-definite A, every executable Cholesky pivot is strictly positive
(equivalently the radicand A[j,j] − Σ_{k<j} L[j,k]² > 0 at every step).

Proof avoids matrix inverses entirely. Strong induction on the pivot index:
- `choleskyFn_dot_eq_local` — localized reconstruction needing only the
  smaller pivot's positivity (which is all the original `choleskyFn_dot_eq`
  ever uses), powering the induction.
- The Schur-complement witness z is built by reusing the already-proven
  back-substitution `triSolveUpperFn_mulVec` on the leading block (z m = 1,
  z annihilates the leading columns of L).
- `double_sum_gram` collapses the Gram part of zᵀAz to L[m,m]²; the residual
  part reduces to the single (m,m) entry, giving zᵀAz = radicand exactly.
- `Matrix.PosDef.dotProduct_mulVec_pos` forces zᵀAz > 0, hence radicand > 0,
  hence the pivot √radicand > 0.
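
The witness construction can be checked by hand in the smallest nontrivial
case. The following is an illustrative sketch for n = 2 and m = 1, not the
formal proof; the entry names a, b, d are assumptions introduced here and do
not appear in the source.

```latex
\begin{aligned}
A &= \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \quad a > 0,
  \qquad L_{00} = \sqrt{a}, \quad L_{10} = \frac{b}{\sqrt{a}}, \\
% witness: z_1 = 1 and z annihilates the leading column of L
z_0 L_{00} + z_1 L_{10} &= 0, \quad z_1 = 1
  \;\Longrightarrow\; z = \Bigl(-\tfrac{b}{a},\ 1\Bigr)^{\top}, \\
z^{\top} A z &= a z_0^2 + 2 b z_0 + d
  = \frac{b^2}{a} - \frac{2 b^2}{a} + d
  = d - \frac{b^2}{a}
  = A_{11} - L_{10}^2 .
\end{aligned}
```

Since z is nonzero, positive definiteness gives zᵀAz > 0, so the radicand
A[1,1] − L[1,0]² is positive, matching the general argument above.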

Unconditional corollaries (no pivot hypothesis):
- `solveRidgeFn_mulVec_of_posSemidef` — for PSD K and γ > 0, solveRidgeFn
  solves (K+γ·I)·x = b exactly. The fully discharged verified core of CHD
  `solve_variationnal`.
- `solveRidgeSpec_mulVec_of_posSemidef` — tensor-level form.

Also lands the spec-layer solve defs (triSolveLowerFn/triSolveUpperFn/
cholSolveFn/addScaledIdFn/solveRidgeFn/solveRidgeSpec), the registration
import, positive/negative `#eval` examples exhibiting the keystone dichotomy
(SPD K+γ·I → all pivots > 0; singular K → a zero pivot), and the blueprint
update (the direct kernel-ridge solve route now has nothing left to prove).
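
The zero-pivot side of the dichotomy can be worked by hand for the example
kernel in RidgeSolve.lean (G = [[1,2],[3,1],[0,1]]). This is an illustrative
exact-arithmetic calculation, not output of the `#eval` checks; in `Float`
the last radicand may come out as a tiny nonzero value or the pivot as `NaN`
rather than an exact zero.

```latex
\begin{aligned}
K = G G^{\top} &= \begin{pmatrix} 5 & 5 & 2 \\ 5 & 10 & 1 \\ 2 & 1 & 1 \end{pmatrix}, \\
L_{00} &= \sqrt{5}, \qquad L_{10} = \frac{5}{\sqrt{5}} = \sqrt{5}, \qquad L_{20} = \frac{2}{\sqrt{5}}, \\
L_{11} &= \sqrt{10 - 5} = \sqrt{5}, \qquad
L_{21} = \frac{1 - L_{20} L_{10}}{L_{11}} = \frac{1 - 2}{\sqrt{5}} = -\frac{1}{\sqrt{5}}, \\
L_{22}^2 &= K_{22} - L_{20}^2 - L_{21}^2 = 1 - \frac{4}{5} - \frac{1}{5} = 0 .
\end{aligned}
```

The last radicand vanishes because K has rank 2. Adding γ·I with γ > 0 makes
the matrix positive definite, so by the keystone every radicand is strictly
positive.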

sorry/omega/admit-free across all new and changed source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |   8 +
 NN/Examples/Factorization/RidgeSolve.lean     | 129 ++++
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Tensor/Basic/FactorizationsSolve.lean     | 643 ++++++++++++++++++
 NN/Spec/Core/Tensor/Factorizations.lean       |  47 ++
 .../Ch4_Verification/Factorizations.lean      |  65 +-
 6 files changed, 891 insertions(+), 2 deletions(-)
 create mode 100644 NN/Examples/Factorization/RidgeSolve.lean
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsSolve.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 10be535..2e4d60d 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -13,6 +13,7 @@ public import NN.Examples.Factorization.SymEig
 public import NN.Examples.Factorization.SVD
 public import NN.Examples.Factorization.JacobiDecrease
 public import NN.Examples.Factorization.JacobiRate
+public import NN.Examples.Factorization.RidgeSolve
 
 /-!
 # Matrix factorization examples
@@ -39,6 +40,13 @@ factorization misbehaves.
   `‖offDiag(Jᵀ A J)‖² ≤ (1 − 2/(n²−n))·‖offDiag A‖²` (`jacobi_off_decrease_classical`); **negative
   control**: annihilating a non-largest (tiny) pivot misses the guaranteed factor, so the rate is
   specific to the largest-pivot choice.
+- `RidgeSolve` — the kernel-ridge (Tikhonov) linear solve `(K + γ·I)·x = b` via Cholesky +
+  forward/back substitution (`solveRidgeFn_mulVec_of_posSemidef`, the verified core of CHD
+  `solve_variationnal`, now *unconditional* for PSD `K` and `γ > 0`): for a rank-deficient Gram kernel
+  `K = G·Gᵀ` and `γ > 0`, `solveRidgeFn` reconstructs `b` to machine precision; **negative control**:
+  with `γ = 0` the singular `K` has a zero Cholesky pivot and the solve fails (`NaN`), so
+  regularization is necessary. Also exhibits the **keystone** `choleskyFn_diag_pos_of_posDef`: the SPD
+  `K + γ·I` has all-positive Cholesky pivots, while the singular `K` has a zero pivot (PosDef needed).
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/RidgeSolve.lean b/NN/Examples/Factorization/RidgeSolve.lean
new file mode 100644
index 0000000..6fd9ab2
--- /dev/null
+++ b/NN/Examples/Factorization/RidgeSolve.lean
@@ -0,0 +1,129 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: the kernel-ridge (Tikhonov) linear solve
+
+These checks corroborate the development in `NN.Proofs.Tensor.Basic.FactorizationsSolve`: the
+Cholesky-based solve of `(K + γ·I)·x = b`, the linear solve at the heart of CHD `solve_variationnal`.
+
+The verified pipeline is:
+
+* `triSolveLowerFn` / `triSolveUpperFn` solve triangular systems by forward/back substitution
+  (`triSolveLowerFn_mulVec`, `triSolveUpperFn_mulVec` — exact);
+* `cholSolveFn` composes them through a Cholesky factor `L` to solve `(L·Lᵀ)·x = b`
+  (`cholSolveFn_mulVec` — exact);
+* `solveRidgeFn` factors `K + γ·I` and solves, giving `(K + γ·I)·x = b`
+  (`solveRidgeFn_mulVec`, under the SPD success condition that `posDef_addScaledIdFn` provides).
+
+The kernel `K = G · Gᵀ` here is a rank-deficient (singular) Gram matrix — exactly the GP/kernel
+setting CHD targets — so it is *not* invertible on its own. The checks exhibit:
+
+* **Positive — regularization makes it solvable.** With `γ = 0.5 > 0`, `K + γ·I` is SPD, the Cholesky
+  succeeds, and `solveRidgeFn` returns `x` with `(K + γ·I)·x = b` to machine precision (the exact
+  `solveRidgeFn_mulVec`).
+* **Negative — regularization is necessary.** With `γ = 0` the singular `K` has a zero Cholesky pivot:
+  forward/back substitution divides by zero and the residual blows up (`NaN`/large). This is why CHD
+  regularizes; it is also exactly the `γ > 0` hypothesis of `posDef_addScaledIdFn`.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.RidgeSolve
+
+/-- Build a length-`n` `Float` vector from a list (missing entries `0`). -/
+def mkVec {n : Nat} (xs : List Float) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i => xs.getD i.val 0.0)
+
+/-- The regularized matrix `K + γ·I` as a tensor. -/
+def addGammaI {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float) :
+    Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  Spec.ofMatFn (fun i j => Spec.get2 K i j + (if i.val == j.val then γ else 0.0))
+
+/-- `ℓ¹` magnitude `Σᵢ |vᵢ|` of a vector (residual size). A *sum* rather than a `max` so that a `NaN`
+entry — produced when an unregularized singular solve divides by a zero pivot — propagates to the
+result instead of being silently dropped by `Float`'s `max`. -/
+def vecAbsErr {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : Float :=
+  (List.finRange n).foldl (fun a i => a + Float.abs (Spec.Tensor.toScalar (Spec.get v i))) 0.0
+
+/-- Residual `(K + γ·I)·x − b` of a proposed solution `x`. -/
+def ridgeResidual {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float)
+    (b x : Spec.Tensor Float (.dim n .scalar)) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i =>
+    Spec.Tensor.toScalar (Spec.get (Spec.matVecMulSpec (addGammaI K γ) x) i)
+      - Spec.Tensor.toScalar (Spec.get b i))
+
+/-- A `3 × 2` factor; its Gram `K = G · Gᵀ` is a rank-2 (hence singular) `3 × 3` kernel matrix. -/
+def G : Spec.Tensor Float (.dim 3 (.dim 2 .scalar)) :=
+  mkMat [[1, 2],
+         [3, 1],
+         [0, 1]]
+
+/-- The (symmetric, PSD, singular) kernel `K = G · Gᵀ`. -/
+def K : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := mm G (tr G)
+
+def γ : Float := 0.5
+def b : Spec.Tensor Float (.dim 3 .scalar) := mkVec [1, 2, 3]
+
+/-- The ridge solution `x = (K + γ·I)⁻¹ b`, via the verified Cholesky solve. -/
+def x : Spec.Tensor Float (.dim 3 .scalar) := Spec.solveRidgeSpec K γ b
+
+#eval IO.println s!"K = G·Gᵀ (rank-2, singular); γ = {γ}; b = {vecToList b}"
+#eval IO.println s!"ridge solution x = {vecToList x}"
+#eval IO.println s!"residual (K+γI)·x − b = {vecToList (ridgeResidual K γ b x)}"
+
+-- Positive — the verified solve reconstructs `b`: `(K + γ·I)·x = b` (instance of `solveRidgeFn_mulVec`).
+#eval assertLt "kernel-ridge solve: (K + γ·I)·x = b to machine precision"
+  (vecAbsErr (ridgeResidual K γ b x))
+
+/-! ## Negative control: regularization is necessary
+
+The kernel `K` is singular, so with `γ = 0` its Cholesky has a zero pivot and the substitution
+divides by zero — the "solution" does not satisfy the (singular) system. -/
+
+def x0 : Spec.Tensor Float (.dim 3 .scalar) := Spec.solveRidgeSpec K 0.0 b
+
+#eval IO.println s!"unregularized (γ = 0) on singular K: x0 = {vecToList x0}, \
+  residual = {vecToList (ridgeResidual K 0.0 b x0)}"
+
+-- Negative — without regularization the singular system is not solved (zero pivot → NaN/blow-up).
+#eval assertReconFails "unregularized solve of singular K fails (γ = 0 → zero Cholesky pivot)"
+  (vecAbsErr (ridgeResidual K 0.0 b x0))
+
+/-! ## Keystone: positive-definite ⟹ strictly positive Cholesky pivots
+
+`Spec.Factorization.Reconstruction.choleskyFn_diag_pos_of_posDef` proves that an SPD matrix has *all*
+Cholesky pivots `> 0` — exactly the success condition the solve needs — and
+`solveRidgeFn_mulVec_of_posSemidef` uses it to make the ridge solve unconditional for PSD `K`, `γ > 0`.
+These checks exhibit the dichotomy the keystone formalizes. -/
+
+/-- Count of non-positive Cholesky pivots of a square matrix. A `NaN` pivot (from `√(negative)` on a
+non-SPD matrix) also counts, since `NaN > 0` is `false`. The keystone guarantees this is `0` for an
+SPD matrix. -/
+def numNonPosPivots {k : Nat} (M : Spec.Tensor Float (.dim k (.dim k .scalar))) : Float :=
+  let L := Spec.choleskySpec M
+  (List.finRange k).foldl (fun acc j => acc + (if Spec.get2 L j j > 0 then 0.0 else 1.0)) 0.0
+
+/-- The SPD regularized matrix `K + γ·I` (`γ = 0.5 > 0`, `K` PSD). -/
+def Kγ : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := addGammaI K γ
+
+#eval IO.println s!"Cholesky pivots of K + γ·I (SPD): {vecToList (diagOf (Spec.choleskySpec Kγ))}"
+#eval IO.println s!"Cholesky pivots of K (singular, γ = 0): {vecToList (diagOf (Spec.choleskySpec K))}"
+
+-- Positive — SPD ⟹ every Cholesky pivot is > 0 (an instance of `choleskyFn_diag_pos_of_posDef`).
+#eval assertLt "SPD K + γ·I has all-positive Cholesky pivots (keystone)" (numNonPosPivots Kγ)
+
+-- Negative — the singular kernel `K` (PSD but not PD) has a non-positive pivot, so PosDef is needed.
+#eval assertGe "singular K has a non-positive Cholesky pivot (PosDef necessary)"
+  (numNonPosPivots K) 0.5
+
+end NN.Examples.Factorization.RidgeSolve
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index 33718c6..efaa964 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -11,6 +11,7 @@ public import NN.Proofs.Tensor.Basic.Folds
 public import NN.Proofs.Tensor.Basic.LinearAlgebra
 public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
+public import NN.Proofs.Tensor.Basic.FactorizationsSolve
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobiDecrease
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsSolve.lean b/NN/Proofs/Tensor/Basic/FactorizationsSolve.lean
new file mode 100644
index 0000000..b5edb94
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsSolve.lean
@@ -0,0 +1,643 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
+public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
+public import Mathlib.Data.Real.StarOrdered
+
+/-!
+# The Cholesky linear solve and the kernel-ridge (Tikhonov) solve
+
+[`FactorizationsReconstruction`](./FactorizationsReconstruction.lean) proved that the executable
+Cholesky factor satisfies `A = L · Lᵀ` exactly (over `ℝ`, under positive pivots). This file uses that
+to verify the *linear solve* built on top of it — forward/back substitution — and hence the
+kernel-ridge solve `(K + γ·I)·x = b` that is the numerical heart of CHD `solve_variationnal`.
+
+## Main results
+
+* `triSolveLowerFn_mulVec` / `triSolveUpperFn_mulVec` — forward/back substitution are correct: for a
+  lower- (resp. upper-) triangular matrix with nonzero diagonal, the computed vector solves
+  `L · y = b` (resp. `U · x = y`) **exactly**. These are finite, non-iterative algorithms, so the
+  identity is exact over `ℝ` — no residual/asymptotic caveat.
+* `cholSolveFn_mulVec` — composing the two substitutions through a Cholesky factor `L` solves
+  `(L · Lᵀ) · x = b` exactly.
+* `solveRidgeFn_mulVec` — the kernel-ridge solve: if the Cholesky pivots of `K + γ·I` are positive
+  (the success condition), then `solveRidgeFn K γ b` solves `(K + γ·I)·x = b` exactly.
+* `choleskyFn_diag_pos_of_posDef` — the **keystone**: a positive-definite matrix has strictly positive
+  executable Cholesky pivots (the radicand `A[j,j] − Σ_{k<j} L[j,k]² > 0` at each step), proved via an
+  explicit Schur-complement quadratic-form witness.
+* `solveRidgeFn_mulVec_of_posSemidef` (and its tensor-level form) — composing the two: for a
+  positive-semidefinite kernel `K` and `γ > 0`, `solveRidgeFn K γ b` solves `(K + γ·I)·x = b` exactly,
+  with **no pivot hypothesis**. This discharges the verified core of CHD `solve_variationnal`.
+
+## Method
+
+Each substitution is a `Function.update` fold over the index list (`finRange n` forward, its reverse
+for back-substitution). The key observation is that **no induction on the solved values is needed**:
+the entry `yᵢ` is *defined* to make row `i` of the equation hold, so unfolding its definition and
+using triangularity (the not-yet-visited and structurally-zero terms drop out of the row dot product)
+gives `(L · y)ᵢ = bᵢ` directly. Two generic lemmas — `foldl_update_read` (the value written at the
+split index) and `foldl_update_stable` (earlier entries are never overwritten) — capture the fold
+bookkeeping; `sum_split_lt_eq_gt` performs the `k < i / k = i / k > i` trichotomy on the row sum.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization.Reconstruction
+
+open Matrix
+open scoped BigOperators
+
+variable {n : Nat}
+
+/-! ## Generic `Function.update`-fold bookkeeping -/
+
+/-- An update-fold never changes an index it does not visit. -/
+theorem foldl_update_not_mem (H : (Fin n → ℝ) → Fin n → ℝ) (l : List (Fin n))
+    (init : Fin n → ℝ) {x : Fin n} (hx : x ∉ l) :
+    (l.foldl (fun acc j => Function.update acc j (H acc j)) init) x = init x := by
+  induction l generalizing init with
+  | nil => simp
+  | cons a t ih =>
+      rw [List.foldl_cons, ih (Function.update init a (H init a))
+        (fun h => hx (List.mem_cons_of_mem _ h))]
+      have hxa : x ≠ a := by rintro rfl; exact hx (by simp)
+      exact Function.update_of_ne hxa _ _
+
+/-- Reading an update-fold over `l₁ ++ i :: l₂` at the split index `i` (not revisited in `l₂`)
+returns the step value applied to the `l₁`-prefix state. -/
+theorem foldl_update_read (H : (Fin n → ℝ) → Fin n → ℝ) (l₁ l₂ : List (Fin n))
+    (init : Fin n → ℝ) {i : Fin n} (hi : i ∉ l₂) :
+    ((l₁ ++ i :: l₂).foldl (fun acc j => Function.update acc j (H acc j)) init) i
+      = H (l₁.foldl (fun acc j => Function.update acc j (H acc j)) init) i := by
+  rw [List.foldl_append, List.foldl_cons, foldl_update_not_mem H l₂ _ hi, Function.update_self]
+
+/-- An update-fold over `l₁ ++ i :: l₂` agrees with its `l₁`-prefix at any index `≠ i` not in `l₂`. -/
+theorem foldl_update_stable (H : (Fin n → ℝ) → Fin n → ℝ) (l₁ l₂ : List (Fin n))
+    (init : Fin n → ℝ) {i m : Fin n} (hm : m ∉ l₂) (hmi : m ≠ i) :
+    ((l₁ ++ i :: l₂).foldl (fun acc j => Function.update acc j (H acc j)) init) m
+      = (l₁.foldl (fun acc j => Function.update acc j (H acc j)) init) m := by
+  rw [List.foldl_append, List.foldl_cons, foldl_update_not_mem H l₂ _ hm,
+    Function.update_of_ne hmi]
+
+/-! ## Splitting a `Fin n` sum at an index -/
+
+/-- Split a sum over `Fin n` into the `k < i`, `k = i`, and `k > i` parts. -/
+theorem sum_split_lt_eq_gt (i : Fin n) (f : Fin n → ℝ) :
+    (∑ k, f k) = (∑ k, if k.val < i.val then f k else 0) + f i
+      + (∑ k, if i.val < k.val then f k else 0) := by
+  rw [show f i = ∑ k, (if k = i then f k else 0) by
+        rw [Finset.sum_ite_eq' Finset.univ i f]; simp]
+  rw [← Finset.sum_add_distrib, ← Finset.sum_add_distrib]
+  apply Finset.sum_congr rfl
+  intro k _
+  rcases lt_trichotomy k.val i.val with h | h | h
+  · have hne : k ≠ i := fun e => by rw [e] at h; exact lt_irrefl _ h
+    rw [if_pos h, if_neg hne, if_neg (by linarith), add_zero, add_zero]
+  · have hki : k = i := Fin.ext h
+    rw [if_neg (by linarith), if_pos hki, if_neg (by linarith), zero_add, add_zero]
+  · have hne : k ≠ i := fun e => by rw [e] at h; exact lt_irrefl _ h
+    rw [if_neg (by linarith), if_neg hne, if_pos h, zero_add, zero_add]
+
+/-! ## `finRange` order splits -/
+
+/-- `finRange n` splits at index `i` as the strictly-smaller prefix, `i`, then the strictly-larger
+suffix. -/
+theorem finRange_split (i : Fin n) :
+    List.finRange n
+      = (List.finRange n).take i.val ++ i :: (List.finRange n).drop (i.val + 1) := by
+  have hlen : i.val < (List.finRange n).length := by rw [List.length_finRange]; exact i.isLt
+  conv_lhs => rw [← List.take_append_drop i.val (List.finRange n)]
+  congr 1
+  rw [List.drop_eq_getElem_cons hlen]
+  congr 1
+  simp [List.getElem_finRange]
+
+/-! ## Forward substitution solves a lower-triangular system exactly -/
+
+/-- **Forward substitution is correct.** For a lower-triangular `L` (`L i j = 0` when `i < j`) with
+nonzero diagonal, `triSolveLowerFn L b` solves `L · y = b` exactly: row `i` of `L · y` is `bᵢ`. -/
+theorem triSolveLowerFn_mulVec (L : Fin n → Fin n → ℝ)
+    (hlow : ∀ i j, i < j → L i j = 0) (hdiag : ∀ i, L i i ≠ 0) (b : Fin n → ℝ) (i : Fin n) :
+    (∑ k, L i k * Spec.triSolveLowerFn L b k) = b i := by
+  set H : (Fin n → ℝ) → Fin n → ℝ := fun acc j => (b j - Spec.dotFn (L j) acc) / L j j with hH
+  set y := Spec.triSolveLowerFn L b with hy
+  set pre := ((List.finRange n).take i.val).foldl
+    (fun acc j => Function.update acc j (H acc j)) (fun _ => 0) with hpre
+  -- `y` is the update-fold over `finRange n`.
+  have hyeq : y = (List.finRange n).foldl (fun acc j => Function.update acc j (H acc j))
+      (fun _ => 0) := rfl
+  -- `i` is not revisited after its turn, and not in its own prefix.
+  have hi₂ : i ∉ (List.finRange n).drop (i.val + 1) := fun hmem => by
+    have := mem_drop_finRange hmem; linarith
+  have hi₁ : i ∉ (List.finRange n).take i.val := fun hmem => by
+    have := mem_take_finRange hmem; exact lt_irrefl _ this
+  -- value written at `i`, prefix value at `i`, and stability for `k < i`.
+  have hy_i : y i = (b i - Spec.dotFn (L i) pre) / L i i := by
+    rw [hyeq]; conv_lhs => rw [finRange_split i]
+    rw [foldl_update_read H _ _ _ hi₂]
+  have hpre_i : pre i = 0 := by rw [hpre, foldl_update_not_mem H _ _ hi₁]
+  have hy_lt : ∀ m : Fin n, m.val < i.val → y m = pre m := by
+    intro m hm
+    have hm₂ : m ∉ (List.finRange n).drop (i.val + 1) := fun hmem => by
+      have := mem_drop_finRange hmem; linarith
+    have hmi : m ≠ i := fun e => by rw [e] at hm; exact lt_irrefl _ hm
+    rw [hyeq]; conv_lhs => rw [finRange_split i]
+    rw [foldl_update_stable H _ _ _ hm₂ hmi]
+  -- the row dot product `dotFn (L i) pre` is the masked partial sum over `k < i`.
+  have hdot : Spec.dotFn (L i) pre = ∑ k, if k.val < i.val then L i k * pre k else 0 := by
+    rw [dotFn_eq_sum, sum_split_lt_eq_gt i (fun k => L i k * pre k)]
+    rw [hpre_i, mul_zero]
+    rw [show (∑ k, if i.val < k.val then L i k * pre k else 0) = 0 by
+          apply Finset.sum_eq_zero; intro k _
+          by_cases hk : i.val < k.val
+          · rw [if_pos hk, hlow i k (by exact hk), zero_mul]
+          · rw [if_neg hk]]
+    ring
+  -- assemble row `i` of `L · y`.
+  rw [sum_split_lt_eq_gt i (fun k => L i k * y k)]
+  rw [show (∑ k, if i.val < k.val then L i k * y k else 0) = 0 by
+        apply Finset.sum_eq_zero; intro k _
+        by_cases hk : i.val < k.val
+        · rw [if_pos hk, hlow i k (by exact hk), zero_mul]
+        · rw [if_neg hk]]
+  rw [show (∑ k, if k.val < i.val then L i k * y k else 0)
+        = ∑ k, if k.val < i.val then L i k * pre k else 0 by
+        apply Finset.sum_congr rfl; intro k _
+        by_cases hk : k.val < i.val
+        · rw [if_pos hk, if_pos hk, hy_lt k hk]
+        · rw [if_neg hk, if_neg hk]]
+  rw [← hdot, hy_i, add_zero]
+  have hdi : L i i ≠ 0 := hdiag i
+  field_simp
+  ring
+
+/-! ## Back substitution solves an upper-triangular system exactly -/
+
+/-- `(finRange n).reverse` splits at index `i` as the strictly-larger block (reversed suffix),
+then `i`, then the strictly-smaller block (reversed prefix). -/
+theorem finRange_reverse_split (i : Fin n) :
+    (List.finRange n).reverse
+      = ((List.finRange n).drop (i.val + 1)).reverse
+        ++ i :: ((List.finRange n).take i.val).reverse := by
+  conv_lhs => rw [finRange_split i]
+  rw [List.reverse_append, List.reverse_cons, List.append_assoc, List.singleton_append]
+
+/-- **Back substitution is correct.** For an upper-triangular `U` (`U i j = 0` when `j < i`) with
+nonzero diagonal, `triSolveUpperFn U c` solves `U · x = c` exactly: row `i` of `U · x` is `cᵢ`. -/
+theorem triSolveUpperFn_mulVec (U : Fin n → Fin n → ℝ)
+    (hup : ∀ i j, j < i → U i j = 0) (hdiag : ∀ i, U i i ≠ 0) (c : Fin n → ℝ) (i : Fin n) :
+    (∑ k, U i k * Spec.triSolveUpperFn U c k) = c i := by
+  set H : (Fin n → ℝ) → Fin n → ℝ := fun acc j => (c j - Spec.dotFn (U j) acc) / U j j with hH
+  set y := Spec.triSolveUpperFn U c with hy
+  set pre := (((List.finRange n).drop (i.val + 1)).reverse).foldl
+    (fun acc j => Function.update acc j (H acc j)) (fun _ => 0) with hpre
+  have hyeq : y = ((List.finRange n).reverse).foldl
+      (fun acc j => Function.update acc j (H acc j)) (fun _ => 0) := rfl
+  have hi₂ : i ∉ ((List.finRange n).take i.val).reverse := fun hmem => by
+    rw [List.mem_reverse] at hmem; have := mem_take_finRange hmem; exact lt_irrefl _ this
+  have hi₁ : i ∉ ((List.finRange n).drop (i.val + 1)).reverse := fun hmem => by
+    rw [List.mem_reverse] at hmem; have := mem_drop_finRange hmem; linarith
+  have hy_i : y i = (c i - Spec.dotFn (U i) pre) / U i i := by
+    rw [hyeq]; conv_lhs => rw [finRange_reverse_split i]
+    rw [foldl_update_read H _ _ _ hi₂]
+  have hpre_i : pre i = 0 := by rw [hpre, foldl_update_not_mem H _ _ hi₁]
+  have hy_gt : ∀ m : Fin n, i.val < m.val → y m = pre m := by
+    intro m hm
+    have hm₂ : m ∉ ((List.finRange n).take i.val).reverse := fun hmem => by
+      rw [List.mem_reverse] at hmem; have := mem_take_finRange hmem; linarith
+    have hmi : m ≠ i := fun e => by rw [e] at hm; exact lt_irrefl _ hm
+    rw [hyeq]; conv_lhs => rw [finRange_reverse_split i]
+    rw [foldl_update_stable H _ _ _ hm₂ hmi]
+  have hdot : Spec.dotFn (U i) pre = ∑ k, if i.val < k.val then U i k * pre k else 0 := by
+    rw [dotFn_eq_sum, sum_split_lt_eq_gt i (fun k => U i k * pre k)]
+    rw [hpre_i, mul_zero]
+    rw [show (∑ k, if k.val < i.val then U i k * pre k else 0) = 0 by
+          apply Finset.sum_eq_zero; intro k _
+          by_cases hk : k.val < i.val
+          · rw [if_pos hk, hup i k (by exact hk), zero_mul]
+          · rw [if_neg hk]]
+    ring
+  rw [sum_split_lt_eq_gt i (fun k => U i k * y k)]
+  rw [show (∑ k, if k.val < i.val then U i k * y k else 0) = 0 by
+        apply Finset.sum_eq_zero; intro k _
+        by_cases hk : k.val < i.val
+        · rw [if_pos hk, hup i k (by exact hk), zero_mul]
+        · rw [if_neg hk]]
+  rw [show (∑ k, if i.val < k.val then U i k * y k else 0)
+        = ∑ k, if i.val < k.val then U i k * pre k else 0 by
+        apply Finset.sum_congr rfl; intro k _
+        by_cases hk : i.val < k.val
+        · rw [if_pos hk, if_pos hk, hy_gt k hk]
+        · rw [if_neg hk, if_neg hk]]
+  rw [← hdot, hy_i]
+  have hdi : U i i ≠ 0 := hdiag i
+  field_simp
+  ring
+
+/-! ## The Cholesky linear solve -/
+
+/-- **Cholesky solve is correct.** For a lower-triangular `L` with nonzero diagonal, the two-pass
+substitution `cholSolveFn L b` solves `(L · Lᵀ) · x = b` exactly. -/
+theorem cholSolveFn_mulVec (L : Fin n → Fin n → ℝ)
+    (hlow : ∀ i j, i < j → L i j = 0) (hdiag : ∀ i, L i i ≠ 0) (b : Fin n → ℝ) :
+    (Matrix.of L * (Matrix.of L)ᵀ) *ᵥ (Spec.cholSolveFn L b) = b := by
+  set z := Spec.triSolveLowerFn L b with hz
+  set U : Fin n → Fin n → ℝ := fun i k => L k i with hU
+  have hup : ∀ i j, j < i → U i j = 0 := fun i j hji => hlow j i hji
+  have hUdiag : ∀ i, U i i ≠ 0 := fun i => hdiag i
+  have hUp : (Matrix.of L)ᵀ *ᵥ (Spec.cholSolveFn L b) = z := by
+    funext i
+    have hx : Spec.cholSolveFn L b = Spec.triSolveUpperFn U z := rfl
+    show (∑ k, ((Matrix.of L)ᵀ i k) * Spec.cholSolveFn L b k) = z i
+    simp only [Matrix.transpose_apply, Matrix.of_apply]
+    rw [hx]
+    exact triSolveUpperFn_mulVec U hup hUdiag z i
+  have hLow : (Matrix.of L) *ᵥ z = b := by
+    funext i
+    show (∑ k, (Matrix.of L i k) * z k) = b i
+    simp only [Matrix.of_apply]
+    exact triSolveLowerFn_mulVec L hlow hdiag b i
+  calc (Matrix.of L * (Matrix.of L)ᵀ) *ᵥ (Spec.cholSolveFn L b)
+      = Matrix.of L *ᵥ ((Matrix.of L)ᵀ *ᵥ (Spec.cholSolveFn L b)) := by
+        rw [Matrix.mulVec_mulVec]
+    _ = Matrix.of L *ᵥ z := by rw [hUp]
+    _ = b := hLow
+
+/-! ## The kernel-ridge (Tikhonov) solve -/
+
+/-- **Kernel-ridge solve is correct (conditional on Cholesky success).** If `K` is symmetric and the
+Cholesky pivots of `K + γ·I` are positive — exactly the condition under which the SPD Cholesky
+succeeds — then `solveRidgeFn K γ b` solves `(K + γ·I)·x = b` exactly. This is the verified core of
+CHD `solve_variationnal`; the positive-pivot hypothesis is discharged unconditionally for an SPD
+`K + γ·I` (PSD kernel `K`, `γ > 0`) by `solveRidgeFn_mulVec_of_posSemidef` below. -/
+theorem solveRidgeFn_mulVec (K : Fin n → Fin n → ℝ) (γ : ℝ) (b : Fin n → ℝ)
+    (hsymm : ∀ i j, K i j = K j i)
+    (hpos : ∀ j : Fin n, 0 < Spec.choleskyFn (Spec.addScaledIdFn K γ) j j) :
+    (Matrix.of (Spec.addScaledIdFn K γ)) *ᵥ (Spec.solveRidgeFn K γ b) = b := by
+  set A := Spec.addScaledIdFn K γ with hA
+  have hAsymm : ∀ i j, A i j = A j i := by
+    intro i j
+    show K i j + (if i = j then γ else 0) = K j i + (if j = i then γ else 0)
+    rw [hsymm i j]
+    by_cases h : i = j
+    · rw [h]
+    · rw [if_neg h, if_neg (fun e => h e.symm)]
+  obtain ⟨hlowM, hreconM⟩ := isCholesky_of_pos A hAsymm hpos
+  have hlow : ∀ i j, i < j → Spec.choleskyFn A i j = 0 := fun i j hij => by
+    have := hlowM i j hij; simpa using this
+  have hdiag : ∀ i, Spec.choleskyFn A i i ≠ 0 := fun i => ne_of_gt (hpos i)
+  have hxeq : Spec.solveRidgeFn K γ b = Spec.cholSolveFn (Spec.choleskyFn A) b := rfl
+  rw [hxeq, hreconM]
+  exact cholSolveFn_mulVec (Spec.choleskyFn A) hlow hdiag b
+
+/-! ## The regularized matrix `K + γ·I` is symmetric positive-definite
+
+For a positive-semidefinite kernel `K` and regularization `γ > 0`, `K + γ·I` is positive definite —
+the precondition under which the Cholesky-based `solveRidgeFn` is the genuine linear solve. Combined
+with the keystone below (`choleskyFn_diag_pos_of_posDef`: an SPD matrix has strictly positive
+executable Cholesky pivots), this discharges the positive-pivot hypothesis of `solveRidgeFn_mulVec`
+unconditionally, giving `solveRidgeFn_mulVec_of_posSemidef`. -/
+
+/-- `Matrix.of (addScaledIdFn K γ) = Matrix.of K + γ • 1`. -/
+theorem of_addScaledIdFn (K : Fin n → Fin n → ℝ) (γ : ℝ) :
+    Matrix.of (Spec.addScaledIdFn K γ) = Matrix.of K + γ • (1 : Matrix (Fin n) (Fin n) ℝ) := by
+  ext i j
+  simp only [Matrix.of_apply, Matrix.add_apply, Matrix.smul_apply, Matrix.one_apply,
+    Spec.addScaledIdFn, smul_eq_mul]
+  by_cases h : i = j <;> simp [h]
+
+/-- **The regularized (ridge) matrix is SPD.** For a PSD kernel `K` and `γ > 0`, `K + γ·I` is positive
+definite. This is the precondition that makes the Cholesky ridge solve `solveRidgeFn` well-posed
+(its Cholesky factorization exists with positive pivots). -/
+theorem posDef_addScaledIdFn {K : Fin n → Fin n → ℝ} (hK : (Matrix.of K).PosSemidef)
+    {γ : ℝ} (hγ : 0 < γ) : (Matrix.of (Spec.addScaledIdFn K γ)).PosDef := by
+  rw [of_addScaledIdFn]
+  exact Matrix.PosDef.posSemidef_add hK (Matrix.PosDef.one.smul hγ)
+
+/-! ## Keystone: a positive-definite matrix has strictly positive Cholesky pivots
+
+The remaining ingredient that makes `solveRidgeFn_mulVec` unconditional for SPD inputs: for a
+*positive-definite* `A`, every executable Cholesky pivot is `> 0`. Equivalently, the radicand
+`A[j,j] − Σ_{k<j} L[j,k]² > 0` at every step, so the `√` never sees a non-positive argument.
+
+The argument is the classical Schur-complement fact, formalized as an **explicit quadratic-form
+witness** (so it needs no matrix inverse). By strong induction on `j`, the leading `j`-block
+reconstructs from the pivots below `j` (`choleskyFn_dot_eq_local`). Back-substitution
+(`triSolveUpperFn`, already proven correct in this file) produces a vector `z` with `z j = 1` whose
+`A`-quadratic form `zᵀ A z` is exactly the radicand; positive-definiteness forces `zᵀ A z > 0`. -/
+
+/-- A double Gram sum collapses to a sum of squares:
+`∑ᵢ∑ⱼ zᵢ·(∑ₗ Mᵢₗ Mⱼₗ)·zⱼ = ∑ₗ (∑ᵢ zᵢ Mᵢₗ)²`. (The `A = M·Mᵀ` reconstruction turns the witness
+quadratic form into a manifestly nonnegative shape.) -/
+theorem double_sum_gram (z : Fin n → ℝ) (M : Fin n → Fin n → ℝ) :
+    (∑ i, ∑ j, z i * ((∑ l, M i l * M j l) * z j))
+      = ∑ l, (∑ i, z i * M i l) * (∑ i, z i * M i l) := by
+  have hexp : (∑ i, ∑ j, z i * ((∑ l, M i l * M j l) * z j))
+      = ∑ i, ∑ j, ∑ l, (z i * M i l) * (z j * M j l) := by
+    refine Finset.sum_congr rfl (fun i _ => Finset.sum_congr rfl (fun j _ => ?_))
+    rw [Finset.sum_mul, Finset.mul_sum]
+    exact Finset.sum_congr rfl (fun l _ => by ring)
+  rw [hexp,
+    show (∑ i, ∑ j, ∑ l, (z i * M i l) * (z j * M j l))
+        = ∑ i, ∑ l, ∑ j, (z i * M i l) * (z j * M j l)
+      from Finset.sum_congr rfl (fun i _ => Finset.sum_comm),
+    Finset.sum_comm]
+  refine Finset.sum_congr rfl (fun l _ => ?_)
+  rw [Fintype.sum_mul_sum (fun i => z i * M i l) (fun j => z j * M j l)]
+
+/-- **Localized per-entry Cholesky reconstruction.** The proof of `choleskyFn_dot_eq` only uses the
+positivity of the pivot at the *smaller index*, `L[j,j]`, so for `j ≤ i` the reconstruction
+`∑ₖ L[i,k]·L[j,k] = A[i,j]` holds assuming only `0 < L[j,j]`, not global positivity. This is what
+powers the strong induction in `choleskyFn_diag_pos_of_posDef`. -/
+theorem choleskyFn_dot_eq_local (A : Fin n → Fin n → ℝ) {i j : Fin n}
+    (hjpos : 0 < Spec.choleskyFn A j j) (hji : j.val ≤ i.val) :
+    (∑ k, Spec.choleskyFn A i k * Spec.choleskyFn A j k) = A i j := by
+  set L := Spec.choleskyFn A with hL
+  have key : ∀ k : Fin n, L i k * L j k
+      = (if k.val < j.val then L i k * L j k else 0) + (if k = j then L i j * L j j else 0) := by
+    intro k
+    rcases lt_trichotomy k.val j.val with h | h | h
+    · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h
+      rw [if_pos h, if_neg hne, add_zero]
+    · have hkj : k = j := Fin.ext h
+      rw [if_neg (by rw [h]; exact lt_irrefl _), if_pos hkj, zero_add, hkj]
+    · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h
+      rw [if_neg (Nat.not_lt.mpr (le_of_lt h)), if_neg hne, add_zero,
+        show L j k = 0 from by rw [hL]; exact Spec.Factorization.choleskyFn_lower_triangular A h,
+        mul_zero]
+  rw [show (∑ k, L i k * L j k)
+      = ∑ k, ((if k.val < j.val then L i k * L j k else 0) + (if k = j then L i j * L j j else 0))
+      from Finset.sum_congr rfl (fun k _ => key k),
+    Finset.sum_add_distrib, Finset.sum_ite_eq' Finset.univ j (fun _ => L i j * L j j)]
+  simp only [Finset.mem_univ, if_true]
+  rcases eq_or_lt_of_le hji with heq | hlt
+  · have hij' : i = j := Fin.ext heq.symm
+    subst hij'
+    have hrad : 0 < A i i - (∑ k, if k.val < i.val then L i k * L i k else 0) := by
+      have hp := hjpos
+      rw [hL, choleskyFn_diag_eq] at hp
+      exact Real.sqrt_pos.mp hp
+    have hsq : L i i * L i i = A i i - (∑ k, if k.val < i.val then L i k * L i k else 0) := by
+      conv_lhs => rw [hL, choleskyFn_diag_eq A i]
+      exact Real.mul_self_sqrt hrad.le
+    rw [hsq]; ring
+  · have hne : L j j ≠ 0 := ne_of_gt hjpos
+    have hmul : L i j * L j j
+        = A i j - (∑ k, if k.val < j.val then L i k * L j k else 0) := by
+      rw [hL, choleskyFn_offdiag_eq A hlt, div_mul_eq_mul_div, mul_div_assoc, div_self hne, mul_one]
+    rw [hmul]; ring
+
+/-- **The radicand / Schur keystone.** For a positive-definite `A`, every executable Cholesky pivot is
+strictly positive: `0 < L[j,j]`. Hence the SPD Cholesky succeeds and the ridge solve is exact. -/
+theorem choleskyFn_diag_pos_of_posDef (A : Fin n → Fin n → ℝ) (hpd : (Matrix.of A).PosDef)
+    (m : Fin n) : 0 < Spec.choleskyFn A m m := by
+  -- symmetry of `A` from Hermitian-ness
+  have hsymm : ∀ i j, A i j = A j i := by
+    intro i j
+    have h := hpd.1.apply i j
+    simp only [Matrix.of_apply, star_trivial] at h
+    exact h.symm
+  -- strong induction on `m.val`
+  suffices H : ∀ N : Nat, ∀ m : Fin n, m.val = N → 0 < Spec.choleskyFn A m m by
+    exact H m.val m rfl
+  intro N
+  induction N using Nat.strong_induction_on with
+  | _ N IH =>
+    intro m hmN
+    have ihpos : ∀ i : Fin n, i.val < m.val → 0 < Spec.choleskyFn A i i := fun i hi =>
+      IH i.val (hmN ▸ hi) i rfl
+    -- reduce to positivity of the radicand
+    rw [choleskyFn_diag_eq A m, Real.sqrt_pos]
+    -- localized reconstruction for pairs `≤ m` with at least one index `< m`
+    have hAij : ∀ i j : Fin n, i.val ≤ m.val → j.val ≤ m.val → (i.val < m.val ∨ j.val < m.val) →
+        (∑ l, Spec.choleskyFn A i l * Spec.choleskyFn A j l) = A i j := by
+      intro i j _ _ hor
+      rcases le_total j.val i.val with hle | hle
+      · have hjm : j.val < m.val := by
+          rcases hor with h | h
+          · exact lt_of_le_of_lt hle h
+          · exact h
+        exact choleskyFn_dot_eq_local A (ihpos j hjm) hle
+      · have him : i.val < m.val := by
+          rcases hor with h | h
+          · exact h
+          · exact lt_of_le_of_lt hle h
+        rw [show (∑ l, Spec.choleskyFn A i l * Spec.choleskyFn A j l)
+              = ∑ l, Spec.choleskyFn A j l * Spec.choleskyFn A i l
+            from Finset.sum_congr rfl (fun l _ => mul_comm _ _),
+          choleskyFn_dot_eq_local A (ihpos i him) hle, hsymm j i]
+    -- the back-substitution system solving `(Lₘᵀ) z = −(row m of L)` on the leading block
+    set U' : Fin n → Fin n → ℝ := fun l i =>
+      if l.val < m.val then (if i.val < m.val then Spec.choleskyFn A i l else 0)
+      else (if i = l then 1 else 0) with hU'
+    set c : Fin n → ℝ := fun l => if l.val < m.val then -(Spec.choleskyFn A m l) else 0 with hc
+    set x' := Spec.triSolveUpperFn U' c with hx'
+    set z : Fin n → ℝ := fun i => if i = m then 1 else x' i with hz
+    have zm1 : z m = 1 := by simp [hz]
+    -- `U'` is upper-triangular with nonzero diagonal
+    have hup : ∀ a b : Fin n, b.val < a.val → U' a b = 0 := by
+      intro a b hba
+      simp only [hU']
+      by_cases ha : a.val < m.val
+      · rw [if_pos ha]
+        by_cases hb : b.val < m.val
+        · rw [if_pos hb]; exact Spec.Factorization.choleskyFn_lower_triangular A hba
+        · rw [if_neg hb]
+      · rw [if_neg ha, if_neg (by intro e; rw [e] at hba; exact lt_irrefl _ hba)]
+    have hUdiag : ∀ a : Fin n, U' a a ≠ 0 := by
+      intro a
+      simp only [hU']
+      by_cases ha : a.val < m.val
+      · rw [if_pos ha, if_pos ha]; exact ne_of_gt (ihpos a ha)
+      · rw [if_neg ha]; simp
+    have hsolve : ∀ l : Fin n, (∑ i, U' l i * x' i) = c l := fun l =>
+      triSolveUpperFn_mulVec U' hup hUdiag c l
+    -- entries `≥ m` of the solve vanish
+    have hx'_ge : ∀ l : Fin n, m.val ≤ l.val → x' l = 0 := by
+      intro l hl
+      have hlm : ¬ l.val < m.val := Nat.not_lt.mpr hl
+      have hsum : (∑ i, U' l i * x' i) = x' l := by
+        rw [show (∑ i, U' l i * x' i) = ∑ i, (if i = l then x' i else 0) from
+          Finset.sum_congr rfl (fun i _ => by
+            simp only [hU', if_neg hlm]
+            by_cases hi : i = l
+            · rw [if_pos hi, if_pos hi, one_mul]
+            · rw [if_neg hi, if_neg hi, zero_mul]),
+          Finset.sum_ite_eq' Finset.univ l (fun i => x' i)]
+        simp
+      have hcl : c l = 0 := by simp only [hc, if_neg hlm]
+      have := hsolve l
+      rw [hsum, hcl] at this
+      exact this
+    have hz_gt : ∀ i : Fin n, m.val < i.val → z i = 0 := by
+      intro i hi
+      have hne : i ≠ m := fun e => by rw [e] at hi; exact lt_irrefl _ hi
+      simp only [hz, if_neg hne]
+      exact hx'_ge i (le_of_lt hi)
+    -- the witness annihilates the leading columns of `L`
+    have hker : ∀ l : Fin n, l.val < m.val → (∑ i, z i * Spec.choleskyFn A i l) = 0 := by
+      intro l hlm
+      have hpl : (∑ i, (if i.val < m.val then x' i * Spec.choleskyFn A i l else 0))
+          = -(Spec.choleskyFn A m l) := by
+        have h := hsolve l
+        rw [show c l = -(Spec.choleskyFn A m l) from by simp only [hc, if_pos hlm]] at h
+        rw [← h]
+        refine Finset.sum_congr rfl (fun i _ => ?_)
+        simp only [hU', if_pos hlm]
+        by_cases hi : i.val < m.val
+        · rw [if_pos hi, if_pos hi, mul_comm]
+        · rw [if_neg hi, if_neg hi, zero_mul]
+      have tw : ∀ i : Fin n, z i * Spec.choleskyFn A i l
+          = (if i.val < m.val then x' i * Spec.choleskyFn A i l else 0)
+            + (if i = m then Spec.choleskyFn A m l else 0) := by
+        intro i
+        rcases lt_trichotomy i.val m.val with hi | hi | hi
+        · have hne : i ≠ m := fun e => by rw [e] at hi; exact lt_irrefl _ hi
+          rw [if_pos hi, if_neg hne, add_zero]
+          simp only [hz, if_neg hne]
+        · have him : i = m := Fin.ext hi
+          rw [if_neg (by rw [hi]; exact lt_irrefl _), if_pos him, zero_add, him, zm1, one_mul]
+        · have hne : i ≠ m := fun e => by rw [e] at hi; exact lt_irrefl _ hi
+          rw [if_neg (Nat.not_lt.mpr (le_of_lt hi)), if_neg hne, add_zero,
+            show z i = 0 from hz_gt i hi, zero_mul]
+      rw [show (∑ i, z i * Spec.choleskyFn A i l)
+          = ∑ i, ((if i.val < m.val then x' i * Spec.choleskyFn A i l else 0)
+            + (if i = m then Spec.choleskyFn A m l else 0))
+          from Finset.sum_congr rfl (fun i _ => tw i),
+        Finset.sum_add_distrib, Finset.sum_ite_eq' Finset.univ m (fun _ => Spec.choleskyFn A m l)]
+      simp only [Finset.mem_univ, if_true]
+      rw [hpl]; ring
+    -- value of the column-`l` contraction `∑ᵢ zᵢ L[i,l]`
+    have wval : ∀ l : Fin n, (∑ i, z i * Spec.choleskyFn A i l)
+        = if l = m then Spec.choleskyFn A m m else 0 := by
+      intro l
+      rcases lt_trichotomy l.val m.val with hl | hl | hl
+      · rw [if_neg (fun e => by rw [e] at hl; exact lt_irrefl _ hl)]
+        exact hker l hl
+      · have hlm : l = m := Fin.ext hl
+        rw [if_pos hlm, hlm]
+        have hper : ∀ i : Fin n, z i * Spec.choleskyFn A i m
+            = if i = m then Spec.choleskyFn A m m else 0 := by
+          intro i
+          rcases lt_trichotomy i.val m.val with hi | hi | hi
+          · rw [if_neg (fun e => by rw [e] at hi; exact lt_irrefl _ hi),
+              Spec.Factorization.choleskyFn_lower_triangular A hi, mul_zero]
+          · have him : i = m := Fin.ext hi
+            rw [if_pos him, him, zm1, one_mul]
+          · rw [if_neg (fun e => by rw [e] at hi; exact lt_irrefl _ hi),
+              show z i = 0 from hz_gt i hi, zero_mul]
+        rw [Finset.sum_congr rfl (fun i _ => hper i),
+          Finset.sum_ite_eq' Finset.univ m (fun _ => Spec.choleskyFn A m m)]
+        simp
+      · rw [if_neg (fun e => by rw [e] at hl; exact lt_irrefl _ hl)]
+        refine Finset.sum_eq_zero (fun i _ => ?_)
+        rcases Nat.lt_or_ge i.val l.val with hi | hi
+        · rw [Spec.Factorization.choleskyFn_lower_triangular A hi, mul_zero]
+        · rw [show z i = 0 from hz_gt i (lt_of_lt_of_le hl hi), zero_mul]
+    -- the Gram term `T1 = ∑ₗ (∑ᵢ zᵢ L[i,l])² = L[m,m]²`
+    have T1eval : (∑ l, (∑ i, z i * Spec.choleskyFn A i l) * (∑ i, z i * Spec.choleskyFn A i l))
+        = Spec.choleskyFn A m m * Spec.choleskyFn A m m := by
+      rw [show (∑ l, (∑ i, z i * Spec.choleskyFn A i l) * (∑ i, z i * Spec.choleskyFn A i l))
+            = ∑ l, (if l = m then Spec.choleskyFn A m m * Spec.choleskyFn A m m else 0)
+          from Finset.sum_congr rfl (fun l _ => by
+            rw [wval l]
+            by_cases hlm : l = m
+            · rw [if_pos hlm, if_pos hlm]
+            · rw [if_neg hlm, if_neg hlm, mul_zero]),
+        Finset.sum_ite_eq' Finset.univ m (fun _ => Spec.choleskyFn A m m * Spec.choleskyFn A m m)]
+      simp
+    -- the residual term `T2 = ∑ᵢ∑ⱼ zᵢ (A[i,j] − (L·Lᵀ)[i,j]) zⱼ` reduces to the `(m,m)` entry
+    have T2eval : (∑ i, ∑ j, z i
+          * ((A i j - ∑ l, Spec.choleskyFn A i l * Spec.choleskyFn A j l) * z j))
+        = A m m - ∑ l, Spec.choleskyFn A m l * Spec.choleskyFn A m l := by
+      rw [Finset.sum_eq_single m]
+      · rw [Finset.sum_eq_single m]
+        · rw [zm1]; ring
+        · intro j _ hj
+          rcases lt_trichotomy j.val m.val with hjm | hjm | hjm
+          · rw [hAij m j (le_refl _) (le_of_lt hjm) (Or.inr hjm)]; ring
+          · exact absurd (Fin.ext hjm) hj
+          · rw [show z j = 0 from hz_gt j hjm, mul_zero, mul_zero]
+        · intro h; exact absurd (Finset.mem_univ m) h
+      · intro i _ hi
+        refine Finset.sum_eq_zero (fun j _ => ?_)
+        rcases lt_trichotomy i.val m.val with him | him | him
+        · rcases lt_trichotomy j.val m.val with hjm | hjm | hjm
+          · rw [hAij i j (le_of_lt him) (le_of_lt hjm) (Or.inl him)]; ring
+          · rw [hAij i j (le_of_lt him) (le_of_eq hjm) (Or.inl him)]; ring
+          · rw [show z j = 0 from hz_gt j hjm, mul_zero, mul_zero]
+        · exact absurd (Fin.ext him) hi
+        · rw [show z i = 0 from hz_gt i him, zero_mul]
+      · intro h; exact absurd (Finset.mem_univ m) h
+    -- splitting the full squared norm of row `m` of `L` (the `> m` part vanishes)
+    have Rmm_split : (∑ l, Spec.choleskyFn A m l * Spec.choleskyFn A m l)
+        = (∑ k, if k.val < m.val then Spec.choleskyFn A m k * Spec.choleskyFn A m k else 0)
+          + Spec.choleskyFn A m m * Spec.choleskyFn A m m := by
+      rw [sum_split_lt_eq_gt m (fun l => Spec.choleskyFn A m l * Spec.choleskyFn A m l),
+        show (∑ k, if m.val < k.val then Spec.choleskyFn A m k * Spec.choleskyFn A m k else 0) = 0
+          from Finset.sum_eq_zero (fun k _ => by
+            by_cases hk : m.val < k.val
+            · rw [if_pos hk, Spec.Factorization.choleskyFn_lower_triangular A hk, zero_mul]
+            · rw [if_neg hk])]
+      ring
+    -- the witness quadratic form `zᵀ A z` equals the radicand
+    have hqf : star z ⬝ᵥ (Matrix.of A *ᵥ z) = ∑ i, ∑ j, z i * (A i j * z j) := by
+      show (∑ i, star (z i) * (∑ j, (Matrix.of A) i j * z j)) = _
+      refine Finset.sum_congr rfl (fun i _ => ?_)
+      rw [star_trivial, Finset.mul_sum]
+      exact Finset.sum_congr rfl (fun j _ => by rw [Matrix.of_apply])
+    have hQsplit : (∑ i, ∑ j, z i * (A i j * z j))
+        = (∑ i, ∑ j, z i * ((∑ l, Spec.choleskyFn A i l * Spec.choleskyFn A j l) * z j))
+          + (∑ i, ∑ j, z i
+              * ((A i j - ∑ l, Spec.choleskyFn A i l * Spec.choleskyFn A j l) * z j)) := by
+      rw [← Finset.sum_add_distrib]
+      refine Finset.sum_congr rfl (fun i _ => ?_)
+      rw [← Finset.sum_add_distrib]
+      exact Finset.sum_congr rfl (fun j _ => by ring)
+    have hqf_eq_rad : star z ⬝ᵥ (Matrix.of A *ᵥ z)
+        = A m m - ∑ k, if k.val < m.val then Spec.choleskyFn A m k * Spec.choleskyFn A m k else 0 := by
+      rw [hqf, hQsplit, double_sum_gram z (Spec.choleskyFn A), T1eval, T2eval, Rmm_split]
+      ring
+    -- positive-definiteness applied to the nonzero witness finishes it
+    have hz_ne : z ≠ 0 := fun h => one_ne_zero (by
+      have hzm := congrFun h m; rwa [zm1, Pi.zero_apply] at hzm)
+    have hpos := hpd.dotProduct_mulVec_pos hz_ne
+    rw [hqf_eq_rad] at hpos
+    exact hpos
+
+/-! ## The kernel-ridge solve, unconditional for SPD inputs -/
+
+/-- **Kernel-ridge solve, unconditional for an SPD regularized system.** For a positive-semidefinite
+kernel `K` and `γ > 0`, `solveRidgeFn K γ b` solves `(K + γ·I)·x = b` exactly — with *no* pivot
+hypothesis. This is the fully discharged verified `solve_variationnal`: the keystone
+`choleskyFn_diag_pos_of_posDef` supplies the positive pivots from `posDef_addScaledIdFn`. -/
+theorem solveRidgeFn_mulVec_of_posSemidef (K : Fin n → Fin n → ℝ) (γ : ℝ) (b : Fin n → ℝ)
+    (hK : (Matrix.of K).PosSemidef) (hγ : 0 < γ) :
+    (Matrix.of (Spec.addScaledIdFn K γ)) *ᵥ (Spec.solveRidgeFn K γ b) = b := by
+  have hpd : (Matrix.of (Spec.addScaledIdFn K γ)).PosDef := posDef_addScaledIdFn hK hγ
+  have hsymm : ∀ i j, K i j = K j i := by
+    intro i j
+    have h := hK.1.apply i j
+    simp only [Matrix.of_apply, star_trivial] at h
+    exact h.symm
+  exact solveRidgeFn_mulVec K γ b hsymm
+    (fun j => choleskyFn_diag_pos_of_posDef (Spec.addScaledIdFn K γ) hpd j)
+
+/-- **Tensor-level kernel-ridge solve, unconditional for SPD inputs.** For a tensor kernel `K` whose
+matrix view is positive-semidefinite and `γ > 0`, `solveRidgeSpec K γ b` solves `(K + γ·I)·x = b`
+exactly: `(K + γ·I) *ᵥ (solveRidgeSpec K γ b) = b`. -/
+theorem solveRidgeSpec_mulVec_of_posSemidef (K : Spec.Tensor ℝ (.dim n (.dim n .scalar))) (γ : ℝ)
+    (b : Spec.Tensor ℝ (.dim n .scalar)) (hK : (Matrix.of (Spec.toMatFn K)).PosSemidef) (hγ : 0 < γ) :
+    (Matrix.of (Spec.addScaledIdFn (Spec.toMatFn K) γ)) *ᵥ (Spec.toVecFn (Spec.solveRidgeSpec K γ b))
+      = Spec.toVecFn b := by
+  have hround : Spec.toVecFn (Spec.solveRidgeSpec K γ b)
+      = Spec.solveRidgeFn (Spec.toMatFn K) γ (Spec.toVecFn b) := by
+    funext i; rfl
+  rw [hround]
+  exact solveRidgeFn_mulVec_of_posSemidef (Spec.toMatFn K) γ (Spec.toVecFn b) hK hγ
diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
index 0fbe845..98c2b42 100644
--- a/NN/Spec/Core/Tensor/Factorizations.lean
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -134,6 +134,53 @@ def choleskySpec {n : Nat} (A : Tensor α (.dim n (.dim n .scalar))) :
     Tensor α (.dim n (.dim n .scalar)) :=
   ofMatFn (choleskyFn (toMatFn A))
 
+/-! ## Triangular solves and the kernel-ridge (Tikhonov) linear solve
+
+Once `A` is factored as `A = L · Lᵀ` (Cholesky), the linear system `A · x = b` is solved by two
+triangular substitutions: forward-solve `L · z = b`, then back-solve `Lᵀ · x = z`. Each substitution
+visits the unknowns in an order such that, when row `i` is reached, every unknown it depends on has
+already been computed; the accumulator `acc` holds those values and `0` everywhere else, so the dot
+`dotFn (row i) acc` is exactly the required partial sum (the not-yet-solved and structurally-zero
+terms drop out). This is the linear solve at the heart of CHD `solve_variationnal`. -/
+
+/-- Forward substitution: solve `L · y = b` for a lower-triangular `L` with nonzero diagonal.
+Unknowns are visited `0, 1, …, n-1`; when row `i` is reached `acc` holds `y₀ … yᵢ₋₁` (and `0`
+elsewhere), so `dotFn (L i) acc = Σ_{k<i} L[i,k]·yₖ` by lower-triangularity. -/
+def triSolveLowerFn {n : Nat} (L : Fin n → Fin n → α) (b : Fin n → α) : Fin n → α :=
+  (List.finRange n).foldl
+    (fun acc i => Function.update acc i ((b i - dotFn (L i) acc) / L i i))
+    (fun _ => 0)
+
+/-- Back substitution: solve `U · x = y` for an upper-triangular `U` with nonzero diagonal.
+Unknowns are visited `n-1, …, 1, 0`; when row `i` is reached `acc` holds `xᵢ₊₁ … xₙ₋₁` (and `0`
+elsewhere), so `dotFn (U i) acc = Σ_{k>i} U[i,k]·xₖ` by upper-triangularity. -/
+def triSolveUpperFn {n : Nat} (U : Fin n → Fin n → α) (y : Fin n → α) : Fin n → α :=
+  (List.finRange n).reverse.foldl
+    (fun acc i => Function.update acc i ((y i - dotFn (U i) acc) / U i i))
+    (fun _ => 0)
+
+/-- Solve `A · x = b` given a Cholesky factor `L` of `A` (so `A = L · Lᵀ`): forward-solve
+`L · z = b`, then back-solve `Lᵀ · x = z`. -/
+def cholSolveFn {n : Nat} (L : Fin n → Fin n → α) (b : Fin n → α) : Fin n → α :=
+  triSolveUpperFn (fun i k => L k i) (triSolveLowerFn L b)
+
+/-- The regularized matrix `K + γ·I` as a function. For a symmetric PSD kernel `K` and `γ > 0`
+this is symmetric positive-definite, so its Cholesky factorization succeeds. -/
+def addScaledIdFn {n : Nat} (K : Fin n → Fin n → α) (γ : α) : Fin n → Fin n → α :=
+  fun i j => K i j + (if i = j then γ else 0)
+
+/-- The Tikhonov-regularized (kernel-ridge) solve `(K + γ·I)·x = b`, via the Cholesky factorization
+of `K + γ·I`. This is the linear solve at the core of CHD `solve_variationnal`. -/
+def solveRidgeFn {n : Nat} (K : Fin n → Fin n → α) (γ : α) (b : Fin n → α) : Fin n → α :=
+  cholSolveFn (choleskyFn (addScaledIdFn K γ)) b
+
+/-- Tensor-level kernel-ridge solve: `(K + γ·I)·x = b`.
+
+PyTorch analogue: `torch.linalg.solve(K + γ·I, b)` (specialized to the SPD Cholesky path). -/
+def solveRidgeSpec {n : Nat} (K : Tensor α (.dim n (.dim n .scalar))) (γ : α)
+    (b : Tensor α (.dim n .scalar)) : Tensor α (.dim n .scalar) :=
+  ofVecFn (solveRidgeFn (toMatFn K) γ (toVecFn b))
+
 /-! ## QR factorization (modified Gram–Schmidt)
 
 For `A : m × n`, produce `Q : m × n` with orthonormal columns and `R : n × n` upper-triangular
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index e4869b5..a50fddd 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -106,6 +106,56 @@ pivot term, and the positive-pivot hypothesis discharges the two side conditions
 radicand for the diagonal (`Real.mul_self_sqrt`) and a non-zero divisor for the below-diagonal
 entries. Symmetry of `A` extends the lower-triangular reconstruction to the whole matrix.
 
+# Solving the regularized system: verified `solve_variationnal`
+
+The eigendecomposition route above gives `(K + γI)⁻¹` as an abstract identity. But CHD does not form
+inverses; it *solves* the regularized system `(K + γI)·x = b`, and the SPD structure makes the direct
+Cholesky route both faster and — crucially for verification — *exact*: because `K + γI` is symmetric
+positive-definite, its Cholesky factorization is finite, so the whole solve carries no asymptotic
+caveat. This is the second, complementary verified route to `solve_variationnal`, in
+[`NN.Proofs.Tensor.Basic.FactorizationsSolve`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsSolve.lean).
+
+The solve is two triangular substitutions. Forward substitution `triSolveLowerFn` and back
+substitution `triSolveUpperFn` are *exact*: for a lower- (resp. upper-) triangular matrix with nonzero
+diagonal,
+
+$$`(L\,y)_i = b_i \quad\text{and}\quad (U\,x)_i = c_i \qquad\text{for every } i.`
+
+The key observation is that *no induction on the solved values is needed*: the entry `yᵢ` is defined
+precisely to make row `i` balance, so unfolding it and using triangularity — the not-yet-visited and
+structurally-zero terms drop out of the row dot product — gives the identity directly
+(`triSolveLowerFn_mulVec`, `triSolveUpperFn_mulVec`). Each substitution is a `Function.update` fold
+over the index list (`finRange n` forward, its reverse for back-substitution); two generic lemmas,
+`foldl_update_read` and `foldl_update_stable`, capture the bookkeeping that the value written at index
+`i` is never overwritten and earlier values are already in place.
+
+Composing them through a Cholesky factor solves the SPD system exactly (`cholSolveFn_mulVec`):
+
+$$`(L\,L^\top)\,x = b, \qquad x = \texttt{backSolve}\,L^\top\,(\texttt{forwardSolve}\,L\,b).`
+
+Specializing `L` to the Cholesky factor of `K + γI` gives `solveRidgeFn_mulVec`: if the Cholesky
+pivots of `K + γI` are positive — the success condition — then `solveRidgeFn K γ b` solves
+`(K + γI)·x = b` *exactly*. The `RidgeSolve` example exercises this on a rank-deficient Gram kernel
+`K = G·Gᵀ`: with `γ = 0.5` the residual is zero to machine precision, while the *negative control*
+`γ = 0` hits a zero pivot on the singular `K` and diverges — regularization is what makes the solve
+well-posed.
+
+That success condition is now discharged, so the headline `solveRidgeFn_mulVec_of_posSemidef` is
+*unconditional*: for a positive-semidefinite kernel `K` and `γ > 0`, `solveRidgeFn K γ b` solves
+`(K + γI)·x = b` exactly with no pivot hypothesis. Two facts combine. First, `posDef_addScaledIdFn`
+proves `K + γI` is positive-definite (via `Matrix.PosDef.one`, `Matrix.PosDef.smul`,
+`Matrix.PosDef.posSemidef_add`) — genuinely SPD, exactly the regime where Cholesky succeeds. Second,
+the *keystone* `choleskyFn_diag_pos_of_posDef` proves that a positive-definite matrix has
+*strictly positive* executable Cholesky pivots (equivalently the radicand `A[j,j] − Σ_{k<j} L[j,k]² > 0` at each
+step). The proof is the leading-principal Schur-complement fact, formalized as an *explicit
+quadratic-form witness* so it needs no matrix inverse: by strong induction on `j`, the leading block
+reconstructs from the pivots below `j` (`choleskyFn_dot_eq_local`), and back-substitution — the
+`triSolveUpperFn` already proven correct here — produces a vector `z` with `z_j = 1` whose `A`-quadratic
+form `zᵀ A z` *equals* the radicand; positive-definiteness (`Matrix.PosDef.dotProduct_mulVec_pos`)
+forces `zᵀ A z > 0`. The `RidgeSolve` example also exhibits the keystone directly: the SPD `K + γI` has
+all-positive pivots, while the singular `K` has a zero pivot — PosDef is necessary. Nothing here is an
+unproved axiom.
+
 # The a-posteriori residual certificate
 
 For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact
@@ -271,7 +321,18 @@ no cyclic-Jacobi convergence theory, so that cyclic rate remains captured by the
 residual certificate above — bounded numerically by the `assertLt` checks on concrete inputs — never
 by `sorry`; and the geometric machinery (`geom_bound_of_contraction`, `tendsto_zero_of_contraction`)
 is stated for an arbitrary per-step factor, ready to consume such a bound the moment it exists.
+
+On the *direct* solve route there is nothing left to do, because it avoids the eigensolver entirely.
+The kernel-ridge solve `(K + γI)·x = b` is proved correct *exactly* (via verified forward/back
+substitution and Cholesky), the regularized matrix is proved SPD for `γ > 0` (`posDef_addScaledIdFn`),
+and the positive-pivot success condition is now discharged from that SPD fact by the keystone
+`choleskyFn_diag_pos_of_posDef` (the radicand `A[j,j] − Σ_{k<j} L[j,k]² > 0`, proved via the explicit
+Schur-complement quadratic-form witness). Composing them, `solveRidgeFn_mulVec_of_posSemidef` makes the
+verified `solve_variationnal` *unconditional* for any positive-semidefinite kernel `K` and `γ > 0`, with
+no pivot hypothesis remaining.
+
 Everything else is exact: the algebraic faithfulness of the decomposition (orthogonality, orthogonal
 similarity, the residual identity, the per-rotation decrease, the classical-strategy linear rate, and
-correctness in the zero-residual limit) is proved, and the specification-level facts the kernel methods
-rely on are independent of the convergence step, so the CHD foundation is complete.
+correctness in the zero-residual limit), the finite Cholesky/QR reconstructions, and the
+Cholesky-based regularized solve are proved, and the specification-level facts the kernel methods rely
+on are independent of the convergence step, so the CHD foundation is complete.

From e59f536725397bb15ade6c13c83cc5c37c9d98ca Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 11:47:09 -0700
Subject: [PATCH 10/22] Close the kernel-ridge solve loop: SPD Cholesky
 capstone + inverse form
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two capstones on top of the positive-pivot keystone, making the verified
solve_variationnal match the form CHD actually specifies.

Proofs (NN/Proofs/Tensor/Basic/FactorizationsSolve.lean):

* cholesky_posDef — for any PosDef A, the executable choleskyFn IS a genuine
  Cholesky factor (lower-triangular, A = L·Lᵀ, strictly positive diagonal),
  with no pivot/symmetry/success hypothesis. Combines the keystone
  choleskyFn_diag_pos_of_posDef with isCholesky_of_pos.
* solveRidgeFn_eq_inv_mulVec (+ tensor-level solveRidgeSpec_eq_inv_mulVec) —
  solveRidgeFn K γ b = (K + γ·I)⁻¹ b, the closed form CHD solve_variationnal
  specifies. Invertibility from Matrix.PosDef.isUnit; the solve identity
  (K + γ·I)·x = b then pins x uniquely to the inverse. No inverse is ever
  formed by the algorithm.

Examples (NN/Examples/Factorization/RidgeSolve.lean), 8 #eval checks green:

* Capstone reconstruction: the SPD K + γ·I gives L·Lᵀ = K + γ·I to machine
  precision. Negative control: an indefinite matrix hits √(negative) = NaN
  and fails, so PosDef (not mere symmetry) is necessary. (Documented
  subtlety: the singular PSD K still reconstructs, with a zero pivot; the
  zero pivot breaks only the solve, which is the dichotomy the keystone
  isolates. That is why the reconstruction negative control uses a
  genuinely indefinite matrix.)
* Inverse form: columns built by solveRidgeSpec K γ eⱼ assemble into
  (K + γ·I)⁻¹, and (K + γ·I)·(K + γ·I)⁻¹ = I. Negative control: with γ = 0
  the solve on the singular K diverges (NaN).

Docs: blueprint Ch4 Factorizations chapter gains a capstone paragraph and a
closed-loop note in "What remains"; the two example module docstrings updated.

sorry/admit/omega-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |  6 +-
 NN/Examples/Factorization/RidgeSolve.lean     | 68 +++++++++++++++++++
 .../Tensor/Basic/FactorizationsSolve.lean     | 60 ++++++++++++++++
 .../Ch4_Verification/Factorizations.lean      | 26 ++++++-
 4 files changed, 158 insertions(+), 2 deletions(-)
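
The inverse-form check described in the commit message (solve against each
basis vector eⱼ, assemble the columns, test M·M⁻¹ = I) can be sketched in
core Lean 4 without Mathlib. This is an illustrative sketch under stated
assumptions, not the code in this patch: `dot`, `upd`, `fwdSolve`,
`backSolve`, `cholSolve`, `Lfac`, `Mmat`, `invCol` and `invErr` are
hypothetical names, `upd` stands in for `Function.update`,
`Fin.foldl`/`Fin.foldr` replace the `List.finRange` folds, and the factor
`Lfac` is hand-picked rather than computed by `choleskyFn`.

```lean
-- Core Lean 4 only (no Mathlib); all names below are illustrative.

/-- Dot product of two vectors represented as functions on `Fin n`. -/
def dot {n : Nat} (u v : Fin n → Float) : Float :=
  Fin.foldl n (fun s k => s + u k * v k) 0.0

/-- Pointwise update: a stand-in for Mathlib's `Function.update`. -/
def upd {n : Nat} (f : Fin n → Float) (i : Fin n) (v : Float) : Fin n → Float :=
  fun j => if j = i then v else f j

/-- Forward substitution `L · y = b`, rows visited `0, …, n-1`. -/
def fwdSolve {n : Nat} (L : Fin n → Fin n → Float) (b : Fin n → Float) :
    Fin n → Float :=
  Fin.foldl n (fun acc i => upd acc i ((b i - dot (L i) acc) / L i i))
    (fun _ => 0.0)

/-- Back substitution `U · x = y`, rows visited `n-1, …, 0`. -/
def backSolve {n : Nat} (U : Fin n → Fin n → Float) (y : Fin n → Float) :
    Fin n → Float :=
  Fin.foldr n (fun i acc => upd acc i ((y i - dot (U i) acc) / U i i))
    (fun _ => 0.0)

/-- Solve `(L · Lᵀ) · x = b`: forward-solve with `L`, back-solve with `Lᵀ`. -/
def cholSolve {n : Nat} (L : Fin n → Fin n → Float) (b : Fin n → Float) :
    Fin n → Float :=
  backSolve (fun i k => L k i) (fwdSolve L b)

/-- Hand-picked lower-triangular factor with positive diagonal. -/
def Lfac : Fin 2 → Fin 2 → Float := fun i j =>
  match i.val, j.val with
  | 0, 0 => 2.0
  | 1, 0 => 1.0
  | 1, 1 => 3.0
  | _, _ => 0.0

/-- `M = L · Lᵀ`, entrywise the dot product of rows `i` and `j` of `L`. -/
def Mmat : Fin 2 → Fin 2 → Float := fun i j => dot (Lfac i) (Lfac j)

/-- Column `j` of `M⁻¹`: the solve against the basis vector `eⱼ`. -/
def invCol (j : Fin 2) : Fin 2 → Float :=
  cholSolve Lfac (fun i => if i = j then 1.0 else 0.0)

/-- Squared Frobenius error of `M · M⁻¹ - I`. -/
def invErr : Float :=
  Fin.foldl 2 (fun s i =>
    Fin.foldl 2 (fun t j =>
      let e := dot (Mmat i) (invCol j) - (if i = j then 1.0 else 0.0)
      t + e * e) s) 0.0

#eval invErr
```

A value of `invErr` near zero indicates that the assembled columns invert
`Mmat`. Setting a diagonal entry of `Lfac` to `0.0` makes the substitutions
divide by zero, so the result should typically degrade to `inf`/`NaN`,
which mirrors the γ = 0 negative control.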

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 2e4d60d..79dc2e1 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -46,7 +46,11 @@ factorization misbehaves.
   `K = G·Gᵀ` and `γ > 0`, `solveRidgeFn` reconstructs `b` to machine precision; **negative control**:
   with `γ = 0` the singular `K` has a zero Cholesky pivot and the solve diverges (`NaN`), so
   regularization is necessary. Also exhibits the **keystone** `choleskyFn_diag_pos_of_posDef`: the SPD
-  `K + γ·I` has all-positive Cholesky pivots, while the singular `K` has a zero pivot (PosDef needed).
+  `K + γ·I` has all-positive Cholesky pivots, while the singular `K` has a zero pivot (PosDef needed);
+  and the two **capstones** — `cholesky_posDef` (the SPD Cholesky reconstructs `L·Lᵀ = K + γ·I`
+  exactly, while an *indefinite* matrix fails with a `NaN` pivot) and `solveRidgeFn_eq_inv_mulVec` (the
+  solve *is* the regularized inverse: its columns assemble into `(K + γ·I)⁻¹` with
+  `(K + γ·I)·(K + γ·I)⁻¹ = I`).
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/RidgeSolve.lean b/NN/Examples/Factorization/RidgeSolve.lean
index 6fd9ab2..2eeb5ff 100644
--- a/NN/Examples/Factorization/RidgeSolve.lean
+++ b/NN/Examples/Factorization/RidgeSolve.lean
@@ -33,6 +33,16 @@ setting CHD targets — so it is *not* invertible on its own. The checks exhibit
 * **Negative — regularization is necessary.** With `γ = 0` the singular `K` has a zero Cholesky pivot:
   forward/back substitution divides by zero and the residual blows up (`NaN`/large). This is why CHD
   regularizes; it is also exactly the `γ > 0` hypothesis of `posDef_addScaledIdFn`.
+
+It then exercises the two capstone theorems that close the solve story:
+
+* `cholesky_posDef` — for the SPD `K + γ·I` the executable Cholesky reconstructs *exactly*
+  (`L · Lᵀ = K + γ·I`); an *indefinite* matrix instead gets a `√(negative) = NaN` pivot and fails, so
+  positive-definiteness is what the capstone needs.
+* `solveRidgeFn_eq_inv_mulVec` — `solveRidgeFn K γ b = (K + γ·I)⁻¹ b`, the closed form CHD
+  `solve_variationnal` specifies. Solving against each basis vector builds the columns of the inverse,
+  and the assembled matrix satisfies `(K + γ·I) · (K + γ·I)⁻¹ = I` — no inverse is ever formed by the
+  algorithm; every column is a verified Cholesky solve.
 -/
 
 @[expose] public section
@@ -126,4 +136,62 @@ def Kγ : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := addGammaI K γ
 #eval assertGe "singular K has a non-positive Cholesky pivot (PosDef necessary)"
   (numNonPosPivots K) 0.5
 
+/-! ## Capstone: the SPD Cholesky reconstructs exactly
+
+`Spec.Factorization.Reconstruction.cholesky_posDef` bundles the keystone with the reconstruction
+theorem: for the *positive-definite* `K + γ·I`, the executable Cholesky factor is a genuine factor —
+`L · Lᵀ = K + γ·I` exactly — with no pivot or symmetry hypothesis. The negative control is an
+*indefinite* symmetric matrix: there a radicand goes negative, the pivot is `√(negative) = NaN`, and
+reconstruction fails — so positive-definiteness (not mere symmetry) is what the capstone needs. (Note
+the singular `K` itself, being PSD, *does* reconstruct with a zero pivot; the zero pivot breaks only
+the *solve*, which is the dichotomy the keystone above isolates.) -/
+
+/-- An indefinite symmetric matrix (top-left block has eigenvalues `3, −1`): not PosDef, so its
+Cholesky hits `√(negative)`. -/
+def Aindef : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[1, 2, 0],
+         [2, 1, 0],
+         [0, 0, 1]]
+
+-- Positive — the SPD `K + γ·I` Cholesky reconstructs exactly: `L · Lᵀ = K + γ·I` (`cholesky_posDef`).
+#eval assertLt "SPD Cholesky reconstructs: L·Lᵀ = K + γ·I (capstone)"
+  (frobSqErr (let L := Spec.choleskySpec Kγ; mm L (tr L)) Kγ)
+
+-- Negative — an indefinite matrix gets a `√(negative) = NaN` pivot, so it does not reconstruct.
+#eval assertReconFails "indefinite matrix Cholesky does not reconstruct (PosDef necessary)"
+  (frobSqErr (let L := Spec.choleskySpec Aindef; mm L (tr L)) Aindef)
+
+/-! ## Closing the loop: the ridge solve *is* the regularized inverse
+
+`Spec.Factorization.Reconstruction.solveRidgeFn_eq_inv_mulVec` proves `solveRidgeFn K γ b
+= (K + γ·I)⁻¹ b` — the closed form CHD `solve_variationnal` specifies. Solving against each standard
+basis vector `eⱼ` therefore produces column `j` of `(K + γ·I)⁻¹`; assembling the columns gives a
+genuine inverse, witnessed by `(K + γ·I) · (K + γ·I)⁻¹ = I`. No matrix inverse is formed by the
+algorithm — every column comes from the verified Cholesky solve. -/
+
+/-- The `j`-th standard basis vector. -/
+def unitVec {k : Nat} (j : Fin k) : Spec.Tensor Float (.dim k .scalar) :=
+  Spec.ofVecFn (fun i => if i = j then 1.0 else 0.0)
+
+/-- The `k × k` identity matrix. -/
+def idMat {k : Nat} : Spec.Tensor Float (.dim k (.dim k .scalar)) :=
+  Spec.ofMatFn (fun i j => if i = j then 1.0 else 0.0)
+
+/-- The regularized inverse `(K + γ·I)⁻¹`, built column-by-column by the verified ridge solve: column
+`j` is `solveRidgeSpec K γ eⱼ` (an instance of `solveRidgeFn_eq_inv_mulVec`). -/
+def ridgeInv {k : Nat} (K : Spec.Tensor Float (.dim k (.dim k .scalar))) (γ : Float) :
+    Spec.Tensor Float (.dim k (.dim k .scalar)) :=
+  Spec.ofMatFn (fun i j => Spec.Tensor.toScalar (Spec.get (Spec.solveRidgeSpec K γ (unitVec j)) i))
+
+#eval IO.println s!"(K+γI)⁻¹ diagonal (assembled from ridge solves): \
+  {vecToList (diagOf (ridgeInv K γ))}"
+
+-- Positive — the assembled inverse really inverts: `(K + γ·I) · (K + γ·I)⁻¹ = I`.
+#eval assertLt "ridge solve builds the regularized inverse: (K+γI)·(K+γI)⁻¹ = I"
+  (frobSqErr (mm Kγ (ridgeInv K γ)) idMat)
+
+-- Negative — with `γ = 0` the singular `K` has no inverse: the column solves diverge (NaN).
+#eval assertReconFails "unregularized singular K has no inverse (γ = 0 → solve diverges)"
+  (frobSqErr (mm K (ridgeInv K 0.0)) idMat)
+
 end NN.Examples.Factorization.RidgeSolve
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsSolve.lean b/NN/Proofs/Tensor/Basic/FactorizationsSolve.lean
index b5edb94..86fa1d9 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsSolve.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsSolve.lean
@@ -611,6 +611,28 @@ theorem choleskyFn_diag_pos_of_posDef (A : Fin n → Fin n → ℝ) (hpd : (Matr
     rw [hqf_eq_rad] at hpos
     exact hpos
 
+/-! ## Capstone: the executable Cholesky *is* the factorization of any SPD matrix
+
+Combining the keystone with the reconstruction theorem proved in `FactorizationsReconstruction`, the
+executable `choleskyFn` is — with *no* hypothesis beyond positive-definiteness — a genuine Cholesky
+factor of any SPD matrix: lower-triangular, with `A = L · Lᵀ`, and strictly positive diagonal. This is
+the unconditional statement "`choleskyFn` computes the Cholesky factorization of an SPD matrix". -/
+
+/-- **The executable Cholesky factorization of an SPD matrix.** For a positive-definite `A`,
+`choleskyFn A` is a genuine Cholesky factor of `A` (lower-triangular, `A = L · Lᵀ`) with strictly
+positive diagonal — no pivot, symmetry, or success hypothesis. The positivity of the pivots is the
+keystone `choleskyFn_diag_pos_of_posDef`; the factorization identity is `isCholesky_of_pos` fed by it. -/
+theorem cholesky_posDef (A : Fin n → Fin n → ℝ) (hpd : (Matrix.of A).PosDef) :
+    Spec.Factorization.IsCholesky (Matrix.of A) (Matrix.of (Spec.choleskyFn A))
+      ∧ ∀ j, 0 < Spec.choleskyFn A j j := by
+  have hsymm : ∀ i j, A i j = A j i := by
+    intro i j
+    have h := hpd.1.apply i j
+    simp only [Matrix.of_apply, star_trivial] at h
+    exact h.symm
+  have hpos : ∀ j, 0 < Spec.choleskyFn A j j := fun j => choleskyFn_diag_pos_of_posDef A hpd j
+  exact ⟨isCholesky_of_pos A hsymm hpos, hpos⟩
+
 /-! ## The kernel-ridge solve, unconditional for SPD inputs -/
 
 /-- **Kernel-ridge solve, unconditional for an SPD regularized system.** For a positive-semidefinite
@@ -641,3 +663,41 @@ theorem solveRidgeSpec_mulVec_of_posSemidef (K : Spec.Tensor ℝ (.dim n (.dim n
     funext i; rfl
   rw [hround]
   exact solveRidgeFn_mulVec_of_posSemidef (Spec.toMatFn K) γ (Spec.toVecFn b) hK hγ
+
+/-! ## Closing the loop: the ridge solve *is* the regularized inverse
+
+CHD `solve_variationnal` is specified as `x = (K + γ·I)⁻¹ b`. The solve theorems above prove
+`(K + γ·I)·x = b`; positive-definiteness makes `K + γ·I` invertible, so that equation pins `x` down
+*uniquely* — and identifies the computed `solveRidgeFn` with the closed form `(K + γ·I)⁻¹ b`. This is
+the exact statement CHD consumes, with no inverse ever formed by the algorithm itself. -/
+
+/-- **The ridge solve equals the regularized inverse applied to `b`.** For a positive-semidefinite
+kernel `K` and `γ > 0`, the computed `solveRidgeFn K γ b` is exactly `(K + γ·I)⁻¹ b` — the closed form
+CHD `solve_variationnal` specifies. Invertibility comes from `posDef_addScaledIdFn` (PosDef ⟹ unit),
+and the solve identity `solveRidgeFn_mulVec_of_posSemidef` then forces equality with the inverse. -/
+theorem solveRidgeFn_eq_inv_mulVec (K : Fin n → Fin n → ℝ) (γ : ℝ) (b : Fin n → ℝ)
+    (hK : (Matrix.of K).PosSemidef) (hγ : 0 < γ) :
+    Spec.solveRidgeFn K γ b = (Matrix.of (Spec.addScaledIdFn K γ))⁻¹ *ᵥ b := by
+  set M := Matrix.of (Spec.addScaledIdFn K γ) with hM
+  have hpd : M.PosDef := posDef_addScaledIdFn hK hγ
+  have hdet : IsUnit M.det := (Matrix.isUnit_iff_isUnit_det (A := M)).mp hpd.isUnit
+  have hsolve : M *ᵥ (Spec.solveRidgeFn K γ b) = b :=
+    solveRidgeFn_mulVec_of_posSemidef K γ b hK hγ
+  calc Spec.solveRidgeFn K γ b
+      = (M⁻¹ * M) *ᵥ (Spec.solveRidgeFn K γ b) := by
+        rw [Matrix.nonsing_inv_mul M hdet, Matrix.one_mulVec]
+    _ = M⁻¹ *ᵥ (M *ᵥ (Spec.solveRidgeFn K γ b)) := by rw [Matrix.mulVec_mulVec]
+    _ = M⁻¹ *ᵥ b := by rw [hsolve]
+
+/-- **Tensor-level: the ridge solve equals the regularized inverse.** For a tensor kernel `K` whose
+matrix view is positive-semidefinite and `γ > 0`, `solveRidgeSpec K γ b` is the regularized inverse
+`(K + γ·I)⁻¹` applied to `b`. -/
+theorem solveRidgeSpec_eq_inv_mulVec (K : Spec.Tensor ℝ (.dim n (.dim n .scalar))) (γ : ℝ)
+    (b : Spec.Tensor ℝ (.dim n .scalar)) (hK : (Matrix.of (Spec.toMatFn K)).PosSemidef) (hγ : 0 < γ) :
+    Spec.toVecFn (Spec.solveRidgeSpec K γ b)
+      = (Matrix.of (Spec.addScaledIdFn (Spec.toMatFn K) γ))⁻¹ *ᵥ Spec.toVecFn b := by
+  have hround : Spec.toVecFn (Spec.solveRidgeSpec K γ b)
+      = Spec.solveRidgeFn (Spec.toMatFn K) γ (Spec.toVecFn b) := by
+    funext i; rfl
+  rw [hround]
+  exact solveRidgeFn_eq_inv_mulVec (Spec.toMatFn K) γ (Spec.toVecFn b) hK hγ
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index a50fddd..d60d5a0 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -156,6 +156,27 @@ forces `zᵀ A z > 0`. The `RidgeSolve` example also exhibits the keystone direc
 all-positive pivots, while the singular `K` has a zero pivot — PosDef is necessary. Nothing here is an
 unproved axiom.
 
+Two capstones close the solve story. First, the keystone and the reconstruction theorem combine into
+`cholesky_posDef`: for *any* positive-definite `A`, the executable `choleskyFn` is — with no pivot,
+symmetry, or success hypothesis — a genuine Cholesky factor (`A = L · Lᵀ`, lower-triangular, strictly
+positive diagonal). This is the unconditional statement "`choleskyFn` computes the Cholesky
+factorization of an SPD matrix". The `RidgeSolve` example exhibits both directions: the SPD `K + γI`
+reconstructs to machine precision, while an *indefinite* matrix hits a `√(negative) = NaN` pivot and
+fails — positive-definiteness, not mere symmetry, is the hypothesis the capstone needs. (A singular
+PSD `K` still reconstructs, with a zero pivot; the zero pivot breaks only the *solve*, which is exactly
+the dichotomy the keystone isolates.)
+
+Second, `solveRidgeFn_eq_inv_mulVec` identifies the computed solve with the closed form CHD specifies:
+
+$$`\texttt{solveRidgeFn}\,K\,\gamma\,b \;=\; (K + \gamma I)^{-1} b.`
+
+The solve theorems prove `(K + γI)·x = b`; positive-definiteness makes `K + γI` invertible
+(`Matrix.PosDef.isUnit`), so that equation pins `x` down *uniquely* and forces equality with the
+inverse — closing the loop to `solve_variationnal`'s `(K + γI)⁻¹ b` *without the algorithm ever forming
+an inverse*. The `RidgeSolve` example makes this concrete: solving against each standard basis vector
+`eⱼ` produces column `j` of `(K + γI)⁻¹`, and the assembled matrix satisfies
+`(K + γI) · (K + γI)⁻¹ = I` to machine precision, every column coming from the verified Cholesky solve.
+
 # The a-posteriori residual certificate
 
 For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact
@@ -329,7 +350,10 @@ and the positive-pivot success condition is now discharged from that SPD fact by
 `choleskyFn_diag_pos_of_posDef` (the radicand `A[j,j] − Σ_{k<j} L[j,k]² > 0`, proved via the explicit
 Schur-complement quadratic-form witness). Composing them, `solveRidgeFn_mulVec_of_posSemidef` makes the
 verified `solve_variationnal` *unconditional* for any positive-semidefinite kernel `K` and `γ > 0`, with
-no pivot hypothesis remaining.
+no pivot hypothesis remaining. The loop to the CHD specification is closed by
+`solveRidgeFn_eq_inv_mulVec`, which upgrades the solve identity `(K + γI)·x = b` to the closed form
+`x = (K + γI)⁻¹ b` (uniqueness from invertibility), and by `cholesky_posDef`, which states
+unconditionally that the executable Cholesky *is* the factorization of any SPD matrix.
 
 Everything else is exact: the algebraic faithfulness of the decomposition (orthogonality, orthogonal
 similarity, the residual identity, the per-rotation decrease, the classical-strategy linear rate, and

From a79b972c09f2e09027c39d4ee5db8bd605608ba5 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 12:19:05 -0700
Subject: [PATCH 11/22] Identify the CHD eig-form routines: solve_variationnal,
 find_gamma, Z_test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Tier A's predicate foundation (IsSymEig, add_smul_inv, trace_eq, det_eq) already
existed; this adds the three concrete CHD routines built on it, mirroring
interpolatory.py's eigendecomposition route, with their exact algebra proved over
ℝ from the IsSymEig specification (no appeal to Jacobi convergence).

Spec (NN/Spec/Core/Tensor/Factorizations.lean): executable eig-form mirrors —
projFn (Pga = Vᵀga), ridgeCoeffFn (rᵢ = γ/(λᵢ+γ)), variationalSolveFn, varNoiseFn,
plus tensor wrappers variationalSolveSpec / varNoiseSpec.

Proofs (new NN/Proofs/Tensor/Basic/FactorizationsVariational.lean, sorry/omega-free):
- variationalSolveFn_eq_neg_inv_mulVec: the eig-form solve_variationnal IS -(K+γI)⁻¹·ga
  (from add_smul_inv).
- variationalSolveFn_eq_neg_solveRidgeFn: eig route = Cholesky route — two independent
  implementations of solve_variationnal agree on the one closed form.
- IsSymEig.eigenvalues_nonneg: PSD ⟹ λ ≥ 0 (via VᵀAV PSD-congruence), discharging
  λᵢ+γ ≠ 0 from γ > 0.
- varNoiseFn_eq_ratio: the noise / find_gamma loss / Z_test statistic is the spectral
  ratio (Σᵢ (Pgaᵢ·rᵢ)²) / (Σᵢ Pgaᵢ²·rᵢ).
- ridgeCoeffFn_pos/le_one, varNoiseFn_nonneg/le_one: for a PSD spectrum and γ > 0 the
  noise is a genuine fraction in [0,1].
- projFn_mulVec_self, varNoiseFn_projFn_mulVec: Z_test spectral invariance — feeding
  ga = V·z drops V, so the statistic depends on the kernel only through its spectrum.

Examples (new NN/Examples/Factorization/Variational.lean): 8 green #eval checks on an
SPD kernel — (K+γI)·yb = -ga and yb = -solveRidgeSpec to machine precision, noise ∈
[0,1], spectral invariance noise(V·z)=noise(z); negative controls: wrong eigenvectors
break the solve (residual 3.72), γ = -0.7 pushes noise to -7.19.

Blueprint: new "CHD routines" section + updated "What remains". Scope honesty: only the
deterministic algebra is proved; Z_test's Gaussian sampling/percentiles are statistical,
not algebraic, and are exercised numerically rather than proved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |   8 +
 NN/Examples/Factorization/Variational.lean    | 162 ++++++++++++++
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Basic/FactorizationsVariational.lean      | 200 ++++++++++++++++++
 NN/Spec/Core/Tensor/Factorizations.lean       |  54 +++++
 .../Ch4_Verification/Factorizations.lean      |  57 ++++-
 6 files changed, 481 insertions(+), 1 deletion(-)
 create mode 100644 NN/Examples/Factorization/Variational.lean
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsVariational.lean
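
The spectral quantities named in the commit message, the shrinkage
coefficients rᵢ = γ/(λᵢ+γ) and the ratio (Σᵢ (Pgaᵢ·rᵢ)²) / (Σᵢ Pgaᵢ²·rᵢ),
can be sketched in core Lean 4 as follows. This is a minimal sketch under
stated assumptions, not this patch's `ridgeCoeffFn`/`varNoiseFn`:
`ridgeCoeff` and `noiseRatio` are hypothetical names, the inputs are plain
lists, the eigenvalues are rounded approximations, and the projected data
is made up for illustration rather than computed as Vᵀ·ga.

```lean
-- Core Lean 4 only (no Mathlib); all names below are illustrative.

/-- Shrinkage coefficient `r = γ / (lam + γ)` for one eigenvalue `lam`. -/
def ridgeCoeff (γ lam : Float) : Float := γ / (lam + γ)

/-- Spectral ratio `(Σ (pᵢ·rᵢ)²) / (Σ pᵢ²·rᵢ)` for eigenvalues `lams` and
projected data `pga` (the two lists are assumed to have equal length). -/
def noiseRatio (γ : Float) (lams pga : List Float) : Float :=
  -- Pair each projected entry `p` with its shrinkage coefficient `r`.
  let pairs : List (Float × Float) :=
    (lams.zip pga).map (fun lp => (lp.2, ridgeCoeff γ lp.1))
  let num : Float := pairs.foldl (fun s pr => s + (pr.1 * pr.2) * (pr.1 * pr.2)) 0.0
  let den : Float := pairs.foldl (fun s pr => s + pr.1 * pr.1 * pr.2) 0.0
  num / den

-- γ > 0 with a nonnegative spectrum: the ratio is expected to lie in [0, 1].
#eval noiseRatio 0.5 [0.5858, 2.0, 3.4142] [1.0, 2.0, 3.0]

-- γ < 0: the coefficients can leave (0, 1], so the bound is not guaranteed.
#eval noiseRatio (-0.7) [0.5858, 2.0, 3.4142] [1.0, 2.0, 3.0]
```

For a nonnegative spectrum and γ > 0 each rᵢ lies in (0, 1], hence
rᵢ² ≤ rᵢ termwise and the numerator cannot exceed the denominator; with
γ < 0 that argument no longer applies.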

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 79dc2e1..3d06494 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -14,6 +14,7 @@ public import NN.Examples.Factorization.SVD
 public import NN.Examples.Factorization.JacobiDecrease
 public import NN.Examples.Factorization.JacobiRate
 public import NN.Examples.Factorization.RidgeSolve
+public import NN.Examples.Factorization.Variational
 
 /-!
 # Matrix factorization examples
@@ -51,6 +52,13 @@ factorization misbehaves.
   exactly, while an *indefinite* matrix fails with a `NaN` pivot) and `solveRidgeFn_eq_inv_mulVec` (the
   solve *is* the regularized inverse: its columns assemble into `(K + γ·I)⁻¹` with
   `(K + γ·I)·(K + γ·I)⁻¹ = I`).
+- `Variational` — the *eigendecomposition* form of CHD `perform_regression_and_find_gamma`
+  (`interpolatory.py`): from `eigh(K)`, the variational solve `yb = -(K + γ·I)⁻¹·ga`, the agreement of
+  the eig and Cholesky routes (`variationalSolveFn_eq_neg_solveRidgeFn`), the
+  `noise`/`find_gamma`-loss/`Z_test` statistic as a spectral ratio bounded in `[0,1]`
+  (`varNoiseFn_nonneg`, `varNoiseFn_le_one`), and `Z_test` spectral invariance
+  (`varNoiseFn_projFn_mulVec`); **negative controls**: wrong eigenvectors break the solve, and `γ < 0`
+  pushes the noise outside `[0,1]`.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Variational.lean b/NN/Examples/Factorization/Variational.lean
new file mode 100644
index 0000000..83f6e44
--- /dev/null
+++ b/NN/Examples/Factorization/Variational.lean
@@ -0,0 +1,162 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: the CHD variational solve, noise, and `Z_test` statistic (eigendecomposition form)
+
+These checks corroborate `NN.Proofs.Tensor.Basic.FactorizationsVariational`, the eigendecomposition
+route CHD's `perform_regression_and_find_gamma` actually takes (`interpolatory.py`). From `eigh(K)` it
+forms the projected data `Pga = Vᵀ·ga` and shrinkage coefficients `rᵢ = γ/(λᵢ+γ)`, then runs three
+routines off that shared core. We exercise each:
+
+* **The variational solve is the regularized inverse.** `variationalSolveSpec` returns
+  `yb = -(K+γ·I)⁻¹·ga`, so `(K+γ·I)·yb = -ga` to machine precision (`variationalSolveFn_eq_neg_inv_mulVec`).
+* **Eig route = Cholesky route.** The same `yb` equals `-solveRidgeSpec K γ ga` — the verified Cholesky
+  solve from `FactorizationsSolve` — to machine precision (`variationalSolveFn_eq_neg_solveRidgeFn`):
+  two independent implementations, one closed form.
+* **The noise is a fraction.** `varNoiseSpec` (the `noise`, the `find_gamma` loss, the `Z_test`
+  statistic) lies in `[0,1]` (`varNoiseFn_nonneg`, `varNoiseFn_le_one`).
+* **`Z_test` spectral invariance.** Feeding `ga = V·z` makes `V` drop out: the noise of `V·z` under `V`
+  equals the noise of `z` under the identity (`varNoiseFn_projFn_mulVec`) — the statistic depends on
+  the kernel only through its eigenvalues.
+
+Negative controls give the metrics teeth:
+
+* feeding the **wrong** eigenvectors (the identity instead of the true `V`) breaks the solve — the
+  residual `(K+γ·I)·yb + ga` is large, so the *actual* eigendecomposition is needed;
+* with **`γ < 0`** the shrinkage coefficients leave `(0,1]` and the noise falls outside `[0,1]`, so
+  `γ > 0` is necessary for the bound.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.Variational
+
+/-- Build a length-`n` `Float` vector from a list (missing entries `0`). -/
+def mkVec {n : Nat} (xs : List Float) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i => xs.getD i.val 0.0)
+
+/-- The regularized matrix `K + γ·I` as a tensor. -/
+def addGammaI {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float) :
+    Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  Spec.ofMatFn (fun i j => Spec.get2 K i j + (if i.val == j.val then γ else 0.0))
+
+/-- Matrix–vector product `M · v`. -/
+def mv {n : Nat} (M : Spec.Tensor Float (.dim n (.dim n .scalar)))
+    (v : Spec.Tensor Float (.dim n .scalar)) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.matVecMulSpec M v
+
+/-- Entrywise negation of a vector. -/
+def negVec {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i => 0.0 - Spec.Tensor.toScalar (Spec.get v i))
+
+/-- `ℓ¹` magnitude `Σᵢ |vᵢ|` (a sum, so a `NaN` entry propagates instead of being dropped). -/
+def vecAbsErr {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : Float :=
+  (List.finRange n).foldl (fun a i => a + Float.abs (Spec.Tensor.toScalar (Spec.get v i))) 0.0
+
+/-- `ℓ¹` distance `Σᵢ |uᵢ − vᵢ|` between two vectors. -/
+def vecDist {n : Nat} (u v : Spec.Tensor Float (.dim n .scalar)) : Float :=
+  (List.finRange n).foldl
+    (fun a i => a + Float.abs (Spec.Tensor.toScalar (Spec.get u i)
+      - Spec.Tensor.toScalar (Spec.get v i))) 0.0
+
+/-- The `k × k` identity matrix. -/
+def idMat {k : Nat} : Spec.Tensor Float (.dim k (.dim k .scalar)) :=
+  Spec.ofMatFn (fun i j => if i = j then 1.0 else 0.0)
+
+/-- A symmetric positive-definite kernel (eigenvalues ≈ {0.5858, 2, 3.4142}). -/
+def K : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[2, 1, 0],
+         [1, 2, 1],
+         [0, 1, 2]]
+
+def γ : Float := 0.5
+def ga : Spec.Tensor Float (.dim 3 .scalar) := mkVec [1, 2, 3]
+
+/-- Eigendecomposition `K = V·diag(λ)·Vᵀ` via cyclic Jacobi (12 sweeps). -/
+def eig : Spec.Tensor Float (.dim 3 .scalar) × Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  Spec.symEigJacobiSpec K 12
+def evals : Spec.Tensor Float (.dim 3 .scalar) := eig.1
+def V : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := eig.2
+
+/-- The variational solution `yb = -(K + γ·I)⁻¹·ga` (eigendecomposition form). -/
+def yb : Spec.Tensor Float (.dim 3 .scalar) := Spec.variationalSolveSpec evals V γ ga
+
+#eval IO.println s!"eigenvalues λ = {vecToList evals}; γ = {γ}; ga = {vecToList ga}"
+#eval IO.println s!"variational solution yb = {vecToList yb}"
+#eval IO.println s!"(K+γI)·yb + ga = {vecToList (Spec.ofVecFn (fun i =>
+  Spec.Tensor.toScalar (Spec.get (mv (addGammaI K γ) yb) i)
+    + Spec.Tensor.toScalar (Spec.get ga i)))}"
+
+/-! ## The variational solve is the regularized-inverse solve -/
+
+-- Positive — `yb = -(K+γI)⁻¹·ga`, so `(K+γI)·yb = -ga`, i.e. `(K+γI)·yb + ga ≈ 0`.
+#eval assertLt "variational solve: (K+γI)·yb = -ga to machine precision"
+  (vecAbsErr (Spec.ofVecFn (fun i =>
+    Spec.Tensor.toScalar (Spec.get (mv (addGammaI K γ) yb) i)
+      + Spec.Tensor.toScalar (Spec.get ga i))))
+
+-- Positive — eig route = Cholesky route: `yb = -solveRidgeSpec K γ ga` to machine precision.
+#eval assertLt "eig-form solve = -(Cholesky ridge solve) (two implementations agree)"
+  (vecDist yb (negVec (Spec.solveRidgeSpec K γ ga)))
+
+/-! ## The noise level is a fraction in `[0,1]` -/
+
+/-- The CHD `noise` / `find_gamma` loss / `Z_test` statistic at this `(K, γ, ga)`. -/
+def noise : Float := Spec.varNoiseSpec evals V γ ga
+
+#eval IO.println s!"noise level = {noise}"
+
+-- Positive — `noise ≤ 1` (err = noise − 1 < tol ⟺ noise < 1 + tol).
+#eval assertLt "noise ≤ 1 (find_gamma loss is a fraction)" (noise - 1.0)
+-- Positive — `0 ≤ noise` (err = −noise < tol ⟺ noise > −tol).
+#eval assertLt "0 ≤ noise" (0.0 - noise)
+
+/-! ## `Z_test` spectral invariance: feeding `ga = V·z` drops `V` -/
+
+def z : Spec.Tensor Float (.dim 3 .scalar) := mkVec [0.7, -1.3, 2.1]
+/-- Data expressed in eigencoordinates: `ga = V·z`. -/
+def gaVz : Spec.Tensor Float (.dim 3 .scalar) := mv V z
+
+#eval IO.println s!"noise(V·z under V) = {Spec.varNoiseSpec evals V γ gaVz}; \
+  noise(z under I) = {Spec.varNoiseSpec evals idMat γ z}"
+
+-- Positive — `noise` of `V·z` under `V` equals `noise` of `z` under the identity (spectral only).
+#eval assertApproxEq "Z_test statistic depends only on the spectrum (ga = V·z ⟹ V drops out)"
+  (Spec.varNoiseSpec evals V γ gaVz) (Spec.varNoiseSpec evals idMat γ z)
+
+/-! ## Negative controls -/
+
+/-- The solve fed the **wrong** eigenvectors (identity instead of the true `V`). -/
+def ybWrong : Spec.Tensor Float (.dim 3 .scalar) := Spec.variationalSolveSpec evals idMat γ ga
+
+#eval IO.println s!"wrong-V residual (K+γI)·ybWrong + ga = {vecToList (Spec.ofVecFn (fun i =>
+  Spec.Tensor.toScalar (Spec.get (mv (addGammaI K γ) ybWrong) i)
+    + Spec.Tensor.toScalar (Spec.get ga i)))}"
+
+-- Negative — with the wrong eigenvectors the solve no longer inverts: the residual is large.
+#eval assertGe "wrong eigenvectors break the solve (true eigendecomposition needed)"
+  (vecAbsErr (Spec.ofVecFn (fun i =>
+    Spec.Tensor.toScalar (Spec.get (mv (addGammaI K γ) ybWrong) i)
+      + Spec.Tensor.toScalar (Spec.get ga i)))) 0.5
+
+/-- The noise level computed with `γ < 0` (here `γ = -0.7 < -λ₁` for the smallest eigenvalue
+`λ₁ ≈ 0.586`, so the shrinkage coefficient `r₁ = γ/(λ₁+γ) > 1` leaves `(0,1]`). -/
+def noiseNeg : Float := Spec.varNoiseSpec evals V (-0.7) ga
+
+#eval IO.println s!"noise with γ = -0.7 (outside [0,1]) = {noiseNeg}"
+
+-- Negative — with `γ < 0` the noise falls outside `[0,1]`, so `γ > 0` is necessary for the bound.
+#eval assertGe "γ < 0 pushes noise outside [0,1] (γ > 0 necessary)"
+  (max (0.0 - noiseNeg) (noiseNeg - 1.0)) 0.01
+
+end NN.Examples.Factorization.Variational
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index efaa964..aecf544 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -12,6 +12,7 @@ public import NN.Proofs.Tensor.Basic.LinearAlgebra
 public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
 public import NN.Proofs.Tensor.Basic.FactorizationsSolve
+public import NN.Proofs.Tensor.Basic.FactorizationsVariational
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobiDecrease
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsVariational.lean b/NN/Proofs/Tensor/Basic/FactorizationsVariational.lean
new file mode 100644
index 0000000..61facf0
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsVariational.lean
@@ -0,0 +1,200 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.Factorizations
+public import NN.Proofs.Tensor.Basic.FactorizationsSolve
+
+/-!
+# CHD `solve_variationnal`, `find_gamma`, and `Z_test` (eigendecomposition form)
+
+[`Factorizations`](./Factorizations.lean) proved the *predicate-level* spectral facts CHD consumes —
+the regularized inverse `(K + γ·I)⁻¹ = V·diag(1/(λ+γ))·Vᵀ` (`IsSymEig.add_smul_inv`), the trace/det
+sums, and the SVD ⟹ Gram-eigendecomposition bridge. This file closes the gap up to the three concrete
+CHD routines built on those facts (`interpolatory.py`): `solve_variationnal`, `find_gamma`, `Z_test`.
+
+All three are computed from `eigh(K)` and share one arithmetic core: the projected data
+`Pga = Vᵀ·ga` and the shrinkage coefficients `rᵢ = γ/(λᵢ + γ)`. The executable definitions
+(`Spec.variationalSolveFn`, `Spec.varNoiseFn`, …) mirror `interpolatory.py` verbatim. The theorems
+here identify what they compute:
+
+* **`variationalSolveFn_eq_neg_inv_mulVec`** — the eigendecomposition-form variational solution
+  `yb = -V·(Pga/(λ+γ))` *is* the regularized-inverse solve `-(K + γ·I)⁻¹·ga` (from `add_smul_inv`).
+* **`variationalSolveFn_eq_neg_solveRidgeFn`** — hence the eig route and the *Cholesky* route
+  (`solveRidgeFn`, verified in `FactorizationsSolve`) compute the **same** `solve_variationnal`, up to
+  CHD's sign convention. Two independent implementations, one closed form.
+* **`varNoiseFn_eq_ratio`** — the `noise` level (= the `find_gamma` loss = the `Z_test` per-sample
+  statistic) is the spectral ratio `Σ (Pgaᵢ·rᵢ)² / Σ Pgaᵢ²·rᵢ`.
+* **`varNoiseFn_nonneg` / `varNoiseFn_le_one`** — for a PSD spectrum (`λᵢ ≥ 0`) and `γ > 0` the noise
+  lies in `[0, 1]`, because each shrinkage coefficient does (`ridgeCoeffFn_pos`, `ridgeCoeffFn_le_one`).
+  This is the meaningful invariant: the CHD noise level is a genuine fraction.
+* **`projFn_mulVec_self` / `varNoiseFn_projFn_mulVec`** — feeding data `ga = V·z` makes `V` drop out of
+  the statistic, so it depends on the kernel only through its spectrum. This is the deterministic
+  content of "the `Z_test` null distribution depends only on the eigenvalues" (the *distributional*
+  step — Gaussian sampling and percentiles — is out of scope here, exercised numerically instead).
+
+`IsSymEig.eigenvalues_nonneg` supplies the `λᵢ ≥ 0` hypothesis from a positive-semidefinite kernel.
+
+Scope: everything here is exact over `ℝ`, proved from the *specification* `IsSymEig` (so it holds
+for any eigenpairs satisfying `IsSymEig`), not from the asymptotic Jacobi convergence.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open Matrix
+open scoped BigOperators
+open Spec.Factorization.Reconstruction
+
+variable {n : Nat}
+
+/-! ## Bridge: the projection is `Vᵀ·ga` -/
+
+/-- `Spec.projFn V ga = Vᵀ *ᵥ ga`: the executable projection is multiplication by `Vᵀ`. -/
+theorem projFn_eq_mulVec (V : Matrix (Fin n) (Fin n) ℝ) (ga : Fin n → ℝ) :
+    Spec.projFn V ga = Vᵀ *ᵥ ga := by
+  funext i
+  rw [Spec.projFn, dotFn_eq_sum]
+  show ∑ k, V k i * ga k = ∑ k, Vᵀ i k * ga k
+  exact Finset.sum_congr rfl (fun k _ => by rw [Matrix.transpose_apply])
+
+/-- Feeding `ga = V·z` recovers `z`: `projFn V (V *ᵥ z) = z` when `Vᵀ·V = 1`. The change of variables
+that makes the `Z_test` statistic depend on the kernel only through its spectrum. -/
+theorem projFn_mulVec_self {V : Matrix (Fin n) (Fin n) ℝ} (hV : Vᵀ * V = 1) (z : Fin n → ℝ) :
+    Spec.projFn V (V *ᵥ z) = z := by
+  rw [projFn_eq_mulVec, Matrix.mulVec_mulVec, hV, Matrix.one_mulVec]
+
+/-! ## The variational solution is the regularized inverse -/
+
+/-- **The eigendecomposition-form `solve_variationnal` is the regularized-inverse solve.** Given an
+eigendecomposition `IsSymEig A Λ V` and `γ` avoiding every `-λᵢ`, the CHD solution
+`yb = -V·(Pga/(λ+γ))` equals `-(A + γ·I)⁻¹·ga`. Proved directly from `add_smul_inv`. -/
+theorem variationalSolveFn_eq_neg_inv_mulVec
+    {A V : Matrix (Fin n) (Fin n) ℝ} {Λ : Fin n → ℝ}
+    (h : IsSymEig A Λ V) (γ : ℝ) (hγ : ∀ i, Λ i + γ ≠ 0) (ga : Fin n → ℝ) :
+    Spec.variationalSolveFn Λ V γ ga
+      = -((A + γ • (1 : Matrix (Fin n) (Fin n) ℝ))⁻¹ *ᵥ ga) := by
+  rw [h.add_smul_inv γ hγ]
+  funext i
+  simp only [Spec.variationalSolveFn, Pi.neg_apply]
+  congr 1
+  rw [dotFn_eq_sum]
+  rw [show (V * Matrix.diagonal (fun j => (Λ j + γ)⁻¹) * Vᵀ) *ᵥ ga
+        = V *ᵥ (fun j => (Λ j + γ)⁻¹ * Spec.projFn V ga j) from by
+        rw [← Matrix.mulVec_mulVec, ← Matrix.mulVec_mulVec]
+        congr 1
+        funext j
+        rw [Matrix.mulVec_diagonal, ← projFn_eq_mulVec]]
+  show ∑ j, V i j * (Spec.projFn V ga j / (Λ j + γ))
+      = ∑ j, V i j * ((Λ j + γ)⁻¹ * Spec.projFn V ga j)
+  exact Finset.sum_congr rfl (fun j _ => by rw [div_eq_mul_inv]; ring)
+
+/-! ## PSD kernels have nonnegative eigenvalues -/
+
+/-- For a positive-semidefinite `A`, every eigenvalue in *any* `IsSymEig` decomposition is `≥ 0`. The
+`i`-th eigenvalue is the quadratic form `vᵢᵀ A vᵢ` of the `i`-th eigenvector, which PSD makes
+nonnegative. -/
+theorem IsSymEig.eigenvalues_nonneg {A V : Matrix (Fin n) (Fin n) ℝ} {Λ : Fin n → ℝ}
+    (h : IsSymEig A Λ V) (hA : A.PosSemidef) (i : Fin n) : 0 ≤ Λ i := by
+  obtain ⟨hV, hAeq⟩ := h
+  -- `Vᵀ A V = diag Λ` (orthogonal conjugation collapses to the diagonal)
+  have hconj : Vᵀ * A * V = Matrix.diagonal Λ := by
+    rw [hAeq,
+      show Vᵀ * (V * Matrix.diagonal Λ * Vᵀ) * V
+          = (Vᵀ * V) * Matrix.diagonal Λ * (Vᵀ * V) by simp [Matrix.mul_assoc],
+      hV, Matrix.one_mul, Matrix.mul_one]
+  -- over ℝ, `Vᴴ = Vᵀ`, so PSD-congruence `Vᵀ A V` is PSD, i.e. `diag Λ` is PSD
+  have hVH : (Vᴴ : Matrix (Fin n) (Fin n) ℝ) = Vᵀ := by
+    ext a b; simp [Matrix.conjTranspose_apply, Matrix.transpose_apply]
+  have hps : (Matrix.diagonal Λ).PosSemidef := by
+    have hcong := hA.conjTranspose_mul_mul_same V
+    rwa [hVH, hconj] at hcong
+  have hdiag := hps.diag_nonneg (i := i)
+  rwa [Matrix.diagonal_apply_eq] at hdiag
+
+/-- **The eig route and the Cholesky route agree.** For a PSD kernel `K` and `γ > 0`, the
+eigendecomposition-form `variationalSolveFn` equals `-solveRidgeFn` (the verified Cholesky solve of
+`FactorizationsSolve`): two independent implementations of CHD `solve_variationnal`, both equal to
+`-(K + γ·I)⁻¹·ga`. -/
+theorem variationalSolveFn_eq_neg_solveRidgeFn
+    {K : Fin n → Fin n → ℝ} {Λ : Fin n → ℝ} {V : Matrix (Fin n) (Fin n) ℝ}
+    (h : IsSymEig (Matrix.of K) Λ V) (hK : (Matrix.of K).PosSemidef) {γ : ℝ} (hγ : 0 < γ)
+    (ga : Fin n → ℝ) :
+    Spec.variationalSolveFn Λ V γ ga = -(Spec.solveRidgeFn K γ ga) := by
+  have hΛ : ∀ i, 0 ≤ Λ i := h.eigenvalues_nonneg hK
+  have hγne : ∀ i, Λ i + γ ≠ 0 := fun i => (by have := hΛ i; linarith : (0:ℝ) < Λ i + γ).ne'
+  rw [variationalSolveFn_eq_neg_inv_mulVec h γ hγne ga,
+    show Spec.solveRidgeFn K γ ga = (Matrix.of (Spec.addScaledIdFn K γ))⁻¹ *ᵥ ga from
+      solveRidgeFn_eq_inv_mulVec K γ ga hK hγ,
+    of_addScaledIdFn]
+
+/-! ## The noise / `find_gamma` loss / `Z_test` statistic -/
+
+/-- **The noise functional as a spectral ratio.** `varNoiseFn` (the CHD `noise`, the `find_gamma` loss,
+and the `Z_test` per-sample statistic) is `Σᵢ (Pgaᵢ·rᵢ)² / Σᵢ Pgaᵢ²·rᵢ`, with `rᵢ = γ/(λᵢ + γ)`. -/
+theorem varNoiseFn_eq_ratio (Λ : Fin n → ℝ) (γ : ℝ) (Pga : Fin n → ℝ) :
+    Spec.varNoiseFn Λ γ Pga
+      = (∑ i, (Pga i * (γ / (Λ i + γ))) ^ 2) / (∑ i, Pga i ^ 2 * (γ / (Λ i + γ))) := by
+  simp only [Spec.varNoiseFn, Spec.ridgeCoeffFn]
+  rw [dotFn_eq_sum, dotFn_eq_sum]
+  congr 1
+  · exact Finset.sum_congr rfl (fun i _ => by ring)
+  · exact Finset.sum_congr rfl (fun i _ => by ring)
+
+/-- A shrinkage coefficient is strictly positive for a PSD spectrum and `γ > 0`. -/
+theorem ridgeCoeffFn_pos {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ) (i : Fin n) :
+    0 < Spec.ridgeCoeffFn Λ γ i := by
+  rw [Spec.ridgeCoeffFn]; exact div_pos hγ (by have := hΛ i; linarith)
+
+/-- A shrinkage coefficient is at most `1` for a PSD spectrum and `γ > 0`. -/
+theorem ridgeCoeffFn_le_one {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ) (i : Fin n) :
+    Spec.ridgeCoeffFn Λ γ i ≤ 1 := by
+  rw [Spec.ridgeCoeffFn, div_le_one (by have := hΛ i; linarith)]
+  have := hΛ i; linarith
+
+/-- **The noise level is nonnegative** for a PSD spectrum and `γ > 0`. -/
+theorem varNoiseFn_nonneg {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (Pga : Fin n → ℝ) : 0 ≤ Spec.varNoiseFn Λ γ Pga := by
+  rw [varNoiseFn_eq_ratio]
+  apply div_nonneg
+  · exact Finset.sum_nonneg (fun i _ => sq_nonneg _)
+  · refine Finset.sum_nonneg (fun i _ => ?_)
+    have hd : (0:ℝ) < Λ i + γ := by have := hΛ i; linarith
+    exact mul_nonneg (sq_nonneg _) (div_nonneg hγ.le hd.le)
+
+/-- **The noise level is at most `1`** for a PSD spectrum and `γ > 0`: each squared shrinkage
+coefficient `rᵢ²` is dominated by `rᵢ` (since `0 ≤ rᵢ ≤ 1`), so the numerator is at most the
+denominator. The CHD `noise` is therefore a genuine fraction in `[0, 1]`. -/
+theorem varNoiseFn_le_one {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (Pga : Fin n → ℝ) : Spec.varNoiseFn Λ γ Pga ≤ 1 := by
+  rw [varNoiseFn_eq_ratio]
+  have hdenom_nonneg : 0 ≤ ∑ i, Pga i ^ 2 * (γ / (Λ i + γ)) :=
+    Finset.sum_nonneg (fun i _ => by
+      have hd : (0:ℝ) < Λ i + γ := by have := hΛ i; linarith
+      exact mul_nonneg (sq_nonneg _) (div_nonneg hγ.le hd.le))
+  have hle : (∑ i, (Pga i * (γ / (Λ i + γ))) ^ 2) ≤ ∑ i, Pga i ^ 2 * (γ / (Λ i + γ)) := by
+    refine Finset.sum_le_sum (fun i _ => ?_)
+    have hd : (0:ℝ) < Λ i + γ := by have := hΛ i; linarith
+    have hr0 : 0 ≤ γ / (Λ i + γ) := div_nonneg hγ.le hd.le
+    have hr1 : γ / (Λ i + γ) ≤ 1 := by rw [div_le_one hd]; have := hΛ i; linarith
+    rw [show (Pga i * (γ / (Λ i + γ))) ^ 2 = Pga i ^ 2 * (γ / (Λ i + γ)) ^ 2 by ring]
+    apply mul_le_mul_of_nonneg_left _ (sq_nonneg _)
+    nlinarith [mul_nonneg hr0 (sub_nonneg.mpr hr1)]
+  rcases hdenom_nonneg.lt_or_eq with hpos | h0
+  · rw [div_le_one hpos]; exact hle
+  · rw [← h0, div_zero]; exact zero_le_one
+
+/-- **`Z_test` spectral invariance.** Replacing the data by `ga = V·z` removes `V` from the statistic:
+`varNoiseFn Λ γ (projFn V (V·z)) = varNoiseFn Λ γ z`. So the functional that `Z_test` samples
+depends on the kernel only through its eigenvalues. -/
+theorem varNoiseFn_projFn_mulVec {V : Matrix (Fin n) (Fin n) ℝ} (hV : Vᵀ * V = 1)
+    (Λ : Fin n → ℝ) (γ : ℝ) (z : Fin n → ℝ) :
+    Spec.varNoiseFn Λ γ (Spec.projFn V (V *ᵥ z)) = Spec.varNoiseFn Λ γ z := by
+  rw [projFn_mulVec_self hV]
+
+end Spec.Factorization
diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
index 98c2b42..b06f667 100644
--- a/NN/Spec/Core/Tensor/Factorizations.lean
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -376,4 +376,58 @@ def svdSpec {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) (sweeps : Nat
     else 0
   (ofMatFn U, ofVecFn σ, ofMatFn (fun i j => arrGet Vf i.val j.val))
 
+/-! ## CHD variational solve, noise, and γ-selection (eigendecomposition form)
+
+CHD's `perform_regression_and_find_gamma` does not use the Cholesky route above; it works through the
+*eigendecomposition* `K = V · diag(λ) · Vᵀ` returned by `eigh(K)` (`symEigJacobiSpec`). Three routines
+share one arithmetic core — the *projected data* `Pga = Vᵀ · ga` and the *shrinkage coefficients*
+`rᵢ = γ/(λᵢ + γ)`:
+
+* `solve_variationnal` returns the solution `yb = -V·(Pga/(λ+γ))` (`= -(K+γ·I)⁻¹·ga`) and a scalar
+  `noise` level;
+* `find_gamma` minimises that *same* `noise` functional over `γ`;
+* `Z_test` evaluates the `noise` functional on random Gaussian data to obtain a null distribution.
+
+The definitions below mirror `interpolatory.py` verbatim, taking the eigenpairs `(Λ, V)` the solver
+returns — exactly as CHD passes `eigh(K)` into them. Their algebraic identities (the solve is the
+regularized inverse; the noise is a spectral quadratic-form ratio in `[0,1]`) are proved in
+[`NN.Proofs.Tensor.Basic.FactorizationsVariational`](../../../Proofs/Tensor/Basic/FactorizationsVariational.lean).
+-/
+
+/-- Projected data `Pga = Vᵀ · ga`: component `i` is `⟨vᵢ, ga⟩`, the coordinate of `ga` along
+eigenvector `i` (column `i` of `V`). Mirrors `np.dot(eigenvectors.T, ga)`. -/
+def projFn {n : Nat} (V : Fin n → Fin n → α) (ga : Fin n → α) : Fin n → α :=
+  fun i => dotFn (fun k => V k i) ga
+
+/-- Shrinkage coefficient `rᵢ = γ/(λᵢ + γ)`. For a PSD spectrum (`λᵢ ≥ 0`) and `γ > 0` this lies in
+`(0, 1]`. Mirrors `coeffs = gamma / (eigenvalues + gamma)`. -/
+def ridgeCoeffFn {n : Nat} (Λ : Fin n → α) (γ : α) : Fin n → α :=
+  fun i => γ / (Λ i + γ)
+
+/-- The CHD variational solution `yb = -V·(Pga / (λ + γ))` in eigendecomposition form. Equal to
+`-(K + γ·I)⁻¹·ga`. Mirrors `yb = -np.dot(eigenvectors, Pga / (eigenvalues + gamma))`. -/
+def variationalSolveFn {n : Nat} (Λ : Fin n → α) (V : Fin n → Fin n → α) (γ : α) (ga : Fin n → α) :
+    Fin n → α :=
+  let Pga := projFn V ga
+  fun i => -dotFn (V i) (fun j => Pga j / (Λ j + γ))
+
+/-- The CHD `noise` level (also the `find_gamma` loss and the `Z_test` per-sample statistic):
+`Σᵢ (Pgaᵢ·rᵢ)² / Σᵢ (Pgaᵢ·rᵢ)·Pgaᵢ`, with `rᵢ = γ/(λᵢ + γ)`. Mirrors
+`noise = np.dot(Pgacoeff, Pgacoeff) / np.dot(Pgacoeff, Pga)` where `Pgacoeff = Pga * coeffs`. -/
+def varNoiseFn {n : Nat} (Λ : Fin n → α) (γ : α) (Pga : Fin n → α) : α :=
+  let pc := fun i => Pga i * ridgeCoeffFn Λ γ i
+  dotFn pc pc / dotFn pc Pga
+
+/-- Tensor-level CHD variational solve `yb = -(K + γ·I)⁻¹·ga`, from eigenpairs `(evals, V)`. -/
+def variationalSolveSpec {n : Nat} (evals : Tensor α (.dim n .scalar))
+    (V : Tensor α (.dim n (.dim n .scalar))) (γ : α) (ga : Tensor α (.dim n .scalar)) :
+    Tensor α (.dim n .scalar) :=
+  ofVecFn (variationalSolveFn (toVecFn evals) (toMatFn V) γ (toVecFn ga))
+
+/-- Tensor-level CHD `noise` / `find_gamma` loss, from eigenpairs `(evals, V)`, `γ` and the data
+`ga` (projected internally as `Pga = Vᵀ·ga`). -/
+def varNoiseSpec {n : Nat} (evals : Tensor α (.dim n .scalar))
+    (V : Tensor α (.dim n (.dim n .scalar))) (γ : α) (ga : Tensor α (.dim n .scalar)) : α :=
+  varNoiseFn (toVecFn evals) γ (projFn (toMatFn V) (toVecFn ga))
+
 end Spec
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index d60d5a0..1c0b959 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -177,6 +177,56 @@ an inverse*. The `RidgeSolve` example makes this concrete: solving against each
 `eⱼ` produces column `j` of `(K + γI)⁻¹`, and the assembled matrix satisfies
 `(K + γI) · (K + γI)⁻¹ = I` to machine precision, every column coming from the verified Cholesky solve.
 
+# The CHD routines: variational solve, `find_gamma`, and `Z_test`
+
+The two solve routes above invert `K + γI`. But CHD's `perform_regression_and_find_gamma`
+(`interpolatory.py`) does not stop there: it takes the *eigendecomposition* route — `eigh(K)` once, then
+three routines computed from the eigenpairs `(λ, V)`. They share one arithmetic core: the *projected
+data* `Pga = Vᵀ ga` and the *shrinkage coefficients* `rᵢ = γ/(λᵢ + γ)`.
+[`NN.Proofs.Tensor.Basic.FactorizationsVariational`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsVariational.lean)
+identifies what each computes; everything is exact over `ℝ`, proved from the *specification*
+`IsSymEig` (so it holds for any eigenpairs satisfying it, independently of Jacobi convergence).
+
+The variational solution `solve_variationnal` returns, in eigendecomposition form,
+`yb = -V (Pga/(λ+γ))`. `variationalSolveFn_eq_neg_inv_mulVec` proves this *is* the regularized-inverse
+solve, directly from `add_smul_inv`:
+
+$$`\texttt{variationalSolveFn}\,\Lambda\,V\,\gamma\,ga \;=\; -\,(K + \gamma I)^{-1} ga.`
+
+So the eigendecomposition route and the Cholesky route compute the *same* `solve_variationnal`:
+`variationalSolveFn_eq_neg_solveRidgeFn` proves `variationalSolveFn = -solveRidgeFn` for a
+positive-semidefinite kernel `K` and `γ > 0` — two independent implementations agreeing on the one
+closed form `-(K + γI)⁻¹ ga`. The supporting fact `IsSymEig.eigenvalues_nonneg` (a PSD matrix's
+eigenvalues are `≥ 0`, via the congruence `Vᵀ A V` being positive-semidefinite) discharges the
+`λᵢ + γ ≠ 0` side condition from `γ > 0`.
+
+`find_gamma` and `Z_test` share a second quantity, the `noise` level. `varNoiseFn_eq_ratio` exhibits it
+as a spectral quadratic-form ratio:
+
+$$`\texttt{noise} \;=\; \frac{\sum_i (Pga_i\, r_i)^2}{\sum_i Pga_i^2\, r_i},
+\qquad r_i = \frac{\gamma}{\lambda_i + \gamma}.`
+
+`find_gamma` minimises this functional over `γ`; `Z_test` evaluates it on random Gaussian data. The
+load-bearing invariant is that the `noise` is a genuine *fraction*: for a PSD spectrum (`λᵢ ≥ 0`) and
+`γ > 0`, each coefficient satisfies `0 < rᵢ ≤ 1` (`ridgeCoeffFn_pos`, `ridgeCoeffFn_le_one`), so
+`rᵢ² ≤ rᵢ` makes the numerator dominated by the denominator, giving
+
+$$`0 \;\le\; \texttt{noise} \;\le\; 1`
+
+(`varNoiseFn_nonneg`, `varNoiseFn_le_one`). Finally, the `Z_test` statistic depends on the kernel only
+through its *spectrum*: replacing the data by `ga = V z` makes `V` cancel, `projFn V (V z) = z`
+(`projFn_mulVec_self`), so `varNoiseFn Λ γ (projFn V (V z)) = varNoiseFn Λ γ z`
+(`varNoiseFn_projFn_mulVec`). This is the deterministic content of "the `Z_test` null distribution
+depends only on the eigenvalues"; the *distributional* step — Gaussian sampling and the 5%/95%
+percentiles — is statistical rather than algebraic and is left to runtime, exercised numerically.
+
+The `Variational` example confirms all four on a concrete SPD kernel: `(K + γI)·yb = -ga` and
+`yb = -solveRidgeSpec K γ ga` to machine precision, `noise ∈ [0,1]`, and the spectral invariance
+`noise(V z) = noise(z)` to machine precision. Its *negative controls* show the hypotheses biting:
+feeding the *wrong* eigenvectors (the identity in place of `V`) makes the solve residual large, and
+`γ < 0` pushes the `noise` outside `[0,1]` — so the true eigendecomposition and `γ > 0` are both
+necessary.
+
 # The a-posteriori residual certificate
 
 For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact
@@ -359,4 +409,9 @@ Everything else is exact: the algebraic faithfulness of the decomposition (ortho
 similarity, the residual identity, the per-rotation decrease, the classical-strategy linear rate, and
 correctness in the zero-residual limit), the finite Cholesky/QR reconstructions, and the
 Cholesky-based regularized solve are proved, and the specification-level facts the kernel methods rely
-on are independent of the convergence step, so the CHD foundation is complete.
+on are independent of the convergence step. The three concrete CHD routines built on them are now
+identified too: the eigendecomposition-form `solve_variationnal` equals `-(K + γI)⁻¹ ga` and agrees
+with the Cholesky route, and the `noise`/`find_gamma`-loss/`Z_test` statistic is a spectral ratio
+provably in `[0,1]` that depends on the kernel only through its spectrum. So the CHD foundation is
+complete, the one remaining open item being statistical, not algebraic — the `Z_test`'s Gaussian
+sampling and percentiles, exercised numerically rather than proved.

From 2b82b8faa9a56efd46f3483abaa836b33230bb13 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 13:11:09 -0700
Subject: [PATCH 12/22] Discharge the PSD hypothesis: the CHD linear-mode
 kernel is positive-semidefinite
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Every verified CHD solve/find_gamma/Z_test theorem assumes hK : (Matrix.of K).PosSemidef,
but CHD builds K from data (Modes/kernels.py) — that hypothesis was never discharged.
This proves it for the linear mode, the same move as the positive-pivot keystone: turn
an assumed precondition into a theorem.

Spec (NN/Spec/Core/Tensor/Factorizations.lean): maskColsFn (which_dim column masking),
linearKernelFn (K[i,j] = 1 + scale·⟨Φi,Φj⟩ = 𝟙𝟙ᵀ + scale·ΦΦᵀ, mirroring
LinearMode.vectorized_kernel), and the tensor wrapper linearKernelSpec.

Proofs (new NN/Proofs/Tensor/Basic/FactorizationsKernels.lean, sorry/omega-free):
- linearKernelFn_posSemidef: K is PSD for scale ≥ 0 — 𝟙𝟙ᵀ is a rank-one Gram (PSD),
  ΦΦᵀ is a Gram (posSemidef_self_mul_conjTranspose), scale ≥ 0 keeps it PSD
  (PosSemidef.smul), PosSemidef.add closes the sum.
- linearKernelFn_symm: symmetry, a corollary of PosSemidef.isHermitian.
- linearKernelSpec_posSemidef: the tensor-level statement the solve theorems consume —
  so solveRidgeSpec (linearKernelSpec X w scale) γ b is now an unconditional exact solve
  for γ > 0, no PSD hypothesis left to assume.
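
A minimal Lean sketch of that argument, stated for an arbitrary column `u` in place of 𝟙 and
an arbitrary real matrix Φ. It is illustrative only: the `example` is not part of this commit,
and the Mathlib lemma signatures are assumed to match the version this repository pins.

```lean
import Mathlib

open Matrix

-- Hypothetical standalone restatement; `u` plays the role of the all-ones column 𝟙
-- and `Φ` the column-masked data matrix.
example {n d : ℕ} (u : Matrix (Fin n) (Fin 1) ℝ) (Φ : Matrix (Fin n) (Fin d) ℝ)
    {s : ℝ} (hs : 0 ≤ s) : (u * uᴴ + s • (Φ * Φᴴ)).PosSemidef :=
  -- both summands are Gram matrices; a nonnegative scalar and a sum preserve PSD
  (posSemidef_self_mul_conjTranspose u).add
    ((posSemidef_self_mul_conjTranspose Φ).smul hs)
```

Over ℝ the conjugate transpose `ᴴ` coincides with `ᵀ`, so this has the shape 𝟙𝟙ᵀ + scale·ΦΦᵀ.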

Examples (new NN/Examples/Factorization/LinearKernel.lean): 6 green #eval checks —
K = Kᵀ, matches the CHD LinearMode formula, all Jacobi eigenvalues ≥ 0 (also with a
feature masked), the PSD kernel feeds an exact ridge solve (residual ≈ 0); negative control:
scale = -1 makes 𝟙𝟙ᵀ − ΦΦᵀ indefinite (two negative eigenvalues).
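
The negative control can be cross-checked by hand in exact integer arithmetic. The sketch
below is hypothetical (core Lean only; the names `X`, `dot`, `Kneg`, `quad` are illustrative
and not part of this commit). It assumes the 4×2 data matrix from LinearKernel.lean with the
all-ones mask, rebuilds K for scale = -1, and evaluates the quadratic form vᵀKv at two vectors.

```lean
-- Rows of the data matrix: 4 samples, 2 features.
def X : List (List Int) := [[1, 0], [0, 1], [1, 1], [2, 1]]

/-- Dot product of two integer lists. -/
def dot (u v : List Int) : Int := (List.zipWith (· * ·) u v).foldl (· + ·) 0

/-- `K[i][j] = 1 + scale·⟨xᵢ, xⱼ⟩` with `scale = -1`. -/
def Kneg : List (List Int) := X.map (fun xi => X.map (fun xj => 1 - dot xi xj))

/-- Quadratic form `vᵀ K v`. -/
def quad (K : List (List Int)) (v : List Int) : Int :=
  dot v (K.map (fun row => dot row v))

#eval quad Kneg [0, 0, 0, 1]   -- -4: the diagonal entry 1 − ⟨x₃, x₃⟩ = 1 − 5
#eval quad Kneg [1, 1, -1, 0]  -- 1: this v satisfies Φᵀv = 0 and 𝟙ᵀv = 1
```

One negative and one positive value of the quadratic form show that this K is indefinite.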

Blueprint: new "Building the kernel" section; flags the follow-ons — quadratic via the
Schur product theorem (PosSemidef.hadamard, in Mathlib) and Gaussian via Schoenberg
(Bochner not in Mathlib v4.30.0, the new research-grade item).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |   7 +
 NN/Examples/Factorization/LinearKernel.lean   | 134 ++++++++++++++++++
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Tensor/Basic/FactorizationsKernels.lean   |  99 +++++++++++++
 NN/Spec/Core/Tensor/Factorizations.lean       |  32 +++++
 .../Ch4_Verification/Factorizations.lean      |  31 ++++
 6 files changed, 304 insertions(+)
 create mode 100644 NN/Examples/Factorization/LinearKernel.lean
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsKernels.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 3d06494..bae41f9 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -15,6 +15,7 @@ public import NN.Examples.Factorization.JacobiDecrease
 public import NN.Examples.Factorization.JacobiRate
 public import NN.Examples.Factorization.RidgeSolve
 public import NN.Examples.Factorization.Variational
+public import NN.Examples.Factorization.LinearKernel
 
 /-!
 # Matrix factorization examples
@@ -59,6 +60,12 @@ factorization misbehaves.
   (`varNoiseFn_nonneg`, `varNoiseFn_le_one`), and `Z_test` spectral invariance
   (`varNoiseFn_projFn_mulVec`); **negative controls**: wrong eigenvectors break the solve, and `γ < 0`
   pushes the noise outside `[0,1]`.
+- `LinearKernel` — CHD *builds* the kernel from data (`Modes/kernels.py`); the linear mode is
+  `K = 𝟙𝟙ᵀ + scale·Φ·Φᵀ`, proven symmetric positive-semidefinite for `scale ≥ 0`
+  (`linearKernelFn_posSemidef`), which discharges the `PosSemidef` hypothesis every solve/`find_gamma`
+  theorem assumes. Checks: `K = Kᵀ`, matches the CHD `LinearMode` formula, all Jacobi eigenvalues `≥ 0`
+  (also with a feature masked), and the PSD kernel feeds an exact ridge solve; **negative control**:
+  `scale < 0` makes `K` indefinite (a negative eigenvalue appears).
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/LinearKernel.lean b/NN/Examples/Factorization/LinearKernel.lean
new file mode 100644
index 0000000..4d5d617
--- /dev/null
+++ b/NN/Examples/Factorization/LinearKernel.lean
@@ -0,0 +1,134 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: the CHD linear-mode kernel is symmetric positive-semidefinite
+
+These checks corroborate `NN.Proofs.Tensor.Basic.FactorizationsKernels`. The whole verified CHD
+solve / `find_gamma` / `Z_test` development assumes the kernel matrix `K` is positive-semidefinite;
+CHD *builds* `K` from data (`Modes/kernels.py`). For the linear mode,
+
+`K = 𝟙𝟙ᵀ + scale · Φ·Φᵀ`   (`Φ` = column-masked data),
+
+which `linearKernelFn_posSemidef` proves PSD for `scale ≥ 0` — discharging that standing hypothesis for
+the real linear kernel. We exhibit:
+
+* **symmetric** — `K = Kᵀ` to machine precision (`linearKernelFn_symm`);
+* **matches CHD** — `K[i,j] = 1 + scale·⟨xᵢ, xⱼ⟩` agrees with the direct `LinearMode` formula;
+* **positive-semidefinite** — every Jacobi eigenvalue is `≥ 0` (the numeric witness of
+  `linearKernelFn_posSemidef`), and masking a feature (`w = [1,0]`) keeps it PSD;
+* **feeds the verified solve** — because `K` is PSD, `solveRidgeSpec K γ b` is the exact regularized
+  solve for `γ > 0` (`(K+γ·I)·x = b` to machine precision), with no PSD hypothesis left to assume.
+
+**Negative control**: with `scale = -1` the kernel `𝟙𝟙ᵀ − Φ·Φᵀ` is indefinite — a Jacobi eigenvalue
+goes negative — so `scale ≥ 0` is necessary for the PSD guarantee.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.LinearKernel
+
+/-- Build a length-`n` `Float` vector from a list (missing entries `0`). -/
+def mkVec {n : Nat} (xs : List Float) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i => xs.getD i.val 0.0)
+
+/-- Count Jacobi eigenvalues that are negative (below `−10⁻⁹`). `0` is numerical evidence of
+positive-semidefiniteness; `≥ 1` witnesses a negative eigenvalue, so the matrix is not PSD. -/
+def numNegEigs {k : Nat} (M : Spec.Tensor Float (.dim k (.dim k .scalar))) : Float :=
+  let evals := (Spec.symEigJacobiSpec M 12).1
+  (List.finRange k).foldl
+    (fun a i => a + (if Spec.Tensor.toScalar (Spec.get evals i) < -1e-9 then 1.0 else 0.0)) 0.0
+
+/-- The regularized matrix `K + γ·I` as a tensor. -/
+def addGammaI {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float) :
+    Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  Spec.ofMatFn (fun i j => Spec.get2 K i j + (if i.val == j.val then γ else 0.0))
+
+/-- `ℓ¹` magnitude `Σᵢ |vᵢ|` (a sum, so a `NaN` propagates). -/
+def vecAbsErr {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : Float :=
+  (List.finRange n).foldl (fun a i => a + Float.abs (Spec.Tensor.toScalar (Spec.get v i))) 0.0
+
+/-- Residual `(K + γ·I)·x − b`. -/
+def ridgeResidual {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float)
+    (b x : Spec.Tensor Float (.dim n .scalar)) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i =>
+    Spec.Tensor.toScalar (Spec.get (Spec.matVecMulSpec (addGammaI K γ) x) i)
+      - Spec.Tensor.toScalar (Spec.get b i))
+
+/-- A `4 × 2` data matrix (4 samples, 2 features). -/
+def X : Spec.Tensor Float (.dim 4 (.dim 2 .scalar)) :=
+  mkMat [[1, 0],
+         [0, 1],
+         [1, 1],
+         [2, 1]]
+
+/-- Selection mask `which_dim = [1,1]` (both features active). -/
+def wAll : Spec.Tensor Float (.dim 2 .scalar) := mkVec [1, 1]
+def scale : Float := 2.0
+
+/-- The linear-mode kernel `K = 𝟙𝟙ᵀ + scale·Φ·Φᵀ` (4×4). -/
+def K : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.linearKernelSpec X wAll scale
+
+/-- Direct CHD `LinearMode` formula `Kref[i,j] = 1 + scale·Σ_k X[i,k]·X[j,k]` (mask all-ones). -/
+def Kref : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) :=
+  Spec.ofMatFn (fun i j => 1.0 + scale *
+    (List.finRange 2).foldl (fun a k => a + Spec.get2 X i k * Spec.get2 X j k) 0.0)
+
+#eval IO.println s!"linear kernel K =\n{(List.finRange 4).map (fun i =>
+  (List.finRange 4).map (fun j => Spec.get2 K i j))}"
+#eval IO.println s!"eigenvalues of K = {vecToList (Spec.symEigJacobiSpec K 12).1}"
+
+-- Positive — `K` is symmetric (`linearKernelFn_symm`).
+#eval assertLt "linear kernel is symmetric: K = Kᵀ" (maxMatErr K (tr K))
+
+-- Positive — `K` matches the direct CHD `LinearMode` formula.
+#eval assertLt "linear kernel matches CHD LinearMode formula" (maxMatErr K Kref)
+
+-- Positive — `K` is positive-semidefinite: no negative Jacobi eigenvalue (`linearKernelFn_posSemidef`).
+#eval assertLt "linear kernel is PSD: no negative eigenvalue" (numNegEigs K)
+
+/-! ## Masking a feature preserves PSD -/
+
+/-- Mask out feature 1: `which_dim = [1,0]`. -/
+def wMask : Spec.Tensor Float (.dim 2 .scalar) := mkVec [1, 0]
+def Kmask : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.linearKernelSpec X wMask scale
+
+#eval IO.println s!"masked-feature kernel eigenvalues = {vecToList (Spec.symEigJacobiSpec Kmask 12).1}"
+
+-- Positive — masking a feature keeps the kernel PSD (PSD holds for any mask).
+#eval assertLt "masked linear kernel is still PSD" (numNegEigs Kmask)
+
+/-! ## The PSD kernel feeds the verified ridge solve -/
+
+def γ : Float := 0.5
+def b : Spec.Tensor Float (.dim 4 .scalar) := mkVec [1, 2, 3, 4]
+/-- The ridge solution against the linear kernel `K`. -/
+def x : Spec.Tensor Float (.dim 4 .scalar) := Spec.solveRidgeSpec K γ b
+
+#eval IO.println s!"ridge solve on the linear kernel: residual = {vecToList (ridgeResidual K γ b x)}"
+
+-- Positive — `K` PSD ⟹ `solveRidgeSpec K γ b` is the exact solve of `(K+γI)·x = b` (γ > 0, no
+-- PSD hypothesis to assume — it is now proven for this kernel).
+#eval assertLt "PSD linear kernel ⟹ exact ridge solve (K+γI)·x = b"
+  (vecAbsErr (ridgeResidual K γ b x))
+
+/-! ## Negative control: `scale < 0` breaks positive-semidefiniteness -/
+
+/-- The same kernel with `scale = -1`: `𝟙𝟙ᵀ − Φ·Φᵀ`, indefinite. -/
+def Kneg : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.linearKernelSpec X wAll (-1.0)
+
+#eval IO.println s!"scale = -1 kernel eigenvalues = {vecToList (Spec.symEigJacobiSpec Kneg 12).1}"
+
+-- Negative — with `scale < 0` the kernel is indefinite: at least one eigenvalue is negative.
+#eval assertGe "scale < 0 breaks PSD (indefinite kernel)" (numNegEigs Kneg) 1.0
+
+end NN.Examples.Factorization.LinearKernel
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index aecf544..7ca7dc3 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -13,6 +13,7 @@ public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
 public import NN.Proofs.Tensor.Basic.FactorizationsSolve
 public import NN.Proofs.Tensor.Basic.FactorizationsVariational
+public import NN.Proofs.Tensor.Basic.FactorizationsKernels
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobiDecrease
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean b/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean
new file mode 100644
index 0000000..dc399aa
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean
@@ -0,0 +1,99 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.Factorizations
+public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
+public import Mathlib.Data.Real.StarOrdered
+
+/-!
+# CHD mode kernels are symmetric positive-semidefinite
+
+The entire verified CHD solve / `find_gamma` / `Z_test` development takes the kernel matrix `K` as
+input under the hypothesis `(Matrix.of K).PosSemidef`. CHD does not receive `K`; it *builds* it from
+data (`Modes/kernels.py`). This file discharges that standing hypothesis for the **linear mode** — the
+first and simplest of CHD's kernels — exactly as the positive-pivot keystone discharged the Cholesky
+success condition.
+
+The linear-mode kernel is `K[i,j] = 1 + scale · ⟨Φ i, Φ j⟩` with `Φ` the column-masked data, i.e.
+
+`K = 𝟙𝟙ᵀ + scale · Φ·Φᵀ`,
+
+a sum of the all-ones matrix (a rank-one Gram, PSD) and a scaled Gram matrix `Φ·Φᵀ` (PSD for
+`scale ≥ 0` by `posSemidef_self_mul_conjTranspose`). `PosSemidef.add` / `PosSemidef.smul` finish it.
+
+* `linearKernelFn_posSemidef` — `(Matrix.of (linearKernelFn X w scale)).PosSemidef` for `0 ≤ scale`.
+* `linearKernelFn_symm` — `K` is symmetric (a corollary, via `PosSemidef.isHermitian`).
+* `linearKernelSpec_posSemidef` — the tensor-level statement, the form the solve theorems consume.
+
+Quadratic mode (`PosSemidef.hadamard`, the Schur product theorem) and Gaussian mode (Bochner /
+Schoenberg, not in Mathlib v4.30.0) are the natural follow-ons.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open Matrix
+open scoped BigOperators
+open Spec.Factorization.Reconstruction
+
+variable {n d : Nat}
+
+/-- Over `ℝ`, `Φᴴ = Φᵀ` (the star is trivial), for any rectangular matrix. -/
+private theorem conjTranspose_eq_transpose {m k : Nat} (Φ : Matrix (Fin m) (Fin k) ℝ) :
+    (Φᴴ : Matrix (Fin k) (Fin m) ℝ) = Φᵀ := by
+  ext a b; simp [Matrix.conjTranspose_apply, Matrix.transpose_apply]
+
+/-- The Gram matrix `Φ·Φᵀ` is positive-semidefinite (real form of
+`posSemidef_self_mul_conjTranspose`). -/
+private theorem posSemidef_mul_transpose_self {m k : Nat} (Φ : Matrix (Fin m) (Fin k) ℝ) :
+    (Φ * Φᵀ).PosSemidef := by
+  have h := Matrix.posSemidef_self_mul_conjTranspose Φ
+  rwa [conjTranspose_eq_transpose Φ] at h
+
+/-- **The linear-mode kernel is symmetric positive-semidefinite.** For data `X`, selection mask `w`,
+and `scale ≥ 0`, `K = 𝟙𝟙ᵀ + scale·Φ·Φᵀ` is PSD — discharging the `PosSemidef` hypothesis of the CHD
+solve / `find_gamma` development for the real linear kernel. -/
+theorem linearKernelFn_posSemidef (X : Fin n → Fin d → ℝ) (w : Fin d → ℝ) {scale : ℝ}
+    (hscale : 0 ≤ scale) : (Matrix.of (Spec.linearKernelFn X w scale)).PosSemidef := by
+  -- the masked data as a matrix, and the all-ones column
+  set Φ : Matrix (Fin n) (Fin d) ℝ := Matrix.of (Spec.maskColsFn X w) with hΦ
+  set Ψ : Matrix (Fin n) (Fin 1) ℝ := Matrix.of (fun _ _ => 1) with hΨ
+  -- `K = Ψ·Ψᵀ + scale • (Φ·Φᵀ)`
+  have hKeq : Matrix.of (Spec.linearKernelFn X w scale) = Ψ * Ψᵀ + scale • (Φ * Φᵀ) := by
+    ext i j
+    simp only [Matrix.of_apply, Matrix.add_apply, Matrix.smul_apply, smul_eq_mul,
+      Matrix.mul_apply, Matrix.transpose_apply, Spec.linearKernelFn, hΦ, hΨ]
+    rw [dotFn_eq_sum, Fin.sum_univ_one]
+    simp only [Spec.maskColsFn]
+    ring
+  rw [hKeq]
+  exact (posSemidef_mul_transpose_self Ψ).add ((posSemidef_mul_transpose_self Φ).smul hscale)
+
+/-- The linear-mode kernel is symmetric: `K[i,j] = K[j,i]` (via `PosSemidef.isHermitian`, hence `hscale`). -/
+theorem linearKernelFn_symm (X : Fin n → Fin d → ℝ) (w : Fin d → ℝ) {scale : ℝ}
+    (hscale : 0 ≤ scale) (i j : Fin n) :
+    Spec.linearKernelFn X w scale i j = Spec.linearKernelFn X w scale j i := by
+  have h := (linearKernelFn_posSemidef X w hscale).isHermitian
+  have e : (Matrix.of (Spec.linearKernelFn X w scale))ᴴ i j
+      = (Matrix.of (Spec.linearKernelFn X w scale)) i j := by rw [h]
+  simpa [Matrix.conjTranspose_apply, Matrix.of_apply] using e.symm
+
+/-- **Tensor-level: the linear-mode kernel is positive-semidefinite.** The form the verified solve
+consumes: `(Matrix.of (toMatFn (linearKernelSpec X w scale))).PosSemidef` for `scale ≥ 0`, so e.g.
+`solveRidgeSpec (linearKernelSpec X w scale) γ b` is the exact regularized solve for any `γ > 0`. -/
+theorem linearKernelSpec_posSemidef (X : Spec.Tensor ℝ (.dim n (.dim d .scalar)))
+    (w : Spec.Tensor ℝ (.dim d .scalar)) {scale : ℝ} (hscale : 0 ≤ scale) :
+    (Matrix.of (Spec.toMatFn (Spec.linearKernelSpec X w scale))).PosSemidef := by
+  have hround : Spec.toMatFn (Spec.linearKernelSpec X w scale)
+      = Spec.linearKernelFn (Spec.toMatFn X) (Spec.toVecFn w) scale := by
+    funext i j; rfl
+  rw [hround]
+  exact linearKernelFn_posSemidef _ _ hscale
+
+end Spec.Factorization
diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
index b06f667..eac2076 100644
--- a/NN/Spec/Core/Tensor/Factorizations.lean
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -430,4 +430,36 @@ def varNoiseSpec {n : Nat} (evals : Tensor α (.dim n .scalar))
     (V : Tensor α (.dim n (.dim n .scalar))) (γ : α) (ga : Tensor α (.dim n .scalar)) : α :=
   varNoiseFn (toVecFn evals) γ (projFn (toMatFn V) (toVecFn ga))
 
+/-! ## CHD mode kernels (`Modes/kernels.py`)
+
+Everything above takes the kernel matrix `K` as input, assuming it is symmetric positive-semidefinite.
+CHD *builds* `K` from data: for each pair of samples, a mode kernel evaluates a feature inner product.
+The simplest is the **linear mode** (`LinearMode.vectorized_kernel`):
+
+`K[i,j] = 1 + scale · Σ_k (which_dim_k · X[i,k]) · X[j,k]`,
+
+a constant `1` plus a scaled Gram matrix of the (column-masked) data. For the binary selection mask
+`which_dim_k ∈ {0,1}` CHD uses, `which_dim_k · X[i,k] · X[j,k] = (which_dim_k X[i,k])·(which_dim_k X[j,k])`,
+so `K = 𝟙𝟙ᵀ + scale · Φ·Φᵀ` with `Φ` the masked data — manifestly symmetric positive-semidefinite for
+`scale ≥ 0`. That PSD fact (proved in `FactorizationsKernels`) discharges the standing `PosSemidef`
+hypothesis of the solve/`find_gamma` development for the real linear kernel. -/
+
+/-- Column-mask the data matrix by a per-feature weight `w` (CHD `which_dim`): zero out / scale
+feature `k` by `w k`. -/
+def maskColsFn {n d : Nat} (X : Fin n → Fin d → α) (w : Fin d → α) : Fin n → Fin d → α :=
+  fun i k => w k * X i k
+
+/-- Linear-mode kernel matrix `K[i,j] = 1 + scale · ⟨Φ i, Φ j⟩`, `Φ = maskColsFn X w` the masked data.
+For a binary selection mask this is exactly CHD `LinearMode.vectorized_kernel`. -/
+def linearKernelFn {n d : Nat} (X : Fin n → Fin d → α) (w : Fin d → α) (scale : α) :
+    Fin n → Fin n → α :=
+  fun i j => 1 + scale * dotFn (maskColsFn X w i) (maskColsFn X w j)
+
+/-- Tensor-level linear-mode kernel: `K = 𝟙𝟙ᵀ + scale · Φ·Φᵀ` from data `X` and selection mask `w`.
+
+PyTorch analogue: `1 + scale * (X * which_dim).matmul(X.T)`. -/
+def linearKernelSpec {n d : Nat} (X : Tensor α (.dim n (.dim d .scalar)))
+    (w : Tensor α (.dim d .scalar)) (scale : α) : Tensor α (.dim n (.dim n .scalar)) :=
+  ofMatFn (linearKernelFn (toMatFn X) (toVecFn w) scale)
+
 end Spec
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 1c0b959..7a3dcb9 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -227,6 +227,37 @@ feeding the *wrong* eigenvectors (the identity in place of `V`) makes the solve
 `γ < 0` pushes the `noise` outside `[0,1]` — so the true eigendecomposition and `γ > 0` are both
 necessary.
 
+# Building the kernel: the linear mode is positive-semidefinite
+
+Every result above takes the kernel `K` as input *under the hypothesis* that it is
+positive-semidefinite — the solve needs `K + γI` to be SPD, the noise bound needs `λᵢ ≥ 0`. But CHD does not
+receive `K`; it *builds* it from data (`Modes/kernels.py`). Discharging that standing `PosSemidef`
+hypothesis for the kernels CHD actually constructs is the same move as the positive-pivot keystone:
+turn an assumed precondition into a theorem.
+[`NN.Proofs.Tensor.Basic.FactorizationsKernels`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean)
+takes the first and simplest mode. The *linear* kernel is
+
+$$`K[i,j] = 1 + \texttt{scale}\cdot\langle \Phi_i, \Phi_j\rangle,
+\qquad K = \mathbf{1}\mathbf{1}^\top + \texttt{scale}\cdot \Phi\,\Phi^\top,`
+
+with `Φ` the column-masked data (`which_dim`). `linearKernelFn_posSemidef` proves this is symmetric
+positive-semidefinite whenever `scale ≥ 0`: the all-ones matrix `𝟙𝟙ᵀ` is a rank-one Gram (hence PSD),
+`Φ Φᵀ` is a Gram matrix (PSD by `posSemidef_self_mul_conjTranspose`), `scale ≥ 0` keeps the scaling
+PSD (`PosSemidef.smul`), and `PosSemidef.add` closes the sum. Symmetry (`linearKernelFn_symm`) is then
+a corollary of `PosSemidef.isHermitian`. Composed with the solve development, this makes
+`solveRidgeSpec (linearKernelSpec X w scale) γ b` an *unconditional* exact solve for `γ > 0` — no PSD
+hypothesis left to assume. The `LinearKernel` example confirms `K = Kᵀ`, the match with the CHD
+`LinearMode` formula, all-nonnegative Jacobi eigenvalues (with feature masking preserved), and the
+downstream exact ridge solve; its *negative control* takes `scale = -1`, where `𝟙𝟙ᵀ − Φ Φᵀ` is
+indefinite and a Jacobi eigenvalue goes negative — so `scale ≥ 0` is necessary.
+
+The other two modes are the natural follow-ons. The *quadratic* kernel is an entrywise square of a
+PSD matrix, so it is PSD by the *Schur product theorem* (`PosSemidef.hadamard`, available in
+Mathlib), under the hyperparameter sign conditions. The *Gaussian* (RBF) kernel
+`exp(-(xᵢ-xⱼ)²/2\ell²)` is PSD by Schoenberg's theorem — writing `exp(xy/\ell²)` as a power series whose
+terms are Hadamard powers of a rank-one Gram — but Mathlib v4.30.0 has no Bochner/Gaussian-kernel PSD
+theory, so it is the new honest research-grade item, parallel to the cyclic-Jacobi rate.
+
 # The a-posteriori residual certificate
 
 For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact

From 4a1c17c1e24a46d0b86833c57e2f9246753c7cc8 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 14:45:20 -0700
Subject: [PATCH 13/22] Prove the CHD quadratic-mode kernel is
 positive-semidefinite
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Discharge the standing PosSemidef hypothesis for the second of CHD's
non-interpolatory kernels. The quadratic mode

  K[i,j] = scale·(alpha + ⟨Φ i, Φ j⟩)² + (1 − alpha²·scale)

expands algebraically to

  K = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ),

a sum of three PSD pieces: the all-ones Gram, a nonnegative multiple of
the data Gram Φ·Φᵀ, and a nonnegative multiple of its Hadamard square —
PSD by the Schur product theorem (PosSemidef.hadamard). So K is PSD for
scale ≥ 0 and alpha ≥ 0.

- Spec: quadraticKernelFn / quadraticKernelSpec (NN/Spec/.../Factorizations.lean),
  the exact CHD QuadraticMode.vectorized_kernel.
- Proof: quadraticKernelFn_posSemidef + _symm + tensor-level
  quadraticKernelSpec_posSemidef (FactorizationsKernels), sorry/omega-free.
- Example: NN/Examples/Factorization/QuadraticKernel.lean — 8 green checks
  (symmetry, CHD-formula match, all eigenvalues ≥ 0 with masking preserved,
  exact ridge solve), two genuine negative controls (alpha = −1 → 2 negative
  eigenvalues, scale = −1 → 3).
- Blueprint: dedicated quadratic-mode section (Ch4 Factorizations).

Gaussian mode (Schoenberg/Bochner, absent from Mathlib v4.30.0) remains
the research-grade follow-on.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
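Note: the expansion quoted in the message above is an entrywise identity, so
it can be sanity-checked on its own. What follows is a minimal Lean sketch,
assuming only Mathlib's `ring` and `norm_num` tactics. The names `s`, `a` and
`m` are illustrative stand-ins for `scale`, `alpha` and the masked inner
product ⟨Φ i, Φ j⟩; they do not appear anywhere in the patch.

  import Mathlib.Data.Real.Basic
  import Mathlib.Tactic.Ring
  import Mathlib.Tactic.NormNum

  -- Entry (i, j) of the quadratic kernel, written as in `quadraticKernelFn`,
  -- equals the corresponding entry of the three-term expansion.
  example (s a m : ℝ) :
      s * ((a + m) * (a + m)) + (1 - a * a * s)
        = 1 + (2 * s * a) * m + s * (m * m) := by
    ring

  -- Negative control at sample 0 of the example data (m = 1) with
  -- scale = 2 and alpha = -1: the diagonal entry evaluates to -1.
  example : (2 : ℝ) * ((-1 + 1) * (-1 + 1)) + (1 - (-1) * (-1) * 2) = -1 := by
    norm_num

The second example is only a spot check: a negative diagonal entry already
rules out positive-semidefiniteness, which is consistent with the
`alpha = -1` control reporting negative Jacobi eigenvalues.
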
 NN/Examples/Factorization.lean                |   8 +
 .../Factorization/QuadraticKernel.lean        | 150 ++++++++++++++++++
 .../Tensor/Basic/FactorizationsKernels.lean   |  63 +++++++-
 NN/Spec/Core/Tensor/Factorizations.lean       |  18 +++
 .../Ch4_Verification/Factorizations.lean      |  31 +++-
 5 files changed, 262 insertions(+), 8 deletions(-)
 create mode 100644 NN/Examples/Factorization/QuadraticKernel.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index bae41f9..424a2f1 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -16,6 +16,7 @@ public import NN.Examples.Factorization.JacobiRate
 public import NN.Examples.Factorization.RidgeSolve
 public import NN.Examples.Factorization.Variational
 public import NN.Examples.Factorization.LinearKernel
+public import NN.Examples.Factorization.QuadraticKernel
 
 /-!
 # Matrix factorization examples
@@ -66,6 +67,13 @@ factorization misbehaves.
   theorem assumes. Checks: `K = Kᵀ`, matches the CHD `LinearMode` formula, all Jacobi eigenvalues `≥ 0`
   (masking a feature preserved), and the PSD kernel feeds an exact ridge solve; **negative control**:
   `scale < 0` makes `K` indefinite (a negative eigenvalue appears).
+- `QuadraticKernel` — CHD's *quadratic* mode (`Modes/kernels.py`),
+  `K = scale·(alpha + Φ·Φᵀ)² + (1 − alpha²·scale) = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ)`,
+  proven symmetric positive-semidefinite for `scale ≥ 0` and `alpha ≥ 0` via the **Schur product
+  theorem** on the Hadamard square (`quadraticKernelFn_posSemidef`). Checks mirror the linear mode:
+  `K = Kᵀ`, matches the CHD `QuadraticMode` formula, all Jacobi eigenvalues `≥ 0` (masking preserved),
+  PSD kernel feeds an exact ridge solve; **negative controls**: both `alpha < 0` and `scale < 0` make
+  `K` indefinite, so both bounds are necessary.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/QuadraticKernel.lean b/NN/Examples/Factorization/QuadraticKernel.lean
new file mode 100644
index 0000000..4b6196e
--- /dev/null
+++ b/NN/Examples/Factorization/QuadraticKernel.lean
@@ -0,0 +1,150 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: the CHD quadratic-mode kernel is symmetric positive-semidefinite
+
+These checks corroborate `NN.Proofs.Tensor.Basic.FactorizationsKernels`. As with the linear mode, the
+whole verified CHD solve / `find_gamma` / `Z_test` development assumes the kernel `K` is
+positive-semidefinite, and CHD *builds* `K` from data (`Modes/kernels.py`). For the quadratic mode,
+
+`K[i,j] = scale · (alpha + ⟨Φ i, Φ j⟩)² + (1 − alpha²·scale)`   (`Φ` = column-masked data),
+
+which expands to `K = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ)` — the last term a **Hadamard
+square**, PSD by the Schur product theorem. `quadraticKernelFn_posSemidef` proves `K` PSD for
+`scale ≥ 0` and `alpha ≥ 0`, discharging that standing hypothesis for the real quadratic kernel. We
+exhibit:
+
+* **symmetric** — `K = Kᵀ` to machine precision (`quadraticKernelFn_symm`);
+* **matches CHD** — `K[i,j] = scale·(alpha + ⟨xᵢ, xⱼ⟩)² + (1 − alpha²·scale)` agrees with the direct
+  `QuadraticMode.vectorized_kernel` formula;
+* **positive-semidefinite** — every Jacobi eigenvalue is `≥ 0` (the numeric witness of
+  `quadraticKernelFn_posSemidef`), and masking a feature (`w = [1,0]`) keeps it PSD;
+* **feeds the verified solve** — because `K` is PSD, `solveRidgeSpec K γ b` is the exact regularized
+  solve for `γ > 0` (`(K+γ·I)·x = b` to machine precision).
+
+**Negative controls**: with `alpha < 0` the middle term `2·scale·alpha·Φ·Φᵀ` goes negative (the
+diagonal `scale·alpha² + …` drops below zero) and with `scale < 0` the whole quadratic part flips sign
+— in both cases a Jacobi eigenvalue goes negative, so `scale ≥ 0` *and* `alpha ≥ 0` are both necessary.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.QuadraticKernel
+
+/-- Build a length-`n` `Float` vector from a list (missing entries `0`). -/
+def mkVec {n : Nat} (xs : List Float) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i => xs.getD i.val 0.0)
+
+/-- Count Jacobi eigenvalues that are negative (below `−10⁻⁹`). `0` is numerical evidence of
+positive-semidefiniteness; `≥ 1` shows the matrix is not PSD (up to floating-point error). -/
+def numNegEigs {k : Nat} (M : Spec.Tensor Float (.dim k (.dim k .scalar))) : Float :=
+  let evals := (Spec.symEigJacobiSpec M 12).1
+  (List.finRange k).foldl
+    (fun a i => a + (if Spec.Tensor.toScalar (Spec.get evals i) < -1e-9 then 1.0 else 0.0)) 0.0
+
+/-- The regularized matrix `K + γ·I` as a tensor. -/
+def addGammaI {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float) :
+    Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  Spec.ofMatFn (fun i j => Spec.get2 K i j + (if i.val == j.val then γ else 0.0))
+
+/-- `ℓ¹` magnitude `Σᵢ |vᵢ|` (a sum, so a `NaN` propagates). -/
+def vecAbsErr {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : Float :=
+  (List.finRange n).foldl (fun a i => a + Float.abs (Spec.Tensor.toScalar (Spec.get v i))) 0.0
+
+/-- Residual `(K + γ·I)·x − b`. -/
+def ridgeResidual {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float)
+    (b x : Spec.Tensor Float (.dim n .scalar)) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i =>
+    Spec.Tensor.toScalar (Spec.get (Spec.matVecMulSpec (addGammaI K γ) x) i)
+      - Spec.Tensor.toScalar (Spec.get b i))
+
+/-- A `4 × 2` data matrix (4 samples, 2 features). -/
+def X : Spec.Tensor Float (.dim 4 (.dim 2 .scalar)) :=
+  mkMat [[1, 0],
+         [0, 1],
+         [1, 1],
+         [2, 1]]
+
+/-- Selection mask `which_dim = [1,1]` (both features active). -/
+def wAll : Spec.Tensor Float (.dim 2 .scalar) := mkVec [1, 1]
+/-- Kernel scale (CHD `QuadraticMode._scale`). -/
+def scale : Float := 2.0
+/-- Quadratic offset (CHD `alpha = 0.5·scales["linear"]/scale`; here `0.5·2.0/2.0 = 0.5`). -/
+def alpha : Float := 0.5
+
+/-- The quadratic-mode kernel `K = scale·(alpha + Φ·Φᵀ)² + (1 − alpha²·scale)`, entrywise square (4×4). -/
+def K : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.quadraticKernelSpec X wAll scale alpha
+
+/-- Direct CHD `QuadraticMode.vectorized_kernel` formula (mask all-ones):
+`Kref[i,j] = scale·(alpha + Σ_k X[i,k]·X[j,k])² + (1 − alpha²·scale)`. -/
+def Kref : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) :=
+  Spec.ofMatFn (fun i j =>
+    let m := (List.finRange 2).foldl (fun a k => a + Spec.get2 X i k * Spec.get2 X j k) 0.0
+    scale * (alpha + m) ^ 2 + (1.0 - alpha ^ 2 * scale))
+
+#eval IO.println s!"quadratic kernel K =\n{(List.finRange 4).map (fun i =>
+  (List.finRange 4).map (fun j => Spec.get2 K i j))}"
+#eval IO.println s!"eigenvalues of K = {vecToList (Spec.symEigJacobiSpec K 12).1}"
+
+-- Positive — `K` is symmetric (`quadraticKernelFn_symm`).
+#eval assertLt "quadratic kernel is symmetric: K = Kᵀ" (maxMatErr K (tr K))
+
+-- Positive — `K` matches the direct CHD `QuadraticMode` formula.
+#eval assertLt "quadratic kernel matches CHD QuadraticMode formula" (maxMatErr K Kref)
+
+-- Positive — `K` is PSD: no negative Jacobi eigenvalue (`quadraticKernelFn_posSemidef`).
+#eval assertLt "quadratic kernel is PSD: no negative eigenvalue" (numNegEigs K)
+
+/-! ## Masking a feature preserves PSD -/
+
+/-- Mask out feature 1: `which_dim = [1,0]`. -/
+def wMask : Spec.Tensor Float (.dim 2 .scalar) := mkVec [1, 0]
+def Kmask : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.quadraticKernelSpec X wMask scale alpha
+
+#eval IO.println s!"masked-feature kernel eigenvalues = {vecToList (Spec.symEigJacobiSpec Kmask 12).1}"
+
+-- Positive — masking a feature keeps the kernel PSD (PSD holds for any mask).
+#eval assertLt "masked quadratic kernel is still PSD" (numNegEigs Kmask)
+
+/-! ## The PSD kernel feeds the verified ridge solve -/
+
+def γ : Float := 0.5
+def b : Spec.Tensor Float (.dim 4 .scalar) := mkVec [1, 2, 3, 4]
+/-- The ridge solution against the quadratic kernel `K`. -/
+def x : Spec.Tensor Float (.dim 4 .scalar) := Spec.solveRidgeSpec K γ b
+
+#eval IO.println s!"ridge solve on the quadratic kernel: residual = {vecToList (ridgeResidual K γ b x)}"
+
+-- Positive — `K` PSD ⟹ `solveRidgeSpec K γ b` is the exact solve of `(K+γI)·x = b` (γ > 0).
+#eval assertLt "PSD quadratic kernel ⟹ exact ridge solve (K+γI)·x = b"
+  (vecAbsErr (ridgeResidual K γ b x))
+
+/-! ## Negative controls: `alpha < 0` and `scale < 0` break positive-semidefiniteness -/
+
+/-- The same kernel with `alpha = −1`: the linear term `2·scale·alpha·Φ·Φᵀ` is negative. -/
+def Kalpha : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.quadraticKernelSpec X wAll scale (-1.0)
+
+#eval IO.println s!"alpha = -1 kernel eigenvalues = {vecToList (Spec.symEigJacobiSpec Kalpha 12).1}"
+
+-- Negative — with `alpha < 0` the kernel is indefinite: at least one eigenvalue is negative.
+#eval assertGe "alpha < 0 breaks PSD (indefinite kernel)" (numNegEigs Kalpha) 1.0
+
+/-- The same kernel with `scale = −1`: the whole quadratic part flips sign. -/
+def Kscale : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.quadraticKernelSpec X wAll (-1.0) alpha
+
+#eval IO.println s!"scale = -1 kernel eigenvalues = {vecToList (Spec.symEigJacobiSpec Kscale 12).1}"
+
+-- Negative — with `scale < 0` the kernel is indefinite: at least one eigenvalue is negative.
+#eval assertGe "scale < 0 breaks PSD (indefinite kernel)" (numNegEigs Kscale) 1.0
+
+end NN.Examples.Factorization.QuadraticKernel
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean b/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean
index dc399aa..fd698b3 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean
@@ -9,6 +9,7 @@ module
 public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import Mathlib.Data.Real.StarOrdered
+public import Mathlib.Analysis.Matrix.Order
 
 /-!
 # CHD mode kernels are symmetric positive-semidefinite
@@ -30,8 +31,21 @@ a sum of the all-ones matrix (a rank-one Gram, PSD) and a scaled Gram matrix `Φ
 * `linearKernelFn_symm` — `K` is symmetric (a corollary, via `PosSemidef.isHermitian`).
 * `linearKernelSpec_posSemidef` — the tensor-level statement, the form the solve theorems consume.
 
-Quadratic mode (`PosSemidef.hadamard`, the Schur product theorem) and Gaussian mode (Bochner /
-Schoenberg, not in Mathlib v4.30.0) are the natural follow-ons.
+The **quadratic mode** is `K[i,j] = scale·(alpha + ⟨Φ i, Φ j⟩)² + (1 − alpha²·scale)`, which expands
+algebraically to
+
+`K = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ)`,
+
+a sum of: the all-ones Gram (PSD), a nonnegative multiple of the Gram `Φ·Φᵀ` (PSD), and a nonnegative
+multiple of the **Hadamard square** of that Gram — PSD by the **Schur product theorem**
+(`PosSemidef.hadamard`). So `K` is PSD whenever `scale ≥ 0` and `alpha ≥ 0`.
+
+* `quadraticKernelFn_posSemidef` — `(Matrix.of (quadraticKernelFn X w scale alpha)).PosSemidef` for
+  `0 ≤ scale` and `0 ≤ alpha`.
+* `quadraticKernelFn_symm` / `quadraticKernelSpec_posSemidef` — symmetry and the tensor-level form.
+
+Gaussian mode (Bochner / Schoenberg positive-definiteness, not in Mathlib v4.30.0) is the natural
+remaining follow-on.
 -/
 
 @[expose] public section
@@ -96,4 +110,49 @@ theorem linearKernelSpec_posSemidef (X : Spec.Tensor ℝ (.dim n (.dim d .scalar
   rw [hround]
   exact linearKernelFn_posSemidef _ _ hscale
 
+/-- **The quadratic-mode kernel is positive-semidefinite.** For data `X`, selection mask `w`, and
+`scale ≥ 0`, `alpha ≥ 0`, `K[i,j] = scale·(alpha + ⟨Φ i, Φ j⟩)² + (1 − alpha²·scale)` is PSD. The proof
+expands `K = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ)` and adds three PSD pieces, the last via
+the **Schur product theorem** `PosSemidef.hadamard`. -/
+theorem quadraticKernelFn_posSemidef (X : Fin n → Fin d → ℝ) (w : Fin d → ℝ) {scale alpha : ℝ}
+    (hscale : 0 ≤ scale) (halpha : 0 ≤ alpha) :
+    (Matrix.of (Spec.quadraticKernelFn X w scale alpha)).PosSemidef := by
+  -- the masked data as a matrix, the all-ones column, and the data Gram `M = Φ·Φᵀ`
+  set Φ : Matrix (Fin n) (Fin d) ℝ := Matrix.of (Spec.maskColsFn X w) with hΦ
+  set Ψ : Matrix (Fin n) (Fin 1) ℝ := Matrix.of (fun _ _ => 1) with hΨ
+  -- `K = Ψ·Ψᵀ + (2·scale·alpha)·(Φ·Φᵀ) + scale·((Φ·Φᵀ) ⊙ (Φ·Φᵀ))`
+  have hKeq : Matrix.of (Spec.quadraticKernelFn X w scale alpha)
+      = Ψ * Ψᵀ + (2 * scale * alpha) • (Φ * Φᵀ) + scale • ((Φ * Φᵀ) ⊙ (Φ * Φᵀ)) := by
+    ext i j
+    simp only [Matrix.of_apply, Matrix.add_apply, Matrix.smul_apply, smul_eq_mul,
+      Matrix.mul_apply, Matrix.transpose_apply, Matrix.hadamard_apply, Spec.quadraticKernelFn, hΦ, hΨ]
+    rw [dotFn_eq_sum, Fin.sum_univ_one]
+    simp only [Spec.maskColsFn]
+    ring
+  rw [hKeq]
+  have hM : (Φ * Φᵀ).PosSemidef := posSemidef_mul_transpose_self Φ
+  have hc : (0 : ℝ) ≤ 2 * scale * alpha := by positivity
+  exact ((posSemidef_mul_transpose_self Ψ).add (hM.smul hc)).add ((hM.hadamard hM).smul hscale)
+
+/-- The quadratic-mode kernel is symmetric: `K[i,j] = K[j,i]` (via `PosSemidef.isHermitian`, hence `hscale`, `halpha`). -/
+theorem quadraticKernelFn_symm (X : Fin n → Fin d → ℝ) (w : Fin d → ℝ) {scale alpha : ℝ}
+    (hscale : 0 ≤ scale) (halpha : 0 ≤ alpha) (i j : Fin n) :
+    Spec.quadraticKernelFn X w scale alpha i j = Spec.quadraticKernelFn X w scale alpha j i := by
+  have h := (quadraticKernelFn_posSemidef X w hscale halpha).isHermitian
+  have e : (Matrix.of (Spec.quadraticKernelFn X w scale alpha))ᴴ i j
+      = (Matrix.of (Spec.quadraticKernelFn X w scale alpha)) i j := by rw [h]
+  simpa [Matrix.conjTranspose_apply, Matrix.of_apply] using e.symm
+
+/-- **Tensor-level: the quadratic-mode kernel is positive-semidefinite.** The form the verified solve
+consumes, so e.g. `solveRidgeSpec (quadraticKernelSpec X w scale alpha) γ b` is the exact regularized
+solve for any `γ > 0` whenever `scale ≥ 0` and `alpha ≥ 0`. -/
+theorem quadraticKernelSpec_posSemidef (X : Spec.Tensor ℝ (.dim n (.dim d .scalar)))
+    (w : Spec.Tensor ℝ (.dim d .scalar)) {scale alpha : ℝ} (hscale : 0 ≤ scale) (halpha : 0 ≤ alpha) :
+    (Matrix.of (Spec.toMatFn (Spec.quadraticKernelSpec X w scale alpha))).PosSemidef := by
+  have hround : Spec.toMatFn (Spec.quadraticKernelSpec X w scale alpha)
+      = Spec.quadraticKernelFn (Spec.toMatFn X) (Spec.toVecFn w) scale alpha := by
+    funext i j; rfl
+  rw [hround]
+  exact quadraticKernelFn_posSemidef _ _ hscale halpha
+
 end Spec.Factorization
diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
index eac2076..20c1c8a 100644
--- a/NN/Spec/Core/Tensor/Factorizations.lean
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -462,4 +462,22 @@ def linearKernelSpec {n d : Nat} (X : Tensor α (.dim n (.dim d .scalar)))
     (w : Tensor α (.dim d .scalar)) (scale : α) : Tensor α (.dim n (.dim n .scalar)) :=
   ofMatFn (linearKernelFn (toMatFn X) (toVecFn w) scale)
 
+/-- Quadratic-mode kernel matrix `K[i,j] = scale · (alpha + ⟨Φ i, Φ j⟩)² + (1 − alpha²·scale)`,
+`Φ = maskColsFn X w` the masked data. This is exactly CHD `QuadraticMode.vectorized_kernel`. Algebraically
+it expands to `K = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ)` (the last a Hadamard square), so it
+is positive-semidefinite for `scale ≥ 0` and `alpha ≥ 0` by the Schur product theorem — see
+`FactorizationsKernels`. The square is written as a product to stay polymorphic over `Context α`. -/
+def quadraticKernelFn {n d : Nat} (X : Fin n → Fin d → α) (w : Fin d → α) (scale alpha : α) :
+    Fin n → Fin n → α :=
+  fun i j =>
+    let m := dotFn (maskColsFn X w i) (maskColsFn X w j)
+    scale * ((alpha + m) * (alpha + m)) + (1 - alpha * alpha * scale)
+
+/-- Tensor-level quadratic-mode kernel from data `X`, selection mask `w`, `scale`, and offset `alpha`.
+
+PyTorch analogue: `scale * (alpha + (X * which_dim).matmul(X.T))**2 + (1 - alpha**2 * scale)`. -/
+def quadraticKernelSpec {n d : Nat} (X : Tensor α (.dim n (.dim d .scalar)))
+    (w : Tensor α (.dim d .scalar)) (scale alpha : α) : Tensor α (.dim n (.dim n .scalar)) :=
+  ofMatFn (quadraticKernelFn (toMatFn X) (toVecFn w) scale alpha)
+
 end Spec
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 7a3dcb9..f764e37 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -251,12 +251,31 @@ hypothesis left to assume. The `LinearKernel` example confirms `K = Kᵀ`, the m
 downstream exact ridge solve; its *negative control* takes `scale = -1`, where `𝟙𝟙ᵀ − Φ Φᵀ` is
 indefinite and a Jacobi eigenvalue goes negative — so `scale ≥ 0` is necessary.
 
-The other two modes are the natural follow-ons. The *quadratic* kernel is an entrywise square of a
-PSD matrix, so it is PSD by the *Schur product theorem* (`PosSemidef.hadamard`, available in
-Mathlib), under the hyperparameter sign conditions. The *Gaussian* (RBF) kernel
-`exp(-(xᵢ-xⱼ)²/2\ell²)` is PSD by Schoenberg's theorem — writing `exp(xy/\ell²)` as a power series whose
-terms are Hadamard powers of a rank-one Gram — but Mathlib v4.30.0 has no Bochner/Gaussian-kernel PSD
-theory, so it is the new honest research-grade item, parallel to the cyclic-Jacobi rate.
+# Building the kernel: the quadratic mode is positive-semidefinite
+
+The *quadratic* mode (`QuadraticMode.vectorized_kernel`) is the second kernel CHD builds:
+
+$$`K[i,j] = \texttt{scale}\cdot(\alpha + \langle \Phi_i, \Phi_j\rangle)^2 + (1 - \alpha^2\texttt{scale}).`
+
+Squaring and collecting terms makes the PSD structure explicit:
+
+$$`K = \mathbf{1}\mathbf{1}^\top + (2\,\texttt{scale}\,\alpha)\cdot \Phi\,\Phi^\top
+       + \texttt{scale}\cdot\bigl(\Phi\,\Phi^\top \odot \Phi\,\Phi^\top\bigr),`
+
+a sum of three PSD pieces: the all-ones Gram, a nonnegative multiple of the data Gram `Φ Φᵀ`, and a
+nonnegative multiple of its *Hadamard square* `Φ Φᵀ ⊙ Φ Φᵀ`. The last is PSD by the *Schur product
+theorem* `PosSemidef.hadamard` (the Hadamard product of PSD matrices is PSD), which Mathlib v4.30.0
+provides. `quadraticKernelFn_posSemidef` assembles the three with `PosSemidef.add`/`PosSemidef.smul`
+and proves `K` PSD whenever `scale ≥ 0` *and* `alpha ≥ 0` — both are needed: the
+`QuadraticKernel` example's two *negative controls* take `alpha = -1` and `scale = -1`, and each makes
+a Jacobi eigenvalue go negative. As with the linear mode, this discharges the standing `PosSemidef`
+hypothesis, so `solveRidgeSpec (quadraticKernelSpec X w scale alpha) γ b` is an unconditional exact
+solve for `γ > 0`, and `quadraticKernelFn_symm` gives symmetry from `PosSemidef.isHermitian`.
+
+The remaining mode is the *Gaussian* (RBF) kernel `exp(-(xᵢ-xⱼ)²/2\ell²)`, PSD by Schoenberg's theorem
+— writing `exp(xy/\ell²)` as a power series whose terms are Hadamard powers of a rank-one Gram — but
+Mathlib v4.30.0 has no Bochner/Gaussian-kernel PSD theory, so it remains the honest research-grade
+item, parallel to the cyclic-Jacobi rate.
 
 # The a-posteriori residual certificate
 

From 7713616e58d57e902963fc30ce934acc72658ab6 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 16:11:24 -0700
Subject: [PATCH 14/22] Prove the CHD Gaussian-mode kernel is
 positive-semidefinite
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Discharge the third and last CHD kernel mode's PosSemidef hypothesis,
without Bochner/Schoenberg (absent from Mathlib v4.30.0), via an
elementary Hadamard-exponential route that reuses the Schur product
theorem already used for the quadratic mode.

Spec (Factorizations.lean): gaussianKernelFn/gaussianKernelSpec,
K[i,j] = scale·∏_dim (1 + w[dim]·exp(−(X[i,dim]−X[j,dim])²/2l²)),
matching CHD GaussianMode (foldl product + squared-diff-as-product to
stay polymorphic over the law-free Context).
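
For orientation, a minimal standalone sketch of one kernel entry over
Float. It assumes plain list inputs rather than the shape-indexed
Spec.Tensor; the name gaussEntrySketch and its argument order are
hypothetical and not part of this patch:

```lean
/-- Hypothetical sketch: one entry of the Gaussian-mode product kernel,
`scale · ∏_dim (1 + w[dim] · exp(−(xi[dim] − xj[dim])² / (2·l²)))`,
over plain `Float` lists (not the `Spec.Tensor` API). -/
def gaussEntrySketch (scale l : Float) (w xi xj : List Float) : Float :=
  scale * ((w.zip (xi.zip xj)).foldl
    (fun acc t =>
      let dx := t.2.1 - t.2.2
      -- squared difference written as a product, as in the spec
      acc * (1.0 + t.1 * Float.exp (-(dx * dx) / (2.0 * l * l))))
    1.0)

-- Diagonal entry (xi = xj) with mask [1, 1]: each factor is 1 + exp 0 = 2.
#eval gaussEntrySketch 1.0 1.0 [1.0, 1.0] [1.0, 0.0] [1.0, 0.0]
```

With scale = 1 the diagonal entry should evaluate to 2 · 2 = 4, which is
the value expected on the diagonal of the example kernel when both
features are active.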

Proof (FactorizationsKernels.lean), sorry/admit/omega-free:
- posSemidef_of_tendsto — the PSD cone is closed under entrywise limits
  (the one genuinely new, upstreamable lemma: the quadratic form is
  continuous in the entries, {≥0} is closed).
- posSemidef_map_exp — entrywise exp of a PSD matrix is PSD via the
  Hadamard-power series exp∘G = Σ G^∘k/k!.
- posSemidef_gaussianCol — a single Gaussian exp(−c(yᵢ−yⱼ)²) is PSD by
  diagonal congruence of the entrywise exponential of the rank-one Gram.
- gaussianKernelFn_posSemidef — each feature factor 𝟙𝟙ᵀ + w·Gaussian is
  PSD, product over features via the Schur product theorem; PSD for
  scale ≥ 0 and a nonnegative mask w ≥ 0. Plus _symm and the
  tensor-level gaussianKernelSpec_posSemidef.
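
The scalar identity behind the diagonal congruence in
posSemidef_gaussianCol can be checked in isolation. The sketch below
assumes a blanket `import Mathlib` and mirrors the Real.exp_add / ring
step of the proof; it is an illustration, not a lemma added by this
patch:

```lean
import Mathlib

-- Scalar identity behind the congruence D·(exp∘(2c·yyᵀ))·D:
-- exp(−c(a−b)²) = exp(−c·a²) · exp(2c·a·b) · exp(−c·b²).
example (c a b : ℝ) :
    Real.exp (-(c * ((a - b) * (a - b))))
      = Real.exp (-(c * (a * a))) * Real.exp (2 * c * (a * b))
          * Real.exp (-(c * (b * b))) := by
  -- merge the three exponentials into one, then compare exponents
  rw [← Real.exp_add, ← Real.exp_add]
  congr 1
  ring
```

The outer factors are the diagonal entries of D and the middle factor is
the (i,j) entry of the entrywise exponential of 2c·yyᵀ.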

Example (GaussianKernel.lean): 7/7 #eval checks — symmetric, matches
the CHD GaussianMode formula, PSD (no negative eigenvalue), masked PSD,
exact ridge solve; negative controls scale=−1 and a negative mask
weight w=[−2,0] both correctly rejected.

Blueprint Ch4: replaced the "research-grade" Gaussian paragraph with a
section documenting the discharge; all three modes now PSD-verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |  10 +
 NN/Examples/Factorization/GaussianKernel.lean | 155 +++++++++++
 .../Tensor/Basic/FactorizationsKernels.lean   | 241 +++++++++++++++++-
 NN/Spec/Core/Tensor/Factorizations.lean       |  25 ++
 .../Ch4_Verification/Factorizations.lean      |  55 +++-
 5 files changed, 477 insertions(+), 9 deletions(-)
 create mode 100644 NN/Examples/Factorization/GaussianKernel.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 424a2f1..d6774e7 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -17,6 +17,7 @@ public import NN.Examples.Factorization.RidgeSolve
 public import NN.Examples.Factorization.Variational
 public import NN.Examples.Factorization.LinearKernel
 public import NN.Examples.Factorization.QuadraticKernel
+public import NN.Examples.Factorization.GaussianKernel
 
 /-!
 # Matrix factorization examples
@@ -74,6 +75,15 @@ factorization misbehaves.
   `K = Kᵀ`, matches the CHD `QuadraticMode` formula, all Jacobi eigenvalues `≥ 0` (masking preserved),
   PSD kernel feeds an exact ridge solve; **negative controls**: both `alpha < 0` and `scale < 0` make
   `K` indefinite, so both bounds are necessary.
+- `GaussianKernel` — CHD's *Gaussian* (fully-nonlinear) mode (`Modes/kernels.py`),
+  `K = scale·∏_dim (1 + w[dim]·exp(−(X[i,dim]−X[j,dim])²/2l²))`, proven symmetric positive-semidefinite
+  for `scale ≥ 0` and a nonnegative mask `w ≥ 0` (`gaussianKernelFn_posSemidef`) — *without*
+  Bochner/Schoenberg, via the entrywise-exponential Hadamard-power series (the PSD cone closed under
+  limits) and the **Schur product theorem** over features. Checks mirror the other modes: `K = Kᵀ`,
+  matches the CHD `GaussianMode` product formula, all Jacobi eigenvalues `≥ 0` (masking preserved), PSD
+  kernel feeds an exact ridge solve; **negative controls**: `scale < 0` and a *negative mask weight*
+  (`w = [−2,0]`, which drives the diagonal below zero) both make `K` non-PSD. With the linear,
+  quadratic, and Gaussian modes all discharged, every CHD kernel build is now PSD-verified.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/GaussianKernel.lean b/NN/Examples/Factorization/GaussianKernel.lean
new file mode 100644
index 0000000..112b7f4
--- /dev/null
+++ b/NN/Examples/Factorization/GaussianKernel.lean
@@ -0,0 +1,155 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: the CHD Gaussian-mode product kernel is symmetric positive-semidefinite
+
+These checks corroborate `NN.Proofs.Tensor.Basic.FactorizationsKernels`. As with the linear and
+quadratic modes, the whole verified CHD solve / `find_gamma` / `Z_test` development assumes the kernel
+`K` is positive-semidefinite, and CHD *builds* `K` from data (`Modes/kernels.py`). The Gaussian
+(fully-nonlinear) mode introduces the per-feature Gaussian `exp(−Δ²/2l²)`, whose product contribution is
+
+`K[i,j] = scale · ∏_dim (1 + w[dim] · exp(−(X[i,dim]−X[j,dim])²/2l²))`   (`scale · jnp.prod(1 + w·exps)`).
+
+`gaussianKernelFn_posSemidef` proves `K` PSD for `scale ≥ 0` and a nonnegative mask `w ≥ 0`, *without*
+Bochner/Schoenberg (absent from Mathlib): the entrywise exponential of a PSD matrix is PSD (a
+Hadamard-power series, the PSD cone closed under limits), each feature factor `𝟙𝟙ᵀ + w·Gaussian` is PSD,
+and the product over features is PSD by the **Schur product theorem**. We exhibit:
+
+* **symmetric** — `K = Kᵀ` to machine precision (`gaussianKernelFn_symm`);
+* **matches CHD** — `K[i,j]` agrees with the direct `scale · ∏_dim (1 + w·exp(−Δ²/2l²))` formula;
+* **positive-semidefinite** — every Jacobi eigenvalue is `≥ 0` (the numeric witness of
+  `gaussianKernelFn_posSemidef`), and masking a feature (`w = [1,0]`) keeps it PSD;
+* **feeds the verified solve** — because `K` is PSD, `solveRidgeSpec K γ b` is the exact regularized
+  solve for `γ > 0` (`(K+γ·I)·x = b` to machine precision).
+
+**Negative controls**: with `scale < 0` the whole kernel flips sign, and with a *negative* mask weight
+(`w = [−2,0]`) a feature factor `1 − 2·exp(−Δ²/2l²)` drives the diagonal below zero — in both cases a
+Jacobi eigenvalue goes negative, so `scale ≥ 0` *and* `w ≥ 0` are both necessary.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.GaussianKernel
+
+/-- Build a length-`n` `Float` vector from a list (missing entries `0`). -/
+def mkVec {n : Nat} (xs : List Float) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i => xs.getD i.val 0.0)
+
+/-- Count Jacobi eigenvalues that are negative (below `−10⁻⁹`). `0` certifies positive-semidefiniteness
+numerically; `≥ 1` certifies that the matrix is not PSD. -/
+def numNegEigs {k : Nat} (M : Spec.Tensor Float (.dim k (.dim k .scalar))) : Float :=
+  let evals := (Spec.symEigJacobiSpec M 12).1
+  (List.finRange k).foldl
+    (fun a i => a + (if Spec.Tensor.toScalar (Spec.get evals i) < -1e-9 then 1.0 else 0.0)) 0.0
+
+/-- The regularized matrix `K + γ·I` as a tensor. -/
+def addGammaI {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float) :
+    Spec.Tensor Float (.dim n (.dim n .scalar)) :=
+  Spec.ofMatFn (fun i j => Spec.get2 K i j + (if i.val == j.val then γ else 0.0))
+
+/-- `ℓ¹` magnitude `Σᵢ |vᵢ|` (a sum, so a `NaN` propagates). -/
+def vecAbsErr {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : Float :=
+  (List.finRange n).foldl (fun a i => a + Float.abs (Spec.Tensor.toScalar (Spec.get v i))) 0.0
+
+/-- Residual `(K + γ·I)·x − b`. -/
+def ridgeResidual {n : Nat} (K : Spec.Tensor Float (.dim n (.dim n .scalar))) (γ : Float)
+    (b x : Spec.Tensor Float (.dim n .scalar)) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i =>
+    Spec.Tensor.toScalar (Spec.get (Spec.matVecMulSpec (addGammaI K γ) x) i)
+      - Spec.Tensor.toScalar (Spec.get b i))
+
+/-- A `4 × 2` data matrix (4 samples, 2 features). -/
+def X : Spec.Tensor Float (.dim 4 (.dim 2 .scalar)) :=
+  mkMat [[1, 0],
+         [0, 1],
+         [1, 1],
+         [2, 1]]
+
+/-- Selection mask `which_dim = [1,1]` (both features active). -/
+def wAll : Spec.Tensor Float (.dim 2 .scalar) := mkVec [1, 1]
+/-- Kernel scale (CHD `GaussianMode._scale`). -/
+def scale : Float := 1.0
+/-- Gaussian length scale (CHD `GaussianMode.l`). -/
+def l : Float := 1.0
+
+/-- The Gaussian-mode product kernel `K = scale · ∏_dim (1 + w·exp(−Δ²/2l²))` (4×4). -/
+def K : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.gaussianKernelSpec X wAll scale l
+
+/-- Direct CHD `GaussianMode` product formula (mask all-ones):
+`Kref[i,j] = scale · ∏_k (1 + w[k]·exp(−(X[i,k]−X[j,k])²/2l²))`. -/
+def Kref : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) :=
+  Spec.ofMatFn (fun i j =>
+    scale * (List.finRange 2).foldl
+      (fun acc k =>
+        let dx := Spec.get2 X i k - Spec.get2 X j k
+        acc * (1.0 + Spec.Tensor.toScalar (Spec.get wAll k) * Float.exp (-(dx * dx) / (2.0 * l * l))))
+      1.0)
+
+#eval IO.println s!"Gaussian kernel K =\n{(List.finRange 4).map (fun i =>
+  (List.finRange 4).map (fun j => Spec.get2 K i j))}"
+#eval IO.println s!"eigenvalues of K = {vecToList (Spec.symEigJacobiSpec K 12).1}"
+
+-- Positive — `K` is symmetric (`gaussianKernelFn_symm`).
+#eval assertLt "Gaussian kernel is symmetric: K = Kᵀ" (maxMatErr K (tr K))
+
+-- Positive — `K` matches the direct CHD `GaussianMode` product formula.
+#eval assertLt "Gaussian kernel matches CHD GaussianMode formula" (maxMatErr K Kref)
+
+-- Positive — `K` is PSD: no negative Jacobi eigenvalue (`gaussianKernelFn_posSemidef`).
+#eval assertLt "Gaussian kernel is PSD: no negative eigenvalue" (numNegEigs K)
+
+/-! ## Masking a feature preserves PSD -/
+
+/-- Mask out feature 1: `which_dim = [1,0]` (still `w ≥ 0`). -/
+def wMask : Spec.Tensor Float (.dim 2 .scalar) := mkVec [1, 0]
+def Kmask : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.gaussianKernelSpec X wMask scale l
+
+#eval IO.println s!"masked-feature kernel eigenvalues = {vecToList (Spec.symEigJacobiSpec Kmask 12).1}"
+
+-- Positive — masking a feature keeps the kernel PSD (PSD holds for any nonnegative mask).
+#eval assertLt "masked Gaussian kernel is still PSD" (numNegEigs Kmask)
+
+/-! ## The PSD kernel feeds the verified ridge solve -/
+
+def γ : Float := 0.5
+def b : Spec.Tensor Float (.dim 4 .scalar) := mkVec [1, 2, 3, 4]
+/-- The ridge solution against the Gaussian kernel `K`. -/
+def x : Spec.Tensor Float (.dim 4 .scalar) := Spec.solveRidgeSpec K γ b
+
+#eval IO.println s!"ridge solve on the Gaussian kernel: residual = {vecToList (ridgeResidual K γ b x)}"
+
+-- Positive — `K` PSD ⟹ `solveRidgeSpec K γ b` is the exact solve of `(K+γI)·x = b` (γ > 0).
+#eval assertLt "PSD Gaussian kernel ⟹ exact ridge solve (K+γI)·x = b"
+  (vecAbsErr (ridgeResidual K γ b x))
+
+/-! ## Negative controls: `scale < 0` and a negative mask weight break positive-semidefiniteness -/
+
+/-- The same kernel with `scale = −1`: the whole product is negated. -/
+def Kscale : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.gaussianKernelSpec X wAll (-1.0) l
+
+#eval IO.println s!"scale = -1 kernel eigenvalues = {vecToList (Spec.symEigJacobiSpec Kscale 12).1}"
+
+-- Negative — with `scale < 0` the kernel is not PSD: at least one eigenvalue is negative.
+#eval assertGe "scale < 0 breaks PSD (negative eigenvalue)" (numNegEigs Kscale) 1.0
+
+/-- A negative mask weight `w = [−2,0]`: the feature factor `1 − 2·exp(−Δ²/2l²)` makes the diagonal
+(`Δ = 0 ⟹ 1 − 2 = −1`) negative, so the kernel cannot be PSD. -/
+def wNeg : Spec.Tensor Float (.dim 2 .scalar) := mkVec [-2, 0]
+def Kw : Spec.Tensor Float (.dim 4 (.dim 4 .scalar)) := Spec.gaussianKernelSpec X wNeg scale l
+
+#eval IO.println s!"w = [-2,0] kernel eigenvalues = {vecToList (Spec.symEigJacobiSpec Kw 12).1}"
+
+-- Negative — a negative mask weight makes the kernel non-PSD: at least one eigenvalue is negative.
+#eval assertGe "negative mask weight breaks PSD (negative eigenvalue)" (numNegEigs Kw) 1.0
+
+end NN.Examples.Factorization.GaussianKernel
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean b/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean
index fd698b3..864c6df 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsKernels.lean
@@ -10,6 +10,9 @@ public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import Mathlib.Data.Real.StarOrdered
 public import Mathlib.Analysis.Matrix.Order
+public import Mathlib.Analysis.Normed.Algebra.Exponential
+public import Mathlib.Analysis.SpecialFunctions.Exponential
+public import Mathlib.Topology.Instances.Matrix
 
 /-!
 # CHD mode kernels are symmetric positive-semidefinite
@@ -44,8 +47,22 @@ multiple of the **Hadamard square** of that Gram — PSD by the **Schur product
   `0 ≤ scale` and `0 ≤ alpha`.
 * `quadraticKernelFn_symm` / `quadraticKernelSpec_posSemidef` — symmetry and the tensor-level form.
 
-Gaussian mode (Bochner / Schoenberg positive-definiteness, not in Mathlib v4.30.0) is the natural
-remaining follow-on.
+The **Gaussian mode** product kernel
+`K[i,j] = scale · ∏_dim (1 + w[dim]·exp(−(X[i,dim]−X[j,dim])²/2l²))` is also discharged here, *without*
+Bochner/Schoenberg (absent from Mathlib v4.30.0), via an elementary Hadamard-exponential route that
+reuses the same Schur product theorem:
+
+* `posSemidef_of_tendsto` — the PSD cone is **closed under entrywise limits** (the one genuinely new,
+  independently-useful lemma: the quadratic form is continuous in the entries, `{≥0}` is closed).
+* `posSemidef_map_exp` — the **entrywise exponential** of a PSD matrix is PSD: `exp∘G = Σₖ G^∘k/k!`,
+  each Hadamard power `G^∘k` PSD by the Schur product theorem, the partial sums PSD, the limit PSD.
+* `posSemidef_gaussianCol` — a single Gaussian matrix `exp(−c·(yᵢ−yⱼ)²)` is PSD (`c ≥ 0`), by writing it
+  as `D·(exp∘(2c·yyᵀ))·Dᵀ` — a diagonal congruence of an entrywise-exponential of the rank-one Gram.
+* `gaussianKernelFn_posSemidef` — each feature factor `𝟙𝟙ᵀ + w[dim]·Gaussian` is PSD, and the product
+  over features is PSD by the Schur product theorem; so `K` is PSD for `scale ≥ 0` and a mask `w ≥ 0`.
+  `gaussianKernelFn_symm` / `gaussianKernelSpec_posSemidef` give symmetry and the tensor-level form.
+
+All three CHD kernel modes (linear, quadratic, Gaussian) are now PSD-discharged.
 -/
 
 @[expose] public section
@@ -155,4 +172,224 @@ theorem quadraticKernelSpec_posSemidef (X : Spec.Tensor ℝ (.dim n (.dim d .sca
   rw [hround]
   exact quadraticKernelFn_posSemidef _ _ hscale halpha
 
+/-! ## The Gaussian mode: an elementary Hadamard-exponential PSD proof
+
+CHD's Gaussian (fully-nonlinear) kernel introduces `exp(−Δ²/2l²)`, which has no *finite* algebraic
+PSD identity. We discharge it without Bochner/Schoenberg by the classical Schur route: the entrywise
+exponential of a PSD matrix is PSD (Hadamard-power series), and the Gaussian is a diagonal congruence
+of such an exponential. The PSD-cone-closed-under-limits lemma is the only genuinely new ingredient. -/
+
+open scoped Topology
+open Filter
+
+variable {N : Nat}
+
+/-- The all-ones matrix `𝟙𝟙ᵀ` is positive-semidefinite (a rank-one Gram). -/
+private theorem posSemidef_ones : (Matrix.of (fun _ _ : Fin N => (1 : ℝ))).PosSemidef := by
+  have h := Matrix.posSemidef_vecMulVec_self_star (fun _ : Fin N => (1 : ℝ))
+  have he : Matrix.vecMulVec (fun _ : Fin N => (1 : ℝ)) (star (fun _ : Fin N => (1 : ℝ)))
+      = Matrix.of (fun _ _ : Fin N => (1 : ℝ)) := by
+    ext i j; simp [Matrix.vecMulVec_apply, Pi.star_apply]
+  rwa [he] at h
+
+/-- **The PSD cone is closed under entrywise limits.** If real positive-semidefinite matrices `A k`
+converge entrywise to `B`, then `B` is positive-semidefinite. The quadratic form `xᵀ·M·x` is continuous
+in `M`'s entries, and `{y | 0 ≤ y}` is closed. -/
+private theorem posSemidef_of_tendsto {A : ℕ → Matrix (Fin N) (Fin N) ℝ}
+    {B : Matrix (Fin N) (Fin N) ℝ} (hA : ∀ k, (A k).PosSemidef)
+    (hlim : Tendsto A atTop (𝓝 B)) : B.PosSemidef := by
+  have hentry : ∀ i j, Tendsto (fun k => A k i j) atTop (𝓝 (B i j)) :=
+    fun i j => (hlim.apply_nhds i).apply_nhds j
+  -- entry symmetry of any real Hermitian matrix
+  have hsymm_entry : ∀ (M : Matrix (Fin N) (Fin N) ℝ), M.IsHermitian → ∀ i j, M i j = M j i := by
+    intro M hM i j
+    have e : Mᴴ i j = M i j := congrFun (congrFun hM i) j
+    rw [Matrix.conjTranspose_apply, star_trivial] at e
+    exact e.symm
+  -- B is Hermitian (symmetric over ℝ)
+  have hBsymm : B.IsHermitian := by
+    ext i j
+    rw [Matrix.conjTranspose_apply, star_trivial]
+    refine tendsto_nhds_unique (hentry j i) ?_
+    have hfun : (fun k => A k j i) = (fun k => A k i j) :=
+      funext fun k => (hsymm_entry (A k) (hA k).isHermitian j i)
+    rw [hfun]; exact hentry i j
+  refine Matrix.PosSemidef.of_dotProduct_mulVec_nonneg hBsymm (fun x => ?_)
+  have hquad : ∀ (M : Matrix (Fin N) (Fin N) ℝ),
+      star x ⬝ᵥ (M *ᵥ x) = ∑ i, ∑ j, star (x i) * (M i j * x j) := by
+    intro M
+    simp only [dotProduct, Matrix.mulVec, Pi.star_apply, Finset.mul_sum]
+  have key : Tendsto (fun k => star x ⬝ᵥ (A k *ᵥ x)) atTop (𝓝 (star x ⬝ᵥ (B *ᵥ x))) := by
+    simp only [hquad]
+    refine tendsto_finsetSum _ (fun i _ => ?_)
+    refine tendsto_finsetSum _ (fun j _ => ?_)
+    exact tendsto_const_nhds.mul ((hentry i j).mul tendsto_const_nhds)
+  exact ge_of_tendsto' key (fun k => (hA k).dotProduct_mulVec_nonneg x)
+
+/-- The `k`-fold Hadamard (entrywise) power of `G`, with `G^∘0 = 𝟙𝟙ᵀ` (the all-ones matrix). -/
+private def hadamardPow (G : Matrix (Fin N) (Fin N) ℝ) : ℕ → Matrix (Fin N) (Fin N) ℝ
+  | 0 => Matrix.of (fun _ _ => 1)
+  | (k + 1) => G ⊙ hadamardPow G k
+
+private theorem hadamardPow_apply (G : Matrix (Fin N) (Fin N) ℝ) (k : ℕ) (i j : Fin N) :
+    hadamardPow G k i j = (G i j) ^ k := by
+  induction k with
+  | zero => simp [hadamardPow]
+  | succ k ih =>
+    rw [hadamardPow, Matrix.hadamard_apply, ih, pow_succ]; ring
+
+private theorem posSemidef_hadamardPow {G : Matrix (Fin N) (Fin N) ℝ} (hG : G.PosSemidef) (k : ℕ) :
+    (hadamardPow G k).PosSemidef := by
+  induction k with
+  | zero => exact posSemidef_ones
+  | succ k ih => exact hG.hadamard ih
+
+/-- **The entrywise exponential of a PSD matrix is PSD.** `exp∘G = Σₖ G^∘k/k!`: each Hadamard power is
+PSD by the Schur product theorem, the partial sums are PSD, and the PSD cone is closed under limits. -/
+private theorem posSemidef_map_exp {G : Matrix (Fin N) (Fin N) ℝ} (hG : G.PosSemidef) :
+    (G.map Real.exp).PosSemidef := by
+  set S : ℕ → Matrix (Fin N) (Fin N) ℝ :=
+    fun n => ∑ k ∈ Finset.range n, ((k.factorial : ℝ)⁻¹) • hadamardPow G k with hS
+  have hSpsd : ∀ n, (S n).PosSemidef := by
+    intro n
+    refine Matrix.posSemidef_sum _ (fun k _ => ?_)
+    exact (posSemidef_hadamardPow hG k).smul (by positivity)
+  have hlim : Tendsto S atTop (𝓝 (G.map Real.exp)) := by
+    refine tendsto_pi_nhds.mpr (fun i => ?_)
+    refine tendsto_pi_nhds.mpr (fun j => ?_)
+    have hentry : (fun n => S n i j)
+        = (fun n => ∑ k ∈ Finset.range n, ((k.factorial : ℝ)⁻¹) * (G i j) ^ k) := by
+      funext n
+      simp only [hS, Matrix.sum_apply, Matrix.smul_apply, smul_eq_mul, hadamardPow_apply]
+    rw [hentry]
+    have hsum : HasSum (fun k => ((k.factorial : ℝ)⁻¹) * (G i j) ^ k) (Real.exp (G i j)) := by
+      have h := NormedSpace.exp_series_hasSum_exp' (𝕂 := ℝ) (G i j)
+      simp only [smul_eq_mul] at h
+      rwa [← Real.exp_eq_exp_ℝ] at h
+    have hmap : (G.map Real.exp) i j = Real.exp (G i j) := by simp [Matrix.map_apply]
+    rw [hmap]
+    exact hsum.tendsto_sum_nat
+  exact posSemidef_of_tendsto hSpsd hlim
+
+/-- **A single Gaussian matrix is positive-semidefinite.** For `c ≥ 0`, the matrix
+`exp(−c·(yᵢ−yⱼ)²)` is PSD: writing the exponent as `−c·yᵢ² + 2c·yᵢyⱼ − c·yⱼ²`, it is the diagonal
+congruence `D·(exp∘(2c·yyᵀ))·Dᵀ` of the entrywise exponential of the (PSD) rank-one Gram `yyᵀ`. -/
+private theorem posSemidef_gaussianCol (y : Fin N → ℝ) {c : ℝ} (hc : 0 ≤ c) :
+    (Matrix.of (fun i j => Real.exp (-(c * ((y i - y j) * (y i - y j)))))).PosSemidef := by
+  set G : Matrix (Fin N) (Fin N) ℝ := (2 * c) • Matrix.vecMulVec y y with hG
+  have hGpsd : G.PosSemidef := by
+    have hv : (Matrix.vecMulVec y (star y)).PosSemidef := Matrix.posSemidef_vecMulVec_self_star y
+    have hstar : Matrix.vecMulVec y (star y) = Matrix.vecMulVec y y := by
+      ext i j; simp [Matrix.vecMulVec_apply]
+    rw [hstar] at hv
+    rw [hG]; exact hv.smul (mul_nonneg (by norm_num) hc)
+  have hMpsd : (G.map Real.exp).PosSemidef := posSemidef_map_exp hGpsd
+  set D : Matrix (Fin N) (Fin N) ℝ := Matrix.diagonal (fun i => Real.exp (-(c * (y i * y i)))) with hD
+  have hcong : (D * (G.map Real.exp) * Dᴴ).PosSemidef := hMpsd.mul_mul_conjTranspose_same D
+  have hDH : (Dᴴ : Matrix (Fin N) (Fin N) ℝ) = D := by
+    rw [hD]; simp
+  rw [hDH] at hcong
+  have heq : D * (G.map Real.exp) * D
+      = Matrix.of (fun i j => Real.exp (-(c * ((y i - y j) * (y i - y j))))) := by
+    ext i j
+    rw [hD, Matrix.mul_diagonal, Matrix.diagonal_mul]
+    simp only [Matrix.of_apply, hG, Matrix.map_apply, Matrix.smul_apply, Matrix.vecMulVec_apply,
+      smul_eq_mul]
+    rw [← Real.exp_add, ← Real.exp_add]
+    congr 1; ring
+  rwa [heq] at hcong
+
+/-- Folding scalar multiplication over a list is the product of the mapped list. -/
+private theorem foldl_mul_eq_prod {ι : Type} (l : List ι) (g : ι → ℝ) (a : ℝ) :
+    l.foldl (fun acc x => acc * g x) a = a * (l.map g).prod := by
+  induction l generalizing a with
+  | nil => simp
+  | cons x xs ih => simp only [List.foldl_cons, List.map_cons, List.prod_cons, ih]; ring
+
+/-- A Hadamard product (over a finset) of positive-semidefinite matrices is positive-semidefinite —
+the Schur product theorem, iterated. -/
+private theorem posSemidef_prod_hadamard {ι : Type} [DecidableEq ι]
+    (F : ι → Matrix (Fin N) (Fin N) ℝ) (s : Finset ι) (hF : ∀ k ∈ s, (F k).PosSemidef) :
+    (Matrix.of (fun i j => ∏ k ∈ s, (F k) i j)).PosSemidef := by
+  induction s using Finset.induction with
+  | empty => simpa only [Finset.prod_empty] using (posSemidef_ones (N := N))
+  | @insert a s ha ih =>
+    rw [show (Matrix.of (fun i j => ∏ k ∈ insert a s, (F k) i j))
+        = (F a) ⊙ Matrix.of (fun i j => ∏ k ∈ s, (F k) i j) from by
+          ext i j; simp only [Matrix.hadamard_apply, Matrix.of_apply, Finset.prod_insert ha]]
+    exact (hF a (Finset.mem_insert_self a s)).hadamard
+      (ih (fun k hk => hF k (Finset.mem_insert_of_mem hk)))
+
+variable {n d : Nat}
+
+/-- **The Gaussian-mode product kernel is positive-semidefinite.** For data `X`, a nonnegative
+selection mask `w ≥ 0`, and `scale ≥ 0`,
+`K[i,j] = scale · ∏_dim (1 + w[dim]·exp(−(X[i,dim]−X[j,dim])²/2l²))` is PSD. Each feature factor
+`𝟙𝟙ᵀ + w[dim]·Gaussian` is PSD (`posSemidef_ones` + `posSemidef_gaussianCol`), and the product over
+features is PSD by the **Schur product theorem** (`posSemidef_prod_hadamard`). -/
+theorem gaussianKernelFn_posSemidef (X : Fin n → Fin d → ℝ) (w : Fin d → ℝ) {scale l : ℝ}
+    (hscale : 0 ≤ scale) (hw : ∀ k, 0 ≤ w k) :
+    (Matrix.of (Spec.gaussianKernelFn X w scale l)).PosSemidef := by
+  -- the per-feature factor matrices
+  set F : Fin d → Matrix (Fin n) (Fin n) ℝ :=
+    fun k => Matrix.of (fun i j => 1 + w k *
+      Real.exp (-((X i k - X j k) * (X i k - X j k)) / ((1 + 1) * l * l))) with hF
+  have hFpsd : ∀ k, (F k).PosSemidef := by
+    intro k
+    -- the per-feature Gaussian is PSD via `posSemidef_gaussianCol`
+    have hGauss : (Matrix.of (fun i j =>
+        Real.exp (-((X i k - X j k) * (X i k - X j k)) / ((1 + 1) * l * l)))).PosSemidef := by
+      have h := posSemidef_gaussianCol (fun i => X i k)
+        (c := ((1 + 1) * l * l)⁻¹) (inv_nonneg.mpr (by nlinarith [mul_self_nonneg l]))
+      have he : (Matrix.of (fun i j =>
+          Real.exp (-(((1 + 1) * l * l)⁻¹ * ((X i k - X j k) * (X i k - X j k))))))
+          = Matrix.of (fun i j =>
+            Real.exp (-((X i k - X j k) * (X i k - X j k)) / ((1 + 1) * l * l))) := by
+        ext i j
+        show Real.exp _ = Real.exp _
+        congr 1; ring
+      rwa [he] at h
+    -- F k = 𝟙𝟙ᵀ + w k • Gaussian
+    have hsplit : F k = Matrix.of (fun _ _ : Fin n => (1 : ℝ))
+        + (w k) • Matrix.of (fun i j =>
+            Real.exp (-((X i k - X j k) * (X i k - X j k)) / ((1 + 1) * l * l))) := by
+      rw [hF]; ext i j
+      simp only [Matrix.add_apply, Matrix.smul_apply, Matrix.of_apply, smul_eq_mul]
+    rw [hsplit]
+    exact posSemidef_ones.add (hGauss.smul (hw k))
+  -- the product matrix is PSD
+  have hPpsd : (Matrix.of (fun i j => ∏ k, (F k) i j)).PosSemidef :=
+    posSemidef_prod_hadamard F Finset.univ (fun k _ => hFpsd k)
+  -- the kernel is `scale • (product matrix)`
+  have hKeq : Matrix.of (Spec.gaussianKernelFn X w scale l)
+      = scale • Matrix.of (fun i j => ∏ k, (F k) i j) := by
+    ext i j
+    rw [Matrix.smul_apply, Matrix.of_apply, Matrix.of_apply, smul_eq_mul, Spec.gaussianKernelFn,
+      foldl_mul_eq_prod, one_mul, ← List.ofFn_eq_map, List.prod_ofFn]
+    rfl
+  rw [hKeq]
+  exact hPpsd.smul hscale
+
+/-- The Gaussian-mode product kernel is symmetric: `K[i,j] = K[j,i]`. -/
+theorem gaussianKernelFn_symm (X : Fin n → Fin d → ℝ) (w : Fin d → ℝ) {scale l : ℝ}
+    (hscale : 0 ≤ scale) (hw : ∀ k, 0 ≤ w k) (i j : Fin n) :
+    Spec.gaussianKernelFn X w scale l i j = Spec.gaussianKernelFn X w scale l j i := by
+  have h := (gaussianKernelFn_posSemidef (scale := scale) (l := l) X w hscale hw).isHermitian
+  have e : (Matrix.of (Spec.gaussianKernelFn X w scale l))ᴴ i j
+      = (Matrix.of (Spec.gaussianKernelFn X w scale l)) i j := by rw [h]
+  simpa [Matrix.conjTranspose_apply, Matrix.of_apply] using e.symm
+
+/-- **Tensor-level: the Gaussian-mode product kernel is positive-semidefinite.** The form the verified
+solve consumes, so e.g. `solveRidgeSpec (gaussianKernelSpec X w scale l) γ b` is the exact regularized
+solve for any `γ > 0` whenever `scale ≥ 0` and the mask `w ≥ 0`. -/
+theorem gaussianKernelSpec_posSemidef (X : Spec.Tensor ℝ (.dim n (.dim d .scalar)))
+    (w : Spec.Tensor ℝ (.dim d .scalar)) {scale l : ℝ} (hscale : 0 ≤ scale)
+    (hw : ∀ k, 0 ≤ Spec.toVecFn w k) :
+    (Matrix.of (Spec.toMatFn (Spec.gaussianKernelSpec X w scale l))).PosSemidef := by
+  have hround : Spec.toMatFn (Spec.gaussianKernelSpec X w scale l)
+      = Spec.gaussianKernelFn (Spec.toMatFn X) (Spec.toVecFn w) scale l := by
+    funext i j; rfl
+  rw [hround]
+  exact gaussianKernelFn_posSemidef _ _ hscale hw
+
 end Spec.Factorization
diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
index 20c1c8a..6e6c675 100644
--- a/NN/Spec/Core/Tensor/Factorizations.lean
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -480,4 +480,29 @@ def quadraticKernelSpec {n d : Nat} (X : Tensor α (.dim n (.dim d .scalar)))
     (w : Tensor α (.dim d .scalar)) (scale alpha : α) : Tensor α (.dim n (.dim n .scalar)) :=
   ofMatFn (quadraticKernelFn (toMatFn X) (toVecFn w) scale alpha)
 
+/-- Gaussian-mode product kernel matrix
+`K[i,j] = scale · ∏_dim (1 + w[dim] · exp(−(X[i,dim]−X[j,dim])²/(2·l²)))` — the Gaussian contribution of
+CHD `GaussianMode` (`scale * jnp.prod(1 + which_dim * exps, axis=2)`, with
+`exps[i,j,dim] = exp(−(X[i,dim]−X[j,dim])²/(2·l²))`).
+
+Each feature factor `1 + w[dim]·k` (with `k` the per-feature Gaussian) is positive-semidefinite — `𝟙𝟙ᵀ`
+plus a nonnegative multiple of the Gaussian PSD matrix — and the product over features is PSD by the
+**Schur product theorem**, so `K` is PSD for `scale ≥ 0` and a nonnegative mask `w ≥ 0`
+(see `FactorizationsKernels`). The product is an explicit `foldl` and the squared difference a product, to
+stay polymorphic over the law-free `Context α`. -/
+def gaussianKernelFn {n d : Nat} (X : Fin n → Fin d → α) (w : Fin d → α) (scale l : α) :
+    Fin n → Fin n → α :=
+  fun i j => scale * (List.finRange d).foldl
+    (fun acc dim =>
+      acc * (1 + w dim *
+        MathFunctions.exp (-((X i dim - X j dim) * (X i dim - X j dim)) / ((1 + 1) * l * l)))) 1
+
+/-- Tensor-level Gaussian-mode product kernel from data `X`, selection mask `w`, `scale`, and length
+scale `l`.
+
+PyTorch analogue: `scale * torch.prod(1 + which_dim * torch.exp(-(dx**2)/(2*l**2)), dim=2)`. -/
+def gaussianKernelSpec {n d : Nat} (X : Tensor α (.dim n (.dim d .scalar)))
+    (w : Tensor α (.dim d .scalar)) (scale l : α) : Tensor α (.dim n (.dim n .scalar)) :=
+  ofMatFn (gaussianKernelFn (toMatFn X) (toVecFn w) scale l)
+
 end Spec
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index f764e37..0624096 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -272,10 +272,47 @@ a Jacobi eigenvalue go negative. As with the linear mode, this discharges the st
 hypothesis, so `solveRidgeSpec (quadraticKernelSpec X w scale alpha) γ b` is an unconditional exact
 solve for `γ > 0`, and `quadraticKernelFn_symm` gives symmetry from `PosSemidef.isHermitian`.
 
-The remaining mode is the *Gaussian* (RBF) kernel `exp(-(xᵢ-xⱼ)²/2\ell²)`, PSD by Schoenberg's theorem
-— writing `exp(xy/\ell²)` as a power series whose terms are Hadamard powers of a rank-one Gram — but
-Mathlib v4.30.0 has no Bochner/Gaussian-kernel PSD theory, so it remains the honest research-grade
-item, parallel to the cyclic-Jacobi rate.
+# Building the kernel: the Gaussian mode is positive-semidefinite
+
+The third and last mode is the *Gaussian* (fully-nonlinear) kernel. CHD's `GaussianMode` builds, per
+feature, the Gaussian `exp(-(X_{i,d}-X_{j,d})^2/2\ell^2)` and takes their masked product:
+
+$$`K[i,j] = \texttt{scale}\cdot\prod_{d} \bigl(1 + w_d\,\exp(-(X_{i,d}-X_{j,d})^2/2\ell^2)\bigr).`
+
+Unlike the linear and quadratic modes, the Gaussian has *no finite algebraic PSD identity* — `exp` is a
+genuine limit. The textbook proof is Schoenberg/Bochner, which Mathlib v4.30.0 does not have. But there
+is an elementary route that reuses the *same* Schur product theorem, and
+`gaussianKernelFn_posSemidef` carries it out. It rests on one genuinely new, independently useful
+lemma and three assembly steps:
+
+- *The PSD cone is closed under entrywise limits* (`posSemidef_of_tendsto`): if real PSD matrices `A_k`
+  converge entrywise to `B`, then `B` is PSD. The quadratic form `x^\top M x` is a finite polynomial in
+  the entries, hence continuous, and `\{y \mid 0 \le y\}` is closed — so `0 \le x^\top A_k x` passes to
+  the limit. This is the only piece Mathlib lacked, and it belongs in Mathlib.
+- *The entrywise exponential of a PSD matrix is PSD* (`posSemidef_map_exp`): writing
+  `\exp\circ G = \sum_k G^{\odot k}/k!` (Hadamard powers), each `G^{\odot k}` is PSD by the Schur
+  product theorem, each partial sum is PSD (a finite sum of PSD matrices), and the partial sums converge
+  entrywise to `\exp\circ G` (the real exponential series) — so the limit is PSD by the lemma above.
+- *A single Gaussian matrix is PSD* (`posSemidef_gaussianCol`): for `c \ge 0`, the matrix
+  `\exp(-c\,(y_i-y_j)^2)` factors as the diagonal congruence
+  `D\,(\exp\circ(2c\,yy^\top))\,D^\top` with `D = \operatorname{diag}(\exp(-c\,y_i^2))`; the middle
+  factor is the entrywise exponential of the (PSD, rank-one) Gram `yy^\top`, and congruence preserves
+  PSD.
+- *Each feature factor and their product* (`gaussianKernelFn_posSemidef`): `\mathbf{1}\mathbf{1}^\top +
+  w_d\cdot\text{Gaussian}_d` is PSD for `w_d \ge 0`, and the product over features is a Hadamard product
+  of PSD matrices — PSD by the Schur product theorem again. Scaling by `\texttt{scale} \ge 0` finishes.
+
+So `K` is PSD whenever `scale ≥ 0` and the mask is nonnegative (`w ≥ 0`) — discharging the standing
+`PosSemidef` hypothesis for the Gaussian mode, and `gaussianKernelFn_symm` gives symmetry from
+`PosSemidef.isHermitian`. The `GaussianKernel` example confirms `K = Kᵀ`, the match with the CHD
+`GaussianMode` product formula, all-nonnegative Jacobi eigenvalues (with feature masking preserved),
+and the downstream exact ridge solve; its two *negative controls* take `scale = -1` and a *negative
+mask weight* `w = [-2,0]` (whose factor `1 - 2\exp(-\Delta^2/2\ell^2)` drives the diagonal below zero),
+each producing a negative eigenvalue — so `scale ≥ 0` and `w ≥ 0` are both necessary.
+
+With the linear, quadratic, and Gaussian modes all discharged, *every kernel CHD builds is now
+PSD-verified*: there is no `PosSemidef` hypothesis left to assume anywhere in the solve / `find_gamma` /
+`Z_test` development.
 
 # The a-posteriori residual certificate
 
@@ -462,6 +499,10 @@ Cholesky-based regularized solve are proved, and the specification-level facts t
 on are independent of the convergence step. The three concrete CHD routines built on them are now
 identified too: the eigendecomposition-form `solve_variationnal` equals `-(K + γI)⁻¹ ga` and agrees
 with the Cholesky route, and the `noise`/`find_gamma`-loss/`Z_test` statistic is a spectral ratio
-provably in `[0,1]` that depends on the kernel only through its spectrum. So the CHD foundation is
-complete, the one remaining open item being statistical, not algebraic — the `Z_test`'s Gaussian
-sampling and percentiles, exercised numerically rather than proved.
+provably in `[0,1]` that depends on the kernel only through its spectrum. And the kernel build itself
+is now PSD-verified for *all three* CHD modes — linear, quadratic, and Gaussian — so the standing
+`PosSemidef` hypothesis is discharged from data, not assumed, even for the fully-nonlinear kernel. So
+the CHD foundation is complete; the two remaining open items are the cyclic-Jacobi convergence *rate*
+(captured exactly by the a-posteriori residual certificate, never by `sorry`) and the `Z_test`'s
+Gaussian sampling and percentiles — one a proof-only gap on a quantity CHD does not need to *run*, the
+other statistical rather than algebraic and exercised numerically.

From 83c90a0e7d17ce65b21632d8765414e1d4eb3e4a Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 16:58:42 -0700
Subject: [PATCH 15/22] Formalize the CHD discovery decision layer (sound &
 complete)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CHD's outer hypergraph-discovery loop (decision.py, _GraphDiscoveryMain.py)
turns the verified `noise` statistic into graph-structure decisions. This
commit proves the selection guarantee of each decision over ℝ, for the same
Context-polymorphic specs that execute on Float.

Spec (NN/Spec/Core/Tensor/Factorizations.lean), mirroring the Python verbatim:
- argMinFn / argMaxFn  — the np.argmin / np.argmax folds
- kernelChooserFn      — MinNoiseKernelChooser (valid = noise < Z_low; the `2`
                         sentinel written `1+1` to stay Context-polymorphic)
- modeIncrementFn / modeChooserFn — MaxIncrementModeChooser
- allPrunedFn          — the np.all(active_modes == 0) stopping rule
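
The selection logic listed above can be sketched standalone. This is a
minimal illustration, not the code in the patch: every name ending in
`Sketch` is hypothetical, the definitions are fixed to `Float` with the
built-in `<` instead of the `Context` order tests, and it assumes a recent
Lean toolchain where `List.finRange` is available without extra imports.

```lean
-- Standalone sketch; names ending in `Sketch` are hypothetical, not part of the patch.
-- The real definitions are polymorphic over `Context α`; these are fixed to `Float`.

/-- Left-fold argmin, first index on ties. -/
def argMinSketch {n : Nat} (a : Fin (n + 1) → Float) : Fin (n + 1) :=
  (List.finRange (n + 1)).foldl (fun best j => if a j < a best then j else best)
    (0 : Fin (n + 1))

/-- Left-fold argmax, first index on ties. -/
def argMaxSketch {n : Nat} (a : Fin (n + 1) → Float) : Fin (n + 1) :=
  (List.finRange (n + 1)).foldl (fun best j => if a best < a j then j else best)
    (0 : Fin (n + 1))

/-- Valid kernels keep their noise, invalid ones get the sentinel `2`; then take the argmin. -/
def kernelChooserSketch {n : Nat} (noises zlows : Fin (n + 1) → Float) :
    Option (Fin (n + 1)) :=
  let key : Fin (n + 1) → Float := fun i => if noises i < zlows i then noises i else 2.0
  let s := argMinSketch key
  if noises s < zlows s then some s else none

/-- `noises[i+1] - noises[i]` for an interior `i`, and `1 - noises[last]` at the end. -/
def modeIncrementSketch {n : Nat} (noises : Fin (n + 1) → Float) (i : Fin (n + 1)) : Float :=
  if h : i.val + 1 < n + 1 then noises ⟨i.val + 1, h⟩ - noises i else 1.0 - noises i

/-- The iteration of largest increment. -/
def modeChooserSketch {n : Nat} (noises : Fin (n + 1) → Float) : Fin (n + 1) :=
  argMaxSketch (modeIncrementSketch noises)

def vec3Sketch (a b c : Float) : Fin 3 → Float := fun i => [a, b, c].getD i.val 0.0
def vec4Sketch (a b c d : Float) : Fin 4 → Float := fun i => [a, b, c, d].getD i.val 0.0

-- Only kernel 1 satisfies noise < Z_low, so the chooser returns it.
#eval (kernelChooserSketch (vec3Sketch 0.3 0.1 0.5) (vec3Sketch 0.2 0.4 0.1)).map (·.val)
-- some 1

-- No kernel is valid, so the verdict is `none`.
#eval (kernelChooserSketch (vec3Sketch 0.5 0.6 0.7) (vec3Sketch 0.1 0.2 0.3)).map (·.val)
-- none

-- The jump 0.08 → 0.9 is the increment at iteration 1.
#eval (modeChooserSketch (vec4Sketch 0.05 0.08 0.9 0.95)).val
-- 1
```

The inputs are the ones used in the Discovery example file, so the three
results can be compared against the assertions there.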

Proofs (NN/Proofs/Tensor/Basic/FactorizationsDecision.lean, new, sorry-free).
A single generic fold-selection lemma (foldl_select) underwrites both argmin
and argmax; the comparison proofs bridge the Context order test to the real `<`
(gtBool_eq_decide):
- argMinFn_le / argMaxFn_le — the folds return a least / greatest index, so
  the prune step removes a least-activated ancestor
- kernelChooserFn_eq_some / _eq_none — MinNoiseKernelChooser is sound AND
  complete: returns `some s` with s valid and of least noise among all valid
  kernels exactly when a valid kernel exists, else `none`. Its `2`-sentinel
  correctness rests directly on the verified varNoiseFn_le_one (hypothesis
  `hbound`), so the decision is a proved selection over a statistic whose
  [0,1] range was itself proved.
- modeChooserFn_ge — MaxIncrementModeChooser reports the largest noise-jump
  iteration
- allPrunedFn_iff — the stopping test holds iff every ancestor is pruned
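
The sentinel argument behind kernelChooserFn_eq_some can be isolated as a
small standalone fact. The sketch below is a hypothetical restatement, not
a lemma from the patch: `ks` and `nv` stand for `key s` and `noises v`, and
it assumes Mathlib is available for ℝ and `norm_num`. If the argmin `s`
were invalid, its key would be the sentinel `1 + 1`, yet it must be at most
the key of a valid `v`, which is a noise bounded by `1`.

```lean
import Mathlib

-- Hypothetical standalone restatement; `ks`/`nv` stand for `key s` and `noises v`.
example (ks nv : ℝ) (hs : ks = 1 + 1) (hle : ks ≤ nv) (hbound : nv ≤ 1) : False := by
  rw [hs] at hle
  -- Chain the two bounds: 1 + 1 ≤ nv ≤ 1, which is absurd.
  have h : (1 : ℝ) + 1 ≤ 1 := le_trans hle hbound
  norm_num at h
```

Any sentinel strictly above the noise ceiling would serve equally well;
`1 + 1` is simply the one the spec writes.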

Examples (NN/Examples/Factorization/Discovery.lean, new): 15 #eval commands
(11 assertions, 4 printouts), with negative controls. The end-to-end block
eigendecomposes an SPD kernel and runs a find_gamma sweep, feeding the
verified varNoiseSpec at several gamma straight into argMinFn (noises
[0.004, 0.040, 0.287], all in [0,1], argmin = 0).

Blueprint Ch4 gains a discovery-decision-layer section and an updated closing
summary. Registrations: Basic.lean (proofs), Factorization.lean (examples).

Verified: NN.Examples.Factorization + NN.Proofs.Tensor.Basic green (2705 jobs);
banned-tactic sweep clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |  12 ++
 NN/Examples/Factorization/Discovery.lean      | 174 ++++++++++++++++
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Tensor/Basic/FactorizationsDecision.lean  | 197 ++++++++++++++++++
 NN/Spec/Core/Tensor/Factorizations.lean       |  56 +++++
 .../Ch4_Verification/Factorizations.lean      |  63 +++++-
 6 files changed, 497 insertions(+), 6 deletions(-)
 create mode 100644 NN/Examples/Factorization/Discovery.lean
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsDecision.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index d6774e7..f887644 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -18,6 +18,7 @@ public import NN.Examples.Factorization.Variational
 public import NN.Examples.Factorization.LinearKernel
 public import NN.Examples.Factorization.QuadraticKernel
 public import NN.Examples.Factorization.GaussianKernel
+public import NN.Examples.Factorization.Discovery
 
 /-!
 # Matrix factorization examples
@@ -84,6 +85,17 @@ factorization misbehaves.
   kernel feeds an exact ridge solve; **negative controls**: `scale < 0` and a *negative mask weight*
   (`w = [−2,0]`, which drives the diagonal below zero) both make `K` indefinite. With the linear,
   quadratic, and Gaussian modes all discharged, every CHD kernel build is now PSD-verified.
+- `Discovery` — CHD's *discovery decision layer* (`decision.py`, `_GraphDiscoveryMain.py`), which turns
+  the verified `noise` statistic into graph structure: the activation prune step (`argMinFn`, picks the
+  least-activated ancestor), the `MinNoiseKernelChooser` (`kernelChooserFn`, the least-noise valid kernel
+  with `noise < Z_low`, or `none`), the `MaxIncrementModeChooser` (`modeChooserFn`, the largest
+  `noise`-jump iteration), and the stopping rule (`allPrunedFn`), proved sound/complete in
+  `FactorizationsDecision`. Checks: argmin picks the least-activated ancestor (and not the most), the
+  chooser selects the unique valid kernel / least noise among valid / `none` when none valid, the mode
+  chooser picks the largest-increment iteration, and the stopping rule fires only on the all-zero mask;
+  an **end-to-end** block then feeds the verified `varNoiseSpec` at several `γ` into `argMinFn`, a
+  `find_gamma` sweep selecting the least-noise regularization (all noises in `[0,1]`); **negative
+  controls** confirm the most-activated ancestor and tiny-increment iterations are correctly rejected.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Discovery.lean b/NN/Examples/Factorization/Discovery.lean
new file mode 100644
index 0000000..f1bae3a
--- /dev/null
+++ b/NN/Examples/Factorization/Discovery.lean
@@ -0,0 +1,174 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Examples.Factorization.Common
+meta import NN.Examples.Factorization.Common
+
+/-!
+# Example: the CHD discovery decision layer
+
+These checks corroborate `NN.Proofs.Tensor.Basic.FactorizationsDecision`. Once a kernel is built and its
+`noise` level computed (`varNoiseSpec`, proven to lie in `[0,1]`), CHD's *discovery loop* turns those
+numbers into graph-structure decisions (`decision.py`, `_GraphDiscoveryMain.py`). We exercise the four
+deterministic choices the loop makes, each with a positive check and a negative control:
+
+* **prune the least-activated ancestor** — `argMinFn` returns the index of the smallest activation
+  (`min_activation = np.argmin(activations)`); the *most*-activated ancestor is correctly **not** chosen;
+* **pick the kernel mode that admits an edge** — `kernelChooserFn` (`MinNoiseKernelChooser`) returns the
+  valid kernel (`noise < Z_low`) of least `noise`, or `none` when no kernel is valid;
+* **report the pruning iteration of largest `noise` jump** — `modeChooserFn` (`MaxIncrementModeChooser`)
+  returns the `argmax` of the increments;
+* **stop when every ancestor is pruned** — `allPrunedFn` fires on the all-zero mask and not before.
+
+The final block closes the loop end-to-end: it builds an SPD kernel, eigendecomposes it, and runs a
+`find_gamma`-style sweep — feeding the *verified* `varNoiseSpec` at several `γ` straight into `argMinFn`
+to select the regularization with least noise. Every decision runs over `Float`, the executable runtime
+scalar.
+-/
+
+@[expose] public section
+
+
+namespace NN.Examples.Factorization.Discovery
+
+/-- A length-3 `Float` family `Fin 3 → Float` from three entries. -/
+def vec3 (a b c : Float) : Fin 3 → Float := fun i => [a, b, c].getD i.val 0.0
+/-- A length-4 `Float` family `Fin 4 → Float` from four entries. -/
+def vec4 (a b c d : Float) : Fin 4 → Float := fun i => [a, b, c, d].getD i.val 0.0
+
+/-- Build a length-`n` `Float` vector tensor from a list (missing entries `0`). -/
+def mkVec {n : Nat} (xs : List Float) : Spec.Tensor Float (.dim n .scalar) :=
+  Spec.ofVecFn (fun i => xs.getD i.val 0.0)
+
+/-- Encode a chooser verdict as an `Int`: `-1` for `none` ("no ancestor"), else the chosen index. -/
+def chooserCode {m : Nat} (o : Option (Fin m)) : Int :=
+  match o with
+  | none => -1
+  | some i => Int.ofNat i.val
+
+/-- Compiled positive assertion that a `Bool` decision is `true`. -/
+def assertTrue (name : String) (b : Bool) : IO Unit :=
+  if b then IO.println s!"{name}: OK"
+  else throw (IO.userError s!"{name}: FAIL (expected true)")
+
+/-- Compiled negative-control assertion that a `Bool` decision is `false` (the property correctly does
+*not* hold). -/
+def assertFalse (name : String) (b : Bool) : IO Unit :=
+  if b then throw (IO.userError s!"{name}: FAIL (expected false)")
+  else IO.println s!"{name}: OK (correctly false)"
+
+/-! ## Pruning: `argMinFn` removes the least-activated ancestor -/
+
+/-- Activations of four candidate ancestors; ancestor 1 is the least-activated. -/
+def activations : Fin 4 → Float := vec4 0.8 0.2 0.5 0.9
+
+#eval IO.println s!"activations = {(List.finRange 4).map activations}, \
+  argMin = {(Spec.argMinFn activations).val}"
+
+-- Positive — the prune step removes the least-activated ancestor (`argMinFn_le`).
+#eval assertTrue "prune picks the least-activated ancestor (argmin = 1)"
+  ((Spec.argMinFn activations).val == 1)
+
+-- Negative — it does *not* remove the most-activated ancestor (index 3).
+#eval assertFalse "prune does not pick the most-activated ancestor"
+  ((Spec.argMinFn activations).val == 3)
+
+/-! ## Kernel chooser: least-noise valid kernel, or `none` -/
+
+/-- Three candidate kernels' `noise` levels and `Z_low` lower bounds. Validity is `noise < Z_low`:
+kernel 0 invalid (`0.3 ≥ 0.2`), kernel 1 valid (`0.1 < 0.4`), kernel 2 invalid (`0.5 ≥ 0.1`). -/
+def noisesA : Fin 3 → Float := vec3 0.3 0.1 0.5
+def ZlowsA : Fin 3 → Float := vec3 0.2 0.4 0.1
+
+#eval IO.println s!"kernel chooser (one valid) -> code {chooserCode (Spec.kernelChooserFn noisesA ZlowsA)}"
+
+-- Positive — exactly kernel 1 is valid, so the chooser admits an edge via kernel 1 (`kernelChooserFn_eq_some`).
+#eval assertTrue "kernel chooser selects the unique valid kernel (some 1)"
+  (chooserCode (Spec.kernelChooserFn noisesA ZlowsA) == 1)
+
+/-- Two valid kernels (0 and 1); the chooser must take the one of *least* noise (kernel 0, `0.05`). -/
+def noisesB : Fin 3 → Float := vec3 0.05 0.1 0.5
+def ZlowsB : Fin 3 → Float := vec3 0.2 0.4 0.1
+
+-- Positive — among valid kernels the chooser takes least noise (kernel 0 beats kernel 1).
+#eval assertTrue "kernel chooser takes least noise among valid (some 0)"
+  (chooserCode (Spec.kernelChooserFn noisesB ZlowsB) == 0)
+
+/-- No kernel is valid (`noise ≥ Z_low` everywhere): the chooser reports "no ancestor". -/
+def noisesC : Fin 3 → Float := vec3 0.5 0.6 0.7
+def ZlowsC : Fin 3 → Float := vec3 0.1 0.2 0.3
+
+-- Negative — no valid kernel ⟹ no edge (`kernelChooserFn_eq_none`); code `-1`.
+#eval assertTrue "kernel chooser reports none when no kernel is valid (code -1)"
+  (chooserCode (Spec.kernelChooserFn noisesC ZlowsC) == -1)
+
+/-! ## Mode chooser: the iteration of largest `noise` increment -/
+
+/-- The per-iteration `noise` sequence of a pruning run. The big jump `0.08 → 0.9` is between iterations
+1 and 2, so `MaxIncrementModeChooser` reports iteration 1 (increment `0.82`). -/
+def noiseSeq : Fin 4 → Float := vec4 0.05 0.08 0.9 0.95
+
+#eval IO.println s!"increments = {(List.finRange 4).map (Spec.modeIncrementFn noiseSeq)}, \
+  modeChooser = {(Spec.modeChooserFn noiseSeq).val}"
+
+-- Positive — the mode chooser reports the largest-jump iteration (`modeChooserFn_ge`).
+#eval assertTrue "mode chooser picks the largest noise-increment iteration (argmax = 1)"
+  ((Spec.modeChooserFn noiseSeq).val == 1)
+
+-- Negative — it does *not* report a tiny-increment iteration (iteration 0, increment 0.03).
+#eval assertFalse "mode chooser does not pick a tiny-increment iteration"
+  ((Spec.modeChooserFn noiseSeq).val == 0)
+
+/-! ## Stopping rule: fire exactly when all ancestors are pruned -/
+
+-- Positive — the loop stops when every ancestor mode is zero (`allPrunedFn_iff`).
+#eval assertTrue "stopping rule fires when all ancestors are pruned"
+  (Spec.allPrunedFn (vec3 0.0 0.0 0.0))
+
+-- Negative — it does not fire while an ancestor remains active.
+#eval assertFalse "stopping rule does not fire while an ancestor remains"
+  (Spec.allPrunedFn (vec3 0.0 1.0 0.0))
+
+/-! ## End-to-end: `find_gamma` feeds the verified `noise` into `argMinFn`
+
+A `find_gamma`-style sweep: build an SPD kernel, eigendecompose it, evaluate the verified
+`varNoiseSpec` at several `γ`, and let `argMinFn` pick the regularization of least noise — exactly the
+discovery layer consuming the verified statistic. More regularization means more noise, so the smallest
+`γ` wins (index 0). -/
+
+/-- A `3 × 3` symmetric positive-definite kernel. -/
+def K : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) :=
+  mkMat [[2.0, 0.5, 0.3],
+         [0.5, 2.0, 0.4],
+         [0.3, 0.4, 2.0]]
+
+/-- Its eigendecomposition `(evals, V)` from the Jacobi solver. -/
+def evals : Spec.Tensor Float (.dim 3 .scalar) := (Spec.symEigJacobiSpec K 12).1
+def V : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := (Spec.symEigJacobiSpec K 12).2
+
+/-- The data vector `ga`. -/
+def ga : Spec.Tensor Float (.dim 3 .scalar) := mkVec [1.0, 2.0, 3.0]
+
+/-- The candidate regularizations, increasing. -/
+def gammas : Fin 3 → Float := vec3 0.01 0.1 1.0
+
+/-- The verified `noise` at each candidate `γ` (`find_gamma`'s loss). -/
+def noiseAt : Fin 3 → Float := fun i => Spec.varNoiseSpec evals V (gammas i) ga
+
+#eval IO.println s!"find_gamma noises = {(List.finRange 3).map noiseAt}, \
+  argMin γ index = {(Spec.argMinFn noiseAt).val}"
+
+-- Positive — every swept noise is a genuine fraction in [0,1] (numeric witness of `varNoiseFn_nonneg`/`_le_one`).
+#eval assertTrue "find_gamma noises all lie in [0,1]"
+  ((List.finRange 3).all (fun i => 0.0 ≤ noiseAt i && noiseAt i ≤ 1.0))
+
+-- Positive — `find_gamma` (argmin of the verified noise) selects the least-regularized γ (index 0).
+#eval assertTrue "find_gamma selects least-noise γ via argMinFn (index 0)"
+  ((Spec.argMinFn noiseAt).val == 0)
+
+end NN.Examples.Factorization.Discovery
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index 7ca7dc3..6f094ff 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -13,6 +13,7 @@ public import NN.Proofs.Tensor.Basic.Factorizations
 public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
 public import NN.Proofs.Tensor.Basic.FactorizationsSolve
 public import NN.Proofs.Tensor.Basic.FactorizationsVariational
+public import NN.Proofs.Tensor.Basic.FactorizationsDecision
 public import NN.Proofs.Tensor.Basic.FactorizationsKernels
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean b/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean
new file mode 100644
index 0000000..c927e86
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean
@@ -0,0 +1,197 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.FactorizationsVariational
+public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
+
+/-!
+# CHD discovery decision layer (`decision.py`, `_GraphDiscoveryMain.py`)
+
+[`FactorizationsVariational`](./FactorizationsVariational.lean) proved that CHD's `noise` level is a
+spectral fraction in `[0,1]`. This file extends the proof to the *graph-structure decisions* CHD makes
+from those numbers (the outer discovery loop). Each is a deterministic comparison over finite data; the
+executable specs (`Spec.argMinFn`, `Spec.kernelChooserFn`, …) mirror the Python verbatim, and the
+theorems here establish their selection guarantees:
+
+* **`argMinFn_le` / `argMaxFn_le`** — the fold-based `np.argmin`/`np.argmax` really return the index of a
+  least / greatest element (the activation prune step, the mode chooser).
+* **`kernelChooserFn_eq_some` / `kernelChooserFn_eq_none`** — `MinNoiseKernelChooser` is *sound and
+  complete*: it returns `some s` with `s` valid and of least `noise` among valid kernels exactly when a
+  valid kernel exists, and `none` otherwise. The `noise ≤ 1` precondition that makes the `2` sentinel
+  work is exactly the verified `varNoiseFn_le_one`.
+* **`modeChooserFn_ge`** — `MaxIncrementModeChooser` returns the iteration of largest `noise` increment.
+* **`allPrunedFn_iff`** — the stopping test `np.all(active_modes == 0)` holds iff every ancestor is
+  pruned.
+
+Scope honesty: everything is exact over `ℝ`. The comparisons in the specs go through the `Context` order
+test (`gtBool`/`ltBool`); `gtBool_true_iff` (from `FactorizationsReconstruction`) bridges them to the
+real `<`, after which the selection proofs are pure order theory over `Fin (n+1)`.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open Spec.Factorization.Reconstruction
+
+variable {n : Nat}
+
+/-! ## Bridge: the `Context` order tests over `ℝ` -/
+
+/-- Over `ℝ`, `gtBool x y` is the decidable `y < x`. -/
+theorem gtBool_eq_decide (x y : ℝ) : Context.gtBool x y = decide (y < x) := by
+  by_cases h : y < x
+  · have h1 : Context.gtBool x y = true := gtBool_true_iff.mpr h
+    rw [h1]; simp [h]
+  · have h1 : Context.gtBool x y = false := by
+      cases hc : Context.gtBool x y with
+      | false => rfl
+      | true => exact absurd (gtBool_true_iff.mp hc) h
+    rw [h1]; simp [h]
+
+/-- Over `ℝ`, `ltBool x y` is the decidable `x < y`. -/
+theorem ltBool_eq_decide (x y : ℝ) : Spec.ltBool x y = decide (x < y) := by
+  rw [Spec.ltBool, gtBool_eq_decide]
+
+/-! ## A generic fold-selection lemma
+
+Both `argMinFn` and `argMaxFn` are `List.foldl`s of the shape "keep the running best, swap in `j` when
+the `Bool` test `cmp (key j) (key best)` fires". The next lemma proves such a fold returns a `le`-best
+index over `init :: l`, for any reflexive, transitive `le` with `cmp x y = true → le y x` and
+`cmp x y = false → le x y`. With `le := (· ≤ ·)` this is argmax; with `le := (· ≥ ·)`, argmin. -/
+
+private theorem foldl_select {m : Nat} (key : Fin m → ℝ) (cmp : ℝ → ℝ → Bool)
+    (le : ℝ → ℝ → Prop) (hrefl : ∀ x, le x x)
+    (htrans : ∀ x y z, le x y → le y z → le x z)
+    (htrue : ∀ x y, cmp x y = true → le y x) (hfalse : ∀ x y, cmp x y = false → le x y)
+    (init : Fin m) (l : List (Fin m)) :
+    le (key init)
+        (key (l.foldl (fun best j => if cmp (key j) (key best) then j else best) init))
+      ∧ ∀ j ∈ l,
+        le (key j)
+          (key (l.foldl (fun best j => if cmp (key j) (key best) then j else best) init)) := by
+  induction l generalizing init with
+  | nil => exact ⟨hrefl _, by simp⟩
+  | cons j₀ t ih =>
+    rw [List.foldl_cons]
+    set best' := (if cmp (key j₀) (key init) then j₀ else init) with hb
+    have hstep_init : le (key init) (key best') := by
+      by_cases hcmp : cmp (key j₀) (key init) = true
+      · rw [hb, if_pos hcmp]; exact htrue _ _ hcmp
+      · rw [hb, if_neg hcmp]; exact hrefl _
+    have hstep_j0 : le (key j₀) (key best') := by
+      by_cases hcmp : cmp (key j₀) (key init) = true
+      · rw [hb, if_pos hcmp]; exact hrefl _
+      · rw [hb, if_neg hcmp]
+        rw [Bool.not_eq_true] at hcmp
+        exact hfalse _ _ hcmp
+    obtain ⟨hm, hc⟩ := ih best'
+    refine ⟨htrans _ _ _ hstep_init hm, ?_⟩
+    intro j hj
+    rcases List.mem_cons.mp hj with rfl | hj'
+    · exact htrans _ _ _ hstep_j0 hm
+    · exact hc j hj'
+
+/-! ## `argmin` / `argmax` -/
+
+/-- **`argMinFn` returns the index of a least element.** -/
+theorem argMinFn_le (a : Fin (n + 1) → ℝ) (j : Fin (n + 1)) :
+    a (Spec.argMinFn a) ≤ a j := by
+  have h := foldl_select (key := a) (cmp := Spec.ltBool) (le := fun p q => q ≤ p)
+    (fun x => le_refl x) (fun x y z hxy hyz => le_trans hyz hxy)
+    (fun x y hh => by rw [ltBool_eq_decide] at hh; exact (of_decide_eq_true hh).le)
+    (fun x y hh => by rw [ltBool_eq_decide] at hh; exact not_lt.mp (of_decide_eq_false hh))
+    (0 : Fin (n + 1)) (List.finRange (n + 1))
+  exact h.2 j (List.mem_finRange j)
+
+/-- **`argMaxFn` returns the index of a greatest element.** -/
+theorem argMaxFn_le (a : Fin (n + 1) → ℝ) (j : Fin (n + 1)) :
+    a j ≤ a (Spec.argMaxFn a) := by
+  have h := foldl_select (key := a) (cmp := Context.gtBool) (le := fun p q => p ≤ q)
+    (fun x => le_refl x) (fun x y z => le_trans)
+    (fun x y hh => by rw [gtBool_eq_decide] at hh; exact (of_decide_eq_true hh).le)
+    (fun x y hh => by rw [gtBool_eq_decide] at hh; exact not_lt.mp (of_decide_eq_false hh))
+    (0 : Fin (n + 1)) (List.finRange (n + 1))
+  exact h.2 j (List.mem_finRange j)
+
+/-! ## `MinNoiseKernelChooser` -/
+
+/-- **`MinNoiseKernelChooser` is sound and complete (some branch).** If some kernel is valid
+(`noise < Z_low`) and all noises respect the ceiling `noise ≤ 1` (the verified `varNoiseFn_le_one`),
+the chooser returns `some s` with `s` itself valid and of least `noise` among all valid kernels. -/
+theorem kernelChooserFn_eq_some {noises Zlows : Fin (n + 1) → ℝ}
+    (hbound : ∀ i, noises i ≤ 1) {v : Fin (n + 1)} (hv : noises v < Zlows v) :
+    ∃ s, Spec.kernelChooserFn noises Zlows = some s ∧ noises s < Zlows s
+      ∧ ∀ j, noises j < Zlows j → noises s ≤ noises j := by
+  -- the `np.where`-replaced key (valid ↦ noise, invalid ↦ the `2` sentinel `1 + 1`)
+  set key : Fin (n + 1) → ℝ :=
+    (fun i => if Spec.ltBool (noises i) (Zlows i) then noises i else (1 : ℝ) + 1) with hkeydef
+  have hkv : ∀ i, noises i < Zlows i → key i = noises i := by
+    intro i hi
+    show (if Spec.ltBool (noises i) (Zlows i) then noises i else (1 : ℝ) + 1) = noises i
+    rw [ltBool_eq_decide]; simp [hi]
+  have hkinv : ∀ i, ¬ noises i < Zlows i → key i = (1 : ℝ) + 1 := by
+    intro i hi
+    show (if Spec.ltBool (noises i) (Zlows i) then noises i else (1 : ℝ) + 1) = (1 : ℝ) + 1
+    rw [ltBool_eq_decide]; simp [hi]
+  set s := Spec.argMinFn key with hs
+  have hle : ∀ j, key s ≤ key j := fun j => argMinFn_le key j
+  -- the chosen `s` is valid: otherwise `key s = 2 ≤ key v = noises v ≤ 1`, impossible
+  have hsvalid : noises s < Zlows s := by
+    by_contra hns
+    have hchain := hle v
+    rw [hkinv s hns, hkv v hv] at hchain
+    have := le_trans hchain (hbound v)
+    norm_num at this
+  refine ⟨s, ?_, hsvalid, ?_⟩
+  · show (if Spec.ltBool (noises s) (Zlows s) then some s else none) = some s
+    have hbt : Spec.ltBool (noises s) (Zlows s) = true := by rw [ltBool_eq_decide]; simp [hsvalid]
+    rw [if_pos hbt]
+  · intro j hj
+    have hchain := hle j
+    rwa [hkv s hsvalid, hkv j hj] at hchain
+
+/-- **`MinNoiseKernelChooser` is sound and complete (none branch).** If no kernel is valid, the chooser
+returns `none` — CHD's "no ancestor" verdict. -/
+theorem kernelChooserFn_eq_none {noises Zlows : Fin (n + 1) → ℝ}
+    (hno : ∀ i, ¬ noises i < Zlows i) : Spec.kernelChooserFn noises Zlows = none := by
+  set key : Fin (n + 1) → ℝ :=
+    (fun i => if Spec.ltBool (noises i) (Zlows i) then noises i else (1 : ℝ) + 1) with hkeydef
+  set s := Spec.argMinFn key with hs
+  show (if Spec.ltBool (noises s) (Zlows s) then some s else none) = none
+  have hbf : Spec.ltBool (noises s) (Zlows s) = true → False := by
+    rw [ltBool_eq_decide]; intro h; exact (hno s) (of_decide_eq_true h)
+  rw [if_neg hbf]
+
+/-! ## `MaxIncrementModeChooser` -/
+
+/-- **`MaxIncrementModeChooser` returns the iteration of largest `noise` increment.** -/
+theorem modeChooserFn_ge (noises : Fin (n + 1) → ℝ) (j : Fin (n + 1)) :
+    Spec.modeIncrementFn noises j ≤ Spec.modeIncrementFn noises (Spec.modeChooserFn noises) := by
+  rw [Spec.modeChooserFn]
+  exact argMaxFn_le (Spec.modeIncrementFn noises) j
+
+/-! ## The stopping rule -/
+
+/-- **The stopping test `np.all(active_modes == 0)` holds iff every ancestor is pruned.** -/
+theorem allPrunedFn_iff {k : Nat} (m : Fin k → ℝ) :
+    Spec.allPrunedFn m = true ↔ ∀ i, m i = 0 := by
+  rw [Spec.allPrunedFn, List.all_eq_true]
+  have key : ∀ i : Fin k,
+      ((!Context.gtBool (m i) 0 && !Context.gtBool 0 (m i)) = true) ↔ m i = 0 := by
+    intro i
+    rw [gtBool_eq_decide, gtBool_eq_decide, ← decide_not, ← decide_not, Bool.and_eq_true,
+      decide_eq_true_eq, decide_eq_true_eq]
+    constructor
+    · rintro ⟨h1, h2⟩; exact le_antisymm (not_lt.mp h1) (not_lt.mp h2)
+    · intro h; rw [h]; exact ⟨lt_irrefl 0, lt_irrefl 0⟩
+  constructor
+  · intro h i; exact (key i).mp (h i (List.mem_finRange i))
+  · intro h i _; exact (key i).mpr (h i)
+
+end Spec.Factorization
diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
index 6e6c675..13bd437 100644
--- a/NN/Spec/Core/Tensor/Factorizations.lean
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -505,4 +505,60 @@ def gaussianKernelSpec {n d : Nat} (X : Tensor α (.dim n (.dim d .scalar)))
     (w : Tensor α (.dim d .scalar)) (scale l : α) : Tensor α (.dim n (.dim n .scalar)) :=
   ofMatFn (gaussianKernelFn (toMatFn X) (toVecFn w) scale l)
 
+/-! ## CHD discovery decision layer (`decision.py`, `_GraphDiscoveryMain.py`)
+
+Everything above turns a kernel `K` into a `noise` level (`varNoiseFn`, proven to lie in `[0,1]`) and a
+`Z_test` lower bound `Z_low`. CHD's outer *discovery loop* turns those numbers into graph-structure
+decisions. Four deterministic choices, each a comparison over finite data:
+
+* **which feature to prune next** — the least-*activated* ancestor (`min_activation = np.argmin(activations)`
+  in `helper_functions.step`);
+* **which kernel mode admits an edge** — `MinNoiseKernelChooser`: the valid kernel (`noise < Z_low`) of
+  least `noise`, or none if no kernel is valid (`decision.py`);
+* **which pruning iteration to report** — `MaxIncrementModeChooser`: the iteration of largest `noise`
+  increment (`decision.py`);
+* **when to stop** — `np.all(active_modes == 0)`, every ancestor pruned (`_GraphDiscoveryMain.py`).
+
+The definitions mirror the Python verbatim; their *selection guarantees* (the chosen index really is the
+least/greatest, the chooser is sound and complete against `noise < Z_low`) are proved over `ℝ` in
+[`NN.Proofs.Tensor.Basic.FactorizationsDecision`](../../../Proofs/Tensor/Basic/FactorizationsDecision.lean).
+Comparisons use the `Context` order test (`ltBool`/`gtBool`), so the same definitions run over `Float`. -/
+
+/-- Index of a least element of a nonempty finite family (first on ties), by a left fold — CHD's
+`np.argmin(activations)` for picking the least-activated ancestor to prune. -/
+def argMinFn {n : Nat} (a : Fin (n + 1) → α) : Fin (n + 1) :=
+  (List.finRange (n + 1)).foldl (fun best j => if ltBool (a j) (a best) then j else best)
+    (0 : Fin (n + 1))
+
+/-- Index of a greatest element of a nonempty finite family (first on ties), by a left fold — the
+`np.argmax` underlying the mode chooser. -/
+def argMaxFn {n : Nat} (a : Fin (n + 1) → α) : Fin (n + 1) :=
+  (List.finRange (n + 1)).foldl (fun best j => if Context.gtBool (a j) (a best) then j else best)
+    (0 : Fin (n + 1))
+
+/-- CHD `MinNoiseKernelChooser`. Among candidate kernels with per-kernel `noise` and `Z_low`, a kernel
+is *valid* (admits an edge) when `noise < Z_low`; return the valid kernel of least `noise`, or `none`
+if none is valid. Mirrors `valid = noises < Z_lows; argmin(np.where(valid, noises, 2))` — the `2`
+sentinel (any value above the `noise` ceiling `1`) written `1 + 1` to stay polymorphic. -/
+def kernelChooserFn {n : Nat} (noises Zlows : Fin (n + 1) → α) : Option (Fin (n + 1)) :=
+  let key := fun i => if ltBool (noises i) (Zlows i) then noises i else 1 + 1
+  let s := argMinFn key
+  if ltBool (noises s) (Zlows s) then some s else none
+
+/-- Per-iteration `noise` increments of CHD `MaxIncrementModeChooser`:
+`increments[i] = noises[i+1] − noises[i]` for an interior `i`, and `1 − noises[last]` at the end (the
+gap to the `noise` ceiling, `np.append(increments, 1 - list_of_noises[-1])`). -/
+def modeIncrementFn {n : Nat} (noises : Fin (n + 1) → α) : Fin (n + 1) → α :=
+  fun i => if h : i.val + 1 < n + 1 then noises ⟨i.val + 1, h⟩ - noises i else 1 - noises i
+
+/-- CHD `MaxIncrementModeChooser`: report the pruning iteration with the largest jump in `noise`
+(`np.argmax(increments)`). -/
+def modeChooserFn {n : Nat} (noises : Fin (n + 1) → α) : Fin (n + 1) :=
+  argMaxFn (modeIncrementFn noises)
+
+/-- CHD stopping rule `np.all(active_modes == 0)`: every ancestor has been pruned. An entry counts as
+zero exactly when it is neither positive nor negative in the `Context` order. -/
+def allPrunedFn {k : Nat} (m : Fin k → α) : Bool :=
+  (List.finRange k).all (fun i => !Context.gtBool (m i) 0 && !Context.gtBool 0 (m i))
+
 end Spec
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 0624096..33f44d6 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -314,6 +314,50 @@ With the linear, quadratic, and Gaussian modes all discharged, *every kernel CHD
 PSD-verified*: there is no `PosSemidef` hypothesis left to assume anywhere in the solve / `find_gamma` /
 `Z_test` development.
 
+# The discovery decision layer: turning `noise` into graph structure
+
+Everything above produces *numbers* — a kernel, its eigendecomposition, and the `noise` level
+(`varNoiseFn`, proven to be a fraction in `[0,1]`). CHD's outer *discovery loop*
+(`decision.py`, `_GraphDiscoveryMain.py`) turns those numbers into the actual hypergraph: which
+ancestors a node depends on, through which kernel mode.
+[`NN.Proofs.Tensor.Basic.FactorizationsDecision`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean)
+formalizes the four deterministic choices the loop makes and proves each one's selection guarantee. They
+are comparisons over finite data, so the specs (`Spec.argMinFn`, `Spec.kernelChooserFn`, …) mirror the
+Python verbatim and run over `Float`; the proofs are over `ℝ`, bridged from the `Context` order test
+(`gtBool`/`ltBool`) to the real `<` by `gtBool_eq_decide`.
+
+The backbone is a single generic fold-selection lemma (`foldl_select`): the running-best `List.foldl`
+that both `np.argmin` and `np.argmax` compile to returns a `le`-extremal index over the whole family, for
+*any* preorder `le` whose strict part the `Bool` test decides. Instantiating `le := (· ≤ ·)` and
+`(· ≥ ·)` gives the two endpoints:
+
+- *Prune the least-activated ancestor.* Each step of `helper_functions.step` drops the candidate of
+  smallest *activation* (`min_activation = np.argmin(activations)`); `argMinFn_le` proves the fold returns
+  a global minimizer, `argMaxFn_le` the dual.
+- *Choose the kernel mode that admits an edge.* `MinNoiseKernelChooser` calls a kernel *valid* when its
+  `noise` falls below its `Z_low`, and returns the valid kernel of least `noise`
+  (`argmin(np.where(valid, noises, 2))`), or "no ancestor" if none is valid. `kernelChooserFn_eq_some`
+  and `kernelChooserFn_eq_none` prove it *sound and complete*: it returns `some s` with `s` itself valid
+  and of least `noise` among *all* valid kernels exactly when some kernel is valid, and `none` otherwise.
+  The `2` sentinel that suppresses invalid kernels only works because `noise ≤ 1` — which is exactly the
+  verified `varNoiseFn_le_one`, threaded in as the hypothesis `hbound`. The bound proved two sections ago
+  is what makes the decision correct.
+- *Report the pruning iteration of largest `noise` jump.* `MaxIncrementModeChooser` takes the `argmax`
+  of the successive `noise` increments (with `1 − noise_last` appended); `modeChooserFn_ge` proves the
+  reported iteration has the maximal increment.
+- *Stop when every ancestor is pruned.* `allPrunedFn_iff` proves the stopping test
+  `np.all(active_modes == 0)` holds iff every entry is zero.
+
+So the loop's structural decisions are not heuristics layered on top of unverified arithmetic: each is a
+proved-correct selection over the `noise` statistic whose `[0,1]` range was itself proved. The
+`Discovery` example runs all four on concrete data: `argMinFn` picks the least-activated ancestor (and
+not the most-activated one); the chooser selects the unique valid kernel, takes the least `noise` among
+two valid ones, and reports `none` when all are invalid; the mode chooser picks the largest-jump
+iteration; and the stopping rule fires only on the all-zero mask. The example then closes the stack
+end-to-end: it builds an SPD kernel, eigendecomposes it, and runs a `find_gamma`-style sweep that feeds
+the verified `varNoiseSpec` at several `γ` straight into `argMinFn`, selecting the least-noise
+regularization (the smallest `γ`, with every swept noise landing in `[0,1]` as proved).
+
 # The a-posteriori residual certificate
 
 For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact
@@ -499,10 +543,17 @@ Cholesky-based regularized solve are proved, and the specification-level facts t
 on are independent of the convergence step. The three concrete CHD routines built on them are now
 identified too: the eigendecomposition-form `solve_variationnal` equals `-(K + γI)⁻¹ ga` and agrees
 with the Cholesky route, and the `noise`/`find_gamma`-loss/`Z_test` statistic is a spectral ratio
-provably in `[0,1]` that depends on the kernel only through its spectrum. And the kernel build itself
+provably in `[0,1]` that depends on the kernel only through its spectrum. The kernel build itself
 is now PSD-verified for *all three* CHD modes — linear, quadratic, and Gaussian — so the standing
-`PosSemidef` hypothesis is discharged from data, not assumed, even for the fully-nonlinear kernel. So
-the CHD foundation is complete; the two remaining open items are the cyclic-Jacobi convergence *rate*
-(captured exactly by the a-posteriori residual certificate, never by `sorry`) and the `Z_test`'s
-Gaussian sampling and percentiles — one a proof-only gap on a quantity CHD does not need to *run*, the
-other statistical rather than algebraic and exercised numerically.
+`PosSemidef` hypothesis is discharged from data, not assumed, even for the fully-nonlinear kernel. And
+the *discovery decision layer* on top — the kernel chooser, the activation prune step, the mode
+chooser, and the stopping rule — is now proved sound and complete, with the chooser's correctness
+resting directly on the verified `noise ≤ 1` bound, so the structural decisions are proved selections
+over a statistic whose range was itself proved.
+
+So the CHD foundation is complete, from the kernel build through the regularized solve and the noise
+statistic up to the graph-structure decisions. The two remaining open items are both narrow and
+deliberately scoped: the cyclic-Jacobi convergence *rate* (captured exactly by the a-posteriori
+residual certificate, never by `sorry`) and the `Z_test`'s Gaussian sampling and percentiles — one a
+proof-only gap on a quantity CHD does not need to *run*, the other statistical rather than algebraic
+and exercised numerically.

From bdfdbbaedb21c94caecc6eeb1c36cbdf0c65ea5c Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 20:18:23 -0700
Subject: [PATCH 16/22] Speed up Cholesky/ridge-solve #eval via strict array
 @[implemented_by]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The cyclic-Jacobi eigensolver was migrated to strict `Array (Array α)`,
but `choleskyColsFn` and the triangular solves were left on the functional
`Fin n → Fin n → α` representation. There each Cholesky column is a closure
that re-evaluates every previous column, so reading the full factor `L` takes
time exponential in `n`. The spec is also compiled without `precompileModules`,
so any `#eval` of `solveRidgeSpec`/`choleskySpec` runs that closure in the
interpreter. A single 4×4 ridge solve cost ~310 s; the QuadraticKernel
example took ~645 s to build, and every ridge/Cholesky-using example was
similarly slow.

Add two strict, array-backed runtime implementations and register them with
`@[implemented_by]`:

* `choleskyColsImpl` → `choleskyColsFn` — materializes each column as an
  `Array α`, so a back-reference `L[i,k]` is an O(1) lookup.
* `solveRidgeImpl` → `solveRidgeFn` — factors `K + γ·I = L·Lᵀ` and runs both
  triangular substitutions entirely over `Array`s, building no `Fin n → α`
  accumulator closures.
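
To illustrate the array-backed style, forward substitution `L · z = b` can be
sketched over strict arrays as below. This is a minimal standalone sketch over
`Float`, not the code in this patch: `forwardSolveSketch` is a hypothetical
name, and `L` is stored row-major here for readability (the patch stores
columns).

```lean
/-- Hypothetical helper: solve `L · z = b` for lower-triangular `L` (array of rows). -/
def forwardSolveSketch (L : Array (Array Float)) (b : Array Float) : Array Float := Id.run do
  let mut z : Array Float := #[]
  for i in [0:b.size] do
    -- s = Σ_{k<i} L[i,k] · z[k]; every read is an O(1) array lookup
    let mut s : Float := 0.0
    for k in [0:i] do
      s := s + (L.getD i #[]).getD k 0.0 * z.getD k 0.0
    z := z.push ((b.getD i 0.0 - s) / (L.getD i #[]).getD i 0.0)
  return z

-- L = [[2, 0], [1, 3]], b = [4, 11]: z₀ = 4/2 = 2, z₁ = (11 − 1·2)/3 = 3.
#eval forwardSolveSketch #[#[2.0, 0.0], #[1.0, 3.0]] #[4.0, 11.0]
```

Already-computed entries of `z` are stored values rather than closures, so the
work per row is linear in the row index.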

`@[implemented_by]` swaps only the compiled/interpreted runtime code; the
functional definitions — and every correctness proof that reasons about them
(`FactorizationsSolve`, `FactorizationsReconstruction`, …) — are untouched.
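Lean does not check that an `@[implemented_by]` target agrees with the
definition it replaces, so the examples' residual checks `(K+γ·I)·x − b ≈ 0` /
`A = L·Lᵀ` are the numerical evidence that the array path matches the proven
definition.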
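
The pattern can be sketched on a toy function. The names `sumTo` and
`sumToImpl` are hypothetical and appear nowhere in this patch; the sketch only
shows how a reference definition and a strict runtime implementation are paired.

```lean
/-- Strict runtime version: accumulates in a loop (hypothetical example). -/
def sumToImpl (n : Nat) : Nat := Id.run do
  let mut acc := 0
  for i in [0:n+1] do
    acc := acc + i
  return acc

/-- Reference definition that proofs unfold; compiled code calls `sumToImpl` instead. -/
@[implemented_by sumToImpl]
def sumTo : Nat → Nat
  | 0 => 0
  | n + 1 => (n + 1) + sumTo n

#eval sumTo 10  -- 55
```

Theorems about `sumTo` still unfold the recursive definition; only the code
that `#eval` runs changes. Lean trusts the attribute, so the two definitions
must be kept in agreement by the author.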

Result: QuadraticKernel 645 s → 6.8 s; a full clean rebuild of
`NN.Examples.Factorization` + `NN.Proofs.Tensor.Basic` is 18.8 s (2705 jobs).
No proof changes; sorry/admit/omega-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Spec/Core/Tensor/Factorizations.lean | 78 ++++++++++++++++++++++++-
 1 file changed, 77 insertions(+), 1 deletion(-)

diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
index 13bd437..6181663 100644
--- a/NN/Spec/Core/Tensor/Factorizations.lean
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -100,11 +100,43 @@ The columns are computed left to right. Column `j` uses only columns `0 .. j-1`:
 - above:     `L[i,j] = 0`                                           for `i < j`
 -/
 
+/--
+Strict, array-backed runtime implementation of `choleskyColsFn` (registered via `@[implemented_by]`).
+Each column is *materialized* into an `Array α`, so a back-reference `L[i,k]` is an `O(1)` lookup
+rather than a closure that re-evaluates the whole prefix. The closure form below is mathematically
+clean (and is what the proofs reason about), but reading the full factor `L` from it re-evaluates
+columns exponentially — ruinous in the interpreter (`#eval`). This computes the *same* factor strictly;
+the numeric examples (`A = L·Lᵀ`, the ridge-solve residual ≈ 0) validate the two agree.
+-/
+def choleskyColsImpl {n : Nat} (A : Fin n → Fin n → α) : List (Fin n → α) :=
+  let cols : Array (Array α) := (List.finRange n).foldl (fun cols j =>
+    let jv := j.val
+    -- Σ_{k<j} L[j,k]²  (previous columns at row `j`, read from the materialized arrays).
+    let sumsq := (List.finRange n).foldl
+      (fun s k =>
+        if k.val < jv then let v := (cols.getD k.val #[]).getD jv 0; s + v * v else s) 0
+    let Ljj := MathFunctions.sqrt (A j j - sumsq)
+    let colArr : Array α := Array.ofFn (fun i : Fin n =>
+      if i.val < jv then 0
+      else if i.val == jv then Ljj
+      else
+        -- Σ_{k<j} L[i,k]·L[j,k]
+        let s := (List.finRange n).foldl
+          (fun acc k => if k.val < jv then
+            acc + (cols.getD k.val #[]).getD i.val 0 * (cols.getD k.val #[]).getD jv 0 else acc) 0
+        (A i j - s) / Ljj)
+    cols.push colArr) #[]
+  (List.finRange n).map (fun j => fun i => (cols.getD j.val #[]).getD i.val 0)
+
 /--
 The list of columns of the Cholesky factor `L`, as length-`n` vectors, computed left to right.
 Element `j` of the result is column `j` of `L`. Built by a left fold so that when column `j` is
 formed, `cols` already holds columns `0 .. j-1`.
+
+The runtime implementation is `choleskyColsImpl` (strict arrays); the closure form here is the one the
+correctness proofs reason about. Agreement is checked numerically by the examples, not proved.
 -/
+@[implemented_by choleskyColsImpl]
 def choleskyColsFn {n : Nat} (A : Fin n → Fin n → α) : List (Fin n → α) :=
   (List.finRange n).foldl (fun cols j =>
     -- Σ_{k<j} L[j,k]²  (the already-computed columns evaluated at row `j`).
@@ -169,8 +201,52 @@ this is symmetric positive-definite, so its Cholesky factorization succeeds. -/
 def addScaledIdFn {n : Nat} (K : Fin n → Fin n → α) (γ : α) : Fin n → Fin n → α :=
   fun i j => K i j + (if i = j then γ else 0)
 
+/--
+Strict, array-backed runtime implementation of `solveRidgeFn` (registered via `@[implemented_by]`).
+It factors `K + γ·I = L·Lᵀ` and runs both triangular substitutions entirely over `Array`s, so no step
+creates the deep `Fin n → α` closures that the functional definition builds. Those re-evaluate
+columns and the substitution accumulator exponentially often, which is ruinous under `#eval`.
+Same linear solve; the numeric examples (residual `(K+γ·I)·x − b ≈ 0`) validate the two agree.
+-/
+def solveRidgeImpl {n : Nat} (K : Fin n → Fin n → α) (γ : α) (b : Fin n → α) : Fin n → α :=
+  let A : Fin n → Fin n → α := fun i j => K i j + (if i.val == j.val then γ else 0)
+  -- Cholesky columns, left to right: `cols[j][i] = L[i][j]` (strict arrays, `O(1)` back-reference).
+  let cols : Array (Array α) := (List.finRange n).foldl (fun cols j =>
+    let jv := j.val
+    let sumsq := (List.finRange n).foldl
+      (fun s k => if k.val < jv then let v := (cols.getD k.val #[]).getD jv 0; s + v * v else s) 0
+    let Ljj := MathFunctions.sqrt (A j j - sumsq)
+    cols.push (Array.ofFn (fun i : Fin n =>
+      if i.val < jv then 0
+      else if i.val == jv then Ljj
+      else
+        let s := (List.finRange n).foldl (fun acc k =>
+          if k.val < jv then
+            acc + (cols.getD k.val #[]).getD i.val 0 * (cols.getD k.val #[]).getD jv 0
+          else acc) 0
+        (A i j - s) / Ljj))) #[]
+  let Lent : Nat → Nat → α := fun i j => (cols.getD j #[]).getD i 0
+  -- Forward solve `L · z = b`: `z[i] = (b[i] − Σ_{k<i} L[i,k]·z[k]) / L[i,i]`.
+  let z : Array α := (List.finRange n).foldl (fun z i =>
+    let iv := i.val
+    let s := (List.finRange n).foldl
+      (fun acc k => if k.val < iv then acc + Lent iv k.val * z.getD k.val 0 else acc) 0
+    z.push ((b i - s) / Lent iv iv)) #[]
+  -- Back solve `Lᵀ · x = z`: `x[i] = (z[i] − Σ_{k>i} L[k,i]·x[k]) / L[i,i]`, `i = n−1 … 0`.
+  let x : Array α := (List.finRange n).reverse.foldl (fun xs i =>
+    let iv := i.val
+    let s := (List.finRange n).foldl
+      (fun acc k => if iv < k.val then acc + Lent k.val iv * xs.getD k.val 0 else acc) 0
+    xs.set! iv ((z.getD iv 0 - s) / Lent iv iv)) (Array.replicate n 0)
+  fun i => x.getD i.val 0
+
 /-- The Tikhonov-regularized (kernel-ridge) solve `(K + γ·I)·x = b`, via the Cholesky factorization
-of `K + γ·I`. This is the linear solve at the core of CHD `solve_variationnal`. -/
+of `K + γ·I`. This is the linear solve at the core of CHD `solve_variationnal`.
+
+The runtime implementation is `solveRidgeImpl` (strict arrays); the closure form here, built from the
+verified `choleskyFn` / `triSolve*` pieces, is what the correctness proofs reason about. Agreement of
+the two is checked numerically by the examples, not proved. -/
+@[implemented_by solveRidgeImpl]
 def solveRidgeFn {n : Nat} (K : Fin n → Fin n → α) (γ : α) (b : Fin n → α) : Fin n → α :=
   cholSolveFn (choleskyFn (addScaledIdFn K γ)) b
 

From 07ba1b3dacd15d6473de7a759d83f2dfa5633729 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 20:20:16 -0700
Subject: [PATCH 17/22] Formalize the CHD Z_test significance thresholds
 (well-posed)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CHD's Z_test (interpolatory.py) decides edge significance by comparing the
observed `noise` against the null distribution of the same statistic under
random data: draw N samples, score each one's `noise`, sort, and read off the
5th/95th percentiles as Z_low/Z_high; an edge is significant when
`noise < Z_low`.

Spec layer (Factorizations.lean): `sampleNoisesFn` (each draw scored by the
*same* `varNoiseFn`), `leBool`, the order statistic `kthSmallestFn`
(mergeSort + index), `zLowIdx`/`zHighIdx` (⌊0.05·N⌋ / ⌊0.95·N⌋),
`zLowFn`/`zHighFn`, `zSignificantFn`, and tensor wrappers `zLowSpec`/`zHighSpec`
— mirroring Z_test verbatim and running on Float.
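
The index arithmetic and the order statistic can be sketched as follows. The
`...Sketch` names are hypothetical, and the index formulas `N / 20` and
`19 * N / 20` are an assumption inferred from the shape of the proofs in this
patch, not copied from the spec file. The sketch uses `Nat` entries to stay
self-contained, where the spec sorts with `leBool`.

```lean
-- Assumed index formulas (natural-number floor division).
def zLowIdxSketch (N : Nat) : Nat := N / 20          -- ⌊0.05·N⌋
def zHighIdxSketch (N : Nat) : Nat := 19 * N / 20    -- ⌊0.95·N⌋

/-- `k`-th smallest entry: sort ascending, then index (default `0` when out of range). -/
def kthSmallestSketch (xs : List Nat) (k : Nat) : Nat :=
  (xs.mergeSort (fun a b => decide (a ≤ b))).getD k 0

#eval (zLowIdxSketch 20, zHighIdxSketch 20)   -- (1, 19)
#eval kthSmallestSketch [5, 1, 4, 2, 3] 1     -- 2
```

Since the low index never exceeds the high index and the sorted list is
ascending, the low percentile never exceeds the high one.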

Proofs (FactorizationsDecision, sorry-free): the order-statistic toolkit
(`kthSmallestFn_mem`/`_nonneg`/`_le_one`/`_mono`, bridged to Mathlib's
`sortedLE_mergeSort` via `leBool_eq_le`) and the Z_test guarantees —
`sampleNoisesFn_nonneg`/`_le_one` (every null sample inherits the verified
`noise ∈ [0,1]` bound), `zLowFn`/`zHighFn ∈ [0,1]`, `zLowFn_le_zHighFn`
(Z_low ≤ Z_high by order-statistic monotonicity), and `zTest_admits_edge`
(`noise < Z_low` ⟹ `MinNoiseKernelChooser` returns `some 0`), the chooser's
`noise ≤ 1` precondition again being `varNoiseFn_le_one`. The keystone: the
same [0,1] bound governs both the data noise and the whole null distribution,
so the test is well-posed.

Examples (Discovery): builds the null distribution from a real
eigendecomposition, checks 0 ≤ Z_low ≤ Z_high ≤ 1, shows dominant-eigenvector
data clears the lower tail (significant), and rejects a high noise and a noise
at the upper tail (negative controls).

Blueprint: new "The Z_test: a null-distribution significance threshold"
section and updated scope summary — only the distributional half (Gaussian
draws + calibrated percentile, needs Mathlib.Probability) remains.

Verified: NN.Examples.Factorization + NN.Proofs.Tensor.Basic 2705 jobs green;
blueprint 4949 jobs; sorry/admit/omega-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |   5 +
 NN/Examples/Factorization/Discovery.lean      |  87 ++++++++-
 .../Tensor/Basic/FactorizationsDecision.lean  | 167 ++++++++++++++++++
 NN/Spec/Core/Tensor/Factorizations.lean       |  67 +++++++
 .../Ch4_Verification/Factorizations.lean      |  60 ++++++-
 5 files changed, 374 insertions(+), 12 deletions(-)

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index f887644..386e356 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -96,6 +96,11 @@ factorization misbehaves.
   an **end-to-end** block then feeds the verified `varNoiseSpec` at several `γ` into `argMinFn`, a
   `find_gamma` sweep selecting the least-noise regularization (all noises in `[0,1]`); **negative
   controls** confirm the most-activated ancestor and tiny-increment iterations are correctly rejected.
+  A closing **`Z_test`** block exercises the statistical layer: the null-distribution thresholds
+  `Z_low`/`Z_high` (5th/95th percentiles of the per-sample `noise`) are well-posed
+  (`0 ≤ Z_low ≤ Z_high ≤ 1`), data aligned with the dominant eigenvector clears the lower tail
+  (`noise < Z_low`, **positive**), and a high noise / a noise at the upper tail are rejected
+  (**negative controls**) — feeding `MinNoiseKernelChooser` exactly as in CHD.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Discovery.lean b/NN/Examples/Factorization/Discovery.lean
index f1bae3a..4a40538 100644
--- a/NN/Examples/Factorization/Discovery.lean
+++ b/NN/Examples/Factorization/Discovery.lean
@@ -25,10 +25,17 @@ deterministic choices the loop makes, each with a positive check and a negative
   returns the `argmax` of the increments;
 * **stop when every ancestor is pruned** — `allPrunedFn` fires on the all-zero mask and not before.
 
-The final block closes the loop end-to-end: it builds an SPD kernel, eigendecomposes it, and runs a
-`find_gamma`-style sweep — feeding the *verified* `varNoiseSpec` at several `γ` straight into `argMinFn`
-to select the regularization with least noise. Every decision runs over `Float`, the executable runtime
-scalar.
+A `find_gamma`-style block then closes the loop end-to-end: it builds an SPD kernel, eigendecomposes
+it, and feeds the *verified* `varNoiseSpec` at several `γ` straight into `argMinFn` to select the
+regularization with least noise.
+
+A final **`Z_test`** block adds the statistical layer (`interpolatory.py`): the observed `noise` is
+judged against the null distribution of the *same* statistic under random data — `Z_low`/`Z_high` are
+the 5th/95th percentiles of the per-sample noises. We check the thresholds are well-posed
+(`0 ≤ Z_low ≤ Z_high ≤ 1`, each percentile inheriting the verified `noise ∈ [0,1]` bound) and that the
+verdict `noise < Z_low` flags a real edge — with a genuine positive (data aligned with the dominant
+eigenvector clears the lower tail) and negatives (a high noise, and a noise sitting at the upper tail,
+are both correctly rejected). Every decision runs over `Float`, the executable runtime scalar.
 -/
 
 @[expose] public section
@@ -171,4 +178,76 @@ def noiseAt : Fin 3 → Float := fun i => Spec.varNoiseSpec evals V (gammas i) g
 #eval assertTrue "find_gamma selects least-noise γ via argMinFn (index 0)"
   ((Spec.argMinFn noiseAt).val == 0)
 
+/-! ## `Z_test`: the null-distribution significance thresholds
+
+CHD decides an edge is real by comparing the observed `noise` against the null distribution of the
+*same* statistic under random data: draw `N` samples, score each one's `noise`, sort, and read off the
+5th/95th percentiles as `Z_low`/`Z_high` (`Z_test` in `interpolatory.py`). An edge is significant when
+`noise < Z_low`. These checks corroborate `FactorizationsDecision`: the thresholds are well-posed
+(`0 ≤ Z_low ≤ Z_high ≤ 1`, each percentile inheriting the verified `noise ∈ [0,1]` bound) and the
+verdict drives `MinNoiseKernelChooser`. -/
+
+/-- An `N = 20` family of pseudo-random null draws `sⱼ ∈ ℝ³` (deterministic, standing in for CHD's
+`jax.random.normal` samples). With `N = 20` the percentile indices are `zLowIdx 20 = ⌊0.05·20⌋ = 1`
+and `zHighIdx 20 = ⌊0.95·20⌋ = 19`. -/
+def zSamples : Fin 20 → Fin 3 → Float :=
+  fun j i => (Float.ofNat ((j.val * 31 + i.val * 17 + 7) % 23) - 11.0) / 7.0
+
+/-- The regularization at which we run the `Z_test`. -/
+def gammaZ : Float := 0.1
+
+/-- `Z_low`: the 5th percentile of the null `noise` distribution from the verified eigendecomposition. -/
+def zLow : Float := Spec.zLowFn (Spec.toVecFn evals) (Spec.toMatFn V) gammaZ zSamples
+/-- `Z_high`: the 95th percentile of the null `noise` distribution. -/
+def zHigh : Float := Spec.zHighFn (Spec.toVecFn evals) (Spec.toMatFn V) gammaZ zSamples
+
+#eval IO.println s!"Z_test null thresholds: Z_low = {zLow}, Z_high = {zHigh}"
+
+-- Positive — the thresholds are ordered (`zLowFn_le_zHighFn`); `leBool` is the very key the sort uses.
+#eval assertTrue "Z_low ≤ Z_high (order-statistic monotonicity)" (Spec.leBool zLow zHigh)
+
+-- Positive — both thresholds are genuine fractions in [0,1] (`zLowFn_nonneg`/`_le_one`, `zHighFn_*`).
+#eval assertTrue "Z_low and Z_high both lie in [0,1]"
+  (Spec.leBool 0.0 zLow && Spec.leBool zLow 1.0 && Spec.leBool 0.0 zHigh && Spec.leBool zHigh 1.0)
+
+/-- The dominant eigen-direction (largest eigenvalue), found by the verified `argMaxFn`. -/
+def domIdx : Fin 3 := Spec.argMaxFn (Spec.toVecFn evals)
+/-- A "real signal": data aligned with the dominant eigenvector. Its `noise` is exactly the shrinkage
+`γ/(λ_dom+γ)` — the smallest shrinkage, so well below the null tail — the kind of edge CHD keeps. -/
+def signalGa : Fin 3 → Float := fun i => Spec.toMatFn V i domIdx
+/-- The observed `noise` of the signal-aligned data (the verified `varNoiseFn`). -/
+def obsSignal : Float := Spec.varNoiseFn (Spec.toVecFn evals) gammaZ (Spec.projFn (Spec.toMatFn V) signalGa)
+
+#eval IO.println s!"signal-aligned noise = {obsSignal}, significant (noise < Z_low)? \
+  {Spec.zSignificantFn obsSignal zLow}"
+
+-- Positive — the signal-aligned noise is itself a fraction in [0,1] (witness of `varNoiseFn_*`).
+#eval assertTrue "signal-aligned noise lies in [0,1]"
+  (Spec.leBool 0.0 obsSignal && Spec.leBool obsSignal 1.0)
+
+-- Positive — end-to-end: data aligned with the dominant eigenvector clears the null's lower tail, so
+-- the `Z_test` flags a real edge (`noise < Z_low`).
+#eval assertTrue "end-to-end: dominant-direction signal is significant (noise < Z_low)"
+  (Spec.zSignificantFn obsSignal zLow)
+
+-- Positive — a clearly-significant edge (noise 0.05 below threshold 0.20) is flagged (`zSignificantFn`).
+#eval assertTrue "significant edge: noise 0.05 < Z_low 0.20" (Spec.zSignificantFn 0.05 0.20)
+
+-- Negative — a noise *above* the threshold is correctly not significant.
+#eval assertFalse "non-significant: noise 0.50 ≥ Z_low 0.20" (Spec.zSignificantFn 0.50 0.20)
+
+-- Negative — the 95th-percentile value itself is never below the 5th (`zHigh ≥ zLow`), so feeding it as
+-- an "observed" noise is correctly judged non-significant — a faithful negative from the real null.
+#eval assertFalse "Z_high is not below Z_low (a noise at the upper tail is not significant)"
+  (Spec.zSignificantFn zHigh zLow)
+
+-- Positive — the `Z_test` verdict feeds `MinNoiseKernelChooser` (`zTest_admits_edge`): a significant
+-- single kernel is admitted as `some 0`.
+#eval assertTrue "significant kernel is admitted (chooser → some 0)"
+  (chooserCode (Spec.kernelChooserFn (fun _ : Fin 1 => (0.05 : Float)) (fun _ : Fin 1 => 0.20)) == 0)
+
+-- Negative — a non-significant single kernel is rejected (`none`, code -1).
+#eval assertTrue "non-significant kernel is rejected (chooser → none, code -1)"
+  (chooserCode (Spec.kernelChooserFn (fun _ : Fin 1 => (0.50 : Float)) (fun _ : Fin 1 => 0.20)) == -1)
+
 end NN.Examples.Factorization.Discovery
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean b/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean
index c927e86..42a7254 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean
@@ -194,4 +194,171 @@ theorem allPrunedFn_iff {k : Nat} (m : Fin k → ℝ) :
   · intro h i; exact (key i).mp (h i (List.mem_finRange i))
   · intro h i _; exact (key i).mpr (h i)
 
+/-! ## CHD `Z_test`: the null-distribution significance thresholds
+
+`Z_test` (`interpolatory.py`) builds the null distribution of the `noise` statistic under random data,
+sorts the per-sample noises, and reports the 5th/95th percentiles as `Z_low`/`Z_high`. The numerical
+heart — the *value* of each sample's noise — is the **same** `varNoiseFn` whose `[0,1]` bound we already
+proved (`varNoiseFn_nonneg`/`_le_one`). So the percentiles inherit that bound, and `Z_low ≤ Z_high`
+because a 5th percentile never exceeds a 95th — pure order-statistic monotonicity over the sorted list.
+
+The order statistic `kthSmallestFn` sorts with the `Context` comparator `leBool`; over `ℝ` that is the
+real `≤` (`leBool_eq_le`), letting Mathlib's `sortedLE_mergeSort` supply sortedness. -/
+
+/-- Over `ℝ`, the `Context` comparator `leBool x y` is the decidable `x ≤ y`. -/
+theorem leBool_eq_decide (x y : ℝ) : Spec.leBool x y = decide (x ≤ y) := by
+  rw [Spec.leBool, ltBool_eq_decide, ← decide_not, decide_eq_decide]
+  exact not_lt
+
+/-- Over `ℝ`, the `leBool` sort key *is* the decided `(· ≤ ·)`, so `kthSmallestFn` sorts with the
+real order (matching Mathlib's `sortedLE_mergeSort`). -/
+private theorem leBool_eq_le : (Spec.leBool : ℝ → ℝ → Bool) = (fun x y => decide (x ≤ y)) := by
+  funext x y; exact leBool_eq_decide x y
+
+/-- `getD` at an in-range index is the corresponding `getElem` (the `0` fallback is unused). -/
+private theorem getD_zero_eq {L : List ℝ} {i : Nat} (h : i < L.length) : L.getD i 0 = L[i] := by
+  rw [List.getD_eq_getElem?_getD, List.getElem?_eq_getElem h, Option.getD_some]
+
+/-- `kthSmallestFn` over `ℝ` is the `k`-th entry of the list sorted by the *real* order. -/
+theorem kthSmallestFn_eq_sorted_getD {N : Nat} (a : Fin N → ℝ) (k : Nat) :
+    Spec.kthSmallestFn a k = (((List.finRange N).map a).mergeSort (· ≤ ·)).getD k 0 := by
+  rw [Spec.kthSmallestFn, leBool_eq_le]
+
+/-! ### Order-statistic facts -/
+
+/-- **`kthSmallestFn` is one of the family's values** (for an in-range `k`): sorting permutes, so the
+selected entry came from `a`. -/
+theorem kthSmallestFn_mem {N : Nat} (a : Fin N → ℝ) {k : Nat} (hk : k < N) :
+    ∃ i, Spec.kthSmallestFn a k = a i := by
+  have hlen : (((List.finRange N).map a).mergeSort (· ≤ ·)).length = N := by
+    rw [List.length_mergeSort, List.length_map, List.length_finRange]
+  have hk' : k < (((List.finRange N).map a).mergeSort (· ≤ ·)).length := by rw [hlen]; exact hk
+  have hmem : Spec.kthSmallestFn a k ∈ ((List.finRange N).map a).mergeSort (· ≤ ·) := by
+    rw [kthSmallestFn_eq_sorted_getD, getD_zero_eq hk']
+    exact List.getElem_mem hk'
+  rw [List.mem_mergeSort, List.mem_map] at hmem
+  obtain ⟨i, _, hi⟩ := hmem
+  exact ⟨i, hi.symm⟩
+
+/-- **An in-range order statistic is `≥ 0`** when every value is. -/
+theorem kthSmallestFn_nonneg {N : Nat} (a : Fin N → ℝ) (hpos : ∀ i, 0 ≤ a i) {k : Nat}
+    (hk : k < N) : 0 ≤ Spec.kthSmallestFn a k := by
+  obtain ⟨i, hi⟩ := kthSmallestFn_mem a hk; rw [hi]; exact hpos i
+
+/-- **An in-range order statistic is `≤ 1`** when every value is. -/
+theorem kthSmallestFn_le_one {N : Nat} (a : Fin N → ℝ) (hle : ∀ i, a i ≤ 1) {k : Nat}
+    (hk : k < N) : Spec.kthSmallestFn a k ≤ 1 := by
+  obtain ⟨i, hi⟩ := kthSmallestFn_mem a hk; rw [hi]; exact hle i
+
+/-- **Order statistics are monotone in their rank** (`k ≤ k'` ⟹ `k`-th ≤ `k'`-th): the underlying
+list is sorted ascending, so later indices never hold smaller values. Hence `Z_low ≤ Z_high`. -/
+theorem kthSmallestFn_mono {N : Nat} (a : Fin N → ℝ) {k k' : Nat} (hkk : k ≤ k') (hk' : k' < N) :
+    Spec.kthSmallestFn a k ≤ Spec.kthSmallestFn a k' := by
+  have hlen : (((List.finRange N).map a).mergeSort (· ≤ ·)).length = N := by
+    rw [List.length_mergeSort, List.length_map, List.length_finRange]
+  have hkL : k < (((List.finRange N).map a).mergeSort (· ≤ ·)).length := by
+    rw [hlen]; exact lt_of_le_of_lt hkk hk'
+  have hk'L : k' < (((List.finRange N).map a).mergeSort (· ≤ ·)).length := by rw [hlen]; exact hk'
+  rw [kthSmallestFn_eq_sorted_getD, kthSmallestFn_eq_sorted_getD, getD_zero_eq hkL, getD_zero_eq hk'L]
+  exact List.sortedLE_mergeSort.getElem_le_getElem_of_le hkk
+
+/-! ### The percentile indices -/
+
+/-- The 5th-percentile index is in range for a nonempty sample. -/
+theorem zLowIdx_lt {N : Nat} (hN : 0 < N) : Spec.zLowIdx N < N := by
+  rw [Spec.zLowIdx]; exact Nat.div_lt_self hN (by norm_num)
+
+/-- The 95th-percentile index is in range for a nonempty sample. -/
+theorem zHighIdx_lt {N : Nat} (hN : 0 < N) : Spec.zHighIdx N < N := by
+  rw [Spec.zHighIdx, Nat.div_lt_iff_lt_mul (by norm_num : (0 : Nat) < 20)]
+  nlinarith [hN]
+
+/-- The 5th-percentile index never exceeds the 95th. -/
+theorem zLowIdx_le_zHighIdx (N : Nat) : Spec.zLowIdx N ≤ Spec.zHighIdx N := by
+  rw [Spec.zLowIdx, Spec.zHighIdx]
+  exact Nat.div_le_div_right (Nat.le_mul_of_pos_left N (by norm_num))
+
+/-! ### Each null sample's noise inherits the `[0,1]` bound -/
+
+/-- **Every `Z_test` null sample is a genuine fraction** (`0 ≤ noise`): it is `varNoiseFn` of the
+projected draw, and `varNoiseFn_nonneg` already bounds that. -/
+theorem sampleNoisesFn_nonneg {n N : Nat} {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ}
+    (hγ : 0 < γ) (V : Fin n → Fin n → ℝ) (samples : Fin N → Fin n → ℝ) (j : Fin N) :
+    0 ≤ Spec.sampleNoisesFn Λ V γ samples j := by
+  rw [Spec.sampleNoisesFn]; exact varNoiseFn_nonneg hΛ hγ _
+
+/-- **Every `Z_test` null sample is `≤ 1`** (`varNoiseFn_le_one`). -/
+theorem sampleNoisesFn_le_one {n N : Nat} {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ}
+    (hγ : 0 < γ) (V : Fin n → Fin n → ℝ) (samples : Fin N → Fin n → ℝ) (j : Fin N) :
+    Spec.sampleNoisesFn Λ V γ samples j ≤ 1 := by
+  rw [Spec.sampleNoisesFn]; exact varNoiseFn_le_one hΛ hγ _
+
+/-! ### `Z_low` / `Z_high` are well-posed thresholds -/
+
+/-- **`Z_low` is a genuine fraction in `[0,1]` (lower bound).** -/
+theorem zLowFn_nonneg {n N : Nat} {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (V : Fin n → Fin n → ℝ) (samples : Fin N → Fin n → ℝ) (hN : 0 < N) :
+    0 ≤ Spec.zLowFn Λ V γ samples := by
+  rw [Spec.zLowFn]
+  exact kthSmallestFn_nonneg _ (fun j => sampleNoisesFn_nonneg hΛ hγ V samples j) (zLowIdx_lt hN)
+
+/-- **`Z_low ≤ 1`.** -/
+theorem zLowFn_le_one {n N : Nat} {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (V : Fin n → Fin n → ℝ) (samples : Fin N → Fin n → ℝ) (hN : 0 < N) :
+    Spec.zLowFn Λ V γ samples ≤ 1 := by
+  rw [Spec.zLowFn]
+  exact kthSmallestFn_le_one _ (fun j => sampleNoisesFn_le_one hΛ hγ V samples j) (zLowIdx_lt hN)
+
+/-- **`Z_high` is a genuine fraction in `[0,1]` (lower bound).** -/
+theorem zHighFn_nonneg {n N : Nat} {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (V : Fin n → Fin n → ℝ) (samples : Fin N → Fin n → ℝ) (hN : 0 < N) :
+    0 ≤ Spec.zHighFn Λ V γ samples := by
+  rw [Spec.zHighFn]
+  exact kthSmallestFn_nonneg _ (fun j => sampleNoisesFn_nonneg hΛ hγ V samples j) (zHighIdx_lt hN)
+
+/-- **`Z_high ≤ 1`.** -/
+theorem zHighFn_le_one {n N : Nat} {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (V : Fin n → Fin n → ℝ) (samples : Fin N → Fin n → ℝ) (hN : 0 < N) :
+    Spec.zHighFn Λ V γ samples ≤ 1 := by
+  rw [Spec.zHighFn]
+  exact kthSmallestFn_le_one _ (fun j => sampleNoisesFn_le_one hΛ hγ V samples j) (zHighIdx_lt hN)
+
+/-- **`Z_low ≤ Z_high`.** The lower percentile of the null distribution never exceeds the upper one,
+by order-statistic monotonicity over the shared sorted noises. The window `Z_low ≤ noise ≤ Z_high`
+(the "no anomaly" band of `_GraphDiscoveryMain.py`) is therefore a nonempty interval. -/
+theorem zLowFn_le_zHighFn {n N : Nat} (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ)
+    (samples : Fin N → Fin n → ℝ) (hN : 0 < N) :
+    Spec.zLowFn Λ V γ samples ≤ Spec.zHighFn Λ V γ samples := by
+  rw [Spec.zLowFn, Spec.zHighFn]
+  exact kthSmallestFn_mono _ (zLowIdx_le_zHighIdx N) (zHighIdx_lt hN)
+
+/-! ### Tying the `Z_test` verdict back to the kernel chooser -/
+
+/-- **A significant edge is never anomalously noisy.** If the observed `noise` clears the lower tail
+(`noise < Z_low`), it also sits below the upper tail (`noise < Z_high`), because `Z_low ≤ Z_high`. -/
+theorem zSignificant_lt_zHighFn {n N : Nat} (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ)
+    (samples : Fin N → Fin n → ℝ) (hN : 0 < N) {obs : ℝ}
+    (hsig : obs < Spec.zLowFn Λ V γ samples) : obs < Spec.zHighFn Λ V γ samples :=
+  lt_of_lt_of_le hsig (zLowFn_le_zHighFn Λ V γ samples hN)
+
+/-- **The `Z_test` decision feeds the kernel chooser.** When the observed `noise` of the data clears the
+`Z_low` threshold (`zSignificantFn = true`), the single-kernel `MinNoiseKernelChooser` admits the edge —
+returns `some 0`. This connects the statistical layer (`Z_test`) to the discovery decision layer
+(`kernelChooserFn`, proved sound/complete above); the `noise ≤ 1` ceiling the chooser needs is the
+verified `varNoiseFn_le_one`. -/
+theorem zTest_admits_edge {n N : Nat} {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (V : Fin n → Fin n → ℝ) (samples : Fin N → Fin n → ℝ) (ga : Fin n → ℝ)
+    (hsig : Spec.zSignificantFn (Spec.varNoiseFn Λ γ (Spec.projFn V ga))
+      (Spec.zLowFn Λ V γ samples) = true) :
+    Spec.kernelChooserFn (fun _ : Fin 1 => Spec.varNoiseFn Λ γ (Spec.projFn V ga))
+      (fun _ : Fin 1 => Spec.zLowFn Λ V γ samples) = some 0 := by
+  have hlt : Spec.varNoiseFn Λ γ (Spec.projFn V ga) < Spec.zLowFn Λ V γ samples := by
+    rw [Spec.zSignificantFn, ltBool_eq_decide] at hsig; exact of_decide_eq_true hsig
+  have hb : ∀ i : Fin 1, (fun _ : Fin 1 => Spec.varNoiseFn Λ γ (Spec.projFn V ga)) i ≤ 1 :=
+    fun _ => varNoiseFn_le_one hΛ hγ _
+  obtain ⟨s, hs, _, _⟩ := kernelChooserFn_eq_some
+    (noises := fun _ : Fin 1 => Spec.varNoiseFn Λ γ (Spec.projFn V ga))
+    (Zlows := fun _ : Fin 1 => Spec.zLowFn Λ V γ samples) hb (v := 0) hlt
+  rw [hs]; exact congrArg some (Fin.fin_one_eq_zero s)
+
 end Spec.Factorization
diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean
index 6181663..27c6bc3 100644
--- a/NN/Spec/Core/Tensor/Factorizations.lean
+++ b/NN/Spec/Core/Tensor/Factorizations.lean
@@ -506,6 +506,73 @@ def varNoiseSpec {n : Nat} (evals : Tensor α (.dim n .scalar))
     (V : Tensor α (.dim n (.dim n .scalar))) (γ : α) (ga : Tensor α (.dim n .scalar)) : α :=
   varNoiseFn (toVecFn evals) γ (projFn (toMatFn V) (toVecFn ga))
 
+/-! ## CHD `Z_test`: null-distribution significance thresholds (`interpolatory.py`)
+
+`varNoiseFn` gives the `noise` of the *observed* data. To decide whether that noise is small *enough*
+to signal a real edge, CHD compares it against the null distribution of the **same** statistic under
+random data (`Z_test`): draw `N` standard-Gaussian samples, score each one's `noise`, sort them, and
+take the 5th and 95th percentiles as `Z_low`/`Z_high`. An edge is significant when the observed
+`noise < Z_low` — strictly below the null's lower tail.
+
+The random draws enter the spec as an explicit family `samples : Fin N → Fin n → α` (row `j` a draw);
+the randomness itself is the caller's, exactly as CHD threads a `jax.random` key into `Z_test`. The
+selection guarantees (each threshold lies in `[0,1]`, and `Z_low ≤ Z_high`) are proved over `ℝ` in
+[`NN.Proofs.Tensor.Basic.FactorizationsDecision`](../../../Proofs/Tensor/Basic/FactorizationsDecision.lean),
+reusing the verified `noise ∈ [0,1]` bound for *every* null sample. -/
+
+/-- The `Z_test` null sample of per-draw `noise` levels: each random draw `samples j` is projected
+(`Pga = Vᵀ·sⱼ`) and scored by the **same** `varNoiseFn` as the data. Mirrors
+`noises = vecdot(Pgas_coeffs, Pgas_coeffs) / vecdot(Pgas_coeffs, Pgas)` in `Z_test`. -/
+def sampleNoisesFn {n N : Nat} (Λ : Fin n → α) (V : Fin n → Fin n → α) (γ : α)
+    (samples : Fin N → Fin n → α) : Fin N → α :=
+  fun j => varNoiseFn Λ γ (projFn V (samples j))
+
+/-- `x ≤ y` as a `Bool` via the `Context` order (`x ≤ y` is `¬ y < x`); the sort key for the order
+statistics below. -/
+def leBool (x y : α) : Bool := !ltBool y x
+
+/-- The `k`-th smallest of a finite family `a : Fin N → α`, by sorting the values (ascending, via the
+`Context` order) and indexing. The `getD … 0` fallback keeps the function total; for `k < N` the result
+is a genuine order statistic (see `kthSmallestFn_mem`/`_mono` in `FactorizationsDecision`). -/
+def kthSmallestFn {N : Nat} (a : Fin N → α) (k : Nat) : α :=
+  (((List.finRange N).map a).mergeSort leBool).getD k 0
+
+/-- The 5th-percentile index of an `N`-sample null distribution (`int(0.05·N)`, i.e. `⌊N/20⌋`). -/
+def zLowIdx (N : Nat) : Nat := N / 20
+
+/-- The 95th-percentile index of an `N`-sample null distribution (`int(0.95·N)`, i.e. `⌊19·N/20⌋`). -/
+def zHighIdx (N : Nat) : Nat := 19 * N / 20
+
+/-- CHD `Z_low`: the 5th percentile of the null `noise` distribution — the significance threshold an
+observed `noise` must beat (`B_samples[int(0.05·N)]`). -/
+def zLowFn {n N : Nat} (Λ : Fin n → α) (V : Fin n → Fin n → α) (γ : α)
+    (samples : Fin N → Fin n → α) : α :=
+  kthSmallestFn (sampleNoisesFn Λ V γ samples) (zLowIdx N)
+
+/-- CHD `Z_high`: the 95th percentile of the null `noise` distribution (`B_samples[int(0.95·N)]`). -/
+def zHighFn {n N : Nat} (Λ : Fin n → α) (V : Fin n → Fin n → α) (γ : α)
+    (samples : Fin N → Fin n → α) : α :=
+  kthSmallestFn (sampleNoisesFn Λ V γ samples) (zHighIdx N)
+
+/-- CHD's per-kernel significance verdict: the observed `noise` beats the null's lower tail
+(`noise < Z_low`), i.e. the edge is real. This is exactly the validity test `MinNoiseKernelChooser`
+folds over (`noises < Z_lows`). -/
+def zSignificantFn (noise Zlow : α) : Bool := ltBool noise Zlow
+
+/-- Tensor-level `Z_low` threshold from eigenpairs `(evals, V)`, regularization `γ`, and a family of
+null draws (the rows of `S : Tensor (.dim N (.dim n .scalar))`). -/
+def zLowSpec {n N : Nat} (evals : Tensor α (.dim n .scalar))
+    (V : Tensor α (.dim n (.dim n .scalar))) (γ : α)
+    (S : Tensor α (.dim N (.dim n .scalar))) : α :=
+  zLowFn (toVecFn evals) (toMatFn V) γ (toMatFn S)
+
+/-- Tensor-level `Z_high` threshold from eigenpairs `(evals, V)`, regularization `γ`, and a family of
+null draws (the rows of `S`). -/
+def zHighSpec {n N : Nat} (evals : Tensor α (.dim n .scalar))
+    (V : Tensor α (.dim n (.dim n .scalar))) (γ : α)
+    (S : Tensor α (.dim N (.dim n .scalar))) : α :=
+  zHighFn (toVecFn evals) (toMatFn V) γ (toMatFn S)
+
 /-! ## CHD mode kernels (`Modes/kernels.py`)
 
 Everything above takes the kernel matrix `K` as input, assuming it is symmetric positive-semidefinite.
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 33f44d6..42db8eb 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -358,6 +358,45 @@ kernel, eigendecomposes it, and runs a `find_gamma`-style sweep that feeds the v
 several `γ` straight into `argMinFn`, selecting the least-noise regularization (the smallest `γ`, every
 swept noise landing in `[0,1]` as proved).
 
+# The `Z_test`: a null-distribution significance threshold
+
+The kernel chooser of the previous section asks whether the observed `noise` falls below a threshold
+`Z_low`. Where does `Z_low` come from? It is not a hand-set constant — it is the *5th percentile of the
+null distribution* of the very same `noise` statistic. CHD's `Z_test` (`interpolatory.py`) draws `N`
+standard-Gaussian samples, scores each one's `noise` with the same `varNoiseFn`, sorts the `N` values,
+and reads off the 5th and 95th percentiles as `Z_low` and `Z_high`. An edge is *significant* — a real
+dependency rather than fitting noise — when the observed `noise` falls below `Z_low`, i.e. strictly
+inside the lower tail of what random data would produce.
+
+[`NN.Proofs.Tensor.Basic.FactorizationsDecision`](https://github.com/lean-dojo/TorchLean/blob/main/NN/Proofs/Tensor/Basic/FactorizationsDecision.lean)
+formalizes this statistical layer. The spec `Spec.zLowFn` / `Spec.zHighFn` mirror `Z_test`: the random
+draws are an explicit family `samples : Fin N → Fin n → α` (the caller's randomness, exactly as CHD
+threads a PRNG key), each is scored by `Spec.sampleNoisesFn` (the *same* `varNoiseFn` again), and the
+percentiles are order statistics `Spec.kthSmallestFn` — the `k`-th entry of the list sorted by the
+`Context` order. Over `ℝ` that sort key is the real `≤` (`leBool_eq_le`), so Mathlib's
+`sortedLE_mergeSort` supplies sortedness and `mem_mergeSort` supplies membership.
+
+The payoff is that the threshold is *well-posed*, and provably so. The keystone is that the `[0,1]`
+bound governing the data noise governs *every null sample too* — it is the same `varNoiseFn`. So:
+
+- `sampleNoisesFn_nonneg` / `_le_one` — each of the `N` null noises is a genuine fraction in `[0,1]`,
+  directly from `varNoiseFn_nonneg` / `varNoiseFn_le_one`.
+- `zLowFn_nonneg` / `zLowFn_le_one` and the `zHighFn` pair — hence each percentile lies in `[0,1]`,
+  because an order statistic is one of the sampled values (`kthSmallestFn_mem`).
+- `zLowFn_le_zHighFn` — and `Z_low ≤ Z_high`, because a 5th percentile never exceeds a 95th. This is
+  *pure order-statistic monotonicity* (`kthSmallestFn_mono`): the underlying list is sorted ascending
+  and `⌊0.05 N⌋ ≤ ⌊0.95 N⌋`. The comparison window `Z_low ≤ noise ≤ Z_high` the loop uses (the
+  "no anomaly" band of `_GraphDiscoveryMain.py`) is therefore a nonempty interval.
+
+Finally `zTest_admits_edge` ties the statistical verdict back to the decision layer: when the observed
+`noise` clears `Z_low` (`zSignificantFn = true`), the single-kernel `MinNoiseKernelChooser` admits the
+edge — returns `some 0`. The `noise ≤ 1` ceiling that proof needs is, once more, the verified
+`varNoiseFn_le_one`. The whole statistical decision thus rests on the one spectral bound proved three
+sections ago. The `Discovery` example exhibits the layer end-to-end: it builds the null distribution
+from a real eigendecomposition, checks `0 ≤ Z_low ≤ Z_high ≤ 1`, shows that data aligned with the
+*dominant* eigenvector (smallest shrinkage noise) clears the lower tail and is flagged significant, and
+confirms that a high noise and a noise sitting at the upper tail are both correctly rejected.
+
 # The a-posteriori residual certificate
 
 For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact
@@ -549,11 +588,16 @@ is now PSD-verified for *all three* CHD modes — linear, quadratic, and Gaussia
 the *discovery decision layer* on top — the kernel chooser, the activation prune step, the mode
 chooser, and the stopping rule — is now proved sound and complete, with the chooser's correctness
 resting directly on the verified `noise ≤ 1` bound, so the structural decisions are proved selections
-over a statistic whose range was itself proved.
-
-So the CHD foundation is complete, from the kernel build through the regularized solve and the noise
-statistic up to the graph-structure decisions. The two remaining open items are both narrow and
-deliberately scoped: the cyclic-Jacobi convergence *rate* (captured exactly by the a-posteriori
-residual certificate, never by `sorry`) and the `Z_test`'s Gaussian sampling and percentiles — one a
-proof-only gap on a quantity CHD does not need to *run*, the other statistical rather than algebraic
-and exercised numerically.
+over a statistic whose range was itself proved. The `Z_test` *significance thresholds* are now proved
+well-posed too: `Z_low` and `Z_high` are order statistics of the null `noise` distribution, each
+inheriting the `[0,1]` bound from the shared `varNoiseFn`, with `Z_low ≤ Z_high` by order-statistic
+monotonicity — and the verdict `noise < Z_low` is shown to feed `MinNoiseKernelChooser`.
+
+So the CHD foundation is complete, from the kernel build through the regularized solve, the noise
+statistic, and the `Z_test` thresholds up to the graph-structure decisions. The two remaining open
+items are both narrow and deliberately scoped: the cyclic-Jacobi convergence *rate* (captured exactly
+by the a-posteriori residual certificate, never by `sorry`), and the *distributional* content of the
+`Z_test` — that the draws are Gaussian and the empirical percentile is a calibrated confidence level
+(a probability-theory statement needing `Mathlib.Probability`, distinct from the now-proved
+order-statistic well-posedness). One is a proof-only gap on a quantity CHD does not need to *run*; the
+other is statistical rather than algebraic and exercised numerically.

From b32e2c550084f889767339217400f92827c79034 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 20:54:50 -0700
Subject: [PATCH 18/22] Formalize the CHD Z_test distributional layer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Close the provable half of the Z_test's distributional content (the part
left open after the well-posed thresholds), in a new sorry-free module
NN/Proofs/Tensor/Basic/FactorizationsZTest.lean:

* Finite-sample calibration (counting, no probability theory): a sorted list
  has at most k entries below its k-th element, so via countP permutation-
  invariance over mergeSort the threshold's own empirical false-positive rate
  is bounded exactly: at most ⌊N/20⌋ (≈ 5%) of the N null draws fall below
  Z_low (zLow_null_exceedance_le) and at most N-1-⌊19N/20⌋ (≈ 5%) rise
  above Z_high (zHigh_null_exceedance_le). This is the exact, non-asymptotic
  guarantee.

* The Gaussian null law (measure theory): modelling the draws as i.i.d.
  standard Gaussian (nullGaussian = Measure.pi (gaussianReal 0 1)), the
  per-draw noise is measurable (measurable_noiseMap), so its pushforward
  noiseLaw is a probability measure (IsProbabilityMeasure) concentrated on
  [0,1] (noiseLaw_Icc_eq_one) — the verified varNoiseFn ∈ [0,1] bound lifted
  to the law. sampleNoisesFn_eq_noiseMap ties CHD's executable statistic to
  the model.
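
As a rough illustration of the counting bound in the first bullet, here is a
minimal standalone sketch in core Lean 4 (no Mathlib, no TorchLean imports).
Everything in it is hypothetical: `scores` is a made-up list of 20 integer
"noise" scores, and `kthSmallest`, `zLow`, `zHigh` are local stand-ins over
`Nat`, not the `Spec.kthSmallestFn` / `Spec.zLowFn` / `Spec.zHighFn` of this
patch. Only the index formulas `N / 20` and `19 * N / 20` are taken from the
source.

```lean
-- Hypothetical sketch, not part of the patch: the percentile indices from the spec.
def zLowIdx (N : Nat) : Nat := N / 20
def zHighIdx (N : Nat) : Nat := 19 * N / 20

-- 20 made-up integer scores (a shuffled 1..20), standing in for the null noises.
def scores : List Nat :=
  [17, 3, 12, 8, 19, 1, 14, 6, 10, 20, 2, 16, 5, 11, 9, 18, 4, 13, 7, 15]

-- k-th smallest (0-indexed) by sorting ascending and indexing, with a total fallback.
def kthSmallest (xs : List Nat) (k : Nat) : Nat :=
  (xs.mergeSort (fun a b => decide (a ≤ b))).getD k 0

def zLow : Nat := kthSmallest scores (zLowIdx scores.length)
def zHigh : Nat := kthSmallest scores (zHighIdx scores.length)

-- Empirical tail counts over the unsorted scores.
def countBelow (thr : Nat) : Nat := scores.countP (fun x => decide (x < thr))
def countAbove (thr : Nat) : Nat := scores.countP (fun x => decide (thr < x))

#eval (zLowIdx 20, zHighIdx 20)                    -- (1, 19)
#eval (zLow, zHigh)                                -- (2, 20)
#eval decide (countBelow zLow ≤ zLowIdx 20)        -- true: 1 ≤ 1
#eval decide (countAbove zHigh ≤ 20 - 1 - zHighIdx 20)  -- true: 0 ≤ 0
```

The counts are taken over the original, unsorted list, which mirrors how the
theorems state the bound on `(List.finRange N).map …` and only pass through
the sorted list inside the proof.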

Scope honesty: the asymptotic frontier — empirical→true quantile convergence
(Glivenko–Cantelli/DKW) and the exchangeability rank rate k/(N+1) — needs an
empirical-process theory absent from Mathlib v4.30.0, and is flagged rather
than stubbed with sorry.

Discovery.lean adds positive/negative #eval checks (exactly 1/20 below Z_low,
0 above Z_high, 19/20 below the slack Z_high as a negative control); the
blueprint Ch4 chapter gains a "Z_test distributional layer" section and the
"What remains" note now scopes only the asymptotic half.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |   9 +-
 NN/Examples/Factorization/Discovery.lean      |  43 ++++
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Tensor/Basic/FactorizationsZTest.lean     | 234 ++++++++++++++++++
 .../Ch4_Verification/Factorizations.lean      |  62 ++++-
 5 files changed, 339 insertions(+), 10 deletions(-)
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsZTest.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 386e356..ec34bbf 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -100,7 +100,14 @@ factorization misbehaves.
   `Z_low`/`Z_high` (5th/95th percentiles of the per-sample `noise`) are well-posed
   (`0 ≤ Z_low ≤ Z_high ≤ 1`), data aligned with the dominant eigenvector clears the lower tail
   (`noise < Z_low`, **positive**), and a high noise / a noise at the upper tail are rejected
-  (**negative controls**) — feeding `MinNoiseKernelChooser` exactly as in CHD.
+  (**negative controls**) — feeding `MinNoiseKernelChooser` exactly as in CHD. A final
+  **distributional** sub-block checks the *finite-sample calibration* proved in
+  `FactorizationsZTest`: across the `N = 20` null draws, at most `⌊N/20⌋ ≈ 5%` fall below `Z_low`
+  (`zLow_null_exceedance_le`, here exactly `1/20`) and at most `≈ 5%` rise above `Z_high`
+  (`zHigh_null_exceedance_le`, here `0`); a **negative control** confirms the slack `Z_high`
+  threshold admits `≈ 95%` of the draws, so the `5%` calibration is specific to `Z_low`. (The
+  companion measure-theoretic fact — the i.i.d.-Gaussian null law is a probability measure on
+  `[0,1]`, `noiseLaw_Icc_eq_one` — is noncomputable and lives in the proofs.)
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Discovery.lean b/NN/Examples/Factorization/Discovery.lean
index 4a40538..2acfdd9 100644
--- a/NN/Examples/Factorization/Discovery.lean
+++ b/NN/Examples/Factorization/Discovery.lean
@@ -250,4 +250,47 @@ def obsSignal : Float := Spec.varNoiseFn (Spec.toVecFn evals) gammaZ (Spec.projF
 #eval assertTrue "non-significant kernel is rejected (chooser → none, code -1)"
   (chooserCode (Spec.kernelChooserFn (fun _ : Fin 1 => (0.50 : Float)) (fun _ : Fin 1 => 0.20)) == -1)
 
+/-! ### The distributional layer: finite-sample calibration of the thresholds
+
+Each null draw's `noise` is scored by the same functional as the data (`sampleNoisesFn`). The
+percentile thresholds carry a *non-asymptotic* false-positive guarantee, proved in
+`FactorizationsZTest`: at most `⌊N/20⌋ ≈ 5%` of the `N` draws fall below `Z_low`
+(`zLow_null_exceedance_le`) and at most `N-1-⌊19N/20⌋ ≈ 5%` fall above `Z_high`
+(`zHigh_null_exceedance_le`). On the measure side, modelling the draws as i.i.d. standard Gaussian
+makes the null law a probability measure on `[0,1]` (`noiseLaw_Icc_eq_one`); that part is
+noncomputable, so it is exercised by the proofs rather than `#eval`. -/
+
+/-- The per-draw `noise` levels of the `Z_test` null sample (`N = 20` draws). -/
+def zNullNoises : Fin 20 → Float :=
+  Spec.sampleNoisesFn (Spec.toVecFn evals) (Spec.toMatFn V) gammaZ zSamples
+
+/-- How many of the 20 null draws score strictly below a threshold (the empirical lower-tail count,
+using the very `ltBool` comparator the `Z_test` decision uses). -/
+def countBelow (thr : Float) : Nat :=
+  ((List.finRange 20).filter (fun j => Spec.ltBool (zNullNoises j) thr)).length
+
+/-- How many of the 20 null draws score strictly above a threshold (the empirical upper-tail count). -/
+def countAbove (thr : Float) : Nat :=
+  ((List.finRange 20).filter (fun j => Spec.ltBool thr (zNullNoises j))).length
+
+#eval IO.println s!"null-draw tail counts: below Z_low = {countBelow zLow} (≤ ⌊20/20⌋ = {Spec.zLowIdx 20}), \
+  above Z_high = {countAbove zHigh} (≤ 19 - {Spec.zHighIdx 20} = {20 - 1 - Spec.zHighIdx 20}), \
+  below Z_high = {countBelow zHigh}"
+
+-- Positive — `zLow_null_exceedance_le`: at most `⌊N/20⌋` (≈ 5%) of the null draws beat `Z_low`, i.e.
+-- the threshold's own empirical false-positive rate is bounded by the 5th-percentile rank.
+#eval assertTrue "≤ 5% of null draws fall below Z_low (zLow_null_exceedance_le)"
+  (decide (countBelow zLow ≤ Spec.zLowIdx 20))
+
+-- Positive — `zHigh_null_exceedance_le`: at most `N-1-⌊19N/20⌋` (≈ 5%) of the null draws exceed
+-- `Z_high`. With `N = 20`, `Z_high` is the top order statistic, so nothing strictly exceeds it.
+#eval assertTrue "≤ 5% of null draws rise above Z_high (zHigh_null_exceedance_le)"
+  (decide (countAbove zHigh ≤ 20 - 1 - Spec.zHighIdx 20))
+
+-- Negative control — the *slack* upper threshold `Z_high` admits far more than 5% of the null mass
+-- below it (≈ 95%), so the 5% lower-tail calibration is specific to `Z_low`, not an artifact of any
+-- threshold: a test against `Z_high` would over-reject the null.
+#eval assertTrue "Z_high is a slack threshold: > 5% of null draws fall below it (calibration is specific to Z_low)"
+  (decide (Spec.zLowIdx 20 < countBelow zHigh))
+
 end NN.Examples.Factorization.Discovery
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index 6f094ff..b81aa8a 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -14,6 +14,7 @@ public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction
 public import NN.Proofs.Tensor.Basic.FactorizationsSolve
 public import NN.Proofs.Tensor.Basic.FactorizationsVariational
 public import NN.Proofs.Tensor.Basic.FactorizationsDecision
+public import NN.Proofs.Tensor.Basic.FactorizationsZTest
 public import NN.Proofs.Tensor.Basic.FactorizationsKernels
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsZTest.lean b/NN/Proofs/Tensor/Basic/FactorizationsZTest.lean
new file mode 100644
index 0000000..775bdce
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsZTest.lean
@@ -0,0 +1,234 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.FactorizationsDecision
+public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
+public import Mathlib.Probability.Distributions.Gaussian.Real
+public import Mathlib.MeasureTheory.Constructions.Pi
+public import Mathlib.MeasureTheory.Measure.Map
+public import Mathlib.MeasureTheory.Measure.Typeclasses.Probability
+
+/-!
+# CHD `Z_test`: the distributional layer (`interpolatory.py`)
+
+[`FactorizationsDecision`](./FactorizationsDecision.lean) proved the `Z_test` thresholds are
+*well-posed* — each of `Z_low`/`Z_high` is a genuine order statistic of the per-sample `noise`,
+lying in `[0,1]`, with `Z_low ≤ Z_high`, and the chooser consumes `Z_low`. That is the
+*deterministic* half of the test. This file closes the **distributional** half, in two pieces that
+are honestly provable over Mathlib v4.30.0:
+
+* **Finite-sample calibration (counting).** The operational meaning of "`Z_low` is the 5th
+  percentile" is that the threshold's *own* empirical false-positive rate is controlled: among the
+  `N` null draws, **at most `⌊N/20⌋ ≈ 5%`** score strictly below `Z_low`
+  (`zLow_null_exceedance_le`), and **at most `N-1-⌊19N/20⌋ ≈ 5%`** score strictly above `Z_high`
+  (`zHigh_null_exceedance_le`). These are exact consequences of order-statistic sortedness — a
+  sorted list has at most `k` entries below its `k`-th element — and need no probability theory.
+
+* **The Gaussian null law (measure theory).** CHD draws the null samples i.i.d. standard Gaussian.
+  We model that draw as `nullGaussian n`, the product of `n` standard normals on `Fin n → ℝ`
+  (`Measure.pi (fun _ => gaussianReal 0 1)`), a genuine probability measure. The per-sample `noise`
+  is a *measurable* map (`measurable_noiseMap`), so its **null law** `noiseLaw` is a probability
+  measure (`IsProbabilityMeasure`) **supported in `[0,1]`** (`noiseLaw_Icc_eq_one`) — the verified
+  `varNoiseFn ∈ [0,1]` bound, lifted to the law. `sampleNoisesFn_eq_noiseMap` identifies CHD's
+  executable per-draw statistic with this measurable map, tying the counting layer to the measure.
+
+Scope honesty: what remains genuinely *research-grade* (beyond Mathlib v4.30.0) is the
+*asymptotic* calibration — that the empirical 5%/95% percentiles converge to the true quantiles of
+`noiseLaw` (Glivenko–Cantelli / DKW), and that, under exchangeability of a fresh null draw with the
+sample, the false-positive rate is exactly the rank level `k/(N+1)`. Those need an empirical-process
+theory Mathlib does not yet carry; we do not stub them with `sorry`. The finite-sample false-positive
+*bound* proved here is the exact, non-asymptotic statement the test actually guarantees.
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open Spec.Factorization.Reconstruction
+open MeasureTheory ProbabilityTheory
+
+variable {n : Nat}
+
+/-! ## Finite-sample calibration: order-statistic tail counts
+
+A sorted list has at most `k` entries strictly below its `k`-th element, and at most
+`length - 1 - k` entries strictly above it. Pushed through the sort-is-a-permutation invariance of
+`countP`, this bounds how many of the `N` null draws fall on the wrong side of a percentile
+threshold — the test's empirical false-positive rate. -/
+
+/-- In an ascending-sorted list, at most `k` entries are strictly below a cutoff `c ≤ s[k]`: every
+entry from index `k` onward is `≥ s[k] ≥ c`, so all sub-`c` entries live in the length-`k` prefix. -/
+private theorem sortedLE_countP_lt_le {s : List ℝ} (hs : s.SortedLE) {k : Nat}
+    (hk : k < s.length) {c : ℝ} (hc : c ≤ s[k]) :
+    s.countP (fun x => decide (x < c)) ≤ k := by
+  conv_lhs => rw [← List.take_append_drop k s, List.countP_append]
+  have hge : ∀ x ∈ s.drop k, c ≤ x := by
+    intro x hx
+    rw [List.mem_iff_getElem] at hx
+    obtain ⟨i, hi, rfl⟩ := hx
+    rw [List.getElem_drop]
+    exact le_trans hc (hs.getElem_le_getElem_of_le (Nat.le_add_right k i))
+  have hdrop : (s.drop k).countP (fun x => decide (x < c)) = 0 := by
+    rw [List.countP_eq_zero]
+    intro x hx
+    simp only [decide_eq_true_eq]
+    exact not_lt.mpr (hge x hx)
+  have htake : (s.take k).countP (fun x => decide (x < c)) ≤ k := by
+    refine le_trans List.countP_le_length ?_
+    rw [List.length_take]; exact Nat.min_le_left _ _
+  rw [hdrop, Nat.add_zero]; exact htake
+
+/-- In an ascending-sorted list, at most `length - 1 - k` entries are strictly above a cutoff
+`s[k] ≤ c`: every entry up to index `k` is `≤ s[k] ≤ c`, so all super-`c` entries live in the
+length-`(length-(k+1))` suffix. -/
+private theorem sortedLE_countP_gt_le {s : List ℝ} (hs : s.SortedLE) {k : Nat}
+    (hk : k < s.length) {c : ℝ} (hc : s[k] ≤ c) :
+    s.countP (fun x => decide (c < x)) ≤ s.length - 1 - k := by
+  conv_lhs => rw [← List.take_append_drop (k + 1) s, List.countP_append]
+  have htake : (s.take (k + 1)).countP (fun x => decide (c < x)) = 0 := by
+    rw [List.countP_eq_zero]
+    intro x hx
+    rw [List.mem_iff_getElem] at hx
+    obtain ⟨i, hi, rfl⟩ := hx
+    have hi' : i < k + 1 := by
+      rw [List.length_take] at hi; exact lt_of_lt_of_le hi (Nat.min_le_left _ _)
+    rw [List.getElem_take]
+    simp only [decide_eq_true_eq]
+    exact not_lt.mpr (le_trans (hs.getElem_le_getElem_of_le (Nat.lt_succ_iff.mp hi')) hc)
+  have hdrop : (s.drop (k + 1)).countP (fun x => decide (c < x)) ≤ s.length - (k + 1) := by
+    refine le_trans List.countP_le_length ?_
+    rw [List.length_drop]
+  have heq : s.length - (k + 1) = s.length - 1 - k := by rw [Nat.sub_sub, Nat.add_comm]
+  rw [htake, Nat.zero_add]
+  exact le_trans hdrop (le_of_eq heq)
+
+/-- `kthSmallestFn a k` is the `k`-th entry of the ascending-`(· ≤ ·)` mergeSort of the family `a`. -/
+private theorem kthSmallestFn_eq_getElem {N : Nat} (a : Fin N → ℝ) {k : Nat}
+    (hks : k < (((List.finRange N).map a).mergeSort (· ≤ ·)).length) :
+    Spec.kthSmallestFn a k = (((List.finRange N).map a).mergeSort (· ≤ ·))[k] := by
+  rw [kthSmallestFn_eq_sorted_getD, List.getD_eq_getElem?_getD, List.getElem?_eq_getElem hks,
+    Option.getD_some]
+
+/-- **At most `k` of the family's values are strictly below its `k`-th order statistic.** Sorting is
+a permutation (so `countP` is unchanged) and the sorted list has at most `k` entries below `s[k]`. -/
+theorem kthSmallestFn_strictBelow_count_le {N : Nat} (a : Fin N → ℝ) {k : Nat} (hk : k < N) :
+    ((List.finRange N).map a).countP (fun x => decide (x < Spec.kthSmallestFn a k)) ≤ k := by
+  have hlen : (((List.finRange N).map a).mergeSort (· ≤ ·)).length = N := by
+    rw [List.length_mergeSort, List.length_map, List.length_finRange]
+  have hks : k < (((List.finRange N).map a).mergeSort (· ≤ ·)).length := by rw [hlen]; exact hk
+  rw [kthSmallestFn_eq_getElem a hks,
+    ← List.Perm.countP_eq _ (List.mergeSort_perm ((List.finRange N).map a) (· ≤ ·))]
+  exact sortedLE_countP_lt_le List.sortedLE_mergeSort hks (le_refl _)
+
+/-- **At most `N-1-k` of the family's values are strictly above its `k`-th order statistic.** -/
+theorem kthSmallestFn_strictAbove_count_le {N : Nat} (a : Fin N → ℝ) {k : Nat} (hk : k < N) :
+    ((List.finRange N).map a).countP (fun x => decide (Spec.kthSmallestFn a k < x)) ≤ N - 1 - k := by
+  have hlen : (((List.finRange N).map a).mergeSort (· ≤ ·)).length = N := by
+    rw [List.length_mergeSort, List.length_map, List.length_finRange]
+  have hks : k < (((List.finRange N).map a).mergeSort (· ≤ ·)).length := by rw [hlen]; exact hk
+  rw [kthSmallestFn_eq_getElem a hks,
+    ← List.Perm.countP_eq _ (List.mergeSort_perm ((List.finRange N).map a) (· ≤ ·))]
+  have hcount := sortedLE_countP_gt_le List.sortedLE_mergeSort hks (le_refl _)
+  rw [hlen] at hcount
+  exact hcount
+
+/-! ### The `Z_test` empirical false-positive bounds -/
+
+/-- **`Z_low` controls the lower-tail false-positive rate.** At most `⌊N/20⌋ ≈ 5%` of the `N` null
+draws score strictly below the empirical `Z_low` threshold — exactly the rank that defines the 5th
+percentile. This is the finite-sample, non-asymptotic guarantee the significance test carries. -/
+theorem zLow_null_exceedance_le {n N : Nat} (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ)
+    (samples : Fin N → Fin n → ℝ) (hN : 0 < N) :
+    ((List.finRange N).map (Spec.sampleNoisesFn Λ V γ samples)).countP
+        (fun x => decide (x < Spec.zLowFn Λ V γ samples)) ≤ Spec.zLowIdx N := by
+  rw [Spec.zLowFn]
+  exact kthSmallestFn_strictBelow_count_le _ (zLowIdx_lt hN)
+
+/-- **`Z_high` controls the upper-tail false-positive rate.** At most `N-1-⌊19N/20⌋ ≈ 5%` of the `N`
+null draws score strictly above the empirical `Z_high` threshold. -/
+theorem zHigh_null_exceedance_le {n N : Nat} (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ)
+    (samples : Fin N → Fin n → ℝ) (hN : 0 < N) :
+    ((List.finRange N).map (Spec.sampleNoisesFn Λ V γ samples)).countP
+        (fun x => decide (Spec.zHighFn Λ V γ samples < x)) ≤ N - 1 - Spec.zHighIdx N := by
+  rw [Spec.zHighFn]
+  exact kthSmallestFn_strictAbove_count_le _ (zHighIdx_lt hN)
+
+/-! ## The Gaussian null law
+
+CHD's `Z_test` draws each null sample i.i.d. standard Gaussian. We model one draw as
+`nullGaussian n`: the product of `n` standard normals on `Fin n → ℝ`. The per-sample `noise` is a
+measurable map, so its pushforward — the null law of the statistic — is a probability measure
+concentrated on `[0,1]`. -/
+
+noncomputable section
+
+/-- The per-draw `noise` statistic as a map on raw draws `s : Fin n → ℝ` (one null sample):
+`noiseMap Λ V γ s = varNoiseFn Λ γ (Vᵀ·s)`, the same functional `Z_test` scores each draw with. -/
+noncomputable def noiseMap (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) : (Fin n → ℝ) → ℝ :=
+  fun s => Spec.varNoiseFn Λ γ (Spec.projFn V s)
+
+/-- CHD's executable per-draw null statistic is exactly `noiseMap` applied to that draw. This bridges
+the counting layer (`sampleNoisesFn`) to the measure-theoretic model. -/
+theorem sampleNoisesFn_eq_noiseMap {N : Nat} (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ)
+    (samples : Fin N → Fin n → ℝ) (j : Fin N) :
+    Spec.sampleNoisesFn Λ V γ samples j = noiseMap Λ V γ (samples j) := rfl
+
+/-- A `dotFn` whose entries each depend measurably on a parameter is measurable in that parameter
+(it is the finite sum `∑ₖ f k · g k`). -/
+private theorem measurable_dotFn₂ {β : Type*} [MeasurableSpace β] {f g : β → Fin n → ℝ}
+    (hf : ∀ k, Measurable (fun b => f b k)) (hg : ∀ k, Measurable (fun b => g b k)) :
+    Measurable (fun b => Spec.dotFn (f b) (g b)) := by
+  simp_rw [fun b => dotFn_eq_sum (f b) (g b)]
+  exact Finset.measurable_sum _ (fun k _ => (hf k).mul (hg k))
+
+/-- **The per-draw `noise` statistic is measurable.** It is a ratio of finite sums of products of the
+(measurable) draw coordinates, hence Borel-measurable. -/
+theorem measurable_noiseMap (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) :
+    Measurable (noiseMap Λ V γ) := by
+  have hproj : ∀ k, Measurable (fun s : Fin n → ℝ => Spec.projFn V s k) := fun k =>
+    measurable_dotFn₂ (fun _ => measurable_const) (fun j => measurable_pi_apply j)
+  have hpc : ∀ k, Measurable
+      (fun s : Fin n → ℝ => Spec.projFn V s k * Spec.ridgeCoeffFn Λ γ k) := fun k =>
+    (hproj k).mul measurable_const
+  exact (measurable_dotFn₂ hpc hpc).div (measurable_dotFn₂ hpc hproj)
+
+/-- The standard Gaussian draw of a single `Z_test` null sample: `n` i.i.d. standard normals on
+`Fin n → ℝ`. A genuine probability measure (the product of probability measures). -/
+noncomputable def nullGaussian (n : Nat) : Measure (Fin n → ℝ) :=
+  Measure.pi (fun _ : Fin n => gaussianReal 0 1)
+
+instance instIsProbabilityMeasureNullGaussian (n : Nat) : IsProbabilityMeasure (nullGaussian n) := by
+  unfold nullGaussian; infer_instance
+
+/-- The **null law** of the `Z_test` statistic: the pushforward of the standard-Gaussian draw under
+the per-sample `noise`. This is the distribution `Z_low`/`Z_high` are percentiles of. -/
+noncomputable def noiseLaw (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) : Measure ℝ :=
+  (nullGaussian n).map (noiseMap Λ V γ)
+
+/-- **The null law is a probability measure** (pushforward of one under a measurable map). -/
+instance instIsProbabilityMeasureNoiseLaw (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) :
+    IsProbabilityMeasure (noiseLaw Λ V γ) :=
+  Measure.isProbabilityMeasure_map (measurable_noiseMap Λ V γ).aemeasurable
+
+/-- **The null `noise` distribution lives entirely in `[0,1]`.** Every draw's statistic is in `[0,1]`
+(the verified `varNoiseFn_nonneg`/`varNoiseFn_le_one`), so the law assigns full mass to `[0,1]` —
+the percentiles `Z_low`/`Z_high` are therefore percentiles of a genuine `[0,1]`-valued random
+variable. -/
+theorem noiseLaw_Icc_eq_one {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (V : Fin n → Fin n → ℝ) :
+    noiseLaw Λ V γ (Set.Icc 0 1) = 1 := by
+  rw [noiseLaw, Measure.map_apply (measurable_noiseMap Λ V γ) measurableSet_Icc]
+  have hpre : noiseMap Λ V γ ⁻¹' Set.Icc 0 1 = Set.univ := by
+    ext s
+    simp only [Set.mem_preimage, Set.mem_Icc, Set.mem_univ, iff_true]
+    exact ⟨varNoiseFn_nonneg hΛ hγ _, varNoiseFn_le_one hΛ hγ _⟩
+  rw [hpre, measure_univ]
+
+end
+
+end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 42db8eb..cd962d4 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -218,7 +218,9 @@ through its *spectrum*: replacing the data by `ga = V z` makes `V` cancel, `proj
 (`projFn_mulVec_self`), so `varNoiseFn Λ γ (projFn V (V z)) = varNoiseFn Λ γ z`
 (`varNoiseFn_projFn_mulVec`). This is the deterministic content of "the `Z_test` null distribution
 depends only on the eigenvalues"; the *distributional* step — Gaussian sampling and the 5%/95%
-percentiles — is statistical rather than algebraic and is left to runtime, exercised numerically.
+percentiles — is taken up later (*The `Z_test` distributional layer*), where the finite-sample
+false-positive rate is bounded and the i.i.d.-Gaussian null law is shown to be a probability measure
+on `[0,1]`, leaving only the asymptotic quantile-consistency to runtime.
 
 The `Variational` example confirms all four on a concrete SPD kernel: `(K + γI)·yb = -ga` and
 `yb = -\texttt{solveRidgeSpec}` to machine precision, `noise ∈ [0,1]`, and the spectral invariance
@@ -397,6 +399,41 @@ from a real eigendecomposition, checks `0 ≤ Z_low ≤ Z_high ≤ 1`, shows dat
 eigenvector (smallest shrinkage noise) clears the lower tail and is flagged significant, and confirms a
 high noise — and a noise sitting at the upper tail — are both correctly rejected.
 
+# The `Z_test` distributional layer
+
+The section above proved the thresholds *well-posed*; what it deferred was the *distributional*
+question — what `Z_low` being "the 5th percentile" actually buys, and what it means that the draws
+are Gaussian. `FactorizationsZTest` closes that gap in two honestly-provable halves.
+
+*Finite-sample calibration (counting).* The operational promise of a 5th-percentile threshold is a
+bound on its *own* false-positive rate: of the `N` null draws, only a `5%` minority should beat it.
+That bound holds exactly, not just asymptotically: in an ascending-sorted list at most `k` entries lie
+strictly below the `k`-th, so — since sorting is a permutation and `List.countP` is permutation-invariant
+— at most `⌊N/20⌋` of the null noises fall below `Z_low` (`zLow_null_exceedance_le`) and at most
+`N-1-⌊19N/20⌋` rise above `Z_high` (`zHigh_null_exceedance_le`). These rest on the same sortedness
+(`List.sortedLE_mergeSort`) that gave `Z_low ≤ Z_high`, now counted rather than compared. The `Discovery`
+example makes the numbers concrete: across `N = 20` draws exactly `1` (`= ⌊20/20⌋`) sits below `Z_low` and
+`0` above `Z_high`, while the slack `Z_high` admits `19` — a negative control showing the `5%` calibration
+is specific to `Z_low`, not an artifact of any threshold.
+
+*The Gaussian null law (measure theory).* CHD draws each null sample i.i.d. standard Gaussian. We model
+one draw as `nullGaussian n := Measure.pi (fun _ => gaussianReal 0 1)`, the product of `n` standard
+normals on `Fin n → ℝ` — a genuine probability measure. The per-draw statistic `noiseMap` (the same
+`varNoiseFn ∘ projFn` the data is scored by, identified with CHD's `sampleNoisesFn` by
+`sampleNoisesFn_eq_noiseMap`) is *measurable* (`measurable_noiseMap`: a ratio of finite sums of products
+of the draw coordinates), so its pushforward `noiseLaw` is a probability measure
+(`IsProbabilityMeasure`). And because every draw's noise lies in `[0,1]` — the verified
+`varNoiseFn_nonneg` / `varNoiseFn_le_one`, now lifted to the law — that law is *concentrated on `[0,1]`*:
+`noiseLaw_Icc_eq_one` shows it assigns full mass to `[0,1]`. So `Z_low`/`Z_high` are percentiles of a
+bona fide `[0,1]`-valued random variable, not of an unconstrained sample.
+
+*What is honestly left.* The remaining step is *asymptotic* calibration — that the empirical 5%/95%
+percentiles converge to the true quantiles of `noiseLaw` (Glivenko–Cantelli / DKW), and that under
+exchangeability of a fresh null draw with the sample the false-positive rate is exactly the rank level
+`k/(N+1)`. Both need an empirical-process theory Mathlib v4.30.0 does not carry, so they are stated as
+the open frontier, never stubbed with `sorry`. The finite-sample false-positive *bound* above is the
+exact, non-asymptotic statement the test actually guarantees.
+
 # The a-posteriori residual certificate
 
 For the iterative routines, the replacement for an impossible a-priori convergence proof is an exact
@@ -591,13 +628,20 @@ resting directly on the verified `noise ≤ 1` bound, so the structural decision
 over a statistic whose range was itself proved. The `Z_test` *significance thresholds* are now proved
 well-posed too: `Z_low` and `Z_high` are order statistics of the null `noise` distribution, each
 inheriting the `[0,1]` bound from the shared `varNoiseFn`, with `Z_low ≤ Z_high` by order-statistic
-monotonicity — and the verdict `noise < Z_low` is shown to feed `MinNoiseKernelChooser`.
+monotonicity — and the verdict `noise < Z_low` is shown to feed `MinNoiseKernelChooser`. The
+*distributional* layer of the `Z_test` is now partly proved too: the threshold's finite-sample
+false-positive rate is bounded exactly (`≤ 5%` of the null draws beat `Z_low`,
+`zLow_null_exceedance_le`; symmetrically for `Z_high`), and — modelling the draws as i.i.d. standard
+Gaussian — the null `noise` law is a genuine probability measure concentrated on `[0,1]`
+(`noiseLaw_Icc_eq_one`).
 
 So the CHD foundation is complete, from the kernel build through the regularized solve, the noise
-statistic, and the `Z_test` thresholds up to the graph-structure decisions. The two remaining open
-items are both narrow and deliberately scoped: the cyclic-Jacobi convergence *rate* (captured exactly
-by the a-posteriori residual certificate, never by `sorry`), and the *distributional* content of the
-`Z_test` — that the draws are Gaussian and the empirical percentile is a calibrated confidence level
-(a probability-theory statement needing `Mathlib.Probability`, distinct from the now-proved
-order-statistic well-posedness). One is a proof-only gap on a quantity CHD does not need to *run*; the
-other is statistical rather than algebraic and exercised numerically.
+statistic, and the `Z_test` thresholds up to the graph-structure decisions. The remaining open items
+are both narrow and deliberately scoped: the cyclic-Jacobi convergence *rate* (captured exactly by the
+a-posteriori residual certificate, never by `sorry`), and the *asymptotic* half of the `Z_test` — that
+the empirical 5%/95% percentiles converge to the true quantiles of the now-proved null law
+(Glivenko–Cantelli / DKW), and that an exchangeable fresh draw is rejected at exactly rank rate
+`k/(N+1)`. That needs an empirical-process theory `Mathlib.Probability` does not yet carry, distinct
+from the finite-sample false-positive bound and probability-measure facts already proved. One is a
+proof-only gap on a quantity CHD does not need to *run*; the other is the genuine statistical frontier,
+flagged rather than stubbed with `sorry`.

From 05ae7235a89f272f5dce2b7439c2f5d169198f3d Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 21:49:06 -0700
Subject: [PATCH 19/22] Z_test asymptotic calibration step (a): the
 i.i.d.-draws scaffold
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Lift the single null draw (FactorizationsZTest) to the i.i.d. *sequence*
`nullSeqGaussian n := Measure.infinitePi (fun _ : ℕ => nullGaussian n)` on
`ℕ → (Fin n → ℝ)` — the infinite product, since `Measure.pi` is finite-index
only — and `nullNoise Λ V γ i ω := noiseMap Λ V γ (ω i)`, the same measurable
statistic read off coordinate `i`. Proven sorry-free in the new
`NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean`:

  - nullNoise_iIndepFun (from iIndepFun_infinitePi), with its pairwise corollary
    nullNoise_pairwise_indepFun in the Pairwise (· ⟂ᵢ[μ] ·) shape the SLLN takes;
  - nullNoise_hasLaw / nullNoise_identDistrib — each draw has the common law
    noiseLaw (via measurePreserving_eval_infinitePi + HasLaw.comp);
  - nullNoise_mem_Icc ([0,1]-valued) and integrable_nullNoise.

That is exactly the hint/hindep/hident triple `strong_law_ae_real` and the
Hoeffding tail consume, so plan steps (b)–(d) become applications of an
in-place scaffold. The *uniform* GC / DKW–Massart sharp constant and the
exchangeability rank rate k/(N+1) stay research-grade (flagged, not sorry'd).
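
As a hedged illustration of how the triple plugs in (not part of this
patch, and not compiled here), the sample-mean strong law for `nullNoise`
might be assembled as below. The import paths and the non-`module` file
layout are assumptions; step (b) proper would apply the same lemma to the
indicators of `nullNoise i ≤ t` rather than to `nullNoise` itself.

```lean
import Mathlib.Probability.StrongLaw
import NN.Proofs.Tensor.Basic.FactorizationsZAsymptotic

open MeasureTheory ProbabilityTheory Filter Topology
open Spec.Factorization

-- Unchecked sketch: feed the scaffold's hint / hindep / hident to the SLLN.
-- The conclusion is a.s. convergence of the running mean of the null noises
-- to the expectation of a single draw's noise.
example {n : Nat} {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
    (V : Fin n → Fin n → ℝ) :
    ∀ᵐ ω ∂(nullSeqGaussian n),
      Tendsto (fun N : ℕ => (∑ i ∈ Finset.range N, nullNoise Λ V γ i ω) / N) atTop
        (𝓝 (∫ ω', nullNoise Λ V γ 0 ω' ∂(nullSeqGaussian n))) :=
  strong_law_ae_real (nullNoise Λ V γ)
    (integrable_nullNoise hΛ hγ V 0)          -- hint
    (nullNoise_pairwise_indepFun Λ V γ)       -- hindep
    (fun i => nullNoise_identDistrib Λ V γ i) -- hident
```

If the elaborator objects to the limit's form, `μ[X 0]` in the lemma is
notation for the same integral, so restating the goal with that notation
is the first thing to try.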

Discovery.lean exercises the computable shadow, the empirical CDF
F̂_N(t) = #{i<N : noiseᵢ ≤ t}/N, checked to be a bona fide CDF (in [0,1], monotone,
saturating to 1, 0 below the support) with a non-degeneracy negative control; six
new #eval checks pass. Aggregate example docstring and the Ch4 blueprint
(distributional-layer + "What remains") updated to record the scaffold.
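
For readers without the repository at hand, the empirical CDF can be
sketched on a plain list of floats. This is a stand-alone illustration:
`empCdfOf` and `sample` are hypothetical names, and the patch's own
`empCdf` instead ranges over `List.finRange 20` using `Spec.leBool`.

```lean
/-- Fraction of the entries of `xs` that are `≤ t`; `0` for an empty sample.
Hypothetical helper, not the definition used in `Discovery.lean`. -/
def empCdfOf (xs : List Float) (t : Float) : Float :=
  if xs.isEmpty then 0.0
  else (xs.filter (fun x => decide (x ≤ t))).length.toFloat / xs.length.toFloat

/-- A made-up sample with all values in `[0,1]`. -/
def sample : List Float := [0.10, 0.25, 0.40, 0.90]

-- The four CDF properties the `#eval` checks in this patch exercise:
#eval decide (empCdfOf sample (-0.01) == 0.0)                 -- vanishes below the support
#eval decide (empCdfOf sample 1.0 == 1.0)                     -- saturates at the top
#eval decide (empCdfOf sample 0.2 ≤ empCdfOf sample 0.8)      -- monotone
#eval decide (0.0 ≤ empCdfOf sample 0.5 ∧ empCdfOf sample 0.5 ≤ 1.0)  -- valued in [0,1]
```

Comparing at an interior threshold, as in the monotonicity line, is what
distinguishes a spread-out sample from a point mass; the endpoint checks
alone hold for any nonempty sample in `[0,1]`.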

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |  10 +-
 NN/Examples/Factorization/Discovery.lean      |  51 +++++++
 NN/Proofs/Tensor/Basic.lean                   |   1 +
 .../Basic/FactorizationsZAsymptotic.lean      | 133 ++++++++++++++++++
 .../Ch4_Verification/Factorizations.lean      |  45 ++++--
 5 files changed, 229 insertions(+), 11 deletions(-)
 create mode 100644 NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index ec34bbf..e2376fc 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -107,7 +107,15 @@ factorization misbehaves.
   (`zHigh_null_exceedance_le`, here `0`); a **negative control** confirms the slack `Z_high`
   threshold admits `≈ 95%` of the draws, so the `5%` calibration is specific to `Z_low`. (The
   companion measure-theoretic fact — the i.i.d.-Gaussian null law is a probability measure on
-  `[0,1]`, `noiseLaw_Icc_eq_one` — is noncomputable and lives in the proofs.)
+  `[0,1]`, `noiseLaw_Icc_eq_one` — is noncomputable and lives in the proofs.) A closing
+  **asymptotic-scaffold** sub-block corroborates `FactorizationsZAsymptotic` (step (a) of the
+  asymptotic-calibration plan): the i.i.d. null *sequence* `nullNoise` is proven independent,
+  identically distributed with law `noiseLaw`, `[0,1]`-valued and integrable (the SLLN's
+  `hint`/`hindep`/`hident`) — noncomputable, so the `#eval`s exercise its **computable shadow**, the
+  empirical CDF `F̂_N(t) = #{i<N : noiseᵢ ≤ t}/N`: checks that it is a bona fide CDF (in `[0,1]`,
+  monotone, saturating to `1` at the top of the `[0,1]` support, vanishing below `0`), with a
+  **negative control** that it is non-degenerate (rises strictly from `0` to `1`, carrying the
+  distributional content whose convergence to `cdf noiseLaw` is the next increment, step (b)).
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Discovery.lean b/NN/Examples/Factorization/Discovery.lean
index 2acfdd9..7139904 100644
--- a/NN/Examples/Factorization/Discovery.lean
+++ b/NN/Examples/Factorization/Discovery.lean
@@ -293,4 +293,55 @@ def countAbove (thr : Float) : Nat :=
 #eval assertTrue "Z_high is a slack threshold: > 5% of null draws fall below it (calibration is specific to Z_low)"
   (decide (Spec.zLowIdx 20 < countBelow zHigh))
 
+/-! ### The asymptotic-calibration scaffold (step a): the empirical CDF of the null sample
+
+`FactorizationsZAsymptotic` lifts the single null draw to the i.i.d. *sequence* `nullNoise` under the
+product measure `nullSeqGaussian`, proving it independent (`nullNoise_iIndepFun`), identically
+distributed with the common law `noiseLaw` (`nullNoise_hasLaw`, `nullNoise_identDistrib`),
+`[0,1]`-valued (`nullNoise_mem_Icc`) and integrable (`integrable_nullNoise`) — exactly the three
+hypotheses (`hint`/`hindep`/`hident`) the strong law of large numbers consumes. That scaffold is
+*noncomputable* (a statement about an infinite product measure), so it cannot be `#eval`'d; what we
+exercise here is its **computable shadow**, the empirical CDF of the finite null sample
+`F̂_N(t) = #{i < N : noiseᵢ ≤ t} / N`. This is the very object whose almost-sure convergence to
+`cdf noiseLaw` *is* the SLLN application (step b of the plan, not yet formalized). At step (a) the
+i.i.d. sample alone already gives that `F̂_N` is a bona fide CDF — monotone, valued in `[0,1]`,
+saturating to `1` above the support and vanishing below it — which is what we check. -/
+
+/-- Empirical CDF of the `N = 20` null noises at a threshold `t`: the fraction of draws scoring `≤ t`
+(using the `leBool` comparator the order statistics already use). The computable shadow of the
+noncomputable `empCDF` whose consistency is step (b). -/
+def empCdf (t : Float) : Float :=
+  (((List.finRange 20).filter (fun j => Spec.leBool (zNullNoises j) t)).length).toFloat / 20.0
+
+#eval IO.println s!"empirical CDF of the null sample: F(0) = {empCdf 0.0}, F(Z_low) = {empCdf zLow}, \
+  F(Z_high) = {empCdf zHigh}, F(1) = {empCdf 1.0}"
+
+-- Positive — `F̂` is valued in `[0,1]` at every threshold (it is a fraction of the 20 draws), the
+-- finite-sample image of `nullNoise_mem_Icc` / the law `noiseLaw` being a probability measure.
+#eval assertTrue "empirical CDF lies in [0,1] across thresholds"
+  ([0.0, zLow, zHigh, 0.5, 1.0].all
+    (fun t => Spec.leBool 0.0 (empCdf t) && Spec.leBool (empCdf t) 1.0))
+
+-- Positive — `F̂` is monotone nondecreasing: more of the sample falls below a larger threshold.
+-- Since `Z_low ≤ Z_high`, `F̂(Z_low) ≤ F̂(Z_high)` — the empirical shadow of `monotone_cdf`.
+#eval assertTrue "empirical CDF is monotone: Z_low ≤ Z_high ⇒ F(Z_low) ≤ F(Z_high)"
+  (Spec.leBool (empCdf zLow) (empCdf zHigh))
+
+-- Positive — `F̂` saturates to `1`: every null noise lies in `[0,1]` (`nullNoise_mem_Icc`), so all
+-- 20 draws score `≤ 1` and the empirical CDF reaches its full mass there.
+#eval assertTrue "empirical CDF reaches 1 at t = 1 (all null noises ≤ 1, nullNoise_mem_Icc)"
+  (empCdf 1.0 == 1.0)
+
+-- Positive — `F̂` vanishes below the support: no null noise is negative (`nullNoise_mem_Icc`), so
+-- none scores `≤` a negative `t`.
+#eval assertTrue "empirical CDF is 0 below the support (no null noise < 0)"
+  (empCdf (-0.01) == 0.0)
+
+-- Negative control — `F̂` is *not* the constant function: it genuinely rises from `0` to `1` across
+-- the support, so it carries the distributional content the i.i.d. scaffold formalizes. This check
+-- rules out only a constant `F̂`: any nonempty sample in `[0,1]`, even a point mass, separates these
+-- two thresholds, so detecting spread inside the support would need an interior threshold as well.
+#eval assertTrue "empirical CDF is non-degenerate: F(below support) < F(1) (carries distribution info)"
+  (Spec.ltBool (empCdf (-0.01)) (empCdf 1.0))
+
 end NN.Examples.Factorization.Discovery
diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean
index b81aa8a..44fb923 100644
--- a/NN/Proofs/Tensor/Basic.lean
+++ b/NN/Proofs/Tensor/Basic.lean
@@ -15,6 +15,7 @@ public import NN.Proofs.Tensor.Basic.FactorizationsSolve
 public import NN.Proofs.Tensor.Basic.FactorizationsVariational
 public import NN.Proofs.Tensor.Basic.FactorizationsDecision
 public import NN.Proofs.Tensor.Basic.FactorizationsZTest
+public import NN.Proofs.Tensor.Basic.FactorizationsZAsymptotic
 public import NN.Proofs.Tensor.Basic.FactorizationsKernels
 public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal
 public import NN.Proofs.Tensor.Basic.FactorizationsJacobi
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean b/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
new file mode 100644
index 0000000..0076404
--- /dev/null
+++ b/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
@@ -0,0 +1,133 @@
+/-
+Copyright (c) 2026 TorchLean
+Released under MIT license as described in the file LICENSE.
+Authors: TorchLean Team
+-/
+
+module
+
+public import NN.Proofs.Tensor.Basic.FactorizationsZTest
+public import Mathlib.Probability.Independence.InfinitePi
+public import Mathlib.MeasureTheory.Integral.IntegrableOn
+
+/-!
+# CHD `Z_test`: the asymptotic-calibration scaffold (step a)
+
+[`FactorizationsZTest`](./FactorizationsZTest.lean) modelled a *single* `Z_test` null draw as
+`nullGaussian n` (the product of `n` standard normals on `Fin n → ℝ`) and proved the per-draw
+`noise` statistic measurable, with null law `noiseLaw` a probability measure on `[0,1]`. That is
+enough for the finite-sample false-positive bound, but the *asymptotic* calibration —
+empirical 5%/95% percentiles converging to the true quantiles of `noiseLaw` — needs the whole
+**i.i.d. sequence** of null draws, not one of them.
+
+This file builds that sequence and proves it i.i.d.: the scaffold the asymptotic statements
+(Glivenko–Cantelli via the SLLN, the Hoeffding per-`t` rate) are applications of. Concretely:
+
+* **The sequence measure.** `nullSeqGaussian n := Measure.infinitePi (fun _ : ℕ => nullGaussian n)`
+  on `ℕ → (Fin n → ℝ)` — countably many independent copies of one null draw, a genuine probability
+  measure (`instIsProbabilityMeasureNullSeqGaussian`).
+
+* **The `i`-th draw's statistic.** `nullNoise Λ V γ i ω := noiseMap Λ V γ (ω i)` — the same
+  measurable `noiseMap` from `FactorizationsZTest`, read off the `i`-th coordinate.
+
+* **i.i.d.** The coordinate evaluations are independent under the product measure, and composing
+  with the measurable `noiseMap` preserves it (`nullNoise_iIndepFun`, and its pairwise corollary
+  `nullNoise_pairwise_indepFun` in the exact shape `strong_law_ae_real` consumes). Each draw is
+  measure-preservingly the same standard-Gaussian draw, so each has the *same* law `noiseLaw`
+  (`nullNoise_hasLaw`, `nullNoise_identDistrib`). Every draw's noise lies in `[0,1]`
+  (`nullNoise_mem_Icc`), hence is integrable (`integrable_nullNoise`).
+
+So `nullNoise` is an i.i.d. real sequence, each with law `noiseLaw`, valued in `[0,1]` and
+integrable — exactly the three hypotheses (`hint`/`hindep`/`hident`) the strong law of large
+numbers and the Hoeffding tail take. This scaffold is the only genuinely *new* measure-theory
+plumbing; the empirical-CDF consistency and concentration statements (steps b–d of the plan) are
+applications of it, and the *uniform* Glivenko–Cantelli / DKW–Massart sharp constant and the
+exchangeability rank rate remain genuinely research-grade (flagged, never `sorry`'d).
+-/
+
+@[expose] public section
+
+namespace Spec.Factorization
+
+open MeasureTheory ProbabilityTheory
+
+variable {n : Nat}
+
+noncomputable section
+
+/-- The i.i.d. null-draw sequence: countably many independent standard-Gaussian draws, one per
+`Z_test` null sample. The product of probability measures, hence itself a probability measure. -/
+noncomputable def nullSeqGaussian (n : Nat) : Measure (ℕ → Fin n → ℝ) :=
+  Measure.infinitePi (fun _ : ℕ => nullGaussian n)
+
+instance instIsProbabilityMeasureNullSeqGaussian (n : Nat) :
+    IsProbabilityMeasure (nullSeqGaussian n) := by
+  unfold nullSeqGaussian; infer_instance
+
+/-- The `i`-th null draw's `noise` statistic: `noiseMap` applied to the `i`-th coordinate of the
+i.i.d. sequence. As `i` ranges over `ℕ` this is the i.i.d. real sequence the asymptotic calibration
+runs on. -/
+noncomputable def nullNoise (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) :
+    ℕ → (ℕ → Fin n → ℝ) → ℝ :=
+  fun i ω => noiseMap Λ V γ (ω i)
+
+/-- Each draw's `noise` is measurable: the measurable `noiseMap` composed with a coordinate
+projection. -/
+theorem measurable_nullNoise (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (i : ℕ) :
+    Measurable (nullNoise Λ V γ i) :=
+  (measurable_noiseMap Λ V γ).comp (measurable_pi_apply i)
+
+/-- **The null-noise sequence is independent.** The coordinate evaluations of the product measure
+are independent (`iIndepFun_infinitePi`), and composing each with the measurable `noiseMap`
+preserves independence. -/
+theorem nullNoise_iIndepFun (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) :
+    iIndepFun (nullNoise Λ V γ) (nullSeqGaussian n) :=
+  iIndepFun_infinitePi (fun _ => measurable_noiseMap Λ V γ)
+
+/-- The pairwise-independence corollary, in the exact `Pairwise (· ⟂ᵢ[μ] ·) on X` shape the strong
+law of large numbers (`strong_law_ae_real`) consumes for its `hindep` hypothesis. -/
+theorem nullNoise_pairwise_indepFun (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) :
+    Pairwise (Function.onFun (· ⟂ᵢ[nullSeqGaussian n] ·) (nullNoise Λ V γ)) :=
+  fun _ _ hij => (nullNoise_iIndepFun Λ V γ).indepFun hij
+
+/-- **Each draw has the same law, `noiseLaw`.** The `i`-th coordinate projection is measure-
+preserving from the product measure onto a single `nullGaussian n` draw, and composing with the
+measurable `noiseMap` pushes that law forward to `noiseLaw` — independently of `i`. -/
+theorem nullNoise_hasLaw (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (i : ℕ) :
+    HasLaw (nullNoise Λ V γ i) (noiseLaw Λ V γ) (nullSeqGaussian n) := by
+  have hEval := (measurePreserving_eval_infinitePi (fun _ : ℕ => nullGaussian n) i).hasLaw
+  have hNoise : HasLaw (noiseMap Λ V γ) (noiseLaw Λ V γ) (nullGaussian n) :=
+    { aemeasurable := (measurable_noiseMap Λ V γ).aemeasurable
+      map_eq := rfl }
+  exact hNoise.fun_comp hEval
+
+/-- **The null-noise sequence is identically distributed.** Every draw has the common law
+`noiseLaw`, so any two are identically distributed — the `hident` hypothesis of the strong law,
+stated against the `0`-th draw. -/
+theorem nullNoise_identDistrib (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (i : ℕ) :
+    IdentDistrib (nullNoise Λ V γ i) (nullNoise Λ V γ 0) (nullSeqGaussian n) (nullSeqGaussian n) where
+  aemeasurable_fst := (measurable_nullNoise Λ V γ i).aemeasurable
+  aemeasurable_snd := (measurable_nullNoise Λ V γ 0).aemeasurable
+  map_eq := by rw [(nullNoise_hasLaw Λ V γ i).map_eq, (nullNoise_hasLaw Λ V γ 0).map_eq]
+
+/-- **Every draw's noise lies in `[0,1]`**, pointwise — the verified `varNoiseFn` bound applied to
+each coordinate. -/
+theorem nullNoise_mem_Icc {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (V : Fin n → Fin n → ℝ) (i : ℕ) (ω : ℕ → Fin n → ℝ) :
+    nullNoise Λ V γ i ω ∈ Set.Icc (0 : ℝ) 1 :=
+  Set.mem_Icc.mpr ⟨varNoiseFn_nonneg hΛ hγ _, varNoiseFn_le_one hΛ hγ _⟩
+
+/-- **Each draw's noise is integrable** (bounded in `[0,1]` on the probability space) — the `hint`
+hypothesis of the strong law. -/
+theorem integrable_nullNoise {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ : ℝ} (hγ : 0 < γ)
+    (V : Fin n → Fin n → ℝ) (i : ℕ) :
+    Integrable (nullNoise Λ V γ i) (nullSeqGaussian n) :=
+  Integrable.of_bound (measurable_nullNoise Λ V γ i).aestronglyMeasurable 1
+    (ae_of_all _ fun ω => by
+      have h := Set.mem_Icc.mp (nullNoise_mem_Icc hΛ hγ V i ω)
+      rw [Real.norm_eq_abs, abs_le]
+      exact ⟨by linarith [h.1], h.2⟩)
+
+end
+
+end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index cd962d4..5fd0c82 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -427,12 +427,31 @@ of the draw coordinates), so its pushforward `noiseLaw` is a probability measure
 `noiseLaw_Icc_eq_one` shows it assigns full mass to `[0,1]`. So `Z_low`/`Z_high` are percentiles of a
 bona fide `[0,1]`-valued random variable, not of an unconstrained sample.
 
-*What is honestly left.* The remaining step is *asymptotic* calibration — that the empirical 5%/95%
-percentiles converge to the true quantiles of `noiseLaw` (Glivenko–Cantelli / DKW), and that under
-exchangeability of a fresh null draw with the sample the false-positive rate is exactly the rank level
-`k/(N+1)`. Both need an empirical-process theory Mathlib v4.30.0 does not carry, so they are stated as
-the open frontier, never stubbed with `sorry`. The finite-sample false-positive *bound* above is the
-exact, non-asymptotic statement the test actually guarantees.
+*The i.i.d. scaffold for the asymptotic step.* `FactorizationsZAsymptotic` takes the first concrete
+step toward that asymptotic calibration — the *pointwise* half, which a survey of
+`Mathlib.Probability` v4.30.0 shows is in fact assemblable sorry-free (a re-scope from the earlier
+"absent from Mathlib" note). It lifts the single null draw to the i.i.d. *sequence*
+`nullSeqGaussian n := Measure.infinitePi (fun _ : ℕ => nullGaussian n)` on `ℕ → (Fin n → ℝ)`, and
+defines `nullNoise Λ V γ i ω := noiseMap Λ V γ (ω i)` — the same measurable `noiseMap`, read off the
+`i`-th coordinate. The coordinate projections are independent under the product measure
+(`iIndepFun_infinitePi`) and composing with `noiseMap` preserves it (`nullNoise_iIndepFun`, with the
+pairwise corollary `nullNoise_pairwise_indepFun`); each projection is measure-preservingly one
+standard-Gaussian draw, so every `nullNoise i` has the *same* law `noiseLaw` (`nullNoise_hasLaw`,
+`nullNoise_identDistrib`); and every draw lies in `[0,1]` (`nullNoise_mem_Icc`) hence is integrable
+(`integrable_nullNoise`). That is exactly the i.i.d.-bounded-integrable triple — `hint`, `hindep`,
+`hident` — that the strong law of large numbers (`strong_law_ae_real`) and the Hoeffding tail consume.
+This scaffold is the only genuinely new measure-theory plumbing; the empirical-CDF consistency
+(Glivenko–Cantelli via the SLLN) and the per-`t` concentration rate `2 exp(-2 N ε²)` (Hoeffding) are
+applications of it, deferred as the next increment.
+
+*What is honestly left.* What stays genuinely research-grade is the *uniform* Glivenko–Cantelli
+(`sup_t |F̂_N - cdf| → 0`) and the full *DKW–Massart* inequality with its sharp constant `2` over
+the supremum — both need the bracketing / VC-class chaining Mathlib v4.30.0 lacks — and the
+*exchangeability rank rate* `k/(N+1)` for a fresh null draw, which needs a symmetric-group
+rank-distribution argument also absent. Those are stated as the open frontier, never stubbed with
+`sorry`. The finite-sample false-positive *bound* above is the exact, non-asymptotic statement the
+test actually guarantees, and the pointwise scaffold is the sorry-free bridge toward the asymptotic
+statement.
 
 # The a-posteriori residual certificate
 
@@ -641,7 +660,13 @@ are both narrow and deliberately scoped: the cyclic-Jacobi convergence *rate* (c
 a-posteriori residual certificate, never by `sorry`), and the *asymptotic* half of the `Z_test` — that
 the empirical 5%/95% percentiles converge to the true quantiles of the now-proved null law
 (Glivenko–Cantelli / DKW), and that an exchangeable fresh draw is rejected at exactly rank rate
-`k/(N+1)`. That needs an empirical-process theory `Mathlib.Probability` does not yet carry, distinct
-from the finite-sample false-positive bound and probability-measure facts already proved. One is a
-proof-only gap on a quantity CHD does not need to *run*; the other is the genuine statistical frontier,
-flagged rather than stubbed with `sorry`.
+`k/(N+1)`. The *pointwise* part of that asymptotic step is no longer fully out of reach: its i.i.d.
+scaffold is now built and proved sorry-free (`FactorizationsZAsymptotic` — `nullNoise` an independent,
+identically-`noiseLaw`-distributed, `[0,1]`-valued, integrable sequence under
+`Measure.infinitePi nullGaussian`), exactly the hypotheses the strong law of large numbers and the
+Hoeffding tail take, so the empirical-CDF consistency and per-`t` rate are now applications rather than
+frontier. What stays genuinely research-grade is the *uniform* Glivenko–Cantelli / DKW–Massart sharp
+constant (bracketing / VC chaining) and the exchangeability rank rate `k/(N+1)`
+(symmetric-group rank distribution) — both absent from `Mathlib.Probability` v4.30.0. One open item is
+a proof-only gap on a quantity CHD does not need to *run*; the other is the genuine statistical
+frontier, flagged rather than stubbed with `sorry`.

From d37099105d0823d279bb13658ae42f06ebedbb87 Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Sun, 31 May 2026 22:08:04 -0700
Subject: [PATCH 20/22] =?UTF-8?q?Z=5Ftest=20asymptotic=20calibration=20ste?=
 =?UTF-8?q?p=20(b):=20pointwise=20Glivenko=E2=80=93Cantelli=20via=20the=20?=
 =?UTF-8?q?SLLN?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Build on the step-(a) i.i.d. scaffold (FactorizationsZAsymptotic) to prove the
pointwise empirical-CDF consistency theorem, sorry-free over Mathlib v4.30.0:

- nullBelow Λ V γ t i ω := (Set.Iic t).indicator 1 (nullNoise Λ V γ i ω), the
  threshold indicators 1{noiseᵢ ≤ t}, and empCDF, their normalized prefix sum.
- nullBelow_pairwise_indepFun / nullBelow_identDistrib / integrable_nullBelow:
  the indicators inherit the scaffold's i.i.d.-bounded-integrable structure
  (indicator composed onto each independent, identically-distributed draw).
- integral_nullBelow_zero: their common mean is exactly cdf (noiseLaw Λ V γ) t
  (HasLaw.integral_comp + integral_indicator_one + cdf_eq_real), so empCDF is the
  Monte-Carlo estimator of the null CDF.
- empCDF_tendsto_cdf: strong_law_ae_real on the hint/hindep/hident triple gives
  ∀ᵐ ω, Tendsto (fun N => empCDF … N t ω) atTop (𝓝 (cdf noiseLaw t)) — pointwise
  Glivenko–Cantelli at each fixed t.

Discovery.lean adds the matching #eval block exercising the computable shadow:
the growing-prefix running mean F̂_N settling toward the full-sample estimate of
cdf noiseLaw t (each prefix a valid [0,1] CDF value, cdf 1 = 1 attained at every
N, with a non-degeneracy negative control). Aggregate docstring and the Ch4
blueprint updated to describe step (b); the uniform GC / DKW–Massart sharp
constant and exchangeability rank rate remain flagged research-grade (no sorry).
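
A minimal, self-contained sketch of that running mean, on an invented six-value
sample. The names demoSample and demoRunningCdf are illustrative only and are
not part of this patch, which uses the 20-draw zNullNoises sample instead:

```lean
/-- Hypothetical stand-in for the null-noise draws; the values are invented. -/
def demoSample : List Float := [0.051, 0.058, 0.049, 0.060, 0.055, 0.057]

/-- Fraction of the first `N` sample values that are `≤ t`.
Assumes `1 ≤ N ≤ demoSample.length`; `N = 0` would give `0/0 = NaN`. -/
def demoRunningCdf (N : Nat) (t : Float) : Float :=
  ((demoSample.take N).filter (fun x => decide (x ≤ t))).length.toFloat / N.toFloat

-- Running means at one fixed threshold over growing prefixes.
#eval ([2, 4, 6] : List Nat).map (fun N => demoRunningCdf N 0.056)
```

Each entry is a count divided by N, so it lies in [0,1]; only the N → ∞ limit
is what empCDF_tendsto_cdf speaks about, which no finite #eval can witness.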

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |  8 ++
 NN/Examples/Factorization/Discovery.lean      | 49 ++++++++++
 .../Basic/FactorizationsZAsymptotic.lean      | 98 ++++++++++++++++++-
 .../Ch4_Verification/Factorizations.lean      | 30 ++++--
 4 files changed, 178 insertions(+), 7 deletions(-)

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index e2376fc..0244774 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -116,6 +116,14 @@ factorization misbehaves.
   monotone, saturating to `1` at the top of the `[0,1]` support, vanishing below `0`), with a
   **negative control** that it is non-degenerate (rises strictly from `0` to `1`, carrying the
   distributional content whose convergence to `cdf noiseLaw` is the next increment, step (b)).
+  A final **consistency** sub-block corroborates `empCDF_tendsto_cdf` (step (b)): the empirical CDF
+  is the SLLN *running mean* of the bounded i.i.d. indicators `1{noiseᵢ ≤ t}`, whose mean is exactly
+  `cdf noiseLaw t` (`integral_nullBelow_zero`), so almost surely `F̂_N(t) → cdf noiseLaw t` (pointwise
+  Glivenko–Cantelli). The limit needs `N → ∞`, so the `#eval`s watch the **growing-prefix running
+  mean** `F̂_N` settle toward the full-sample estimate: each prefix is a valid `[0,1]` CDF value
+  (bounded summands), the limit value `cdf 1 = 1` is attained at every `N`, and a **negative control**
+  confirms the estimate genuinely moves with `N` (an early prefix differs from the full sample), so
+  the convergence is a real limit being approached rather than a vacuous constant.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Discovery.lean b/NN/Examples/Factorization/Discovery.lean
index 7139904..5b36f4b 100644
--- a/NN/Examples/Factorization/Discovery.lean
+++ b/NN/Examples/Factorization/Discovery.lean
@@ -344,4 +344,53 @@ def empCdf (t : Float) : Float :=
 #eval assertTrue "empirical CDF is non-degenerate: F(below support) < F(1) (carries distribution info)"
   (Spec.ltBool (empCdf (-0.01)) (empCdf 1.0))
 
+/-! ### Pointwise consistency of the empirical CDF (step b): the SLLN running mean
+
+`FactorizationsZAsymptotic` now proves `empCDF_tendsto_cdf`: for each threshold `t`, the empirical
+CDF `empCDF Λ V γ N t` of the i.i.d. null draws converges *almost surely* to the true CDF
+`cdf noiseLaw t` as `N → ∞` — the pointwise Glivenko–Cantelli theorem, via the strong law of large
+numbers (`strong_law_ae_real`) applied to the bounded i.i.d. indicators `1{noiseᵢ ≤ t}`. The limit
+value is pinned by `integral_nullBelow_zero`: the *mean* of the indicator is exactly
+`cdf noiseLaw t`, so the empirical CDF is the Monte-Carlo estimator of the null CDF. Convergence
+needs `N → ∞`, so it is not directly `#eval`-able; the computable shadow is the **running mean**
+`F̂_N(t) = (1/N)·#{i < N : noiseᵢ ≤ t}` over growing prefixes `N` of the 20-draw sample, which we
+watch settle toward the full-sample estimate `empCdf t` of `cdf noiseLaw t`. -/
+
+/-- Running empirical CDF over the first `N` null draws (`1 ≤ N ≤ 20`) at threshold `t`: the partial
+mean of the indicator sequence `1{noiseᵢ ≤ t}` that the strong law averages. As `N → ∞` this is the
+quantity `empCDF_tendsto_cdf` sends to `cdf noiseLaw t`; here we watch its growing-`N` prefixes. -/
+def empCdfPrefix (N : Nat) (t : Float) : Float :=
+  (((List.finRange 20).filter
+      (fun j => decide (j.val < N) && Spec.leBool (zNullNoises j) t)).length).toFloat / N.toFloat
+
+#eval IO.println s!"running empirical CDF at t = 0.057 (mid-support) over growing prefixes: \
+  F̂_5 = {empCdfPrefix 5 0.057}, F̂_10 = {empCdfPrefix 10 0.057}, F̂_15 = {empCdfPrefix 15 0.057}, \
+  F̂_20 = {empCdfPrefix 20 0.057} (→ empCdf 0.057 = {empCdf 0.057}, the estimate of cdf noiseLaw 0.057)"
+
+-- Positive — the full prefix `N = 20` *is* the empirical CDF: `empCDF Λ V γ 20 t` evaluated on this
+-- sample. The running mean and the count-fraction `empCdf` coincide at `N = 20` (the shadow of the
+-- `empCDF` definition as a normalized indicator sum).
+#eval assertTrue "running mean at N = 20 equals the empirical CDF (empCDF is the normalized indicator sum)"
+  ([0.0, zLow, 0.5, zHigh, 1.0].all (fun t => empCdfPrefix 20 t == empCdf t))
+
+-- Positive — every running prefix is a valid CDF value in `[0,1]`: a mean of the `[0,1]`-valued
+-- indicators stays in `[0,1]` (the `integrable_nullBelow` / boundedness hypothesis feeding the SLLN).
+#eval assertTrue "every running prefix mean lies in [0,1] (bounded indicators ⇒ bounded average)"
+  ([1, 2, 5, 10, 15, 20].all (fun N =>
+    [0.0, zLow, 0.5, zHigh, 1.0].all (fun t =>
+      Spec.leBool 0.0 (empCdfPrefix N t) && Spec.leBool (empCdfPrefix N t) 1.0)))
+
+-- Positive — the limit value `cdf noiseLaw 1 = 1` is already attained at *every* finite `N`: all
+-- indicators `1{noiseᵢ ≤ 1}` are `1` (`nullNoise_mem_Icc`), so each running mean is exactly `1`. The
+-- empirical CDF converges to the saturation endpoint trivially there.
+#eval assertTrue "running mean saturates to cdf 1 = 1 at every prefix (all noises ≤ 1)"
+  ([1, 2, 5, 10, 15, 20].all (fun N => empCdfPrefix N 1.0 == 1.0))
+
+-- Negative control — consistency is *non-vacuous*: the running estimate genuinely changes with `N`
+-- (an early prefix differs from the full sample at some interior threshold), so the convergence
+-- `F̂_N → cdf noiseLaw t` is a real limit being approached, not a constant already equal to its limit
+-- at `N = 5`. A degenerate (point-mass) sample would make every prefix equal and the SLLN vacuous.
+#eval assertTrue "running empirical CDF is non-trivial: an early prefix differs from the full sample"
+  ([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, zLow].any (fun t => !(empCdfPrefix 5 t == empCdfPrefix 20 t)))
+
 end NN.Examples.Factorization.Discovery
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean b/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
index 0076404..22053c6 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
@@ -9,9 +9,12 @@ module
 public import NN.Proofs.Tensor.Basic.FactorizationsZTest
 public import Mathlib.Probability.Independence.InfinitePi
 public import Mathlib.MeasureTheory.Integral.IntegrableOn
+public import Mathlib.Probability.StrongLaw
+public import Mathlib.Probability.CDF
+public import Mathlib.MeasureTheory.Integral.Bochner.Set
 
 /-!
-# CHD `Z_test`: the asymptotic-calibration scaffold (step a)
+# CHD `Z_test`: the asymptotic-calibration scaffold and empirical-CDF consistency (steps a–b)
 
 [`FactorizationsZTest`](./FactorizationsZTest.lean) modelled a *single* `Z_test` null draw as
 `nullGaussian n` (the product of `n` standard normals on `Fin n → ℝ`) and proved the per-draw
@@ -43,6 +46,15 @@ numbers and the Hoeffding tail take. This scaffold is the only genuinely *new* m
 plumbing; the empirical-CDF consistency and concentration statements (steps b–d of the plan) are
 applications of it, and the *uniform* Glivenko–Cantelli / DKW–Massart sharp constant and the
 exchangeability rank rate remain genuinely research-grade (flagged, never `sorry`'d).
+
+**Step (b) — pointwise consistency of the empirical CDF** is the first such application, proved
+here. Fix a threshold `t`. The threshold indicators `nullBelow Λ V γ t i ω = 𝟙[nullNoise i ω ≤ t]`
+inherit the i.i.d. structure (composition with the measurable indicator of `Iic t`), are
+`[0,1]`-valued hence integrable, and have common mean `cdf (noiseLaw Λ V γ) t`
+(`integral_nullBelow_zero`). The strong law (`strong_law_ae_real`, Etemadi's pairwise form) then
+yields `empCDF_tendsto_cdf`: almost surely the empirical CDF `empCDF Λ V γ N t` converges to
+`cdf (noiseLaw Λ V γ) t` as `N → ∞` — the pointwise Glivenko–Cantelli theorem. The *uniform*
+(sup-norm over `t`) strengthening and the DKW rate are the remaining steps (c)–(d).
 -/
 
 @[expose] public section
@@ -128,6 +140,90 @@ theorem integrable_nullNoise {Λ : Fin n → ℝ} (hΛ : ∀ i, 0 ≤ Λ i) {γ
       rw [Real.norm_eq_abs, abs_le]
       exact ⟨by linarith [h.1], h.2⟩)
 
+/-! ## Step (b): pointwise consistency of the empirical CDF (Glivenko–Cantelli via the SLLN)
+
+Fix a threshold `t`. The *threshold indicators* `nullBelow Λ V γ t i ω = 𝟙[nullNoise i ω ≤ t]` are,
+like `nullNoise` itself, i.i.d. — composing each independent, identically-distributed draw with the
+measurable indicator of `Iic t` preserves both — and `[0,1]`-valued, hence integrable. Their common
+mean is exactly the CDF of the null law at `t`,
+`∫ ω, nullBelow Λ V γ t 0 ω = (noiseLaw Λ V γ).real (Iic t) = cdf (noiseLaw Λ V γ) t`. The strong law
+of large numbers (`strong_law_ae_real`, Etemadi's pairwise-independent form) then gives, almost
+surely, `empCDF Λ V γ N t ω → cdf (noiseLaw Λ V γ) t` as `N → ∞`: pointwise consistency of the
+empirical distribution function. -/
+
+/-- The threshold indicator of the `i`-th null draw at level `t`: `1` if that draw's `noise` is
+`≤ t`, else `0`. Normalized sums of these are the empirical CDF, and as an i.i.d. bounded sequence
+they are the random variables the strong law runs on. -/
+noncomputable def nullBelow (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) :
+    ℕ → (ℕ → Fin n → ℝ) → ℝ :=
+  fun i ω => (Set.Iic t).indicator (1 : ℝ → ℝ) (nullNoise Λ V γ i ω)
+
+/-- The **empirical CDF** of the first `N` null draws at threshold `t`:
+`F̂_N(t)(ω) = #{i < N : nullNoise i ω ≤ t} / N`, written as the normalized sum of threshold
+indicators so it plugs directly into the strong law. -/
+noncomputable def empCDF (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (N : ℕ) (t : ℝ)
+    (ω : ℕ → Fin n → ℝ) : ℝ :=
+  (∑ i ∈ Finset.range N, nullBelow Λ V γ t i ω) / (N : ℝ)
+
+/-- Each threshold indicator is measurable: the measurable indicator of `Iic t` composed with the
+measurable `nullNoise`. -/
+theorem measurable_nullBelow (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ) :
+    Measurable (nullBelow Λ V γ t i) :=
+  (measurable_const.indicator measurableSet_Iic).comp (measurable_nullNoise Λ V γ i)
+
+/-- **The threshold-indicator sequence is pairwise independent** — composing each independent
+`nullNoise` draw with the measurable indicator of `Iic t` preserves independence. The exact
+`hindep` shape `strong_law_ae_real` consumes. -/
+theorem nullBelow_pairwise_indepFun (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) :
+    Pairwise (Function.onFun (· ⟂ᵢ[nullSeqGaussian n] ·) (nullBelow Λ V γ t)) := by
+  intro i j hij
+  exact ((nullNoise_iIndepFun Λ V γ).indepFun hij).comp
+    (measurable_const.indicator measurableSet_Iic) (measurable_const.indicator measurableSet_Iic)
+
+/-- **The threshold-indicator sequence is identically distributed** — each is the common `nullNoise`
+law pushed through the same indicator. The `hident` hypothesis of the strong law. -/
+theorem nullBelow_identDistrib (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ) :
+    IdentDistrib (nullBelow Λ V γ t i) (nullBelow Λ V γ t 0)
+      (nullSeqGaussian n) (nullSeqGaussian n) :=
+  (nullNoise_identDistrib Λ V γ i).comp (measurable_const.indicator measurableSet_Iic)
+
+/-- **Each threshold indicator is integrable** — it is `[0,1]`-valued on a probability space. The
+`hint` hypothesis of the strong law. -/
+theorem integrable_nullBelow (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ) :
+    Integrable (nullBelow Λ V γ t i) (nullSeqGaussian n) :=
+  Integrable.of_bound (measurable_nullBelow Λ V γ t i).aestronglyMeasurable 1
+    (ae_of_all _ fun ω => by
+      show ‖(Set.Iic t).indicator (1 : ℝ → ℝ) (nullNoise Λ V γ i ω)‖ ≤ 1
+      refine le_trans (norm_indicator_le_norm_self _ _) ?_
+      simp)
+
+/-- **The common mean of the threshold indicators is the null CDF at `t`.** Pushing the indicator of
+`Iic t` through the `0`-th draw's law `noiseLaw` (via `HasLaw.integral_comp`) turns the expectation
+into `(noiseLaw Λ V γ).real (Iic t) = cdf (noiseLaw Λ V γ) t`. -/
+theorem integral_nullBelow_zero (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) :
+    (nullSeqGaussian n)[nullBelow Λ V γ t 0] = cdf (noiseLaw Λ V γ) t := by
+  have hf : AEStronglyMeasurable ((Set.Iic t).indicator (1 : ℝ → ℝ)) (noiseLaw Λ V γ) :=
+    (measurable_const.indicator measurableSet_Iic).aestronglyMeasurable
+  have key := (nullNoise_hasLaw Λ V γ 0).integral_comp hf
+  rw [integral_indicator_one measurableSet_Iic, ← cdf_eq_real] at key
+  exact key
+
+/-- **Pointwise consistency of the empirical CDF (pointwise Glivenko–Cantelli via the SLLN).** For
+each fixed threshold `t`, almost surely the empirical CDF `empCDF` of the i.i.d. null draws converges
+to the true CDF of the null law `noiseLaw` as the number of draws `N → ∞`. This is step (b) of the
+asymptotic-calibration plan — the foundation under the 5%/95% percentile convergence, whose uniform
+and concentration refinements are steps (c)–(d). -/
+theorem empCDF_tendsto_cdf (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) :
+    ∀ᵐ ω ∂(nullSeqGaussian n),
+      Filter.Tendsto (fun N : ℕ => empCDF Λ V γ N t ω) Filter.atTop
+        (nhds (cdf (noiseLaw Λ V γ) t)) := by
+  have hlaw := strong_law_ae_real (nullBelow Λ V γ t)
+    (integrable_nullBelow Λ V γ t 0)
+    (nullBelow_pairwise_indepFun Λ V γ t)
+    (fun i => nullBelow_identDistrib Λ V γ t i)
+  rw [integral_nullBelow_zero] at hlaw
+  exact hlaw
+
 end
 
 end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 5fd0c82..0a98163 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -442,7 +442,23 @@ standard-Gaussian draw, so every `nullNoise i` has the *same* law `noiseLaw` (`n
 `hident` — that the strong law of large numbers (`strong_law_ae_real`) and the Hoeffding tail consume.
 This scaffold is the only genuinely new measure-theory plumbing; the empirical-CDF consistency
 (Glivenko–Cantelli via the SLLN) and the per-`t` concentration rate `2 exp(-2 N ε²)` (Hoeffding) are
-applications of it, deferred as the next increment.
+applications of it.
+
+*Pointwise consistency of the empirical CDF (step b).* The first such application is now proved,
+`empCDF_tendsto_cdf`. Fix a threshold `t`. The threshold indicators
+`nullBelow Λ V γ t i ω := (Set.Iic t).indicator 1 (nullNoise Λ V γ i ω)` — the events `1{noiseᵢ ≤ t}`
+— inherit the scaffold's i.i.d. structure: composing each independent, identically-distributed draw
+with the measurable indicator of `Iic t` preserves both (`nullBelow_pairwise_indepFun`,
+`nullBelow_identDistrib`), and they are `[0,1]`-valued hence integrable (`integrable_nullBelow`).
+Their common mean is pinned by a short `HasLaw.integral_comp` computation that pushes the indicator
+through `noiseLaw`: `∫ ω, nullBelow Λ V γ t 0 ω = (noiseLaw Λ V γ).real (Iic t) = cdf (noiseLaw Λ V γ) t`
+(`integral_nullBelow_zero`) — so the empirical CDF is literally the Monte-Carlo estimator of the null
+CDF. Feeding the `hint`/`hindep`/`hident` triple to `strong_law_ae_real` then yields, almost surely,
+`empCDF Λ V γ N t ω → cdf (noiseLaw Λ V γ) t` as `N → ∞`, where
+`empCDF Λ V γ N t ω := (∑ i ∈ range N, nullBelow Λ V γ t i ω) / N`. That is the *pointwise*
+Glivenko–Cantelli theorem, sorry-free over Mathlib v4.30.0. The executable `Discovery` examples
+exercise its computable shadow — the growing-prefix running mean `F̂_N(t)` settling toward the
+full-sample estimate of `cdf noiseLaw t`.
 
 *What is honestly left.* What stays genuinely research-grade is the *uniform* Glivenko–Cantelli
 (`sup_t |F̂_N - cdf| → 0`) and the full *DKW–Massart* inequality with its sharp constant `2` over
@@ -660,12 +676,14 @@ are both narrow and deliberately scoped: the cyclic-Jacobi convergence *rate* (c
 a-posteriori residual certificate, never by `sorry`), and the *asymptotic* half of the `Z_test` — that
 the empirical 5%/95% percentiles converge to the true quantiles of the now-proved null law
 (Glivenko–Cantelli / DKW), and that an exchangeable fresh draw is rejected at exactly rank rate
-`k/(N+1)`. The *pointwise* part of that asymptotic step is no longer fully out of reach: its i.i.d.
-scaffold is now built and proved sorry-free (`FactorizationsZAsymptotic` — `nullNoise` an independent,
+`k/(N+1)`. The *pointwise* part of that asymptotic step is now proved, not merely scaffolded: on the
+i.i.d. sequence `FactorizationsZAsymptotic` builds (`nullNoise` an independent,
 identically-`noiseLaw`-distributed, `[0,1]`-valued, integrable sequence under
-`Measure.infinitePi nullGaussian`), exactly the hypotheses the strong law of large numbers and the
-Hoeffding tail take, so the empirical-CDF consistency and per-`t` rate are now applications rather than
-frontier. What stays genuinely research-grade is the *uniform* Glivenko–Cantelli / DKW–Massart sharp
+`Measure.infinitePi nullGaussian`), `empCDF_tendsto_cdf` applies the strong law of large numbers to
+the bounded indicators `1{noiseᵢ ≤ t}` — whose mean is exactly `cdf noiseLaw t`
+(`integral_nullBelow_zero`) — to give almost-sure convergence `F̂_N(t) → cdf noiseLaw t` for every
+fixed `t`, the *pointwise* Glivenko–Cantelli theorem, sorry-free. What stays genuinely research-grade
+is the *uniform* Glivenko–Cantelli / DKW–Massart sharp
 constant (bracketing / VC chaining) and the exchangeability rank rate `k/(N+1)`
 (symmetric-group rank distribution) — both absent from `Mathlib.Probability` v4.30.0. One open item is
 a proof-only gap on a quantity CHD does not need to *run*; the other is the genuine statistical

From aa6a3334b345b9b3966d72cbf32cae34fed914ab Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Mon, 1 Jun 2026 07:46:31 -0700
Subject: [PATCH 21/22] Z_test asymptotic calibration step (c): pointwise
 DKW-at-a-point via Hoeffding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Prove the finite-sample companion of step (b)'s almost-sure limit: at a fixed
threshold t, the empirical CDF of the i.i.d. null draws concentrates around the
true null CDF at the sharp Hoeffding rate.

FactorizationsZAsymptotic.lean:
- nullBelow_subgaussian / nullBelow_neg_subgaussian: the [0,1]-bounded threshold
  indicators are sub-Gaussian with variance proxy 1/4 (hasSubgaussianMGF_of_mem_Icc),
  centered at mean cdf noiseLaw t (integral_nullBelow_eq).
- hoeffding_avg_ge: Mathlib's measure_sum_ge_le_of_iIndepFun with ε ↦ N·ε turns the
  proxy sum N/4 into the sharp exponent -2Nε².
- empCDF_upper_tail / empCDF_lower_tail: P(±(empCDF − cdf) ≥ ε) ≤ exp(-2Nε²).
- empCDF_concentration: two-sided DKW-at-a-point P(|empCDF − cdf| ≥ ε) ≤ 2·exp(-2Nε²)
  via measureReal_union_le + le_abs.
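
The substitution in hoeffding_avg_ge can be checked by hand, assuming the
Mathlib tail has the usual sub-Gaussian form exp(-s²/(2·c)) for deviation s and
total variance proxy c (an assumption about the lemma's shape, not a quotation
of it): with s = N·ε and c = N/4, s²/(2c) = N²ε²/(N/2) = 2Nε². A small Float
sanity check of the same identity, with illustrative names only:

```lean
/-- Generic sub-Gaussian tail `exp(-s² / (2·c))` (assumed form, see the note above). -/
def subGaussianTail (s c : Float) : Float := Float.exp (-(s * s) / (2.0 * c))

/-- The tail specialised to deviation `s = N·ε` and proxy sum `c = N/4`. -/
def viaSubstitution (N : Nat) (ε : Float) : Float :=
  subGaussianTail (N.toFloat * ε) (N.toFloat / 4.0)

/-- The closed form `exp(-2·N·ε²)` used by the one-sided tails. -/
def closedForm (N : Nat) (ε : Float) : Float := Float.exp (-2.0 * N.toFloat * ε * ε)

-- The two components should agree up to floating-point rounding.
#eval (viaSubstitution 20 0.3, closedForm 20 0.3)
```

This only illustrates the arithmetic; the proved statement is the Lean theorem
itself, not this numeric check.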

Discovery.lean: matching #eval block exercising the bound's computable shadows — the
tail function 2·exp(-2Nε²) (twice one-sided, decreasing in N and ε, non-vacuous < 1
once 2Nε² > ln 2, trivially 2 at ε = 0) and the observed prefix deviation it governs
(every prefix of ≥ 3 draws keeps F̂_N within ε = 0.3 of the full sample; negative
control: N = 1, 2 deviate by 0.5 > ε).
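
For ε = 0.3 the non-vacuity condition 2Nε² > ln 2 reads 0.18·N > 0.693, i.e.
N ≥ 4. A small sketch that searches for that threshold; demoBound restates the
2·exp(-2Nε²) formula so the snippet is self-contained, and both names are
hypothetical helpers, not the definitions in Discovery.lean:

```lean
/-- Two-sided tail `2·exp(−2Nε²)`, restated here for self-containment. -/
def demoBound (N : Nat) (ε : Float) : Float := 2.0 * Float.exp (-2.0 * N.toFloat * ε * ε)

/-- Smallest `N < 50` at which the bound drops below `1`, if there is one. -/
def firstNonVacuous (ε : Float) : Option Nat :=
  (List.range 50).find? (fun N => decide (demoBound N ε < 1.0))

#eval firstNonVacuous 0.3
```

By the arithmetic above this should report N = 4, so for N = 1, 2, 3 the bound
is still above 1 and says nothing about the deviation.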

Docs: aggregate Factorization docstring and Ch4 blueprint updated; uniform DKW–Massart
and quantile-transfer step (d) remain flagged research-grade, never sorry'd.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |  11 +
 NN/Examples/Factorization/Discovery.lean      |  74 +++++++
 .../Basic/FactorizationsZAsymptotic.lean      | 189 +++++++++++++++++-
 .../Ch4_Verification/Factorizations.lean      |  45 ++++-
 4 files changed, 307 insertions(+), 12 deletions(-)

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index 0244774..b4eb302 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -124,6 +124,17 @@ factorization misbehaves.
   (bounded summands), the limit value `cdf 1 = 1` is attained at every `N`, and a **negative control**
   confirms the estimate genuinely moves with `N` (an early prefix differs from the full sample), so
   the convergence is a real limit being approached rather than a vacuous constant.
+  A final **concentration** sub-block corroborates `empCDF_concentration` (step (c)): the
+  Dvoretzky–Kiefer–Wolfowitz inequality *at a single point*, `ℙ(|F̂_N(t) − cdf noiseLaw t| ≥ ε) ≤
+  2·exp(−2·N·ε²)` with the sharp Hoeffding exponent (the threshold indicators are `[0,1]`-bounded,
+  so sub-Gaussian with proxy `1/4`; the one-sided `empCDF_upper_tail`/`empCDF_lower_tail` give
+  `exp(−2Nε²)` each, the union the factor `2`). The probability is noncomputable, so the `#eval`s
+  exercise the bound's two computable shadows: the tail *function* `2·exp(−2Nε²)` (twice the
+  one-sided tail, decreasing in both `N` and `ε`, non-vacuous `< 1` once `2Nε² > ln 2`), and the
+  observed deviation it governs — every prefix of `≥ 3` draws keeps `F̂_N` within `ε = 0.3` of the
+  full-sample estimate uniformly over thresholds, with a **negative control** that the tiniest
+  prefixes (`N = 1, 2`) deviate by `0.5 > ε`, the honest weak-`N` regime where the `2·exp(−2Nε²)`
+  bound is still near `2`.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Discovery.lean b/NN/Examples/Factorization/Discovery.lean
index 5b36f4b..983747d 100644
--- a/NN/Examples/Factorization/Discovery.lean
+++ b/NN/Examples/Factorization/Discovery.lean
@@ -393,4 +393,78 @@ def empCdfPrefix (N : Nat) (t : Float) : Float :=
 #eval assertTrue "running empirical CDF is non-trivial: an early prefix differs from the full sample"
   ([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, zLow].any (fun t => !(empCdfPrefix 5 t == empCdfPrefix 20 t)))
 
+/-! ### Pointwise finite-sample concentration (step c): the Hoeffding / DKW-at-a-point bound
+
+`empCDF_concentration` proves, for the i.i.d. null sequence, that at any fixed threshold `t`,
+
+  ℙ(|F̂_N(t) − cdf noiseLaw t| ≥ ε) ≤ 2·exp(−2·N·ε²),
+
+the Dvoretzky–Kiefer–Wolfowitz inequality at a single point, with the sharp Hoeffding exponent (the
+one-sided `empCDF_upper_tail`/`empCDF_lower_tail` give `exp(−2Nε²)` each; their union the factor `2`).
+The probability is noncomputable — a measure on the infinite product of draws — but two computable
+shadows make the statement concrete: the tail-bound *function* `2·exp(−2Nε²)`, and the *observed*
+deviation of the prefix empirical CDF from the full-sample estimate, which that bound governs. -/
+
+/-- The one-sided Hoeffding tail `exp(−2Nε²)` (`empCDF_upper_tail` / `empCDF_lower_tail`). -/
+def oneSidedBound (N : Nat) (ε : Float) : Float := Float.exp (-2.0 * N.toFloat * ε * ε)
+
+/-- The two-sided DKW-at-a-point bound `2·exp(−2Nε²)` (`empCDF_concentration`). -/
+def hoeffdingBound (N : Nat) (ε : Float) : Float := 2.0 * oneSidedBound N ε
+
+#eval IO.println s!"Hoeffding two-sided bound 2·exp(−2Nε²): (N=20,ε=0.3) = {hoeffdingBound 20 0.3}, \
+  (N=20,ε=0.15) = {hoeffdingBound 20 0.15}, (N=10,ε=0.3) = {hoeffdingBound 10 0.3}, \
+  trivial (ε=0) = {hoeffdingBound 20 0.0}"
+
+-- Positive — the two-sided bound is exactly twice the one-sided tail: the union bound assembling
+-- `empCDF_concentration` from `empCDF_upper_tail` + `empCDF_lower_tail`, each `exp(−2Nε²)`.
+#eval assertTrue "two-sided Hoeffding bound = 2 × one-sided tail (upper + lower)"
+  ([(5, 0.1), (10, 0.2), (20, 0.3)].all (fun p => hoeffdingBound p.1 p.2 == 2.0 * oneSidedBound p.1 p.2))
+
+-- Positive — the bound tightens with more samples: doubling `N` shrinks the tail (the `N`-dependence
+-- of the `−2Nε²` exponent, i.e. the finite-sample consistency rate).
+#eval assertTrue "Hoeffding bound decreases in N (more draws ⇒ sharper concentration)"
+  ([0.15, 0.2, 0.3].all (fun ε => Spec.leBool (hoeffdingBound 40 ε) (hoeffdingBound 20 ε)
+      && Spec.leBool (hoeffdingBound 20 ε) (hoeffdingBound 10 ε)))
+
+-- Positive — the bound tightens with a looser tolerance `ε` (the `ε²` in the exponent).
+#eval assertTrue "Hoeffding bound decreases in ε (larger tolerance ⇒ smaller exceedance probability)"
+  ([5, 10, 20].all (fun N => Spec.leBool (hoeffdingBound N 0.4) (hoeffdingBound N 0.2)))
+
+-- Positive — the bound is *non-vacuous* (a genuine probability bound `< 1`) once `2Nε² > ln 2`; at
+-- `N = 20`, `ε = 0.3` it is `≈ 0.055`, so the empirical CDF is within 0.3 of the truth w.p. ≥ 0.945.
+#eval assertTrue "Hoeffding bound is non-vacuous (< 1) at N = 20, ε = 0.3"
+  (Spec.ltBool (hoeffdingBound 20 0.3) 1.0)
+
+-- Negative control — at `ε = 0` the bound is exactly the trivial constant `2` (the vacuous `ℙ ≤ 2`):
+-- concentration says nothing without a positive tolerance, so the `ε²` in the exponent does the work.
+#eval assertTrue "at ε = 0 the Hoeffding bound is the trivial constant 2 (vacuous without tolerance)"
+  (hoeffdingBound 20 0.0 == 2.0)
+
+/-! The bound governs the *observed* fluctuation: as the prefix grows, the empirical CDF `F̂_N`
+concentrates around the full-sample estimate. With tolerance `ε = 0.3`, every prefix of `≥ 3` draws
+stays within `ε` of the full sample uniformly over the threshold grid, while the tiniest prefixes
+(`N = 1, 2`) deviate by `0.5`: the weak-`N` regime where `2·exp(−2Nε²)` still exceeds `1` (vacuous). -/
+
+/-- Threshold grid spanning the tight null-noise band `[0.048, 0.062]` plus the `[0,1]` tails. -/
+def devGrid : List Float := [-0.01, 0.0, 0.05, 0.055, 0.057, 0.06, 0.062, 0.2, 0.5, 1.0]
+
+/-- Sup-over-the-grid deviation of the prefix-`N` empirical CDF from the full-sample estimate — the
+quantity the two-sided bound controls (a computable proxy for `|F̂_N(t) − cdf noiseLaw t|`). -/
+def maxDev (N : Nat) : Float :=
+  (devGrid.map (fun t => Float.abs (empCdfPrefix N t - empCdf t))).foldl max 0.0
+
+#eval IO.println s!"max |F̂_N − F̂_20| over the grid: N=1 {maxDev 1}, N=2 {maxDev 2}, N=3 {maxDev 3}, \
+  N=5 {maxDev 5}, N=10 {maxDev 10}, N=20 {maxDev 20}"
+
+-- Positive — concentration: with enough draws the empirical CDF settles within `ε = 0.3` of the
+-- full-sample estimate, uniformly over thresholds (the deviation the two-sided bound governs).
+#eval assertTrue "empirical CDF concentrates: max deviation ≤ ε = 0.3 for every prefix of ≥ 3 draws"
+  ([3, 5, 10, 15, 20].all (fun N => Spec.leBool (maxDev N) 0.3))
+
+-- Negative control — at the tiniest prefixes (`N = 1, 2`) the empirical CDF still deviates by `0.5 > ε`,
+-- so concentration genuinely needs `N` to grow: here `2·exp(−2Nε²)` is `≈ 1.67` (`N = 1`) and
+-- `≈ 1.40` (`N = 2`), i.e. the bound exceeds `1` (vacuous) and the CDF has not yet concentrated.
+#eval assertTrue "concentration needs N to grow: N = 1, 2 prefixes deviate by > ε = 0.3"
+  ([1, 2].all (fun N => Spec.ltBool 0.3 (maxDev N)))
+
 end NN.Examples.Factorization.Discovery
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean b/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
index 22053c6..23ad95d 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
@@ -12,9 +12,10 @@ public import Mathlib.MeasureTheory.Integral.IntegrableOn
 public import Mathlib.Probability.StrongLaw
 public import Mathlib.Probability.CDF
 public import Mathlib.MeasureTheory.Integral.Bochner.Set
+public import Mathlib.Probability.Moments.SubGaussian
 
 /-!
-# CHD `Z_test`: the asymptotic-calibration scaffold and empirical-CDF consistency (steps a–b)
+# CHD `Z_test`: asymptotic calibration — i.i.d. scaffold, empirical-CDF consistency and pointwise concentration (steps a–c)
 
 [`FactorizationsZTest`](./FactorizationsZTest.lean) modelled a *single* `Z_test` null draw as
 `nullGaussian n` (the product of `n` standard normals on `Fin n → ℝ`) and proved the per-draw
@@ -55,6 +56,20 @@ inherit the i.i.d. structure (composition with the measurable indicator of `Iic
 yields `empCDF_tendsto_cdf`: almost surely the empirical CDF `empCDF Λ V γ N t` converges to
 `cdf (noiseLaw Λ V γ) t` as `N → ∞` — the pointwise Glivenko–Cantelli theorem. The *uniform*
 (sup-norm over `t`) strengthening and the DKW rate are the remaining steps (c)–(d).
+
+**Step (c) — pointwise finite-sample concentration (DKW-at-a-point via Hoeffding)** quantifies the
+rate of that convergence at a fixed `t`. Each threshold indicator `nullBelow Λ V γ t i` is bounded in
+`[0,1]`, so — once centered at its mean `cdf (noiseLaw Λ V γ) t` — it has a sub-Gaussian moment
+generating function with variance proxy `1/4` (Hoeffding's lemma, `hasSubgaussianMGF_of_mem_Icc`).
+Mathlib's Hoeffding inequality for sums of independent sub-Gaussians
+(`HasSubgaussianMGF.measure_sum_ge_le_of_iIndepFun`) then gives, for every `N ≥ 1` and `ε ≥ 0`, the
+one-sided tails `empCDF_upper_tail` / `empCDF_lower_tail`
+`ℙ(±(empCDF Λ V γ N t − cdf (noiseLaw Λ V γ) t) ≥ ε) ≤ exp(−2·N·ε²)`, and their union the two-sided
+`empCDF_concentration` `ℙ(|empCDF Λ V γ N t − cdf (noiseLaw Λ V γ) t| ≥ ε) ≤ 2·exp(−2·N·ε²)` — the
+DKW inequality *at a single point* `t`, with the sharp Hoeffding exponent. This is the finite-sample
+companion of step (b)'s almost-sure limit. The *uniform-over-`t`* DKW–Massart bound with the global
+constant `2` (the genuine Dvoretzky–Kiefer–Wolfowitz theorem) is the research-grade strengthening
+still flagged out of scope, and the quantile-transfer step (d) remains.
 -/
 
 @[expose] public section
@@ -63,6 +78,8 @@ namespace Spec.Factorization
 
 open MeasureTheory ProbabilityTheory
 
+open scoped NNReal
+
 variable {n : Nat}
 
 noncomputable section
@@ -224,6 +241,176 @@ theorem empCDF_tendsto_cdf (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (
   rw [integral_nullBelow_zero] at hlaw
   exact hlaw
 
+/-! ## Step (c): pointwise finite-sample concentration (DKW-at-a-point via Hoeffding)
+
+Step (b) is an almost-sure *limit*; step (c) is its quantitative, finite-`N` companion. The threshold
+indicators `nullBelow Λ V γ t i` are bounded in `[0,1]`, hence — centered at their common mean
+`cdf (noiseLaw Λ V γ) t` — sub-Gaussian with variance proxy `(1/2)² = 1/4` (Hoeffding's lemma). Being
+i.i.d., their normalized sum (the empirical CDF) concentrates exponentially: Mathlib's sub-Gaussian
+Hoeffding inequality gives both one-sided tails with rate `exp(−2·N·ε²)` and, by a union bound, the
+two-sided `2·exp(−2·N·ε²)` — the Dvoretzky–Kiefer–Wolfowitz bound *at the single point* `t`. -/
+
+/-- Each threshold indicator takes only the values `0` and `1`, so it lies in `[0,1]` pointwise. -/
+theorem nullBelow_mem_Icc (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ)
+    (ω : ℕ → Fin n → ℝ) : nullBelow Λ V γ t i ω ∈ Set.Icc (0 : ℝ) 1 := by
+  unfold nullBelow
+  rw [Set.indicator_apply]
+  split <;> simp [Set.mem_Icc]
+
+/-- **The threshold indicators are jointly independent** (not just pairwise): composing the i.i.d.
+`nullNoise` sequence with the measurable indicator of `Iic t` preserves joint independence. This is
+the form of independence that the Hoeffding sum bound consumes. -/
+theorem nullBelow_iIndepFun (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) :
+    iIndepFun (nullBelow Λ V γ t) (nullSeqGaussian n) :=
+  (nullNoise_iIndepFun Λ V γ).comp (fun _ => (Set.Iic t).indicator (1 : ℝ → ℝ))
+    (fun _ => measurable_const.indicator measurableSet_Iic)
+
+/-- The common mean of the threshold indicators is the null CDF at `t`, for *every* draw `i` (not just
+the `0`-th): identically distributed draws share their integral. -/
+theorem integral_nullBelow_eq (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ) :
+    (nullSeqGaussian n)[nullBelow Λ V γ t i] = cdf (noiseLaw Λ V γ) t := by
+  rw [(nullBelow_identDistrib Λ V γ t i).integral_eq, integral_nullBelow_zero]
+
+/-- **Hoeffding's lemma for one threshold indicator.** Centered at its mean `cdf (noiseLaw Λ V γ) t`,
+the `[0,1]`-valued indicator has a sub-Gaussian MGF with variance proxy `((1-0)/2)² = 1/4`. -/
+theorem nullBelow_subgaussian (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ) :
+    HasSubgaussianMGF (fun ω => nullBelow Λ V γ t i ω - cdf (noiseLaw Λ V γ) t)
+      (1 / 4 : ℝ≥0) (nullSeqGaussian n) := by
+  have hb : ∀ᵐ ω ∂(nullSeqGaussian n), nullBelow Λ V γ t i ω ∈ Set.Icc (0 : ℝ) 1 :=
+    ae_of_all _ (nullBelow_mem_Icc Λ V γ t i)
+  have h := hasSubgaussianMGF_of_mem_Icc (measurable_nullBelow Λ V γ t i).aemeasurable hb
+  rw [integral_nullBelow_eq] at h
+  rwa [show ((‖(1 : ℝ) - 0‖₊) / 2) ^ 2 = (1 / 4 : ℝ≥0) from by
+        rw [sub_zero, nnnorm_one]; norm_num] at h
+
+/-- The mean of the *negated*, recentred indicator `cdf (noiseLaw Λ V γ) t − nullBelow` is `0`. -/
+theorem integral_negBelow_eq (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ) :
+    (nullSeqGaussian n)[fun ω => cdf (noiseLaw Λ V γ) t - nullBelow Λ V γ t i ω] = 0 := by
+  rw [integral_sub (integrable_const _) (integrable_nullBelow Λ V γ t i), integral_const,
+    integral_nullBelow_eq]
+  simp
+
+/-- The negated indicator `cdf (noiseLaw Λ V γ) t − nullBelow` lies in `[cdf − 1, cdf]`. -/
+theorem nullBelow_neg_mem_Icc (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ)
+    (ω : ℕ → Fin n → ℝ) :
+    cdf (noiseLaw Λ V γ) t - nullBelow Λ V γ t i ω
+      ∈ Set.Icc (cdf (noiseLaw Λ V γ) t - 1) (cdf (noiseLaw Λ V γ) t) := by
+  have h := Set.mem_Icc.mp (nullBelow_mem_Icc Λ V γ t i ω)
+  rw [Set.mem_Icc]
+  constructor <;> linarith [h.1, h.2]
+
+/-- **Hoeffding's lemma for the negated indicator.** `cdf (noiseLaw Λ V γ) t − nullBelow` is
+`[cdf − 1, cdf]`-valued (length-`1` interval) and already mean-zero, so it is sub-Gaussian with the
+same variance proxy `1/4` — the lower-tail companion of `nullBelow_subgaussian`. -/
+theorem nullBelow_neg_subgaussian (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) (i : ℕ) :
+    HasSubgaussianMGF (fun ω => cdf (noiseLaw Λ V γ) t - nullBelow Λ V γ t i ω)
+      (1 / 4 : ℝ≥0) (nullSeqGaussian n) := by
+  have hb : ∀ᵐ ω ∂(nullSeqGaussian n),
+      (fun ω => cdf (noiseLaw Λ V γ) t - nullBelow Λ V γ t i ω) ω
+        ∈ Set.Icc (cdf (noiseLaw Λ V γ) t - 1) (cdf (noiseLaw Λ V γ) t) :=
+    ae_of_all _ (nullBelow_neg_mem_Icc Λ V γ t i)
+  have hmeas : Measurable (fun ω => cdf (noiseLaw Λ V γ) t - nullBelow Λ V γ t i ω) :=
+    measurable_const.sub (measurable_nullBelow Λ V γ t i)
+  have h := hasSubgaussianMGF_of_mem_Icc hmeas.aemeasurable hb
+  rw [integral_negBelow_eq] at h
+  simp only [sub_zero] at h
+  rwa [show ((‖cdf (noiseLaw Λ V γ) t - (cdf (noiseLaw Λ V γ) t - 1)‖₊) / 2) ^ 2 = (1 / 4 : ℝ≥0)
+        from by rw [sub_sub_cancel, nnnorm_one]; norm_num] at h
+
+/-- **Hoeffding's inequality for a normalized sum of independent proxy-`1/4` sub-Gaussians.** If `X`
+is a jointly independent sequence on the null-draw space and each `X i` is sub-Gaussian with variance
+proxy `1/4`, then for `N ≥ 1` and `ε ≥ 0` the average `(∑_{i<N} X i)/N` is at least `ε` with
+probability at most `exp(−2·N·ε²)`. This drives both `empCDF` tails: at `ε ↦ N·ε`, Mathlib's
+sum bound `exp(−ε²/(2·∑ cᵢ))` with `∑ cᵢ = N/4` gives `exp(−(N·ε)²/(N/2)) = exp(−2·N·ε²)`. -/
+theorem hoeffding_avg_ge {X : ℕ → (ℕ → Fin n → ℝ) → ℝ}
+    (hindep : iIndepFun X (nullSeqGaussian n))
+    (hsub : ∀ i, HasSubgaussianMGF (X i) (1 / 4 : ℝ≥0) (nullSeqGaussian n))
+    {N : ℕ} (hN : 1 ≤ N) {ε : ℝ} (hε : 0 ≤ ε) :
+    (nullSeqGaussian n).real {ω | ε ≤ (∑ i ∈ Finset.range N, X i ω) / (N : ℝ)}
+      ≤ Real.exp (-2 * (N : ℝ) * ε ^ 2) := by
+  have hNR : (0 : ℝ) < N := by exact_mod_cast hN
+  have hbase := HasSubgaussianMGF.measure_sum_ge_le_of_iIndepFun hindep
+    (c := fun _ => (1 / 4 : ℝ≥0)) (s := Finset.range N) (fun i _ => hsub i)
+    (ε := (N : ℝ) * ε) (by positivity)
+  have hset : {ω | (N : ℝ) * ε ≤ ∑ i ∈ Finset.range N, X i ω}
+      = {ω | ε ≤ (∑ i ∈ Finset.range N, X i ω) / (N : ℝ)} := by
+    ext ω
+    simp only [Set.mem_setOf_eq]
+    rw [le_div_iff₀ hNR, mul_comm]
+  rw [hset] at hbase
+  refine hbase.trans (le_of_eq ?_)
+  congr 1
+  simp only [Finset.sum_const, Finset.card_range, nsmul_eq_mul]
+  push_cast
+  field_simp
+  ring
+
+/-- **Upper-tail concentration of the empirical CDF (Hoeffding).** For `N ≥ 1`, `ε ≥ 0`, the empirical
+CDF overshoots the true null CDF at `t` by at least `ε` with probability at most `exp(−2·N·ε²)`. -/
+theorem empCDF_upper_tail (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) {N : ℕ}
+    (hN : 1 ≤ N) {ε : ℝ} (hε : 0 ≤ ε) :
+    (nullSeqGaussian n).real {ω | ε ≤ empCDF Λ V γ N t ω - cdf (noiseLaw Λ V γ) t}
+      ≤ Real.exp (-2 * (N : ℝ) * ε ^ 2) := by
+  have hNR : (0 : ℝ) < N := by exact_mod_cast hN
+  have hind : iIndepFun (fun i ω => nullBelow Λ V γ t i ω - cdf (noiseLaw Λ V γ) t)
+      (nullSeqGaussian n) :=
+    (nullBelow_iIndepFun Λ V γ t).comp (fun _ x => x - cdf (noiseLaw Λ V γ) t)
+      (fun _ => measurable_id.sub_const _)
+  have htail := hoeffding_avg_ge hind (fun i => nullBelow_subgaussian Λ V γ t i) hN hε
+  have hrw : ∀ ω,
+      (∑ i ∈ Finset.range N, (nullBelow Λ V γ t i ω - cdf (noiseLaw Λ V γ) t)) / (N : ℝ)
+        = empCDF Λ V γ N t ω - cdf (noiseLaw Λ V γ) t := by
+    intro ω
+    unfold empCDF
+    rw [Finset.sum_sub_distrib, Finset.sum_const, Finset.card_range, nsmul_eq_mul, sub_div,
+      mul_div_cancel_left₀ (cdf (noiseLaw Λ V γ) t) (ne_of_gt hNR)]
+  simp only [hrw] at htail
+  exact htail
+
+/-- **Lower-tail concentration of the empirical CDF (Hoeffding).** Symmetrically, the empirical CDF
+undershoots the true null CDF at `t` by at least `ε` with probability at most `exp(−2·N·ε²)`. -/
+theorem empCDF_lower_tail (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) {N : ℕ}
+    (hN : 1 ≤ N) {ε : ℝ} (hε : 0 ≤ ε) :
+    (nullSeqGaussian n).real {ω | ε ≤ cdf (noiseLaw Λ V γ) t - empCDF Λ V γ N t ω}
+      ≤ Real.exp (-2 * (N : ℝ) * ε ^ 2) := by
+  have hNR : (0 : ℝ) < N := by exact_mod_cast hN
+  have hind : iIndepFun (fun i ω => cdf (noiseLaw Λ V γ) t - nullBelow Λ V γ t i ω)
+      (nullSeqGaussian n) :=
+    (nullBelow_iIndepFun Λ V γ t).comp (fun _ x => cdf (noiseLaw Λ V γ) t - x)
+      (fun _ => measurable_const.sub measurable_id)
+  have htail := hoeffding_avg_ge hind (fun i => nullBelow_neg_subgaussian Λ V γ t i) hN hε
+  have hrw : ∀ ω,
+      (∑ i ∈ Finset.range N, (cdf (noiseLaw Λ V γ) t - nullBelow Λ V γ t i ω)) / (N : ℝ)
+        = cdf (noiseLaw Λ V γ) t - empCDF Λ V γ N t ω := by
+    intro ω
+    unfold empCDF
+    rw [Finset.sum_sub_distrib, Finset.sum_const, Finset.card_range, nsmul_eq_mul, sub_div,
+      mul_div_cancel_left₀ (cdf (noiseLaw Λ V γ) t) (ne_of_gt hNR)]
+  simp only [hrw] at htail
+  exact htail
+
+/-- **Pointwise finite-sample concentration of the empirical CDF (DKW-at-a-point, step (c)).** For
+each fixed threshold `t`, every `N ≥ 1` and tolerance `ε ≥ 0`, the empirical CDF of the i.i.d. null
+draws deviates from the true null CDF by at least `ε` with probability at most `2·exp(−2·N·ε²)`.
+This is the Dvoretzky–Kiefer–Wolfowitz inequality evaluated at a single point, with the sharp
+Hoeffding exponent — the finite-sample rate underneath step (b)'s almost-sure limit. The
+*uniform-over-`t`* DKW–Massart bound (global constant `2`) is the research-grade strengthening still
+flagged out of scope. -/
+theorem empCDF_concentration (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (t : ℝ) {N : ℕ}
+    (hN : 1 ≤ N) {ε : ℝ} (hε : 0 ≤ ε) :
+    (nullSeqGaussian n).real {ω | ε ≤ |empCDF Λ V γ N t ω - cdf (noiseLaw Λ V γ) t|}
+      ≤ 2 * Real.exp (-2 * (N : ℝ) * ε ^ 2) := by
+  have hsplit : {ω | ε ≤ |empCDF Λ V γ N t ω - cdf (noiseLaw Λ V γ) t|}
+      = {ω | ε ≤ empCDF Λ V γ N t ω - cdf (noiseLaw Λ V γ) t}
+        ∪ {ω | ε ≤ cdf (noiseLaw Λ V γ) t - empCDF Λ V γ N t ω} := by
+    ext ω
+    simp only [Set.mem_setOf_eq, Set.mem_union, le_abs, neg_sub]
+  rw [hsplit]
+  refine (measureReal_union_le _ _).trans ?_
+  have h1 := empCDF_upper_tail Λ V γ t hN hε
+  have h2 := empCDF_lower_tail Λ V γ t hN hε
+  linarith
+
 end
 
 end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index 0a98163..d24a9f5 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -442,7 +442,7 @@ standard-Gaussian draw, so every `nullNoise i` has the *same* law `noiseLaw` (`n
 `hident` — that the strong law of large numbers (`strong_law_ae_real`) and the Hoeffding tail consume.
 This scaffold is the only genuinely new measure-theory plumbing; the empirical-CDF consistency
 (Glivenko–Cantelli via the SLLN) and the per-`t` concentration rate `2 exp(-2 N ε²)` (Hoeffding) are
-applications of it.
+applications of it — both now proved.
 
 *Pointwise consistency of the empirical CDF (step b).* The first such application is now proved,
 `empCDF_tendsto_cdf`. Fix a threshold `t`. The threshold indicators
@@ -460,13 +460,31 @@ Glivenko–Cantelli theorem, sorry-free over Mathlib v4.30.0. The executable `Di
 exercise its computable shadow — the growing-prefix running mean `F̂_N(t)` settling toward the
 full-sample estimate of `cdf noiseLaw t`.
 
-*What is honestly left.* What stays genuinely research-grade is the *uniform* Glivenko–Cantelli
-(`sup_t |F̂_N - cdf| → 0`) and the full *DKW–Massart* inequality with its sharp constant `2` over
-the supremum — both need the bracketing / VC-class chaining Mathlib v4.30.0 lacks — and the
-*exchangeability rank rate* `k/(N+1)` for a fresh null draw, which needs a symmetric-group
-rank-distribution argument also absent. Those are stated as the open frontier, never stubbed with
-`sorry`. The finite-sample false-positive *bound* above is the exact, non-asymptotic statement the
-test actually guarantees, and the pointwise scaffold is the sorry-free bridge toward the asymptotic
+*Pointwise finite-sample concentration (step c).* Step (b)'s almost-sure limit gains a quantitative,
+finite-`N` companion: `empCDF_concentration`, the Dvoretzky–Kiefer–Wolfowitz inequality *at a single
+point*. The same threshold indicators are `[0,1]`-bounded, so — once centered at their mean
+`cdf (noiseLaw Λ V γ) t` — Hoeffding's lemma (`hasSubgaussianMGF_of_mem_Icc`) makes them sub-Gaussian
+with variance proxy `(1/2)² = 1/4` (`nullBelow_subgaussian`, and the mean-zero negated companion
+`nullBelow_neg_subgaussian` for the lower tail). Mathlib's Hoeffding bound for sums of independent
+sub-Gaussians (`HasSubgaussianMGF.measure_sum_ge_le_of_iIndepFun`), specialised through the
+normalized-average lemma `hoeffding_avg_ge` (where the substitution `ε ↦ N·ε` turns the proxy sum
+`N/4` into the sharp exponent), gives the one-sided tails `empCDF_upper_tail` / `empCDF_lower_tail`,
+`ℙ(±(F̂_N(t) - cdf noiseLaw t) ≥ ε) ≤ exp(-2 N ε²)`; a union bound (`measureReal_union_le`,
+`le_abs`) assembles the two-sided `ℙ(|F̂_N(t) - cdf noiseLaw t| ≥ ε) ≤ 2 exp(-2 N ε²)`. That is the
+DKW inequality at one point with the sharp Hoeffding exponent — sorry-free over Mathlib v4.30.0. The
+`Discovery` examples exercise the bound's two computable shadows: the tail *function* `2 exp(-2 N ε²)`
+(twice the one-sided tail, decreasing in `N` and `ε`, non-vacuous once `2 N ε² > ln 2`) and the
+observed prefix deviation it governs.
+
+*What is honestly left.* With the pointwise pair (b)–(c) proved, what stays genuinely research-grade
+is the *uniform* Glivenko–Cantelli (`sup_t |F̂_N - cdf| → 0`) and the full *DKW–Massart* inequality
+with its sharp constant `2` over the supremum — both need the bracketing / VC-class chaining Mathlib
+v4.30.0 lacks — together with the *quantile-transfer* step (d) (converting CDF concentration into
+convergence of the empirical 5%/95% percentiles to the true quantiles), and the *exchangeability rank
+rate* `k/(N+1)` for a fresh null draw, which needs a symmetric-group rank-distribution argument also
+absent. Those are stated as the open frontier, never stubbed with `sorry`. The finite-sample
+false-positive *bound* above is the exact, non-asymptotic statement the test actually guarantees, and
+the pointwise consistency-plus-concentration pair is the sorry-free bridge toward the asymptotic
 statement.
 
 # The a-posteriori residual certificate
@@ -682,9 +700,14 @@ identically-`noiseLaw`-distributed, `[0,1]`-valued, integrable sequence under
 `Measure.infinitePi nullGaussian`), `empCDF_tendsto_cdf` applies the strong law of large numbers to
 the bounded indicators `1{noiseᵢ ≤ t}` — whose mean is exactly `cdf noiseLaw t`
 (`integral_nullBelow_zero`) — to give almost-sure convergence `F̂_N(t) → cdf noiseLaw t` for every
-fixed `t`, the *pointwise* Glivenko–Cantelli theorem, sorry-free. What stays genuinely research-grade
+fixed `t`, the *pointwise* Glivenko–Cantelli theorem, sorry-free; and its finite-sample companion
+`empCDF_concentration` adds the per-`t` rate `ℙ(|F̂_N(t) - cdf noiseLaw t| ≥ ε) ≤ 2 exp(-2 N ε²)`,
+the DKW inequality at one point, from Hoeffding's lemma on the `[0,1]`-bounded indicators
+(`nullBelow_subgaussian`) and Mathlib's sub-Gaussian sum bound. What stays genuinely research-grade
 is the *uniform* Glivenko–Cantelli / DKW–Massart sharp
-constant (bracketing / VC chaining) and the exchangeability rank rate `k/(N+1)`
-(symmetric-group rank distribution) — both absent from `Mathlib.Probability` v4.30.0. One open item is
+constant over the supremum (bracketing / VC chaining), the *quantile-transfer* step that turns this
+CDF concentration into convergence of the empirical 5%/95% percentiles, and the exchangeability rank
+rate `k/(N+1)`
+(symmetric-group rank distribution) — all absent from `Mathlib.Probability` v4.30.0. One open item is
 a proof-only gap on a quantity CHD does not need to *run*; the other is the genuine statistical
 frontier, flagged rather than stubbed with `sorry`.

From bed4e142d17f99926a06e9dc47cf863201b88b2e Mon Sep 17 00:00:00 2001
From: Nicolas Rouquette <nfr@jpl.nasa.gov>
Date: Mon, 1 Jun 2026 08:20:36 -0700
Subject: [PATCH 22/22] Z_test asymptotic calibration step (d): quantile
 transfer (empirical percentile consistency)

Inverts steps (b)-(c)'s empirical-CDF convergence into consistency of the
empirical percentiles the Z_test chooser thresholds against, sorry-free over
Mathlib v4.30.0.

NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean:
- empCDF_mono: empirical CDF monotone in the threshold.
- StraddlesQuantile / IsLowerQuantile: the honest population straddle hypothesis
  (continuity + strict monotonicity through p at q) and the defining property of
  a lower empirical p-quantile.
- empCDF_eventually_straddle: the transfer engine -- pointwise consistency at
  q +/- eps makes the empirical CDF eventually straddle p.
- empQuantile_tendsto: any lower empirical p-quantile converges a.s. to q (sandwich
  pinned into [q-eps, q+eps], intersected over eps = 1/(m+1) via ae_all_iff).
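
The deterministic core of that sandwich can be sketched in isolation. This is a
minimal illustration, not the proof in the patch: IsLowerQuantile is copied from
the patch, while the lemma name IsLowerQuantile.mem_Icc and the blanket
`import Mathlib` are assumptions made for the sketch.

```lean
import Mathlib

/-- Lower `p`-quantile property, as defined in the patch. -/
def IsLowerQuantile (F : ℝ → ℝ) (p q : ℝ) : Prop :=
  (∀ t, t < q → F t < p) ∧ (∀ t, q < t → p ≤ F t)

/-- Hypothetical helper (name chosen for this sketch): if `F a < p < F b`,
then any lower `p`-quantile `x` of `F` lies in `[a, b]`. -/
theorem IsLowerQuantile.mem_Icc {F : ℝ → ℝ} {p x a b : ℝ}
    (hx : IsLowerQuantile F p x) (ha : F a < p) (hb : p < F b) :
    x ∈ Set.Icc a b := by
  refine ⟨?_, ?_⟩
  · -- If `x < a`, the right half of `hx` gives `p ≤ F a`, contradicting `ha`.
    by_contra h
    exact absurd (hx.2 a (not_le.mp h)) (not_le.mpr ha)
  · -- If `b < x`, the left half of `hx` gives `F b < p`, contradicting `hb`.
    by_contra h
    exact absurd (hx.1 b (not_le.mp h)) (not_lt.mpr hb.le)
```

With a = q - eps and b = q + eps this is the pinning step. The probabilistic
content lies entirely in obtaining F a < p < F b eventually, almost surely,
which is what empCDF_eventually_straddle supplies.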

NN/Examples/Factorization/Discovery.lean: matching #eval block -- lower p-quantile
reaches level p, monotone in p, in [0,1], median converges (<= 0.02 for N >= 3);
negative controls for non-vacuity and straddle-hypothesis sensitivity.

Docs: aggregate Factorization.lean bullet, Ch4 Verso blueprint, and CHDTorch plan
updated to mark the four-tier asymptotic-calibration plan complete, leaving only
the uniform DKW-Massart constant, the zLowFn/zHighFn triangular-array wiring, and
the exchangeability rank rate as research-grade.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 NN/Examples/Factorization.lean                |  11 ++
 NN/Examples/Factorization/Discovery.lean      |  74 +++++++++++
 .../Basic/FactorizationsZAsymptotic.lean      | 116 +++++++++++++++++-
 .../Ch4_Verification/Factorizations.lean      |  55 ++++++---
 4 files changed, 237 insertions(+), 19 deletions(-)

diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean
index b4eb302..018b6c9 100644
--- a/NN/Examples/Factorization.lean
+++ b/NN/Examples/Factorization.lean
@@ -135,6 +135,17 @@ factorization misbehaves.
   full-sample estimate uniformly over thresholds, with a **negative control** that the tiniest
   prefixes (`N = 1, 2`) deviate by `0.5 > ε`, the honest weak-`N` regime where the `2·exp(−2Nε²)`
   bound is still near `2`.
+  A final **quantile-transfer** sub-block corroborates `empQuantile_tendsto` (step (d)): inverting the
+  CDF convergence into convergence of the empirical *percentiles* the chooser thresholds against.
+  Wherever the true CDF strictly straddles a level `p` at its quantile `q` (`StraddlesQuantile`), any
+  lower empirical `p`-quantile (`IsLowerQuantile`) converges almost surely to `q`. The limit is
+  noncomputable, so the `#eval`s use the **full-sample** quantile `q̂₂₀` as the stand-in for `q` and
+  the prefix quantile `q̂_N` as its shadow: the lower `p`-quantile reaches level `p` (`p ≤ F̂₂₀(q̂₂₀)`),
+  is monotone in `p` and lands in `[0,1]`, and the empirical median converges (`|q̂_N − q̂₂₀| ≤ 0.02`
+  for every prefix of `≥ 3` draws). Two **negative controls** keep it honest: the prefix median
+  genuinely moves with `N` (non-vacuous limit), and the convergence is hypothesis-sensitive — the
+  `5%`-tail quantile (flatter CDF, sparser straddle) deviates more at `N = 10` than the well-straddled
+  median, the empirical signature of `StraddlesQuantile` being a needed hypothesis.
 
 Both **positive** checks (a valid factorization reconstructs to `err ≈ 0`) and **negative controls**
 (the same metric reports a large error / `NaN` when a hypothesis is violated) are included, so a
diff --git a/NN/Examples/Factorization/Discovery.lean b/NN/Examples/Factorization/Discovery.lean
index 983747d..9dcd1e7 100644
--- a/NN/Examples/Factorization/Discovery.lean
+++ b/NN/Examples/Factorization/Discovery.lean
@@ -467,4 +467,78 @@ def maxDev (N : Nat) : Float :=
 #eval assertTrue "concentration needs N to grow: N = 1, 2 prefixes deviate by > ε = 0.3"
   ([1, 2].all (fun N => Spec.ltBool 0.3 (maxDev N)))
 
+/-! ### Quantile transfer (step d): consistency of the empirical percentiles
+
+`empQuantile_tendsto` inverts steps (b)–(c): wherever the true null CDF is continuous and strictly
+increasing through a level `p` at the quantile `q` (`StraddlesQuantile`), *any* lower empirical
+`p`-quantile (`IsLowerQuantile`: the CDF is `< p` to its left and `≥ p` to its right) converges almost
+surely to `q` as `N → ∞`. This is the honest consistency statement for the 5%/95% percentile
+thresholds the `Z_test` chooser uses. The limit `q` is noncomputable (a quantile of the law
+`noiseLaw`), so — exactly as for steps (b)/(c) — we exercise it through the **full-sample** quantile
+`q̂₂₀` standing in for `q`, and watch the prefix-`N` empirical quantile `q̂_N` settle toward it. -/
+
+/-- The first `N ≤ 20` null noises, as the candidate set for the prefix empirical quantile. -/
+def prefixNoises (N : Nat) : List Float :=
+  ((List.finRange 20).filter (fun j => decide (j.val < N))).map zNullNoises
+
+/-- The **lower empirical `p`-quantile** of the first `N` null draws: the smallest sampled noise `v`
+whose running empirical CDF `F̂_N(v)` reaches `p` (`min`-fold over the qualifying draws, falling back
+to `1`). This is the computable shadow of `IsLowerQuantile (empCDF … N) p` — `inf {t | p ≤ F̂_N t}`
+for the right-continuous step CDF — the object `empQuantile_tendsto` drives to the true quantile. -/
+def empQuantilePrefix (N : Nat) (p : Float) : Float :=
+  ((prefixNoises N).filter (fun v => Spec.leBool p (empCdfPrefix N v))).foldl min 1.0
+
+/-- The full-sample (`N = 20`) lower `p`-quantile — the computable stand-in for the true quantile `q`
+of `noiseLaw` that `empQuantile_tendsto` sends the prefix quantiles to. -/
+def empQuantile20 (p : Float) : Float := empQuantilePrefix 20 p
+
+/-- Deviation of the prefix-`N` lower `p`-quantile from the full-sample limit stand-in `q̂₂₀` — the
+computable proxy for `|q̂_N − q|` that `empQuantile_tendsto` drives to `0`. -/
+def quantileDev (N : Nat) (p : Float) : Float :=
+  Float.abs (empQuantilePrefix N p - empQuantile20 p)
+
+#eval IO.println s!"empirical median (p = 0.5) over growing prefixes: q̂_3 = {empQuantilePrefix 3 0.5}, \
+  q̂_5 = {empQuantilePrefix 5 0.5}, q̂_10 = {empQuantilePrefix 10 0.5}, q̂_15 = {empQuantilePrefix 15 0.5}, \
+  q̂_20 = {empQuantile20 0.5} (the limit stand-in q)"
+
+#eval IO.println s!"quantile triple at full sample (q̂₂₀): 5% = {empQuantile20 0.05}, 50% = {empQuantile20 0.5}, \
+  95% = {empQuantile20 0.95}; median dev at N=10 {quantileDev 10 0.5} vs 5%-tail dev at N=10 {quantileDev 10 0.05}"
+
+-- Positive — `IsLowerQuantile` right-property at the full sample: `p ≤ F̂₂₀(q̂₂₀)`. The lower
+-- `p`-quantile genuinely reaches level `p` (here with equality, `p ∈ {0.05, 0.5, 0.95}` being multiples
+-- of `1/20`) — the half of `IsLowerQuantile` feeding `empQuantile_tendsto`.
+#eval assertTrue "lower p-quantile reaches level p: p ≤ F̂₂₀(q̂₂₀) for p ∈ {0.05, 0.5, 0.95}"
+  ([0.05, 0.5, 0.95].all (fun p => Spec.leBool p (empCdf (empQuantile20 p))))
+
+-- Positive — the empirical quantile is monotone in the level `p` (order statistics are nondecreasing):
+-- `q̂₂₀(0.05) ≤ q̂₂₀(0.5) ≤ q̂₂₀(0.95)`, the quantile-function shadow of `monotone_cdf` inverted.
+#eval assertTrue "empirical quantile is monotone in p: q̂₂₀(5%) ≤ q̂₂₀(50%) ≤ q̂₂₀(95%)"
+  (Spec.leBool (empQuantile20 0.05) (empQuantile20 0.5)
+    && Spec.leBool (empQuantile20 0.5) (empQuantile20 0.95))
+
+-- Positive — every empirical quantile is a fraction in `[0,1]` (the percentiles live in the null
+-- support, `zLowFn_nonneg`/`_le_one` and friends).
+#eval assertTrue "empirical quantiles lie in [0,1] for p ∈ {0.05, 0.5, 0.95}"
+  ([0.05, 0.5, 0.95].all (fun p =>
+    Spec.leBool 0.0 (empQuantile20 p) && Spec.leBool (empQuantile20 p) 1.0))
+
+-- Positive — quantile transfer (consistency): the prefix-`N` empirical median settles toward the
+-- full-sample limit, within `0.02` for every prefix of `≥ 3` draws — the computable shadow of
+-- `empQuantile_tendsto` (almost-sure `q̂_N → q` at the strictly-straddled median).
+#eval assertTrue "empirical median converges: |q̂_N − q̂₂₀| ≤ 0.02 for every prefix of ≥ 3 draws"
+  ([3, 5, 10, 15, 20].all (fun N => Spec.leBool (quantileDev N 0.5) 0.02))
+
+-- Negative control — consistency is non-vacuous: the prefix median genuinely *moves* with `N` (some
+-- prefix differs from the full-sample limit), so `q̂_N → q̂₂₀` is a real limit being approached, not a
+-- value already constant at `N = 3`.
+#eval assertTrue "convergence is non-vacuous: some prefix median differs from the full-sample limit"
+  ([3, 5, 10, 15].any (fun N => !(empQuantilePrefix N 0.5 == empQuantile20 0.5)))
+
+-- Negative control — the convergence is hypothesis-sensitive: the lower `5%`-tail quantile (where the
+-- CDF is flatter and the straddle sparser, fewer draws to pin it) deviates *more* at `N = 10` than the
+-- median does — the empirical signature of `StraddlesQuantile` being a genuine, needed hypothesis, not
+-- automatic. A flat CDF region (no strict straddle) would defeat consistency entirely.
+#eval assertTrue "hypothesis-sensitive: the 5%-tail quantile deviates more at N=10 than the well-straddled median"
+  (Spec.ltBool (quantileDev 10 0.5) (quantileDev 10 0.05))
+
 end NN.Examples.Factorization.Discovery
diff --git a/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean b/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
index 23ad95d..52ccb51 100644
--- a/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
+++ b/NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean
@@ -15,7 +15,7 @@ public import Mathlib.MeasureTheory.Integral.Bochner.Set
 public import Mathlib.Probability.Moments.SubGaussian
 
 /-!
-# CHD `Z_test`: asymptotic calibration — i.i.d. scaffold, empirical-CDF consistency and pointwise concentration (steps a–c)
+# CHD `Z_test`: asymptotic calibration — i.i.d. scaffold, empirical-CDF consistency, pointwise concentration and quantile transfer (steps a–d)
 
 [`FactorizationsZTest`](./FactorizationsZTest.lean) modelled a *single* `Z_test` null draw as
 `nullGaussian n` (the product of `n` standard normals on `Fin n → ℝ`) and proved the per-draw
@@ -70,6 +70,18 @@ DKW inequality *at a single point* `t`, with the sharp Hoeffding exponent. This
 companion of step (b)'s almost-sure limit. The *uniform-over-`t`* DKW–Massart bound with the global
 constant `2` (the genuine Dvoretzky–Kiefer–Wolfowitz theorem) is the research-grade strengthening
 still flagged out of scope, and the quantile-transfer step (d) remains.
+
+**Step (d) — quantile transfer (consistency of the empirical percentiles)** inverts steps (b)–(c):
+it carries CDF convergence over to convergence of the empirical *quantiles* — the 5%/95% percentiles
+the `Z_test` chooser thresholds against. Under the honest hypothesis that the true CDF is continuous
+and strictly increasing through the target level `p` at the quantile `q` (`StraddlesQuantile`), the
+classical sandwich — pointwise consistency (step (b)) at the two straddle points `q ∓ ε`
+(`empCDF_eventually_straddle`) pinning any lower empirical `p`-quantile (`IsLowerQuantile`) into
+`[q − ε, q + ε]`, intersected over `ε = 1/(m+1)` via `ae_all_iff` — gives `empQuantile_tendsto`:
+almost surely `empQ N → q` as `N → ∞`. This is stated for a generic lower empirical `p`-quantile; the
+concrete `zLowFn`/`zHighFn` order statistics instantiate it through the order-statistic count lemmas
+with the moving level `p_N = (⌊N/20⌋ + 1)/N → 1/20`, the remaining concrete (triangular-array) bridge.
+The *uniform* DKW–Massart sharp constant and the exchangeability rank rate stay research-grade.
 -/
 
 @[expose] public section
@@ -411,6 +423,108 @@ theorem empCDF_concentration (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ)
   have h2 := empCDF_lower_tail Λ V γ t hN hε
   linarith
 
+/-! ## Step (d): quantile transfer — consistency of the empirical percentiles
+
+Steps (b)–(c) control the empirical CDF at a *fixed* threshold. Step (d) *inverts* that: it transfers
+the convergence of `empCDF` to the convergence of the empirical *quantiles* — the 5%/95% percentiles
+`Z_low`/`Z_high` the `Z_test` chooser actually thresholds against. The honest hypothesis under which
+this works is that the true CDF is continuous and strictly increasing through the target level `p` at
+the quantile `q`, captured by `StraddlesQuantile`: the CDF sits strictly below `p` to the left of `q`
+and strictly above `p` to the right.
+
+The argument is the classical sandwich. For any tolerance `ε > 0`, the straddle gives
+`cdf (q − ε) < p < cdf (q + ε)`. Pointwise consistency (step (b)) at the two points `q ∓ ε` then says
+that almost surely, eventually `empCDF (q − ε) < p < empCDF (q + ε)`. Any *lower empirical
+`p`-quantile* `empQ` (CDF strictly below `p` to its left, at least `p` to its right —
+`IsLowerQuantile`) is therefore pinned into `[q − ε, q + ε]` once the sandwich holds. Letting `ε` run
+over `1/(m+1)` and intersecting the countably many almost-sure events (`ae_all_iff`) yields, almost
+surely, `empQ N → q` as `N → ∞`: **consistency of the empirical quantile**.
+
+This is stated for a *generic* lower empirical `p`-quantile `empQ`. The concrete percentile order
+statistics `zLowFn`/`zHighFn` are meant to instantiate it through the order-statistic count lemmas
+(`kthSmallestFn_strictBelow_count_le` / `kthSmallestFn_strictAbove_count_le`), at the index-driven
+level `p_N = (⌊N/20⌋ + 1)/N → 1/20`. That triangular-array (moving-level) refinement is the
+remaining concrete bridge; the *uniform* DKW–Massart sharp constant and the exchangeability rank
+rate stay research-grade and out of scope (flagged, never `sorry`'d). -/
+
+/-- **The empirical CDF is monotone in the threshold.** Raising `t` only enlarges `Iic t`, so each
+threshold indicator (hence their normalized sum) is nondecreasing — the empirical CDF behaves like a
+genuine distribution function in its argument. -/
+theorem empCDF_mono (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) (N : ℕ) (ω : ℕ → Fin n → ℝ) :
+    Monotone (fun t => empCDF Λ V γ N t ω) := by
+  intro t t' htt'
+  have hsum : ∑ i ∈ Finset.range N, nullBelow Λ V γ t i ω
+      ≤ ∑ i ∈ Finset.range N, nullBelow Λ V γ t' i ω :=
+    Finset.sum_le_sum fun i _ =>
+      Set.indicator_le_indicator_of_subset (Set.Iic_subset_Iic.mpr htt')
+        (fun _ => zero_le_one) (nullNoise Λ V γ i ω)
+  simp only [empCDF]
+  exact div_le_div_of_nonneg_right hsum (by positivity)
+
+/-- **Population `p`-quantile (continuous, strictly-increasing-through-`p` sense).** `q` straddles
+level `p` for the CDF `F` when `F` sits strictly below `p` just left of `q` and strictly above just
+right. This holds whenever `F` is continuous and strictly monotone at `q` with `F q = p` — the honest
+hypothesis the empirical quantile is consistent under. -/
+def StraddlesQuantile (F : ℝ → ℝ) (p q : ℝ) : Prop :=
+  ∀ ε : ℝ, 0 < ε → F (q - ε) < p ∧ p < F (q + ε)
+
+/-- **Lower empirical `p`-quantile.** `q` is a lower `p`-quantile of the distribution function `F`
+when `F` is strictly below `p` to the left of `q` and at least `p` to the right — exactly
+`inf {t | p ≤ F t}` for a right-continuous step CDF. The defining property the order-statistic
+percentiles satisfy. -/
+def IsLowerQuantile (F : ℝ → ℝ) (p q : ℝ) : Prop :=
+  (∀ t, t < q → F t < p) ∧ (∀ t, q < t → p ≤ F t)
+
+/-- **Quantile sandwich (the transfer engine).** If the true null CDF straddles level `p` strictly
+across `t₁ < t₂` (`cdf t₁ < p < cdf t₂`), then — by pointwise consistency (step (b)) at the two
+points — almost surely the empirical CDF eventually straddles `p` the same way:
+`empCDF N t₁ < p < empCDF N t₂` for all large `N`. -/
+theorem empCDF_eventually_straddle (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) {p t₁ t₂ : ℝ}
+    (h1 : cdf (noiseLaw Λ V γ) t₁ < p) (h2 : p < cdf (noiseLaw Λ V γ) t₂) :
+    ∀ᵐ ω ∂(nullSeqGaussian n), ∀ᶠ N in Filter.atTop,
+      empCDF Λ V γ N t₁ ω < p ∧ p < empCDF Λ V γ N t₂ ω := by
+  filter_upwards [empCDF_tendsto_cdf Λ V γ t₁, empCDF_tendsto_cdf Λ V γ t₂] with ω hω1 hω2
+  filter_upwards [hω1.eventually_lt_const h1, hω2.eventually_const_lt h2] with N hN1 hN2
+  exact ⟨hN1, hN2⟩
+
+/-- **Consistency of the empirical quantile (quantile transfer, step (d)).** Fix a target level `p`
+and a population quantile `q` straddled by the true null CDF. Then for *any* lower empirical
+`p`-quantile `empQ` of the empirical CDF (e.g. the percentile order statistics), almost surely
+`empQ N → q` as the number of null draws `N → ∞`. This is the honest consistency statement for the
+5%/95% thresholds the `Z_test` chooser uses: it inverts the pointwise CDF convergence of step (b)
+into quantile convergence wherever the CDF is continuous and strictly monotone at the quantile. -/
+theorem empQuantile_tendsto (Λ : Fin n → ℝ) (V : Fin n → Fin n → ℝ) (γ : ℝ) {p q : ℝ}
+    {empQ : ℕ → (ℕ → Fin n → ℝ) → ℝ}
+    (hstr : StraddlesQuantile (cdf (noiseLaw Λ V γ)) p q)
+    (hq : ∀ N ω, IsLowerQuantile (fun t => empCDF Λ V γ N t ω) p (empQ N ω)) :
+    ∀ᵐ ω ∂(nullSeqGaussian n),
+      Filter.Tendsto (fun N => empQ N ω) Filter.atTop (nhds q) := by
+  have key : ∀ m : ℕ, ∀ᵐ ω ∂(nullSeqGaussian n),
+      ∀ᶠ N in Filter.atTop, |empQ N ω - q| ≤ 1 / (m + 1 : ℝ) := by
+    intro m
+    have hε : (0 : ℝ) < 1 / (m + 1 : ℝ) := by positivity
+    obtain ⟨hlt, hgt⟩ := hstr _ hε
+    filter_upwards [empCDF_eventually_straddle Λ V γ hlt hgt] with ω hω
+    filter_upwards [hω] with N hN
+    obtain ⟨hN1, hN2⟩ := hN
+    obtain ⟨hqL, hqR⟩ := hq N ω
+    have hub : empQ N ω ≤ q + 1 / (m + 1 : ℝ) := by
+      by_contra hc
+      exact absurd (hqL _ (not_le.mp hc)) (not_lt.mpr hN2.le)
+    have hlb : q - 1 / (m + 1 : ℝ) ≤ empQ N ω := by
+      by_contra hc
+      exact absurd (hqR _ (not_le.mp hc)) (not_le.mpr hN1)
+    rw [abs_le]
+    constructor <;> linarith
+  filter_upwards [ae_all_iff.mpr key] with ω hω
+  rw [Metric.tendsto_atTop]
+  intro δ hδ
+  obtain ⟨m, hm⟩ := exists_nat_one_div_lt hδ
+  obtain ⟨N₀, hN₀⟩ := Filter.eventually_atTop.mp (hω m)
+  refine ⟨N₀, fun N hN => ?_⟩
+  rw [Real.dist_eq]
+  exact lt_of_le_of_lt (hN₀ N hN) hm
+
 end
 
 end Spec.Factorization
diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
index d24a9f5..d2bfa1a 100644
--- a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
+++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/Factorizations.lean
@@ -441,8 +441,9 @@ standard-Gaussian draw, so every `nullNoise i` has the *same* law `noiseLaw` (`n
 (`integrable_nullNoise`). That is exactly the i.i.d.-bounded-integrable triple — `hint`, `hindep`,
 `hident` — that the strong law of large numbers (`strong_law_ae_real`) and the Hoeffding tail consume.
 This scaffold is the only genuinely new measure-theory plumbing; the empirical-CDF consistency
-(Glivenko–Cantelli via the SLLN) and the per-`t` concentration rate `2 exp(-2 N ε²)` (Hoeffding) are
-applications of it — both now proved.
+(Glivenko–Cantelli via the SLLN), the per-`t` concentration rate `2 exp(-2 N ε²)` (Hoeffding), and the
+quantile transfer (consistency of the empirical 5%/95% percentiles) are applications of it — all three
+now proved.
 
 *Pointwise consistency of the empirical CDF (step b).* The first such application is now proved,
 `empCDF_tendsto_cdf`. Fix a threshold `t`. The threshold indicators
@@ -476,16 +477,31 @@ DKW inequality at one point with the sharp Hoeffding exponent — sorry-free ove
 (twice the one-sided tail, decreasing in `N` and `ε`, non-vacuous once `2 N ε² > ln 2`) and the
 observed prefix deviation it governs.
 
-*What is honestly left.* With the pointwise pair (b)–(c) proved, what stays genuinely research-grade
-is the *uniform* Glivenko–Cantelli (`sup_t |F̂_N - cdf| → 0`) and the full *DKW–Massart* inequality
-with its sharp constant `2` over the supremum — both need the bracketing / VC-class chaining Mathlib
-v4.30.0 lacks — together with the *quantile-transfer* step (d) (converting CDF concentration into
-convergence of the empirical 5%/95% percentiles to the true quantiles), and the *exchangeability rank
-rate* `k/(N+1)` for a fresh null draw, which needs a symmetric-group rank-distribution argument also
-absent. Those are stated as the open frontier, never stubbed with `sorry`. The finite-sample
-false-positive *bound* above is the exact, non-asymptotic statement the test actually guarantees, and
-the pointwise consistency-plus-concentration pair is the sorry-free bridge toward the asymptotic
-statement.
+*Quantile transfer (step d).* Steps (b)–(c) control the empirical CDF at a fixed threshold;
+`empQuantile_tendsto` *inverts* that into convergence of the empirical *percentiles* the `Z_test`
+chooser thresholds against. The honest hypothesis is `StraddlesQuantile`: the true CDF sits strictly
+below the level `p` just left of the quantile `q` and strictly above just right, as under continuity
+plus strict monotonicity through `p` at `q`. The argument is the classical sandwich: for any tolerance
+`ε`, the straddle gives `cdf (q - ε) < p < cdf (q + ε)`, and pointwise consistency (step b) at the two
+points `q ∓ ε` makes the empirical CDF eventually straddle `p` the same way
+(`empCDF_eventually_straddle`), which pins any lower empirical `p`-quantile (`IsLowerQuantile`; the
+step CDF is monotone by `empCDF_mono`) into `[q - ε, q + ε]`. Intersecting the countably many
+almost-sure events over `ε = 1/(m+1)` (`ae_all_iff`) yields, almost surely, `empQ N → q` as `N → ∞` —
+consistency of the empirical quantile, sorry-free over Mathlib v4.30.0. It is stated for a generic
+lower empirical `p`-quantile; the `Discovery` examples corroborate it via the full-sample quantile as
+the limit stand-in (the empirical median converges within `0.02` for prefixes of `≥ 3` draws, the
+`5%`-tail quantile visibly slower — the empirical signature of the straddle hypothesis mattering).
+
+*What is honestly left.* With the pointwise pair (b)–(c) and the quantile transfer (d) proved, what
+stays genuinely research-grade is the *uniform* Glivenko–Cantelli (`sup_t |F̂_N - cdf| → 0`) and the
+full *DKW–Massart* inequality with its sharp constant `2` over the supremum — both need the bracketing
+/ VC-class chaining Mathlib v4.30.0 lacks — together with the concrete *triangular-array* bridge
+wiring the order-statistic percentiles `zLowFn`/`zHighFn` into `empQuantile_tendsto` at the moving
+level `p_N = (⌊N/20⌋ + 1)/N → 1/20`, and the *exchangeability rank rate* `k/(N+1)` for a fresh null
+draw, which needs a symmetric-group rank-distribution argument also absent. Those are stated as the
+open frontier, never stubbed with `sorry`. The finite-sample false-positive *bound* above is the
+exact, non-asymptotic statement the test actually guarantees, and the consistency-concentration-
+quantile chain (b)–(d) is the sorry-free bridge toward the asymptotic statement.
 
 # The a-posteriori residual certificate
 
@@ -700,14 +716,17 @@ identically-`noiseLaw`-distributed, `[0,1]`-valued, integrable sequence under
 `Measure.infinitePi nullGaussian`), `empCDF_tendsto_cdf` applies the strong law of large numbers to
 the bounded indicators `1{noiseᵢ ≤ t}` — whose mean is exactly `cdf noiseLaw t`
 (`integral_nullBelow_zero`) — to give almost-sure convergence `F̂_N(t) → cdf noiseLaw t` for every
-fixed `t`, the *pointwise* Glivenko–Cantelli theorem, sorry-free; and its finite-sample companion
+fixed `t`, the *pointwise* Glivenko–Cantelli theorem, sorry-free; its finite-sample companion
 `empCDF_concentration` adds the per-`t` rate `ℙ(|F̂_N(t) - cdf noiseLaw t| ≥ ε) ≤ 2 exp(-2 N ε²)`,
 the DKW inequality at one point, from Hoeffding's lemma on the `[0,1]`-bounded indicators
-(`nullBelow_subgaussian`) and Mathlib's sub-Gaussian sum bound. What stays genuinely research-grade
-is the *uniform* Glivenko–Cantelli / DKW–Massart sharp
-constant over the supremum (bracketing / VC chaining), the *quantile-transfer* step that turns this
-CDF concentration into convergence of the empirical 5%/95% percentiles, and the exchangeability rank
-rate `k/(N+1)`
+(`nullBelow_subgaussian`) and Mathlib's sub-Gaussian sum bound; and `empQuantile_tendsto` *inverts*
+the former into the quantile statement: wherever the true CDF strictly straddles a level `p` at its
+quantile `q` (`StraddlesQuantile`), the sandwich at `q ∓ ε` (`empCDF_eventually_straddle`) drives any
+lower empirical `p`-quantile to `q` almost surely, the honest consistency of the 5%/95% percentiles.
+What stays genuinely research-grade is the *uniform* Glivenko–Cantelli / DKW–Massart sharp
+constant over the supremum (bracketing / VC chaining), the concrete *triangular-array* bridge wiring
+the `zLowFn`/`zHighFn` order statistics into `empQuantile_tendsto` at the moving level
+`p_N = (⌊N/20⌋ + 1)/N → 1/20`, and the exchangeability rank rate `k/(N+1)`
 (symmetric-group rank distribution) — all absent from `Mathlib.Probability` v4.30.0. One open item is
 a proof-only gap on a quantity CHD does not need to *run*; the other is the genuine statistical
 frontier, flagged rather than stubbed with `sorry`.