diff --git a/NN/Examples/Factorization.lean b/NN/Examples/Factorization.lean new file mode 100644 index 0000000..261dce8 --- /dev/null +++ b/NN/Examples/Factorization.lean @@ -0,0 +1,19 @@ +/- +Copyright (c) 2026 TorchLean +Released under MIT license as described in the file LICENSE. +Authors: TorchLean Team +-/ + +module + +public import NN.Examples.Factorization.Common +public import NN.Examples.Factorization.Cholesky +public import NN.Examples.Factorization.QR + +/-! +# Matrix-factorization examples (Cholesky and QR) + +Executable `#eval` witnesses for the exact finite factorizations: Cholesky `A = L·Lᵀ` and QR `A = Q·R` +(with `Qᵀ·Q = I`). Each pairs a positive reconstruction/orthonormality check with a negative control, +over `Float`, sorry/admit-free. +-/ diff --git a/NN/Examples/Factorization/Cholesky.lean b/NN/Examples/Factorization/Cholesky.lean new file mode 100644 index 0000000..8c32676 --- /dev/null +++ b/NN/Examples/Factorization/Cholesky.lean @@ -0,0 +1,65 @@ +/- +Copyright (c) 2026 TorchLean +Released under MIT license as described in the file LICENSE. +Authors: TorchLean Team +-/ + +module + +public import NN.Examples.Factorization.Common +meta import NN.Examples.Factorization.Common + +/-! +# Example: Cholesky factorization + +`choleskySpec A` returns the lower-triangular `L` with `A = L · Lᵀ` for a symmetric +positive-definite `A`. Here we factor a 3×3 SPD matrix and check the reconstruction error. +-/ + +@[expose] public section + + +namespace NN.Examples.Factorization.Cholesky + +/-- A symmetric positive-definite test matrix. -/ +def A : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := + mkMat [[4, 2, 2], + [2, 5, 3], + [2, 3, 6]] + +/-- The Cholesky factor `L` (lower-triangular). -/ +def L : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.choleskySpec A + +/-- Reconstruction error `‖A - L·Lᵀ‖_max`. -/ +def reconErr : Float := maxMatErr A (mm L (tr L)) + +-- Inspect the diagonal of the factor. +#guard_msgs (drop info) in +#eval vecToList (Spec.ofVecFn (fun i : Fin 3 => Spec.get2 L i i)) + +-- Compiled assertion: the factorization reconstructs A (fails the build otherwise). +#guard_msgs (drop info) in +#eval assertLt "Cholesky A = L·Lᵀ" reconErr + +/-! ## Negative control: the positive-pivot hypothesis is necessary + +`isCholesky_of_pos` requires the executable pivots `L[j,j]` to be positive (`0 < choleskyFn A j j`), +which is exactly the success condition over the reals (SPD is the expected — but here unformalized — +sufficient condition for it). The matrix below is symmetric but *not* positive-definite (eigenvalues +`3` and `-1`), so a pivot is non-positive, the diagonal step takes `√(negative)`, and the reconstruction +is `NaN` — never a small error. This documents that the hypothesis genuinely bites. -/ + +/-- A symmetric but **indefinite** matrix (eigenvalues `{3, -1}`), outside Cholesky's domain. -/ +def Abad : Spec.Tensor Float (.dim 2 (.dim 2 .scalar)) := + mkMat [[1, 2], + [2, 1]] + +def Lbad : Spec.Tensor Float (.dim 2 (.dim 2 .scalar)) := Spec.choleskySpec Abad +-- Use the *summed* Frobenius error here, not `maxMatErr`: IEEE `max` ignores `NaN`, whereas the sum +-- propagates the `NaN` produced by `√(negative)`, faithfully reporting that no factor exists. +def reconErrBad : Float := frobSqErr Abad (mm Lbad (tr Lbad)) + +#guard_msgs (drop info) in +#eval assertReconFails "Cholesky on indefinite A correctly fails (no SPD ⇒ no factor)" reconErrBad + +end NN.Examples.Factorization.Cholesky diff --git a/NN/Examples/Factorization/Common.lean b/NN/Examples/Factorization/Common.lean new file mode 100644 index 0000000..dae8aeb --- /dev/null +++ b/NN/Examples/Factorization/Common.lean @@ -0,0 +1,96 @@ +/- +Copyright (c) 2026 TorchLean +Released under MIT license as described in the file LICENSE. +Authors: TorchLean Team +-/ + +module + +public import NN.Spec.Core.Tensor.Factorizations + +/-! +# Factorization examples — shared helpers + +Small `Float`-valued helpers used by the matrix-factorization examples (`Cholesky`, `QR`). These +examples are *executable sanity checks*: each one reconstructs the original matrix from its factors and +asserts (via `#eval`) that the maximum entrywise reconstruction error is below a tolerance, so the build +fails if a factorization is wrong. + +These run over `Float` (the executable 64-bit runtime scalar), which is the precision the +factorizations target for Gaussian-process / kernel-method use. +-/ + +@[expose] public section + + +namespace NN.Examples.Factorization + +/-- Build an `m × n` `Float` matrix tensor from a row-major nested list. Missing entries are `0`. -/ +def mkMat {m n : Nat} (rows : List (List Float)) : Spec.Tensor Float (.dim m (.dim n .scalar)) := + Spec.ofMatFn (fun i j => (rows.getD i.val []).getD j.val 0.0) + +/-- Maximum entrywise absolute difference between two `m × n` matrices. -/ +def maxMatErr {m n : Nat} (A B : Spec.Tensor Float (.dim m (.dim n .scalar))) : Float := + (List.finRange m).foldl (fun acc i => + (List.finRange n).foldl + (fun a j => max a (Float.abs (Spec.get2 A i j - Spec.get2 B i j))) acc) 0.0 + +/-- Matrix product `A · B` (thin wrapper over `matMulSpec`). -/ +def mm {m n p : Nat} (A : Spec.Tensor Float (.dim m (.dim n .scalar))) + (B : Spec.Tensor Float (.dim n (.dim p .scalar))) : Spec.Tensor Float (.dim m (.dim p .scalar)) := + Spec.matMulSpec A B + +/-- Matrix transpose. -/ +def tr {m n : Nat} (A : Spec.Tensor Float (.dim m (.dim n .scalar))) : + Spec.Tensor Float (.dim n (.dim m .scalar)) := + Spec.Tensor.matrixTransposeSpec A + +/-- Read a vector tensor back out as a `List Float` (for display). -/ +def vecToList {n : Nat} (v : Spec.Tensor Float (.dim n .scalar)) : List Float := + (List.finRange n).map (fun i => Spec.Tensor.toScalar (Spec.get v i)) + +/-- Squared Frobenius distance `Σ_{i,j} (A_ij - B_ij)²` between two `m × n` matrices. -/ +def frobSqErr {m n : Nat} (A B : Spec.Tensor Float (.dim m (.dim n .scalar))) : Float := + (List.finRange m).foldl (fun acc i => + (List.finRange n).foldl + (fun a j => let d := Spec.get2 A i j - Spec.get2 B i j; a + d * d) acc) 0.0 + +/-- Shared tolerance for reconstruction-error assertions. -/ +def tol : Float := 1e-6 + +/-- +Compiled **positive** assertion: print `name: OK (err)` when `err < tol`, otherwise raise an +`IO` error so the build/`#eval` fails. Running this through `#eval` evaluates with the compiler +(fast), unlike `#guard`, which forces slow kernel reduction of the whole factorization. +-/ +def assertLt (name : String) (err : Float) (tolerance : Float := tol) : IO Unit := + if err < tolerance then + IO.println s!"{name}: OK (err = {err})" + else + throw (IO.userError s!"{name}: FAIL (err = {err} ≥ tol = {tolerance})") + +/-- +Compiled **negative-control** assertion: succeeds only when `err ≥ threshold`, i.e. when a property +that *should not* hold is correctly detected as violated. Gives the metric teeth — a reviewer can see +the same `maxMatErr`/residual that reports `0` on a valid factorization reports a large value on an +invalid one, so the positive checks are not vacuous. +-/ +def assertGe (name : String) (err : Float) (threshold : Float := 0.5) : IO Unit := + if err ≥ threshold then + IO.println s!"{name}: OK (correctly rejected, err = {err} ≥ {threshold})" + else + throw (IO.userError s!"{name}: FAIL (err = {err} < {threshold}; expected the property to fail)") + +/-- +Compiled **negative-control** assertion that a reconstruction *fails*: succeeds when the error is not +below `tol` — including the `NaN` produced when a hypothesis is violated (e.g. Cholesky of a +non-positive-definite matrix takes `√(negative)`). Documents that the success hypotheses (positive +Cholesky pivots, positive `R` pivots / full column rank) are genuinely necessary. +-/ +def assertReconFails (name : String) (err : Float) (tolerance : Float := tol) : IO Unit := + if err < tolerance then + throw (IO.userError s!"{name}: FAIL (unexpectedly reconstructed, err = {err} < {tolerance})") + else + IO.println s!"{name}: OK (correctly failed, err = {err})" + +end NN.Examples.Factorization diff --git a/NN/Examples/Factorization/QR.lean b/NN/Examples/Factorization/QR.lean new file mode 100644 index 0000000..a762f1e --- /dev/null +++ b/NN/Examples/Factorization/QR.lean @@ -0,0 +1,73 @@ +/- +Copyright (c) 2026 TorchLean +Released under MIT license as described in the file LICENSE. +Authors: TorchLean Team +-/ + +module + +public import NN.Examples.Factorization.Common +meta import NN.Examples.Factorization.Common + +/-! +# Example: QR factorization + +`qrSpec A` returns `(Q, R)` with `A = Q · R`, `Q` having orthonormal columns and `R` +upper-triangular (classical Gram–Schmidt). We check both `A = Q·R` and `Qᵀ·Q = I`. +-/ + +@[expose] public section + + +namespace NN.Examples.Factorization.QR + +/-- A 3×3 test matrix (the classic Householder/QR example). -/ +def A : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := + mkMat [[12, -51, 4], + [6, 167, -68], + [-4, 24, -41]] + +/-- Orthonormal `Q` factor. -/ +def Q : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.qrQSpec A +/-- Upper-triangular `R` factor. -/ +def R : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.qrRSpec A + +/-- Reconstruction error `‖A - Q·R‖_max`. -/ +def reconErr : Float := maxMatErr A (mm Q R) +/-- Orthonormality error `‖Qᵀ·Q - I‖_max`. -/ +def orthoErr : Float := maxMatErr (mm (tr Q) Q) (Spec.identityTensorSpec 3) + +-- Compiled assertions (fail the build otherwise). +#guard_msgs (drop info) in +#eval assertLt "QR A = Q·R" reconErr +#guard_msgs (drop info) in +#eval assertLt "QR Qᵀ·Q = I" orthoErr + +/-! ## Negative control: full column rank is necessary for orthonormality + +`qrSpec_orthonormal` (`Qᵀ Q = 1`) requires full column rank — positive `R`-pivots +(`0 < R[j,j]`). The matrix below has a dependent column (`col₂ = 2·col₁`), so Gram–Schmidt produces a +**zero** `Q` column where the pivot vanishes: `A = Q·R` still holds, but `Qᵀ Q` has a `0` on the +diagonal, so orthonormality fails. This separates the two guarantees and shows the rank hypothesis +genuinely bites. -/ + +/-- A rank-2 matrix (`col₂ = 2·col₁`): reconstructs, but `Q` cannot be orthonormal. -/ +def Adef : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := + mkMat [[1, 2, 0], + [2, 4, 1], + [1, 2, 0]] + +def Qdef : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.qrQSpec Adef +def Rdef : Spec.Tensor Float (.dim 3 (.dim 3 .scalar)) := Spec.qrRSpec Adef + +/-- Reconstruction still holds even without full rank. -/ +def reconErrDef : Float := maxMatErr Adef (mm Qdef Rdef) +/-- Orthonormality fails: `Qᵀ·Q` has a zero diagonal entry, so it is far from `I`. -/ +def orthoErrDef : Float := maxMatErr (mm (tr Qdef) Qdef) (Spec.identityTensorSpec 3) + +#guard_msgs (drop info) in +#eval assertLt "QR(rank-deficient) A = Q·R still reconstructs" reconErrDef +#guard_msgs (drop info) in +#eval assertGe "QR(rank-deficient) Qᵀ·Q = I correctly fails (needs full column rank)" orthoErrDef + +end NN.Examples.Factorization.QR diff --git a/NN/Proofs/Tensor/Basic.lean b/NN/Proofs/Tensor/Basic.lean index f9a7b2f..fb2f877 100644 --- a/NN/Proofs/Tensor/Basic.lean +++ b/NN/Proofs/Tensor/Basic.lean @@ -9,6 +9,9 @@ module public import NN.Proofs.Tensor.Basic.Core public import NN.Proofs.Tensor.Basic.Folds public import NN.Proofs.Tensor.Basic.LinearAlgebra +public import NN.Proofs.Tensor.Basic.Factorizations +public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction +public import NN.Proofs.Tensor.Basic.FactorizationsOrthonormal public import NN.Proofs.Tensor.Basic.BoundsNorms public import NN.Proofs.Tensor.Basic.Algebra diff --git a/NN/Proofs/Tensor/Basic/Factorizations.lean b/NN/Proofs/Tensor/Basic/Factorizations.lean new file mode 100644 index 0000000..1453f95 --- /dev/null +++ b/NN/Proofs/Tensor/Basic/Factorizations.lean @@ -0,0 +1,187 @@ +/- +Copyright (c) 2026 TorchLean +Released under MIT license as described in the file LICENSE. +Authors: TorchLean Team +-/ + +module + +public import NN.Spec.Core.Tensor.Factorizations +public import NN.Proofs.Tensor.Basic.LinearAlgebra +public import Mathlib.Data.List.GetD + +/-! +# Correctness of the exact matrix factorizations (Cholesky and QR) + +This file sets up the **formal correctness layer** for the two exact, finite spec-layer factorizations +in [`NN.Spec.Core.Tensor.Factorizations`](../../../Spec/Core/Tensor/Factorizations.lean) +(`choleskySpec`, `qrSpec`): the factorization predicates over real matrices, and the first structural +theorem about the executable Cholesky factor. + +## Architecture (refinement) + +* **Specifications** (`IsCholesky`, `IsQR`) are `Prop`s on Mathlib `Matrix (Fin n) (Fin n) ℝ`. + Mathlib's `Matrix m n α` is *definitionally* `m → n → α`, so the function representation + `Spec.toMatFn` produced by the executable specs bridges for free. +* **Fold-indexing lemmas** (`length_foldl_snoc`, `getD_foldl_snoc_lt`, `getD_foldl_snoc_read`, + `getD_foldl_finRange`) read off the column produced at a given position of the left fold that both + `choleskyColsFn` and `gramSchmidtFn` use to build their output, bridging the executable `List.foldl` + form to per-entry reasoning. They live here once and are reused by `FactorizationsReconstruction`. +* **Structural theorem.** The executable Cholesky factor is lower-triangular + (`choleskyFn_lower_triangular`, lifted to the tensor level as `choleskySpec_lower_triangular`), + proved directly from the column fold — the above-diagonal entry is forced to `0` by construction. + +## Scope + +This file proves only the predicates and the lower-triangularity fact. The exact algebraic +reconstructions — `A = L · Lᵀ` for Cholesky (under positive pivots) and `A = Q · R` with `Qᵀ Q = 1` +for Gram–Schmidt (under full column rank) — are proved in the companion files +[`FactorizationsReconstruction`](FactorizationsReconstruction.lean) and +[`FactorizationsOrthonormal`](FactorizationsOrthonormal.lean). Everything here is an exact identity +over `ℝ`; the only hypotheses are the genuine success conditions of the algorithms. +-/ + +@[expose] public section + +namespace Spec.Factorization + +open Matrix +open scoped BigOperators + +variable {n : Nat} + +/-! ## Specifications + +The mathematical meaning of each factorization, as a predicate over real matrices. Over `ℝ`, +`star = id` so `conjTranspose = transpose`; we phrase everything with `ᵀ`. +-/ + +/-- `L` is a Cholesky factor of `A`: lower-triangular with `A = L · Lᵀ`. -/ +def IsCholesky (A L : Matrix (Fin n) (Fin n) ℝ) : Prop := + (∀ i j, i < j → L i j = 0) ∧ A = L * Lᵀ + +/-- `(Q, R)` is a QR factorization of `A`: `Q` has orthonormal columns, `R` is upper-triangular, +`A = Q · R`. -/ +def IsQR {m k : Nat} (A Q : Matrix (Fin m) (Fin k) ℝ) (R : Matrix (Fin k) (Fin k) ℝ) : Prop := + Qᵀ * Q = 1 ∧ (∀ i j, j < i → R i j = 0) ∧ A = Q * R + + +/-! ### Fold-indexing for the column-building specs + +`choleskyColsFn` and `gramSchmidtFn` build their output with a left fold that appends one column per +index. The lemmas here read off the column produced at a given position, bridging the executable +`List.foldl` form to per-entry reasoning. They are generic over the appended-value function `g` and +are the single home for these snoc-fold read lemmas — `FactorizationsReconstruction` reuses them. -/ + +section FoldSnoc + +variable {β : Type _} {ι : Type _} + +/-- A left fold that appends one element per input grows the accumulator by `l.length`. -/ +theorem length_foldl_snoc (g : List β → ι → β) (l : List ι) (acc : List β) : + (l.foldl (fun s a => s ++ [g s a]) acc).length = acc.length + l.length := by + induction l generalizing acc with + | nil => simp + | cons a t ih => + rw [List.foldl_cons, ih] + simp only [List.length_append, List.length_cons, List.length_nil] + grind + +/-- A fold that only appends never changes an index already inside the accumulator. -/ +theorem getD_foldl_snoc_lt (g : List β → ι → β) (d : β) (l : List ι) (acc : List β) + (k : Nat) (hk : k < acc.length) : + (l.foldl (fun s a => s ++ [g s a]) acc).getD k d = acc.getD k d := by + induction l generalizing acc with + | nil => simp + | cons a t ih => + rw [List.foldl_cons, + ih (acc ++ [g acc a]) (by rw [List.length_append]; grind), + List.getD_append _ _ _ _ hk] + +/-- The element at position `k` of the snoc-fold over an arbitrary list `l` is `g` applied to the +fold of the length-`k` prefix and the `k`-th element. -/ +theorem getD_foldl_snoc_read (g : List β → ι → β) (d : β) (l : List ι) (k : Nat) + (hk : k < l.length) : + (l.foldl (fun s a => s ++ [g s a]) []).getD k d + = g ((l.take k).foldl (fun s a => s ++ [g s a]) []) (l[k]'hk) := by + have htake : l.take (k + 1) = l.take k ++ [l[k]'hk] := List.take_succ_eq_append_getElem hk + have hplen : ((l.take k).foldl (fun s a => s ++ [g s a]) []).length = k := by + rw [length_foldl_snoc, List.length_nil, List.length_take, Nat.zero_add, + Nat.min_eq_left (le_of_lt hk)] + calc + (l.foldl (fun s a => s ++ [g s a]) []).getD k d + = ((l.drop (k + 1)).foldl (fun s a => s ++ [g s a]) + ((l.take (k + 1)).foldl (fun s a => s ++ [g s a]) [])).getD k d := by + conv_lhs => rw [show l = l.take (k + 1) ++ l.drop (k + 1) from + (List.take_append_drop _ _).symm] + rw [List.foldl_append] + _ = ((l.take (k + 1)).foldl (fun s a => s ++ [g s a]) []).getD k d := by + apply getD_foldl_snoc_lt + rw [length_foldl_snoc, List.length_nil, List.length_take, Nat.zero_add] + grind + _ = g ((l.take k).foldl (fun s a => s ++ [g s a]) []) (l[k]'hk) := by + rw [htake, List.foldl_append, List.foldl_cons, List.foldl_nil] + rw [List.getD_append_right _ _ _ _ (le_of_eq hplen), hplen, Nat.sub_self] + rfl + +/-- The element at position `j` of the snoc-fold over `finRange n` is `g` applied to the fold of the +length-`j` prefix and the index `j`. -/ +theorem getD_foldl_finRange (g : List β → Fin n → β) (d : β) (j : Fin n) : + ((List.finRange n).foldl (fun s a => s ++ [g s a]) []).getD j.val d + = g (((List.finRange n).take j.val).foldl (fun s a => s ++ [g s a]) []) j := by + have hjlen : j.val < (List.finRange n).length := by + rw [List.length_finRange]; exact j.isLt + have htake : (List.finRange n).take (j.val + 1) + = (List.finRange n).take j.val ++ [j] := by + rw [List.take_succ_eq_append_getElem hjlen] + congr 1 + simp [List.getElem_finRange] + have hplen : (((List.finRange n).take j.val).foldl (fun s a => s ++ [g s a]) []).length + = j.val := by + rw [length_foldl_snoc, List.length_nil, List.length_take, List.length_finRange, Nat.zero_add, + Nat.min_eq_left (Nat.le_of_lt j.isLt)] + calc + ((List.finRange n).foldl (fun s a => s ++ [g s a]) []).getD j.val d + = (((List.finRange n).drop (j.val + 1)).foldl (fun s a => s ++ [g s a]) + ((List.finRange n).take (j.val + 1) |>.foldl (fun s a => s ++ [g s a]) [])).getD + j.val d := by + conv_lhs => rw [show List.finRange n + = (List.finRange n).take (j.val + 1) ++ (List.finRange n).drop (j.val + 1) from + (List.take_append_drop _ _).symm] + rw [List.foldl_append] + _ = ((List.finRange n).take (j.val + 1) |>.foldl (fun s a => s ++ [g s a]) []).getD j.val d := by + apply getD_foldl_snoc_lt + rw [length_foldl_snoc, List.length_nil, List.length_take, List.length_finRange, + Nat.zero_add] + grind + _ = g (((List.finRange n).take j.val).foldl (fun s a => s ++ [g s a]) []) j := by + rw [htake, List.foldl_append, List.foldl_cons, List.foldl_nil] + rw [List.getD_append_right _ _ _ _ (le_of_eq hplen), hplen, Nat.sub_self] + rfl + +end FoldSnoc + +/-! ### Cholesky factor is lower-triangular + +A structural fact about the executable `choleskyFn`, proved directly from the column fold: the entry +above the diagonal is forced to `0` by the construction. -/ + +/-- Reading an entry of a matrix tensor built by `ofMatFn` returns the underlying function value. -/ +theorem get2_ofMatFn {m k : Nat} (f : Fin m → Fin k → ℝ) (i : Fin m) (j : Fin k) : + Spec.get2 (Spec.ofMatFn f) i j = f i j := rfl + +/-- The executable Cholesky factor is lower-triangular: entries strictly above the diagonal vanish. -/ +theorem choleskyFn_lower_triangular (A : Fin n → Fin n → ℝ) {i j : Fin n} (hij : i.val < j.val) : + Spec.choleskyFn A i j = 0 := by + unfold Spec.choleskyFn Spec.choleskyColsFn + rw [getD_foldl_finRange] + rw [if_pos hij] + +/-- Tensor-level statement: the Cholesky factor `choleskySpec A` is lower-triangular. -/ +theorem choleskySpec_lower_triangular (A : Spec.Tensor ℝ (.dim n (.dim n .scalar))) + {i j : Fin n} (hij : i.val < j.val) : + Spec.get2 (Spec.choleskySpec A) i j = 0 := by + rw [show Spec.choleskySpec A = Spec.ofMatFn (Spec.choleskyFn (Spec.toMatFn A)) from rfl, + get2_ofMatFn] + exact choleskyFn_lower_triangular _ hij +end Spec.Factorization diff --git a/NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean b/NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean new file mode 100644 index 0000000..42eb826 --- /dev/null +++ b/NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean @@ -0,0 +1,252 @@ +/- +Copyright (c) 2026 TorchLean +Released under MIT license as described in the file LICENSE. +Authors: TorchLean Team +-/ + +module + +public import NN.Proofs.Tensor.Basic.FactorizationsReconstruction +public import Mathlib.Analysis.InnerProductSpace.GramSchmidtOrtho +public import Mathlib.Analysis.InnerProductSpace.PiL2 + +/-! +# Orthonormality of the executable Gram–Schmidt `Q` factor (`Qᵀ Q = 1`) + +This file closes the one finite-fold property left open by +[`NN.Proofs.Tensor.Basic.FactorizationsReconstruction`](FactorizationsReconstruction.lean): the +orthonormality of the `Q` factor produced by the executable classical Gram–Schmidt `gramSchmidtFn`. + +The strategy is to **unify the executable variant with Mathlib's `gramSchmidt`** rather than re-derive +the orthogonality induction by hand. Reading the columns of `A` as vectors of +`EuclideanSpace ℝ (Fin m)`, the `j`-th executable `Q` column equals Mathlib's `gramSchmidtNormed ℝ` +of the column map (`Qcol_bridge`), so the orthonormality follows from Mathlib's +`gramSchmidtNormed_orthonormal'`. + +## Main results + +* `Qcol_bridge`: `WithLp.toLp 2 (Qcol A k) = gramSchmidtNormed ℝ (gsCol A) k` — the executable `Q` + column is Mathlib's normalized Gram–Schmidt vector, proved by strong induction on `k`. +* `Q_orthonormal`: `dotFn (Qcol A a) (Qcol A b) = if a = b then 1 else 0` under positive `R` pivots. +* `QT_mul_Q_eq_one` and `isQR_of_pos`: the matrix-level `Qᵀ Q = 1` and the full + `Spec.Factorization.IsQR` predicate for the executable factors (combining with the reconstruction + `A = Q · R` and `R` upper-triangular from the companion file). +* `qrSpec_orthonormal`: the tensor-level corollary. + +## Method + +The bridge rests on three connectors over `ℝ`: `dotFn = ⟪·,·⟫` and `normFn = ‖·‖` on +`EuclideanSpace ℝ (Fin m)`, and the projection identity `proj_normalize` showing the un-normalized +Gram–Schmidt projection term equals the normalized one. The strong induction feeds the partial +identification of the earlier `Q` columns into `gramSchmidt_def''`, term by term. +-/ + +@[expose] public section + +namespace Spec.Factorization.Reconstruction + +open Matrix +open scoped BigOperators RealInnerProductSpace +open InnerProductSpace + +/-! ## Connectors between the executable scalar ops and the Euclidean inner product -/ + +/-- `dotFn` as a `Finset` sum. -/ +theorem dotFn_eq_sum {p : Nat} (u v : Fin p → ℝ) : Spec.dotFn u v = ∑ i, u i * v i := by + unfold Spec.dotFn + rw [foldl_addf_eq_sum (fun i => u i * v i) (List.finRange p) 0, zero_add, + ← finsum_eq_finRange_sum (fun i => u i * v i)] + +/-- The executable dot product is the Euclidean inner product over `ℝ`. -/ +theorem dotFn_eq_inner {p : Nat} (u v : Fin p → ℝ) : + Spec.dotFn u v + = ⟪(WithLp.toLp 2 u : EuclideanSpace ℝ (Fin p)), WithLp.toLp 2 v⟫_ℝ := by + rw [dotFn_eq_sum, PiLp.inner_apply] + apply Finset.sum_congr rfl + intro i _ + rw [RCLike.inner_apply', PiLp.toLp_apply, PiLp.toLp_apply] + simp + +/-- The executable Euclidean norm is the `EuclideanSpace` norm over `ℝ`. -/ +theorem normFn_eq_norm {p : Nat} (v : Fin p → ℝ) : + Spec.normFn v = ‖(WithLp.toLp 2 v : EuclideanSpace ℝ (Fin p))‖ := by + rw [Spec.normFn, mfsqrt_eq, EuclideanSpace.norm_eq] + congr 1 + rw [dotFn_eq_sum] + apply Finset.sum_congr rfl + intro i _ + rw [PiLp.toLp_apply, Real.norm_eq_abs, sq_abs, sq] + +/-- The Gram–Schmidt projection term, with the normalized vector pulled out. Holds with no +non-degeneracy hypothesis (both sides vanish when `gramSchmidt = 0`). -/ +theorem proj_normalize {F : Type*} [NormedAddCommGroup F] [InnerProductSpace ℝ F] (w x : F) : + (⟪w, x⟫_ℝ / ‖w‖ ^ 2) • w = ⟪‖w‖⁻¹ • w, x⟫_ℝ • (‖w‖⁻¹ • w) := by + rw [real_inner_smul_left, smul_smul] + congr 1 + rw [div_eq_mul_inv, ← inv_pow, sq] + ring + +/-- `gramSchmidtNormed` over `ℝ`, with the scalar coercion removed. -/ +theorem gn_eq {n : Nat} {F : Type*} [NormedAddCommGroup F] [InnerProductSpace ℝ F] + (f : Fin n → F) (i : Fin n) : + gramSchmidtNormed ℝ f i = ‖gramSchmidt ℝ f i‖⁻¹ • gramSchmidt ℝ f i := by + rw [gramSchmidtNormed] + norm_num + +/-- A masked full sum equals the sum over `Iio`. -/ +theorem sum_Iio_eq_mask {n : Nat} (k : Fin n) (h : Fin n → ℝ) : + ∑ i ∈ Finset.Iio k, h i = ∑ i, if i.val < k.val then h i else 0 := by + rw [← Finset.sum_filter] + congr 1 + ext i + simp only [Finset.mem_Iio, Finset.mem_filter, Finset.mem_univ, true_and, Fin.lt_def] + +/-! ## The bridge to Mathlib's `gramSchmidt` -/ + +section QR + +variable {m n : Nat} + +/-- Column `j` of `A` as a vector of `EuclideanSpace ℝ (Fin m)`. -/ +noncomputable def gsCol (A : Fin m → Fin n → ℝ) (j : Fin n) : EuclideanSpace ℝ (Fin m) := + WithLp.toLp 2 (gsA A j) + +/-- `gsCol A k` reads as the executable column `gsA A k`. -/ +theorem gsCol_apply (A : Fin m → Fin n → ℝ) (k : Fin n) (r : Fin m) : + gsCol A k r = gsA A k r := rfl + +/-- **Orthogonalized-vector bridge.** Given that the earlier `Q` columns coincide with Mathlib's +normalized Gram–Schmidt vectors, the executable orthogonalized vector `v` at index `k` equals +Mathlib's (un-normalized) `gramSchmidt` vector. -/ +theorem gsV_bridge (A : Fin m → Fin n → ℝ) (k : Fin n) + (ih : ∀ i : Fin n, i.val < k.val → + (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)) = gramSchmidtNormed ℝ (gsCol A) i) : + gramSchmidt ℝ (gsCol A) k = WithLp.toLp 2 (gsV A (qsPrefix A k) k) := by + -- Rewrite Mathlib's vector via the explicit recurrence. + rw [show gramSchmidt ℝ (gsCol A) k + = gsCol A k - ∑ i ∈ Finset.Iio k, + (⟪gramSchmidt ℝ (gsCol A) i, gsCol A k⟫_ℝ / ‖gramSchmidt ℝ (gsCol A) i‖ ^ 2) + • gramSchmidt ℝ (gsCol A) i + from eq_sub_of_add_eq (gramSchmidt_def'' ℝ (gsCol A) k).symm] + -- Replace each projection term by the normalized form, then by the executable `Q` column. + have hproj : ∀ i ∈ Finset.Iio k, + (⟪gramSchmidt ℝ (gsCol A) i, gsCol A k⟫_ℝ / ‖gramSchmidt ℝ (gsCol A) i‖ ^ 2) + • gramSchmidt ℝ (gsCol A) i + = ⟪(WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)), gsCol A k⟫_ℝ + • (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)) := by + intro i hi + have hik : i < k := Finset.mem_Iio.mp hi + rw [proj_normalize (gramSchmidt ℝ (gsCol A) i) (gsCol A k), ← gn_eq, ih i hik] + rw [Finset.sum_congr rfl hproj] + -- Compare entrywise. + ext r + rw [PiLp.sub_apply] + show gsCol A k r - _ = gsV A (qsPrefix A k) k r + rw [gsV_eq, gsCol_apply] + congr 1 + -- The Euclidean `Iio` sum, applied at `r`, equals the executable list projection sum. + rw [WithLp.ofLp_sum, Finset.sum_apply] + rw [show ∑ i ∈ Finset.Iio k, + (WithLp.ofLp (⟪(WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)), gsCol A k⟫_ℝ + • (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)))) r + = ∑ i ∈ Finset.Iio k, Spec.dotFn (Qcol A i) (gsA A k) * Qcol A i r from by + apply Finset.sum_congr rfl + intro i _ + rw [show WithLp.ofLp (⟪(WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)), gsCol A k⟫_ℝ + • (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m))) r + = ⟪(WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)), gsCol A k⟫_ℝ • Qcol A i r + from rfl, smul_eq_mul, gsCol, ← dotFn_eq_inner]] + rw [sum_Iio_eq_mask, qsPrefix_eq_map, List.map_map, take_map_sum_eq] + rfl + +/-- **Normalized-column bridge.** The executable `Q` column at index `k` equals Mathlib's +`gramSchmidtNormed`. Proved by strong induction on `k`, under positive `R` pivots (full column rank). -/ +theorem Qcol_bridge (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) : + ∀ k : Fin n, + (WithLp.toLp 2 (Qcol A k) : EuclideanSpace ℝ (Fin m)) = gramSchmidtNormed ℝ (gsCol A) k := by + have main : ∀ N : Nat, ∀ k : Fin n, k.val = N → + (WithLp.toLp 2 (Qcol A k) : EuclideanSpace ℝ (Fin m)) = gramSchmidtNormed ℝ (gsCol A) k := by + intro N + induction N using Nat.strong_induction_on with + | _ N ih => + intro k hk + have IH : ∀ i : Fin n, i.val < k.val → + (WithLp.toLp 2 (Qcol A i) : EuclideanSpace ℝ (Fin m)) = gramSchmidtNormed ℝ (gsCol A) i := + fun i hi => ih i.val (hk ▸ hi) i rfl + have hρpos : 0 < gsRjj A (qsPrefix A k) k := by + have h := hrank k; rwa [Rmat_eq, rStep_diag] at h + have hgsV := gsV_bridge A k IH + rw [gn_eq, hgsV] + ext r + rw [PiLp.smul_apply, PiLp.toLp_apply, PiLp.toLp_apply, smul_eq_mul] + show Qcol A k r = _ + rw [show Qcol A k r = Qmat A r k from rfl, Qmat_eq, qStep_pos A (qsPrefix A k) k hρpos, + ← normFn_eq_norm] + show gsV A (qsPrefix A k) k r / gsRjj A (qsPrefix A k) k + = (Spec.normFn (gsV A (qsPrefix A k) k))⁻¹ * gsV A (qsPrefix A k) k r + rw [show Spec.normFn (gsV A (qsPrefix A k) k) = gsRjj A (qsPrefix A k) k from rfl, + div_eq_mul_inv, mul_comm] + exact fun k => main k.val k rfl + +/-! ## Orthonormality `Qᵀ Q = 1` -/ + +/-- Each normalized Gram–Schmidt vector is non-zero (the pivot is positive). -/ +theorem gn_ne_zero (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) (j : Fin n) : + gramSchmidtNormed ℝ (gsCol A) j ≠ 0 := by + have hpos : 0 < ‖gramSchmidt ℝ (gsCol A) j‖ := by + have h := hrank j + rw [Rmat_eq, rStep_diag] at h + rwa [gsV_bridge A j (fun i _ => Qcol_bridge A hrank i), ← normFn_eq_norm] + rw [gn_eq] + exact smul_ne_zero (inv_ne_zero (ne_of_gt hpos)) (norm_pos_iff.mp hpos) + +/-- **Orthonormality of the executable `Q` columns.** Under positive `R` pivots, +`qₐ · q_b = δₐᵦ`. -/ +theorem Q_orthonormal (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) (a b : Fin n) : + Spec.dotFn (Qcol A a) (Qcol A b) = if a = b then 1 else 0 := by + rw [dotFn_eq_inner] + show ⟪(WithLp.toLp 2 (Qcol A a) : EuclideanSpace ℝ (Fin m)), WithLp.toLp 2 (Qcol A b)⟫_ℝ = _ + rw [Qcol_bridge A hrank a, Qcol_bridge A hrank b] + have horth := orthonormal_iff_ite.mp (gramSchmidtNormed_orthonormal' (gsCol A)) + ⟨a, gn_ne_zero A hrank a⟩ ⟨b, gn_ne_zero A hrank b⟩ + rw [horth] + simp only [Subtype.mk.injEq] + +/-- **Matrix-level orthonormality.** `Qᵀ Q = 1` for the executable Gram–Schmidt `Q` factor. -/ +theorem QT_mul_Q_eq_one (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) : + (Matrix.of (fun i k => Qmat A i k))ᵀ * Matrix.of (fun i k => Qmat A i k) = 1 := by + ext a b + rw [Matrix.mul_apply] + simp only [Matrix.transpose_apply, Matrix.of_apply, Matrix.one_apply] + rw [show (∑ i, Qmat A i a * Qmat A i b) = Spec.dotFn (Qcol A a) (Qcol A b) from by + rw [dotFn_eq_sum]; rfl, + Q_orthonormal A hrank a b] + +/-- **Full QR specification.** For `A` with positive executable `R`-pivots (full column rank), the +executable Gram–Schmidt factors satisfy `Spec.Factorization.IsQR`: `Qᵀ Q = 1`, `R` upper-triangular, +and `A = Q · R`. -/ +theorem isQR_of_pos (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) : + Spec.Factorization.IsQR (Matrix.of A) (Matrix.of (fun i k => Qmat A i k)) + (Matrix.of (fun k j => Rmat A k j)) := by + refine ⟨QT_mul_Q_eq_one A hrank, ?_, qr_mul_eq A hrank⟩ + intro i j hji + show Rmat A i j = 0 + exact Rmat_upper_triangular A (Fin.lt_def.mp hji) + +/-- **Tensor-level orthonormality.** For a tensor `A` with positive `qrRSpec` pivots, the `Q` factor +`qrQSpec A` has orthonormal columns: `Σ_i Q[i,a]·Q[i,b] = δₐᵦ`. -/ +theorem qrSpec_orthonormal (A : Spec.Tensor ℝ (.dim m (.dim n .scalar))) + (hrank : ∀ j : Fin n, 0 < Spec.get2 (Spec.qrRSpec A) j j) (a b : Fin n) : + (∑ i, Spec.get2 (Spec.qrQSpec A) i a * Spec.get2 (Spec.qrQSpec A) i b) + = if a = b then 1 else 0 := by + have hQ : ∀ x y, Spec.get2 (Spec.qrQSpec A) x y = Qmat (Spec.toMatFn A) x y := fun _ _ => rfl + have hR : ∀ x y, Spec.get2 (Spec.qrRSpec A) x y = Rmat (Spec.toMatFn A) x y := fun _ _ => rfl + simp only [hQ] + rw [show (∑ i, Qmat (Spec.toMatFn A) i a * Qmat (Spec.toMatFn A) i b) + = Spec.dotFn (Qcol (Spec.toMatFn A) a) (Qcol (Spec.toMatFn A) b) from by + rw [dotFn_eq_sum]; rfl] + exact Q_orthonormal (Spec.toMatFn A) (fun j => by rw [← hR]; exact hrank j) a b + +end QR + +end Spec.Factorization.Reconstruction diff --git a/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean b/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean new file mode 100644 index 0000000..1a6d698 --- /dev/null +++ b/NN/Proofs/Tensor/Basic/FactorizationsReconstruction.lean @@ -0,0 +1,648 @@ +/- +Copyright (c) 2026 TorchLean +Released under MIT license as described in the file LICENSE. +Authors: TorchLean Team +-/ + +module + +public import NN.Spec.Core.Tensor.Factorizations +public import NN.Proofs.Tensor.Basic.Factorizations +public import Mathlib.Data.List.GetD +public import Mathlib.Algebra.BigOperators.Fin + +/-! +# Exact reconstruction of the finite factorizations (Cholesky and QR) + +This file proves the *exact* algebraic reconstruction of the finite executable Cholesky and QR +factorizations from [`NN.Spec.Core.Tensor.Factorizations`](../../../Spec/Core/Tensor/Factorizations.lean), +building on the predicates and fold-indexing lemmas of `NN.Proofs.Tensor.Basic.Factorizations`. Because +Cholesky and Gram–Schmidt are *direct, finite* constructions — no iteration, no convergence caveat — +over `ℝ` they reconstruct their input on the nose under the success hypotheses (positive pivots / full +column rank), an exact identity rather than an a-posteriori bound. + +## Main results + +* `isCholesky_of_pos`: for a symmetric `A : Fin n → Fin n → ℝ` whose executable Cholesky pivots are all + positive (`0 < choleskyFn A j j`, the exact condition under which the algorithm succeeds over `ℝ`), + the factor `L = choleskyFn A` satisfies the spec `Spec.Factorization.IsCholesky`: lower-triangular + and `A = L · Lᵀ`. `choleskySpec_reconstruction` is the tensor-level corollary. +* `qr_mul_eq`: for `A : Fin m → Fin n → ℝ` whose executable Gram–Schmidt `R`-pivots are positive + (`0 < Rmat A j j`, full column rank), the factors `Q = gramSchmidtFn A` and `R` satisfy `A = Q · R`, + with `R` upper-triangular (`Rmat_upper_triangular`). `qrSpec_reconstruction` is the tensor-level + corollary. + +## Method + +Each executable factor is built by a `List.foldl` that snocs one column per index. The core technical +device is `getD_foldl_snoc_read`, a general lemma reading the `j`-th element of such a fold as the step +function applied to the length-`j` prefix. From it, `prefix_eq_map`/`qsPrefix_eq_map` identify the +prefix with the first `j` columns of the final factor, and `take_map_sum_eq` turns the code's +`List.foldl` sums into masked `Finset` partial sums. The QR fold threads a `GSState` that snocs onto +*both* the `Q`-list and the `R`-list at once; `gs_proj_qs` and `gs_fold_split`/`rTail_getD` recover the +single-list read lemmas for each projection (the step depends only on the `Q`-history). The +positive-pivot hypotheses discharge the `√`-radicand and divisor side conditions. + +## Scope + +This file proves `A = L · Lᵀ` and `A = Q · R` purely algebraically. The remaining QR property — +orthonormality of the `Q` factor, `Qᵀ Q = 1` — is proved in the companion file +[`NN.Proofs.Tensor.Basic.FactorizationsOrthonormal`](FactorizationsOrthonormal.lean) by bridging the +executable Gram–Schmidt to Mathlib's `gramSchmidt`, completing the full `Spec.Factorization.IsQR` +predicate (`isQR_of_pos`). +-/ + +@[expose] public section + +namespace Spec.Factorization.Reconstruction + +open Matrix +open scoped BigOperators + +variable {n : Nat} + +/-! ## List/Finset bridges -/ + +/-- A left `+`-fold accumulates the list sum. -/ +theorem foldl_add_eq_sum (l : List ℝ) (a : ℝ) : + l.foldl (· + ·) a = a + l.sum := by + induction l generalizing a with + | nil => simp + | cons x t ih => rw [List.foldl_cons, ih, List.sum_cons]; ring + +/-- A left `s + x*x`-fold accumulates the sum of squares. -/ +theorem foldl_addsq_eq_sum (l : List ℝ) (a : ℝ) : + l.foldl (fun s x => s + x * x) a = a + (l.map (fun x => x * x)).sum := by + induction l generalizing a with + | nil => simp + | cons x t ih => rw [List.foldl_cons, ih, List.map_cons, List.sum_cons]; ring + +/-- A `Fin n` sum is the foldl-sum over `finRange n`. -/ +theorem finsum_eq_finRange_sum (h : Fin n → ℝ) : + ∑ i, h i = ((List.finRange n).map h).sum := by + rw [← List.sum_toFinset _ (List.nodup_finRange n)] + · simp [List.toFinset_finRange] + +/-! ## Cholesky: the column-building step + +`choleskyColsFn` is a left fold that snocs one column per index. `cholStep` names the function it +appends, so that the read lemmas above can be specialized to it. -/ + +/-- The column appended at index `j` of the Cholesky fold, given the columns `cols` built so far. -/ +noncomputable def cholStep (A : Fin n → Fin n → ℝ) (cols : List (Fin n → ℝ)) (j : Fin n) : + Fin n → ℝ := + let sumsq := (cols.map (fun ck => ck j)).foldl (fun s x => s + x * x) 0 + let Ljj := MathFunctions.sqrt (A j j - sumsq) + fun i => + if i.val < j.val then 0 + else if i.val == j.val then Ljj + else + let s := (cols.map (fun ck => ck i * ck j)).foldl (fun acc x => acc + x) 0 + (A i j - s) / Ljj + +/-- `choleskyColsFn` is the snoc-fold appending `cholStep`. -/ +theorem choleskyColsFn_eq (A : Fin n → Fin n → ℝ) : + Spec.choleskyColsFn A + = (List.finRange n).foldl (fun cols j => cols ++ [cholStep A cols j]) [] := rfl + +/-- The diagonal value produced by `cholStep`. -/ +theorem cholStep_diag (A : Fin n → Fin n → ℝ) (cols : List (Fin n → ℝ)) (j : Fin n) : + cholStep A cols j j + = MathFunctions.sqrt (A j j - (cols.map (fun ck => ck j)).foldl (fun s x => s + x * x) 0) := by + simp only [cholStep] + rw [if_neg (lt_irrefl _), if_pos (beq_self_eq_true _)] + +/-- The below-diagonal value produced by `cholStep`. -/ +theorem cholStep_offdiag (A : Fin n → Fin n → ℝ) (cols : List (Fin n → ℝ)) {i j : Fin n} + (hij : j.val < i.val) : + cholStep A cols j i + = (A i j - (cols.map (fun ck => ck i * ck j)).foldl (fun acc x => acc + x) 0) + / MathFunctions.sqrt (A j j - (cols.map (fun ck => ck j)).foldl (fun s x => s + x * x) 0) := by + simp only [cholStep] + rw [if_neg (by grind), if_neg (by rw [beq_iff_eq]; grind)] + +/-- The length-`j` prefix of Cholesky columns built before index `j`. -/ +noncomputable def prefixCols (A : Fin n → Fin n → ℝ) (j : Fin n) : List (Fin n → ℝ) := + ((List.finRange n).take j.val).foldl (fun cols k => cols ++ [cholStep A cols k]) [] + +/-- Entry `(i, j)` of the executable Cholesky factor equals `cholStep` evaluated on the prefix. -/ +theorem choleskyFn_eq_step (A : Fin n → Fin n → ℝ) (i j : Fin n) : + Spec.choleskyFn A i j = cholStep A (prefixCols A j) j i := by + have hlen : j.val < (List.finRange n).length := by rw [List.length_finRange]; exact j.isLt + show (Spec.choleskyColsFn A).getD j.val (fun _ => 0) i = _ + rw [choleskyColsFn_eq, getD_foldl_snoc_read (fun cols k => cholStep A cols k) (fun _ => 0) + (List.finRange n) j.val hlen] + have hj : (List.finRange n)[j.val]'hlen = j := by simp [List.getElem_finRange] + rw [hj] + rfl + +/-- The prefix of Cholesky columns is exactly the first `j` columns of the final factor `L`, +each presented as the function `r ↦ L r k`. -/ +theorem prefix_eq_map (A : Fin n → Fin n → ℝ) (j : Fin n) : + prefixCols A j + = ((List.finRange n).take j.val).map (fun k => fun r => Spec.choleskyFn A r k) := by + have hjval : ((List.finRange n).take j.val).length = j.val := by + rw [List.length_take, List.length_finRange, Nat.min_eq_left (le_of_lt j.isLt)] + apply List.ext_getElem + · unfold prefixCols + rw [length_foldl_snoc (fun cols k => cholStep A cols k), List.length_nil, Nat.zero_add, + List.length_map] + · intro p h1 h2 + rw [List.length_map, hjval] at h2 + have hpn : p < n := lt_trans h2 j.isLt + rw [List.getElem_map] + have hidx : ((List.finRange n).take j.val)[p]'(by rw [hjval]; exact h2) = (⟨p, hpn⟩ : Fin n) := by + rw [List.getElem_take, List.getElem_finRange]; exact Fin.ext rfl + rw [show (prefixCols A j)[p]'h1 = (prefixCols A j).getD p (fun _ => 0) from + (List.getD_eq_getElem _ _ h1).symm] + unfold prefixCols + rw [getD_foldl_snoc_read (fun cols k => cholStep A cols k) (fun _ => 0) + ((List.finRange n).take j.val) p (by rw [hjval]; exact h2)] + rw [List.take_take, Nat.min_eq_left (le_of_lt h2), hidx] + funext r + rw [choleskyFn_eq_step] + rfl + +/-! ### List/Finset partial-sum bridges -/ + +/-- Every element of a `finRange` prefix has index below the cut. -/ +theorem mem_take_finRange {m : Nat} {x : Fin n} (hx : x ∈ (List.finRange n).take m) : + x.val < m := by + obtain ⟨p, hp, hpx⟩ := List.getElem_of_mem hx + rw [List.length_take, List.length_finRange] at hp + rw [List.getElem_take, List.getElem_finRange] at hpx + subst hpx + exact lt_of_lt_of_le hp (Nat.min_le_left m n) + +/-- Every element of a `finRange` tail has index at least the cut. -/ +theorem mem_drop_finRange {m : Nat} {x : Fin n} (hx : x ∈ (List.finRange n).drop m) : + m ≤ x.val := by + obtain ⟨p, hp, hpx⟩ := List.getElem_of_mem hx + rw [List.getElem_drop, List.getElem_finRange] at hpx + subst hpx + exact Nat.le_add_right m p + +/-- Mapping `f` over a `finRange` prefix and summing equals the masked full sum. -/ +theorem take_map_sum_eq (m : Nat) (f : Fin n → ℝ) : + (((List.finRange n).take m).map f).sum = ∑ k : Fin n, if k.val < m then f k else 0 := by + rw [finsum_eq_finRange_sum] + conv_rhs => rw [show (List.finRange n) + = (List.finRange n).take m ++ (List.finRange n).drop m from (List.take_append_drop _ _).symm] + rw [List.map_append, List.sum_append] + have htake : ((List.finRange n).take m).map (fun k => if k.val < m then f k else 0) + = ((List.finRange n).take m).map f := + List.map_congr_left (fun x hx => if_pos (mem_take_finRange hx)) + have hdrop : (((List.finRange n).drop m).map (fun k => if k.val < m then f k else 0)).sum = 0 := by + rw [List.sum_eq_zero] + intro y hy + rw [List.mem_map] at hy + obtain ⟨x, hx, rfl⟩ := hy + exact if_neg (by have := mem_drop_finRange hx; grind) + rw [htake, hdrop, add_zero] + +/-- The Cholesky cross-sum equals the masked partial dot product of rows `i` and `j` of `L`. -/ +theorem cross_sum_eq (A : Fin n → Fin n → ℝ) (i j : Fin n) : + ((prefixCols A j).map (fun ck => ck i * ck j)).foldl (fun acc x => acc + x) 0 + = ∑ k : Fin n, if k.val < j.val then Spec.choleskyFn A i k * Spec.choleskyFn A j k else 0 := by + rw [prefix_eq_map, List.map_map, foldl_add_eq_sum, zero_add, + show ((fun ck : Fin n → ℝ => ck i * ck j) ∘ fun k => fun r => Spec.choleskyFn A r k) + = (fun k => Spec.choleskyFn A i k * Spec.choleskyFn A j k) from rfl] + exact take_map_sum_eq j.val (fun k => Spec.choleskyFn A i k * Spec.choleskyFn A j k) + +/-- The Cholesky diagonal sum-of-squares equals the masked partial squared norm of row `j` of `L`. -/ +theorem sumsq_eq (A : Fin n → Fin n → ℝ) (j : Fin n) : + ((prefixCols A j).map (fun ck => ck j)).foldl (fun s x => s + x * x) 0 + = ∑ k : Fin n, if k.val < j.val then Spec.choleskyFn A j k * Spec.choleskyFn A j k else 0 := by + rw [prefix_eq_map, List.map_map, foldl_addsq_eq_sum, zero_add, List.map_map, + show ((fun x : ℝ => x * x) ∘ ((fun ck : Fin n → ℝ => ck j) ∘ fun k => fun r => Spec.choleskyFn A r k)) + = (fun k => Spec.choleskyFn A j k * Spec.choleskyFn A j k) from rfl] + exact take_map_sum_eq j.val (fun k => Spec.choleskyFn A j k * Spec.choleskyFn A j k) + +/-! ### Closed-form entries of the executable Cholesky factor -/ + +/-- Over `ℝ`, the `Context` square root is `Real.sqrt`. -/ +theorem mfsqrt_eq (x : ℝ) : MathFunctions.sqrt x = Real.sqrt x := rfl + +/-- The diagonal entry of `L` in closed form: `L[j,j] = √(A[j,j] − Σ_{k j`. -/ +theorem choleskyFn_offdiag_eq (A : Fin n → Fin n → ℝ) {i j : Fin n} (hij : j.val < i.val) : + Spec.choleskyFn A i j + = (A i j - ∑ k, if k.val < j.val then Spec.choleskyFn A i k * Spec.choleskyFn A j k else 0) + / Spec.choleskyFn A j j := by + rw [choleskyFn_eq_step A i j, cholStep_offdiag _ _ hij, cross_sum_eq, sumsq_eq, mfsqrt_eq, + ← choleskyFn_diag_eq] + +/-! ### Reconstruction `A = L · Lᵀ` + +The diagonal of the rotated/peeled product is reconstructed using the closed-form entries and the +positive-pivot hypothesis (`0 < L[j,j]`), which is exactly the condition under which the executable +Cholesky succeeds over `ℝ`. -/ + +/-- Per-entry reconstruction for the lower part (`j ≤ i`): the `(i, j)` entry of `L · Lᵀ` is `A i j`. -/ +theorem choleskyFn_dot_eq (A : Fin n → Fin n → ℝ) + (hpos : ∀ j : Fin n, 0 < Spec.choleskyFn A j j) {i j : Fin n} (hji : j.val ≤ i.val) : + (∑ k, Spec.choleskyFn A i k * Spec.choleskyFn A j k) = A i j := by + set L := Spec.choleskyFn A with hL + have key : ∀ k : Fin n, L i k * L j k + = (if k.val < j.val then L i k * L j k else 0) + (if k = j then L i j * L j j else 0) := by + intro k + rcases lt_trichotomy k.val j.val with h | h | h + · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h + rw [if_pos h, if_neg hne, add_zero] + · have hkj : k = j := Fin.ext h + rw [if_neg (by grind), if_pos hkj, zero_add, hkj] + · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h + rw [if_neg (by grind), if_neg hne, add_zero, + show L j k = 0 from Spec.Factorization.choleskyFn_lower_triangular A h, mul_zero] + rw [show (∑ k, L i k * L j k) + = ∑ k, ((if k.val < j.val then L i k * L j k else 0) + (if k = j then L i j * L j j else 0)) + from Finset.sum_congr rfl (fun k _ => key k), + Finset.sum_add_distrib, Finset.sum_ite_eq' Finset.univ j (fun _ => L i j * L j j)] + simp only [Finset.mem_univ, if_true] + rcases eq_or_lt_of_le hji with heq | hlt + · have hij' : i = j := Fin.ext heq.symm + subst hij' + have hrad : 0 < A i i - (∑ k, if k.val < i.val then L i k * L i k else 0) := by + have hp := hpos i + rw [hL, choleskyFn_diag_eq] at hp + exact Real.sqrt_pos.mp hp + have hsq : L i i * L i i = A i i - (∑ k, if k.val < i.val then L i k * L i k else 0) := by + conv_lhs => rw [hL, choleskyFn_diag_eq A i] + exact Real.mul_self_sqrt hrad.le + rw [hsq]; ring + · have hne : L j j ≠ 0 := ne_of_gt (hpos j) + have hmul : L i j * L j j + = A i j - (∑ k, if k.val < j.val then L i k * L j k else 0) := by + rw [hL, choleskyFn_offdiag_eq A hlt, div_mul_eq_mul_div, mul_div_assoc, div_self hne, mul_one] + rw [hmul]; ring + +/-- Per-entry reconstruction for all `(i, j)`, using symmetry of `A`. -/ +theorem choleskyFn_dot (A : Fin n → Fin n → ℝ) (hsymm : ∀ i j, A i j = A j i) + (hpos : ∀ j : Fin n, 0 < Spec.choleskyFn A j j) (i j : Fin n) : + (∑ k, Spec.choleskyFn A i k * Spec.choleskyFn A j k) = A i j := by + rcases le_total j.val i.val with h | h + · exact choleskyFn_dot_eq A hpos h + · rw [show (∑ k, Spec.choleskyFn A i k * Spec.choleskyFn A j k) + = ∑ k, Spec.choleskyFn A j k * Spec.choleskyFn A i k + from Finset.sum_congr rfl (fun k _ => mul_comm _ _), + choleskyFn_dot_eq A hpos h, hsymm j i] + +/-- **Exact Cholesky reconstruction.** For a symmetric `A` whose executable Cholesky pivots are all +positive (`0 < L[j,j]`, the success condition over `ℝ`), the factor `L = choleskyFn A` is a genuine +Cholesky factor: lower-triangular with `A = L · Lᵀ`. -/ +theorem isCholesky_of_pos (A : Fin n → Fin n → ℝ) (hsymm : ∀ i j, A i j = A j i) + (hpos : ∀ j : Fin n, 0 < Spec.choleskyFn A j j) : + Spec.Factorization.IsCholesky (Matrix.of A) (Matrix.of (Spec.choleskyFn A)) := by + refine ⟨?_, ?_⟩ + · intro a b hab + show Spec.choleskyFn A a b = 0 + exact Spec.Factorization.choleskyFn_lower_triangular A (Fin.lt_def.mp hab) + · ext i j + rw [Matrix.mul_apply] + simp only [Matrix.of_apply, Matrix.transpose_apply] + exact (choleskyFn_dot A hsymm hpos i j).symm + +/-- **Tensor-level Cholesky reconstruction.** For a symmetric tensor `A` whose `choleskySpec` pivots +are positive, every entry of `A` is reconstructed by `L · Lᵀ`: +`A[i,j] = Σ_k L[i,k] · L[j,k]`, with `L = choleskySpec A`. -/ +theorem choleskySpec_reconstruction (A : Spec.Tensor ℝ (.dim n (.dim n .scalar))) + (hsymm : ∀ i j, Spec.get2 A i j = Spec.get2 A j i) + (hpos : ∀ j : Fin n, 0 < Spec.get2 (Spec.choleskySpec A) j j) (i j : Fin n) : + Spec.get2 A i j + = ∑ k, Spec.get2 (Spec.choleskySpec A) i k * Spec.get2 (Spec.choleskySpec A) j k := by + have hg : ∀ a b, Spec.get2 (Spec.choleskySpec A) a b = Spec.choleskyFn (Spec.toMatFn A) a b := by + intro a b + rw [show Spec.choleskySpec A = Spec.ofMatFn (Spec.choleskyFn (Spec.toMatFn A)) from rfl, + Spec.Factorization.get2_ofMatFn] + simp only [hg] + show Spec.toMatFn A i j = _ + refine (choleskyFn_dot (Spec.toMatFn A) (fun a b => hsymm a b) (fun b => ?_) i j).symm + rw [← hg b b]; exact hpos b + +/-! ## QR (classical Gram–Schmidt): exact reconstruction `A = Q · R` + +`gramSchmidtFn` threads a `GSState` that snocs a column onto *both* the `Q`-list and the `R`-list at +each index. Crucially the appended values depend only on the `Q`-history (`st.qs`), never on the +`R`-history, so the `Q`-list is itself a single-list snoc-fold (`gs_proj_qs`) and the `R`-list is the +`Q`-prefix-indexed tail `rTail`. -/ + +section QR + +variable {m : Nat} + +open Spec (GSState) + +/-- Column `j` of `A` as a function of the row. -/ +noncomputable def gsA (A : Fin m → Fin n → ℝ) (j : Fin n) : Fin m → ℝ := fun i => A i j + +/-- The `R` off-diagonal entries `rₖⱼ = qₖ · a` for the columns `qs` built so far. -/ +noncomputable def gsRkjs (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : List ℝ := + qs.map (fun qk => Spec.dotFn qk (gsA A j)) + +/-- The orthogonalized (not-yet-normalized) vector `v = a − Σ rₖⱼ qₖ`. -/ +noncomputable def gsV (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : Fin m → ℝ := + fun i => gsA A j i + - (List.zip qs (gsRkjs A qs j)).foldl (fun acc (qk, r) => acc + r * qk i) 0 + +/-- The diagonal `R` entry `rⱼⱼ = ‖v‖`. -/ +noncomputable def gsRjj (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : ℝ := + Spec.normFn (gsV A qs j) + +/-- The `Q` column appended at index `j`: `v / rⱼⱼ` (or `0` when `rⱼⱼ = 0`). -/ +noncomputable def qStep (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : Fin m → ℝ := + fun i => if Context.gtBool (gsRjj A qs j) 0 then gsV A qs j i / gsRjj A qs j else 0 + +/-- The `R` column at column index `j`, as a function of the row index `k` (so the value is the +matrix entry `R[k, j]`). With `R` indexed row-then-column, the nonzero part is on and *above* the +diagonal: `rₖⱼ` for `k < j` (strictly above the diagonal), `rⱼⱼ` for `k = j`, and `0` for `k > j` +(strictly below the diagonal). -/ +noncomputable def rStep (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : Fin n → ℝ := + fun k => if k.val < j.val then (gsRkjs A qs j).getD k.val 0 + else if k.val == j.val then gsRjj A qs j else 0 + +/-- `gramSchmidtFn` as the dual-list snoc-fold appending `qStep`/`rStep`. -/ +theorem gramSchmidtFn_eq (A : Fin m → Fin n → ℝ) : + Spec.gramSchmidtFn A + = (List.finRange n).foldl + (fun st j => (⟨st.qs ++ [qStep A st.qs j], st.rcols ++ [rStep A st.qs j]⟩ : GSState m n ℝ)) + ⟨[], []⟩ := rfl + +/-- The `Q`-list projection of the structure fold is the single-list `qStep` snoc-fold. -/ +theorem gs_proj_qs (A : Fin m → Fin n → ℝ) (l : List (Fin n)) (q0 : List (Fin m → ℝ)) + (r0 : List (Fin n → ℝ)) : + (l.foldl (fun st j => (⟨st.qs ++ [qStep A st.qs j], st.rcols ++ [rStep A st.qs j]⟩ : GSState m n ℝ)) + ⟨q0, r0⟩).qs + = l.foldl (fun qs j => qs ++ [qStep A qs j]) q0 := by + induction l generalizing q0 r0 with + | nil => rfl + | cons a t ih => simp only [List.foldl_cons]; exact ih _ _ + +/-- The `Q` columns built before index `j`. -/ +noncomputable def qsPrefix (A : Fin m → Fin n → ℝ) (j : Fin n) : List (Fin m → ℝ) := + ((List.finRange n).take j.val).foldl (fun qs k => qs ++ [qStep A qs k]) [] + +/-- The `R`-list tail: the `R` columns produced from `Q`-prefix `q0` over the indices `l`. -/ +noncomputable def rTail (A : Fin m → Fin n → ℝ) (q0 : List (Fin m → ℝ)) : List (Fin n) → + List (Fin n → ℝ) + | [] => [] + | j :: rest => rStep A q0 j :: rTail A (q0 ++ [qStep A q0 j]) rest + +/-- The structure fold splits into the `qStep` snoc-fold (`Q`-list) and the `rTail` (`R`-list). -/ +theorem gs_fold_split (A : Fin m → Fin n → ℝ) (l : List (Fin n)) (q0 : List (Fin m → ℝ)) + (r0 : List (Fin n → ℝ)) : + (l.foldl (fun st j => (⟨st.qs ++ [qStep A st.qs j], st.rcols ++ [rStep A st.qs j]⟩ : GSState m n ℝ)) + ⟨q0, r0⟩) + = ⟨l.foldl (fun qs j => qs ++ [qStep A qs j]) q0, r0 ++ rTail A q0 l⟩ := by + induction l generalizing q0 r0 with + | nil => simp [rTail] + | cons j rest ih => + simp only [List.foldl_cons, rTail] + rw [ih] + simp [List.append_assoc] + +/-- Reading the `k`-th element of `rTail` recovers `rStep` applied to the length-`k` `Q`-prefix. -/ +theorem rTail_getD (A : Fin m → Fin n → ℝ) (q0 : List (Fin m → ℝ)) (l : List (Fin n)) (k : Nat) + (hk : k < l.length) (d : Fin n → ℝ) : + (rTail A q0 l).getD k d + = rStep A ((l.take k).foldl (fun qs j => qs ++ [qStep A qs j]) q0) (l[k]'hk) := by + induction l generalizing q0 k with + | nil => simp at hk + | cons j rest ih => + cases k with + | zero => simp [rTail] + | succ k' => + simp only [rTail, List.getD_cons_succ, List.take_succ_cons, List.foldl_cons, + List.getElem_cons_succ] + exact ih (q0 ++ [qStep A q0 j]) k' (by simpa using hk) + +/-- Semantics of the `Context` `>` test over `ℝ`. -/ +theorem gtBool_true_iff {x y : ℝ} : Context.gtBool x y = true ↔ y < x := by + unfold Context.gtBool; exact decide_eq_true_iff + +/-- A left fold `acc + h x` accumulates the mapped list sum. -/ +theorem foldl_addf_eq_sum {β : Type _} (h : β → ℝ) (l : List β) (a : ℝ) : + l.foldl (fun acc x => acc + h x) a = a + (l.map h).sum := by + induction l generalizing a with + | nil => simp + | cons x t ih => rw [List.foldl_cons, ih, List.map_cons, List.sum_cons]; ring + +/-! ### Entries of the executable `Q` and `R` factors -/ + +/-- Entry `(i, k)` of the `Q` factor produced by `gramSchmidtFn`. -/ +noncomputable def Qmat (A : Fin m → Fin n → ℝ) (i : Fin m) (k : Fin n) : ℝ := + (Spec.gramSchmidtFn A).qs.getD k.val (fun _ => 0) i + +/-- Entry `(k, j)` of the `R` factor produced by `gramSchmidtFn`. -/ +noncomputable def Rmat (A : Fin m → Fin n → ℝ) (k j : Fin n) : ℝ := + (Spec.gramSchmidtFn A).rcols.getD j.val (fun _ => 0) k + +/-- Column `k` of `Q` as a function of the row. -/ +noncomputable def Qcol (A : Fin m → Fin n → ℝ) (k : Fin n) : Fin m → ℝ := fun r => Qmat A r k + +/-- Closed form of a `Q` entry: `qStep` evaluated on the `Q`-prefix. -/ +theorem Qmat_eq (A : Fin m → Fin n → ℝ) (i : Fin m) (k : Fin n) : + Qmat A i k = qStep A (qsPrefix A k) k i := by + have hqs : (Spec.gramSchmidtFn A).qs + = (List.finRange n).foldl (fun qs j => qs ++ [qStep A qs j]) [] := by + rw [gramSchmidtFn_eq]; exact gs_proj_qs A (List.finRange n) [] [] + unfold Qmat + rw [hqs, getD_foldl_snoc_read (fun qs j => qStep A qs j) (fun _ => 0) (List.finRange n) k.val + (by rw [List.length_finRange]; exact k.isLt)] + have hk : (List.finRange n)[k.val]'(by rw [List.length_finRange]; exact k.isLt) = k := by + simp [List.getElem_finRange] + rw [hk]; rfl + +/-- Closed form of an `R` entry: `rStep` evaluated on the `Q`-prefix. -/ +theorem Rmat_eq (A : Fin m → Fin n → ℝ) (k j : Fin n) : + Rmat A k j = rStep A (qsPrefix A j) j k := by + have hrc : (Spec.gramSchmidtFn A).rcols = rTail A [] (List.finRange n) := by + rw [gramSchmidtFn_eq, gs_fold_split]; simp + unfold Rmat + rw [hrc, rTail_getD A [] (List.finRange n) j.val (by rw [List.length_finRange]; exact j.isLt)] + have hk : (List.finRange n)[j.val]'(by rw [List.length_finRange]; exact j.isLt) = j := by + simp [List.getElem_finRange] + rw [hk]; rfl + +/-- `R` is upper-triangular: the entry `R[k, j]` at a row `k` strictly below the diagonal +(`column j < row k`) vanishes. -/ +theorem rStep_below_diag_zero (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) {j k : Fin n} + (hjk : j.val < k.val) : rStep A qs j k = 0 := by + simp only [rStep]; rw [if_neg (by grind), if_neg (by rw [beq_iff_eq]; grind)] + +/-- The diagonal `R` entry is `rⱼⱼ`. -/ +theorem rStep_diag (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) : + rStep A qs j j = gsRjj A qs j := by + simp only [rStep]; rw [if_neg (lt_irrefl _), if_pos (beq_self_eq_true _)] + +/-- The `Q` column when the pivot is positive: `qⱼ = v / rⱼⱼ`. -/ +theorem qStep_pos (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) + (h : 0 < gsRjj A qs j) (i : Fin m) : + qStep A qs j i = gsV A qs j i / gsRjj A qs j := by + simp only [qStep]; rw [if_pos (gtBool_true_iff.mpr h)] + +/-! ### The orthogonalization sum as a `Finset` sum -/ + +/-- The zip-fold defining `v` collapses to a single map-fold over the `Q` columns. -/ +theorem cross_fold_eq (qs : List (Fin m → ℝ)) (g : (Fin m → ℝ) → ℝ) (i : Fin m) (a : ℝ) : + (List.zip qs (qs.map g)).foldl (fun acc (qk, r) => acc + r * qk i) a + = a + (qs.map (fun qk => g qk * qk i)).sum := by + induction qs generalizing a with + | nil => simp + | cons x xs ih => + simp only [List.map_cons, List.zip_cons_cons, List.foldl_cons] + rw [ih]; simp only [List.sum_cons]; ring + +/-- Closed form of `v i`: `A i j` minus the partial projection sum. -/ +theorem gsV_eq (A : Fin m → Fin n → ℝ) (qs : List (Fin m → ℝ)) (j : Fin n) (i : Fin m) : + gsV A qs j i = gsA A j i - (qs.map (fun qk => Spec.dotFn qk (gsA A j) * qk i)).sum := by + unfold gsV gsRkjs + rw [cross_fold_eq qs (fun qk => Spec.dotFn qk (gsA A j)) i 0, zero_add] + +/-- Length of the `Q`-prefix list. -/ +theorem qsPrefix_length (A : Fin m → Fin n → ℝ) (j : Fin n) : (qsPrefix A j).length = j.val := by + unfold qsPrefix + rw [length_foldl_snoc (fun qs k => qStep A qs k), List.length_nil, Nat.zero_add, List.length_take, + List.length_finRange, Nat.min_eq_left (le_of_lt j.isLt)] + +/-- The `Q`-prefix is exactly the first `j` columns of the final factor `Q`. -/ +theorem qsPrefix_eq_map (A : Fin m → Fin n → ℝ) (j : Fin n) : + qsPrefix A j = ((List.finRange n).take j.val).map (fun k => Qcol A k) := by + have hjval : ((List.finRange n).take j.val).length = j.val := by + rw [List.length_take, List.length_finRange, Nat.min_eq_left (le_of_lt j.isLt)] + apply List.ext_getElem + · unfold qsPrefix + rw [length_foldl_snoc (fun qs k => qStep A qs k), List.length_nil, Nat.zero_add, + List.length_map] + · intro p h1 h2 + rw [List.length_map, hjval] at h2 + have hpn : p < n := lt_trans h2 j.isLt + rw [List.getElem_map] + have hidx : ((List.finRange n).take j.val)[p]'(by rw [hjval]; exact h2) = (⟨p, hpn⟩ : Fin n) := by + rw [List.getElem_take, List.getElem_finRange]; exact Fin.ext rfl + rw [show (qsPrefix A j)[p]'h1 = (qsPrefix A j).getD p (fun _ => 0) from + (List.getD_eq_getElem _ _ h1).symm] + unfold qsPrefix + rw [getD_foldl_snoc_read (fun qs k => qStep A qs k) (fun _ => 0) + ((List.finRange n).take j.val) p (by rw [hjval]; exact h2)] + rw [List.take_take, Nat.min_eq_left (le_of_lt h2), hidx] + funext r + rw [show Qcol A (⟨p, hpn⟩ : Fin n) r = Qmat A r ⟨p, hpn⟩ from rfl, Qmat_eq] + rfl + +/-- `getD` commutes with `dotFn`-mapping when the index is in range. -/ +theorem getD_map_dotFn (qs : List (Fin m → ℝ)) (a : Fin m → ℝ) (k : Nat) (hk : k < qs.length) : + (qs.map (fun qk => Spec.dotFn qk a)).getD k 0 = Spec.dotFn (qs.getD k (fun _ => 0)) a := by + rw [List.getD_eq_getElem _ _ (by rw [List.length_map]; exact hk), List.getElem_map, + List.getD_eq_getElem _ _ hk] + +/-- A `Q`-prefix entry equals the final `Q` column at that index. -/ +theorem qsPrefix_getD (A : Fin m → Fin n → ℝ) {k j : Fin n} (hkj : k.val < j.val) : + (qsPrefix A j).getD k.val (fun _ => 0) = Qcol A k := by + rw [qsPrefix_eq_map, + List.getD_eq_getElem _ _ (by rw [List.length_map, List.length_take, List.length_finRange, + Nat.min_eq_left (le_of_lt j.isLt)]; exact hkj), + List.getElem_map] + congr 1 + rw [List.getElem_take, List.getElem_finRange]; exact Fin.ext rfl + +/-- The strictly-above-diagonal `R` entry `R[k, j]` (`row k < column j`) is the inner product of +`Q` column `k` with column `j`. -/ +theorem Rmat_above_diag_dot (A : Fin m → Fin n → ℝ) {k j : Fin n} (hkj : k.val < j.val) : + Rmat A k j = Spec.dotFn (Qcol A k) (gsA A j) := by + rw [Rmat_eq]; simp only [rStep]; rw [if_pos hkj]; unfold gsRkjs + rw [getD_map_dotFn (qsPrefix A j) (gsA A j) k.val (by rw [qsPrefix_length]; exact hkj), + qsPrefix_getD A hkj] + +/-- The projection sum equals the masked partial sum `Σ_{k Spec.dotFn qk (gsA A j) * qk i)).sum + = ∑ k, if k.val < j.val then Rmat A k j * Qmat A i k else 0 := by + rw [qsPrefix_eq_map] + rw [List.map_map] + rw [take_map_sum_eq] + apply Finset.sum_congr rfl + intro k _ + by_cases hkj : k.val < j.val + · rw [if_pos hkj, if_pos hkj] + show Spec.dotFn (Qcol A k) (gsA A j) * Qmat A i k = Rmat A k j * Qmat A i k + rw [Rmat_above_diag_dot A hkj] + · rw [if_neg hkj, if_neg hkj] + +/-! ### Exact reconstruction `A = Q · R` -/ + +/-- `R` is upper-triangular: the entry `R[k, j]` strictly below the diagonal (`column j < row k`) +vanishes. -/ +theorem Rmat_upper_triangular (A : Fin m → Fin n → ℝ) {k j : Fin n} (hjk : j.val < k.val) : + Rmat A k j = 0 := by + rw [Rmat_eq]; exact rStep_below_diag_zero A (qsPrefix A j) hjk + +/-- **Per-entry QR reconstruction.** When every `R` pivot is positive (`0 < R[j,j]`, the full +column-rank success condition), `A[i,j] = Σ_k Q[i,k]·R[k,j]`. -/ +theorem qr_reconstruction (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) + (i : Fin m) (j : Fin n) : + A i j = ∑ k, Qmat A i k * Rmat A k j := by + have key : ∀ k : Fin n, Qmat A i k * Rmat A k j + = (if k.val < j.val then Qmat A i k * Rmat A k j else 0) + + (if k = j then Qmat A i j * Rmat A j j else 0) := by + intro k + rcases lt_trichotomy k.val j.val with h | h | h + · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h + rw [if_pos h, if_neg hne, add_zero] + · have hkj : k = j := Fin.ext h + rw [if_neg (by grind), if_pos hkj, zero_add, hkj] + · have hne : k ≠ j := fun hk => by rw [hk] at h; exact lt_irrefl _ h + rw [if_neg (by grind), if_neg hne, add_zero, Rmat_upper_triangular A h, mul_zero] + rw [show (∑ k, Qmat A i k * Rmat A k j) + = ∑ k, ((if k.val < j.val then Qmat A i k * Rmat A k j else 0) + + (if k = j then Qmat A i j * Rmat A j j else 0)) + from Finset.sum_congr rfl (fun k _ => key k), + Finset.sum_add_distrib, Finset.sum_ite_eq' Finset.univ j (fun _ => Qmat A i j * Rmat A j j)] + simp only [Finset.mem_univ, if_true] + have hρpos : 0 < gsRjj A (qsPrefix A j) j := by + have h := hrank j; rwa [Rmat_eq, rStep_diag] at h + have hdiag : Qmat A i j * Rmat A j j = gsV A (qsPrefix A j) j i := by + rw [Qmat_eq, qStep_pos A (qsPrefix A j) j hρpos, + show Rmat A j j = gsRjj A (qsPrefix A j) j from by rw [Rmat_eq]; exact rStep_diag _ _ j, + div_mul_eq_mul_div, mul_div_assoc, div_self (ne_of_gt hρpos), mul_one] + rw [hdiag, gsV_eq, cross_sum_qr, + show gsA A j i = A i j from rfl, + show (∑ k, if k.val < j.val then Qmat A i k * Rmat A k j else 0) + = (∑ k, if k.val < j.val then Rmat A k j * Qmat A i k else 0) + from Finset.sum_congr rfl (fun k _ => by + by_cases hkj : k.val < j.val + · rw [if_pos hkj, if_pos hkj, mul_comm] + · rw [if_neg hkj, if_neg hkj])] + ring + +/-- **Matrix-level QR reconstruction.** `A = Q · R` for the executable Gram–Schmidt factors, +under positive `R` pivots (full column rank). -/ +theorem qr_mul_eq (A : Fin m → Fin n → ℝ) (hrank : ∀ j : Fin n, 0 < Rmat A j j) : + Matrix.of A = Matrix.of (fun i k => Qmat A i k) * Matrix.of (fun k j => Rmat A k j) := by + ext i j + rw [Matrix.mul_apply] + simp only [Matrix.of_apply] + exact qr_reconstruction A hrank i j + +/-- **Tensor-level QR reconstruction.** For a tensor `A` whose `qrSpec` `R`-pivots are positive +(full column rank), every entry of `A` is reconstructed by `Q · R`: +`A[i,j] = Σ_k Q[i,k]·R[k,j]`, with `Q = qrQSpec A`, `R = qrRSpec A`. -/ +theorem qrSpec_reconstruction (A : Spec.Tensor ℝ (.dim m (.dim n .scalar))) + (hrank : ∀ j : Fin n, 0 < Spec.get2 (Spec.qrRSpec A) j j) (i : Fin m) (j : Fin n) : + Spec.get2 A i j + = ∑ k, Spec.get2 (Spec.qrQSpec A) i k * Spec.get2 (Spec.qrRSpec A) k j := by + have hQ : ∀ a b, Spec.get2 (Spec.qrQSpec A) a b = Qmat (Spec.toMatFn A) a b := fun _ _ => rfl + have hR : ∀ a b, Spec.get2 (Spec.qrRSpec A) a b = Rmat (Spec.toMatFn A) a b := fun _ _ => rfl + simp only [hQ, hR] + show Spec.toMatFn A i j = _ + exact qr_reconstruction (Spec.toMatFn A) (fun b => by rw [← hR b b]; exact hrank b) i j + +end QR + +end Spec.Factorization.Reconstruction diff --git a/NN/Spec/Core/Tensor/Factorizations.lean b/NN/Spec/Core/Tensor/Factorizations.lean new file mode 100644 index 0000000..f8477d2 --- /dev/null +++ b/NN/Spec/Core/Tensor/Factorizations.lean @@ -0,0 +1,367 @@ +/- +Copyright (c) 2026 TorchLean +Released under MIT license as described in the file LICENSE. +Authors: TorchLean Team +-/ + +module + +public import NN.Spec.Core.Tensor.Linalg +public import NN.Spec.Core.TensorReductionShape.LinearAlgebra + +/-! +# Matrix factorizations (spec layer) + +This file provides **real**, shape-indexed reference implementations of the two *exact, finite* +matrix factorizations that classical / scientific-ML models (Gaussian processes, kernel ridge +regression, PCA, least squares) depend on, and which were previously missing from the spec layer: + +- `choleskySpec` — Cholesky factorization `A = L · Lᵀ` (lower-triangular `L`), for SPD `A`. +- `qrSpec` — QR factorization `A = Q · R` via classical Gram–Schmidt + (`Q` has orthonormal columns, `R` upper-triangular). + +It also provides the linear solves that ride on the Cholesky factor: + +- `triSolveLowerFn` / `triSolveUpperFn` — forward / back triangular substitution; +- `cholSolveFn` — solve `A · x = b` from a Cholesky factor of `A`; +- `solveRidgeSpec` — the Tikhonov / kernel-ridge solve `(K + γ·I) · x = b`. + +## Verification scope + +The **verified** contribution is the factorizations: `choleskySpec` / `qrSpec` come with +reconstruction and structural theorems (`IsCholesky` / `IsQR`, lower- and upper-triangularity, +orthonormality) in `NN.Proofs.Tensor.Basic.Factorizations*`. The triangular- and ridge-solve +helpers above (`triSolveLowerFn`, `triSolveUpperFn`, `cholSolveFn`, `solveRidgeSpec`) are +**executable APIs only**: this PR does *not* yet prove their correctness (no +`triSolveLower · x = b` / `solveRidge` correctness theorem has landed). They are sound by +construction over the readable function representation and exercised by `#eval` examples, but +should not be read as carrying a verified-correctness guarantee. + +## Intent / tradeoffs + +Like the rest of the spec layer (`determinantSpec`, `inverseSpec`, `matMulSpec`), these prioritize +**mathematical clarity** and **shape safety** over performance, and are intended for small/medium +matrices and proof-oriented reference code. For large-scale numerics, use array-backed runtime +kernels. + +Internally the algorithms are written over the plain function representation +`Fin n → Fin n → α` (matrices) and `Fin n → α` (vectors), then wrapped back into `Spec.Tensor` +at the boundary. This keeps the numerical formulas readable and keeps later correctness proofs +working on ordinary functions rather than on nested `Tensor` `match`es. +-/ + +@[expose] public section + + +namespace Spec + +open Tensor + +variable {α : Type} [Context α] + +/-! ## Boundary conversions between `Spec.Tensor` and plain functions -/ + +/-- View a matrix tensor as a function `Fin m → Fin n → α`. -/ +def toMatFn {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) : Fin m → Fin n → α := + fun i j => get2 A i j + +/-- Build a matrix tensor from a function `Fin m → Fin n → α`. -/ +def ofMatFn {m n : Nat} (f : Fin m → Fin n → α) : Tensor α (.dim m (.dim n .scalar)) := + Tensor.dim (fun i => Tensor.dim (fun j => Tensor.scalar (f i j))) + +/-- View a vector tensor as a function `Fin n → α`. -/ +def toVecFn {n : Nat} (v : Tensor α (.dim n .scalar)) : Fin n → α := + fun i => Tensor.toScalar (get v i) + +/-- Build a vector tensor from a function `Fin n → α`. -/ +def ofVecFn {n : Nat} (f : Fin n → α) : Tensor α (.dim n .scalar) := + Tensor.dim (fun i => Tensor.scalar (f i)) + +/-! ## Small numeric helpers on the function representation -/ + +/-- Dot product of two length-`p` vectors. -/ +def dotFn {p : Nat} (u v : Fin p → α) : α := + (List.finRange p).foldl (fun s i => s + u i * v i) 0 + +/-- Euclidean norm of a length-`p` vector. -/ +def normFn {p : Nat} (v : Fin p → α) : α := + MathFunctions.sqrt (dotFn v v) + +/-! ## Cholesky factorization + +For a symmetric positive-definite `A`, compute the lower-triangular `L` with `A = L · Lᵀ`. + +The columns are computed left to right. Column `j` uses only columns `0 .. j-1`: + +- diagonal: `L[j,j] = sqrt(A[j,j] - Σ_{k j` +- above: `L[i,j] = 0` for `i < j` + +### Trust boundary: the `@[implemented_by]` performance hooks + +Several defs here (`choleskyColsFn`, `cholSolveFn`, `solveRidgeFn`) carry an `@[implemented_by …Impl]` +attribute. The clean closure form is what the correctness proofs reason about; the `…Impl` companion +is a strict, array-backed rewrite that the compiler runs instead, so `#eval` stays fast (the closure +form re-evaluates prefixes exponentially in the interpreter). + +**This substitution is *trusted*, not verified.** No Lean theorem proves `choleskyColsFn = choleskyColsImpl` +(etc.), so compiled `#eval`/runtime code executes the `…Impl` body while the proofs constrain only the +closure body. The two are believed equal by construction (they transcribe the same recurrence), and the +numeric examples in `NN/Examples/Factorization` are *evidence* the compiled path is correct — but they +exercise the `…Impl` replacement, not the proof body, and are not a substitute for an equivalence proof. +Anything proved about `choleskyFn`/`solveRidgeFn` therefore transfers to `#eval` output only modulo this +unverified hook. +-/ + +/-- +Strict, array-backed runtime implementation of `choleskyColsFn` (registered via `@[implemented_by]`). +Each column is *materialized* into an `Array α`, so a back-reference `L[i,k]` is an `O(1)` lookup +rather than a closure that re-evaluates the whole prefix. The closure form below is mathematically +clean (and is what the proofs reason about), but reading the full factor `L` from it re-evaluates +columns exponentially — ruinous in the interpreter (`#eval`). It is *intended* to compute the same +factor strictly; this equivalence is **trusted, not proved** (see the trust-boundary note above), with +the numeric examples (`A = L·Lᵀ`, the ridge-solve residual ≈ 0) as evidence rather than a proof. +-/ +def choleskyColsImpl {n : Nat} (A : Fin n → Fin n → α) : List (Fin n → α) := + let cols : Array (Array α) := (List.finRange n).foldl (fun cols j => + let jv := j.val + -- Σ_{k if k.val < jv then s + (cols.getD k.val #[]).getD jv 0 * (cols.getD k.val #[]).getD jv 0 + else s) 0 + let Ljj := MathFunctions.sqrt (A j j - sumsq) + let colArr : Array α := Array.ofFn (fun i : Fin n => + if i.val < jv then 0 + else if i.val == jv then Ljj + else + -- Σ_{k if k.val < jv then + acc + (cols.getD k.val #[]).getD i.val 0 * (cols.getD k.val #[]).getD jv 0 else acc) 0 + (A i j - s) / Ljj) + cols.push colArr) #[] + (List.finRange n).map (fun j => fun i => (cols.getD j.val #[]).getD i.val 0) + +/-- +The list of columns of the Cholesky factor `L`, as length-`n` vectors, computed left to right. +Element `j` of the result is column `j` of `L`. Built by a left fold so that when column `j` is +formed, `cols` already holds columns `0 .. j-1`. + +The runtime implementation is `choleskyColsImpl` (strict arrays); the closure form here is the one the +correctness proofs reason about. The two are intended to compute the same factor — trusted, not proved; +see the trust-boundary note above. +-/ +@[implemented_by choleskyColsImpl] +def choleskyColsFn {n : Nat} (A : Fin n → Fin n → α) : List (Fin n → α) := + (List.finRange n).foldl (fun cols j => + -- Σ_{k ck j)).foldl (fun s x => s + x * x) 0 + let Ljj := MathFunctions.sqrt (A j j - sumsq) + let colj : Fin n → α := fun i => + if i.val < j.val then 0 + else if i.val == j.val then Ljj + else + -- Σ_{k ck i * ck j)).foldl (fun acc x => acc + x) 0 + (A i j - s) / Ljj + cols ++ [colj]) [] + +/-- Cholesky factor as a function: `L[i,j] = (choleskyColsFn A)[j] i`. -/ +def choleskyFn {n : Nat} (A : Fin n → Fin n → α) : Fin n → Fin n → α := + let cols := choleskyColsFn A + fun i j => (cols.getD j.val (fun _ => 0)) i + +/-- +Cholesky factorization of a symmetric positive-definite matrix `A`, returning the +lower-triangular factor `L` with `A = L · Lᵀ`. + +PyTorch analogue: `torch.linalg.cholesky(A)`. +-/ +def choleskySpec {n : Nat} (A : Tensor α (.dim n (.dim n .scalar))) : + Tensor α (.dim n (.dim n .scalar)) := + ofMatFn (choleskyFn (toMatFn A)) + +/-! ## Triangular solves and the kernel-ridge (Tikhonov) linear solve + +Once `A` is factored as `A = L · Lᵀ` (Cholesky), the linear system `A · x = b` is solved by two +triangular substitutions: forward-solve `L · z = b`, then back-solve `Lᵀ · x = z`. Each substitution +visits the unknowns in an order such that, when row `i` is reached, every unknown it depends on has +already been computed; the accumulator `acc` holds those values and `0` everywhere else, so the dot +`dotFn (row i) acc` is exactly the required partial sum (the not-yet-solved and structurally-zero +terms drop out). -/ + +/-- Forward substitution: solve `L · y = b` for a lower-triangular `L` with nonzero diagonal. +Unknowns are visited `0, 1, …, n-1`; when row `i` is reached `acc` holds `y₀ … yᵢ₋₁` (and `0` +elsewhere), so `dotFn (L i) acc = Σ_{k Function.update acc i ((b i - dotFn (L i) acc) / L i i)) + (fun _ => 0) + +/-- Back substitution: solve `U · x = y` for an upper-triangular `U` with nonzero diagonal. +Unknowns are visited `n-1, …, 1, 0`; when row `i` is reached `acc` holds `xᵢ₊₁ … xₙ₋₁` (and `0` +elsewhere), so `dotFn (U i) acc = Σ_{k>i} U[i,k]·xₖ` by upper-triangularity. -/ +def triSolveUpperFn {n : Nat} (U : Fin n → Fin n → α) (y : Fin n → α) : Fin n → α := + (List.finRange n).reverse.foldl + (fun acc i => Function.update acc i ((y i - dotFn (U i) acc) / U i i)) + (fun _ => 0) + +/-- +Strict, array-backed runtime implementation of `cholSolveFn` (registered via `@[implemented_by]`). +It materializes `L` into a strict `Array (Array α)` once, then runs both triangular substitutions over +`Array`s, so a back-reference is an `O(1)` lookup. The closure form below (`triSolveUpperFn` over +`triSolveLowerFn`) is mathematically clean — and is what the correctness proofs reason about — but reads +the `Function.update` accumulator chain on every step, which is ruinous in the interpreter (`#eval`) when +`L` is itself an unmaterialized closure (e.g. `choleskyFn` of a kernel matrix). It is *intended* to +compute the same solution strictly; this equivalence is **trusted, not proved** (see the trust-boundary +note above), with the numeric examples (the ridge residual ≈ 0) as evidence rather than a proof. -/ +def cholSolveImpl {n : Nat} (L : Fin n → Fin n → α) (b : Fin n → α) : Fin n → α := + let La : Array (Array α) := Array.ofFn (fun i : Fin n => Array.ofFn (fun j : Fin n => L i j)) + let Lent : Nat → Nat → α := fun i j => (La.getD i #[]).getD j 0 + -- Forward solve `L · z = b`: `z[i] = (b[i] − Σ_{k + let iv := i.val + let s := (List.finRange n).foldl + (fun acc k => if k.val < iv then acc + Lent iv k.val * z.getD k.val 0 else acc) 0 + z.push ((b i - s) / Lent iv iv)) #[] + -- Back solve `Lᵀ · x = z`: `x[i] = (z[i] − Σ_{k>i} L[k,i]·x[k]) / L[i,i]`, `i = n−1 … 0`. + let x : Array α := (List.finRange n).reverse.foldl (fun xs i => + let iv := i.val + let s := (List.finRange n).foldl + (fun acc k => if iv < k.val then acc + Lent k.val iv * xs.getD k.val 0 else acc) 0 + xs.set! iv ((z.getD iv 0 - s) / Lent iv iv)) (Array.replicate n 0) + fun i => x.getD i.val 0 + +/-- Solve `A · x = b` given a Cholesky factor `L` of `A` (so `A = L · Lᵀ`): forward-solve +`L · z = b`, then back-solve `Lᵀ · x = z`. + +The runtime implementation is `cholSolveImpl` (strict arrays); the closure form here is what the +correctness proofs reason about. The two are intended to compute the same solution — trusted, not +proved; see the trust-boundary note above. -/ +@[implemented_by cholSolveImpl] +def cholSolveFn {n : Nat} (L : Fin n → Fin n → α) (b : Fin n → α) : Fin n → α := + triSolveUpperFn (fun i k => L k i) (triSolveLowerFn L b) + +/-- The regularized matrix `K + γ·I` as a function. For a symmetric PSD kernel `K` and `γ > 0` +this is symmetric positive-definite, so its Cholesky factorization succeeds. -/ +def addScaledIdFn {n : Nat} (K : Fin n → Fin n → α) (γ : α) : Fin n → Fin n → α := + fun i j => K i j + (if i = j then γ else 0) + +/-- +Strict, array-backed runtime implementation of `solveRidgeFn` (registered via `@[implemented_by]`). +It factors `K + γ·I = L·Lᵀ` and runs both triangular substitutions entirely over `Array`s, so no step +materializes the deep `Fin n → α` closures the functional definition builds — those re-evaluate +columns / the substitution accumulator exponentially, which is ruinous in the interpreter (`#eval`). +Intended to be the same linear solve; this equivalence is **trusted, not proved** (see the +trust-boundary note above), with the numeric examples (residual `(K+γ·I)·x − b ≈ 0`) as evidence +rather than a proof. +-/ +def solveRidgeImpl {n : Nat} (K : Fin n → Fin n → α) (γ : α) (b : Fin n → α) : Fin n → α := + let A : Fin n → Fin n → α := fun i j => K i j + (if i.val == j.val then γ else 0) + -- Cholesky columns, left to right: `cols[j][i] = L[i][j]` (strict arrays, `O(1)` back-reference). + let cols : Array (Array α) := (List.finRange n).foldl (fun cols j => + let jv := j.val + let sumsq := (List.finRange n).foldl + (fun s k => if k.val < jv then let v := (cols.getD k.val #[]).getD jv 0; s + v * v else s) 0 + let Ljj := MathFunctions.sqrt (A j j - sumsq) + cols.push (Array.ofFn (fun i : Fin n => + if i.val < jv then 0 + else if i.val == jv then Ljj + else + let s := (List.finRange n).foldl (fun acc k => + if k.val < jv then + acc + (cols.getD k.val #[]).getD i.val 0 * (cols.getD k.val #[]).getD jv 0 + else acc) 0 + (A i j - s) / Ljj))) #[] + let Lent : Nat → Nat → α := fun i j => (cols.getD j #[]).getD i 0 + -- Forward solve `L · z = b`: `z[i] = (b[i] − Σ_{k + let iv := i.val + let s := (List.finRange n).foldl + (fun acc k => if k.val < iv then acc + Lent iv k.val * z.getD k.val 0 else acc) 0 + z.push ((b i - s) / Lent iv iv)) #[] + -- Back solve `Lᵀ · x = z`: `x[i] = (z[i] − Σ_{k>i} L[k,i]·x[k]) / L[i,i]`, `i = n−1 … 0`. + let x : Array α := (List.finRange n).reverse.foldl (fun xs i => + let iv := i.val + let s := (List.finRange n).foldl + (fun acc k => if iv < k.val then acc + Lent k.val iv * xs.getD k.val 0 else acc) 0 + xs.set! iv ((z.getD iv 0 - s) / Lent iv iv)) (Array.replicate n 0) + fun i => x.getD i.val 0 + +/-- The Tikhonov-regularized (kernel-ridge) solve `(K + γ·I)·x = b`, via the Cholesky factorization +of `K + γ·I`. + +The runtime implementation is `solveRidgeImpl` (strict arrays); the closure form here, built from the +`choleskyFn` / `triSolve*` pieces the correctness proofs reason about. The two are intended to compute +the same solution — trusted, not proved; see the trust-boundary note above. -/ +@[implemented_by solveRidgeImpl] +def solveRidgeFn {n : Nat} (K : Fin n → Fin n → α) (γ : α) (b : Fin n → α) : Fin n → α := + cholSolveFn (choleskyFn (addScaledIdFn K γ)) b + +/-- Tensor-level kernel-ridge solve: `(K + γ·I)·x = b`. + +PyTorch analogue: `torch.linalg.solve(K + γ·I, b)` (specialized to the SPD Cholesky path). -/ +def solveRidgeSpec {n : Nat} (K : Tensor α (.dim n (.dim n .scalar))) (γ : α) + (b : Tensor α (.dim n .scalar)) : Tensor α (.dim n .scalar) := + ofVecFn (solveRidgeFn (toMatFn K) γ (toVecFn b)) + +/-! ## QR factorization (classical Gram–Schmidt) + +For `A : m × n`, produce `Q : m × n` with orthonormal columns and `R : n × n` upper-triangular +such that `A = Q · R`. This uses **classical** Gram–Schmidt: each `r[k,j] = qₖ · aⱼ` is the inner +product against the *original* column `aⱼ`, and all projections are subtracted in a single pass +(modified Gram–Schmidt would instead dot each `qₖ` against the running residual). In exact real +arithmetic the two coincide; the classical form is what the recurrence below implements and what +the reconstruction proof matches. +-/ + +/-- Internal state for the Gram–Schmidt fold: computed `Q` columns and `R` columns so far. -/ +structure GSState (m n : Nat) (α : Type) where + /-- Orthonormal `Q` columns produced so far (each of length `m`). -/ + qs : List (Fin m → α) + /-- `R` columns produced so far (each of length `n`, upper-triangular). -/ + rcols : List (Fin n → α) + +/-- +Run classical Gram–Schmidt over the columns of `A`, returning the `Q` columns and `R` columns. +Column `j` is orthogonalized against the previously produced `Q` columns. +-/ +def gramSchmidtFn {m n : Nat} (A : Fin m → Fin n → α) : GSState m n α := + (List.finRange n).foldl (fun (st : GSState m n α) j => + let a : Fin m → α := fun i => A i j + -- r[k,j] = qₖ · a for each previously computed column k + let rkjs : List α := st.qs.map (fun qk => dotFn qk a) + -- v = a - Σ r[k,j] qₖ + let v : Fin m → α := fun i => + a i - (List.zip st.qs rkjs).foldl (fun acc (qk, r) => acc + r * qk i) 0 + let rjj := normFn v + let qj : Fin m → α := fun i => if Context.gtBool rjj 0 then v i / rjj else 0 + let rcolj : Fin n → α := fun k => + if k.val < j.val then rkjs.getD k.val 0 + else if k.val == j.val then rjj + else 0 + { qs := st.qs ++ [qj], rcols := st.rcols ++ [rcolj] }) { qs := [], rcols := [] } + +/-- The `Q` factor (orthonormal columns) of the QR factorization of `A`. -/ +def qrQSpec {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) : + Tensor α (.dim m (.dim n .scalar)) := + let st := gramSchmidtFn (toMatFn A) + ofMatFn (fun i j => (st.qs.getD j.val (fun _ => 0)) i) + +/-- The `R` factor (upper-triangular) of the QR factorization of `A`. -/ +def qrRSpec {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) : + Tensor α (.dim n (.dim n .scalar)) := + let st := gramSchmidtFn (toMatFn A) + ofMatFn (fun k j => (st.rcols.getD j.val (fun _ => 0)) k) + +/-- +QR factorization of `A : m × n` via classical Gram–Schmidt, returning `(Q, R)` with +`A = Q · R`, `Q` orthonormal columns, `R` upper-triangular. + +PyTorch analogue: `torch.linalg.qr(A)`. +-/ +def qrSpec {m n : Nat} (A : Tensor α (.dim m (.dim n .scalar))) : + Tensor α (.dim m (.dim n .scalar)) × Tensor α (.dim n (.dim n .scalar)) := + (qrQSpec A, qrRSpec A) + +end Spec diff --git a/blueprint/TorchLeanBlueprint/Guide.lean b/blueprint/TorchLeanBlueprint/Guide.lean index 65f239f..4588fd3 100644 --- a/blueprint/TorchLeanBlueprint/Guide.lean +++ b/blueprint/TorchLeanBlueprint/Guide.lean @@ -33,6 +33,7 @@ import TorchLeanBlueprint.Guide.Ch4_Verification.ApproximationTheory import TorchLeanBlueprint.Guide.Ch4_Verification.ClassicalMLProofs import TorchLeanBlueprint.Guide.Ch4_Verification.ProbabilityAndGradients import TorchLeanBlueprint.Guide.Ch4_Verification.ScientificMLVerification +import TorchLeanBlueprint.Guide.Ch4_Verification.FactorizationsCholeskyQR import TorchLeanBlueprint.Guide.Ch4_Verification.Certificates import TorchLeanBlueprint.Guide.Ch4_Verification.FP32Soundness import TorchLeanBlueprint.Guide.Ch4_Verification.TwoStageWorkflows @@ -231,6 +232,8 @@ into precise mathematical statements. {include 2 TorchLeanBlueprint.Guide.Ch4_Verification.ScientificMLVerification} +{include 2 TorchLeanBlueprint.Guide.Ch4_Verification.FactorizationsCholeskyQR} + {include 2 TorchLeanBlueprint.Guide.Ch4_Verification.Certificates} {include 2 TorchLeanBlueprint.Guide.Ch4_Verification.TwoStageWorkflows} diff --git a/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/FactorizationsCholeskyQR.lean b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/FactorizationsCholeskyQR.lean new file mode 100644 index 0000000..4d9c3f8 --- /dev/null +++ b/blueprint/TorchLeanBlueprint/Guide/Ch4_Verification/FactorizationsCholeskyQR.lean @@ -0,0 +1,82 @@ +import VersoManual + +open Verso.Genre Manual + +#doc (Manual) "Matrix Factorizations: Cholesky and QR" => +%%% +tag := "factorizations-cholesky-qr" +%%% + +Classical and scientific-ML models — Gaussian processes, kernel-ridge regression, PCA, least squares — +rest on a handful of matrix factorizations that were missing from the spec layer. This chapter adds the +two *finite, exact* ones: the Cholesky factorization `A = L·Lᵀ` of an SPD matrix and the QR +factorization `A = Q·R` by classical Gram–Schmidt. Both are *direct* algorithms — no iteration, no +convergence caveat — so their correctness is an *exact identity*, not an a-posteriori bound. + +# Specifications + +Each factorization is given a `Prop`-level meaning over real matrices, independent of any particular +algorithm: + +- `IsCholesky A L` — `L` is lower-triangular and `A = L·Lᵀ`; +- `IsQR A Q R` — `Q` has orthonormal columns (`Qᵀ·Q = 1`), `R` is upper-triangular, and `A = Q·R`. + +The executable specs `choleskySpec` / `qrSpec` (over the readable `Fin n → Fin n → α` function +representation, wrapped back into `Spec.Tensor` at the boundary) are then proved to *produce* objects +satisfying these predicates. + +# Exact Cholesky reconstruction + +`choleskyFn` builds `L` one column at a time by a left fold. Two structural facts are proved directly +from that fold. First, *lower-triangularity*: the entry strictly above the diagonal is forced to `0` by +construction (`choleskyFn_lower_triangular`, lifted to the tensor level as +`choleskySpec_lower_triangular`). Second, *reconstruction*: the theorem `isCholesky_of_pos` assumes the +algorithm's success condition directly as a hypothesis — that every *executable pivot* is positive, +`0 < choleskyFn A j j` — and under it the fold satisfies + +$$`A = L\,L^{\top},` + +i.e. `IsCholesky A (choleskyFn A)`. This is exact over `ℝ`; the only hypothesis is positivity of the +executable pivots, which is exactly the condition under which Cholesky succeeds over `ℝ`. (The +mathematical fact that an SPD `A` — `Matrix.PosDef` — yields positive executable pivots is the expected +sufficient condition, but the reduction `PosDef A → ∀ j, 0 < choleskyFn A j j` is *not* formalized here; +the theorem takes the positive-pivot hypothesis as given.) + +# Exact QR reconstruction and orthonormality + +`qrSpec` runs classical Gram–Schmidt. Bridging the executable column fold to Mathlib's `gramSchmidt` +gives the two QR guarantees, both exact under a full-column-rank hypothesis (each `R` diagonal entry +positive): + +$$`Q^{\top} Q = 1 \qquad\text{and}\qquad A = Q\,R, \quad R \text{ upper-triangular}.` + +Orthonormality (`qrSpec_orthonormal` / `QT_mul_Q_eq_one`) follows because each Gram–Schmidt column is the +normalization of a vector orthogonal to the span of its predecessors; reconstruction follows by +re-expanding that span. Together they give `IsQR A Q R`. + +# Honest scope + +Everything in this chapter is an *exact finite identity*: there is no sweep count, no residual, no +asymptotic limit. Three layers are kept distinct, and only the first is a verified claim: + +- *Proved specs* (over `ℝ`): the predicates `IsCholesky` / `IsQR`, together with reconstruction, + triangularity, and orthonormality, each derived from the executable column fold. +- *Executable examples* (over `Float`): concrete witnesses with residual checks — evidence that the + definitions run and reconstruct, not proofs about floating-point arithmetic. +- *Trusted runtime hooks*: the strict-array `@[implemented_by]` replacements are runtime substitutions + used for fast evaluation; they are *not* proved equal to the clean proof definitions. + +The formal Cholesky hypothesis is *positivity of the executable pivots*, `∀ j, 0 < choleskyFn A j j` — +*not* SPD. `Matrix.PosDef A` is the expected sufficient condition for those pivots to be positive, but +the implication `PosDef A → ∀ j, 0 < choleskyFn A j j` is *not* formalized in this PR. QR likewise +assumes positive executable `R`-pivots (full column rank), not a separately proved rank hypothesis. Under +those pivot hypotheses no goal in the chapter is left unproved. The triangular- and ridge-solve helpers +that ride on the Cholesky factor are shipped as executable APIs only; their correctness theorems are not +part of this PR. + +# Executable witnesses + +`NN.Examples.Factorization.Cholesky` and `…QR` exhibit each factorization on a concrete matrix: a +positive reconstruction check (`‖A − L·Lᵀ‖`, `‖A − Q·R‖`, `‖Qᵀ·Q − I‖` all at machine zero) paired with a +negative control, every check a `#eval` over `Float`, with no unproved goals, green on +`lake build NN.Examples.Factorization`.