A small C library (with Python bindings) that accelerates fixed-point
iterations x ← F(x) using Anderson Acceleration — Type-I or Type-II,
with optional relaxation and a built-in safeguarding step. Useful whenever
you have a contraction or nonexpansive map — gradient descent, proximal
algorithms, operator-splitting solvers (ADMM / PDHG), fixed-point
optimization — and want to converge in fewer iterations without changing
the underlying map.
The algorithm and its theoretical guarantees are described in Globally Convergent Type-I Anderson Acceleration for Non-Smooth Fixed-Point Iterations. The MATLAB code used for the experiments in that paper lives at cvxgrp/nonexp_global_aa1.
At every iteration the library looks back at the last mem iterates and
solves a small least-squares problem to pick a linear combination that
should drive x − F(x) toward zero faster than plain x ← F(x). You keep
calling your own map F; AA only decides what point to feed it next. A
built-in safeguard rejects AA steps that don't make progress, falling back
to the underlying iteration so convergence is preserved even when AA
misbehaves.
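Concretely, the Type-II update can be sketched in a few lines of NumPy. This is an illustrative reimplementation of the idea, not the library's internals; it omits regularization, relaxation, and safeguarding:

```python
import numpy as np

def aa2_step(X, G):
    """One (unregularized) Type-II Anderson step.

    X: (dim, m+1) array whose columns are the last m+1 iterates x_j.
    G: (dim, m+1) array of matching residuals g_j = F(x_j) - x_j.
    Returns the extrapolated next point.
    """
    S = np.diff(X, axis=1)                 # successive iterate differences
    Y = np.diff(G, axis=1)                 # successive residual differences
    g = G[:, -1]                           # newest residual
    gamma, *_ = np.linalg.lstsq(Y, g, rcond=None)  # small least-squares solve
    return X[:, -1] + g - (S + Y) @ gamma  # extrapolated point
```

For an affine map F(x) = Ax + b with enough independent history, this step lands on the fixed point exactly, which is why AA can collapse many plain iterations into a few.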
The standard usage pattern is:
```
for i = 0, 1, 2, ...
    if i > 0: aa_apply(x, x_prev)   # replaces x with the AA extrapolate
    x_prev = x
    x = F(x)                        # your map — unchanged
    aa_safeguard(x, x_prev)         # accept or roll back
```
```sh
pip install anderson-acceleration
```

The wheel is linked against an optimized BLAS/LAPACK on every platform: OpenBLAS is bundled into the wheel on Linux and Windows, and Apple's Accelerate framework (which already ships with the OS) is used on macOS, so you don't need a system BLAS installed.
Requires a C compiler and any BLAS/LAPACK (reference BLAS, OpenBLAS, MKL, Accelerate, ...).
```sh
make                                        # default: -lblas -llapack
make LDLIBS="-framework Accelerate"         # macOS, Apple Accelerate
make LDLIBS="-lopenblas"                    # OpenBLAS (bundles LAPACK)
make LDLIBS="-lmkl_rt -lpthread -lm -ldl"   # Intel MKL
make test                                   # run the test suite
out/gd                                      # run the GD+AA example
```

This produces `out/libaa.a` (static library) and `out/gd` (example binary).
Minimize a convex quadratic ½ x'Qx − q'x by gradient descent, accelerated
with AA:
```python
import numpy as np
import aa

dim, mem, N = 100, 10, 1000
rng = np.random.default_rng(0)
Qh = rng.standard_normal((dim, dim)) / np.sqrt(dim)
Q = Qh.T @ Qh + 1e-3 * np.eye(dim)
q = rng.standard_normal(dim)

eigs = np.linalg.eigvalsh(Q)
step = 2.0 / (eigs.min() + eigs.max())  # optimal GD step for a quadratic
x_star = np.linalg.solve(Q, q)          # true optimum, for error measurement
f = lambda x: 0.5 * x @ Q @ x - q @ x   # objective
f_star = f(x_star)

acc = aa.AndersonAccelerator(dim, mem, type1=False, regularization=1e-12)
x = rng.standard_normal(dim)
x_prev = x.copy()
for i in range(N):
    if i > 0:
        _ = acc.apply(x, x_prev)     # in-place: overwrites x with AA extrapolate
    x_prev = x.copy()
    x = x - step * (Q @ x_prev - q)  # your map F — gradient step
    _ = acc.safeguard(x, x_prev)     # rolls back if AA didn't help
    print(f"iter {i:4d}  f(x) - f* = {f(x) - f_star:.3e}")
```

Convergence on this problem for vanilla GD vs AA-accelerated GD (Type-I and Type-II, both with mem=10):
Type-II converges smoothly; Type-I is more aggressive and makes plateau-style
progress as the safeguard rejects-then-accepts steps. Both beat vanilla GD by
several orders of magnitude in the same number of iterations. The plot is generated by `python/plot_convergence.py`; a fuller example that sweeps memory sizes is in `python/example.py`. Running these Python examples requires matplotlib (`pip install matplotlib`).
```c
#include "aa.h"

AaWork *a = aa_init(n,     /* dim */
                    10,    /* mem */
                    10,    /* min_len */
                    1,     /* type1 */
                    1e-8,  /* regularization */
                    1.0,   /* relaxation */
                    2.0,   /* safeguard_factor */
                    1e10,  /* max_weight_norm */
                    5,     /* ir_max_steps */
                    0);    /* verbosity */
for (int i = 0; i < N; i++) {
  if (i > 0) aa_apply(x, x_prev, a);
  memcpy(x_prev, x, sizeof(aa_float) * n);
  F(x);                         /* your in-place map */
  aa_safeguard(x, x_prev, a);
}
aa_finish(a);
```

See `tests/c/gd.c` for a complete runnable example (gradient descent on a random convex quadratic).
| Parameter | Meaning | Typical value |
|---|---|---|
| `dim` | Problem dimension | your variable size |
| `mem` | Number of past iterates to look back over | 5 – 20 |
| `min_len` | Minimum buffered residual pairs before AA begins extrapolating. `min_len = mem` waits for the memory to fill (stable default); `min_len = 1` starts extrapolating immediately. Must be ≥ 1 when `mem > 0`; clamped down when it exceeds `min(mem, dim)`. | `mem` |
| `type1` | Type-I if true, Type-II otherwise | see notes below |
| `regularization` | Tikhonov regularization on the AA least-squares system. `> 0`: scaled by ‖A‖_F·‖Y‖_F. `< 0`: uses the absolute value `-regularization` directly (no scaling). `= 0`: off. | Type-I: 1e-8, Type-II: 1e-12 |
| `relaxation` | Mixing parameter in [0, 2]; 1.0 is vanilla AA | 1.0 |
| `safeguard_factor` | Multiplier on the residual-growth ratio beyond which the AA step is rejected. Larger values are more permissive (fewer rejections). | 2.0 |
| `max_weight_norm` | Upper bound on the norm of the AA combination weights; rejects numerically unstable steps | 1e6 – 1e10 |
| `ir_max_steps` | Cap on iterative-refinement passes for the weight solve. The loop stops early when refinement stalls, so this is an upper bound; raise for ill-conditioned problems, lower for tighter cost bounds. | 5 |
| `verbosity` | 0 silent; higher values print progress and diagnostics | 0 |
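The safeguard itself is a simple residual test. A minimal sketch of the idea (our own simplification in plain NumPy; the library tracks the last-known-good state internally and reverts the arrays for you):

```python
import numpy as np

def safeguarded_accept(f_new, x_new, best_residual, factor=2.0):
    """Accept the candidate iterate only if its fixed-point residual
    ||f_new - x_new|| is at most factor * best_residual; otherwise
    signal a rollback to the plain iteration.

    Returns (accepted, updated best residual).
    """
    res = np.linalg.norm(f_new - x_new)
    if res <= factor * best_residual:
        return True, min(res, best_residual)   # keep the AA step
    return False, best_residual                # roll back
```

With a factor of 2.0, an AA step may at most double the best residual seen so far before being thrown away, which leaves room for the non-monotone progress Type-I tends to make.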
**Type-I vs Type-II.** Type-I often makes faster progress on well-conditioned problems but can be sensitive; Type-II is more robust. If one fails, try the other. Both tolerate nonsmooth F thanks to the safeguard, though convergence guarantees in that regime are stronger for Type-I (see the paper).
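The two types differ only in how the combination weights are solved for. A schematic comparison (illustrative NumPy, not the library's regularized, iteratively refined solver), where S and Y stack the last few iterate and residual differences column-wise:

```python
import numpy as np

def aa_weights(S, Y, g, type1):
    """Weights gamma for the AA combination, given difference matrices
    S (iterates) and Y (residuals) plus the newest residual g."""
    if type1:
        # Type-I: square system S^T Y gamma = S^T g (multisecant,
        # Broyden-like; can be sharper but less well-conditioned)
        return np.linalg.solve(S.T @ Y, S.T @ g)
    # Type-II: least-squares min ||g - Y gamma|| (more robust)
    gamma, *_ = np.linalg.lstsq(Y, g, rcond=None)
    return gamma
```

When S and Y are square and well-conditioned the two coincide; they diverge, and Type-I gets riskier, as the system nears singularity, which is exactly what `regularization`, `max_weight_norm`, and the safeguard are there to contain.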
```python
aa.AndersonAccelerator(
    dim,
    mem,
    *,
    min_len=None,          # defaults to min(mem, dim)
    type1=False,
    regularization=1e-12,
    relaxation=1.0,
    safeguard_factor=1.0,
    max_weight_norm=1e6,
    ir_max_steps=5,
    verbosity=0,
)
```

All options after `mem` are keyword-only. Array arguments must be C-contiguous, writeable float64 numpy arrays of length `dim`.
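If your iterates come from code that may hand you views, float32 data, or Fortran-ordered arrays, it's worth normalizing them before calling into the bindings. A small helper (plain NumPy; the function name is ours):

```python
import numpy as np

def as_aa_array(v):
    """Return v as a C-contiguous, writeable float64 array, copying
    only when the input doesn't already satisfy those requirements."""
    return np.require(v, dtype=np.float64, requirements=["C", "W"])
```

`np.require` is a no-op (no copy) when the array already has the right dtype and flags, so it is cheap to call unconditionally.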
| Method | Description |
|---|---|
| `apply(f, x)` | Call once per iteration (skip the first). `f` holds the most recent map output F(x). Overwrites `f` in place with the AA-extrapolated point. |
| `safeguard(f_new, x_new)` | Call after running your map on the AA extrapolate. If AA did not make progress, reverts both arrays to the last-known-good state. Returns 0 on accept, -1 on reject. |
| `reset()` | Clears AA state (equivalent to re-initializing) without reallocating. Lifetime stats counters are NOT cleared. |
| `stats` | Read-only property returning a dict of lifetime counters: `iter`, `n_accept`, `n_reject_lapack`, `n_reject_rank0`, `n_reject_nonfinite`, `n_reject_weight_cap`, `n_safeguard_reject`, `last_rank`, `last_aa_norm` (NaN until the first solve), `last_regularization`. Useful for diagnosing when AA isn't helping — rising `n_reject_weight_cap` or `n_reject_nonfinite` points at `max_weight_norm` / `regularization` tuning; rising `n_safeguard_reject` points at `safeguard_factor` / `mem`; `n_reject_rank0` is normal near convergence (memory is numerically zero). |
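As a sketch of how those counters can be turned into actionable hints, here is a small heuristic of our own that operates on the documented keys:

```python
def diagnose(stats):
    """Map lifetime counters (as returned by the `stats` property)
    to tuning hints; empty list means nothing looks off."""
    hints = []
    if stats["n_reject_weight_cap"] or stats["n_reject_nonfinite"]:
        hints.append("raise regularization or lower max_weight_norm")
    if stats["n_safeguard_reject"] > stats["n_accept"]:
        hints.append("try the other AA type, a smaller mem, "
                     "or a larger safeguard_factor")
    return hints
```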
See `include/aa.h` for the full interface, which mirrors the Python API exactly:
```c
AaWork *aa_init(aa_int dim, aa_int mem, aa_int min_len, aa_int type1,
                aa_float regularization, aa_float relaxation,
                aa_float safeguard_factor, aa_float max_weight_norm,
                aa_int ir_max_steps, aa_int verbosity);
aa_float aa_apply(aa_float *f, const aa_float *x, AaWork *a);
aa_int aa_safeguard(aa_float *f_new, aa_float *x_new, AaWork *a);
void aa_reset(AaWork *a);
void aa_finish(AaWork *a);
AaStats aa_get_stats(const AaWork *a);
```

`aa_apply` returns the (signed) norm of the AA weight vector: positive means the step was taken, negative means it was rejected (and `f` is left unchanged).
Defaults: `aa_float` = `double`, `aa_int` = `int`, BLAS integers are `int` with a trailing underscore on symbol names (e.g. `dgemv_`). To change these, compile with:

- `-DSFLOAT` — use single-precision `float` throughout.
- `-DBLAS64` — 64-bit BLAS integers (`int64_t`).
- `-DNOBLASSUFFIX` — no trailing underscore on BLAS symbols.
- `-DBLASSUFFIX=...` — a different suffix.
```sh
python -m pip install --upgrade pip
pip install cython numpy
pip install -e .
python python/example.py
```

The bindings `#include` the C source directly, so no separate library build is needed.
If you use this library in academic work, please cite:
```bibtex
@article{zhang2020globally,
  title   = {Globally convergent type-{I} {A}nderson acceleration for nonsmooth fixed-point iterations},
  author  = {Zhang, Junzi and O'Donoghue, Brendan and Boyd, Stephen},
  journal = {SIAM Journal on Optimization},
  volume  = {30},
  number  = {4},
  pages   = {3170--3197},
  year    = {2020}
}
```

MIT — see `LICENSE.txt`.
