cvxgrp/aa
aa — Anderson Acceleration


A small C library (with Python bindings) that accelerates fixed-point iterations x ← F(x) using Anderson Acceleration — Type-I or Type-II, with optional relaxation and a built-in safeguarding step. Useful whenever you have a contraction or nonexpansive map — gradient descent, proximal algorithms, operator-splitting solvers (ADMM / PDHG), fixed-point optimization — and want to converge in fewer iterations without changing the underlying map.

The algorithm and its theoretical guarantees are described in Globally Convergent Type-I Anderson Acceleration for Non-Smooth Fixed-Point Iterations. The MATLAB code used for the experiments in that paper lives at cvxgrp/nonexp_global_aa1.

How it works

At every iteration the library looks back at the last mem iterates and solves a small least-squares problem to pick a linear combination that should drive x − F(x) toward zero faster than plain x ← F(x). You keep calling your own map F; AA only decides what point to feed it next. A built-in safeguard rejects AA steps that don't make progress, falling back to the underlying iteration so convergence is preserved even when AA misbehaves.
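In its Type-II form, the least-squares step can be written as follows (a standard formulation for exposition; the library layers regularization, relaxation, and safeguarding on top of it):

```latex
r_k = F(x_k) - x_k, \qquad
\gamma_k = \operatorname*{argmin}_{\gamma \in \mathbf{R}^{m}}
           \bigl\| r_k - \Delta R_k \, \gamma \bigr\|_2, \qquad
x_{k+1} = F(x_k) - \Delta G_k \, \gamma_k,
```

where the columns of ΔR_k and ΔG_k hold the last m differences of successive residuals r_j and map outputs F(x_j). Type-I instead builds a multisecant approximation of the Jacobian of the residual map, rather than of its inverse.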

The standard usage pattern is:

for i = 0, 1, 2, ...
    if i > 0: aa_apply(x, x_prev)    # replaces x with AA extrapolate
    x_prev = x
    x = F(x)                         # your map — unchanged
    aa_safeguard(x, x_prev)          # accept or roll back

Install

Python

pip install anderson-acceleration

The wheel is linked against an optimized BLAS/LAPACK on every platform: OpenBLAS is bundled into the wheel on Linux and Windows, and Apple's Accelerate framework (which ships with the OS) is used on macOS, so you don't need a system BLAS installed.

C, from source

Requires a C compiler and any BLAS/LAPACK (reference BLAS, OpenBLAS, MKL, Accelerate, ...).

make                                           # default: -lblas -llapack
make LDLIBS="-framework Accelerate"            # macOS, Apple Accelerate
make LDLIBS="-lopenblas"                       # OpenBLAS (bundles LAPACK)
make LDLIBS="-lmkl_rt -lpthread -lm -ldl"      # Intel MKL
make test                                      # run the test suite
out/gd                                         # run the GD+AA example

This produces out/libaa.a (static library) and out/gd (example binary).

Quickstart (Python)

Minimize a convex quadratic ½ x'Qx − q'x by gradient descent, accelerated with AA:

import numpy as np
import aa

dim, mem, N = 100, 10, 1000
rng = np.random.default_rng(0)
Qh = rng.standard_normal((dim, dim)) / np.sqrt(dim)
Q  = Qh.T @ Qh + 1e-3 * np.eye(dim)
q  = rng.standard_normal(dim)
eigs = np.linalg.eigvalsh(Q)
step = 2.0 / (eigs.min() + eigs.max())  # optimal GD step for a quadratic
x_star = np.linalg.solve(Q, q)             # true optimum, for error measurement
f = lambda x: 0.5 * x @ Q @ x - q @ x      # objective
f_star = f(x_star)

acc = aa.AndersonAccelerator(dim, mem, type1=False, regularization=1e-12)

x = rng.standard_normal(dim)
x_prev = x.copy()
for i in range(N):
    if i > 0:
        _ = acc.apply(x, x_prev)           # in-place: overwrites x with AA extrapolate
    x_prev = x.copy()
    x = x - step * (Q @ x_prev - q)        # your map F — gradient step
    _ = acc.safeguard(x, x_prev)           # rolls back if AA didn't help
    print(f"iter {i:4d}  f(x) - f* = {f(x) - f_star:.3e}")

Convergence on this problem for vanilla GD vs AA-accelerated GD (Type-I and Type-II, both with mem=10):

[Plot: f(x) − f* vs iteration for vanilla GD and AA-accelerated GD, Type-I and Type-II]

Type-II converges smoothly; Type-I is more aggressive and makes plateau-style progress as the safeguard alternately rejects and accepts steps. Both beat vanilla GD by several orders of magnitude in the same number of iterations. The plot is generated by python/plot_convergence.py; a fuller example that sweeps memory sizes is in python/example.py. Running these Python examples requires matplotlib (pip install matplotlib).

Quickstart (C)

#include "aa.h"

AaWork *a = aa_init(n,     /* dim              */
                    10,    /* mem              */
                    10,    /* min_len          */
                    1,     /* type1            */
                    1e-8,  /* regularization   */
                    1.0,   /* relaxation       */
                    2.0,   /* safeguard_factor */
                    1e10,  /* max_weight_norm  */
                    5,     /* ir_max_steps     */
                    0);    /* verbosity        */

for (int i = 0; i < N; i++) {
    if (i > 0) aa_apply(x, x_prev, a);
    memcpy(x_prev, x, sizeof(aa_float) * n);
    F(x);                          /* your in-place map */
    aa_safeguard(x, x_prev, a);
}

aa_finish(a);

See tests/c/gd.c for a complete runnable example (gradient descent on a random convex quadratic).

Parameters

  • dim — problem dimension. Typical: your variable size.
  • mem — number of past iterates to look back on. Typical: 5 – 20.
  • min_len — minimum number of buffered residual pairs before AA begins extrapolating. min_len = mem waits for the memory to fill (stable default); min_len = 1 starts extrapolating immediately. Must be ≥ 1 when mem > 0; clamped down when it exceeds min(mem, dim). Typical: mem.
  • type1 — Type-I if true, Type-II otherwise. See notes below.
  • regularization — Tikhonov regularization on the AA least-squares system. > 0: scaled by ‖A‖_F·‖Y‖_F. < 0: the absolute value -regularization is used, unscaled. = 0: off. Typical: 1e-8 (Type-I), 1e-12 (Type-II).
  • relaxation — mixing parameter in [0, 2]; 1.0 is vanilla AA. Typical: 1.0.
  • safeguard_factor — multiplier on the residual-growth ratio beyond which the AA step is rejected; larger values let AA take more aggressive steps before rejection. Typical: 2.0.
  • max_weight_norm — upper bound on the norm of the AA combination weights; steps exceeding it are rejected as numerically unstable. Typical: 1e6 – 1e10.
  • ir_max_steps — cap on iterative-refinement passes for the weight solve. The loop stops early when refinement stalls, so this is an upper bound; raise for ill-conditioned problems, lower for tighter cost bounds. Typical: 5.
  • verbosity — 0 is silent; higher values print progress and diagnostics. Typical: 0.

Type-I vs Type-II. Type-I often makes faster progress on well-conditioned problems but can be sensitive; Type-II is more robust. If one fails, try the other. Both tolerate nonsmooth F thanks to the safeguard, though convergence guarantees in that regime are stronger for Type-I (see the paper).

Python API

aa.AndersonAccelerator(
    dim,
    mem,
    *,
    min_len=None,          # defaults to min(mem, dim)
    type1=False,
    regularization=1e-12,
    relaxation=1.0,
    safeguard_factor=1.0,
    max_weight_norm=1e6,
    ir_max_steps=5,
    verbosity=0,
)

All options after mem are keyword-only. Array arguments must be C-contiguous, writeable float64 numpy arrays of length dim.

  • apply(f, x) — call once per iteration (skip the first). f holds the most recent map output F(x); it is overwritten in place with the AA-extrapolated point.
  • safeguard(f_new, x_new) — call after running your map on the AA extrapolate. If AA did not make progress, reverts both arrays to the last-known-good state. Returns 0 on accept, -1 on reject.
  • reset() — clears AA state (equivalent to re-initializing) without reallocating. Lifetime stats counters are NOT cleared.
  • stats — read-only property returning a dict of lifetime counters: iter, n_accept, n_reject_lapack, n_reject_rank0, n_reject_nonfinite, n_reject_weight_cap, n_safeguard_reject, last_rank, last_aa_norm (NaN until the first solve), last_regularization. Useful for diagnosing when AA isn't helping — rising n_reject_weight_cap or n_reject_nonfinite points at max_weight_norm / regularization tuning; rising n_safeguard_reject points at safeguard_factor / mem; n_reject_rank0 is normal near convergence (memory is numerically zero).

C API

See include/aa.h for the full interface, which mirrors the Python API exactly:

AaWork *aa_init(aa_int dim, aa_int mem, aa_int min_len, aa_int type1,
                aa_float regularization, aa_float relaxation,
                aa_float safeguard_factor, aa_float max_weight_norm,
                aa_int ir_max_steps, aa_int verbosity);

aa_float aa_apply(aa_float *f, const aa_float *x, AaWork *a);
aa_int   aa_safeguard(aa_float *f_new, aa_float *x_new, AaWork *a);
void     aa_reset(AaWork *a);
void     aa_finish(AaWork *a);
AaStats  aa_get_stats(const AaWork *a);

aa_apply returns the (signed) norm of the AA weight vector: positive means the step was taken, negative means it was rejected (and f is left unchanged).

Precision and BLAS integer width

Defaults: aa_float = double, aa_int = int, BLAS integers are int with a trailing underscore on symbol names (e.g. dgemv_).

To change these, compile with:

  • -DSFLOAT — use single-precision float throughout.
  • -DBLAS64 — 64-bit BLAS integers (int64_t).
  • -DNOBLASSUFFIX — no trailing underscore on BLAS symbols.
  • -DBLASSUFFIX=... — a different suffix.
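For example, a hypothetical single-precision build against OpenBLAS might look like the following (this assumes the Makefile picks up CFLAGS overrides from the command line — check the Makefile for the exact variable it uses):

```shell
make CFLAGS="-DSFLOAT" LDLIBS="-lopenblas"
make test
```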

Building the Python bindings from source

python -m pip install --upgrade pip
pip install cython numpy
pip install -e .
python python/example.py

The bindings #include the C source directly, so no separate library is needed.

Citing

If you use this library in academic work, please cite:

@article{zhang2020globally,
  title   = {Globally convergent type-{I} {A}nderson acceleration for nonsmooth fixed-point iterations},
  author  = {Zhang, Junzi and O'Donoghue, Brendan and Boyd, Stephen},
  journal = {SIAM Journal on Optimization},
  volume  = {30},
  number  = {4},
  pages   = {3170--3197},
  year    = {2020}
}

License

MIT — see LICENSE.txt.
