Summary
rustworkx.pagerank() uses a convergence criterion norm < N * tol where norm is the L1 distance between successive iterates. Since L1 distance between probability vectors is bounded by 2, this threshold becomes useless once N > 2/tol. On large sparse graphs, the first iteration's L1 diff can be smaller than the threshold simply because mass barely moves in one step — the function then returns the initial uniform vector and reports convergence successfully.
The user gets a PageRank result that looks valid (sums to 1, no errors raised), but every node has score 1/N. Silent data corruption.
Environment
rustworkx version: 0.17.1
- Python: 3.12.8
- OS: Linux (Ubuntu 22.04)
Minimal Reproduction
import rustworkx as rx
import numpy as np
g = rx.PyDiGraph()
for i in range(2000):
g.add_node(i)
g.add_edge(0, 1, 1.0)
g.add_edge(1, 2, 1.0)
pr = rx.pagerank(g, alpha=0.85)
vals = np.array([pr[i] for i in range(2000)])
print(f"All uniform? {np.allclose(vals, 1/2000)}") # True — BUG
print(f"Node 2 score: {pr[2]:.6f}") # 0.000500 (uniform)
print(f"Expected: 0.001285 (2.6x above uniform)")
Output:
All uniform? True
Node 2 score: 0.000500
Expected: 0.001285 (2.6x above uniform)
Where It Triggers
Tested with minimal repros:
| Case |
Result |
| N=10, 80% dangling |
OK |
| N=1,000, 99.8% dangling |
OK |
| N=2,000, 2 edges (99.9% dangling) |
FAIL — uniform output |
| N=10,000, 2 edges |
FAIL — uniform output |
| N=10,000, 50% dangling (5K edges) |
OK |
| N=100,000, 2 edges |
FAIL — uniform output |
Bug triggers when N >= ~2000 and the active subgraph is sparse enough that the first power-iteration step produces an L1 diff below N * tol. We originally hit this on an 807,253-node talent migration graph (38% dangling) where rx.pagerank() silently returned uniform 1/N for all nodes.
Root Cause
src/link_analysis.rs lines 192-208:
let mut has_converged = false;
for _ in 0..max_iter {
let dangling_sum: f64 = is_dangling
.iter()
.zip(popularity.iter())
.map(|(cond, pop)| if *cond { *pop } else { 0.0 })
.sum();
let new_popularity =
alpha * ((&a * &popularity) + (dangling_sum * &dangling_weights)) + &damping;
let norm: f64 = new_popularity.l1_dist(&popularity).unwrap();
if norm < (n as f64) * tol { // <<<<<<< BUG HERE
has_converged = true;
break;
} else {
popularity = new_popularity;
}
}
The threshold (n as f64) * tol means:
| N |
tol |
threshold |
% of max L1 dist (=2) |
| 1,000 |
1e-6 |
0.001 |
0.05% |
| 10,000 |
1e-6 |
0.01 |
0.5% |
| 100,000 |
1e-6 |
0.1 |
5% |
| 1,000,000 |
1e-6 |
1.0 |
50% |
Any iteration whose L1 diff from the previous is less than N * tol is called "converged" — but for large N this happens trivially before any real convergence.
Manual Trace
N=2000, 2 edges (path 0→1→2), alpha=0.85, tol=1e-6:
iter 0: L1 diff = 1.698e-03, threshold = 2.000e-03 ← converged! (but it isn't)
iter 1: L1 diff = 7.214e-04 ← would have kept improving
iter 2: L1 diff = 1.226e-06
iter 3: L1 diff = 2.598e-07
iter 4: L1 diff = 6.631e-10 ← actual convergence
The true converged values are pr[2] = 0.00128, pr[1] = 0.00092, pr[0] = 0.00050. Rustworkx returns 0.00050 for all three.
Proposed Fix
Change line 202 from norm < (n as f64) * tol to norm < tol:
- if norm < (n as f64) * tol {
+ if norm < tol {
has_converged = true;
break;
} else {
This matches the docstring semantics ("error tolerance used when checking for convergence in the power method") and treats tol as an absolute L1 tolerance.
Happy to open a PR with the one-line fix + regression test.
Related: NetworkX has the same formula
NetworkX's pure-Python pagerank also uses err < N * tol. We haven't verified whether NetworkX exhibits the same failure mode (our environment is missing _bz2 so we couldn't run NetworkX side-by-side). If NetworkX also fails on this case, this is a broader issue worth fixing there too, but that's a separate discussion — the rustworkx fix stands on its own.
Summary
rustworkx.pagerank()uses a convergence criterionnorm < N * tolwherenormis the L1 distance between successive iterates. Since L1 distance between probability vectors is bounded by 2, this threshold becomes useless onceN > 2/tol. On large sparse graphs, the first iteration's L1 diff can be smaller than the threshold simply because mass barely moves in one step — the function then returns the initial uniform vector and reports convergence successfully.The user gets a PageRank result that looks valid (sums to 1, no errors raised), but every node has score
1/N. Silent data corruption.Environment
rustworkxversion: 0.17.1Minimal Reproduction
Output:
Where It Triggers
Tested with minimal repros:
Bug triggers when
N >= ~2000and the active subgraph is sparse enough that the first power-iteration step produces an L1 diff belowN * tol. We originally hit this on an 807,253-node talent migration graph (38% dangling) whererx.pagerank()silently returned uniform1/Nfor all nodes.Root Cause
src/link_analysis.rslines 192-208:The threshold
(n as f64) * tolmeans:Any iteration whose L1 diff from the previous is less than
N * tolis called "converged" — but for large N this happens trivially before any real convergence.Manual Trace
N=2000, 2 edges (path 0→1→2), alpha=0.85, tol=1e-6:
The true converged values are
pr[2] = 0.00128,pr[1] = 0.00092,pr[0] = 0.00050. Rustworkx returns0.00050for all three.Proposed Fix
Change line 202 from
norm < (n as f64) * toltonorm < tol:This matches the docstring semantics ("error tolerance used when checking for convergence in the power method") and treats
tolas an absolute L1 tolerance.Happy to open a PR with the one-line fix + regression test.
Related: NetworkX has the same formula
NetworkX's pure-Python pagerank also uses
err < N * tol. We haven't verified whether NetworkX exhibits the same failure mode (our environment is missing_bz2so we couldn't run NetworkX side-by-side). If NetworkX also fails on this case, this is a broader issue worth fixing there too, but that's a separate discussion — the rustworkx fix stands on its own.