Skip to content

pagerank does not return valid scores for large graphs with the default error tolerance  #1575

@pathway

Description

@pathway

Summary

rustworkx.pagerank() uses a convergence criterion norm < N * tol where norm is the L1 distance between successive iterates. Since L1 distance between probability vectors is bounded by 2, this threshold becomes useless once N > 2/tol. On large sparse graphs, the first iteration's L1 diff can be smaller than the threshold simply because mass barely moves in one step — the function then returns the initial uniform vector and reports convergence successfully.

The user gets a PageRank result that looks valid (sums to 1, no errors raised), but every node has score 1/N. Silent data corruption.

Environment

  • rustworkx version: 0.17.1
  • Python: 3.12.8
  • OS: Linux (Ubuntu 22.04)

Minimal Reproduction

import rustworkx as rx
import numpy as np

g = rx.PyDiGraph()
for i in range(2000):
    g.add_node(i)
g.add_edge(0, 1, 1.0)
g.add_edge(1, 2, 1.0)

pr = rx.pagerank(g, alpha=0.85)
vals = np.array([pr[i] for i in range(2000)])
print(f"All uniform? {np.allclose(vals, 1/2000)}")  # True — BUG
print(f"Node 2 score: {pr[2]:.6f}")                  # 0.000500 (uniform)
print(f"Expected:     0.001285 (2.6x above uniform)")

Output:

All uniform? True
Node 2 score: 0.000500
Expected:     0.001285 (2.6x above uniform)

Where It Triggers

Tested with minimal repros:

Case Result
N=10, 80% dangling OK
N=1,000, 99.8% dangling OK
N=2,000, 2 edges (99.9% dangling) FAIL — uniform output
N=10,000, 2 edges FAIL — uniform output
N=10,000, 50% dangling (5K edges) OK
N=100,000, 2 edges FAIL — uniform output

Bug triggers when N >= ~2000 and the active subgraph is sparse enough that the first power-iteration step produces an L1 diff below N * tol. We originally hit this on an 807,253-node talent migration graph (38% dangling) where rx.pagerank() silently returned uniform 1/N for all nodes.

Root Cause

src/link_analysis.rs lines 192-208:

let mut has_converged = false;
for _ in 0..max_iter {
    let dangling_sum: f64 = is_dangling
        .iter()
        .zip(popularity.iter())
        .map(|(cond, pop)| if *cond { *pop } else { 0.0 })
        .sum();
    let new_popularity =
        alpha * ((&a * &popularity) + (dangling_sum * &dangling_weights)) + &damping;
    let norm: f64 = new_popularity.l1_dist(&popularity).unwrap();
    if norm < (n as f64) * tol {   // <<<<<<< BUG HERE
        has_converged = true;
        break;
    } else {
        popularity = new_popularity;
    }
}

The threshold (n as f64) * tol means:

N tol threshold % of max L1 dist (=2)
1,000 1e-6 0.001 0.05%
10,000 1e-6 0.01 0.5%
100,000 1e-6 0.1 5%
1,000,000 1e-6 1.0 50%

Any iteration whose L1 diff from the previous is less than N * tol is called "converged" — but for large N this happens trivially before any real convergence.

Manual Trace

N=2000, 2 edges (path 0→1→2), alpha=0.85, tol=1e-6:

iter 0: L1 diff = 1.698e-03, threshold = 2.000e-03  ← converged! (but it isn't)
iter 1: L1 diff = 7.214e-04                          ← would have kept improving
iter 2: L1 diff = 1.226e-06
iter 3: L1 diff = 2.598e-07
iter 4: L1 diff = 6.631e-10                          ← actual convergence

The true converged values are pr[2] = 0.00128, pr[1] = 0.00092, pr[0] = 0.00050. Rustworkx returns 0.00050 for all three.

Proposed Fix

Change line 202 from norm < (n as f64) * tol to norm < tol:

-        if norm < (n as f64) * tol {
+        if norm < tol {
             has_converged = true;
             break;
         } else {

This matches the docstring semantics ("error tolerance used when checking for convergence in the power method") and treats tol as an absolute L1 tolerance.

Happy to open a PR with the one-line fix + regression test.

Related: NetworkX has the same formula

NetworkX's pure-Python pagerank also uses err < N * tol. We haven't verified whether NetworkX exhibits the same failure mode (our environment is missing _bz2 so we couldn't run NetworkX side-by-side). If NetworkX also fails on this case, this is a broader issue worth fixing there too, but that's a separate discussion — the rustworkx fix stands on its own.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions