choose exact vs approximate summary stats by row count (exact at or below ~100k rows) #911

## Summary

#908 and #909 move the expensive xorq stats to fast approximations. The desired end state is size-based: tables at or below ~100k rows get the accurate implementations (exact COUNT(DISTINCT), exact top-10 value counts), and larger tables get the approximations.

#909 already dispatches this way because the per-column histogram phase receives `length` resolved from the batch aggregate. The batch phase itself cannot: batched stat expressions are built before any row count exists, because `length` is computed by the same aggregate query they are folded into (`xorq_stat_pipeline.py:374`, `__total_length__`). So after #908, `distinct_count` is approximate at every table size.
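To make the ordering problem concrete, here is a minimal sketch that uses SQLite as a stand-in backend rather than xorq. `build_batch_query` and the `__distinct_count` suffix are hypothetical names for illustration; only the `__total_length__` alias comes from the pipeline. SQLite has no approximate distinct count, so exact `COUNT(DISTINCT)` stands in for the stat; the point is only that the row count comes back from the same query the stats are folded into.

```python
import sqlite3


def build_batch_query(columns, table="t"):
    """Hypothetical helper: fold every batched stat into one aggregate SELECT."""
    # Expression-build time: nothing has executed yet, so there is no row
    # count to branch on when choosing a stat implementation.
    parts = ["COUNT(*) AS __total_length__"]
    for col in columns:
        parts.append(f'COUNT(DISTINCT "{col}") AS "{col}__distinct_count"')
    return f'SELECT {", ".join(parts)} FROM "{table}"'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (2, "x"), (2, "y")])

query = build_batch_query(["a", "b"])  # the stat variant is already fixed here
row = conn.execute(query).fetchone()
length = row[0]                        # the row count only exists after execution
```

Any size-based choice therefore has to happen either before `build_batch_query` runs (a separate count) or after the batch result is back (a second pass).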

## Options

- Pre-count before building the batch expressions. `table.count()` is metadata-cheap on plain parquet scans but is a full plan execution for joins and filter chains; the count could go through the same snapshot cache so it's paid once per expression.
- Let `@stat` functions declare exact/approx variants and have the pipeline pick once `length` is known: the batch runs the approx variant unconditionally, and a follow-up pass re-runs the exact variants when `length <= threshold` and updates the summary. The follow-up is cheap by definition at that size.
- Leave it: at <=100k rows HLL's ~1% error is rarely visible in a stats panel, and the histogram (the user-visible stat) is already exact below the threshold via #909.
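The second option could look roughly like the sketch below. It is plain Python over in-memory lists, not the xorq pipeline: `StatVariants`, `STATS`, `summarize`, and `EXACT_THRESHOLD` are hypothetical names, and the approximate variant is a K-minimum-values estimator standing in for the HLL-based stat.

```python
import hashlib
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

EXACT_THRESHOLD = 100_000  # "~100k rows"; the precise cutoff is an assumption


def exact_distinct(values: Sequence) -> int:
    return len(set(values))


def approx_distinct(values: Sequence, k: int = 256) -> int:
    """K-minimum-values estimate; a simple stand-in for the HLL-based stat."""
    heap: list = []   # negated hashes, so -heap[0] is the largest hash kept
    kept: set = set()
    for v in values:
        digest = hashlib.blake2b(repr(v).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big") + 1  # shift to [1, 2**64], never zero
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is among the k smallest: evict the current largest.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: exact, barring collisions
    return round((k - 1) * 2**64 / -heap[0])


@dataclass(frozen=True)
class StatVariants:
    approx: Callable[[Sequence], int]
    exact: Callable[[Sequence], int]


STATS: Dict[str, StatVariants] = {
    "distinct_count": StatVariants(approx=approx_distinct, exact=exact_distinct),
}


def summarize(column: Sequence, threshold: int = EXACT_THRESHOLD) -> Dict[str, int]:
    # Batch phase: approx variants run unconditionally, alongside the row count.
    summary = {name: variants.approx(column) for name, variants in STATS.items()}
    summary["length"] = len(column)
    # Follow-up pass: now that length is known, overwrite with exact values.
    if summary["length"] <= threshold:
        for name, variants in STATS.items():
            summary[name] = variants.exact(column)
    return summary


column = [i % 1000 for i in range(5000)]     # 5000 rows, 1000 distinct values
small = summarize(column)                    # under the threshold: exact pass runs
large = summarize(column, threshold=1_000)   # over the threshold: approx is kept
```

The trade-off in this shape is that small tables pay for both variants, which is acceptable only because the exact pass is bounded by the threshold. The pre-count option avoids the double work but adds a separate count execution before the batch.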

## Context

Split out of #906. Until a selection mechanism exists, the fast methods are the default wherever row count is unavailable at expression-build time.
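The default described here can be written as a small selection rule. This is a sketch with hypothetical names (`pick_variant`, `EXACT_THRESHOLD`), not code from the pipeline; `None` models "row count unavailable at expression-build time".

```python
from typing import Optional

EXACT_THRESHOLD = 100_000  # "~100k rows"; the precise cutoff is an assumption


def pick_variant(length: Optional[int], threshold: int = EXACT_THRESHOLD) -> str:
    """Return which stat variant to build for a table of the given size."""
    if length is None:
        # No row count yet: fall back to the fast method.
        return "approx"
    return "exact" if length <= threshold else "approx"
```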

