~5.7GB retained RSS loading an 18.45M-row table whose string column is near-unique (escapes the #906/#907 fixes) #924

## Problem

On 0.14.19, `/load_expr` for an 18.45M-row table with a near-unique string column (`spree_id`, ~18.45M distinct, ~12 chars) grew the server from 337MB to 6,044MB RSS (+5,707MB), and RSS stays there after stats complete. The same server loaded a 23.9M-row table with repeating high-cardinality strings (plate, state, county) at +172MB, so the #906 approx_nunique and #907 histogram-sampling fixes hold for the repeating shape; the distinct≈rows shape escapes them. Suspect: a path that still materializes per-distinct-value state, either top-values/value_counts running before sampling kicks in, or the exact/approx dispatch treating the column as already small enough.

Repro: any ~18M-row frame with a per-row-unique string id column → POST /load_expr → compare RSS before/after; memory is not released afterwards.
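A minimal sketch of the two helpers the repro needs, stdlib only. `make_unique_ids` and `rss_mb` are hypothetical names, not buckaroo APIs; the RSS reader assumes Linux (`/proc`), and the `/load_expr` request itself is left out because its payload is not described here.

```python
import os
import string

# 36 symbols, digits first so that row 0 maps to "000000000000".
ALPHABET = string.digits + string.ascii_lowercase


def make_unique_ids(n_rows, width=12):
    """Yield n_rows distinct fixed-width strings (base-36 counter, zero-padded)."""
    for row in range(n_rows):
        remaining = row
        chars = []
        for _ in range(width):
            remaining, digit = divmod(remaining, 36)
            chars.append(ALPHABET[digit])
        yield "".join(reversed(chars))


def rss_mb(pid=None):
    """Resident set size in MB for pid (default: this process). Linux only."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # the kernel reports kB
    raise RuntimeError("VmRSS not found")
```

Usage would be: build the frame from `make_unique_ids(18_450_000)`, record `rss_mb(server_pid)`, issue the load, wait for stats to complete, and record `rss_mb(server_pid)` again.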

## Suggested fix

Cap distinct-dependent stats by distinct-count estimate, not just row count: a cheap approx_distinct probe before the batch can route distinct≈rows columns to the sampled/early-exit path. Separately, whatever still holds the transient allocations after stats complete should release them (see companion issue on session retention).
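To make the probe idea concrete, here is a rough sketch, not buckaroo's implementation. It swaps in a K-minimum-values estimator built on the stdlib so it is self-contained; a dataframe engine's built-in approximate distinct count, where one exists, would play the same role. `approx_distinct`, `choose_stats_path`, the `"sampled"`/`"full"` labels, and the 0.5 ratio threshold are all assumptions for illustration.

```python
import hashlib
import heapq


def approx_distinct(values, k=1024):
    """K-minimum-values estimate of the distinct count, using O(k) memory."""
    heap = []        # negated hashes: a max-heap over the k smallest hashes seen
    members = set()  # the same hashes, for O(1) duplicate checks
    for value in values:
        digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    kth_smallest = -heap[0] / 2**64  # normalise the k-th smallest hash to [0, 1)
    return int((k - 1) / kth_smallest)


def choose_stats_path(values, n_rows, ratio_threshold=0.5):
    """Route distinct≈rows columns away from per-distinct-value stats."""
    estimate = approx_distinct(values)
    return "sampled" if estimate >= ratio_threshold * n_rows else "full"
```

The probe still reads the column once, but it retains at most `k` hashes, so its own memory does not grow with the number of distinct values. A column like `spree_id` would land on the sampled path, while a column with few distinct values relative to its row count would keep the full path.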

## Context

Found during tallyman prompt-pack run pack01 (2026-06-11), buckaroo 0.14.19, via tallyman's companion. Related: #906, #907 (closed), #911 (size-based exact/approx selection), #920 (perf/memory smoke testing — this column shape is a good test case).
