
feat: add anonymous feature-usage telemetry #1928

Open
Titus-von-Koeller wants to merge 1 commit into main from feat/telemetry

Conversation

@Titus-von-Koeller
Collaborator

Summary

Adds lightweight, opt-out, anonymous feature-usage telemetry via huggingface_hub.utils.send_telemetry(), mirroring the pattern Transformers already uses for its quant user-agent field. Data lands in the HF Hub telemetry index under path_prefix=/api/telemetry/bitsandbytes/ and answers two concrete questions: which features deserve continued maintenance and which are safe to retire.

Key design points:

  • Namespaced: all metadata keys under bitsandbytes.* so they don't collide in the shared telemetry index.
  • Session fingerprint + per-feature events: the fingerprint (bnb version, OS, arch, libc, Python/torch versions, accelerator vendor/name/arch/count) is sent with each feature event; each distinct feature fires at most once per process.
  • Wired at first-use, not import time: passive installs send nothing.
  • Opt-out: BNB_DISABLE_TELEMETRY, HF_HUB_DISABLE_TELEMETRY, or HF_HUB_OFFLINE.
  • Auto-disabled under pytest so CI and local test runs don't pollute the real-usage stream.
  • Silent no-op when huggingface_hub is not installed.
  • No new runtime dependency: huggingface_hub stays optional.

Features tracked

| topic | call site | rationale |
| --- | --- | --- |
| `linear_4bit` | `Linear4bit.forward` | QLoRA usage shape (nf4/fp4, blocksize, compute_dtype) |
| `linear_8bit` | `Linear8bitLt.forward` | LLM.int8() usage (threshold, has_fp16_weights) |
| `params_4bit` | `Params4bit.__new__` | catches direct use outside Linear4bit (PEFT, vLLM, custom) |
| `int8_params` | `Int8Params.__new__` | same, for 8-bit |
| `embedding` | all 4 embedding variants | variant=stable\|standard\|8bit\|4bit |
| `optimizer` | `Optimizer8bit.step` | optimizer name + is_paged |
| `optim_override_config` | `GlobalOptimManager.override_config` | which keys users actually override |
| `optim_register_module_override` | `GlobalOptimManager.register_module_override` | existence signal for the module-level override mechanism |
| `outlier_aware_linear` | `OutlierAwareLinear.__init__` | deprecation candidate |
| `int8_double_quant` | `int8_double_quant()` | deprecation candidate |

What is never collected

Model names, file paths, tensor shapes, parameter values, user identifiers, or anything derived from user input. The send_telemetry() call explicitly passes token=False, so no auth / identity info is attached.

End-to-end verification

scripts/verify_telemetry.py exercises every wired-up feature once, tagging every event with a unique BNB_TELEMETRY_TAG so a single run's events can be correlated in ds-hub-telemetry:

python scripts/verify_telemetry.py
# prints run_id = verify-XXXXXXXX
# then, after ~30s:
es-cli -H esql 'FROM ds-hub-telemetry
  | WHERE metadata.bitsandbytes.tag == "verify-XXXXXXXX"
  | STATS count = COUNT(*) BY metadata.bitsandbytes.feature
  | SORT count DESC'

Test plan

  • 22 new unit tests in tests/test_telemetry.py pass (dedup, namespacing, stringification, fingerprint fields, all three opt-out env vars, truthy-value parsing, tag attachment, graceful fallback when huggingface_hub is missing, exception swallowing, pytest auto-detection).
  • Existing tests/test_linear4bit.py and tests/test_linear8bitlt.py pass — no regressions (322 tests).
  • Existing tests/test_modules.py and tests/test_optim.py pass — no regressions (509 tests).
  • Full pre-commit run --all-files passes.
  • Manual end-to-end verification: scripts/verify_telemetry.py fired with run_id=verify-4738a43c — to be confirmed in ds-hub-telemetry.
  • Review by @matthewdouglas who shaped the feature list and namespacing decisions.

Documentation

README ## Telemetry section explains what is collected, what is not, and the three opt-out env vars. The module docstring in bitsandbytes/_telemetry.py is the authoritative reference.

🤖 Generated with Claude Code

New `bitsandbytes._telemetry.report_feature()` sends one event per distinct
feature per process via `huggingface_hub.utils.send_telemetry()`, mirroring
the pattern Transformers uses for its `quant` user-agent field. Data lands
in the Hub telemetry index under `path_prefix=/api/telemetry/bitsandbytes/`
and informs which features are worth maintaining or retiring.

Wired at: Linear4bit/Linear8bitLt forward, Params4bit/Int8Params __new__,
all Embedding variants, Optimizer8bit.step, GlobalOptimManager overrides,
OutlierAwareLinear and int8_double_quant (deprecation candidates).

All metadata keys namespaced under `bitsandbytes.*`. Fingerprint carries
bnb version, OS, arch, libc, Python/torch versions, and accelerator vendor
/ name / arch / count. No model names, file paths, or user-derived values
are ever sent.

Opt-out via BNB_DISABLE_TELEMETRY, HF_HUB_DISABLE_TELEMETRY, or
HF_HUB_OFFLINE. Auto-disabled under pytest so CI and local test runs don't
pollute the real-usage stream. Silent no-op when huggingface_hub is not
installed.

End-to-end verification: `scripts/verify_telemetry.py` emits every feature
once tagged with a unique run_id via BNB_TELEMETRY_TAG, for correlation in
Elasticsearch queries on `ds-hub-telemetry`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@matthewdouglas matthewdouglas added this to the v0.50.0 milestone Apr 20, 2026
Comment on lines 621 to +631
def forward(self, x: torch.Tensor):
    report_feature(
        "linear_4bit",
        {
            "quant_type": getattr(self.weight, "quant_type", "unknown"),
            "blocksize": getattr(self.weight, "blocksize", 0),
            "compress_statistics": getattr(self.weight, "compress_statistics", False),
            "input_dtype": str(x.dtype).replace("torch.", ""),
            "compute_dtype": (str(self.compute_dtype).replace("torch.", "") if self.compute_dtype else "auto"),
        },
    )
Member

I would prefer we do this in __init__ rather than add unnecessary overhead in the forward() hot path. Plus most of this is in __init__ - you don't need to use all these getattr calls.

Comment on lines +1206 to +1213
report_feature(
    "linear_8bit",
    {
        "has_fp16_weights": self.state.has_fp16_weights,
        "threshold": self.state.threshold,
        "input_dtype": str(x.dtype).replace("torch.", ""),
    },
)
Member

Same comment as with Linear4bit, this is best in __init__ and not in forward.

Comment on lines +322 to +328
report_feature(
    "optimizer",
    {
        "name": type(self).__name__,
        "is_paged": self.is_paged,
    },
)
Member

Would prefer this to be in init of optimizers as well rather than in step(). Also maybe would be nice to see optim_bits.

@@ -0,0 +1,231 @@
# Copyright (c) Facebook, Inc. and its affiliates.
Member

minor nit: copyright here is wrong, let's take this out or replace with more appropriate

runs in CI and locally do not pollute the real-usage stream.

Opt-out (any of the following env vars disables all telemetry):
- BNB_DISABLE_TELEMETRY=1 (bitsandbytes only)
Member

I am not sure we need to roll our own; it's cleaner to just reuse the existing? I don't see a use case for e.g. opting out of HF Hub telemetry but still opting in for BNB?

Comment on lines +32 to +35
End-to-end verification:
Set `BNB_TELEMETRY_TAG=<some-id>` before importing bitsandbytes and the
value is attached as `bitsandbytes.tag` on every event. Use this to
correlate a single run's events in ES.
Member

Seems unnecessary/overkill?

Comment on lines +60 to +67
def _is_pytest() -> bool:
    """Detect whether we are running inside a pytest process.

    Telemetry is suppressed during test runs so that CI and local test
    invocations don't pollute the real-usage stream. Tests that want to
    assert on telemetry behavior monkey-patch this function to return False.
    """
    return "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
Member

I would consider looking at other env variables and not bother with the "pytest" in sys.modules condition.

Most CI platforms will have an env var like CI for this.

if os_name == "Windows":
    try:
        build = sys.getwindowsversion().build
        os_version = f"11 (build {build})" if build >= 22000 else f"10 (build {build})"
Member

This seems fragile and also ignores Windows Server etc.

Comment on lines +166 to +171
try:
    import torch

    info["bitsandbytes.torch"] = torch.__version__
except ImportError:
    pass
Member

I think this is redundant, does hf hub automatically collect this?

Comment on lines +192 to +197
if feature in _REPORTED:
    return
_REPORTED.add(feature)

if _is_disabled():
    return
Member @matthewdouglas · Apr 20, 2026

We may want to do the de-duping at a more granular level than just the "feature" name as it is. But maybe we just name the features differently in that case. So that's more of a minor nit.

Should we add to _REPORTED even when disabled? Seems to me we should just exit right away.

Comment on lines +212 to +214
tag = os.environ.get("BNB_TELEMETRY_TAG", "").strip()
if tag:
    user_agent["bitsandbytes.tag"] = tag
Member

Same as comment earlier, seems unnecessary.

Comment on lines +104 to +115
if torch.cuda.is_available():
    vendor = "amd" if getattr(torch.version, "hip", None) else "nvidia"
    info["bitsandbytes.accel"] = vendor
    info["bitsandbytes.accel_count"] = str(torch.cuda.device_count())
    props = torch.cuda.get_device_properties(0)
    info["bitsandbytes.accel_name"] = props.name
    if vendor == "nvidia":
        info["bitsandbytes.accel_arch"] = f"sm_{props.major}{props.minor}"
    else:
        info["bitsandbytes.accel_arch"] = getattr(props, "gcnArchName", "unknown")
return info

Member

This only looks at the first device; I'm not sure but we may be interested when there's multiple devices and they're different. I'm wondering if for that maybe we just add some sort of flag to tell us whether it is a heterogeneous system or not. Likely it is, but may be valuable to find out otherwise.

Let's grab device 0's SM count and memory. We don't really need the name. So this should be for both AMD and NVIDIA the multi_processor_count and total_memory properties. Keep gcnArchName and major/minor.
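One way to sketch the requested heterogeneity flag plus SM count and memory. Here `props` stands in for the per-device results of `torch.cuda.get_device_properties(i)` so the logic is testable without a GPU; the attribute names (`major`, `minor`, `multi_processor_count`, `total_memory`) are real device-property fields, but the function name and metadata keys are assumptions.

```python
def accel_info(props):
    # props: sequence of per-device property objects, e.g.
    # [torch.cuda.get_device_properties(i) for i in range(device_count)].
    info = {"bitsandbytes.accel_count": str(len(props))}
    if not props:
        return info
    first = props[0]
    # Device 0's compute capability, SM count, and memory.
    info["bitsandbytes.accel_arch"] = f"sm_{first.major}{first.minor}"
    info["bitsandbytes.accel_mp_count"] = str(first.multi_processor_count)
    info["bitsandbytes.accel_total_memory"] = str(first.total_memory)
    # Single flag instead of per-device detail: do the devices differ?
    heterogeneous = any(
        (p.major, p.minor) != (first.major, first.minor) for p in props[1:]
    )
    info["bitsandbytes.accel_heterogeneous"] = str(heterogeneous).lower()
    return info
```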

Comment on lines +97 to +99
try:
    import torch
except ImportError:
Member

torch is already a pretty hard dependency; this shouldn't need to be caught.

Comment on lines +245 to +252
report_feature(
    "params_4bit",
    {
        "quant_type": quant_type,
        "blocksize": blocksize,
        "compress_statistics": compress_statistics,
        "quant_storage": str(quant_storage).replace("torch.", ""),
    },
Member

Starting to think we don't need this here, would prefer we just keep Linear4bit and Linear8bitLt but remove this on Params4bit/Int8Params.

Comment on lines +108 to +111
report_feature(
    "optim_override_config",
    {"keys": ",".join(sorted(key_value_dict.keys()))},
)
Member

Not particularly interested in tracking this

Comment on lines +119 to +122
report_feature(
    "optim_register_module_override",
    {"keys": ",".join(sorted(config.keys())) if isinstance(config, dict) else "unknown"},
)
Member

Likewise not really interested in tracking this either

    return info


def report_feature(feature: str, details: Optional[dict[str, object]] = None) -> None:
Member

I think for more clarity we should just name this _report_feature as well.

"bitsandbytes.os": os_name,
"bitsandbytes.os_version": os_version,
"bitsandbytes.arch": platform.machine(),
"bitsandbytes.python": platform.python_version(),
Member

I think this is redundant too, huggingface_hub likely includes Python version already

Comment thread: README.md
Comment on lines +186 to +221
## Telemetry

`bitsandbytes` sends anonymous, aggregate feature-usage telemetry to the
Hugging Face Hub. This data is used to prioritize maintenance (which quantization
methods and optimizers are actually in use?) and to safely retire features that
are no longer called by anyone.

### What is collected

* A session fingerprint sent once per process: `bitsandbytes` version, OS
name/version, CPU architecture, Python/PyTorch versions, accelerator
vendor/name/arch/count (e.g. `nvidia`, `NVIDIA H100`, `sm_90`, `1`).
* One event per distinct feature used, with feature-specific flags. For
example: using `Linear4bit` sends `quant_type=nf4`, `blocksize=64`; using
`AdamW8bit.step()` sends `name=AdamW8bit`, `is_paged=false`.

### What is never collected

Model names, file paths, tensor shapes, parameter values, user identifiers, or
anything derived from user input.

### How to opt out

Set any one of these environment variables:

| Variable | Scope |
| ---------------------------- | ---------------------------- |
| `BNB_DISABLE_TELEMETRY=1` | `bitsandbytes` only |
| `HF_HUB_DISABLE_TELEMETRY=1` | all Hugging Face libraries |
| `HF_HUB_OFFLINE=1` | all Hugging Face libraries |

Telemetry is also automatically suppressed while running under `pytest` (so
CI and local test runs don't pollute the stream) and a silent no-op when
`huggingface_hub` is not installed. The implementation lives in
[`bitsandbytes/_telemetry.py`](bitsandbytes/_telemetry.py) and each event
fires at most once per process.
Member

Let's simplify this:

Suggested change
## Telemetry
`bitsandbytes` sends anonymous, aggregate feature-usage telemetry to the
Hugging Face Hub. This data is used to prioritize maintenance (which quantization
methods and optimizers are actually in use?) and to safely retire features that
are no longer called by anyone.
### What is collected
* A session fingerprint sent once per process: `bitsandbytes` version, OS
name/version, CPU architecture, Python/PyTorch versions, accelerator
vendor/name/arch/count (e.g. `nvidia`, `NVIDIA H100`, `sm_90`, `1`).
* One event per distinct feature used, with feature-specific flags. For
example: using `Linear4bit` sends `quant_type=nf4`, `blocksize=64`; using
`AdamW8bit.step()` sends `name=AdamW8bit`, `is_paged=false`.
### What is never collected
Model names, file paths, tensor shapes, parameter values, user identifiers, or
anything derived from user input.
### How to opt out
Set any one of these environment variables:
| Variable | Scope |
| ---------------------------- | ---------------------------- |
| `BNB_DISABLE_TELEMETRY=1` | `bitsandbytes` only |
| `HF_HUB_DISABLE_TELEMETRY=1` | all Hugging Face libraries |
| `HF_HUB_OFFLINE=1` | all Hugging Face libraries |
Telemetry is also automatically suppressed while running under `pytest` (so
CI and local test runs don't pollute the stream) and a silent no-op when
`huggingface_hub` is not installed. The implementation lives in
[`bitsandbytes/_telemetry.py`](bitsandbytes/_telemetry.py) and each event
fires at most once per process.
## Telemetry
bitsandbytes collects anonymous feature-usage data using the same telemetry
mechanism as other Hugging Face libraries (Transformers, Gradio, etc.). This
helps us understand which features are actively used so we can prioritize
maintenance and make informed decisions about deprecation.
### What is collected
Hardware and version info sent once per process (bitsandbytes version, OS, CPU
architecture, accelerator type and compute capability) plus one event per
distinct feature used per process.
### How to opt out
Set any of the following environment variables:
| Variable | Effect |
| ---------------------------- | ----------------------------------- |
| `HF_HUB_DISABLE_TELEMETRY=1` | Disables telemetry in all HF libs |
| `HF_HUB_OFFLINE=1` | Disables all outbound HF Hub calls |
| `DO_NOT_TRACK=1` | Standard cross-tool opt-out signal |
