feat: add anonymous feature-usage telemetry #1928
Titus-von-Koeller wants to merge 1 commit into `main` from
Conversation
New `bitsandbytes._telemetry.report_feature()` sends one event per distinct feature per process via `huggingface_hub.utils.send_telemetry()`, mirroring the pattern Transformers uses for its `quant` user-agent field. Data lands in the Hub telemetry index under `path_prefix=/api/telemetry/bitsandbytes/` and informs which features are worth maintaining or retiring.

Wired at: `Linear4bit`/`Linear8bitLt` forward, `Params4bit`/`Int8Params` `__new__`, all Embedding variants, `Optimizer8bit.step`, `GlobalOptimManager` overrides, `OutlierAwareLinear`, and `int8_double_quant` (deprecation candidates).

All metadata keys are namespaced under `bitsandbytes.*`. The fingerprint carries the bnb version, OS, arch, libc, Python/torch versions, and accelerator vendor/name/arch/count. No model names, file paths, or user-derived values are ever sent.

Opt-out via `BNB_DISABLE_TELEMETRY`, `HF_HUB_DISABLE_TELEMETRY`, or `HF_HUB_OFFLINE`. Telemetry is auto-disabled under pytest so CI and local test runs don't pollute the real-usage stream, and is a silent no-op when `huggingface_hub` is not installed.

End-to-end verification: `scripts/verify_telemetry.py` emits every feature once, tagged with a unique run_id via `BNB_TELEMETRY_TAG`, for correlation in Elasticsearch queries on `ds-hub-telemetry`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
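A minimal sketch of the dedup-and-namespace shape the description outlines. The `send` parameter stands in for `huggingface_hub.utils.send_telemetry` (injected here so the sketch is self-contained); the truthy-value parsing and helper names other than `report_feature` are illustrative assumptions, not the PR's exact code.

```python
import os
from typing import Optional

# One entry per distinct feature per process.
_REPORTED: set[str] = set()


def _is_disabled() -> bool:
    # Any of the three documented env vars suppresses all telemetry.
    return any(
        os.environ.get(v, "").strip().lower() in ("1", "true", "yes")
        for v in ("BNB_DISABLE_TELEMETRY", "HF_HUB_DISABLE_TELEMETRY", "HF_HUB_OFFLINE")
    )


def report_feature(feature: str, details: Optional[dict] = None, send=print) -> None:
    if _is_disabled() or feature in _REPORTED:
        return
    _REPORTED.add(feature)
    # Namespace every key under bitsandbytes.* so nothing collides in the
    # shared Hub telemetry index.
    user_agent = {f"bitsandbytes.{k}": str(v) for k, v in (details or {}).items()}
    user_agent["bitsandbytes.feature"] = feature
    send(user_agent)
```

In the real module, `send` would be `huggingface_hub.utils.send_telemetry` wrapped in a try/except so failures stay silent.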
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
def forward(self, x: torch.Tensor):
    report_feature(
        "linear_4bit",
        {
            "quant_type": getattr(self.weight, "quant_type", "unknown"),
            "blocksize": getattr(self.weight, "blocksize", 0),
            "compress_statistics": getattr(self.weight, "compress_statistics", False),
            "input_dtype": str(x.dtype).replace("torch.", ""),
            "compute_dtype": (str(self.compute_dtype).replace("torch.", "") if self.compute_dtype else "auto"),
        },
    )
```
I would prefer we do this in `__init__` rather than add unnecessary overhead in the `forward()` hot path. Plus, most of this is already available in `__init__`, so you don't need all these `getattr` calls.
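A hypothetical sketch of that suggestion: report once at construction time and leave `forward()` untouched. The class is a stand-in (not the real `Linear4bit`), `report` is injected in place of the PR's `report_feature`, and the attribute names simply mirror the diff above.

```python
class Linear4bit:
    """Illustrative stand-in for the real module, not bitsandbytes code."""

    def __init__(self, quant_type="nf4", blocksize=64, compress_statistics=True,
                 compute_dtype=None, report=print):
        self.quant_type = quant_type
        self.blocksize = blocksize
        self.compute_dtype = compute_dtype
        # Single call in __init__: the values are in scope here, so no
        # getattr() fallbacks, and the forward() hot path pays nothing.
        report({
            "quant_type": quant_type,
            "blocksize": blocksize,
            "compress_statistics": compress_statistics,
            "compute_dtype": str(compute_dtype) if compute_dtype else "auto",
        })

    def forward(self, x):
        return x  # no telemetry on the hot path
```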
```python
report_feature(
    "linear_8bit",
    {
        "has_fp16_weights": self.state.has_fp16_weights,
        "threshold": self.state.threshold,
        "input_dtype": str(x.dtype).replace("torch.", ""),
    },
)
```
Same comment as with `Linear4bit`: this belongs in `__init__`, not in `forward`.
```python
report_feature(
    "optimizer",
    {
        "name": type(self).__name__,
        "is_paged": self.is_paged,
    },
)
```
Would prefer this to be in the `__init__` of the optimizers as well, rather than in `step()`. It would also be nice to see `optim_bits`.
```diff
@@ -0,0 +1,231 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
```
Minor nit: the copyright here is wrong; let's take this out or replace it with something more appropriate.
```
runs in CI and locally do not pollute the real-usage stream.

Opt-out (any of the following env vars disables all telemetry):
- BNB_DISABLE_TELEMETRY=1 (bitsandbytes only)
```
I am not sure we need to roll our own; it's cleaner to just reuse the existing variables. I don't see a use case for e.g. opting out of HF Hub telemetry but still opting in for BNB.
```
End-to-end verification:
Set `BNB_TELEMETRY_TAG=<some-id>` before importing bitsandbytes and the
value is attached as `bitsandbytes.tag` on every event. Use this to
correlate a single run's events in ES.
```
Seems unnecessary/overkill?
```python
def _is_pytest() -> bool:
    """Detect whether we are running inside a pytest process.

    Telemetry is suppressed during test runs so that CI and local test
    invocations don't pollute the real-usage stream. Tests that want to
    assert on telemetry behavior monkey-patch this function to return False.
    """
    return "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
```
I would consider looking at other env variables and not bother with the `"pytest" in sys.modules` condition. Most CI platforms set an env var like `CI` for this.
```python
if os_name == "Windows":
    try:
        build = sys.getwindowsversion().build
        os_version = f"11 (build {build})" if build >= 22000 else f"10 (build {build})"
```
This seems fragile and also ignores Windows Server, etc.
```python
try:
    import torch

    info["bitsandbytes.torch"] = torch.__version__
except ImportError:
    pass
```
I think this is redundant; doesn't hf hub collect this automatically?
```python
if feature in _REPORTED:
    return
_REPORTED.add(feature)

if _is_disabled():
    return
```
We may want to do de-duping at a finer granularity than just the feature name as it is. But maybe we just name the features differently in that case, so that's more of a minor nit.
Should we add to `_REPORTED` even when disabled? Seems to me we should just exit right away.
```python
tag = os.environ.get("BNB_TELEMETRY_TAG", "").strip()
if tag:
    user_agent["bitsandbytes.tag"] = tag
```
Same as comment earlier, seems unnecessary.
```python
if torch.cuda.is_available():
    vendor = "amd" if getattr(torch.version, "hip", None) else "nvidia"
    info["bitsandbytes.accel"] = vendor
    info["bitsandbytes.accel_count"] = str(torch.cuda.device_count())
    props = torch.cuda.get_device_properties(0)
    info["bitsandbytes.accel_name"] = props.name
    if vendor == "nvidia":
        info["bitsandbytes.accel_arch"] = f"sm_{props.major}{props.minor}"
    else:
        info["bitsandbytes.accel_arch"] = getattr(props, "gcnArchName", "unknown")
return info
```
This only looks at the first device; I'm not sure, but we may be interested when there are multiple devices and they're different. For that, maybe we just add some sort of flag telling us whether it is a heterogeneous system or not. Likely it is, but it may be valuable to find out otherwise.
Let's grab device 0's SM count and memory. We don't really need the name. So for both AMD and NVIDIA this should be the `multi_processor_count` and `total_memory` properties. Keep `gcnArchName` and `major`/`minor`.
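A sketch of those two asks: record device 0's SM count and memory instead of its marketing name, and flag whether the system mixes different accelerator models. `props` stands in for the list of objects `torch.cuda.get_device_properties(i)` would return for each device (the `name`, `multi_processor_count`, and `total_memory` field names match torch's); the function name and key names are assumptions.

```python
def accel_info(props: list) -> dict:
    # props: one device-properties object per visible accelerator.
    return {
        "bitsandbytes.accel_count": str(len(props)),
        "bitsandbytes.accel_sm_count": str(props[0].multi_processor_count),
        "bitsandbytes.accel_memory": str(props[0].total_memory),
        # "true" when not every device reports the same (name, memory) pair,
        # i.e. the system mixes different accelerator models.
        "bitsandbytes.accel_heterogeneous": str(
            len({(p.name, p.total_memory) for p in props}) > 1
        ).lower(),
    }
```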
```python
try:
    import torch
except ImportError:
```
torch is already a pretty hard dependency; this shouldn't need to be caught.
```python
report_feature(
    "params_4bit",
    {
        "quant_type": quant_type,
        "blocksize": blocksize,
        "compress_statistics": compress_statistics,
        "quant_storage": str(quant_storage).replace("torch.", ""),
    },
)
```
Starting to think we don't need this here; would prefer we just keep `Linear4bit` and `Linear8bitLt` and remove this on `Params4bit`/`Int8Params`.
```python
report_feature(
    "optim_override_config",
    {"keys": ",".join(sorted(key_value_dict.keys()))},
)
```
Not particularly interested in tracking this.
```python
report_feature(
    "optim_register_module_override",
    {"keys": ",".join(sorted(config.keys())) if isinstance(config, dict) else "unknown"},
)
```
Likewise, not really interested in tracking this either.
```python
    return info


def report_feature(feature: str, details: Optional[dict[str, object]] = None) -> None:
```
I think for more clarity we should just name this `_report_feature` as well.
```python
"bitsandbytes.os": os_name,
"bitsandbytes.os_version": os_version,
"bitsandbytes.arch": platform.machine(),
"bitsandbytes.python": platform.python_version(),
```
I think this is redundant too; huggingface_hub likely includes the Python version already.
```markdown
## Telemetry

`bitsandbytes` sends anonymous, aggregate feature-usage telemetry to the
Hugging Face Hub. This data is used to prioritize maintenance (which quantization
methods and optimizers are actually in use?) and to safely retire features that
are no longer called by anyone.

### What is collected

* A session fingerprint sent once per process: `bitsandbytes` version, OS
  name/version, CPU architecture, Python/PyTorch versions, accelerator
  vendor/name/arch/count (e.g. `nvidia`, `NVIDIA H100`, `sm_90`, `1`).
* One event per distinct feature used, with feature-specific flags. For
  example: using `Linear4bit` sends `quant_type=nf4`, `blocksize=64`; using
  `AdamW8bit.step()` sends `name=AdamW8bit`, `is_paged=false`.

### What is never collected

Model names, file paths, tensor shapes, parameter values, user identifiers, or
anything derived from user input.

### How to opt out

Set any one of these environment variables:

| Variable                     | Scope                      |
| ---------------------------- | -------------------------- |
| `BNB_DISABLE_TELEMETRY=1`    | `bitsandbytes` only        |
| `HF_HUB_DISABLE_TELEMETRY=1` | all Hugging Face libraries |
| `HF_HUB_OFFLINE=1`           | all Hugging Face libraries |

Telemetry is also automatically suppressed while running under `pytest` (so
CI and local test runs don't pollute the stream) and a silent no-op when
`huggingface_hub` is not installed. The implementation lives in
[`bitsandbytes/_telemetry.py`](bitsandbytes/_telemetry.py) and each event
fires at most once per process.
```
Let's simplify this. Suggested replacement:

```markdown
## Telemetry

bitsandbytes collects anonymous feature-usage data using the same telemetry
mechanism as other Hugging Face libraries (Transformers, Gradio, etc.). This
helps us understand which features are actively used so we can prioritize
maintenance and make informed decisions about deprecation.

### What is collected

Hardware and version info sent once per process (bitsandbytes version, OS, CPU
architecture, accelerator type and compute capability) plus one event per
distinct feature used per process.

### How to opt out

Set any of the following environment variables:

| Variable                     | Effect                             |
| ---------------------------- | ---------------------------------- |
| `HF_HUB_DISABLE_TELEMETRY=1` | Disables telemetry in all HF libs  |
| `HF_HUB_OFFLINE=1`           | Disables all outbound HF Hub calls |
| `DO_NOT_TRACK=1`             | Standard cross-tool opt-out signal |
```
Summary

Adds lightweight, opt-out, anonymous feature-usage telemetry via
`huggingface_hub.utils.send_telemetry()`, mirroring the pattern Transformers already uses for its `quant` user-agent field. Data lands in the HF Hub telemetry index under `path_prefix=/api/telemetry/bitsandbytes/` and answers two concrete questions: which features deserve continued maintenance and which are safe to retire.

Key design points:

* All metadata keys are namespaced under `bitsandbytes.*` so they don't collide in the shared telemetry index.
* Opt-out via `BNB_DISABLE_TELEMETRY`, `HF_HUB_DISABLE_TELEMETRY`, or `HF_HUB_OFFLINE`.
* Silent no-op when `huggingface_hub` is not installed, so `huggingface_hub` stays optional.

Features tracked

* `linear_4bit`: `Linear4bit.forward`
* `linear_8bit`: `Linear8bitLt.forward`
* `params_4bit`: `Params4bit.__new__`, catching use without `Linear4bit` (PEFT, vLLM, custom)
* `int8_params`: `Int8Params.__new__`
* `embedding`: all Embedding variants, with `variant=stable|standard|8bit|4bit`
* `optimizer`: `Optimizer8bit.step`, with `is_paged`
* `optim_override_config`: `GlobalOptimManager.override_config`
* `optim_register_module_override`: `GlobalOptimManager.register_module_override`
* `outlier_aware_linear`: `OutlierAwareLinear.__init__`
* `int8_double_quant`: `int8_double_quant()`

What is never collected

Model names, file paths, tensor shapes, parameter values, user identifiers, or anything derived from user input. The `send_telemetry()` call explicitly passes `token=False`, so no auth/identity info is attached.

End-to-end verification

`scripts/verify_telemetry.py` exercises every wired-up feature once, tagging every event with a unique `BNB_TELEMETRY_TAG` so a single run's events can be correlated in `ds-hub-telemetry`.

Test plan

* All tests in `tests/test_telemetry.py` pass (dedup, namespacing, stringification, fingerprint fields, all three opt-out env vars, truthy-value parsing, tag attachment, graceful fallback when `huggingface_hub` is missing, exception swallowing, pytest auto-detection).
* `tests/test_linear4bit.py` and `tests/test_linear8bitlt.py` pass; no regressions (322 tests).
* `tests/test_modules.py` and `tests/test_optim.py` pass; no regressions (509 tests).
* `pre-commit run --all-files` passes.
* `scripts/verify_telemetry.py` fired with `run_id=verify-4738a43c`; to be confirmed in `ds-hub-telemetry`.

Documentation

The README's `## Telemetry` section explains what is collected, what is not, and the three opt-out env vars. The module docstring in `bitsandbytes/_telemetry.py` is the authoritative reference.

🤖 Generated with Claude Code