feat: add anonymous feature-usage telemetry #1928
Titus-von-Koeller wants to merge 1 commit into `main` from
Conversation
New `bitsandbytes._telemetry.report_feature()` sends one event per distinct feature per process via `huggingface_hub.utils.send_telemetry()`, mirroring the pattern Transformers uses for its `quant` user-agent field. Data lands in the Hub telemetry index under `path_prefix=/api/telemetry/bitsandbytes/` and informs which features are worth maintaining or retiring.

Wired at: `Linear4bit`/`Linear8bitLt` forward, `Params4bit`/`Int8Params` `__new__`, all Embedding variants, `Optimizer8bit.step`, `GlobalOptimManager` overrides, `OutlierAwareLinear`, and `int8_double_quant` (deprecation candidates).

All metadata keys are namespaced under `bitsandbytes.*`. The fingerprint carries the bnb version, OS, arch, libc, Python/torch versions, and accelerator vendor/name/arch/count. No model names, file paths, or user-derived values are ever sent.

Opt-out via `BNB_DISABLE_TELEMETRY`, `HF_HUB_DISABLE_TELEMETRY`, or `HF_HUB_OFFLINE`. Telemetry is auto-disabled under pytest so CI and local test runs don't pollute the real-usage stream, and is a silent no-op when `huggingface_hub` is not installed.

End-to-end verification: `scripts/verify_telemetry.py` emits every feature once, tagged with a unique run_id via `BNB_TELEMETRY_TAG`, for correlation in Elasticsearch queries on `ds-hub-telemetry`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
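A minimal sketch of the dedup-and-namespace shape the description outlines. The `send` parameter stands in for `huggingface_hub.utils.send_telemetry` (injected here so the sketch is self-contained); the truthy-value parsing and helper names other than `report_feature` are illustrative assumptions, not the PR's exact code.

```python
import os
from typing import Optional

# One entry per distinct feature per process.
_REPORTED: set[str] = set()


def _is_disabled() -> bool:
    # Any of the three documented env vars suppresses all telemetry.
    return any(
        os.environ.get(v, "").strip().lower() in ("1", "true", "yes")
        for v in ("BNB_DISABLE_TELEMETRY", "HF_HUB_DISABLE_TELEMETRY", "HF_HUB_OFFLINE")
    )


def report_feature(feature: str, details: Optional[dict] = None, send=print) -> None:
    if _is_disabled() or feature in _REPORTED:
        return
    _REPORTED.add(feature)
    # Namespace every key under bitsandbytes.* so nothing collides in the
    # shared Hub telemetry index.
    user_agent = {f"bitsandbytes.{k}": str(v) for k, v in (details or {}).items()}
    user_agent["bitsandbytes.feature"] = feature
    send(user_agent)
```

In the real module, `send` would be `huggingface_hub.utils.send_telemetry` wrapped in a try/except so failures stay silent.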
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
def forward(self, x: torch.Tensor):
    report_feature(
        "linear_4bit",
        {
            "quant_type": getattr(self.weight, "quant_type", "unknown"),
            "blocksize": getattr(self.weight, "blocksize", 0),
            "compress_statistics": getattr(self.weight, "compress_statistics", False),
            "input_dtype": str(x.dtype).replace("torch.", ""),
            "compute_dtype": (str(self.compute_dtype).replace("torch.", "") if self.compute_dtype else "auto"),
        },
    )
```
I would prefer we do this in `__init__` rather than add unnecessary overhead in the `forward()` hot path. Plus, most of this is already available in `__init__`, so you don't need all these `getattr` calls.
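A hypothetical sketch of that suggestion: report once at construction time and leave `forward()` untouched. The class is a stand-in (not the real `Linear4bit`), `report` is injected in place of the PR's `report_feature`, and the attribute names simply mirror the diff above.

```python
class Linear4bit:
    """Illustrative stand-in for the real module, not bitsandbytes code."""

    def __init__(self, quant_type="nf4", blocksize=64, compress_statistics=True,
                 compute_dtype=None, report=print):
        self.quant_type = quant_type
        self.blocksize = blocksize
        self.compute_dtype = compute_dtype
        # Single call in __init__: the values are in scope here, so no
        # getattr() fallbacks, and the forward() hot path pays nothing.
        report({
            "quant_type": quant_type,
            "blocksize": blocksize,
            "compress_statistics": compress_statistics,
            "compute_dtype": str(compute_dtype) if compute_dtype else "auto",
        })

    def forward(self, x):
        return x  # no telemetry on the hot path
```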
```python
report_feature(
    "linear_8bit",
    {
        "has_fp16_weights": self.state.has_fp16_weights,
        "threshold": self.state.threshold,
        "input_dtype": str(x.dtype).replace("torch.", ""),
    },
)
```
Same comment as with `Linear4bit`: this belongs in `__init__`, not in `forward`.
```python
report_feature(
    "optimizer",
    {
        "name": type(self).__name__,
        "is_paged": self.is_paged,
    },
)
```
Would prefer this to be in the `__init__` of the optimizers as well, rather than in `step()`. It would also be nice to see `optim_bits`.
```diff
@@ -0,0 +1,231 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
```
Minor nit: the copyright here is wrong; let's take this out or replace it with something more appropriate.
```
runs in CI and locally do not pollute the real-usage stream.

Opt-out (any of the following env vars disables all telemetry):
- BNB_DISABLE_TELEMETRY=1 (bitsandbytes only)
```
I am not sure we need to roll our own; it's cleaner to just reuse the existing variables. I don't see a use case for e.g. opting out of HF Hub telemetry but still opting in for BNB.
```
End-to-end verification:
Set `BNB_TELEMETRY_TAG=<some-id>` before importing bitsandbytes and the
value is attached as `bitsandbytes.tag` on every event. Use this to
correlate a single run's events in ES.
```
Seems unnecessary/overkill?
```python
def _is_pytest() -> bool:
    """Detect whether we are running inside a pytest process.

    Telemetry is suppressed during test runs so that CI and local test
    invocations don't pollute the real-usage stream. Tests that want to
    assert on telemetry behavior monkey-patch this function to return False.
    """
    return "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
```
I would consider looking at other env variables and not bother with the `"pytest" in sys.modules` condition. Most CI platforms set an env var like `CI` for this.
```python
if os_name == "Windows":
    try:
        build = sys.getwindowsversion().build
        os_version = f"11 (build {build})" if build >= 22000 else f"10 (build {build})"
```
This seems fragile and also ignores Windows Server, etc.
```python
try:
    import torch

    info["bitsandbytes.torch"] = torch.__version__
except ImportError:
    pass
```
I think this is redundant; doesn't hf hub collect this automatically?
```python
if feature in _REPORTED:
    return
_REPORTED.add(feature)

if _is_disabled():
    return
```
We may want to do de-duping at a finer granularity than just the feature name as it is. But maybe we just name the features differently in that case, so that's more of a minor nit.
Should we add to `_REPORTED` even when disabled? Seems to me we should just exit right away.
```python
tag = os.environ.get("BNB_TELEMETRY_TAG", "").strip()
if tag:
    user_agent["bitsandbytes.tag"] = tag
```
Same as comment earlier, seems unnecessary.
```python
if torch.cuda.is_available():
    vendor = "amd" if getattr(torch.version, "hip", None) else "nvidia"
    info["bitsandbytes.accel"] = vendor
    info["bitsandbytes.accel_count"] = str(torch.cuda.device_count())
    props = torch.cuda.get_device_properties(0)
    info["bitsandbytes.accel_name"] = props.name
    if vendor == "nvidia":
        info["bitsandbytes.accel_arch"] = f"sm_{props.major}{props.minor}"
    else:
        info["bitsandbytes.accel_arch"] = getattr(props, "gcnArchName", "unknown")
return info
```
This only looks at the first device; I'm not sure, but we may be interested when there are multiple devices and they're different. For that, maybe we just add some sort of flag telling us whether it is a heterogeneous system or not. Likely it is, but it may be valuable to find out otherwise.
Let's grab device 0's SM count and memory. We don't really need the name. So for both AMD and NVIDIA this should be the `multi_processor_count` and `total_memory` properties. Keep `gcnArchName` and `major`/`minor`.
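A sketch of those two asks: record device 0's SM count and memory instead of its marketing name, and flag whether the system mixes different accelerator models. `props` stands in for the list of objects `torch.cuda.get_device_properties(i)` would return for each device (the `name`, `multi_processor_count`, and `total_memory` field names match torch's); the function name and key names are assumptions.

```python
def accel_info(props: list) -> dict:
    # props: one device-properties object per visible accelerator.
    return {
        "bitsandbytes.accel_count": str(len(props)),
        "bitsandbytes.accel_sm_count": str(props[0].multi_processor_count),
        "bitsandbytes.accel_memory": str(props[0].total_memory),
        # "true" when not every device reports the same (name, memory) pair,
        # i.e. the system mixes different accelerator models.
        "bitsandbytes.accel_heterogeneous": str(
            len({(p.name, p.total_memory) for p in props}) > 1
        ).lower(),
    }
```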
```python
try:
    import torch
except ImportError:
```
torch is already a pretty hard dependency; this shouldn't need to be caught.
```python
report_feature(
    "params_4bit",
    {
        "quant_type": quant_type,
        "blocksize": blocksize,
        "compress_statistics": compress_statistics,
        "quant_storage": str(quant_storage).replace("torch.", ""),
    },
)
```
Starting to think we don't need this here; would prefer we just keep `Linear4bit` and `Linear8bitLt` and remove this on `Params4bit`/`Int8Params`.
```python
report_feature(
    "optim_override_config",
    {"keys": ",".join(sorted(key_value_dict.keys()))},
)
```
Not particularly interested in tracking this.
```python
report_feature(
    "optim_register_module_override",
    {"keys": ",".join(sorted(config.keys())) if isinstance(config, dict) else "unknown"},
)
```
Likewise, not really interested in tracking this either.
```python
    return info


def report_feature(feature: str, details: Optional[dict[str, object]] = None) -> None:
```
I think for more clarity we should just name this `_report_feature` as well.
```python
"bitsandbytes.os": os_name,
"bitsandbytes.os_version": os_version,
"bitsandbytes.arch": platform.machine(),
"bitsandbytes.python": platform.python_version(),
```
I think this is redundant too; huggingface_hub likely includes the Python version already.
```markdown
## Telemetry

`bitsandbytes` sends anonymous, aggregate feature-usage telemetry to the
Hugging Face Hub. This data is used to prioritize maintenance (which quantization
methods and optimizers are actually in use?) and to safely retire features that
are no longer called by anyone.

### What is collected

* A session fingerprint sent once per process: `bitsandbytes` version, OS
  name/version, CPU architecture, Python/PyTorch versions, accelerator
  vendor/name/arch/count (e.g. `nvidia`, `NVIDIA H100`, `sm_90`, `1`).
* One event per distinct feature used, with feature-specific flags. For
  example: using `Linear4bit` sends `quant_type=nf4`, `blocksize=64`; using
  `AdamW8bit.step()` sends `name=AdamW8bit`, `is_paged=false`.

### What is never collected

Model names, file paths, tensor shapes, parameter values, user identifiers, or
anything derived from user input.

### How to opt out

Set any one of these environment variables:

| Variable                     | Scope                      |
| ---------------------------- | -------------------------- |
| `BNB_DISABLE_TELEMETRY=1`    | `bitsandbytes` only        |
| `HF_HUB_DISABLE_TELEMETRY=1` | all Hugging Face libraries |
| `HF_HUB_OFFLINE=1`           | all Hugging Face libraries |

Telemetry is also automatically suppressed while running under `pytest` (so
CI and local test runs don't pollute the stream) and a silent no-op when
`huggingface_hub` is not installed. The implementation lives in
[`bitsandbytes/_telemetry.py`](bitsandbytes/_telemetry.py) and each event
fires at most once per process.
```
Let's simplify this. Suggested replacement:

```markdown
## Telemetry

bitsandbytes collects anonymous feature-usage data using the same telemetry
mechanism as other Hugging Face libraries (Transformers, Gradio, etc.). This
helps us understand which features are actively used so we can prioritize
maintenance and make informed decisions about deprecation.

### What is collected

Hardware and version info sent once per process (bitsandbytes version, OS, CPU
architecture, accelerator type and compute capability) plus one event per
distinct feature used per process.

### How to opt out

Set any of the following environment variables:

| Variable                     | Effect                             |
| ---------------------------- | ---------------------------------- |
| `HF_HUB_DISABLE_TELEMETRY=1` | Disables telemetry in all HF libs  |
| `HF_HUB_OFFLINE=1`           | Disables all outbound HF Hub calls |
| `DO_NOT_TRACK=1`             | Standard cross-tool opt-out signal |
```
Summary

Adds lightweight, opt-out, anonymous feature-usage telemetry via
`huggingface_hub.utils.send_telemetry()`, mirroring the pattern Transformers already uses for its `quant` user-agent field. Data lands in the HF Hub telemetry index under `path_prefix=/api/telemetry/bitsandbytes/` and answers two concrete questions: which features deserve continued maintenance and which are safe to retire.

Key design points:

* All metadata keys are namespaced under `bitsandbytes.*` so they don't collide in the shared telemetry index.
* Opt-out via `BNB_DISABLE_TELEMETRY`, `HF_HUB_DISABLE_TELEMETRY`, or `HF_HUB_OFFLINE`.
* Silent no-op when `huggingface_hub` is not installed, so `huggingface_hub` stays optional.

Features tracked

* `linear_4bit`: `Linear4bit.forward`
* `linear_8bit`: `Linear8bitLt.forward`
* `params_4bit`: `Params4bit.__new__`, catching use without `Linear4bit` (PEFT, vLLM, custom)
* `int8_params`: `Int8Params.__new__`
* `embedding`: all Embedding variants, with `variant=stable|standard|8bit|4bit`
* `optimizer`: `Optimizer8bit.step`, with `is_paged`
* `optim_override_config`: `GlobalOptimManager.override_config`
* `optim_register_module_override`: `GlobalOptimManager.register_module_override`
* `outlier_aware_linear`: `OutlierAwareLinear.__init__`
* `int8_double_quant`: `int8_double_quant()`

What is never collected

Model names, file paths, tensor shapes, parameter values, user identifiers, or anything derived from user input. The `send_telemetry()` call explicitly passes `token=False`, so no auth/identity info is attached.

End-to-end verification

`scripts/verify_telemetry.py` exercises every wired-up feature once, tagging every event with a unique `BNB_TELEMETRY_TAG` so a single run's events can be correlated in `ds-hub-telemetry`.

Test plan

* All tests in `tests/test_telemetry.py` pass (dedup, namespacing, stringification, fingerprint fields, all three opt-out env vars, truthy-value parsing, tag attachment, graceful fallback when `huggingface_hub` is missing, exception swallowing, pytest auto-detection).
* `tests/test_linear4bit.py` and `tests/test_linear8bitlt.py` pass; no regressions (322 tests).
* `tests/test_modules.py` and `tests/test_optim.py` pass; no regressions (509 tests).
* `pre-commit run --all-files` passes.
* `scripts/verify_telemetry.py` fired with `run_id=verify-4738a43c`; to be confirmed in `ds-hub-telemetry`.

Documentation

The README's `## Telemetry` section explains what is collected, what is not, and the three opt-out env vars. The module docstring in `bitsandbytes/_telemetry.py` is the authoritative reference.

🤖 Generated with Claude Code