
Upgrading transformers version to 5.5.4#191

Open
filyp wants to merge 4 commits into locuslab:main from filyp:pr-transformers-5.5.3

Conversation

@filyp (Contributor) commented May 13, 2026

What does this PR do?

  • It upgrades transformers to 5.5.4 to support newer models and features, in particular much more efficient MoE model execution (added in transformers 5).
  • Also bumps other dependencies to the versions required by transformers==5.5.4. (I used this setup extensively in my unlearning experiments, without any issues.)
  • Adds a regression test for prediction_step, to make future upgrades easier and to catch issues early.
  • Adds a README link to a prebuilt Docker image with the full environment (this makes it easier to start using open-unlearning, especially on cloud GPUs that require specifying an image with the environment, and helps reproducibility). If you'd like, I'll also commit the Dockerfile.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the contributions guide?
  • Are your changes documented? Read documentation guidelines here.

Tests

I used the newly added prediction_step regression test to verify validity, and I ran `make quality`. I also ran `python src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default eval=tofu_simple question_key=paraphrased_question eval.tofu.batch_size=16 trainer.args.report_to=wandb trainer=NPO task_name=transformers5.5.4` to regression-test against the runs in #175. As before, the reported loss changes (due to a different scaling convention in the new version), but the actual unlearning trajectory stays almost the same.

[Image: comparison of unlearning loss trajectories before and after the upgrade]

One thing worth flagging: in newer transformers versions (not only this one), installing flash-attn is messier. A plain `pip install` triggers a long build process (which can also fail if the local CUDA version mismatches); prebuilt wheels exist, but only in third-party repos (I documented this in the README). I think the cleanest solution would be to remove flash-attn entirely, given that with typical unlearning datasets it actually slows training down, as discussed in #190.

Copilot AI review requested due to automatic review settings May 13, 2026 11:42

Copilot AI left a comment


Pull request overview

This PR upgrades the project’s Hugging Face / PyTorch stack to transformers==5.5.4 (and required dependency bumps) to enable newer model support and improved MoE efficiency, and it adds a regression check around UnlearnTrainer.prediction_step behavior under the new transformers loss-normalization semantics.

Changes:

  • Bump core ML dependencies (transformers / torch / accelerate / bitsandbytes / huggingface-hub) and add flash-linear-attention.
  • Update unlearning trainer prediction_step to pass num_items_in_batch through to the base Trainer.compute_loss for transformers 5.x behavior.
  • Add a prediction_step regression script and update README installation guidance (flash-attn + Docker image link).
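The num_items_in_batch change tracks the loss-normalization convention that changed in recent transformers: instead of averaging per-micro-batch mean losses, the trainer divides the summed token loss by the total count of valid label tokens. A minimal pure-Python sketch of the two conventions (the helper names are hypothetical illustrations, not the PR's actual code, and the loss values are made up):

```python
# Hypothetical helpers contrasting the two loss-scaling conventions.
# Each micro-batch is a list of per-token losses over valid (non -100) labels.

def old_style_loss(micro_batches):
    # old convention: mean loss per micro-batch, then mean across micro-batches
    per_batch_means = [sum(b) / len(b) for b in micro_batches]
    return sum(per_batch_means) / len(per_batch_means)

def new_style_loss(micro_batches):
    # new convention: total token loss divided by num_items_in_batch,
    # the total number of valid label tokens across all micro-batches
    num_items_in_batch = sum(len(b) for b in micro_batches)
    total = sum(sum(b) for b in micro_batches)
    return total / num_items_in_batch

# micro-batches with unequal token counts make the two conventions diverge
batches = [[2.0, 2.0], [6.0]]
print(old_style_loss(batches))  # 4.0
print(new_style_loss(batches))  # ~3.33
```

This is why the reported loss values differ across versions even though the underlying optimization trajectory barely changes.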

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| requirements.txt | Pins new dependency versions for the transformers 5.5.4 upgrade. |
| src/trainer/unlearn/base.py | Updates prediction_step to call base compute_loss with num_items_in_batch (transformers 5.x normalization). |
| src/data/utils.py | Adapts apply_chat_template(..., tokenize=True) handling for transformers 5.x returning BatchEncoding. |
| tests/prediction_step_regression.py | Adds a regression check script for prediction_step loss/logits/labels behavior. |
| README.md | Updates flash-attn guidance and adds a link to a prebuilt Docker image. |
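The src/data/utils.py adaptation handles apply_chat_template(..., tokenize=True) returning a BatchEncoding (a dict-like object keyed by "input_ids") rather than a bare list of token ids in transformers 5.x. A hedged sketch of the shape normalization involved, not the PR's actual code (`extract_input_ids` is a hypothetical helper, and the plain dict below stands in for a real BatchEncoding):

```python
# Hypothetical helper: normalize apply_chat_template(..., tokenize=True)
# output to a flat list of token ids across transformers versions.

def extract_input_ids(result):
    # dict-like return (BatchEncoding behaves like a Mapping)
    if hasattr(result, "keys") and "input_ids" in result:
        ids = result["input_ids"]
        # batched encodings nest the ids one level deeper
        if ids and isinstance(ids[0], list):
            return ids[0]
        return ids
    # older behavior: already a flat list of token ids
    return result

print(extract_input_ids([1, 2, 3]))                   # [1, 2, 3]
print(extract_input_ids({"input_ids": [[1, 2, 3]]}))  # [1, 2, 3]
```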


Comment thread tests/prediction_step_regression.py Outdated
Comment on lines +20 to +34
```python
MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"
SEED = 0


def main():
    torch.manual_seed(SEED)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.float32, attn_implementation="sdpa"
    )
    model.eval()
```
Contributor Author

Meant as a manual test, not CI.

Comment thread tests/prediction_step_regression.py
Comment thread tests/prediction_step_regression.py
Comment thread src/trainer/unlearn/base.py
Comment thread src/trainer/unlearn/base.py
Comment thread README.md
Comment on lines +119 to +120
```shell
# Or to avoid building flash-attn:
pip install "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.9-cp311-cp311-linux_x86_64.whl"
```
Comment thread README.md
```shell
# python setup_data.py --help
```

We also provide a [Docker image](https://hub.docker.com/r/filyp/open-unlearning), with this environment already installed.