
Upgrading transformers version to 5.5.4#191

Open
filyp wants to merge 4 commits into locuslab:main from filyp:pr-transformers-5.5.3

Conversation

@filyp (Contributor) commented May 13, 2026

What does this PR do?

  • It upgrades transformers to 5.5.4 to support newer models and features, in particular much more efficient MoE model execution (added in transformers 5).
  • Also bumps other dependencies to the versions required by transformers==5.5.4. (I used this setup extensively in my unlearning experiments, without any issues.)
  • Adds a regression test for prediction_step, to make future upgrades easier and to catch issues early.
  • Adds a README link to a prebuilt Docker image with the full environment (this makes it easier to start using open-unlearning, especially on cloud GPUs that require specifying an image with the environment, and helps reproducibility). If you'd like, I'll also commit the Dockerfile.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the contributions guide?
  • Are your changes documented? Read documentation guidelines here.

Tests

I used the newly added prediction_step regression test to verify validity, and I ran `make quality`. I also ran `python src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default eval=tofu_simple question_key=paraphrased_question eval.tofu.batch_size=16 trainer.args.report_to=wandb trainer=NPO task_name=transformers5.5.4` to regression-test against the runs in #175. As before, the reported loss changes (due to a different scaling convention in the new version), but the actual unlearning trajectory stays almost the same.

[Image: comparison of unlearning loss trajectories before and after the upgrade]

One thing worth flagging: in newer transformers versions (not only this one), installing flash-attn is messier. A plain `pip install` triggers a long build process (which can also fail if the local CUDA version mismatches); prebuilt wheels exist, but only in third-party repos (I documented this in the README). I think the cleanest solution would be to remove flash-attn entirely, given that with typical unlearning datasets it actually slows training down, as discussed in #190.

Copilot AI review requested due to automatic review settings May 13, 2026 11:42

Copilot AI left a comment


Pull request overview

This PR upgrades the project’s Hugging Face / PyTorch stack to transformers==5.5.4 (and required dependency bumps) to enable newer model support and improved MoE efficiency, and it adds a regression check around UnlearnTrainer.prediction_step behavior under the new transformers loss-normalization semantics.

Changes:

  • Bump core ML dependencies (transformers / torch / accelerate / bitsandbytes / huggingface-hub) and add flash-linear-attention.
  • Update unlearning trainer prediction_step to pass num_items_in_batch through to the base Trainer.compute_loss for transformers 5.x behavior.
  • Add a prediction_step regression script and update README installation guidance (flash-attn + Docker image link).
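The num_items_in_batch change tracks the loss-normalization convention that changed in recent transformers: instead of averaging per-micro-batch mean losses, the trainer divides the summed token loss by the total count of valid label tokens. A minimal pure-Python sketch of the two conventions (the helper names are hypothetical illustrations, not the PR's actual code, and the loss values are made up):

```python
# Hypothetical helpers contrasting the two loss-scaling conventions.
# Each micro-batch is a list of per-token losses over valid (non -100) labels.

def old_style_loss(micro_batches):
    # old convention: mean loss per micro-batch, then mean across micro-batches
    per_batch_means = [sum(b) / len(b) for b in micro_batches]
    return sum(per_batch_means) / len(per_batch_means)

def new_style_loss(micro_batches):
    # new convention: total token loss divided by num_items_in_batch,
    # the total number of valid label tokens across all micro-batches
    num_items_in_batch = sum(len(b) for b in micro_batches)
    total = sum(sum(b) for b in micro_batches)
    return total / num_items_in_batch

# micro-batches with unequal token counts make the two conventions diverge
batches = [[2.0, 2.0], [6.0]]
print(old_style_loss(batches))  # 4.0
print(new_style_loss(batches))  # ~3.33
```

This is why the reported loss values differ across versions even though the underlying optimization trajectory barely changes.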

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| requirements.txt | Pins new dependency versions for the transformers 5.5.4 upgrade. |
| src/trainer/unlearn/base.py | Updates prediction_step to call base compute_loss with num_items_in_batch (transformers 5.x normalization). |
| src/data/utils.py | Adapts apply_chat_template(..., tokenize=True) handling for transformers 5.x returning BatchEncoding. |
| tests/prediction_step_regression.py | Adds a regression check script for prediction_step loss/logits/labels behavior. |
| README.md | Updates flash-attn guidance and adds a link to a prebuilt Docker image. |
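The src/data/utils.py adaptation handles apply_chat_template(..., tokenize=True) returning a BatchEncoding (a dict-like object keyed by "input_ids") rather than a bare list of token ids in transformers 5.x. A hedged sketch of the shape normalization involved, not the PR's actual code (`extract_input_ids` is a hypothetical helper, and the plain dict below stands in for a real BatchEncoding):

```python
# Hypothetical helper: normalize apply_chat_template(..., tokenize=True)
# output to a flat list of token ids across transformers versions.

def extract_input_ids(result):
    # dict-like return (BatchEncoding behaves like a Mapping)
    if hasattr(result, "keys") and "input_ids" in result:
        ids = result["input_ids"]
        # batched encodings nest the ids one level deeper
        if ids and isinstance(ids[0], list):
            return ids[0]
        return ids
    # older behavior: already a flat list of token ids
    return result

print(extract_input_ids([1, 2, 3]))                   # [1, 2, 3]
print(extract_input_ids({"input_ids": [[1, 2, 3]]}))  # [1, 2, 3]
```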


Comment thread tests/prediction_step_regression.py Outdated
Comment on lines +20 to +34
```python
MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"
SEED = 0


def main():
    torch.manual_seed(SEED)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.float32, attn_implementation="sdpa"
    )
    model.eval()
```
Contributor Author

Meant as a manual test, not CI.

Comment thread tests/prediction_step_regression.py
Comment thread tests/prediction_step_regression.py
Comment thread src/trainer/unlearn/base.py
Comment thread src/trainer/unlearn/base.py
Comment thread README.md
Comment on lines +119 to +120
```shell
# Or to avoid building flash-attn:
pip install "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.9-cp311-cp311-linux_x86_64.whl"
```
Comment thread README.md
```shell
# python setup_data.py --help
```

We also provide a [Docker image](https://hub.docker.com/r/filyp/open-unlearning), with this environment already installed.