Upgrading transformers version to 5.5.4 (#191)
Open

filyp wants to merge 4 commits
Conversation
Pull request overview
This PR upgrades the project's Hugging Face / PyTorch stack to `transformers==5.5.4` (and required dependency bumps) to enable newer model support and improved MoE efficiency, and it adds a regression check around `UnlearnTrainer.prediction_step` behavior under the new transformers loss-normalization semantics.
Changes:
- Bump core ML dependencies (transformers / torch / accelerate / bitsandbytes / huggingface-hub) and add flash-linear-attention.
- Update the unlearning trainer's `prediction_step` to pass `num_items_in_batch` through to the base `Trainer.compute_loss` for transformers 5.x behavior.
- Add a `prediction_step` regression script and update README installation guidance (flash-attn + Docker image link).
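The `prediction_step` change can be sketched as follows. This is a hypothetical simplification, not the project's actual code: `BaseTrainer` here is a stand-in for `transformers.Trainer`, so the example runs without transformers installed. The point it illustrates is that under 5.x semantics the summed loss is normalized by `num_items_in_batch`, so an overriding trainer must forward that argument rather than drop it.

```python
# Sketch (assumed names) of forwarding num_items_in_batch through an
# overridden prediction_step to the base compute_loss.

class BaseTrainer:
    """Stand-in for transformers.Trainer, simplified for illustration."""

    def compute_loss(self, model, inputs, num_items_in_batch=None):
        # transformers 5.x style: the summed loss is normalized by the
        # true token count for the batch, not a per-micro-batch mean.
        total = sum(inputs["token_losses"])
        if num_items_in_batch is None:
            return total / len(inputs["token_losses"])  # fall back to a mean
        return total / num_items_in_batch


class UnlearnTrainer(BaseTrainer):
    def prediction_step(self, model, inputs, num_items_in_batch=None):
        # Forward num_items_in_batch instead of dropping it, so eval-time
        # loss uses the same normalization as training under 5.x.
        return self.compute_loss(
            model, inputs, num_items_in_batch=num_items_in_batch
        )


trainer = UnlearnTrainer()
batch = {"token_losses": [2.0, 4.0, 6.0]}
print(trainer.prediction_step(None, batch, num_items_in_batch=4))  # 12.0 / 4 = 3.0
```

If the override silently dropped the argument, the fallback mean (4.0 here) would be reported instead, which is exactly the kind of drift the regression script is meant to catch.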
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `requirements.txt` | Pins new dependency versions for the transformers 5.5.4 upgrade. |
| `src/trainer/unlearn/base.py` | Updates `prediction_step` to call the base `compute_loss` with `num_items_in_batch` (transformers 5.x normalization). |
| `src/data/utils.py` | Adapts `apply_chat_template(..., tokenize=True)` handling for transformers 5.x returning `BatchEncoding`. |
| `tests/prediction_step_regression.py` | Adds a regression check script for `prediction_step` loss/logits/labels behavior. |
| `README.md` | Updates flash-attn guidance and adds a link to a prebuilt Docker image. |
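The `src/data/utils.py` adaptation can be sketched like this (hypothetical helper name; the real call site may differ): per the PR, transformers 5.x returns a `BatchEncoding` from `apply_chat_template(..., tokenize=True)` rather than a bare list of token ids, so downstream code that expects a flat list needs to unwrap it.

```python
def unwrap_chat_template_output(encoded):
    """Normalize apply_chat_template(..., tokenize=True) output across
    transformers versions (hypothetical helper, not the project's code).

    - 4.x style: a plain list of token ids.
    - 5.x style: a BatchEncoding (a dict subclass) holding "input_ids".
    """
    if isinstance(encoded, dict):  # BatchEncoding subclasses dict
        ids = encoded["input_ids"]
        # A single conversation may still come back nested one level
        # deep as a batch of size one; flatten that case.
        if ids and isinstance(ids[0], list):
            return ids[0]
        return ids
    return encoded  # already a flat list of ids


# Both shapes normalize to the same flat list:
print(unwrap_chat_template_output([1, 2, 3]))
print(unwrap_chat_template_output({"input_ids": [[1, 2, 3]]}))
```

Keeping the branch on `dict` (rather than importing `BatchEncoding`) keeps such a helper usable across both major versions without a version check.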
Comment on lines +20 to +34
```python
MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"
SEED = 0


def main():
    torch.manual_seed(SEED)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.float32, attn_implementation="sdpa"
    )
    model.eval()
```
filyp (Contributor, Author):

Meant as a manual test, not CI.
Comment on lines +119 to +120
```shell
# Or to avoid building flash-attn:
pip install "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.9-cp311-cp311-linux_x86_64.whl"
# python setup_data.py --help
```

We also provide a [Docker image](https://hub.docker.com/r/filyp/open-unlearning), with this environment already installed.
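A prebuilt flash-attn wheel only installs cleanly when its filename tags match the local environment: the wheel above encodes CUDA 12.8 (`cu128`), torch 2.9 (`torch2.9`), and CPython 3.11 (`cp311`). As a small sketch, the CPython tag to match can be derived from the running interpreter (the CUDA and torch tags must similarly match `torch.version.cuda` and `torch.__version__`, omitted here to keep the snippet torch-free):

```python
import sys

# Derive the CPython tag that must appear in the wheel filename,
# e.g. "cp311" for Python 3.11. Pick a wheel whose filename contains
# this tag, or pip will refuse it as incompatible.
cp_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(cp_tag)
```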
What does this PR do?

- Upgrades to `transformers==5.5.4`. (I extensively used this setup in my unlearning experiments, without any issues.)
- Adds a regression test for `prediction_step`, to make future upgrades easier and catch issues.

Before submitting
Tests

I used the newly added `prediction_step` regression test to verify validity, and I ran `make quality`. I also ran `python src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default eval=tofu_simple question_key=paraphrased_question eval.tofu.batch_size=16 trainer.args.report_to=wandb trainer=NPO task_name=transformers5.5.4` to regression test against the runs in #175. Like before, the reported loss changes (due to a different scaling convention in the new version), but the actual unlearning trajectory stays almost the same.

One thing worth flagging: in newer transformers versions (not only this one), installing flash-attn is messier. A plain `pip install` triggers a long build process (which can also fail if the local CUDA version mismatches); prebuilt wheels exist, but only in third-party repos (I documented this in the README). I think the cleanest solution would be to remove flash-attn altogether, given that with typical unlearning datasets it actually slows training down, as discussed in #190.
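The reported-loss shift mentioned above comes from the scaling-convention change. A toy illustration (an assumed simplification, not transformers internals) of why a mean-of-micro-batch-means and a global sum divided by `num_items_in_batch` report different numbers for the same token losses, even though the underlying optimization is essentially unchanged:

```python
# Toy numbers: two micro-batches with unequal token counts.
token_losses_per_microbatch = [[2.0, 4.0], [6.0]]

# Old-style reporting: average the per-micro-batch means.
per_batch_means = [sum(b) / len(b) for b in token_losses_per_microbatch]
old_reported = sum(per_batch_means) / len(per_batch_means)

# New-style reporting: one global sum divided by the total token count
# (num_items_in_batch), which weights every token equally.
total = sum(sum(b) for b in token_losses_per_microbatch)
num_items_in_batch = sum(len(b) for b in token_losses_per_microbatch)
new_reported = total / num_items_in_batch

print(old_reported)  # 4.5 -> mean of [3.0, 6.0]
print(new_reported)  # 4.0 -> 12.0 / 3 tokens
```

With equal-length micro-batches the two conventions coincide; they diverge exactly when token counts vary, which is why logged loss curves shift while the unlearning trajectory stays almost the same.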