
feature(tj): integrate PPO into UniZero framework #464

Open
tAnGjIa520 wants to merge 3 commits into opendilab:main from tAnGjIa520:unizero_ppo_v1

Conversation

@tAnGjIa520
Contributor

Integrate PPO into UniZero

Key Changes

  • Add compute_loss_ppo() in world_model.py for PPO loss calculation
  • Integrate PPO hyperparameters and training logic in unizero.py
  • Add GAE computation and log_prob storage in muzero_collector.py
  • Add PPO data fields (advantage, return, old_log_prob) in game_segment.py
  • Support collect_with_pure_policy mode to bypass MCTS
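Since `compute_loss_ppo()` is not shown in the description, here is a minimal sketch of the standard PPO clipped-surrogate policy loss that such a function typically computes, using the same field names the PR adds to the game segments (`advantage`, `old_log_prob`). The function name and signature are illustrative, not the actual implementation in `world_model.py`.

```python
import torch

def compute_loss_ppo(log_prob, old_log_prob, advantage, clip_ratio=0.2):
    """Sketch of the PPO-Clip policy loss (hypothetical signature).

    log_prob:     log pi_theta(a|s) under the current policy
    old_log_prob: log pi_theta_old(a|s) stored at collection time
    advantage:    GAE advantage estimates
    """
    # Probability ratio between the current and the behavior policy.
    ratio = torch.exp(log_prob - old_log_prob)
    # Clipped surrogate objective; the loss is its negation.
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantage
    return -torch.min(surr1, surr2).mean()
```

When the current and old log-probabilities coincide (ratio = 1), the loss reduces to minus the mean advantage, which is a quick sanity check for the clipping logic.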

- Replace manual GAE computation with ding.rl_utils.gae_data and gae
- Keep original implementation as _batch_compute_gae_for_pool_bak for backup
- Add test script to verify GAE computation correctness
- Fix lunarlander_env.py to handle both int and numpy array actions
- Add lunarlander_disc_unizero_ppo_config.py for PPO training
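The PR replaces the manual GAE loop with `ding.rl_utils.gae_data`/`gae`, keeping the old code as `_batch_compute_gae_for_pool_bak`. As a reference for what both paths should agree on, here is a minimal NumPy implementation of generalized advantage estimation (not the DI-engine API itself; the function name and argument layout are assumptions for illustration):

```python
import numpy as np

def compute_gae(rewards, values, next_values, dones, gamma=0.99, lam=0.95):
    """Reference GAE: A_t = sum_l (gamma*lam)^l * delta_{t+l}, reset at episode ends."""
    advantages = np.zeros(len(rewards), dtype=np.float64)
    gae = 0.0
    # Walk the trajectory backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_values[t] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values, dtype=np.float64)
    return advantages, returns
```

A test like the one in the PR's "test script to verify GAE computation correctness" can compare this reference against the `ding.rl_utils.gae` output on the same trajectory.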
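The `lunarlander_env.py` fix for "both int and numpy array actions" presumably normalizes whatever the collector passes before handing it to Gym. A small helper of the kind that fix might use (the name is hypothetical):

```python
import numpy as np

def normalize_discrete_action(action):
    """Coerce an action to a plain Python int.

    Depending on the collection path (MCTS vs. collect_with_pure_policy),
    the env step may receive a Python int, a 0-d numpy array, or a
    1-element numpy array; all are mapped to int here.
    """
    if isinstance(action, np.ndarray):
        return int(action.item())
    return int(action)
```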
