
feature(tj): integrate PPO into UniZero framework #464

Open
tAnGjIa520 wants to merge 3 commits into opendilab:main from tAnGjIa520:unizero_ppo_v1

Conversation

@tAnGjIa520
Contributor

Integrate PPO into UniZero

Key Changes

  • Add compute_loss_ppo() in world_model.py for PPO loss calculation
  • Integrate PPO hyperparameters and training logic in unizero.py
  • Add GAE computation and log_prob storage in muzero_collector.py
  • Add PPO data fields (advantage, return, old_log_prob) in game_segment.py
  • Support collect_with_pure_policy mode to bypass MCTS
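Since `compute_loss_ppo()` is not shown in the description, here is a minimal sketch of the standard PPO clipped-surrogate policy loss that such a function typically computes, using the same field names the PR adds to the game segments (`advantage`, `old_log_prob`). The function name and signature are illustrative, not the actual implementation in `world_model.py`.

```python
import torch

def compute_loss_ppo(log_prob, old_log_prob, advantage, clip_ratio=0.2):
    """Sketch of the PPO-Clip policy loss (hypothetical signature).

    log_prob:     log pi_theta(a|s) under the current policy
    old_log_prob: log pi_theta_old(a|s) stored at collection time
    advantage:    GAE advantage estimates
    """
    # Probability ratio between the current and the behavior policy.
    ratio = torch.exp(log_prob - old_log_prob)
    # Clipped surrogate objective; the loss is its negation.
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantage
    return -torch.min(surr1, surr2).mean()
```

When the current and old log-probabilities coincide (ratio = 1), the loss reduces to minus the mean advantage, which is a quick sanity check for the clipping logic.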

- Replace manual GAE computation with ding.rl_utils.gae_data and gae
- Keep original implementation as _batch_compute_gae_for_pool_bak for backup
- Add test script to verify GAE computation correctness
- Fix lunarlander_env.py to handle both int and numpy array actions
- Add lunarlander_disc_unizero_ppo_config.py for PPO training
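The PR replaces the manual GAE loop with `ding.rl_utils.gae_data`/`gae`, keeping the old code as `_batch_compute_gae_for_pool_bak`. As a reference for what both paths should agree on, here is a minimal NumPy implementation of generalized advantage estimation (not the DI-engine API itself; the function name and argument layout are assumptions for illustration):

```python
import numpy as np

def compute_gae(rewards, values, next_values, dones, gamma=0.99, lam=0.95):
    """Reference GAE: A_t = sum_l (gamma*lam)^l * delta_{t+l}, reset at episode ends."""
    advantages = np.zeros(len(rewards), dtype=np.float64)
    gae = 0.0
    # Walk the trajectory backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_values[t] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values, dtype=np.float64)
    return advantages, returns
```

A test like the one in the PR's "test script to verify GAE computation correctness" can compare this reference against the `ding.rl_utils.gae` output on the same trajectory.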
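The `lunarlander_env.py` fix for "both int and numpy array actions" presumably normalizes whatever the collector passes before handing it to Gym. A small helper of the kind that fix might use (the name is hypothetical):

```python
import numpy as np

def normalize_discrete_action(action):
    """Coerce an action to a plain Python int.

    Depending on the collection path (MCTS vs. collect_with_pure_policy),
    the env step may receive a Python int, a 0-d numpy array, or a
    1-element numpy array; all are mapped to int here.
    """
    if isinstance(action, np.ndarray):
        return int(action.item())
    return int(action)
```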
