A New Loss-Aware Proximal Policy Optimization Based On KL-Guided Multi-Updates
- Other Titles
- A New Loss-Aware Proximal Policy Optimization Based On KL-Guided Multi-Updates
- Authors
- 반태원
- Issue Date
- Jul-2025
- Publisher
- 한국정보통신학회
- Keywords
- Proximal Policy Optimization; Adaptive Policy Update; Reinforcement Learning Stability; KL Divergence Control
- Citation
- 한국정보통신학회논문지, v.29, no.7, pp. 960-963
- Pages
- 4
- Indexed
- KCI
- Journal Title
- 한국정보통신학회논문지
- Volume
- 29
- Number
- 7
- Start Page
- 960
- End Page
- 963
- URI
- https://scholarworks.gnu.ac.kr/handle/sw.gnu/79639
- ISSN
- 2234-4772; 2288-4165
- Abstract
- This paper proposes LA-PPO-KL (Loss-Aware Proximal Policy Optimization with KL-guided Retrying), a novel reinforcement learning algorithm that improves the stability and efficiency of PPO. Unlike standard PPO, which relies on static clipping or fixed KL thresholds, LA-PPO-KL monitors both the policy loss and the value loss to decide when to retry a policy update, and it halts retries once the KL divergence exceeds a predefined limit, preventing excessive policy shifts. Experiments in the BipedalWalker-v3 environment show that LA-PPO-KL outperforms baseline PPO by 15-20% in average return, with faster convergence and more robust learning. These results highlight the potential of adaptive retry mechanisms for improving policy optimization in complex and uncertain environments.
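- The retry mechanism described in the abstract can be illustrated with a short sketch. The Python fragment below is only an assumption-laden illustration, not the paper's implementation: the `policy.log_prob`, `value_fn`, and `batch` interfaces, the retry count, the clipping range, and the loss/KL thresholds are hypothetical placeholders chosen to show the idea of loss-aware repeated updates guarded by a KL limit.

```python
import torch

def la_ppo_kl_update(policy, value_fn, optimizer, batch,
                     max_retries=4, kl_limit=0.02, loss_tol=0.0):
    """Illustrative sketch of a loss-aware, KL-guided multi-update step.

    Repeats the PPO update on the same batch while the combined
    policy/value loss keeps improving, and stops as soon as the
    (approximate) KL divergence from the pre-update policy exceeds
    kl_limit. All interfaces and thresholds here are assumptions.
    """
    with torch.no_grad():
        old_log_probs = policy.log_prob(batch.obs, batch.actions)

    prev_loss = float("inf")
    for _ in range(max_retries):
        log_probs = policy.log_prob(batch.obs, batch.actions)
        ratio = torch.exp(log_probs - old_log_probs)

        # Standard clipped PPO surrogate plus a value-function term.
        clipped = torch.clamp(ratio, 0.8, 1.2) * batch.advantages
        policy_loss = -torch.min(ratio * batch.advantages, clipped).mean()
        value_loss = (value_fn(batch.obs) - batch.returns).pow(2).mean()
        loss = policy_loss + 0.5 * value_loss

        # Loss-aware retry: stop if the combined loss no longer improves.
        if prev_loss - loss.item() <= loss_tol:
            break
        prev_loss = loss.item()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # KL guard: stop retrying once the policy drifts too far from
        # the pre-update policy (simple sample-based KL approximation).
        with torch.no_grad():
            kl = (old_log_probs - policy.log_prob(batch.obs, batch.actions)).mean()
        if kl.item() > kl_limit:
            break
```

- The key point of the sketch is that the number of gradient passes per batch is not fixed in advance; it is driven by whether the combined loss keeps improving and is capped by the KL guard.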
- Appears in Collections
- ETC > Journal Articles
