Detail View
Abstract
This paper proposes LA-PPO-KL (Loss-Aware Proximal Policy Optimization with KL-guided Retrying), a novel reinforcement learning algorithm that improves the stability and efficiency of PPO. Unlike traditional PPO, which relies on static clipping or KL thresholds, LA-PPO-KL monitors both policy and value loss to determine when to retry policy updates. It also halts retries when KL divergence exceeds a predefined limit, preventing excessive policy shifts. Experiments in the BipedalWalker-v3 environment demonstrate that LA-PPO-KL outperforms baseline PPO by 15~20% in average return, with faster convergence and more robust learning. These results highlight the potential of adaptive retry mechanisms in improving policy optimization under complex and uncertain environments.
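The abstract describes two interacting mechanisms: retrying a policy update while both the policy and value losses keep improving, and halting retries once the KL divergence from the pre-update policy exceeds a limit. The paper's actual update rule is not reproduced here; the following is a minimal illustrative sketch of that control flow, in which the function names, the toy loss/KL sequences, and the thresholds (`max_retries=4`, `kl_limit=0.02`) are all assumptions for illustration, not values from the paper.

```python
def kl_guided_retries(update_step, kl_to_old, max_retries=4, kl_limit=0.02):
    """Retry a policy update while both losses improve, stopping early once
    KL divergence from the pre-update policy exceeds kl_limit.
    All names and thresholds are illustrative, not from the paper."""
    policy_loss, value_loss = update_step()            # first update is unconditional
    for _ in range(max_retries - 1):
        if kl_to_old() > kl_limit:                     # KL gate: policy drifted too far
            break
        new_pi, new_v = update_step()
        if new_pi >= policy_loss and new_v >= value_loss:
            break                                      # neither loss improved: stop retrying
        policy_loss, value_loss = new_pi, new_v
    return policy_loss, value_loss


# Toy closures standing in for one minibatch update and a KL estimate.
state = {"step": 0}
losses = [(1.0, 1.0), (0.8, 0.9), (0.7, 0.85), (0.72, 0.9)]
kls = [0.005, 0.01, 0.03]

def update_step():
    s = state["step"]
    state["step"] += 1
    return losses[min(s, len(losses) - 1)]

def kl_to_old():
    return kls[min(state["step"] - 1, len(kls) - 1)]

result = kl_guided_retries(update_step, kl_to_old)
# Third retry is blocked by the KL gate (0.03 > 0.02), so the loop
# keeps the losses from the second successful retry.
```

The point of the sketch is that the stopping rule is loss-aware (retries continue only while both losses drop) rather than driven by a fixed number of epochs, with KL acting as a hard safety bound.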
- Title
- KL 유도 다중 업데이트 기반의 새로운 손실 인지 근접 정책 최적화 기법
- Title (English)
- A New Loss-Aware Proximal Policy Optimization Based On KL-Guided Multi-Updates
- Author
- 반태원
- Publication Date
- 2025-07
- Type
- Y
- Journal
- 한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering)
- Volume
- 29
- Issue
- 7
- Pages
- 960-963