KL 유도 다중 업데이트 기반의 새로운 손실 인지 근접 정책 최적화 기법
A New Loss-Aware Proximal Policy Optimization Based On KL-Guided Multi-Updates
Abstract

This paper proposes LA-PPO-KL (Loss-Aware Proximal Policy Optimization with KL-guided Retrying), a novel reinforcement learning algorithm that improves the stability and efficiency of PPO. Unlike conventional PPO, which relies on a static clipping range or KL threshold, LA-PPO-KL monitors both the policy and value losses to decide when to retry a policy update, and it halts retries once the KL divergence exceeds a predefined limit, preventing excessive policy shifts. Experiments in the BipedalWalker-v3 environment demonstrate that LA-PPO-KL outperforms baseline PPO by 15-20% in average return, with faster convergence and more robust learning. These results highlight the potential of adaptive retry mechanisms for improving policy optimization in complex and uncertain environments.
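To make the retry mechanism described above concrete, the Python sketch below shows one plausible shape of the update loop: repeat the PPO step while the losses remain high, and stop early once the KL divergence from the pre-update policy exceeds a limit. The helper names (snapshot, apply_update, compute_losses, kl_divergence) and all threshold values are hypothetical placeholders for illustration, not the paper's reference implementation.

    def la_ppo_kl_update(policy, batch,
                         max_retries=4,        # assumed cap on repeated updates
                         loss_threshold=0.05,  # assumed loss level that ends retrying
                         kl_limit=0.02):       # assumed KL halting limit
        # Hypothetical helpers throughout; only the control flow mirrors the abstract.
        old_policy = policy.snapshot()         # reference point for measuring KL drift
        for _ in range(max_retries):
            apply_update(policy, batch)        # one clipped PPO gradient step

            # Halt retries once the updated policy drifts too far from the
            # old one, preventing excessive policy shifts.
            if kl_divergence(old_policy, policy, batch) > kl_limit:
                break

            # Retry only while the losses indicate the update still needs work;
            # stop once both policy and value loss are low enough.
            policy_loss, value_loss = compute_losses(policy, batch)
            if policy_loss < loss_threshold and value_loss < loss_threshold:
                break

Under this reading, the loss check replaces PPO's fixed number of epochs per batch with an adaptive stopping rule, while the KL check plays the role of the safety brake.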

Keywords

Proximal Policy Optimization, Adaptive Policy Update, Reinforcement Learning Stability, KL Divergence Control
Author
반태원
Publication Date
2025-07
Type
Y
Journal
한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering)
Volume
29
Issue
7
Pages
960-963