KL 유도 다중 업데이트 기반의 새로운 손실 인지 근접 정책 최적화 기법
A New Loss-Aware Proximal Policy Optimization Based On KL-Guided Multi-Updates
Abstract

This paper proposes LA-PPO-KL (Loss-Aware Proximal Policy Optimization with KL-guided Retrying), a novel reinforcement learning algorithm that improves the stability and efficiency of PPO. Unlike conventional PPO, which relies on a static clipping range or KL threshold, LA-PPO-KL monitors both the policy and value losses to decide when to retry a policy update, and it halts retries once the KL divergence exceeds a predefined limit, preventing excessive policy shifts. Experiments in the BipedalWalker-v3 environment demonstrate that LA-PPO-KL outperforms baseline PPO by 15-20% in average return, with faster convergence and more robust learning. These results highlight the potential of adaptive retry mechanisms for improving policy optimization in complex and uncertain environments.
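To make the retry mechanism described above concrete, the Python sketch below shows one plausible shape of the update loop: repeat the PPO step while the losses remain high, and stop early once the KL divergence from the pre-update policy exceeds a limit. The helper names (snapshot, apply_update, compute_losses, kl_divergence) and all threshold values are hypothetical placeholders for illustration, not the paper's reference implementation.

    def la_ppo_kl_update(policy, batch,
                         max_retries=4,        # assumed cap on repeated updates
                         loss_threshold=0.05,  # assumed loss level that ends retrying
                         kl_limit=0.02):       # assumed KL halting limit
        # Hypothetical helpers throughout; only the control flow mirrors the abstract.
        old_policy = policy.snapshot()         # reference point for measuring KL drift
        for _ in range(max_retries):
            apply_update(policy, batch)        # one clipped PPO gradient step

            # Halt retries once the updated policy drifts too far from the
            # old one, preventing excessive policy shifts.
            if kl_divergence(old_policy, policy, batch) > kl_limit:
                break

            # Retry only while the losses indicate the update still needs work;
            # stop once both policy and value loss are low enough.
            policy_loss, value_loss = compute_losses(policy, batch)
            if policy_loss < loss_threshold and value_loss < loss_threshold:
                break

Under this reading, the loss check replaces PPO's fixed number of epochs per batch with an adaptive stopping rule, while the KL check plays the role of the safety brake.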

Keywords

Proximal Policy Optimization, Adaptive Policy Update, Reinforcement Learning Stability, KL Divergence Control
Author
반태원
Publication Date
2025-07
Type
Y
Journal
한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering)
Volume
29
Issue
7
Pages
960-963