Detailed Information


Title
A New Loss-Aware Proximal Policy Optimization Based On KL-Guided Multi-Updates (KL 유도 다중 업데이트 기반의 새로운 손실 인지 근접 정책 최적화 기법)

Other Titles
A New Loss-Aware Proximal Policy Optimization Based On KL-Guided Multi-Updates
Authors
Ban, Tae Won (반태원)
Issue Date
Jul-2025
Publisher
The Korea Institute of Information and Communication Engineering (한국정보통신학회)
Keywords
Proximal Policy Optimization; Adaptive Policy Update; Reinforcement Learning Stability; KL Divergence Control
Citation
Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지), v.29, no.7, pp. 960-963
Pages
4
Indexed
KCI
Journal Title
Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지)
Volume
29
Number
7
Start Page
960
End Page
963
URI
https://scholarworks.gnu.ac.kr/handle/sw.gnu/79639
ISSN
2234-4772
2288-4165
Abstract
This paper proposes LA-PPO-KL (Loss-Aware Proximal Policy Optimization with KL-guided Retrying), a novel reinforcement learning algorithm that improves the stability and efficiency of PPO. Unlike traditional PPO, which relies on a static clipping range or fixed KL thresholds, LA-PPO-KL monitors both the policy and value losses to determine when to retry policy updates, and it halts retries when the KL divergence exceeds a predefined limit, preventing excessive policy shifts. Experiments in the BipedalWalker-v3 environment demonstrate that LA-PPO-KL outperforms baseline PPO by 15-20% in average return, with faster convergence and more robust learning. These results highlight the potential of adaptive retry mechanisms for improving policy optimization in complex and uncertain environments.
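
The retry mechanism described in the abstract can be summarized with the following minimal sketch. It is an illustrative reconstruction only, not the paper's implementation: the diagonal-Gaussian policy parameterization, the ppo_update and compute_losses callables, and the threshold values (loss_tol, kl_limit, max_retries) are all placeholder assumptions introduced here.

```python
# Hypothetical sketch of a loss-aware, KL-guided retry loop in the spirit of
# LA-PPO-KL. All names and thresholds below are illustrative assumptions.
import numpy as np

def kl_diag_gaussian(mu_old, std_old, mu_new, std_new):
    """KL divergence KL(old || new) between two diagonal-Gaussian policies."""
    return float(np.sum(
        np.log(std_new / std_old)
        + (std_old ** 2 + (mu_old - mu_new) ** 2) / (2.0 * std_new ** 2)
        - 0.5
    ))

def la_ppo_kl_step(params, batch, ppo_update, compute_losses,
                   loss_tol=0.01, kl_limit=0.03, max_retries=4):
    """Apply up to `max_retries` PPO updates on one batch of rollouts.

    Retries continue while the combined policy + value loss remains above
    `loss_tol`, and stop immediately once the KL divergence from the
    pre-update policy exceeds `kl_limit`, preventing excessive policy shifts.
    """
    mu_old, std_old = params["mu"].copy(), params["std"].copy()
    for _ in range(max_retries):
        params = ppo_update(params, batch)              # one clipped-PPO gradient step
        policy_loss, value_loss = compute_losses(params, batch)
        kl = kl_diag_gaussian(mu_old, std_old, params["mu"], params["std"])
        if kl > kl_limit:                               # policy drifted too far: halt retries
            break
        if policy_loss + value_loss < loss_tol:         # losses already small: stop retrying
            break
    return params
```

In this sketch the KL check acts as a hard stop on how far the policy may drift during one batch of updates, while the monitored losses decide whether another update attempt is still worthwhile.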
Files in This Item
There are no files associated with this item.
Appears in Collections
ETC > Journal Articles

Related Researcher

Ban, Tae Won
College of IT Engineering, Department of AI Information Engineering (IT공과대학, AI정보공학과)
