A New Loss-Aware Proximal Policy Optimization Based on KL-Guided Multi-Updates
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | 반태원 | - |
| dc.date.accessioned | 2025-08-06T07:30:09Z | - |
| dc.date.available | 2025-08-06T07:30:09Z | - |
| dc.date.issued | 2025-07 | - |
| dc.identifier.issn | 2234-4772 | - |
| dc.identifier.issn | 2288-4165 | - |
| dc.identifier.uri | https://scholarworks.gnu.ac.kr/handle/sw.gnu/79639 | - |
| dc.description.abstract | This paper proposes LA-PPO-KL (Loss-Aware Proximal Policy Optimization with KL-guided Retrying), a novel reinforcement learning algorithm that improves the stability and efficiency of PPO. Unlike traditional PPO, which relies on static clipping or KL thresholds, LA-PPO-KL monitors both policy and value loss to determine when to retry policy updates. It also halts retries when KL divergence exceeds a predefined limit, preventing excessive policy shifts. Experiments in the BipedalWalker-v3 environment demonstrate that LA-PPO-KL outperforms baseline PPO by 15~20% in average return, with faster convergence and more robust learning. These results highlight the potential of adaptive retry mechanisms in improving policy optimization under complex and uncertain environments. | - |
| dc.format.extent | 4 | - |
| dc.language | Korean | - |
| dc.language.iso | KOR | - |
| dc.publisher | 한국정보통신학회 (Korea Institute of Information and Communication Engineering) | - |
| dc.title | KL 유도 다중 업데이트 기반의 새로운 손실 인지 근접 정책 최적화 기법 | - |
| dc.title.alternative | A New Loss-Aware Proximal Policy Optimization Based On KL-Guided Multi-Updates | - |
| dc.type | Article | - |
| dc.publisher.location | Republic of Korea | - |
| dc.identifier.bibliographicCitation | 한국정보통신학회논문지, v.29, no.7, pp. 960-963 | - |
| dc.citation.title | 한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering) | - |
| dc.citation.volume | 29 | - |
| dc.citation.number | 7 | - |
| dc.citation.startPage | 960 | - |
| dc.citation.endPage | 963 | - |
| dc.type.docType | Y | - |
| dc.identifier.kciid | ART003228585 | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | kci | - |
| dc.subject.keywordAuthor | Proximal Policy Optimization | - |
| dc.subject.keywordAuthor | Adaptive Policy Update | - |
| dc.subject.keywordAuthor | Reinforcement Learning Stability | - |
| dc.subject.keywordAuthor | KL Divergence Control | - |
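
The abstract above describes the retry mechanism only at a high level. The Python snippet below is a minimal sketch of that idea, not the authors' implementation: it keeps re-applying a PPO update while the combined policy and value loss still improves, and halts as soon as the KL divergence from the pre-update policy exceeds a fixed limit. The interfaces (`update_fn`, `eval_fn`), the thresholds (`KL_LIMIT`, `MAX_RETRIES`, `LOSS_IMPROVE_EPS`), and the use of a summed policy+value loss are assumptions made for illustration; the paper is not open access, so its exact criteria may differ.

```python
import numpy as np

# Hypothetical thresholds; the paper does not disclose its exact settings.
KL_LIMIT = 0.02          # stop retrying once the policy has drifted this far
MAX_RETRIES = 4          # upper bound on extra updates per rollout batch
LOSS_IMPROVE_EPS = 1e-3  # minimum combined-loss improvement worth a retry


def kl_divergence(p, q):
    """KL(p || q) between two discrete action distributions."""
    p = np.clip(p, 1e-8, 1.0)
    q = np.clip(q, 1e-8, 1.0)
    return float(np.sum(p * np.log(p / q)))


def loss_aware_kl_guided_update(update_fn, eval_fn, old_policy_probs):
    """Retry a PPO update while the summed policy+value loss keeps improving,
    and stop as soon as the KL divergence from the pre-update policy exceeds
    KL_LIMIT.

    update_fn() -> new action probabilities after one gradient step (assumed)
    eval_fn()   -> (policy_loss, value_loss) on the current batch (assumed)
    """
    policy_loss, value_loss = eval_fn()
    best_loss = policy_loss + value_loss

    for _ in range(MAX_RETRIES):
        new_probs = update_fn()
        policy_loss, value_loss = eval_fn()
        combined = policy_loss + value_loss

        # KL guard: abandon further retries if the policy shifted too far.
        if kl_divergence(old_policy_probs, new_probs) > KL_LIMIT:
            break

        # Loss-aware check: retry only while the loss is still improving.
        if best_loss - combined < LOSS_IMPROVE_EPS:
            break
        best_loss = combined

    return best_loss


if __name__ == "__main__":
    # Toy self-test with a 4-action policy and a scripted loss sequence.
    rng = np.random.default_rng(0)
    state = {"probs": np.full(4, 0.25)}

    def dummy_update():
        # Stand-in for a gradient step: slightly jitter the distribution.
        p = state["probs"] + rng.normal(0.0, 0.01, size=4)
        p = np.clip(p, 1e-3, None)
        state["probs"] = p / p.sum()
        return state["probs"]

    scripted = iter([(0.90, 0.50), (0.70, 0.40), (0.65, 0.38),
                     (0.64, 0.379), (0.64, 0.379)])

    def dummy_eval():
        return next(scripted)

    print(loss_aware_kl_guided_update(dummy_update, dummy_eval,
                                      np.full(4, 0.25)))
```

In a real PPO training loop, `update_fn` and `eval_fn` would wrap the actual actor-critic networks and the clipped surrogate objective; the scripted losses here exist only so the sketch runs stand-alone.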
