트랜스포머 기반 BERT를 활용한 비특허 문헌 자동 분류의 성능 향상 방안 연구 (Using Transformer-Based BERT for Improving the Performance of Automatic Non-Patent Literature Classification)
- Other Titles
- Using Transformer-Based BERT for Improving the Performance of Automatic Non-Patent Literature Classification
- Authors
- 김성원; 안민영; 유동희
- Issue Date
- Mar-2025
- Publisher
- 한국정보시스템학회 (The Korea Association of Information Systems)
- Keywords
- Non-Patent Literature; Classification Model; BERT; Transformer; CPC
- Citation
- 정보시스템연구 (The Journal of Information Systems), v.34, no.1, pp. 155-170
- Pages
- 16
- Indexed
- KCI
- Journal Title
- 정보시스템연구 (The Journal of Information Systems)
- Volume
- 34
- Number
- 1
- Start Page
- 155
- End Page
- 170
- URI
- https://scholarworks.gnu.ac.kr/handle/sw.gnu/78015
- DOI
- 10.5859/KAIS.2025.34.1.155
- ISSN
- 1229-8476; 2733-8770
- Abstract
- Purpose: Non-Patent Literature (NPL) plays a crucial role in patent examination but is difficult to classify due to its vast volume and diverse formats. This study proposes an approach utilizing BERT-based Natural Language Processing (NLP) techniques to automatically classify NPL and assign Cooperative Patent Classification (CPC) codes.
Design/methodology/approach: NPL abstracts cited in U.S. patents were collected from KIPRIS Plus. The study applied vectorization techniques such as TF-IDF, SBERT, and anferico/bert-for-patents, and compared classification performance using Logistic Regression, XGBoost, LightGBM, BERT, RoBERTa, and anferico/bert-for-patents models.
Findings: The anferico/bert-for-patents model, specialized for patent documents, achieved the highest classification accuracy (56.3%) and effectively captured the semantic representation of NPL. This study contributes to improving NPL search and classification efficiency, enhancing the prior art search process in patent examination.
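- Appears in Collections
- College of Business Administration > Department of Management Information Systems > Journal Articles
- Interdepartmental Programs > Department of Intellectual Property Convergence > Journal Articles
- Humanities and Social Sciences > Department of Management Information Systems > Journal Articles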
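
The abstract above describes a vectorize-then-classify pipeline. As a rough illustration of that general idea only, the following is a minimal, standard-library sketch that builds TF-IDF vectors and assigns a label by cosine similarity to per-class centroids. It is not the authors' implementation: the nearest-centroid classifier is a simplified stand-in for the Logistic Regression, gradient-boosting, and BERT-family models named in the abstract, the function names (`fit_idf`, `tfidf`, `train`, `predict`) are hypothetical, and the four training texts with their CPC section labels (`G`, `C`) are invented toy data rather than records from KIPRIS Plus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would do far more.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for each term in the corpus.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def normalize(vec):
    # Scale a sparse vector (dict) to unit L2 length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf(text, idf):
    # Unit-length TF-IDF vector; terms unseen during fitting are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def train(docs, labels):
    # One centroid per label: the normalized sum of its document vectors.
    idf = fit_idf(docs)
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in tfidf(doc, idf).items():
            sums[label][term] += weight
    centroids = {label: normalize(vec) for label, vec in sums.items()}
    return idf, centroids


def predict(text, idf, centroids):
    # Cosine similarity reduces to a dot product because both sides are unit length.
    vec = tfidf(text, idf)

    def score(label):
        centroid = centroids[label]
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=score)


# Invented toy data (assumption): labels are CPC section letters.
TRAIN_DOCS = [
    "neural network model for image recognition",
    "deep learning training on gpu processor",
    "compound inhibits enzyme activity in cells",
    "synthesis of polymer compound catalyst",
]
TRAIN_LABELS = ["G", "G", "C", "C"]

if __name__ == "__main__":
    idf, centroids = train(TRAIN_DOCS, TRAIN_LABELS)
    print(predict("catalyst for polymer synthesis", idf, centroids))
    print(predict("gpu training of neural network", idf, centroids))
```

Swapping the `tfidf` step for sentence embeddings (for example from SBERT or anferico/bert-for-patents) and the centroid rule for a trained classifier would bring the sketch closer to the comparison the abstract reports, but those components need external libraries and are left out here.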
