트랜스포머 기반 BERT를 활용한 비특허 문헌 자동 분류의 성능 향상 방안 연구
Using Transformer-Based BERT for Improving the Performance of Automatic Non-Patent Literature Classification
Abstract

Purpose — Non-Patent Literature (NPL) plays a crucial role in patent examination but is difficult to classify due to its vast volume and diverse formats. This study proposes an approach utilizing BERT-based Natural Language Processing (NLP) techniques to automatically classify NPL and assign Cooperative Patent Classification (CPC) codes.

Design/methodology/approach — NPL abstracts cited in U.S. patents were collected from KIPRIS Plus. The study applied vectorization techniques such as TF-IDF, SBERT, and anferico/bert-for-patents, and compared classification performance across Logistic Regression, XGBoost, LightGBM, BERT, RoBERTa, and anferico/bert-for-patents models.

Findings — The anferico/bert-for-patents model, specialized for patent documents, achieved the highest classification accuracy (56.3%) and effectively captured the semantic representation of NPL. This study contributes to improving NPL search and classification efficiency, enhancing the prior art search process in patent examination.
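The classical baseline the study compares against (TF-IDF vectorization of NPL abstracts fed to a classifier such as Logistic Regression, predicting a CPC code) can be sketched as below. This is a minimal illustration, not the authors' implementation: the sample abstracts and CPC section labels are invented placeholders, not data from KIPRIS Plus.

```python
# Sketch of the TF-IDF + Logistic Regression baseline for CPC classification.
# Abstracts and labels are illustrative placeholders, not the study's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: (NPL abstract text, CPC section label)
abstracts = [
    "Deep learning model for medical image segmentation",
    "Convolutional network improves diagnostic imaging",
    "Catalyst composition for polymer synthesis",
    "Novel polymerization process using metal catalysts",
]
labels = ["G", "G", "C", "C"]  # e.g. G = Physics, C = Chemistry

# TF-IDF turns each abstract into a sparse term-weight vector;
# Logistic Regression then learns a linear decision boundary over it.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(abstracts, labels)

print(clf.predict(["Neural network for image classification"])[0])
```

The transformer-based models in the study replace the TF-IDF step with dense contextual embeddings (SBERT or anferico/bert-for-patents), which is what the reported accuracy gains are attributed to.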

Keywords

Non-Patent Literature; Classification Model; BERT; Transformer; CPC
Title
트랜스포머 기반 BERT를 활용한 비특허 문헌 자동 분류의 성능 향상 방안 연구
Title (English)
Using Transformer-Based BERT for Improving the Performance of Automatic Non-Patent Literature Classification
Authors
김성원, 안민영, 유동희
DOI
10.5859/KAIS.2025.34.1.155
Publication Date
2025-03
Journal
정보시스템연구 (The Journal of Information Systems)
Volume
34
Issue
1
Pages
155–170