A Comparison of Pretrained Models for Classifying Issue Reports
  • Heo, Jueun
  • Kwon, Gibeom
  • Kwak, Changwon
  • Lee, Seonah
Citations
Web of Science: 2
Scopus: 6

Abstract

Issues are evolving requirements and are the main factor that increases the cost of software evolution. To help developers manage issues, GitHub provides issue labeling mechanisms in its issue management system. However, manually labeling issue reports still requires considerable developer effort. To ease this burden, researchers have proposed automatically classifying issue reports, using deep learning techniques and pretrained models to improve classification accuracy. However, general-domain pretrained models such as RoBERTa have limitations in understanding the contexts of software engineering tasks. In this paper, we create IssueBERT, a model pretrained on issue data, to determine whether a domain-specific pretrained model can improve the accuracy of classifying issue reports. We also adopt and explore several pretrained models from the software engineering domain: CodeBERT, BERTOverflow, and seBERT. We conduct a comparative experiment on these pretrained models to understand their performance in classifying issue reports. Our results show that IssueBERT outperforms the other pretrained models. Notably, IssueBERT yields an average F1 score that is 1.74% higher than that of seBERT and 3.61% higher than that of RoBERTa, even though IssueBERT was pretrained on much less data than seBERT and RoBERTa.
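The comparison above is reported in terms of an average F1 score. This record does not say how the average is taken (over labels, projects, or runs), so the following is only a minimal sketch of one common reading: macro-averaged F1 over issue labels, computed from gold and predicted labels. The label names `bug`, `feature`, and `question` and the functions `f1_per_class` and `macro_f1` are hypothetical illustrations, not the authors' evaluation code.

```python
from collections import Counter


def f1_per_class(y_true, y_pred, labels):
    """Compute F1 = 2TP / (2TP + FP + FN) for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label receives a false positive
            fn[gold] += 1  # the gold label receives a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A label that never appears in gold or predictions scores 0.0 here.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 (macro averaging, assumed)."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical issue labels and model predictions.
labels = ["bug", "feature", "question"]
y_true = ["bug", "bug", "feature", "question"]
y_pred = ["bug", "feature", "feature", "question"]
print(f1_per_class(y_true, y_pred, labels))
print(macro_f1(y_true, y_pred, labels))
```

Here one `bug` report is misclassified as `feature`, so both of those labels drop to an F1 of 2/3 while `question` stays at 1.0, giving a macro F1 of 7/9. Macro averaging weights rare labels equally with common ones. A weighted or micro average over the same predictions would generally give a different number.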

Keywords

BERT; Bidirectional control; Codes; Computer bugs; Data models; deep learning techniques; Encoding; issue classification; issue reports; pretrained models; Software engineering; Task analysis
DOI
10.1109/ACCESS.2024.3408688
Publication date
2024-06
Type
Article
Journal
IEEE Access
Volume
12
Pages
1 ~ 1