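Comparison of Methods for Converting Categorical Variables to Continuous Measures: Resolving Temporal Discontinuity in the Smoking Questionnaires in NHIS-National Sample Cohort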
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | 하강희 | - |
| dc.contributor.author | 양혜지 | - |
| dc.contributor.author | 박민주 | - |
| dc.contributor.author | 이시경 | - |
| dc.contributor.author | 김수환 | - |
| dc.date.accessioned | 2025-12-24T01:30:17Z | - |
| dc.date.available | 2025-12-24T01:30:17Z | - |
| dc.date.issued | 2025-11 | - |
| dc.identifier.issn | 2465-8014 | - |
| dc.identifier.issn | 2465-8022 | - |
| dc.identifier.uri | https://scholarworks.gnu.ac.kr/handle/sw.gnu/81463 | - |
| dc.description.abstract | Objectives: We developed and evaluated statistical and machine-learning approaches to convert categorical smoking variables into continuous values, addressing the temporal discontinuity caused by questionnaire format changes in the National Health Insurance Service–National Sample Cohort (NHIS-NSC). Using repeated measurements from the same individuals, we compared strategies for transforming objective multiple-choice responses into subjective numerical values. Methods: We analyzed 44,755 smokers who completed health examinations during 2007-2010 and answered both objective (2007-2008) and subjective (2009-2010) smoking questionnaires. After temporally correcting smoking-duration variables, we compared simple substitution rules (median, mean, mode, midpoint), regression models, and machine-learning algorithms (random forest, gradient boosting, XGBoost, k-nearest neighbors, support vector regression). Performance was assessed using mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). Results: For smoking duration, random forest performed best (MSE=35.18, R²=0.70), followed by XGBoost (MSE=35.30, R²=0.70) and gradient boosting (MSE=36.63, R²=0.68). For daily cigarette consumption, random forest (MSE=32.02, R²=0.38) and XGBoost (MSE=32.07, R²=0.38) outperformed the alternatives. Machine-learning models consistently outperformed simple substitution methods; notably, the midpoint approach performed poorly for daily consumption, yielding a negative R² (-0.10). Predicted values generally respected category boundaries, with minor discrepancies in the extreme categories. Conclusions: Machine-learning approaches, particularly random forest and XGBoost, substantially outperformed traditional statistical conversions when mapping categorical smoking variables to continuous values. The proposed framework preserves temporal continuity in longitudinal health surveys affected by questionnaire changes and is portable to other public-health databases undergoing similar methodological transitions. | - |
| dc.format.extent | 12 | - |
| dc.language | Korean | - |
| dc.language.iso | KOR | - |
| dc.publisher | 한국보건정보통계학회 | - |
| dc.title | 범주형 변수의 연속형 환산 방법론 비교: 국민건강보험공단 표본 코호트의 흡연 문진의 시계열 단절 문제 해결 | - |
| dc.title.alternative | Comparison of Methods for Converting Categorical Variables to Continuous Measures: Resolving Temporal Discontinuity in the Smoking Questionnaires in NHIS-National Sample Cohort | - |
| dc.type | Article | - |
| dc.publisher.location | Republic of Korea | - |
| dc.identifier.doi | 10.21032/jhis.2025.50.4.441 | - |
| dc.identifier.bibliographicCitation | 보건정보통계학회지, v.50, no.4, pp 441 - 452 | - |
| dc.citation.title | 보건정보통계학회지 | - |
| dc.citation.volume | 50 | - |
| dc.citation.number | 4 | - |
| dc.citation.startPage | 441 | - |
| dc.citation.endPage | 452 | - |
| dc.type.docType | Y | - |
| dc.identifier.kciid | ART003271359 | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | kci | - |
| dc.subject.keywordAuthor | Smoking questionnaire | - |
| dc.subject.keywordAuthor | Discontinuation | - |
| dc.subject.keywordAuthor | Machine learning | - |
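
The simple substitution rules and error metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the category bounds in `BOUNDS`, the toy data, and the function names `fit_rules` and `evaluate` are all hypothetical, chosen only to show how a per-category mean, median, mode, or midpoint is assigned and then scored with MSE, RMSE, MAE, and R². It uses only the Python standard library (Python 3.8 or later, so that `mode` returns the first of several tied values instead of raising an error).

```python
from math import sqrt
from statistics import mean, median, mode

# Hypothetical category bounds (cigarettes per day); NOT the NHIS-NSC coding.
BOUNDS = {1: (0, 10), 2: (10, 20), 3: (20, 40)}


def fit_rules(categories, values):
    """Learn one substitution value per category from paired responses.

    `categories` are the multiple-choice answers and `values` are the
    numerical answers given by the same individuals.
    """
    grouped = {}
    for c, v in zip(categories, values):
        grouped.setdefault(c, []).append(v)
    return {
        "mean": {c: mean(vs) for c, vs in grouped.items()},
        "median": {c: median(vs) for c, vs in grouped.items()},
        "mode": {c: mode(vs) for c, vs in grouped.items()},
        # The midpoint rule ignores the data and uses only the bounds.
        "midpoint": {c: (BOUNDS[c][0] + BOUNDS[c][1]) / 2 for c in grouped},
    }


def evaluate(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        # R² drops below zero when predictions are worse than always
        # predicting the mean of y_true.
        "R2": 1 - ss_res / ss_tot,
    }


# Toy paired responses (invented for illustration only).
cats = [1, 1, 1, 2, 2, 2, 3, 3]
vals = [5, 5, 8, 10, 15, 20, 20, 30]

rules = fit_rules(cats, vals)
for name, mapping in rules.items():
    predictions = [mapping[c] for c in cats]
    print(name, evaluate(vals, predictions))
```

Because R² is computed as one minus the ratio of residual to total sum of squares, a fixed rule such as the midpoint can score below zero when the true values cluster away from the category centers, which is the situation the abstract reports for daily cigarette consumption. In practice the rules would be fitted on one subset of individuals and scored on another; the sketch scores on the same toy data only to stay short.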
