Detailed Information


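Comparison of Methods for Converting Categorical Variables to Continuous Measures: Resolving Temporal Discontinuity in the Smoking Questionnaires in NHIS-National Sample Cohort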

Full metadata record
DC Field | Value | Language
dc.contributor.author | 하강희 | -
dc.contributor.author | 양혜지 | -
dc.contributor.author | 박민주 | -
dc.contributor.author | 이시경 | -
dc.contributor.author | 김수환 | -
dc.date.accessioned | 2025-12-24T01:30:17Z | -
dc.date.available | 2025-12-24T01:30:17Z | -
dc.date.issued | 2025-11 | -
dc.identifier.issn | 2465-8014 | -
dc.identifier.issn | 2465-8022 | -
dc.identifier.uri | https://scholarworks.gnu.ac.kr/handle/sw.gnu/81463 | -
dc.description.abstract | Objectives: We developed and evaluated statistical and machine-learning approaches for converting categorical smoking variables into continuous values, addressing the temporal discontinuity caused by changes to the questionnaire format in the National Health Insurance Service–National Sample Cohort (NHIS-NSC). Using repeated measurements from the same individuals, we compared strategies for transforming objective multiple-choice responses into subjective numerical values.
Methods: We analyzed 44,755 smokers who completed health examinations during 2007-2010 and answered both the objective (2007-2008) and the subjective (2009-2010) smoking questionnaires. After temporally correcting the smoking-duration variables, we compared simple substitution rules (median, mean, mode, midpoint), regression models, and machine-learning algorithms (random forest, gradient boosting, XGBoost, k-nearest neighbors, support vector regression). Performance was assessed using mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²).
Results: For smoking duration, random forest performed best (MSE=35.18, R²=0.70), followed by XGBoost (MSE=35.30, R²=0.70) and gradient boosting (MSE=36.63, R²=0.68). For daily cigarette consumption, random forest (MSE=32.02, R²=0.38) and XGBoost (MSE=32.07, R²=0.38) outperformed the alternatives. Machine-learning models consistently outperformed the simple substitution methods; notably, the midpoint approach performed poorly for daily consumption, yielding a negative R² (-0.10), i.e., a larger error than simply predicting the overall mean. Predicted values generally respected category boundaries, with minor discrepancies in the extreme categories.
Conclusions: Machine-learning approaches, particularly random forest and XGBoost, substantially outperformed traditional statistical conversions when mapping categorical smoking variables to continuous values. The proposed framework preserves temporal continuity in longitudinal health surveys affected by questionnaire changes and is portable to other public-health databases undergoing similar methodological transitions. | -
dc.format.extent | 12 | -
dc.language | Korean | -
dc.language.iso | KOR | -
dc.publisher | 한국보건정보통계학회 | -
dc.title | 범주형 변수의 연속형 환산 방법론 비교: 국민건강보험공단 표본 코호트의 흡연 문진의 시계열 단절 문제 해결 | -
dc.title.alternative | Comparison of Methods for Converting Categorical Variables to Continuous Measures: Resolving Temporal Discontinuity in the Smoking Questionnaires in NHIS-National Sample Cohort | -
dc.type | Article | -
dc.publisher.location | Republic of Korea | -
dc.identifier.doi | 10.21032/jhis.2025.50.4.441 | -
dc.identifier.bibliographicCitation | 보건정보통계학회지, v.50, no.4, pp. 441-452 | -
dc.citation.title | 보건정보통계학회지 | -
dc.citation.volume | 50 | -
dc.citation.number | 4 | -
dc.citation.startPage | 441 | -
dc.citation.endPage | 452 | -
dc.type.docType | Y | -
dc.identifier.kciid | ART003271359 | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | kci | -
dc.subject.keywordAuthor | Smoking questionnaire | -
dc.subject.keywordAuthor | Discontinuation | -
dc.subject.keywordAuthor | Machine learning | -
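The abstract contrasts simple per-category substitution rules (mean, median, mode, midpoint) with learned models and scores them by MSE, RMSE, MAE, and R². The following is a minimal Python sketch, not the authors' code, of how the substitution baselines and metrics could be computed from paired responses: a category answered on the objective questionnaire and a number given by the same person on the subjective one. The category labels A-D, their bounds, the assumed cap of 40 for the open-ended top category, and the toy data are all hypothetical and are not the NHIS-NSC response options. The machine-learning models are not reproduced here.

```python
import math
from collections import Counter
from statistics import mean, median

# Hypothetical response categories (lower, upper) for cigarettes per day.
# The open-ended top category "D" is capped at an assumed value of 40.
BOUNDS = {"A": (1, 9), "B": (10, 19), "C": (20, 39), "D": (40, 40)}


def fit_rule(pairs, rule):
    """Learn one substitute value per category from paired answers.

    pairs: (category, numeric_answer) tuples from the same individuals.
    rule: "mean", "median", "mode" or "midpoint".
    """
    by_cat = {}
    for cat, value in pairs:
        by_cat.setdefault(cat, []).append(value)
    table = {}
    for cat, values in by_cat.items():
        if rule == "mean":
            table[cat] = mean(values)
        elif rule == "median":
            table[cat] = median(values)
        elif rule == "mode":
            # Ties go to the value encountered first.
            table[cat] = Counter(values).most_common(1)[0][0]
        elif rule == "midpoint":
            lo, hi = BOUNDS[cat]
            table[cat] = (lo + hi) / 2
        else:
            raise ValueError(f"unknown rule: {rule}")
    return table


def metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R^2 = 1 - SS_res / SS_tot."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1 - ss_res / ss_tot,
    }


def evaluate_all(pairs):
    """Score every substitution rule in-sample on the same paired data."""
    y_true = [value for _, value in pairs]
    results = {}
    for rule in ("mean", "median", "mode", "midpoint"):
        table = fit_rule(pairs, rule)
        y_pred = [table[cat] for cat, _ in pairs]
        results[rule] = metrics(y_true, y_pred)
    return results


# Toy paired data (hypothetical): answers heap at round numbers.
PAIRS = [("A", 5), ("A", 5), ("A", 3), ("A", 7),
         ("B", 10), ("B", 10), ("B", 15), ("B", 10),
         ("C", 20), ("C", 20), ("C", 20), ("C", 25),
         ("D", 40), ("D", 60)]

if __name__ == "__main__":
    for rule, m in evaluate_all(PAIRS).items():
        print(rule, {k: round(v, 2) for k, v in m.items()})
```

Because each category's mean minimizes squared error within that category, the mean rule always has the lowest in-sample MSE among these constant substitution rules, and the median rule the lowest in-sample MAE. R² = 1 - SS_res/SS_tot drops below zero whenever a rule's squared error exceeds that of predicting the overall mean, which is how a midpoint rule can score negatively when true answers cluster far from the interval centers.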
Appears in Collections
College of Natural Sciences > Dept. of Information and Statistics > Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
