
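Comparison of Methods for Converting Categorical Variables to Continuous Values: Resolving the Time-Series Discontinuity in the Smoking Questionnaire of the National Health Insurance Service Sample Cohort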

Full metadata record
DC Field: Value
dc.contributor.author: 하강희
dc.contributor.author: 양혜지
dc.contributor.author: 박민주
dc.contributor.author: 이시경
dc.contributor.author: 김수환
dc.date.accessioned: 2025-12-24T01:30:17Z
dc.date.available: 2025-12-24T01:30:17Z
dc.date.issued: 2025-11
dc.identifier.issn: 2465-8014
dc.identifier.issn: 2465-8022
dc.identifier.uri: https://scholarworks.gnu.ac.kr/handle/sw.gnu/81463
dc.description.abstract:
Objectives: We developed and evaluated statistical and machine-learning approaches for converting categorical smoking variables into continuous values, addressing the temporal discontinuity caused by questionnaire format changes in the National Health Insurance Service–National Sample Cohort (NHIS-NSC). Using repeated measurements from the same individuals, we compared strategies for transforming objective (multiple-choice) responses into subjective (open-ended numerical) values.
Methods: We analyzed 44,755 smokers who completed health examinations during 2007-2010 and answered both the objective (2007-2008) and the subjective (2009-2010) smoking questionnaires. After temporally correcting the smoking-duration variables, we compared simple substitution rules (median, mean, mode, midpoint), regression models, and machine-learning algorithms (random forest, gradient boosting, XGBoost, k-nearest neighbors, support vector regression). Performance was assessed using mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²).
Results: For smoking duration, random forest performed best (MSE=35.18, R²=0.70), followed by XGBoost (MSE=35.30, R²=0.70) and gradient boosting (MSE=36.63, R²=0.68). For daily cigarette consumption, random forest (MSE=32.02, R²=0.38) and XGBoost (MSE=32.07, R²=0.38) outperformed the alternatives. Machine-learning models consistently outperformed the simple substitution methods; notably, the midpoint approach performed poorly for daily consumption, yielding a negative R² (R²=-0.10). Predicted values generally respected category boundaries, with minor discrepancies in the extreme categories.
Conclusions: Machine-learning approaches, particularly random forest and XGBoost, substantially outperformed traditional statistical conversions when mapping categorical smoking variables to continuous values. The proposed framework preserves temporal continuity in longitudinal health surveys affected by questionnaire changes and is portable to other public-health databases undergoing similar methodological transitions.
dc.format.extent: 12
dc.language: Korean
dc.language.iso: KOR
dc.publisher: Korean Society of Health Informatics and Statistics
dc.title: Comparison of Methods for Converting Categorical Variables to Continuous Values: Resolving the Time-Series Discontinuity in the Smoking Questionnaire of the National Health Insurance Service Sample Cohort
dc.title.alternative: Comparison of Methods for Converting Categorical Variables to Continuous Measures: Resolving Temporal Discontinuity in the Smoking Questionnaires in NHIS-National Sample Cohort
dc.type: Article
dc.publisher.location: Republic of Korea
dc.identifier.doi: 10.21032/jhis.2025.50.4.441
dc.identifier.bibliographicCitation: Journal of Health Informatics and Statistics, v.50, no.4, pp. 441-452
dc.citation.title: Journal of Health Informatics and Statistics
dc.citation.volume: 50
dc.citation.number: 4
dc.citation.startPage: 441
dc.citation.endPage: 452
dc.type.docType: Y
dc.identifier.kciid: ART003271359
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: kci
dc.subject.keywordAuthor: Smoking questionnaire
dc.subject.keywordAuthor: Discontinuation
dc.subject.keywordAuthor: Machine learning
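
The abstract compares simple substitution rules using MSE, RMSE, MAE, and R². A minimal sketch of that comparison is shown here for three of the rules (midpoint, mean, median). It is not the authors' code: the category bounds in `BINS`, the records in `PAIRS`, and the helper names `fit_rule` and `evaluate` are all hypothetical, and the metrics are computed in-sample for brevity, whereas a real evaluation would likely use held-out data.

```python
from statistics import mean, median

# Hypothetical category bounds for daily cigarette consumption (cigarettes/day).
# Illustrative assumption only; these are not the NHIS questionnaire categories.
BINS = {1: (1, 9), 2: (10, 19), 3: (20, 39)}

# Synthetic paired records: (category chosen in the multiple-choice wave,
# number reported by the same person in the later open-ended wave).
PAIRS = [
    (1, 3), (1, 5), (1, 5), (1, 8),
    (2, 10), (2, 10), (2, 12), (2, 15),
    (3, 20), (3, 20), (3, 20), (3, 30),
]


def fit_rule(pairs, rule):
    """Return a {category: substituted value} mapping for one substitution rule."""
    if rule == "midpoint":
        # The midpoint ignores the data and uses only the category bounds.
        return {cat: (lo + hi) / 2 for cat, (lo, hi) in BINS.items()}
    by_cat = {}
    for cat, value in pairs:
        by_cat.setdefault(cat, []).append(value)
    agg = {"mean": mean, "median": median}[rule]
    return {cat: agg(values) for cat, values in by_cat.items()}


def evaluate(pairs, mapping):
    """Compute MSE, RMSE, MAE, and R^2 of the substituted values."""
    y = [value for _, value in pairs]
    pred = [mapping[cat] for cat, _ in pairs]
    n = len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    y_bar = mean(y)
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    mse = ss_res / n
    mae = sum(abs(a - b) for a, b in zip(y, pred)) / n
    # R^2 turns negative when the rule predicts worse than the overall mean.
    return {"MSE": mse, "RMSE": mse ** 0.5, "MAE": mae, "R2": 1 - ss_res / ss_tot}


results = {
    rule: evaluate(PAIRS, fit_rule(PAIRS, rule))
    for rule in ("midpoint", "mean", "median")
}
for rule, metrics in results.items():
    print(rule, {name: round(value, 3) for name, value in metrics.items()})
```

Because the synthetic answers cluster at round numbers near the lower bound of each category, the midpoint rule overshoots them, which illustrates one plausible way a midpoint substitution can score worse than data-driven rules.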
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Natural Sciences > Dept. of Information and Statistics > Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Su Hwan
College of Natural Sciences (Department of Information and Statistics)
