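Comparison of Methods for Converting Categorical Variables to Continuous Measures: Resolving Temporal Discontinuity in the Smoking Questionnaires in NHIS-National Sample Cohort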

하강희; 양혜지; 박민주; 이시경; 김수환

Full metadata record

DC Field	Value	Language
dc.contributor.author	하강희	-
dc.contributor.author	양혜지	-
dc.contributor.author	박민주	-
dc.contributor.author	이시경	-
dc.contributor.author	김수환	-
dc.date.accessioned	2025-12-24T01:30:17Z	-
dc.date.available	2025-12-24T01:30:17Z	-
dc.date.issued	2025-11	-
dc.identifier.issn	2465-8014	-
dc.identifier.issn	2465-8022	-
dc.identifier.uri	https://scholarworks.gnu.ac.kr/handle/sw.gnu/81463	-
dc.description.abstract	Objectives: We developed and evaluated statistical and machine-learning approaches to convert categorical smoking variables into continuous values, addressing temporal discontinuity caused by questionnaire format changes in the National Health Insurance Service–National Sample Cohort (NHIS-NSC). Using repeated measurements from the same individuals, we compared strategies for transforming objective multiple-choice responses into subjective numerical values. Methods: We analyzed 44,755 smokers who completed health examinations during 2007-2010 and answered both objective (2007-2008) and subjective (2009-2010) smoking questionnaires. After temporally correcting smoking-duration variables, we compared simple substitution rules (median, mean, mode, midpoint), regression models, and machine-learning algorithms (random forest, gradient boosting, XGBoost, k-nearest neighbors, support vector regression). Performance was assessed using mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). Results: For smoking duration, random forest performed best (MSE=35.18, R²=0.70), followed by XGBoost (MSE=35.30, R²=0.70) and gradient boosting (MSE=36.63, R²=0.68). For daily cigarette consumption, random forest (MSE=32.02, R²=0.38) and XGBoost (MSE=32.07, R²=0.38) outperformed alternatives. Machine-learning models consistently exceeded simple substitution methods; notably, the midpoint approach performed poorly for daily consumption with negative explained variance (R²=-0.10). Predicted values generally respected category boundaries, with minor discrepancies in extreme categories. Conclusions: Machine-learning approaches, particularly random forest and XGBoost, substantially outperformed traditional statistical conversions when mapping categorical smoking variables to continuous values. The proposed framework preserves temporal continuity in longitudinal health surveys affected by questionnaire changes and is portable to other public-health databases undergoing similar methodological transitions.	-
dc.format.extent	12	-
dc.language	Korean	-
dc.language.iso	KOR	-
dc.publisher	The Korean Society of Health Informatics and Statistics	-
dc.title	범주형 변수의 연속형 환산 방법론 비교: 국민건강보험공단 표본 코호트의 흡연 문진의 시계열 단절 문제 해결	-
dc.title.alternative	Comparison of Methods for Converting Categorical Variables to Continuous Measures: Resolving Temporal Discontinuity in the Smoking Questionnaires in NHIS-National Sample Cohort	-
dc.type	Article	-
dc.publisher.location	Republic of Korea	-
dc.identifier.doi	10.21032/jhis.2025.50.4.441	-
dc.identifier.bibliographicCitation	Journal of Health Informatics and Statistics, v.50, no.4, pp. 441-452	-
dc.citation.title	Journal of Health Informatics and Statistics	-
dc.citation.volume	50	-
dc.citation.number	4	-
dc.citation.startPage	441	-
dc.citation.endPage	452	-
dc.type.docType	Y	-
dc.identifier.kciid	ART003271359	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	kci	-
dc.subject.keywordAuthor	Smoking questionnaire	-
dc.subject.keywordAuthor	Discontinuation	-
dc.subject.keywordAuthor	Machine learning	-

Appears in Collections: College of Natural Sciences > Dept. of Information and Statistics > Journal Articles
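
The abstract compares simple substitution rules with model-based conversions and reports that the midpoint rule can yield a negative R². The sketch below is not the authors' code: it is a minimal, standard-library illustration of how a per-category substitution rule can be fitted on paired responses and scored with MSE, RMSE, MAE, and R². The category names, the bounds in `BOUNDS`, and the toy values are all hypothetical and do not reflect the NHIS-NSC questionnaire coding. For brevity the rules are fitted and scored on the same toy data, whereas a real comparison would presumably use held-out records.

```python
from statistics import mean, median

# Hypothetical category bounds for cigarettes per day (NOT the NHIS-NSC coding).
BOUNDS = {"low": (1, 10), "high": (11, 20)}


def fit_rule(categories, values, rule):
    """Learn one replacement value per category from paired observations."""
    if rule == "midpoint":
        # The midpoint rule ignores the observed values entirely.
        return {c: (BOUNDS[c][0] + BOUNDS[c][1]) / 2 for c in set(categories)}
    grouped = {}
    for c, v in zip(categories, values):
        grouped.setdefault(c, []).append(v)
    aggregate = {"mean": mean, "median": median}[rule]
    return {c: aggregate(vs) for c, vs in grouped.items()}


def evaluate(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired true and predicted values."""
    n = len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mse = sse / n
    # R^2 drops below zero when predictions are worse than the overall mean.
    return {"MSE": mse, "RMSE": mse ** 0.5, "MAE": mae, "R2": 1 - sse / ss_tot}


# Toy paired data: the categorical answer and the later numeric answer
# for the same (hypothetical) respondents. Values cluster near the
# boundary between the two categories, far from either midpoint.
cats = ["low", "low", "low", "low", "high", "high", "high", "high"]
vals = [9, 10, 10, 10, 11, 11, 12, 12]

results = {}
for rule in ("midpoint", "mean", "median"):
    mapping = fit_rule(cats, vals, rule)
    preds = [mapping[c] for c in cats]
    results[rule] = evaluate(vals, preds)
    print(rule, results[rule])
```

On data like this, where responses pile up near a category boundary, the midpoint sits far from most observed values, so its squared error exceeds the total variance and R² becomes negative. This is one plausible mechanism behind the negative value reported for daily consumption, although the abstract itself does not state the cause.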
