Detailed Information

Stochastic LASSO for extremely high-dimensional genomic data

Full metadata record
DC Field | Value | Language
dc.contributor.author | Baek, Beomsu | -
dc.contributor.author | Jo, Jongkwon | -
dc.contributor.author | Kang, Mingon | -
dc.contributor.author | Kim, Youngsoon | -
dc.date.accessioned | 2026-02-23T06:00:08Z | -
dc.date.available | 2026-02-23T06:00:08Z | -
dc.date.issued | 2026-01 | -
dc.identifier.issn | 2045-2322 | -
dc.identifier.uri | https://scholarworks.gnu.ac.kr/handle/sw.gnu/82450 | -
dc.description.abstract | Accurate identification of significant features in high-dimensional data is indispensable in high-throughput genomic analysis and association studies. Least Absolute Shrinkage and Selection Operator (LASSO) and its derivatives have been widely adapted to discover potential biomarkers as a feature selection scheme in various biological systems. Recently, bootstrap-based LASSO models, such as Random LASSO and Hi-LASSO, have been effective solutions for extremely high-dimensional but low sample size (EHDLSS) genomic data. However, the bootstrap-based LASSO models still have several drawbacks, such as multicollinearity within bootstrap samples, missing predictors in draw, and randomness in predictor sampling. To tackle the limitations, we propose a new bootstrap-based LASSO, named Stochastic LASSO, that effectively reduces multicollinearity in bootstrap samples and mitigates randomness in predictor sampling, resulting in remarkably outperforming benchmarks in feature selection and coefficient estimation. Furthermore, Stochastic LASSO provides a two-stage t-test strategy for selecting statistically significant features. The performance of Stochastic LASSO was assessed by comparing the existing benchmark models in extensive simulation experiments. In the simulation experiments, Stochastic LASSO consistently showed significant improvements in performance compared to the state-of-the-art LASSO models for feature selection, coefficient estimation, and robustness. We also applied Stochastic LASSO for the gene expression data of publicly available TCGA cancer datasets and identified statistically significant genes associated with survival month prediction. The source code is publicly available at: https://github.com/datax-lab/StochasticLASSO. | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Nature Publishing Group | -
dc.title | Stochastic LASSO for extremely high-dimensional genomic data | -
dc.type | Article | -
dc.publisher.location | United Kingdom | -
dc.identifier.doi | 10.1038/s41598-026-35273-3 | -
dc.identifier.scopusid | 2-s2.0-105029545491 | -
dc.identifier.wosid | 001683242100008 | -
dc.identifier.bibliographicCitation | Scientific Reports, v.16, no.1 | -
dc.citation.title | Scientific Reports | -
dc.citation.volume | 16 | -
dc.citation.number | 1 | -
dc.type.docType | Article | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Science & Technology - Other Topics | -
dc.relation.journalWebOfScienceCategory | Multidisciplinary Sciences | -
dc.subject.keywordPlus | VARIABLE SELECTION | -
dc.subject.keywordPlus | MESSENGER-RNAS | -
dc.subject.keywordPlus | EXPRESSION | -
dc.subject.keywordPlus | UROCORTIN | -
dc.subject.keywordPlus | GENES | -
dc.subject.keywordAuthor | Stochastic LASSO | -
dc.subject.keywordAuthor | LASSO | -
dc.subject.keywordAuthor | High-dimensional data | -
dc.subject.keywordAuthor | Variable selection | -
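
Illustrative sketch (not part of the repository record): the abstract describes bootstrap-based LASSO models that refit LASSO on resampled rows and randomly drawn predictor subsets, then test which coefficients are significant. The Python below is a minimal, generic sketch of that family of methods and may help readers unfamiliar with the idea. It is not the Stochastic LASSO algorithm, whose details are not given in this record; the authors' implementation is at the GitHub URL in the abstract. All function names (`soft_threshold`, `lasso_cd`, `bootstrap_lasso`, `t_statistics`, `make_data`) and all parameter values are assumptions made for illustration, and the one-sample t-statistic here is a simplified stand-in for the paper's two-stage t-test.

```python
import math
import random
import statistics


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent LASSO for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (j's own term added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def bootstrap_lasso(X, y, lam=0.1, n_boot=30, q=None, seed=0):
    """Refit LASSO on bootstrap rows and q randomly drawn predictors per draw.

    Returns, for each predictor, the list of coefficients from the draws
    in which it was selected (possibly empty if it was never drawn).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    q = q or p
    estimates = [[] for _ in range(p)]
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        cols = rng.sample(range(p), q)               # uniform random predictor subset
        Xb = [[X[i][j] for j in cols] for i in rows]
        yb = [y[i] for i in rows]
        for j, b in zip(cols, lasso_cd(Xb, yb, lam)):
            estimates[j].append(b)
    return estimates


def t_statistics(estimates):
    """One-sample t-statistic (H0: mean coefficient = 0) per predictor."""
    out = []
    for est in estimates:
        if len(est) < 2:
            out.append(0.0)  # too few draws to estimate a variance
            continue
        mean, sd = statistics.mean(est), statistics.stdev(est)
        if sd == 0.0:
            out.append(0.0 if mean == 0.0 else math.copysign(math.inf, mean))
        else:
            out.append(mean / (sd / math.sqrt(len(est))))
    return out


def make_data(n, p, seed=0):
    """Synthetic data: only predictors 0 and 1 carry signal (assumed for the demo)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y


X, y = make_data(n=60, p=8, seed=1)
t = t_statistics(bootstrap_lasso(X, y, lam=0.1, n_boot=30, q=5, seed=2))
for j, tj in enumerate(t):
    print(f"feature {j}: t = {tj:.2f}")
```

Because `q` is smaller than the number of predictors in this demo, some predictors are absent from some draws. That corresponds to the "missing predictors in draw" and "randomness in predictor sampling" drawbacks the abstract attributes to existing bootstrap-based LASSO models; this sketch does nothing to mitigate them.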
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Natural Sciences > Dept. of Information and Statistics > Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Young Soon
College of Natural Sciences (Department of Information and Statistics)
