Stochastic LASSO for extremely high-dimensional genomic data

Authors
Baek, Beomsu; Jo, Jongkwon; Kang, Mingon; Kim, Youngsoon
Issue Date
Jan-2026
Publisher
Nature Publishing Group
Keywords
Stochastic LASSO; LASSO; High-dimensional data; Variable selection
Citation
Scientific Reports, v.16, no.1
Indexed
SCIE
SCOPUS
Journal Title
Scientific Reports
Volume
16
Number
1
URI
https://scholarworks.gnu.ac.kr/handle/sw.gnu/82450
DOI
10.1038/s41598-026-35273-3
ISSN
2045-2322
Abstract
Accurate identification of significant features in high-dimensional data is indispensable in high-throughput genomic analysis and association studies. The Least Absolute Shrinkage and Selection Operator (LASSO) and its derivatives have been widely adopted as feature selection schemes for discovering potential biomarkers in various biological systems. Recently, bootstrap-based LASSO models, such as Random LASSO and Hi-LASSO, have proven effective for extremely high-dimensional but low sample size (EHDLSS) genomic data. However, these models still have several drawbacks, such as multicollinearity within bootstrap samples, predictors missed in the draws, and randomness in predictor sampling. To address these limitations, we propose a new bootstrap-based LASSO, named Stochastic LASSO, that effectively reduces multicollinearity in bootstrap samples and mitigates randomness in predictor sampling, and thereby outperforms the benchmarks in feature selection and coefficient estimation. Furthermore, Stochastic LASSO provides a two-stage t-test strategy for selecting statistically significant features. The performance of Stochastic LASSO was assessed against existing benchmark models in extensive simulation experiments, in which it consistently showed significant improvements over state-of-the-art LASSO models in feature selection, coefficient estimation, and robustness. We also applied Stochastic LASSO to gene expression data from publicly available TCGA cancer datasets and identified statistically significant genes associated with survival month prediction. The source code is publicly available at: https://github.com/datax-lab/StochasticLASSO.
Appears in
Collections
College of Natural Sciences > Dept. of Information and Statistics > Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.