The more the better or the less the better: LASSO versus random forest in forecasting seasonal precipitation for drought management (open access)
- Authors
- Lee, Taesam; Kong, Yejin; Singh, Vijay P.
- Issue Date
- Jun-2025
- Publisher
- IOP Publishing
- Keywords
- spring drought; MSLP; variable selection; ensemble; random forest; LASSO
- Citation
- Machine Learning: Science and Technology, v.6, no.2
- Indexed
- SCIE
- SCOPUS
- Journal Title
- Machine Learning: Science and Technology
- Volume
- 6
- Number
- 2
- URI
- https://scholarworks.gnu.ac.kr/handle/sw.gnu/78185
- DOI
- 10.1088/2632-2153/adbe24
- ISSN
- 2632-2153
- Abstract
- Forecasting drought with long-term persistence has been a difficult task, since the major driving force, precipitation, does not exhibit a long-term relation with climate variables. Although drought indices, such as the standardized precipitation index (SPI), have been successfully applied for forecasting, forecasts that directly apply SPI are not always useful for water management. Meanwhile, global climate variables can be applied to forecast precipitation directly, which supports efficient drought management. Therefore, the current study aimed to forecast the accumulated seasonal precipitation (ASP) over South Korea from globally gridded climate variables, such as sea surface temperature (SST) and mean sea level pressure (MSLP). The major issue when building a forecast model is how to handle the extensive number of predictors derived from globally gridded climate variables when the observed response variable (here, ASP) contains only a limited number of records. To overcome this limitation, two conceptually opposed models, the Least Absolute Shrinkage and Selection Operator (LASSO) and random forest (RF), were tested and compared. The LASSO model selects only a limited number of predictors by assigning zero values to the regression parameters of the remaining ones, while the RF model is an ensemble of many simple tree models that employs all feasible variables. The globally gridded Difference of Climate Index (derived by subtracting the climate variable at one location from that at another), SST, MSLP, and their combination were applied as candidate predictors. The final predictors were determined by applying cross-correlation thresholds for each season. The results showed that the LASSO model with the combined MSLP and SST variables performed best in all seasons.
The current study can provide a critical lesson for future studies that apply globally gridded climate variables to analyze hydrological and environmental data with limited records.
- Appears in Collections
- Engineering > Department of Civil Engineering > Journal Articles
- College of Engineering > Department of Civil Engineering > Journal Articles
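
The screening and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the grid size, the correlation threshold (0.3), and the penalty `lam` (0.2) are arbitrary assumptions, and the helper names (`standardize`, `soft_threshold`, `lasso`) are hypothetical. It builds location-difference predictors in the spirit of the Difference of Climate Index, keeps those whose correlation with the response passes a threshold, and then fits a LASSO by coordinate descent to show how most coefficients are set exactly to zero. The random forest side of the comparison is not sketched here.

```python
import itertools
import random
import statistics


def standardize(series):
    """Scale a series to zero mean and unit (population) variance."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(v - mean) / sd for v in series]


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso(columns, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    Assumes every column and y are standardized, so sum(c*c)/n == 1 for each
    column and the per-coordinate update is a plain soft-threshold.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(c * r for c, r in zip(col, residual)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new
    return beta


# Synthetic stand-in for gridded climate series (assumption: 12 cells, 40 years).
random.seed(0)
n_years, n_cells = 40, 12
fields = [[random.gauss(0, 1) for _ in range(n_years)] for _ in range(n_cells)]

# Difference predictors for every pair of cells: 66 candidates for 40 records.
pairs = list(itertools.combinations(range(n_cells), 2))
dci = [standardize([a - b for a, b in zip(fields[i], fields[j])])
       for i, j in pairs]

# Synthetic response driven by a single difference (cell 0 minus cell 1).
asp = standardize([3 * (fields[0][t] - fields[1][t]) + random.gauss(0, 0.5)
                   for t in range(n_years)])

# Step 1: keep candidates whose absolute correlation passes the threshold.
threshold = 0.3
corr = [sum(c * v for c, v in zip(col, asp)) / n_years for col in dci]
screened = [k for k, r in enumerate(corr) if abs(r) >= threshold]

# Step 2: LASSO on the screened candidates; zero coefficients are dropped.
beta = lasso([dci[k] for k in screened], asp, lam=0.2)
selected = {pairs[k]: b for k, b in zip(screened, beta) if b != 0.0}

print("candidates:", len(pairs))
print("after screening:", len(screened))
print("kept by LASSO:", len(selected))
```

Standardizing both the predictors and the response keeps the penalty on the same scale as a correlation, which makes the threshold and `lam` easier to reason about in this toy setting.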
