Multi-modal recommender system using text-to-image generative models and adaptive learning
- Authors
- Kim, Seongmin; Moon, Seona; Lim, Yeongseo; Choi, Sang-Min; Ko, Sang-Ki
- Issue Date
- Jan-2026
- Publisher
- Elsevier
- Keywords
- Recommender systems; Multi-modal data; Adaptive learning; Generative artificial intelligence
- Citation
- Expert Systems with Applications, v.296
- Indexed
- SCOPUS
- Journal Title
- Expert Systems with Applications
- Volume
- 296
- URI
- https://scholarworks.gnu.ac.kr/handle/sw.gnu/79988
- DOI
- 10.1016/j.eswa.2025.129086
- ISSN
- 0957-4174; 1873-6793
- Abstract
- Recently, various successful approaches have been developed to enhance the performance of recommender systems by incorporating multi-modal data, such as item images and textual descriptions. However, deploying these algorithms in real-world scenarios is challenging, as images or textual descriptions are often unavailable. Moreover, in some cases, the provided images or descriptions do not accurately represent the item. We refer to both situations as missing data. In the fashion domain, visual information is crucial, as people are unlikely to buy clothing without seeing its design and appearance. Thus, we propose employing a text-to-image Generative Adversarial Network (GAN) to generate missing visual data from the available textual descriptions, enabling a multi-modal recommender system that leverages both visual and textual information. We also introduce an adaptive feature-importance learning mechanism that dynamically determines the weight of each multi-modal feature when computing the preference score. We demonstrate the effectiveness of the proposed algorithm through extensive experiments on the publicly available Amazon review dataset.
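The adaptive feature-importance idea from the abstract can be sketched in code. The following PyTorch snippet is a minimal illustration under stated assumptions, not the paper's implementation: the class name `AdaptiveMultiModalScorer`, the user-conditioned softmax gate, and all dimensions are hypothetical, and the text-to-image GAN is not reproduced here, so visual features are treated as precomputed inputs (extracted from real images, or from generated ones where the catalog image is missing).

```python
# Hedged sketch of adaptive modality weighting for preference scoring.
# All names, dimensions, and the gating form are illustrative assumptions.
import torch
import torch.nn as nn


class AdaptiveMultiModalScorer(nn.Module):
    """Toy user-item scorer with learned per-interaction modality weights."""

    def __init__(self, n_users, n_items, vis_dim, txt_dim, emb_dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        # Pre-extracted modality features are projected into a shared space;
        # visual features may come from GAN-generated images when the real
        # catalog image is missing.
        self.vis_proj = nn.Linear(vis_dim, emb_dim)
        self.txt_proj = nn.Linear(txt_dim, emb_dim)
        # A small gate yields one weight per modality, conditioned on the
        # user, so feature importance adapts per interaction.
        self.gate = nn.Linear(emb_dim, 2)

    def forward(self, users, items, vis_feat, txt_feat):
        u = self.user_emb(users)                  # (B, d)
        v = self.vis_proj(vis_feat)               # (B, d)
        t = self.txt_proj(txt_feat)               # (B, d)
        w = torch.softmax(self.gate(u), dim=-1)   # (B, 2) modality weights
        fused = w[:, 0:1] * v + w[:, 1:2] * t     # adaptively weighted fusion
        item = self.item_emb(items) + fused
        return (u * item).sum(dim=-1)             # preference score, shape (B,)


if __name__ == "__main__":
    model = AdaptiveMultiModalScorer(n_users=100, n_items=500,
                                     vis_dim=512, txt_dim=384)
    users = torch.randint(0, 100, (8,))
    items = torch.randint(0, 500, (8,))
    vis = torch.randn(8, 512)   # e.g. CNN features of real or generated images
    txt = torch.randn(8, 384)   # e.g. sentence-encoder features
    print(model(users, items, vis, txt).shape)  # torch.Size([8])
```

The softmax gate keeps the two modality weights positive and summing to one, which is one simple way to realize adaptive feature importance; for implicit-feedback data such as Amazon reviews, a pairwise (e.g., BPR-style) loss over these scores would be a typical training objective.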