Deep Generative Replay with Denoising Diffusion Probabilistic Models for Continual Learning in Audio Classification (open access)
- Authors
- Lee, Hyeon-Ju; Buu, Seok-Jun
- Issue Date
- Sep-2024
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- Audio classification; Continual learning; Denoising diffusion probabilistic model; Generative replay; Triplet network
- Citation
- IEEE Access, v.12, pp 134714 - 134727
- Pages
- 14
- Indexed
- SCIE; SCOPUS
- Journal Title
- IEEE Access
- Volume
- 12
- Start Page
- 134714
- End Page
- 134727
- URI
- https://scholarworks.gnu.ac.kr/handle/sw.gnu/74333
- DOI
- 10.1109/ACCESS.2024.3459954
- ISSN
- 2169-3536
- Abstract
- Accurate classification of audio data is essential in various fields such as speech recognition, safety management, healthcare, security, and surveillance. However, existing deep learning classifiers typically require extensive pre-collected data and struggle to adapt to the emergence of new audio classes over time. To address these challenges, this paper proposes a continual learning method utilizing Diffusion-driven Generative Replay (DDGR). The proposed DDGR method continuously updates the model at each training stage with high-quality generated data from Denoising Diffusion Probabilistic Models (DDPM), preserving existing knowledge. Furthermore, by embedding disentangled representations through a triplet network, the model can effectively recognize new classes as they emerge. This approach overcomes the problem of catastrophic forgetting and effectively resolves the issue of data scalability in a continual learning setup. The proposed method achieved the highest AIA values of 95.45% and 72.99% on the Audio MNIST and ESC-50 datasets, respectively, compared to existing continual learning methods. Additionally, for Audio MNIST, it showed an IM of -0.01, FWT of 0.27, FM of 0.06, and BWT of -0.06, indicating that it best preserves prior knowledge while learning new data most effectively. For ESC-50, it demonstrated an IM of -0.12, FWT of 0.09, FM of 0.17, and BWT of -0.17. These results validate the efficacy of the DDGR method in maintaining prior knowledge while integrating new information and highlight the complementary role of the triplet network in enhancing feature representation. © 2013 IEEE.
- Appears in Collections
- ETC > Journal Articles
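
The abstract reports AIA, FM, and BWT without defining them. As a rough illustration only, the sketch below computes these three quantities from a stage-by-task accuracy matrix, assuming the definitions commonly used in the continual learning literature (average incremental accuracy, forgetting measure, backward transfer). This record does not state which exact formulas the authors used, and the function names and the accuracy matrix here are hypothetical, not taken from the paper.

```python
# Sketch under assumptions: acc[i][j] is the accuracy on task j measured
# after training through stage i (0-indexed). Entries with j > i are unused.

def backward_transfer(acc):
    """Mean change in accuracy on each earlier task between the stage
    it was learned and the final stage. Negative values mean forgetting."""
    T = len(acc)
    return sum(acc[T - 1][j] - acc[j][j] for j in range(T - 1)) / (T - 1)


def forgetting_measure(acc):
    """Mean gap between the best accuracy ever reached on each earlier
    task (before the final stage) and its accuracy at the final stage."""
    T = len(acc)
    total = 0.0
    for j in range(T - 1):
        best_earlier = max(acc[i][j] for i in range(j, T - 1))
        total += best_earlier - acc[T - 1][j]
    return total / (T - 1)


def average_incremental_accuracy(acc):
    """Average, over all stages, of the mean accuracy on the tasks
    seen up to and including that stage."""
    T = len(acc)
    stage_means = [sum(acc[i][: i + 1]) / (i + 1) for i in range(T)]
    return sum(stage_means) / T


# Hypothetical three-stage run (made-up numbers, NOT results from the paper).
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.80, 0.95],
]

bwt = backward_transfer(acc)
fm = forgetting_measure(acc)
aia = average_incremental_accuracy(acc)
print(f"BWT={bwt:.3f}  FM={fm:.3f}  AIA={aia:.3f}")
```

Under these assumed definitions, FM equals the negative of BWT whenever accuracy on an old task never rises above its value at the stage it was first learned, which is consistent with the paired values in the abstract (FM 0.06 with BWT -0.06, and FM 0.17 with BWT -0.17). IM and FWT are omitted because they depend on reference models whose setup this record does not describe.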