Details
- Kim, Jiwon;
- Gu, Seuli;
- Kim, Youngbeom;
- Lee, Sukwon;
- Kang, Changgu
Abstract
Voice phishing has emerged as a critical security threat that exploits both linguistic manipulation and advances in synthetic speech technologies. Traditional keyword-based approaches often fail to capture contextual patterns or to detect forged audio, which limits their effectiveness in real-world scenarios. To address this gap, we propose a multimodal voice phishing detection system that integrates text and audio analysis. The text module employs a KoBERT-based transformer classifier and uses its self-attention weights for interpretation, while the audio module extracts MFCC features and classifies them with a CNN-BiLSTM network to identify synthetic speech. A fusion mechanism combines the outputs of both modalities. Experiments were conducted on real-world call transcripts, phishing datasets, and synthetic voice corpora. The results show that the proposed system consistently achieves high accuracy, precision, recall, and F1-score on validation data while maintaining robust performance in noisy and diverse real-call scenarios. Furthermore, attention-based interpretability enhances trustworthiness by revealing cross-token and discourse-level interaction patterns specific to phishing contexts. These findings highlight the potential of the proposed system as a reliable, explainable, and deployable solution for preventing the financial and social damage caused by voice phishing. Unlike prior studies limited to a single modality or shallow fusion, our work presents a fully integrated text-audio detection pipeline optimized for Korean real-world datasets and robust to noisy, multi-speaker conditions.
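The abstract does not say how the two modality outputs are fused or how the decision threshold is set. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes each module emits a probability in [0, 1]: a phishing likelihood from the KoBERT text classifier and a synthetic-speech likelihood from the CNN-BiLSTM audio classifier. It combines the two with a weighted-average late fusion and then computes the four reported metrics. The function names, the weight `w_text`, and the 0.5 threshold are all hypothetical. The toy scores in the demo are invented and are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def late_fusion(p_text: float, p_audio: float, w_text: float = 0.5) -> float:
    """Hypothetical weighted-average fusion of per-modality scores.

    p_text:  assumed phishing probability from the text (KoBERT) module.
    p_audio: assumed synthetic-speech probability from the audio (CNN-BiLSTM) module.
    w_text:  assumed weight on the text modality; the paper's actual rule is not given here.
    """
    if not 0.0 <= w_text <= 1.0:
        raise ValueError("w_text must be in [0, 1]")
    return w_text * p_text + (1.0 - w_text) * p_audio


def classify(p_text: float, p_audio: float,
             w_text: float = 0.5, threshold: float = 0.5) -> int:
    """Return 1 (flag as phishing) if the fused score reaches the assumed threshold."""
    return int(late_fusion(p_text, p_audio, w_text) >= threshold)


def binary_metrics(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Accuracy, precision, recall, and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Invented (text score, audio score) pairs, for illustration only.
    scores = [(0.9, 0.8), (0.2, 0.1), (0.8, 0.6), (0.4, 0.2), (0.3, 0.9)]
    labels = [1, 0, 1, 1, 0]
    preds = [classify(t, a) for t, a in scores]
    print(preds, binary_metrics(labels, preds))
```

A weighted average is the simplest form of late fusion. It keeps each module independently trainable, and `w_text` can be tuned on validation data. A learned fusion layer over the two outputs would be a natural alternative if the modalities need to interact.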
- Title: A Multimodal Voice Phishing Detection System Integrating Text and Audio Analysis
- Authors: Kim, Jiwon; Gu, Seuli; Kim, Youngbeom; Lee, Sukwon; Kang, Changgu
- Publication date: 2025-10
- Type: Article
- Journal: Applied Sciences-Basel
- Volume: 15
- Issue: 20