A Multimodal Voice Phishing Detection System Integrating Text and Audio Analysis

Kim, Jiwon; Gu, Seuli; Kim, Youngbeom; Lee, Sukwon; Kang, Changgu

doi:10.3390/app152011170

Abstract

Voice phishing has emerged as a critical security threat that exploits both linguistic manipulation and advances in synthetic speech technology. Traditional keyword-based approaches often fail to capture contextual patterns or to detect forged audio, which limits their effectiveness in real-world scenarios. To address this gap, we propose a multimodal voice phishing detection system that integrates text and audio analysis. The text module employs a KoBERT-based transformer classifier with self-attention-based interpretation, while the audio module extracts MFCC features and passes them to a CNN-BiLSTM classifier to identify synthetic speech. A fusion mechanism combines the outputs of both modalities. Experiments were conducted on real-world call transcripts, phishing datasets, and synthetic voice corpora. The results show that the proposed system consistently achieves high accuracy, precision, recall, and F1-score on validation data while maintaining robust performance in noisy and diverse real-call scenarios. Furthermore, attention-based interpretability enhances trustworthiness by revealing cross-token and discourse-level interaction patterns specific to phishing contexts. These findings highlight the potential of the proposed system as a reliable, explainable, and deployable solution for preventing the financial and social damage caused by voice phishing. Unlike prior studies limited to a single modality or shallow fusion, our work presents a fully integrated text-audio detection pipeline that is optimized for Korean real-world datasets and robust to noisy, multi-speaker conditions.
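The abstract says only that a fusion mechanism combines the outputs of the two modalities; it does not state the fusion rule. As a rough illustration, the sketch assumes a simple weighted late fusion of a text-level phishing probability (as a KoBERT-style classifier might output) and an audio-level synthetic-speech probability (as a CNN-BiLSTM over MFCC frames might output). The names `ModalityScores` and `late_fusion`, the text weight of 0.6, and the 0.5 decision threshold are hypothetical choices for this example, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class ModalityScores:
    """Per-call outputs of the two detectors, each a probability in [0, 1].

    Hypothetical container: the paper does not specify this interface.
    """
    text_phishing_prob: float    # e.g. softmax output of a transformer text classifier
    audio_synthetic_prob: float  # e.g. sigmoid output of an audio forgery classifier


def late_fusion(scores: ModalityScores,
                text_weight: float = 0.6,
                threshold: float = 0.5) -> tuple[float, bool]:
    """Weighted-average late fusion (an assumed rule, not the published one).

    Returns the fused score and whether it meets the alert threshold.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    for p in (scores.text_phishing_prob, scores.audio_synthetic_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("modality probabilities must be in [0, 1]")
    # Convex combination: the audio weight is whatever the text weight leaves over.
    fused = (text_weight * scores.text_phishing_prob
             + (1.0 - text_weight) * scores.audio_synthetic_prob)
    return fused, fused >= threshold


if __name__ == "__main__":
    # A human scammer: suspicious transcript, genuine (non-synthetic) voice.
    print(late_fusion(ModalityScores(text_phishing_prob=0.9, audio_synthetic_prob=0.05)))
    # A benign automated announcement: harmless transcript, synthetic voice.
    print(late_fusion(ModalityScores(text_phishing_prob=0.1, audio_synthetic_prob=0.95)))
```

Weighting the text score more heavily in this sketch means a scam read by a real person can still be flagged (fused score about 0.56), while a harmless text-to-speech message is not flagged on voice alone (about 0.44). A learned fusion layer or different weights could behave quite differently.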

Keywords

voice phishing detection; multimodal learning; audio forgery analysis; transformer-based text classification

Publication date: 2025-10

Type: Article

Journal: Applied Sciences-basel

Volume: 15

Issue: 20