Detailed Information

A Multimodal Voice Phishing Detection System Integrating Text and Audio Analysis (open access)

Authors
Kim, Jiwon; Gu, Seuli; Kim, Youngbeom; Lee, Sukwon; Kang, Changgu
Issue Date
Oct-2025
Publisher
MDPI
Keywords
voice phishing detection; multimodal learning; audio forgery analysis; transformer-based text classification
Citation
Applied Sciences-Basel, v.15, no.20
Indexed
SCIE
SCOPUS
Journal Title
Applied Sciences-Basel
Volume
15
Number
20
URI
https://scholarworks.gnu.ac.kr/handle/sw.gnu/80791
DOI
10.3390/app152011170
ISSN
2076-3417
Abstract
Voice phishing has emerged as a critical security threat that exploits both linguistic manipulation and advances in synthetic speech technology. Traditional keyword-based approaches often fail to capture contextual patterns or detect forged audio, which limits their effectiveness in real-world scenarios. To address this gap, we propose a multimodal voice phishing detection system that integrates text and audio analysis. The text module employs a KoBERT-based transformer classifier with self-attention interpretation, while the audio module uses MFCC features and a CNN-BiLSTM classifier to identify synthetic speech. A fusion mechanism combines the outputs of both modalities. Experiments were conducted on real-world call transcripts, phishing datasets, and synthetic voice corpora. The results show that the proposed system consistently achieves high accuracy, precision, recall, and F1-score on validation data while maintaining robust performance in noisy and diverse real-call scenarios. Furthermore, attention-based interpretability enhances trustworthiness by revealing cross-token and discourse-level interaction patterns specific to phishing contexts. These findings highlight the potential of the proposed system as a reliable, explainable, and deployable solution for preventing the financial and social damage caused by voice phishing. Unlike prior studies limited to a single modality or shallow fusion, our work presents a fully integrated text-audio detection pipeline optimized for Korean real-world datasets and robust to noisy, multi-speaker conditions.
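The abstract does not specify how the fusion mechanism combines the two modalities. As a minimal sketch, assuming a simple weighted late fusion of two per-call probabilities, the idea could look like the following. The names `ModalityScores` and `fuse_scores`, the weight of 0.6, and the threshold of 0.5 are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ModalityScores:
    """Hypothetical per-call outputs of the two modules (not the paper's API)."""
    text_phishing_prob: float    # assumed output of the text classifier, in [0, 1]
    audio_synthetic_prob: float  # assumed output of the audio classifier, in [0, 1]


def fuse_scores(
    scores: ModalityScores,
    text_weight: float = 0.6,   # assumed weight; the paper may use a different scheme
    threshold: float = 0.5,     # assumed decision threshold
) -> Tuple[float, bool]:
    """Weighted late fusion: a convex combination of the two probabilities.

    Returns the fused score and whether it meets the decision threshold.
    """
    for p in (scores.text_phishing_prob, scores.audio_synthetic_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")

    fused = (
        text_weight * scores.text_phishing_prob
        + (1.0 - text_weight) * scores.audio_synthetic_prob
    )
    return fused, fused >= threshold


if __name__ == "__main__":
    # A call whose transcript looks suspicious but whose audio seems genuine.
    score, flagged = fuse_scores(ModalityScores(0.9, 0.2))
    print(f"fused score = {score:.2f}, flagged = {flagged}")
```

A convex combination keeps the fused score in [0, 1], so a single threshold can be applied to it. Other fusion schemes (for example, a learned classifier over both outputs) are equally plausible readings of the abstract.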
Appears in Collections
ETC > Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kang, Chang Gu
College of IT Engineering (School of Computer Engineering)