Deepfake Audio Detection Using CNN-Transformer Hybrid Model with Data Augmentation

Author: Shraddha Zoman

Abstract: The emergence of deepfake audio, generated by advanced machine learning models such as GANs and speech synthesis networks, poses a serious threat to digital security and trust. In this paper, we propose a CNN-Transformer hybrid architecture for detecting deepfake audio signals. The CNN extracts local spectral features, while the Transformer captures long-range temporal dependencies across audio sequences. Evaluated on the ASVspoof 2019 dataset, the model achieved a classification accuracy of 91.47%, outperforming conventional models including LSTM (90.00%), CNN-LSTM (91.39%), and TCN (86.96%). A detailed classification report and confusion matrix further demonstrate the robustness of the proposed approach. The approach builds on trends observed in prior work on spectral learning, adversarial learning, and hybrid audio forensics architectures.
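The abstract reports accuracy alongside a classification report and confusion matrix. As a rough illustration of how those figures relate, the sketch below derives accuracy, precision, recall, and F1 from the four cells of a binary confusion matrix, treating "spoof" as the positive class. The function name `binary_report` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict


def binary_report(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Derive summary metrics from a binary confusion matrix.

    tp/fn count spoofed clips that were caught/missed;
    tn/fp count genuine clips that were accepted/wrongly flagged.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the paper).
report = binary_report(tp=90, fp=10, fn=5, tn=95)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

On an imbalanced evaluation set, accuracy alone can hide a weak minority-class recall, which is one reason a per-class report is usually read together with the headline accuracy.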

Keywords: CNN-Transformer Hybrid, Data Augmentation, Deepfake Audio Detection, Spectrogram Analysis.

Conference Name: International Conference on Engineering & Technology (ICET-25)

Conference Place: Nagpur, India

Conference Date: 13th Jul 2025
