Abstract: The advent of sophisticated voice synthesis methods has given rise to audio deepfakes, jeopardizing security and trust in digital communications. In this work, three complementary approaches to deepfake detection are compared. First, Mel-Frequency Cepstral Coefficients (MFCCs) are extracted and classified with a Support Vector Machine (SVM). Second, Constant-Q Cepstral Coefficients (CQCCs) are modelled by Gaussian Mixture Models (GMMs) with log-likelihood-ratio decisions, a long-standing baseline in spoofing countermeasures. Third, Mel-spectrogram representations serve as inputs to a Convolutional Neural Network (CNN) for end-to-end learning. Experiments on the In-the-Wild dataset indicate that the CNN attains state-of-the-art accuracy, the SVM provides computational efficiency, and the CQCC-GMM retains the robustness inherited from traditional anti-spoofing. These results illustrate the trade-offs between classical machine learning and deep learning paradigms and offer practical guidance for developing reliable deepfake detection methods in real-world conditions.
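For readers who want a concrete starting point, the sketch below outlines the three pipelines in Python using librosa, scikit-learn, and PyTorch. It is a minimal illustration under stated assumptions, not the authors' implementation: the 16 kHz sampling rate, 20 cepstral coefficients, 64-component GMMs, CNN architecture, and file names are all hypothetical, and the CQCC front-end is approximated by a constant-Q transform followed by a DCT, omitting the uniform frequency resampling of the full CQCC recipe.

```python
# Minimal sketch of the three pipelines compared in the paper.
# All paths, labels, and hyperparameters are illustrative assumptions,
# not the authors' published configuration.
import numpy as np
import librosa
import torch
import torch.nn as nn
from scipy.fftpack import dct
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

SR = 16000  # assumed sampling rate


def mfcc_vector(path: str) -> np.ndarray:
    """Pipeline 1 front-end: mean-pooled MFCCs (one vector per utterance)."""
    y, _ = librosa.load(path, sr=SR)
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=20).mean(axis=1)


def cqcc_like(path: str, n_coeff: int = 20) -> np.ndarray:
    """Pipeline 2 front-end: log-power constant-Q transform + DCT.

    Approximates CQCC; the full recipe also uniformly resamples the
    log spectrum in frequency before the DCT.
    """
    y, _ = librosa.load(path, sr=SR)
    log_cq = np.log(np.abs(librosa.cqt(y, sr=SR)) ** 2 + 1e-10)
    return dct(log_cq, type=2, axis=0, norm="ortho")[:n_coeff].T  # frames x coeffs


def mel_input(path: str) -> torch.Tensor:
    """Pipeline 3 front-end: log-Mel spectrogram as a 1-channel image."""
    y, _ = librosa.load(path, sr=SR)
    mel_db = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=SR, n_mels=128)
    )
    return torch.tensor(mel_db, dtype=torch.float32)[None, None]  # batch x chan x mel x time


def train_svm(paths, labels) -> SVC:
    """Pipeline 1 back-end: RBF-kernel SVM on utterance-level MFCC vectors."""
    return SVC(kernel="rbf").fit(np.stack([mfcc_vector(p) for p in paths]), labels)


def train_gmms(real_paths, fake_paths):
    """Pipeline 2 back-end: one GMM per class over frame-level features.

    64 components is an assumed model order; ASVspoof-style CQCC-GMM
    baselines often use more (e.g., 512).
    """
    def fit(paths):
        feats = np.vstack([cqcc_like(p) for p in paths])
        return GaussianMixture(n_components=64, covariance_type="diag").fit(feats)
    return fit(real_paths), fit(fake_paths)


def llr(path, gmm_real, gmm_fake) -> float:
    """Mean log-likelihood ratio; positive scores favour the bona fide class."""
    feats = cqcc_like(path)
    return gmm_real.score(feats) - gmm_fake.score(feats)


# Pipeline 3 back-end: a small CNN producing real/fake logits.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
)

if __name__ == "__main__":
    # Hypothetical file lists; replace with real utterances.
    real, fake = ["real_0.wav"], ["fake_0.wav"]
    svm = train_svm(real + fake, [1, 0])
    gmm_real, gmm_fake = train_gmms(real, fake)
    print(llr("real_0.wav", gmm_real, gmm_fake))
    print(cnn(mel_input("real_0.wav")))
```

The adaptive average pooling lets the CNN accept variable-length spectrograms, which is convenient for in-the-wild audio of arbitrary duration; the actual architecture used in the paper may differ.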
Keywords: Audio Deepfake Detection, Voice Spoofing, Support Vector Machine (SVM), Gaussian Mixture Model (GMM), Convolutional Neural Network (CNN), MFCC, CQCC, Spectrograms
DOI: 10.17148/IJIREEICE.2025.131103
[1] Abhiram Sri Paravastu, B. Harshat, H. Keerthi Lakshmi, Abhishek Harish, Vivek S Nair, Neelam Sanjeev Kumar, "Audio Deepfake Detection Using MFCC-SVM, CQCC-GMM, and Spectrogram-CNN: A Comparative Study," International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering (IJIREEICE), 2025, DOI: 10.17148/IJIREEICE.2025.131103.