Abstract: The viral spread of digital misinformation, often called an "infodemic," has moved beyond a technical nuisance to become a genuine threat to public health and social stability. While modern research favors Deep Learning models such as BERT, these architectures often demand hardware resources that are not practical for real-time, decentralized deployment. This study shifts the focus to computational efficiency by evaluating four traditional machine learning classifiers—Logistic Regression, Support Vector Machine (SVM), Multinomial Naïve Bayes, and the Passive-Aggressive Classifier (PAC)—on a corpus of 39,103 news articles. By combining a Regex-based preprocessing pipeline with optimized TF-IDF vectorization, the proposed framework achieved a peak in-domain accuracy of 0.995 using PAC. However, cross-dataset validation on the LIAR benchmark dataset revealed a performance decline to 0.474, primarily due to contextual sparsity in short-form political statements. These findings suggest that while traditional models are highly effective for long-form news classification, they require semantic enhancement to handle sparse social media content. Overall, this work supports a sustainable "Green AI" perspective that emphasizes computational efficiency while acknowledging cross-domain limitations.
Keywords: Fake News Detection; Traditional Machine Learning; Passive-Aggressive Classifier; TF-IDF; Cross-Dataset Validation; Domain Shift; Text Classification
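
The pipeline summarized in the abstract (regex cleaning, then TF-IDF weighting, then a Passive-Aggressive classifier) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the actual regex rules, the TF-IDF settings, or the label encoding, so the cleaning patterns, the smoothed IDF formula, the PA-I update with `C=1.0`, the +1/-1 labels, and the four toy sentences below are all assumptions. In practice a library such as scikit-learn would normally replace the hand-written parts.

```python
import math
import re
from collections import Counter


def clean(text):
    """Regex cleanup (assumed rules): lowercase, drop URLs, keep letters only."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in training."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc.split()))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; tokens unseen in training are ignored."""
    counts = Counter(tok for tok in doc.split() if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def score(w, x):
    """Dot product between the weight map and a sparse vector."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_pa(vectors, labels, C=1.0, epochs=5):
    """Passive-Aggressive (PA-I) training; labels are +1 (fake) or -1 (real)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            loss = max(0.0, 1.0 - y * score(w, x))  # hinge loss
            sq_norm = sum(v * v for v in x.values())
            if loss > 0.0 and sq_norm > 0.0:
                # Passive when the margin is met, aggressive otherwise.
                tau = min(C, loss / sq_norm)
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + tau * y * v
    return w


def predict(w, x):
    return 1 if score(w, x) >= 0 else -1


# Hypothetical toy data, not drawn from the paper's corpus.
texts = [
    "SHOCKING!!! Miracle cure doctors hate, see http://example.com/cure",
    "Aliens secretly control the election, insiders reveal!!!",
    "The city council approved the annual budget on Tuesday.",
    "The central bank kept interest rates unchanged this quarter.",
]
labels = [1, 1, -1, -1]

docs = [clean(t) for t in texts]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]
weights = train_pa(vectors, labels)
predictions = [predict(weights, v) for v in vectors]
```

Because every step here is a dictionary lookup or a sparse dot product, training and inference stay cheap on commodity hardware, which is the efficiency argument the abstract makes. The same sparsity also hints at the cross-domain weakness: a short statement that shares few tokens with the training vocabulary yields a nearly empty vector and an uninformative score.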


DOI: 10.17148/IJIREEICE.2026.14429

Cite This:

[1] Pranjal L. Bhamre, Isha Y. Jain, Sunita N. Deore, "Comparative Analysis and Cross-Dataset Validation of Traditional Machine Learning Models for Fake News Detection," International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering (IJIREEICE), DOI 10.17148/IJIREEICE.2026.14429
