Abstract: The exponential growth of social media and online discussion forums has brought growing volumes of toxic comments, making healthy digital spaces harder to maintain. This study designs an efficient machine learning model that classifies toxic content using Natural Language Processing and ensemble learning. The approach combines Logistic Regression, Random Forest, and XGBoost in a voting ensemble to improve predictive accuracy. With TF-IDF feature extraction and a soft voting mechanism, the proposed ensemble outperforms the stand-alone classifiers in both ROC-AUC and precision. The proposed system offers a robust, efficient, and scalable way to identify and manage toxicity online.
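
The soft voting mechanism mentioned above can be illustrated with a minimal, dependency-free sketch. It assumes each base classifier outputs a probability vector [P(non-toxic), P(toxic)] for a comment; the function names `soft_vote` and `predict_label` and the example scores are hypothetical and are not taken from the paper. The abstract does not say which library the authors used; scikit-learn's `VotingClassifier` with `voting="soft"` is one common way to get the same behavior.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probas: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted average of per-model class-probability vectors."""
    if not model_probas:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(model_probas[0])
    if any(len(p) != n_classes for p in model_probas):
        raise ValueError("all models must score the same set of classes")
    if weights is None:
        weights = [1.0] * len(model_probas)  # equal weights by default
    if len(weights) != len(model_probas):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    return [
        sum(w * p[c] for w, p in zip(weights, model_probas)) / total
        for c in range(n_classes)
    ]


def predict_label(avg_proba: Sequence[float]) -> int:
    """Pick the class index with the highest averaged probability."""
    return max(range(len(avg_proba)), key=lambda c: avg_proba[c])


# Made-up [P(non-toxic), P(toxic)] scores for one comment (illustration only).
example_probas = [
    [0.55, 0.45],  # hypothetical Logistic Regression output
    [0.55, 0.45],  # hypothetical Random Forest output
    [0.10, 0.90],  # hypothetical XGBoost output
]
averaged = soft_vote(example_probas)
label = predict_label(averaged)  # 1 = toxic, 0 = non-toxic in this sketch
```

In this made-up example two of the three models lean slightly toward non-toxic, so a majority (hard) vote would pick non-toxic, while the averaged probabilities favor toxic because the third model is very confident. Letting confident models carry more influence is the usual motivation for soft voting over hard voting.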

Keywords: Toxic comments, Natural Language Processing, TF-IDF, Ensemble Learning, Voting Classifier, XGBoost, Logistic Regression, Random Forest.


DOI: 10.17148/IJIREEICE.2025.131124

Cite This:

[1] AATHITYA.A, KRISHITH TP, SARVESH S, KEVIN BENJAMIN SAMUEL, PRANAV M, VIGNESH D, Dr. M. ULAGAMMAI, "Toxic Comment Classification Using Ensemble Machine Learning Techniques," International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering (IJIREEICE), DOI 10.17148/IJIREEICE.2025.131124
