Abstract: Robust plagiarism-checking tools are essential for maintaining integrity in both academic and professional environments. Existing detection strategies, which typically rely on lexical comparison, struggle to flag sophisticated, machine-aided rephrasing. Detecting such text calls for Machine Learning (ML) approaches that capture the underlying meaning of a passage rather than its surface wording. This research introduces an efficient two-phase ML framework designed to identify heavily paraphrased text. The first phase uses a SentenceTransformer model (all-MiniLM-L6-v2) to generate dense vector embeddings for both the suspect documents and the reference library. These embeddings are stored and searched with FAISS (Facebook AI Similarity Search), enabling fast, large-scale retrieval of candidate source passages. The second phase uses a Longformer-based sequence classifier to perform a pairwise contextual analysis of the suspect text against each retrieved candidate before delivering a final verdict. Longformer was chosen because it relaxes the sequence-length limits of earlier transformer models, which makes analysis of long-form content practical. The resulting system, named "CopyShield," is deployed with a user interface built on the Gradio framework. Evaluation on the jpwahle/machine-paraphrase-dataset yielded F1-scores in the 0.89–0.92 range, indicating that the system can counter contemporary obfuscation methods.
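
The two-phase flow described in the abstract (retrieve candidates by embedding similarity, then verify each pair with a classifier) can be illustrated with a minimal, dependency-free sketch. Everything named here is an assumption made for illustration and is not taken from the CopyShield implementation: the brute-force cosine search stands in for a FAISS index, the three-dimensional vectors stand in for all-MiniLM-L6-v2 embeddings, and `placeholder_classifier` is a trivial word-overlap score standing in for the Longformer classifier. The function names, `k`, and `threshold` are likewise hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k(query: Vector, library: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Phase 1 stand-in: exact cosine-similarity search over the library.

    Assumption: a brute-force scan replaces the FAISS index from the abstract.
    Returns up to k (library_index, similarity) pairs, best match first.
    """
    q = normalize(query)
    scored = [
        (idx, sum(a * b for a, b in zip(q, normalize(doc))))
        for idx, doc in enumerate(library)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def detect(
    query_text: str,
    query_vec: Vector,
    library_texts: Sequence[str],
    library_vecs: Sequence[Vector],
    classify_pair: Callable[[str, str], float],
    k: int = 2,
    threshold: float = 0.5,
) -> List[Tuple[int, float, float]]:
    """Phase 2: score each retrieved candidate with a pairwise classifier.

    classify_pair returns a paraphrase probability in [0, 1]; candidates at or
    above the (hypothetical) threshold are reported as
    (library_index, similarity, probability).
    """
    hits = []
    for idx, similarity in top_k(query_vec, library_vecs, k):
        probability = classify_pair(query_text, library_texts[idx])
        if probability >= threshold:
            hits.append((idx, similarity, probability))
    return hits


def placeholder_classifier(a: str, b: str) -> float:
    """Hypothetical placeholder: Jaccard word overlap, NOT a Longformer model."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


# Toy data: hand-written vectors, not real sentence embeddings.
LIBRARY_TEXTS = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat was sitting on the mat",
]
LIBRARY_VECS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
QUERY_TEXT = "the cat sat on the mat"
QUERY_VEC = [1.0, 0.0, 0.0]

if __name__ == "__main__":
    for idx, sim, prob in detect(
        QUERY_TEXT, QUERY_VEC, LIBRARY_TEXTS, LIBRARY_VECS, placeholder_classifier
    ):
        print(f"candidate {idx}: similarity={sim:.3f}, probability={prob:.2f}")
```

The point of the split is cost: the retrieval step is cheap and narrows the library to a handful of candidates, so the more expensive pairwise classifier runs only on those. In a real system, `classify_pair` would wrap a trained model; the word-overlap placeholder is exactly the kind of lexical measure the abstract argues is insufficient on its own.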

Keywords: NLP, ML, Semantic Analysis, Transformer Models, Deep Learning, Longformer Architecture, Plagiarism Checkers, Gradio, FAISS.


DOI: 10.17148/IJIREEICE.2025.131038

Cite This:

[1] Rakshitha S N, Shreya Sathapathi, Yashvitha J, Dr. Golda Dilip, "Machine Learning-Based Plagiarism Detector System," International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering (IJIREEICE), DOI 10.17148/IJIREEICE.2025.131038
