A MULTI-FEATURE ACCURATE DETECTION (MFAD) APPROACH FOR LARGE LANGUAGE MODEL-GENERATED TEXT

Sayed, Doaa Ahmed; Ismail, Sally Saad; Aref, Mostafa

doi:10.21608/ijicis.2025.410515.1416

International Journal of Intelligent Computing and Information Sciences
Volume 25, Issue 3, September 2025, Pages 107-122
Document Type: Original Article
DOI: 10.21608/ijicis.2025.410515.1416
Authors
Doaa Ahmed Sayed*¹; Sally Saad Ismail²; Mostafa Aref³
¹Computer Science, Faculty of Computer and Information, Ain Shams
²Computer Science, Faculty of Computer and Information Science, Ain Shams University, Cairo, Egypt
³Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt.
Abstract
Advanced Large Language Models (LLMs) generate highly complex text that closely resembles human writing. However, their rapid development raises significant concerns, such as misinformation and academic cheating. As the responsible use of LLMs becomes increasingly important, detecting LLM-generated content has emerged as a critical challenge. Existing detection methods often rely on single-feature analysis, traditional feature extraction techniques, and conventional classification models. Many also require full access to the underlying models and are sensitive to variations in text length, which limits their overall effectiveness. This paper proposes a novel Multi-Feature Accurate Detection (MFAD) approach that identifies LLM-generated text by integrating syntactic and statistical attributes with high-level semantic representations. MFAD comprises six phases: text preprocessing, syntactic and statistical feature extraction, text representation, semantic feature extraction, feature concatenation, and text classification. A case study using the Human ChatGPT Comparison Corpus (HC3) is conducted to evaluate the proposed architecture. Results show that MFAD effectively distinguishes between human-written and LLM-generated text, achieving a peak confidence score of 98%, which highlights its reliability and strong performance.
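As a rough illustration of the feature concatenation idea described in the abstract, the following minimal sketch joins a few hand-crafted statistical features with a placeholder semantic vector. It is not the authors' implementation: the function names, the four chosen statistics, and the hashed bag-of-words vector (a stand-in for a learned semantic representation) are all assumptions made for this example, and the classification phase is left out.

```python
import hashlib
import re
import string


def preprocess(text: str) -> str:
    """Lowercase and collapse whitespace (assumed stand-in for preprocessing)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def statistical_features(text: str) -> list[float]:
    """Hypothetical surface features: word count, mean word length,
    type-token ratio, and punctuation ratio."""
    words = re.findall(r"[a-z0-9']+", text)
    n_words = len(words)
    if n_words == 0:
        return [0.0, 0.0, 0.0, 0.0]
    mean_len = sum(len(w) for w in words) / n_words
    type_token_ratio = len(set(words)) / n_words
    punct_ratio = sum(ch in string.punctuation for ch in text) / len(text)
    return [float(n_words), mean_len, type_token_ratio, punct_ratio]


def semantic_placeholder(text: str, dim: int = 8) -> list[float]:
    """Hashed bag-of-words vector, normalised to sum to 1.
    Placeholder only: a real system would use a learned text embedding."""
    vec = [0.0] * dim
    for w in re.findall(r"[a-z0-9']+", text):
        idx = int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def mfad_style_features(raw: str, dim: int = 8) -> list[float]:
    """Concatenate statistical features with the placeholder semantic vector."""
    text = preprocess(raw)
    return statistical_features(text) + semantic_placeholder(text, dim)
```

The combined vector from `mfad_style_features` could then be passed to any classifier; which classifier and which features MFAD actually uses are described in the full paper, not in this sketch.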
Keywords
Large language models (LLMs); Machine-generated text; AI-generated text; Feature-based detection
