Advancing Arabic Scientific Text Analysis: Evaluating Machine Learning Models for Named Entity Recognition

Marzouk, Nourhan; Nayel, Hamada; Elsawy, Ahmed

doi:10.21608/bjas.2024.279914.1377

Benha Journal of Applied Sciences
Article 5, Volume 9, Issue 5, May 2024, Pages 45-48
Document Type: Original Research Papers
DOI: 10.21608/bjas.2024.279914.1377
Authors
Nourhan Marzouk¹*; Hamada Nayel¹; Ahmed Elsawy²
¹Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
²Computer Science Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
Abstract
Named entity recognition in Arabic text, particularly in the scientific and medical domains, is challenging because of the language's rich morphology, the scarcity of resources, and dialectal diversity. This study evaluates Conditional Random Fields (CRF), Support Vector Machines (SVM), and Stochastic Gradient Descent (SGD) models for named entity recognition in Arabic scientific texts. The models were applied to a self-collected dataset of Arabic thesis abstracts. The named entities identified in the dataset include proteins, DNA, RNA, cell types, and cell lines. Our comparative analysis, which focuses on the scientific domain, reveals significant performance differences among the models, with hybrid approaches showing promising results. SGD, SVM, and CRF achieved F1-scores of 0.96, 0.91, and 0.80, respectively. The results demonstrate the effectiveness of the proposed models. The research contributes to Arabic natural language processing by highlighting the strengths of each model and guiding the future selection and development of named entity recognition models.
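For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score can be computed for entity predictions of the kind compared above. It assumes a BIO tagging scheme and strict entity-level matching; the abstract does not state which tagging scheme or matching level the authors used, and the tag names (`protein`, `DNA`, `RNA`, `cell_type`) and helper functions are hypothetical illustrations, not the authors' code.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO-tagged sequence.

    Assumption: a span starts at a B- tag and continues over I- tags
    of the same type. I- tags with no open span are ignored.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def f1_score(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: the harmonic mean of precision and recall."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical six-token sentence: the prediction mislabels one entity type.
gold = ["B-protein", "I-protein", "O", "B-DNA", "O", "B-cell_type"]
pred = ["B-protein", "I-protein", "O", "B-RNA", "O", "B-cell_type"]
score = f1_score(gold, pred)
```

In this toy case two of the three predicted entities match the gold annotation exactly, so precision and recall are both 2/3 and so is the F1-score. A token-level F1, which some evaluations report instead, would generally give a different number for the same predictions.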
Keywords
Arabic Named Entity Recognition; Entity Extraction; Arabic NLP; Machine Learning
