Advancing Arabic Scientific Text Analysis: Evaluating Machine Learning Models for Named Entity Recognition
Benha Journal of Applied Sciences
Article 5, Volume 9, Issue 5, May 2024, Pages 45-48
Document Type: Original Research Papers
DOI: 10.21608/bjas.2024.279914.1377
Authors
Nourhan Marzouk
1 Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
2 Computer Science Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
Abstract
Named entity recognition (NER) in Arabic text, particularly in the scientific and medical domains, is challenging because of the language's rich morphology, the scarcity of resources, and dialectal diversity. This study evaluates the efficacy of Conditional Random Fields (CRF), Support Vector Machines (SVM), and Stochastic Gradient Descent (SGD) models for NER in Arabic scientific texts. The models were implemented on a self-collected dataset of Arabic thesis abstracts. The named entities annotated in the dataset are proteins, DNA, RNA, cell types, and cell lines. Focusing on the scientific domain, our comparative analysis reveals significant performance differences among the models, with hybrid approaches showing promising results. SGD, SVM, and CRF achieved F1-scores of 0.96, 0.91, and 0.80, respectively. The results demonstrate the effectiveness of the proposed models. The research contributes to Arabic natural language processing by highlighting each model's strengths and by guiding the future selection and development of NER models.
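The F1-scores reported above combine precision and recall as their harmonic mean. As a minimal sketch of how such a score can be computed for token-level entity labels, the following uses invented toy labels. The function name `micro_f1`, the outside label "O", and the choice of token-level micro-averaging are assumptions made for illustration, since the abstract does not state which averaging scheme the study used.

```python
from collections import Counter


def micro_f1(gold, pred, outside="O"):
    """Token-level micro-averaged precision, recall, and F1 over entity labels.

    Assumption: tokens tagged `outside` are non-entities and are ignored
    when they are correctly predicted as `outside`.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    counts = Counter()
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            counts["tp"] += 1  # correct entity label
        else:
            if p != outside:
                counts["fp"] += 1  # predicted an entity label that is wrong
            if g != outside:
                counts["fn"] += 1  # gold entity token missed or mislabeled
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (hypothetical), using the entity types named in the abstract.
gold = ["protein", "O", "DNA", "O", "cell_type", "RNA"]
pred = ["protein", "O", "RNA", "O", "O", "RNA"]
precision, recall, f1 = micro_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the three predicted entity tokens are correct and two of the four gold entity tokens are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7.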
Keywords
Arabic Named Entity Recognition; Entity Extraction; Arabic NLP; Machine Learning