ARABIC CORPUS of LIBRARY and INFORMATION SCIENCE: DESIGN and CONSTRUCTION
The Egyptian Journal of Language Engineering
Article 1, Volume 10, Issue 1, April 2023, Pages 1-9
Document Type: Original Article
DOI: 10.21608/ejle.2023.183529.1040
Author
Ayman Eddakrouri
Effat University, Jeddah, Kingdom of Saudi Arabia
Abstract
This paper addresses the principal considerations in creating the Arabic Corpus of Library and Information Science, a specialized Arabic corpus of the academic genre. It discusses ten phases of creation: the rationale for the corpus, types of texts, sources of texts, legal approval, data collection, refining texts, revising texts, saving texts, coding texts, and finally the size of the corpus (357,485 tokens). Collecting the texts of the articles was the longest and most challenging phase of building the corpus, especially when the files were PDFs or images that software could not read with 100% accuracy. This challenge was overcome by considering several factors, which are clarified in the discussion of that phase. The Arabic Corpus of Library and Information Science can play a significant role in addressing the salient features of the academic genre, including keyword identification, lexico-grammatical patterns, themes, topics, and index terms used in the genre of Library and Information Science. Furthermore, the steps of creating the corpus can guide the building of other corpora for any genre or language.
Keywords
Arabic Corpora; Arabic Natural Language Processing; Information Retrieval Systems; Indexing Arabic Texts; Arabic Information Extraction