ARABIC CORPUS of LIBRARY and INFORMATION SCIENCE: DESIGN and CONSTRUCTION

Eddakrouri, Ayman

doi:10.21608/ejle.2023.183529.1040

The Egyptian Journal of Language Engineering
Article 1, Volume 10, Issue 1, April 2023, Pages 1-9
Document Type: Original Article
Author
Ayman Eddakrouri
Effat University, Jeddah, Kingdom of Saudi Arabia
Abstract
This paper addresses the principal considerations in creating the Arabic Corpus of Library and Information Science, a specialized Arabic corpus of the academic genre. It discusses ten phases of creation: the rationale for the corpus, the types of texts, the sources of texts, legal approval, data collection, refining the texts, revising the texts, saving the texts, coding the texts, and, finally, the size of the corpus (357,485 tokens). Collecting the texts of the articles was the longest and most challenging phase of building the corpus, especially when the files were PDFs or images that no available software could read with full accuracy. This challenge was overcome by taking into account several factors, which are explained in the discussion of that phase. The Arabic Corpus of Library and Information Science can play a significant role in describing the salient features of the academic genre, including keyword identification, lexico-grammatical patterns, themes, topics, and the index terms used in Library and Information Science. Furthermore, the steps followed in creating the corpus can serve as a guide for building corpora of other genres or languages.
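The abstract reports the corpus size in tokens but does not state how tokens were counted, and the count depends heavily on that choice. The sketch below is one plausible scheme, not the author's method: it assumes a "refining" step that strips Arabic diacritics (U+064B to U+0652) and the tatweel (U+0640), then treats each maximal run of Unicode word characters as a token. The function names `normalize`, `tokenize`, and `corpus_stats` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed normalization (not specified in the paper): remove Arabic
# diacritics (fathatan .. sukun, U+064B-U+0652) and the tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# A token is a maximal run of Unicode word characters. Python's \w does not
# match combining diacritics, so without normalization a vocalized word
# would be split into several fragments.
_TOKEN = re.compile(r"\w+")


def normalize(text: str) -> str:
    """Hypothetical refining step: drop diacritics and tatweel."""
    return _DIACRITICS.sub("", text)


def tokenize(text: str) -> list:
    """Split normalized text into tokens; punctuation such as the Arabic comma is dropped."""
    return _TOKEN.findall(normalize(text))


def corpus_stats(documents: Iterable[str]):
    """Return (total token count, per-type frequency Counter) for a set of texts."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return sum(counts.values()), counts


if __name__ == "__main__":
    total, freq = corpus_stats(["علم المكتبات", "علم المعلومات، والمكتبات"])
    print(total, freq.most_common(1))
```

Note that this scheme does not separate attached clitics (for example, the conjunction in "والمكتبات" stays part of the token), so a clitic-aware Arabic tokenizer would report a larger total for the same texts.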
Keywords
Arabic Corpora; Arabic Natural Language Processing; Information Retrieval Systems; Indexing Arabic Texts; Arabic Information Extraction
