Modern Standard Arabic Grammar Automatic Extraction from Penn Arabic Treebank Using Natural Language Toolkit
The Egyptian Journal of Language Engineering
Article 1, Volume 5, Issue 1, April 2018, Pages 1-10
Document Type: Original Article
DOI: 10.21608/ejle.2018.59295
Authors
Amira Abdelhalim 1; Sameh Alansary 2
1 Phonetics and Linguistics Department, Faculty of Arts, Alexandria University
2 Head of the Phonetics and Linguistics Department, Faculty of Arts, Alexandria University
Abstract
This paper presents a methodology for rule-based, bottom-up parsing of Modern Standard Arabic (MSA) using the Context Free Grammar (CFG) formalism in a Phrase Structure Grammar (PSG) representation, where the grammar is automatically extracted from a syntactically annotated corpus. The extracted grammar is used to automatically build a lexicon and a grammar rules module. The extracted CFG is then transformed into a Probabilistic Context Free Grammar (PCFG), with rule probabilities also calculated automatically, that could be used in a hybrid approach. The corpus used is the Penn Arabic Treebank (PATB), and the algorithm is implemented with the Natural Language Toolkit (NLTK). The results showed that automatic grammar extraction improved the grammar building phase in both coverage of structures and time needed, but the grammar still requires constraints to be added manually. Automatic grammar extraction can enhance rule-based grammar parsers and will enable a new paradigm of statistically directed symbolic parsing.
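The extraction pipeline summarized in the abstract (read annotated trees, collect CFG productions, then estimate rule probabilities) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes relative-frequency estimation for the PCFG step, uses only the Python standard library rather than NLTK (which offers comparable facilities such as `Tree.productions()` and `induce_pcfg`), and runs on a three-tree toy treebank whose labels and tokens are hypothetical and are not taken from the PATB.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy treebank in bracketed notation (NOT actual PATB data).
TOY_TREEBANK = [
    "(S (VP (VERB kataba) (NP (NOUN walad))))",
    "(S (VP (VERB qaraa) (NP (NOUN walad) (ADJ kabir))))",
    "(S (NP (NOUN walad)) (VP (VERB kataba)))",
]


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    A child is either another (label, children) tuple or a token string.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def productions(tree):
    """Yield (lhs, rhs) rules; terminals are wrapped in single quotes."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else f"'{c}'" for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


def induce_pcfg(trees):
    """Estimate P(lhs -> rhs) as count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rule for tree in trees for rule in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: Fraction(n, lhs_counts[rule[0]])
            for rule, n in rule_counts.items()}


TREES = [parse_tree(t) for t in TOY_TREEBANK]
PCFG = induce_pcfg(TREES)

for (lhs, rhs), prob in sorted(PCFG.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob}]")
```

Rules whose right-hand side is a quoted token form the lexicon, and the remaining rules form the grammar module. Exact fractions are used so that the probabilities for each left-hand side sum to exactly 1.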
Keywords
Observational Based Grammar; Automatic Grammar Extraction; Rule Based Grammar; Enhancing Arabic Grammar Parsing; Statistically Directed Symbolic Parsing