A Hybrid Approach for Automatic Morphological Diacritization of Arabic Text
Mansoura Journal for Computer and Information Sciences
Volume 14, Issue 2, December 2018, Pages 39-46
Document Type: Original Research Articles
DOI: 10.21608/mjcis.2018.312008
Authors
Hatem M Noaman1; Shahenda S. Sarhan1; M. A. A. Rashwan2
1 Computer Science Department, Mansoura University, Egypt
2 Electronics and Communications Department, Cairo University, Egypt
Abstract
Modern Arabic texts are commonly written without diacritics. Restoring them (diacritization) is a critical step for other Arabic processing tasks such as word sense disambiguation, automatic speech recognition, and text-to-speech, where word meaning or pronunciation is determined by the diacritic signs assigned to each letter. This paper presents a novel approach to automatic Arabic text diacritization using deep encode-decode recurrent neural networks, followed by several text correction techniques that improve the overall accuracy of the system output. Experimental results on the Wikinews test set show that the proposed system is competitive with state-of-the-art diacritization methods. Specifically, our method achieves a morphological diacritization Word Error Rate (WER) of 3.85% and a Diacritic Error Rate (DER) of 1.12%.
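To make the two reported metrics concrete, here is a minimal Python sketch of how WER and DER are commonly computed for diacritization. The abstract does not describe the evaluation protocol, so everything below is an assumption rather than the authors' procedure: the function names `split_diacritics` and `error_rates` are hypothetical, DER is taken as the fraction of letters whose diacritics differ from the reference, WER as the fraction of words with at least one such letter, and "morphological" scoring is approximated by ignoring the last letter of each word (the case ending).

```python
# Arabic diacritic marks U+064B..U+0652 (tanween, short vowels, shadda, sukun).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Split a word into (letter, marks) pairs; marks attach to the preceding letter."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # a mark with no preceding letter is ignored
                letter, marks = pairs[-1]
                pairs[-1] = (letter, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, hypothesis, ignore_last_letter=True):
    """Return (WER, DER) as fractions for two diacritized strings.

    Assumption: both strings contain the same words and letters and differ
    only in their diacritics. With ignore_last_letter=True the final letter
    of every word is skipped, approximating morphological-only scoring.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same word count")

    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_diacritics(ref_word)
        hyp_pairs = split_diacritics(hyp_word)
        if [l for l, _ in ref_pairs] != [l for l, _ in hyp_pairs]:
            raise ValueError("words differ in their base letters")
        if ignore_last_letter:
            ref_pairs, hyp_pairs = ref_pairs[:-1], hyp_pairs[:-1]
        # Sort marks so that shadda+vowel and vowel+shadda compare equal.
        wrong = sum(
            sorted(r_marks) != sorted(h_marks)
            for (_, r_marks), (_, h_marks) in zip(ref_pairs, hyp_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        if wrong:
            word_errors += 1

    der = letter_errors / letters if letters else 0.0
    wer = word_errors / len(ref_words) if ref_words else 0.0
    return wer, der
```

Calling `error_rates(reference, hypothesis)` returns both rates as fractions; multiply by 100 to compare with percentages such as those quoted above. Published evaluations differ in details such as whether undiacritized letters, digits, or punctuation count toward the denominator, so results from this sketch are not directly comparable to the reported figures.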
Keywords
Arabic Natural Language Processing; Automatic Morphological Diacritization; deep encode-decode recurrent neural networks