Ara-RATGAN for Arabic Text to Image Synthesis
International Journal of Intelligent Computing and Information Sciences
Volume 25, Issue 2, June 2025, Pages 63-73
Document Type: Original Article
DOI: 10.21608/ijicis.2025.392458.1400
Authors
Mostafa Samy Samy
1: 146 Madint Naser, Madint Al_Tawfeq, Cairo, Egypt
2: Ain Shams University
3: Prof., Scientific Computing Department, Faculty of Computers and Information Sciences, Ain Shams University, Cairo, Egypt
Abstract
Current text-to-image systems have shown outstanding performance in the automated synthesis of realistic images from text descriptions. Previous approaches typically employ multiple separate fusion blocks to adaptively fuse text information into the generation process, which increases training difficulty and causes the blocks to conflict with one another. To address these concerns, we present Arabic Recurrent Affine Transformation (Ara-RATGAN), a novel framework that integrates AraBERT, a BERT model pretrained on billions of Arabic words that produces robust Arabic sentence embeddings, with Recurrent Affine Transformation (RAT) to generate high-quality images from Arabic-language text descriptions. Furthermore, a spatial attention model in the discriminator promotes semantic coherence between the text and the synthesized images: it identifies the image regions corresponding to the text and directs the generator to produce visual content that better matches the Arabic descriptions. We conducted extensive experiments on an Arabic CUB dataset translated from English, which show that our proposed model outperforms previous Arabic text-to-image models. Our approach addresses two key challenges: (1) Text-Image Fusion: unlike traditional methods that use isolated fusion blocks, we employ RAT to model long-term dependencies across layers, ensuring global consistency in text conditioning. (2) Semantic Alignment: a spatial attention mechanism in the discriminator enhances the semantic coherence between the synthesized visuals and the Arabic text.
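To give a rough intuition for the Recurrent Affine Transformation mentioned in the abstract, the sketch below applies a FiLM-style channel-wise affine modulation whose scale and shift come from a recurrent hidden state carried across generator stages, so text conditioning stays globally consistent instead of being computed by isolated fusion blocks. All dimensions, the tanh recurrent cell, and the parameter names here are illustrative assumptions for exposition, not the paper's actual architecture (which uses learned RNN-driven affine layers inside a GAN generator conditioned on AraBERT embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's hyperparameters)
TEXT_DIM = 8      # sentence-embedding size (AraBERT output in the paper)
HIDDEN_DIM = 8    # recurrent hidden state shared across generator stages
CHANNELS = 4      # feature-map channels at one generator stage

# Hypothetical weights of the recurrent cell and the affine heads
W_h = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))
W_x = rng.normal(scale=0.1, size=(HIDDEN_DIM, TEXT_DIM))
W_gamma = rng.normal(scale=0.1, size=(CHANNELS, HIDDEN_DIM))
W_beta = rng.normal(scale=0.1, size=(CHANNELS, HIDDEN_DIM))

def rat_step(h, text_emb, feat):
    """One recurrent-affine step.

    The hidden state h carries text-conditioning information across
    generator stages; gamma/beta scale and shift the feature map
    channel-wise (a FiLM-style affine transformation).
    """
    h = np.tanh(W_h @ h + W_x @ text_emb)      # recurrent state update
    gamma = W_gamma @ h                         # per-channel scale
    beta = W_beta @ h                           # per-channel shift
    # Broadcast the affine parameters over the spatial dims of (C, H, W)
    out = (1.0 + gamma)[:, None, None] * feat + beta[:, None, None]
    return h, out

# Run three generator stages sharing one hidden state
text_emb = rng.normal(size=TEXT_DIM)   # stands in for an AraBERT embedding
h = np.zeros(HIDDEN_DIM)
feat = rng.normal(size=(CHANNELS, 5, 5))
for _ in range(3):
    h, feat = rat_step(h, text_emb, feat)

print(feat.shape)  # (4, 5, 5)
```

Because the same hidden state feeds every stage's affine parameters, conditioning at later stages depends on earlier stages, which is the long-term dependency the abstract contrasts with independent per-stage fusion blocks.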
Keywords
AraBERT; Generative Adversarial Networks (GANs); Text-to-Image; Feature Fusion