A Survey on Visual Question Answering Methodologies
The Egyptian Journal of Language Engineering
Article 4, Volume 11, Issue 1, April 2024, Pages 57-65
Document Type: Original Article
DOI: 10.21608/ejle.2024.244720.1058
Authors
Aya M. Al-Zoghby1; Aya Salah Saleh
1Department of Computer Science, Faculty of Computers and Information Science, Damietta University, Damietta, Egypt
2Computer Science, Computer and Artificial Intelligence, Damietta University, New Damietta, Damietta
3Computer Science Department, Faculty of Computer and Artificial Intelligence, Damietta University
Abstract
Understanding visual question answering (VQA) is essential for many human tasks, yet, as a multimodal problem, it poses significant obstacles at the core of artificial intelligence. This article summarizes the challenges in multimodal architectures that the recent surge in research has exposed; keeping these challenges in view is necessary to improve the design of VQA systems. We then review the rapid recent developments in methods for answering questions about images. Providing the correct response to a natural language question concerning an input image is a difficult multimodal task: a system must not only extract features from both modalities (text and image) but also attend to the relations between them. VQA attracts many deep learning researchers because of its contributions to text, speech, and vision technologies (images and videos) in fields such as welfare, robotics, security, and medicine.
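The two steps the abstract names, extracting features from each modality and attending to the relations between them, can be illustrated with a minimal sketch. This is not the method of any particular surveyed system: the feature dimensions, the dot-product attention scoring, and the elementwise fusion are all illustrative assumptions standing in for the learned encoders and fusion modules a real VQA model would use.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_and_fuse(image_regions, question_vec):
    """Question-guided attention over image regions, then a simple fusion.

    image_regions: (num_regions, d) array of region features (assumed
                   to come from a visual encoder).
    question_vec:  (d,) question embedding (assumed to come from a
                   text encoder).
    """
    scores = image_regions @ question_vec   # relevance of each region to the question
    weights = softmax(scores)               # attention distribution over regions
    attended = weights @ image_regions      # question-conditioned image summary, shape (d,)
    return attended * question_vec          # elementwise (Hadamard) fusion of the two modalities

# Illustrative stand-ins for real encoder outputs (36 regions, 512-d features).
rng = np.random.default_rng(0)
regions = rng.standard_normal((36, 512))
question = rng.standard_normal(512)
fused = attend_and_fuse(regions, question)
print(fused.shape)  # (512,)
```

In a full system the fused vector would feed an answer classifier; the sketch stops at fusion because that is the step the abstract highlights as the distinctive difficulty of VQA.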
Keywords
Deep Learning; Visual question answering; Multimodal challenges; VQA methodologies