Integration of Deep Learning Models for Enhanced Classification of Viral DNA Sequences Across Specific Viruses and Viral Families
International Journal of Intelligent Computing and Information Sciences
Volume 24, Issue 1, March 2024, Pages 89-104
Document Type: Original Article
DOI: 10.21608/ijicis.2024.279692.1332
Authors
Ahmed Hesham El-Tohamy 1; Huda Amin 2; Nagwa Badr 3
1 Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University
2 Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University
3 Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 11566, Egypt
Abstract
The field of genomic bioinformatics is continually challenged by the need for precise classification of viral DNA sequences. Accurate classification of viral sequences is crucial for developing diagnostic and therapeutic strategies for viral outbreaks. This study presents a comprehensive approach that integrates two distinct deep learning models, a Genetic Algorithm (GA) optimized Convolutional Neural Network (CNN) hybrid model and a CNN-Extreme Learning Machine (ELM) model, to enhance the classification of viral DNA sequences across specific viruses and viral families. Both datasets undergo the same preprocessing: k-mer, label, and one-hot vector encoding. This allows a uniform, comparative analysis across models and datasets. When the optimized GA-CNN is applied to the more generic viral family dataset, it demonstrates good adaptability, reaching an accuracy of 95.88% and outperforming the CNN-ELM. In contrast, the CNN-ELM, when tested on the specific virus dataset, maintains robust feature extraction capabilities and trains faster, but its accuracy of 92.7% is lower than that of the optimized GA-CNN model. The study also compares training times: the CNN-ELM model trains 34% faster than the GA-CNN. Moreover, both models are applied to the new generic dataset and compared with other deep learning models. The GA-CNN outperforms these models, achieving the highest classification accuracy of 95.88%.
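The preprocessing named in the abstract (k-mer, label, and one-hot vector encoding) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the k-mer length `k=3`, the stride of 1, the four-letter alphabet, the handling of ambiguous bases, and all function names are assumptions made for illustration.

```python
from itertools import product


def kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Label encoding table: map every possible k-mer to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def label_encode(seq, vocab, k=3):
    """Replace each k-mer with its integer label.

    k-mers containing symbols outside the alphabet (e.g. N) are skipped;
    this is one possible convention, assumed here.
    """
    return [vocab[km] for km in kmers(seq, k) if km in vocab]


def one_hot(labels, size):
    """Turn each integer label into a one-hot vector of length `size`."""
    return [[1 if j == lab else 0 for j in range(size)] for lab in labels]


vocab = build_vocab(k=3)                      # 4**3 = 64 possible 3-mers
labels = label_encode("ATGCGA", vocab, k=3)   # one label per k-mer
vectors = one_hot(labels, len(vocab))         # one 64-dimensional vector per k-mer
```

For the toy sequence `ATGCGA`, this yields four 3-mers (`ATG`, `TGC`, `GCG`, `CGA`), four integer labels, and four one-hot vectors of length 64, a fixed-width representation of the kind a CNN can take as input.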
Keywords
Genomic Bioinformatics; Viral DNA Classification; GA-CNN; CNN-ELM; Extreme Learning Machines