Evaluating AI Performance in Academic Settings: A Comparative Study of ChatGPT-4 and Gemini
Journal of Artificial Intelligence in Engineering Practice
Volume 2, Issue 1, April 2025, Pages 17-30
Document Type: Original Article
DOI: 10.21608/jaiep.2025.395670.1017
Authors
Asmaa Saeed Embark
Lecturer, Al-Gazeera High Institute for Computer and Information Systems, Cairo, Egypt
Abstract
This study conducts a systematic comparison of ChatGPT-4 and Gemini in addressing academic queries across four disciplines: Python programming, financial accounting, business administration, and medical sciences. Through a mixed-methods analysis of 40 standardized questions (balanced between numerical and narrative formats), we evaluate the models' accuracy, reasoning capabilities, and limitations. Results reveal ChatGPT-4's superior performance with 82.5% overall accuracy (85% numerical, 80% narrative) versus Gemini's 68.8% (72.5% numerical, 65% narrative). While both models demonstrate competence in straightforward queries, ChatGPT-4 exhibits significantly better contextual interpretation and explanatory depth for complex narrative questions. Gemini, though faster in response generation, shows higher susceptibility to errors in technical domains. Notably, both systems face challenges in handling implicit assumptions, particularly in advanced accounting problems, where error rates reach 15-20%. These findings underscore ChatGPT-4's current advantage as an educational support tool while emphasizing the necessity of human oversight for quality control. The study contributes practical evaluation metrics and implementation guidelines for academic institutions adopting AI technologies.
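For readers checking the headline figures, the overall accuracies follow directly from the per-format results once the 40 questions are split evenly between 20 numerical and 20 narrative items, as stated in the abstract. The short Python sketch below illustrates this aggregation; the function name and equal-weighting assumption are ours for illustration, not the authors' published code.

```python
# Minimal sketch (not the authors' code): overall accuracy as a
# count-weighted average over the two question formats, assuming
# the 40-question set is split evenly (20 numerical, 20 narrative).

def overall_accuracy(numerical_acc: float, narrative_acc: float,
                     n_numerical: int = 20, n_narrative: int = 20) -> float:
    """Return overall accuracy given per-format accuracies and question counts."""
    correct = numerical_acc * n_numerical + narrative_acc * n_narrative
    return correct / (n_numerical + n_narrative)

# ChatGPT-4: 85% numerical, 80% narrative -> 0.825 (82.5% overall)
print(overall_accuracy(0.85, 0.80))
# Gemini: 72.5% numerical, 65% narrative -> 0.6875 (reported as 68.8%)
print(overall_accuracy(0.725, 0.65))
```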
Keywords
Educational AI; large language models; comparative analysis