Evaluating AI Performance in Academic Settings: A Comparative Study of ChatGPT-4 and Gemini
Journal of Artificial Intelligence in Engineering Practice
Volume 2, Issue 1, April 2025, Pages 17-30
Document Type: Original Article
DOI: 10.21608/jaiep.2025.395670.1017
Authors
Asmaa Saeed Embark
Lecturer, Al-Gazeera High Institute for Computer and Information Systems, Cairo, Egypt
Abstract
This study conducts a systematic comparison of ChatGPT-4 and Gemini in addressing academic queries across four disciplines: Python programming, financial accounting, business administration, and medical sciences. Through a mixed-methods analysis of 40 standardized questions (balanced between numerical and narrative formats), we evaluate the models' accuracy, reasoning capabilities, and limitations. Results reveal ChatGPT-4's superior performance with 82.5% overall accuracy (85% numerical, 80% narrative) versus Gemini's 68.8% (72.5% numerical, 65% narrative). While both models demonstrate competence in straightforward queries, ChatGPT-4 exhibits significantly better contextual interpretation and explanatory depth for complex narrative questions. Gemini, though faster in response generation, shows higher susceptibility to errors in technical domains. Notably, both systems face challenges in handling implicit assumptions, particularly in advanced accounting problems, where error rates reach 15-20%. These findings underscore ChatGPT-4's current advantage as an educational support tool while emphasizing the necessity of human oversight for quality control. The study contributes practical evaluation metrics and implementation guidelines for academic institutions adopting AI technologies.
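For readers checking the headline figures, the overall accuracies follow directly from the per-format results once the 40 questions are split evenly between 20 numerical and 20 narrative items, as stated in the abstract. The short Python sketch below illustrates this aggregation; the function name and equal-weighting assumption are ours for illustration, not the authors' published code.

```python
# Minimal sketch (not the authors' code): overall accuracy as a
# count-weighted average over the two question formats, assuming
# the 40-question set is split evenly (20 numerical, 20 narrative).

def overall_accuracy(numerical_acc: float, narrative_acc: float,
                     n_numerical: int = 20, n_narrative: int = 20) -> float:
    """Return overall accuracy given per-format accuracies and question counts."""
    correct = numerical_acc * n_numerical + narrative_acc * n_narrative
    return correct / (n_numerical + n_narrative)

# ChatGPT-4: 85% numerical, 80% narrative -> 0.825 (82.5% overall)
print(overall_accuracy(0.85, 0.80))
# Gemini: 72.5% numerical, 65% narrative -> 0.6875 (reported as 68.8%)
print(overall_accuracy(0.725, 0.65))
```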
Keywords
Educational AI; large language models; comparative analysis