ChatGPT's Potential in Navigating the Complexity of the Polish Anaesthesiology Specialist Examination
Ain-Shams Journal of Anesthesiology
Volume 17, Issue 1, January 2025, Pages 1-4
Document Type: Original Article
DOI: 10.21608/asja.2024.290716.1111
Authors
Michał Bielowka
1 Student Scientific Association of Computer Analysis and Artificial Intelligence at the Department of Radiology and Nuclear Medicine of the Medical University of Silesia in Katowice
2 Department of Radiodiagnostics, Interventional Radiology and Nuclear Medicine
3 Dr B. Hager Memorial Multi-specialty District Hospital, Pyskowicka 47-51, 42-600 Tarnowskie Góry, Poland
4 Faculty of Medical Sciences in Katowice, Medical University of Silesia, 40-752 Katowice, Poland
Abstract
Purpose: This study assesses the capability of an artificial intelligence (AI) model, specifically ChatGPT-3.5, to answer questions from the test section of the Polish National Specialist Examination (PES) in anaesthesiology and intensive care.
Materials and Methods: A pool of 118 questions from the spring 2023 PES exam was used. Bloom's classification was employed to categorize questions by the cognitive skill required: comprehension, critical thinking, or memory. The questions were then presented to ChatGPT-3.5 in five independent sessions, and statistical analyses were conducted to assess correlations between the model's confidence, question difficulty, and the correctness of its answers.
Results: ChatGPT-3.5 achieved an overall accuracy of 47.5%, with variations across question types and subtypes. A significant correlation was found between the model's confidence and the correctness of its answers. However, no correlation was observed between the certainty index and either question difficulty or answer correctness when broken down by category or subcategory.
Conclusions: ChatGPT-3.5 exhibited moderate performance but fell short of the 60% threshold required to pass the PES exam. Comparison with similar AI studies conducted on Japanese examinations suggests stronger performance on the Polish exam, albeit still below expert level. Human candidates consistently outperformed the AI model, indicating the current superiority of human expertise in this domain. Despite these limitations, continued research and collaboration offer promising prospects for AI integration in medical practice, supporting diagnostics, therapeutics, and patient care.
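To make the evaluation protocol concrete, the sketch below shows one plausible way to run the pipeline the abstract describes: each question is submitted in a fresh session, repeated five times, answers are scored against the key, and a point-biserial correlation relates self-reported confidence to correctness. This is a minimal illustration, not the authors' code; the use of the OpenAI chat API, the prompt format, the 1-5 confidence scale standing in for the paper's "certainty index", and all helper names are assumptions.

```python
"""Illustrative sketch of the evaluation described in the abstract.
Assumptions (not from the paper): questions go through the OpenAI chat
API, and the certainty index is a self-reported 1-5 confidence score."""
import re
from openai import OpenAI
from scipy.stats import pointbiserialr

client = OpenAI()  # reads OPENAI_API_KEY from the environment
N_SESSIONS = 5     # the study used five independent sessions

PROMPT = (
    "Answer the multiple-choice question with a single letter (A-E), "
    "then state your confidence from 1 (guess) to 5 (certain), "
    "e.g. 'B 4'.\n\n{q}"
)

def ask(question: str) -> tuple[str, int]:
    """Send one question in a fresh chat; parse 'LETTER CONFIDENCE'."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT.format(q=question)}],
    )
    text = resp.choices[0].message.content.strip()
    m = re.search(r"\b([A-E])\b\D*([1-5])", text)
    if m is None:
        return "?", 1  # unparseable reply counts as a low-confidence miss
    return m.group(1), int(m.group(2))

def evaluate(questions: list[dict]) -> None:
    """questions: [{'text': ..., 'key': 'C'}, ...] (118 items in the study)."""
    correct, confidence = [], []
    for _ in range(N_SESSIONS):       # each pass is an independent session
        for q in questions:
            answer, conf = ask(q["text"])
            correct.append(int(answer == q["key"]))
            confidence.append(conf)
    accuracy = sum(correct) / len(correct)
    r, p = pointbiserialr(correct, confidence)  # confidence vs. correctness
    print(f"accuracy={accuracy:.1%}  point-biserial r={r:.2f} (p={p:.3f})")
```

A point-biserial coefficient is used here because correctness is binary while confidence is (quasi-)continuous; the paper does not state which correlation statistic was applied.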
Keywords
Anaesthesiology; artificial intelligence; ChatGPT; intensive care; medical education; specialty examinations