Evaluating Student Performance Prediction Using Machine Learning Models | ||||
Port-Said Engineering Research Journal | ||||
Articles in Press, Accepted Manuscript, Available Online from 17 June 2025 | ||||
Document Type: Review Article | ||||
DOI: 10.21608/pserj.2025.387085.1412 | ||||
Authors | ||||
Sara Mohamed Abohashish | ||||
1 Department of Information Technology Management, Faculty of Management Technology and Information Systems, Port Said University | ||||
2 Department of Information Technology Management, Faculty of Management Technology and Information Systems, Port Said University, Port Said | ||||
Abstract | ||||
Machine learning plays a crucial role in addressing challenges in data science, and one widely used application is the prediction of outcomes from large educational datasets. This study examines a dataset of 4,424 students with 20 features. Several regression models, including Linear Regression (LR), XGBoost, Support Vector Regression (SVR), Random Forest (RF), and a Stacking Regressor, were developed and compared to predict students’ GPA on a 0–4 scale. In addition, classification models such as LR, RF, XGBoost, and Support Vector Machine (SVM) were implemented to categorize students into Dropout, Enrolled, or Graduate groups. Model performance was assessed using evaluation metrics such as accuracy, specificity, precision, recall, and F1 score. Furthermore, clustering was performed after dimensionality reduction, using Principal Component Analysis (PCA) on the numerical features and Uniform Manifold Approximation and Projection (UMAP) on the high-dimensional categorical data. Guided by the Silhouette Score and the Davies-Bouldin (DB) Index, students were segmented into three clusters, yielding a silhouette score of 0.35. These clusters are analyzed through visualizations of exam score distributions and feature averages. The proposed system demonstrates strong predictive capabilities, with the best-performing model achieving minimal Mean Squared Error (MSE) and high accuracy. | ||||
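For readers unfamiliar with the silhouette score reported in the abstract, the following is a minimal sketch of how such a score can be computed for a given cluster assignment. It is not the authors' code: the function name `silhouette_score`, the Euclidean distance, and the four toy points are assumptions made purely for illustration, and the study's actual clustering algorithm and data are not reproduced here. The score ranges from -1 to 1, with higher values indicating tighter, better-separated clusters.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Illustrative helper only; names and toy data are hypothetical.
    """
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (dist(p, p) == 0, so summing over the whole cluster is safe).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data (hypothetical): two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])  # sensible grouping
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])   # mixed-up grouping
```

In this toy setup the sensible grouping scores close to 1 and the mixed-up grouping scores below 0, which is the kind of comparison used when selecting a number of clusters.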
Keywords | ||||
Student performance; Classification; Regression; Cluster; Machine Learning | ||||