A Comparative Study for Different Resampling Techniques for Imbalanced datasets
IJCI. International Journal of Computers and Information
Article 21, Volume 10, Issue 3, November 2023, Pages 147-156
Document Type: Original Article
DOI: 10.21608/ijci.2023.236287.1136
Authors
Alaa Mahmoud Elsobky
Menoufia
Abstract
Imbalanced data is a significant challenge for researchers in supervised machine learning. Current data mining algorithms are not effective at processing imbalanced data: the imbalance reduces classification accuracy because predictions for minority classes are inaccurate. Classification of imbalanced data has therefore received significant attention, and the use of sampling techniques to improve classification performance has been a major consideration in related work. In this paper, a comparative study of six sampling algorithms is performed. The algorithms cover three families of sampling techniques: two oversampling algorithms (random oversampling and SMOTE), two undersampling algorithms (random undersampling and NearMiss), and two algorithms that combine oversampling and undersampling (SMOTE Tomek and SMOTEEN). The study examines the impact of the employed sampling algorithms on the performance of three classifiers: SVM, KNN, and logistic regression. Cross-validation experiments on 12 standard datasets show that the SMOTEEN sampling algorithm achieves significant improvements compared with the other algorithms.
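To make the three simplest techniques named in the abstract concrete, the following is a minimal sketch in plain Python of random oversampling, random undersampling, and SMOTE-style interpolation. It is not the paper's implementation: the function names (`random_oversample`, `random_undersample`, `smote_like`), the toy data, and the choice of `k` are all hypothetical, and the cleaning steps used by NearMiss, SMOTE Tomek, and SMOTEEN are not shown.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k, rng):
    """SMOTE-style synthesis (illustrative): place each new point on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority]
    pool = [x for x, lab in zip(X, y) if lab == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.randrange(len(pool))
        base = pool[i]
        others = [p for j, p in enumerate(pool) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: class 1 is the minority (3 samples vs 6).
rng = random.Random(0)
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (0.8, 1.0)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
X_smote, y_smote = smote_like(X, y, minority=1, k=2, rng=rng)

print(sorted(Counter(y_over).items()))   # [(0, 6), (1, 6)]
print(sorted(Counter(y_under).items()))  # [(0, 3), (1, 3)]
print(sorted(Counter(y_smote).items()))  # [(0, 6), (1, 6)]
```

In practice, a library such as imbalanced-learn provides these samplers (for example `RandomOverSampler`, `SMOTE`, `RandomUnderSampler`, `NearMiss`, `SMOTETomek`, and `SMOTEENN`, the last being the library's spelling of the combination the abstract calls SMOTEEN). A common precaution in cross-validation, which the abstract does not discuss, is to resample only the training folds so that the evaluation folds keep the original class distribution.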
Keywords
Imbalanced data; resampling techniques; SMOTE; SMOTEEN; SMOTE Tomek