Evaluating Feature Selection Methods for Machine Learning Models in Cybersecurity
The Egyptian International Journal of Engineering Sciences and Technology
Articles in Press, Accepted Manuscript, Available Online from 02 February 2025
Document Type: Original Article
DOI: 10.21608/eijest.2025.350769.1318
Authors
Anas N Moursi
1 Cyber Security, Faculty of Computer Studies, Arab Open University, El-Sorouk, Egypt
2 Computer and Systems Engineering Department, Faculty of Engineering, Zagazig University, Zagazig, Egypt
Abstract
Cyber-attack incidents are increasing daily with the adoption of modern communication technologies, cloud services, and the Internet of Things. Providing high-accuracy, real-time protection for networks against vulnerabilities is therefore of paramount importance. In machine learning, feature selection is one of the crucial factors that influence how well models detect and prevent these threats. This paper evaluates two feature selection methodologies: (1) traditional statistical approaches, such as Mutual Information (MI) and correlation-based selection; and (2) automated feature selection using embedded methods, such as LightGBM. The evaluation is performed on six established cybersecurity datasets: CIC-DDoS2019, ISCX-IDS2012, UNSW-NB15, CIC-IDS2017, NSL-KDD, and CSE-CIC-IDS2018. The datasets are used to train and test various models, and each feature selection methodology is applied to obtain the optimal combination of features. Subsequently, multiple metrics, including time cost, are compared across the models. The findings show that model performance varies widely, regardless of the dataset or the feature selection methodology. Time cost decreased significantly for models that used the LightGBM feature selection method, and some models also improved their metrics when using LightGBM. This makes LightGBM a promising choice in cybersecurity applications.
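As a rough illustration of the statistical (filter) approach named in the abstract, the sketch below ranks discrete features by their estimated mutual information with the class label and keeps the top k. It is not the authors' implementation: the function names, the toy feature names (`flag_syn`, `proto`, `noise`), and the data are all hypothetical, and the embedded LightGBM approach is not shown.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Estimate I(X; Y) in nats for two equal-length sequences of discrete values."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of each (x, y) pair
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def top_k_features(columns, labels, k):
    """Score every column against the labels and return the k best names."""
    scores = {name: mutual_information(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 8 flows, label 1 = attack, 0 = benign.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "flag_syn": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the label
    "proto":    [0, 0, 1, 1, 0, 1, 1, 0],  # agrees with the label on 6 of 8 flows
    "noise":    [0, 1, 0, 1, 0, 0, 1, 1],  # statistically independent of the label
}

selected, scores = top_k_features(columns, labels, k=2)
print(selected)
```

On this toy data the perfectly informative column scores highest and the independent column scores zero. Real intrusion-detection features are mostly continuous, so a practical pipeline would need binning or a continuous MI estimator, which this sketch does not cover.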
Keywords
Cybersecurity; Feature Selection; Machine Learning; LightGBM