Logistic Regression Hyperparameter Optimization for Cancer Classification

Ahmed Arafa, Ahmed Hamdy; Radad, Marwa; Badawy, Mohammed M; El-Fishawy, Nawal

doi:10.21608/mjeer.2021.70512.1034

Menoufia Journal of Electronic Engineering Research
Article 1, Volume 31, Issue 1, January 2022, Pages 1-8
Document Type: Original Article
Authors
Ahmed Hamdy Ahmed Arafa¹; Marwa Radad²; Mohammed M Badawy³; Nawal El-Fishawy⁴
¹Computer Science & Engineering Dept., Faculty of Electronic Engineering, Menoufia, Egypt.
²Computer Science & Engineering Dept., Faculty of Electronic Engineering, Menoufia, Egypt.
³Computer Science and Engineering Dept., Faculty of Electronic Engineering, Menoufia University
⁴Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Egypt
Abstract
In machine learning, hyperparameter optimization aims to find the hyperparameter values that yield an optimal model with minimum prediction error. It is the most important step, as it directly affects the performance of the learned model. Many techniques have been proposed to optimize hyperparameters for different predictive models. In this paper, the performance of grid search, random search, Bayesian Tree Parzen Estimator (TPE), and Simulated Annealing (SA) is evaluated to determine the best hyperparameters for a logistic regression model used in cancer classification. The Wisconsin Breast Cancer Dataset (WBCD) has been used to evaluate these optimization techniques. The results show that Bayesian TPE outperformed the other techniques in terms of number of iterations and running time. TPE needed 75.75% fewer iterations than SA and 77.1% fewer than random search to reach the optimal hyperparameters. Its running time was also lower than that of SA, random search, and grid search by 79.9%, 86.1%, and 99.9%, respectively. The resulting optimal hyperparameter values were used to train a logistic regression model for cancer classification on the WBCD. Evaluated with 10-fold cross-validation, the optimized model achieved a test accuracy of 98.2%, a kappa statistic of 0.962, and a Matthews correlation coefficient (MCC) of 0.963.
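The kappa statistic and MCC reported above can both be derived from a binary confusion matrix. The following is a minimal Python sketch of the standard formulas, not the authors' evaluation code. The function name `kappa_and_mcc` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def kappa_and_mcc(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa, MCC) for a binary confusion matrix.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the predicted and actual marginals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    # Convention chosen for this sketch: report 0.0 when kappa is undefined.
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected < 1 else 0.0

    # MCC is the correlation between predicted and actual labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention chosen for this sketch: report 0.0 when a marginal is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return accuracy, kappa, mcc


# Hypothetical counts for illustration only (not results from the paper).
accuracy, kappa, mcc = kappa_and_mcc(tp=50, fp=2, fn=3, tn=45)
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f} mcc={mcc:.3f}")
```

Both metrics correct raw accuracy for class imbalance, which is one likely reason they are reported alongside it; a classifier that always predicts the majority class scores 0 on both.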
Keywords
Hyperparameter Optimization; Random Search; Grid Search; Tree Parzen Estimator; Simulated Annealing

