The impact of the number of training samples and the number of test points on the accuracy of land use and land cover maps derived from satellite imagery using machine learning-based classification algorithms: A case study on Al-Ahsa Oasis in Saudi Arabia

Faqeih, Khadeijah yahya

doi:10.70216/2682-485X.1671

Journal of the Faculty of Arts, Cairo University
Article 4, Volume 84, Issue 7, October 2024
Author
Khadeijah yahya Faqeih
Associate Professor of Maps, Department of Geography and Environmental Sustainability, College of Humanities and Social Sciences, Princess Nourah bint Abdulrahman University
Abstract
This study investigated how the number of training samples and the number of check points affect the accuracy of land use and land cover (LULC) maps of the Al-Ahsa Oasis in Saudi Arabia, derived from Landsat 8 imagery using Support Vector Machines (SVM) and Random Forest (RF) classifiers. Training sample sizes ranged from 5 to 30 samples per class. Classification accuracy was evaluated using overall accuracy (OA) and the kappa coefficient, with the number of check points varying from 5 to 30 per class and compared against reference data from SAS Planet. The accuracy of the LULC maps was additionally assessed by comparing the water surface areas obtained with different training sample sizes against those calculated using the Normalized Difference Water Index (NDWI).
The results indicated that SVM outperformed RF in most scenarios, achieving high and stable accuracy even with a small number of training samples. With 5 samples, SVM showed kappa values from 0.93 to 1.00 and OA from 0.95 to 1.00. As the number of check points increased to 30, SVM maintained kappa values from 0.90 to 0.98 and OA from 0.92 to 0.99, reflecting its robustness. RF produced good results but exhibited greater variability in performance. With 5 training samples, RF's accuracy was lower, with kappa values from 0.80 to 0.90 and OA from 0.85 to 0.92. With 30 check points, RF's kappa values ranged from 0.78 to 0.95 and OA from 0.80 to 0.93, indicating less stability.
The analysis of water surface areas showed that RF performed significantly worse with fewer training samples but improved notably with more, with errors decreasing from 15% to 5% as the number of samples increased. In contrast, SVM performed consistently across all training sample sizes, with errors below 5% throughout. In conclusion, SVM was generally more accurate and stable than RF, making it the preferred classifier for LULC mapping in most cases.
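For readers unfamiliar with the metrics named in the abstract, the following is a minimal Python sketch of how overall accuracy, the kappa coefficient, and NDWI are commonly computed. It is not taken from the study. The class names and check-point labels are hypothetical, kappa is assumed to be Cohen's kappa computed from a confusion matrix, and NDWI is assumed to follow the common (Green - NIR) / (Green + NIR) formulation, which on Landsat 8 corresponds to bands 3 and 5; the abstract does not state which formulation was used.

```python
from typing import List, Sequence


def confusion_matrix(reference: Sequence[str],
                     predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Build a confusion matrix: rows are reference classes, columns are predicted."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of check points whose predicted class matches the reference."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix: List[List[int]]) -> float:
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement from the marginal totals of the matrix.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1 - expected)


def ndwi(green: float, nir: float) -> float:
    """NDWI for one pixel, assuming the (Green - NIR) / (Green + NIR) form."""
    denominator = green + nir
    return 0.0 if denominator == 0 else (green - nir) / denominator


# Hypothetical example: 5 check points per class, three illustrative classes.
classes = ["water", "vegetation", "urban"]
reference = ["water"] * 5 + ["vegetation"] * 5 + ["urban"] * 5
predicted = ["water"] * 5 + ["vegetation"] * 4 + ["urban"] + ["urban"] * 5

matrix = confusion_matrix(reference, predicted, classes)
oa = overall_accuracy(matrix)      # 14 of 15 check points agree
kappa = kappa_coefficient(matrix)
```

With so few check points per class, a single misclassified point moves both metrics noticeably, which is one reason the number of check points matters when comparing classifiers.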
Keywords
Land use and land cover (LULC); Support Vector Machines (SVM); Random Forest (RF); Training sample size; Check points; Overall accuracy; Kappa Coefficient
