Effects of auxiliary and ancillary data on LULC classification in a heterogeneous environment using optimized random forest algorithm

dc.contributor.author Kavzoglu, Taskin
dc.contributor.author Bilucan, Furkan
dc.date.accessioned 2025-10-29T11:30:53Z
dc.date.issued 2023
dc.department Faculties, Faculty of Engineering, Department of Geomatics Engineering
dc.description.abstract Land use and land cover (LULC) maps, which provide crucial information for monitoring the Earth's surface, are among the most essential products for numerous studies. Using only spectral information in the classification process can lead to poor performance in areas with heterogeneous landscape characteristics. To overcome this problem, auxiliary and ancillary data are commonly employed to improve classification accuracy. The objective of this study is to integrate auxiliary data (topographic and climatic features) and ancillary data (spectral indices and texture measures) with the spectral bands of Sentinel-2A imagery, and to evaluate the performance of advanced feature selection methods. To this end, three feature selection approaches were used to identify the most informative features for classification from a high-dimensional dataset of 102 features: genetic algorithm-based random forest (GA-RF), HSIC-Lasso, and Relief-F. The GA-RF algorithm selected 65 features, HSIC-Lasso selected 38, and Relief-F selected 51 as optimal subsets. These feature subsets, together with the full dataset, were then classified using the random forest (RF) classifier, whose parameters were optimized with a random search algorithm. The highest overall accuracy among the produced thematic maps was estimated at 91.05%, obtained with the subset selected by HSIC-Lasso. HSIC-Lasso was also the fastest algorithm (5.71 s). McNemar's test of statistical significance confirmed the superiority of the HSIC-Lasso method over the GA-RF and Relief-F algorithms. The SHapley Additive exPlanations (SHAP) method was also applied to analyze the relative importance of each feature to the model output.
dc.identifier.doi 10.1007/s12145-022-00874-9
dc.identifier.endpage 435
dc.identifier.issn 1865-0473
dc.identifier.issn 1865-0481
dc.identifier.issue 1
dc.identifier.orcid 0000-0001-7920-6914
dc.identifier.orcid 0000-0002-9779-3443
dc.identifier.scopus 2-s2.0-85141453473
dc.identifier.scopusquality Q2
dc.identifier.startpage 415
dc.identifier.uri https://doi.org/10.1007/s12145-022-00874-9
dc.identifier.uri https://hdl.handle.net/20.500.14854/11773
dc.identifier.volume 16
dc.identifier.wos WOS:000879682400001
dc.identifier.wosquality Q2
dc.indekslendigikaynak Web of Science
dc.indekslendigikaynak Scopus
dc.language.iso en
dc.publisher Springer Heidelberg
dc.relation.ispartof Earth Science Informatics
dc.relation.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights info:eu-repo/semantics/closedAccess
dc.snmz KA_WOS_20251020
dc.subject Auxiliary data
dc.subject Ancillary data
dc.subject Genetic algorithm
dc.subject HSIC-Lasso
dc.subject Relief-F
dc.subject Feature selection
dc.title Effects of auxiliary and ancillary data on LULC classification in a heterogeneous environment using optimized random forest algorithm
dc.type Article
