Feature Extraction and FeatureSelection for Object-based LandCover Classification : Optimisation of Support Vector Machines in aCloud Computing Environment

Detta är en Master-uppsats från KTH/Geoinformatik

Sammanfattning: Mapping the Earth’s surface and its rapid changes with remotely sensed data is a crucial tool to un-derstand the impact of an increasingly urban world population on the environment. However, the impressive amount of freely available Copernicus data is only marginally exploited in common clas-sifications. One of the reasons is that measuring the properties of training samples, the so-called ‘fea-tures’, is costly and tedious. Furthermore, handling large feature sets is not easy in most image clas-sification software. This often leads to the manual choice of few, allegedly promising features. In this Master’s thesis degree project, I use the computational power of Google Earth Engine and Google Cloud Platform to generate an oversized feature set in which I explore feature importance and analyse the influence of dimensionality reduction methods. I use Support Vector Machines (SVMs) for object-based classification of satellite images - a commonly used method. A large feature set is evaluated to find the most relevant features to discriminate the classes and thereby contribute most to high clas-sification accuracy. In doing so, one can bypass the sensitive knowledge-based but sometimes arbi-trary selection of input features.Two kinds of dimensionality reduction methods are investigated. The feature extraction methods, Linear Discriminant Analysis (LDA) and Independent Component Analysis (ICA), which transform the original feature space into a projected space of lower dimensionality. And the filter-based feature selection methods, chi-squared test, mutual information and Fisher-criterion, which rank and filter the features according to a chosen statistic. I compare these methods against the default SVM in terms of classification accuracy and computational performance. The classification accuracy is measured in overall accuracy, prediction stability, inter-rater agreement and the sensitivity to training set sizes. The computational performance is measured in the decrease in training and prediction times and the compression factor of the input data. I conclude on the best performing classifier with the most effec-tive feature set based on this analysis.In a case study of mapping urban land cover in Stockholm, Sweden, based on multitemporal stacks of Sentinel-1 and Sentinel-2 imagery, I demonstrate the integration of Google Earth Engine and Google Cloud Platform for an optimised supervised land cover classification. I use dimensionality reduction methods provided in the open source scikit-learn library and show how they can improve classification accuracy and reduce the data load. At the same time, this project gives an indication of how the exploitation of big earth observation data can be approached in a cloud computing environ-ment.The preliminary results highlighted the effectiveness and necessity of dimensionality reduction methods but also strengthened the need for inter-comparable object-based land cover classification benchmarks to fully assess the quality of the derived products. To facilitate this need and encourage further research, I plan to publish the datasets (i.e. imagery, training and test data) and provide access to the developed Google Earth Engine and Python scripts as Free and Open Source Software (FOSS).

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)