Estimating the intrinsic dimensionality of high dimensional data

This is a Master's thesis from KTH/Matematisk statistik

Author: Joakim Winiger; [2015]

Abstract: This report reviews several methods for estimating what is known as intrinsic dimensionality (ID). The principle behind intrinsic dimensionality estimation is that data frequently contains structure that allows it to be re-expressed using fewer coordinates (dimensions). The main objective of the report is to solve a common problem: given a (typically high-dimensional) dataset, determine whether some of its dimensions are redundant, and if so, find a lower-dimensional representation of it. We introduce different approaches to ID estimation, motivate them theoretically, and compare them using both synthetic and real datasets. The first three methods estimate the ID of a dataset, while the fourth finds a low-dimensional version of the data. This is a useful order in which to organize the task: given an estimate of the ID of a dataset, construct a simpler version of the dataset using that number of dimensions. The results show that a remarkable reduction in the dimensionality of high-dimensional data is possible. The different methods give similar results despite their different theoretical backgrounds, and they behave as expected on synthetic datasets with known ID.
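The two-stage workflow described in the abstract (first estimate the ID, then build a representation with that many dimensions) can be illustrated with a minimal sketch. The abstract does not name the four methods studied, so the sketch below swaps in plain PCA as a stand-in for both stages; it is an assumption for illustration, not the thesis's method. The helper names `estimate_id_pca` and `project_to_id`, the 99% variance threshold, and the synthetic dataset are all hypothetical.

```python
import numpy as np


def estimate_id_pca(X, variance_threshold=0.99):
    """Hypothetical helper: estimate the ID as the smallest number of
    principal components whose cumulative explained variance reaches
    variance_threshold. Returns the estimate and the principal axes."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal axis.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    d = int(np.searchsorted(explained, variance_threshold)) + 1
    return d, Vt


def project_to_id(X, Vt, d):
    """Hypothetical helper: express the data in its first d principal axes."""
    Xc = X - X.mean(axis=0)
    return Xc @ Vt[:d].T


# Synthetic dataset with known ID (an assumption for illustration):
# 3 latent coordinates embedded linearly in 10 dimensions, plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
Q, _ = np.linalg.qr(rng.normal(size=(10, 3)))  # orthonormal embedding, 10 x 3
X = latent @ Q.T + 1e-3 * rng.normal(size=(500, 10))

d, Vt = estimate_id_pca(X)     # stage 1: estimate the ID
Z = project_to_id(X, Vt, d)    # stage 2: lower-dimensional representation
print(d, Z.shape)
```

PCA only detects linear structure, so on data lying on a curved manifold this stand-in would likely overestimate the ID; it is used here only because it keeps both stages short.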
