Towards disease progression sub-typing via responsibility sampling for robust expectation-maximisation learning

This is a Master's thesis from KTH/Optimization and Systems Theory

Author: Mathias Edman; [2019]

Abstract: Most diseases have heterogeneous effects on patients. Broadly, one may conclude which manifested symptoms correspond to which diagnosis, but usually there is more than one disease progression pattern. Because there is more than one pattern, and because each pattern may require a bespoke (and personalised) therapeutic intervention, time-series clustering is one option by which disease subpopulations can be identified. Such patient sub-typing is difficult due to information heterogeneity, information sparsity (few longitudinal observations) and the complex temporal dynamics governing the disease. To deal with these problems, and to gain a robust description of them, we introduce a generative clustering model by way of a mixture of hidden Markov models. Our model handles non-ergodic temporal dynamics, allows variable state cardinality across the mixture's components and initialises the mixture in a more structured way than current methods. With the task of disease progression modelling in mind, we also take a broader perspective on parameter learning in finite mixture models (FMMs). In many mixture models, obtaining optimal or near-optimal parameters is difficult with current learning methods, where the most common approach is to employ a monotone learning algorithm, e.g. the conventional expectation-maximisation algorithm. While effective, the success of any monotone algorithm depends crucially on good parameter initialisation. A common approach is to repeat the learning procedure multiple times, starting from different points in the parameter space, or to employ model-specific initialisation schemes, e.g. K-means initialisation for Gaussian mixture models. For other types of mixture models, the path to good initialisation parameters is often unclear and may require a solution specific not only to the model, but also to the data.
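The conventional approach the abstract refers to — monotone EM with K-means initialisation — can be sketched for a one-dimensional two-component Gaussian mixture. This is a minimal illustration, not code from the thesis; function names and the pure-Python setup are our own:

```python
import math
import random


def kmeans_init(xs, k, iters=10):
    """Crude K-means on scalars: returns k cluster centres used as initial means."""
    centres = random.sample(xs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # assign each point to its nearest centre
            j = min(range(k), key=lambda c: (x - centres[c]) ** 2)
            clusters[j].append(x)
        # move each centre to the mean of its assigned points
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_gmm(xs, k, iters=50):
    """Monotone EM for a 1-D Gaussian mixture, initialised by K-means."""
    mus = kmeans_init(xs, k)
    sigmas = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: soft responsibilities of each component for each point
        resp = []
        for x in xs:
            ps = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            s = sum(ps)
            resp.append([p / s for p in ps])
        # M-step: responsibility-weighted maximum-likelihood updates
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)
            weights[j] = nj / len(xs)
    return mus, sigmas, weights
```

Each EM iteration is guaranteed not to decrease the data log-likelihood, which is exactly the monotonicity property the abstract describes: the final fit can only be as good as the basin of attraction the K-means initialisation lands in.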
To this end, we propose a general heuristic learning algorithm that utilises Boltzmann exploration to assign each observation to a specific base distribution within the mixture model, which we call Boltzmann exploration expectation-maximisation (BEEM). With BEEM, hard assignments allow straightforward parameter learning for each base distribution by conditioning only on its assigned observations. Consequently, it can be applied to mixtures of any base distribution where single-component parameter learning is tractable. The stochastic learning procedure is able to escape local optima and explore the parameter space, thus mitigating sensitivity to parameter initialisation. We show competitive performance on a number of synthetic benchmark cases as well as on real-world datasets. Finally, we employ BEEM for the disease progression sub-typing task and contrast it with a task-specific initialisation procedure on synthetic data as well as on a real progression modelling task, where we identify clinical phenotypes in Parkinson's disease.
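The abstract gives no pseudocode for BEEM, so the following is only a hypothetical sketch of the general idea on a 1-D Gaussian mixture: sample a hard assignment for each observation from a Boltzmann (softmax) distribution over per-component log-likelihoods, re-fit each component on its assigned observations, and anneal the temperature. The function names, the annealing schedule and all numeric defaults are illustrative assumptions, not the thesis's algorithm:

```python
import math
import random


def boltzmann_assign_em(xs, k, iters=40, temp0=2.0, cooling=0.9):
    """Hypothetical BEEM-style learner for a 1-D Gaussian mixture.

    Each observation is *hard*-assigned to one component by sampling from a
    temperature-scaled softmax over component log-likelihoods; component
    parameters are then re-fit by conditioning only on assigned observations.
    """
    mus = random.sample(xs, k)      # naive initialisation on purpose
    sigmas = [1.0] * k
    temp = temp0
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for x in xs:
            # per-component Gaussian log-likelihoods (up to a constant)
            lls = [-0.5 * ((x - mus[j]) / sigmas[j]) ** 2 - math.log(sigmas[j])
                   for j in range(k)]
            m = max(lls)
            ws = [math.exp((ll - m) / temp) for ll in lls]
            # sample a hard assignment ~ softmax(log-likelihood / temperature)
            r = random.random() * sum(ws)
            j, acc = 0, ws[0]
            while acc < r:
                j += 1
                acc += ws[j]
            buckets[j].append(x)
        # M-step: per-component MLE conditioned only on assigned observations
        for j, b in enumerate(buckets):
            if b:
                mus[j] = sum(b) / len(b)
                var = sum((x - mus[j]) ** 2 for x in b) / len(b)
                sigmas[j] = max(math.sqrt(var), 1e-3)
        temp *= cooling  # anneal toward greedy (hard-EM-like) assignment
    return mus, sigmas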

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)