Investigating correlations between chemical connectivity patterns and cancer mutations using machine learning

This is a thesis for a professional degree at advanced level from Uppsala universitet/Avdelningen för systemteknik

Author: Gustav Lind; [2021]

Abstract: Proteins are a group of naturally occurring, highly versatile organic macromolecules that perform a wide range of biological functions in living systems by folding into complex three-dimensional shapes. This thesis project applies data science and machine learning methods to the study of protein structures, with the aim of improving the understanding of the relation between the structural and chemical properties of proteins and cancer mutations observed in humans. The project is conducted at the Protein Dynamics and Cancer Lab at Karolinska Institute.

A collection of proteins from the Catalogue of Somatic Mutations in Cancer is studied. A two-dimensional projection capturing both chemical and structural properties is designed and used to predict the mutational vulnerability of certain restricted areas of proteins using fully convolutional networks. The correlation between predictions and true values is then analyzed using the mutual information score.

The mutual information scores obtained for the six tested networks all indicate that a small correlation might exist. Because significant data handling issues resulted in validation on previously seen data, the conclusion is drawn that the existence of such a relation is improbable. However, as no discrimination based on dynamical properties of the clusters has been made, and as the benchmark is too small, the hypothesis cannot be ruled out.
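
For readers unfamiliar with the mutual information score mentioned above, the following is a minimal sketch of how such a score can be computed between two sequences of discrete labels. It is not the thesis's implementation: the function name `mutual_information` and the assumption that predictions and true values have been reduced to discrete labels (for example, mutated vs. not mutated) are illustrative only. The result is in nats, and a value of 0 means the two label sequences are statistically independent in the sample.

```python
from collections import Counter
from math import log


def mutual_information(labels_true, labels_pred):
    """Mutual information (in nats) between two equal-length
    sequences of discrete labels (hypothetical helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("sequences must have equal length")
    n = len(labels_true)
    if n == 0:
        return 0.0

    # Empirical joint and marginal counts.
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)

    mi = 0.0
    for (t, p), c in joint.items():
        # p(t, p) * log( p(t, p) / (p(t) * p(p)) ), written with counts.
        mi += (c / n) * log(c * n / (count_true[t] * count_pred[p]))
    return mi
```

With this sketch, two identical balanced binary sequences give a score of log 2, while two sequences whose labels co-occur evenly give 0. A small positive score, as reported above, indicates only a weak dependence between predictions and true values.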