Quality control of a somatic mutation analysis pipeline for next generation sequencing data

This is a professional degree thesis at advanced level from Uppsala University / Department of Immunology, Genetics and Pathology

Author: Marina Rashyna; [2018]

Abstract: Many studies focus on the analysis of next generation sequencing data from normal and cancer tissues with the intention of identifying somatic mutations in cancer. In brief, the produced sequences are mapped to the reference genome; the data from the tumour and normal samples are then compared to identify mutations in the tumour. Errors can be introduced during sample handling or by the sequencing platform, leading to incorrect alignment and ultimately to false positive mutations. To be certain that a discovered mutation is not an artefact, quality control should be performed on the raw sequencing data, on the results of read alignment and finally after mutation calling.

There are two aims of this study. First, to identify the most important metrics for quality control of raw sequencing data and read alignment data. Second, to develop tools that can evaluate these metrics. To discover the most essential metrics, freely available software packages for quality control of raw sequencing data and read alignment were analysed.

Two tools, RawQC and MapQC, have been developed in Python 3 to perform quality control of raw sequencing data and alignment data. RawQC can handle targeted panel data from the main commercially available sequencing platforms: Illumina, Ion Torrent and Pacific Biosciences. A novel feature implemented in RawQC is the analysis of read duplications, which estimates the duplication level with regard to read length. For MapQC, a new feature is the Flag overview metric, which presents a quick summary of the alignment and also takes read length into account. Both tools produce useful statistics and graphs for quality assessment of the input data. The evaluation of these metrics is an important step before somatic variant calling. Evaluating the quality of the data facilitates decisions on data processing and filtering that reduce the number of false positive or false negative mutation calls.
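
To illustrate what estimating the duplication level with regard to read length could look like, here is a minimal Python 3 sketch. It is not the RawQC implementation, which the abstract does not describe. The function name `duplication_by_length`, the definition of a duplicate as an exact sequence match, and the example reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def duplication_by_length(reads):
    """Return {read_length: fraction of reads that are exact duplicates}.

    Assumption: a read counts as a duplicate when an identical sequence
    of the same length has already been seen. Real tools may use other
    definitions (for example, comparing only a prefix of each read).
    """
    # One Counter of sequences per read length.
    counts_by_length = defaultdict(Counter)
    for seq in reads:
        counts_by_length[len(seq)][seq] += 1

    levels = {}
    for length, counts in sorted(counts_by_length.items()):
        total = sum(counts.values())   # all reads of this length
        unique = len(counts)           # distinct sequences of this length
        levels[length] = (total - unique) / total
    return levels


# Hypothetical toy input: three reads of length 4, four reads of length 6.
reads = ["ACGT", "ACGT", "ACGA", "ACGTAC", "ACGTAC", "ACGTAC", "TTGACA"]
levels = duplication_by_length(reads)
for length, level in levels.items():
    print(f"length {length}: duplication level {level:.2f}")
```

Grouping by length before counting keeps short and long reads from being pooled into a single figure, which matters for platforms that produce variable-length reads.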