SubKluster: Novel method to bin scaffolds from cereal genomes into subgenomes using substring frequency analysis

This is a Master's thesis from Lund University / Degree Projects in Bioinformatics

Author: Victor Kalbskopf; [2023]

Keywords: Biology and Life Sciences;

Abstract: The genome of the Belinda variety of hexaploid oat (Avena sativa) has recently been sequenced and assembled. This project aims to improve the assembly by clustering the thousands of scaffolds into their three ancestral subgenomes using Principal Component Analysis (PCA) of k-mer and repeat-element frequencies. The method was developed using a chromosome-level assembly of hexaploid wheat (Triticum aestivum), which formed highly distinguishable clusters corresponding to the true subgenomes in the PCA plot, indicating that the method has merit. The longest oat scaffolds, which together make up 90% of the genome (N90), were processed in the same manner. This resulted in two clusters, one with about one third of the 3-copy BUSCOs (Benchmarking Universal Single-Copy Orthologs) and another with two thirds. The latter cluster could then be subdivided into two clusters, each with about half of the 2-copy BUSCOs. A 1:1:1 ratio of BUSCOs across the clusters would indicate that the subgenomes are separating into their respective clusters. The clustering is not as neat or as clear as in the wheat example, but the length of the scaffolds or the state of the assembly may have a very large effect on the efficacy of the method. It is hoped that this method, with additional improvements, could be used to assess the assemblies of other large polyploid genomes and become part of a larger pipeline for understanding crop genome evolution.
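The core idea in the abstract, PCA on per-scaffold k-mer frequencies, can be illustrated with a minimal sketch. This is not the SubKluster implementation: the function names, the choice of k = 2, the toy sequences, and the split by sign of the first principal component are all assumptions made for illustration, and repeat-element frequencies are left out entirely. It uses only the Python standard library and finds the first principal axis by power iteration.

```python
from itertools import product
import random

def kmer_frequencies(seq, k=2):
    """Relative frequencies of all 4**k DNA k-mers in seq, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows containing N or other symbols
            counts[sub] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]

def first_pc_scores(rows, iterations=200):
    """Project mean-centred rows onto their first principal axis."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() for _ in range(d)]
    for _ in range(iterations):
        # Power iteration: apply X^T X to v without building the covariance matrix
        proj = [sum(x * w for x, w in zip(row, v)) for row in centred]
        v = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = sum(w * w for w in v) ** 0.5 or 1.0
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centred]

# Hypothetical toy "scaffolds": two AT-rich and two GC-rich sequences
scaffolds = {
    "s1": "ATATATATAT" * 5,
    "s2": "ATTATATAAT" * 5,
    "s3": "GCGCGGCCGC" * 5,
    "s4": "GGCGCCGCGC" * 5,
}
names = list(scaffolds)
profiles = [kmer_frequencies(scaffolds[name]) for name in names]
scores = first_pc_scores(profiles)

# Crude two-way split by the sign of the PC1 score (an assumption, not the thesis method)
groups = {name: score > 0 for name, score in zip(names, scores)}
for name in names:
    print(name, groups[name])
```

In this toy case the two AT-rich sequences land on one side of the first principal component and the two GC-rich ones on the other. Which side counts as positive is arbitrary, since the sign of a principal axis is not defined. Real scaffolds would need larger k, length-aware normalisation, and a proper clustering step on several components.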
