A Mixture-of-Experts Approach for Gene Regulatory Network Inference

Detta är en Master-uppsats från Blekinge Tekniska Högskola/Institutionen för datalogi och datorsystemteknik

Sammanfattning: Context. Gene regulatory network (GRN) inference is an important and challenging problem in bioinformatics. A variety of machine learning algorithms have been applied to increase the GRN inference accuracy. Ensemble learning methods are shown to yield a higher inference accuracy than individual algorithms. Objectives. We propose an ensemble GRN inference method, which is based on the principle of Mixture-of-Experts ensemble learning. The proposed method can quantitatively measure the accuracy of individual GRN inference algorithms at the network motifs level. Based on the accuracy of the individual algorithms at predicting different types of network motifs, weights are assigned to the individual algorithms so as to take advantages of their strengths and weaknesses. In this way, we can improve the accuracy of the ensemble prediction. Methods. The research methodology is controlled experiment. The independent variable is method. It has eight groups: five individual algorithms, the generic average ranking method used in the DREAM5 challenge, the proposed ensemble method including four types of network motifs and five types of network motifs. The dependent variable is GRN inference accuracy, measured by the area under the precision-recall curve (AUPR). The experiment has training and testing phases. In the training phase, we analyze the accuracy of five individual algorithms at the network motifs level to decide their weights. In the testing phase, the weights are used to combine predictions from the five individual algorithms to generate ensemble predictions. We compare the accuracy of the eight method groups on Escherichia coli microarray dataset using AUPR. Results. In the training phase, we obtain the AUPR values of the five individual algorithms at predicting each type of the network motifs. In the testing phase, we collect the AUPR values of the eight methods on predicting the GRN of the Escherichia coli microarray dataset. Each method group has a sample size of ten (ten AUPR values). Conclusions. Statistical tests on the experiment results show that the proposed method yields a significantly higher accuracy than the generic average ranking method. In addition, a new type of network motif is found in GRN, the inclusion of which can increase the accuracy of the proposed method significantly.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)