Transformer-based Source Code Description Generation : An ensemble learning-based approach

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: Code comprehension can be significantly benefited from high-level source code summaries. For the majority of the developers, understanding another developer’s code or code that was written in the past by them, is a timeconsuming and frustrating task. This is necessary though in software maintenance or in cases where several people are working on the same project. A fast, reliable and informative source code description generator can automate this procedure, which is often avoided by developers. The rise of Transformers has turned the attention to them leading to the development of various Transformer-based models that tackle the task of source code summarization from different perspectives. Most of these models though are treating each other in a competitive manner when their complementarity could be proven beneficial. To this end, an ensemble learning-based approach is followed to explore the feasibility and effectiveness of the collaboration of more than one powerful Transformer-based models. The used base models are PLBart and GraphCodeBERT, two models with different focuses, and the ensemble technique is stacking. The results show that such a model can improve the performance and informativeness of individual models. However, it requires changes in the configuration of the respective models, that might harm them, and also further fine-tuning at the aggregation phase to find the most suitable base models’ weights and next-token probabilities combination, for the at the time ensemble. The results also revealed the need for human evaluation since metrics like BiLingual Evaluation Understudy (BLEU) are not always representative of the quality of the produced summary. Even if the outcome is promising, further work should follow, driven by this approach and based on the limitations that are not resolved in this work, for the development of a potential State Of The Art (SOTA) model.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)