Test Case Generation from Specifications Using Natural Language Processing

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: Software testing plays a fundamental role in software engineering as it ensures the quality of a software system. However, one of the major challenges of software testing is its costs since it is a time and resource-consuming process which according to academia and industry can take up to 50% of the total development cost. Today, one of the most common ways of generating testcases is through manual labor by analyzing specification documents to produce test scripts, which tends to be an expensive and error prone process. Therefore, optimizing software testing by automating the test case generation process can result in time and cost reductions and also lead to better quality of the end product. Currently, most of the state-of-the-art solutions for automatic test case generation require the usage of formal specifications. Such formal specifications are not always available during the testing process and if available, they require expert knowledge for writing and understanding them. One artifact that is often available in the testing domain is test case specifications written in natural language. In this thesis, an approach for generating integration test cases from natural language test case specifications is designed, applied and, evaluated. Machine learning and natural language processing techniques are used to implement the approach. The proposed approach is conducted and evaluated on an industrial testing project at Ericsson AB in Sweden. Additionally, the approach has been implemented as a tool with a graphical user interface for aiding testers in the process of test case generation. The approach involves performing natural language processing techniques for parsing and analyzing the test case specifications to generate feature vectors that are later mapped to label vectors containing existing C# test scripts filenames. The feature and label vectors are used as input and output, respectively, in a multi-label text classification process. The approach managed to produce test scripts for all test case specifications and obtained a best F1 score of 89% when using LinearSVC as the classifier and performing data augmentation on the training set.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)