Kombinatorisk Optimering med Pointer Networks och Reinforcement Learning

Detta är en Master-uppsats från Linköpings universitet/Artificiell intelligens och integrerade datorsystem

Sammanfattning: Given the complexity and range of combinatorial optimization problems, solving them can be computationally easy or hard. There are many ways to solve them, but all available methods share a problem: they take a long time to run and have to be rerun when new cases are introduced. Machine learning could prove a viable solution to solving combinatorial optimization problems due to the possibility for models to learn and generalize, eliminating the need to run a complex algorithm every time a new instance is presented. Uniter is a management consulting firm that provides services within product modularization. Product modularization results in the possibility for many different product variations to be created based on customer needs. Finding the best combination given a specific customer's need will require solving a combinatorial optimization problem. Based on Uniter's need, this thesis sought to develop and evaluate a machine learning model consisting of a Pointer Network architecture and trained using Reinforcement Learning.  The task was to find the combination of parts yielding the lowest cost, given a use case. Each use case had different attributes that specified the need for the final product. For each use case, the model was tasked with selecting the most appropriate combination from a set of 4000 distinct combinations. Three experiments were conducted: examining if the model could suggest an optimal solution after being trained on one use case, if the model could suggest an optimal solution of a previously seen use case, and if the model could suggest an optimal solution of an unseen use case. For all experiments, a single data set was used. The suggested model was compared to three baselines: a weighted random selection, a naive model implementing a feed-forward network, and an exhaustive search. The results showed that the proposed model could not suggest an optimal solution in any of the experiments. In most tests conducted, the proposed model was significantly slower at suggesting a solution than any baseline. The proposed model had high accuracy in all experiments, meaning it suggested almost only feasible solutions in the allowed solution space. However, when the model converged, it suggested only one combination for every use case, with the feed-forward baseline showing the same behavior. This behavior could suggest that the model misinterpreted the task and identified a solution that would work in most cases instead of suggesting the optimal solution for each use case. The discussion concludes that an exhaustive search is preferable for the studied data set and that an alternate approach using supervised learning may be a better solution.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)