Text-to-image Synthesis for Fashion Design

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Zhengrong Yi; [2019]


Abstract: Generating high-quality images from textual descriptions is an active research direction in image generation and has aroused great interest in fashion design. The synthesized image should be consistent with the meaning of the text as well as being of acceptable quality. Generative Adversarial Networks (GANs) have demonstrated the capability of synthesizing sharper images than other generative models. Many GAN-based methods have been developed for text-to-image synthesis and generate compelling images on simple non-fashion datasets. Nevertheless, the inherent problems of GANs and more complex datasets greatly increase the difficulty of synthesizing realistic, high-resolution images. In this degree project, with the aim of studying the respective impacts of network architecture and training data on the performance of text-to-image synthesis, two GAN-based algorithms are adopted, namely the Attentional Generative Adversarial Network (AttnGAN) and the Stacked Generative Adversarial Network (StackGAN). They are applied to two fashion datasets separately, i.e., FashionGen and FashionSynthesis. The models are evaluated using the Fréchet Inception Distance (FID). On FashionGen, AttnGAN outperforms StackGAN with a better FID and synthesizes high-quality images for the most common categories of fashion items. On FashionSynthesis, StackGAN achieves a better FID, but some generated images are less realistic.
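Since FID is the evaluation metric used throughout, the following is a minimal sketch of how FID is typically computed, assuming real and generated images have already been encoded as Inception-v3 activation vectors. The function name, array shapes, and placeholder data below are illustrative assumptions, not code from the thesis.

```python
import numpy as np
from scipy.linalg import sqrtm


def frechet_inception_distance(act_real, act_gen):
    """Compute FID between two sets of Inception activations.

    act_real, act_gen: arrays of shape (num_images, feature_dim),
    e.g. 2048-d pool features from a pretrained Inception-v3
    (assumed to be extracted beforehand).
    """
    # Mean and covariance of each activation set.
    mu_r, mu_g = act_real.mean(axis=0), act_gen.mean(axis=0)
    sigma_r = np.cov(act_real, rowvar=False)
    sigma_g = np.cov(act_gen, rowvar=False)

    # Squared Euclidean distance between the means.
    diff = mu_r - mu_g
    mean_term = diff.dot(diff)

    # Matrix square root of the covariance product; drop tiny
    # imaginary parts caused by numerical error.
    covmean = sqrtm(sigma_r.dot(sigma_g))
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    trace_term = np.trace(sigma_r + sigma_g - 2.0 * covmean)
    return mean_term + trace_term


# Usage example with random placeholder features
# (stand-ins for real Inception activations).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 2048))
fake_feats = rng.normal(loc=0.1, size=(500, 2048))
print(frechet_inception_distance(real_feats, fake_feats))
```

Lower FID indicates that the distribution of generated image features is closer to that of real images, which is why it is used here to compare AttnGAN and StackGAN on each dataset.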
