29. ■ The text-to-image task
■ Generate an image from an input sentence
[Figure: Image synthesis via GANs > Conditional Image Synthesis > Text-to-image / Label-to-image]
Background
[Figure: example input text "People riding on elephants that are walking through a river."]
Source [5] [Seunghoon Hong et al., 2018]
63. ■ The discriminator is the same multi-scale discriminator as in pix2pixHD (based on PatchGAN)
(Adversarial loss + Feature Matching loss + Perceptual loss)
■ The least squares loss is replaced with the hinge loss
■ SPADE layers are not used in the discriminator
Implementation details
Source [1] [Taesung Park et al., 2019]
Source [7] [Ting-Chun Wang et al., 2017]
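The loss changes on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SPADE or pix2pixHD code: each discriminator scale is assumed to output a flat list of PatchGAN patch scores, intermediate discriminator features are assumed to be flattened to one list per layer, and all function names are hypothetical. The least squares variant uses the LSGAN targets 1 (real) and 0 (fake) with the 1/2 factors of the LSGAN formulation; the hinge variant is the commonly used form L_D = E[max(0, 1 - D(x))] + E[max(0, 1 + D(G(s)))], L_G = -E[D(G(s))]. The feature matching term is an unweighted per-layer mean absolute difference, summed over layers.

```python
from typing import Callable, List, Sequence

# Assumed (hypothetical) data layout:
#   scores:   flat list of PatchGAN patch outputs from one discriminator scale
#   features: one flattened list of activations per discriminator layer


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def lsgan_d_loss(real: Sequence[float], fake: Sequence[float]) -> float:
    """Least squares loss for D: pull D(real) toward 1 and D(fake) toward 0."""
    return 0.5 * (mean([(s - 1.0) ** 2 for s in real]) + mean([s ** 2 for s in fake]))


def lsgan_g_loss(fake: Sequence[float]) -> float:
    """Least squares loss for G: pull D(fake) toward 1."""
    return 0.5 * mean([(s - 1.0) ** 2 for s in fake])


def hinge_d_loss(real: Sequence[float], fake: Sequence[float]) -> float:
    """Hinge loss for D: penalize real scores below +1 and fake scores above -1."""
    return mean([max(0.0, 1.0 - s) for s in real]) + mean([max(0.0, 1.0 + s) for s in fake])


def hinge_g_loss(fake: Sequence[float]) -> float:
    """Hinge loss for G: raise the discriminator's score on generated images."""
    return -mean(fake)


def feature_matching_loss(real_feats: List[List[float]], fake_feats: List[List[float]]) -> float:
    """Per-layer mean absolute difference (1/N_i times the L1 norm), summed over layers."""
    return sum(
        mean([abs(r - f) for r, f in zip(real_layer, fake_layer)])
        for real_layer, fake_layer in zip(real_feats, fake_feats)
    )


def multiscale(loss_fn: Callable[..., float], *per_scale_args: Sequence) -> float:
    """Sum a loss over discriminator scales; each argument holds one entry per scale."""
    return sum(loss_fn(*args) for args in zip(*per_scale_args))


real, fake = [2.0, 0.5], [-2.0, 0.0]
print(hinge_d_loss(real, fake))  # 0.75
print(hinge_g_loss(fake))        # 1.0
print(lsgan_d_loss(real, fake))  # 1.3125
```

One practical difference shows up in the example: the real patch scoring 2.0 adds nothing to the hinge loss, because the hinge stops penalizing D once a real score exceeds +1 (or a fake score drops below -1), whereas the least squares loss still penalizes that patch for overshooting its target of 1.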
66. ■ Baselines:
① pix2pixHD: a state-of-the-art GAN-based approach
Baselines
Source [7] [Ting-Chun Wang et al., 2017]
67. ■ Baselines:
② CRN (Cascaded Refinement Network): a feedforward approach that takes the semantic map as input at progressively higher resolutions
Source [14] [Qifeng Chen et al., 2017]
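As we understand CRN, each refinement module in the cascade works at twice the resolution of the previous one and receives the semantic layout resized to its own resolution, together with the upsampled features of the previous module. The sketch below only prepares those per-module layout inputs. The base resolution, the number of modules, the use of integer label maps, and nearest-neighbor resizing (chosen here so that class IDs are never blended) are assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

LabelMap = List[List[int]]  # label_map[row][col] = semantic class ID


def resolution_schedule(base_h: int, base_w: int, num_modules: int) -> List[Tuple[int, int]]:
    """Resolution of each cascade module; each module doubles the previous one."""
    return [(base_h * 2 ** i, base_w * 2 ** i) for i in range(num_modules)]


def resize_nearest(label_map: LabelMap, out_h: int, out_w: int) -> LabelMap:
    """Nearest-neighbor resize, so every output pixel keeps a valid class ID."""
    in_h, in_w = len(label_map), len(label_map[0])
    return [
        [label_map[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def cascade_layout_inputs(
    label_map: LabelMap, base_h: int, base_w: int, num_modules: int
) -> List[LabelMap]:
    """Layout input for each module, ordered from coarsest to finest."""
    return [
        resize_nearest(label_map, h, w)
        for h, w in resolution_schedule(base_h, base_w, num_modules)
    ]


layout = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
for level in cascade_layout_inputs(layout, base_h=1, base_w=1, num_modules=3):
    print(level)
# [[0]]
# [[0, 1], [2, 3]]
# [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
```

Feeding the layout again at every scale lets the finer modules see exact region boundaries that the coarse features alone would have lost.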
68. ■ Baselines:
③ SIMS (Semi-parametric Image Synthesis): an approach that composites segments retrieved from a database of real images
Source [15] [Xiaojuan Qi et al., 2018]
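The retrieve-and-composite idea behind SIMS can be illustrated with a heavily simplified sketch. It is not the SIMS pipeline, which, as we understand it, also aligns, orders, and refines the retrieved segments with learned networks. Here we assume that each segment is a class label plus a binary mask stored as a set of pixel coordinates, that the best match is the same-class segment with the highest mask IoU, and that later segments are simply pasted over earlier ones. The `Segment` type, its `source_id` field, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Pixel = Tuple[int, int]  # (row, col)


@dataclass(frozen=True)
class Segment:
    label: str              # semantic class, e.g. "car"
    mask: FrozenSet[Pixel]  # pixels covered by the segment
    source_id: str          # hypothetical ID of the real image it was cut from


def mask_iou(a: FrozenSet[Pixel], b: FrozenSet[Pixel]) -> float:
    """Intersection over union of two pixel masks (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def retrieve(label: str, mask: FrozenSet[Pixel], bank: Iterable[Segment]) -> Optional[Segment]:
    """Return the same-class segment whose mask overlaps the query best, or None."""
    candidates = [s for s in bank if s.label == label]
    if not candidates:
        return None
    return max(candidates, key=lambda s: mask_iou(mask, s.mask))


def composite(layout: List[Tuple[str, FrozenSet[Pixel]]], bank: List[Segment]) -> Dict[Pixel, str]:
    """Record, for each pixel of the layout, which real image fills it.

    Segments are pasted in layout order, so later ones overwrite earlier ones;
    pixels whose class has no match in the bank are left empty.
    """
    canvas: Dict[Pixel, str] = {}
    for label, mask in layout:
        match = retrieve(label, mask, bank)
        if match is not None:
            for pixel in mask:
                canvas[pixel] = match.source_id
    return canvas


bank = [
    Segment("car", frozenset({(0, 0), (0, 1)}), "img_a"),
    Segment("car", frozenset({(0, 0)}), "img_b"),
    Segment("road", frozenset({(1, 0), (1, 1)}), "img_c"),
]
layout = [("road", frozenset({(1, 0), (1, 1)})), ("car", frozenset({(0, 0), (0, 1)}))]
print(sorted(composite(layout, bank).items()))
# [((0, 0), 'img_a'), ((0, 1), 'img_a'), ((1, 0), 'img_c'), ((1, 1), 'img_c')]
```

Because the output is assembled from real image segments, its local texture is realistic by construction; the hard part that this sketch skips is blending the pasted segments into a coherent image.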
79. References
■ [1] Taesung Park et al. Semantic Image Synthesis with Spatially-Adaptive Normalization, 2019
https://arxiv.org/abs/1903.07291
https://youtu.be/9GR8V-VR4Qg?t=614
■ [2] Tero Karras et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2018
https://arxiv.org/abs/1710.10196
https://youtu.be/XOxxPcy5Gr4
■ [3] Alec Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
https://arxiv.org/abs/1511.06434
■ [4] Takeru Miyato et al. cGANs with Projection Discriminator, 2018
https://arxiv.org/abs/1802.05637
■ [5] Seunghoon Hong et al. Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis, 2018
https://arxiv.org/abs/1801.05091
■ [6] Phillip Isola et al. Image-to-Image Translation with Conditional Adversarial Networks, 2016
https://arxiv.org/abs/1611.07004
■ [7] Ting-Chun Wang et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs, 2017
https://arxiv.org/abs/1711.11585
https://youtu.be/3AIpPlzM_qs
80. References
■ [8] Qifeng Chen et al. Photographic Image Synthesis with Cascaded Refinement Networks, 2017
https://arxiv.org/abs/1707.09405
■ [9] Xun Huang et al. Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization, 2017
https://arxiv.org/abs/1703.06868
■ [10] Harm de Vries et al. Modulating early visual processing by language, 2017
https://arxiv.org/abs/1707.00683
■ [11] Holger Caesar et al. COCO-Stuff: Thing and Stuff Classes in Context, 2018
https://arxiv.org/abs/1612.03716
■ [12] Bolei Zhou et al. Semantic Understanding of Scenes through the ADE20K Dataset, 2016
https://arxiv.org/abs/1608.05442
■ [13] Marius Cordts et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016
https://arxiv.org/abs/1604.01685
■ [14] Qifeng Chen et al. Photographic Image Synthesis with Cascaded Refinement Networks, 2017
https://arxiv.org/abs/1707.09405
■ [15] Xiaojuan Qi et al. Semi-parametric Image Synthesis, 2018
https://arxiv.org/abs/1804.10992