A Review of Generative Adversarial Networks for Image-to-Image Translation and Image Synthesis

Image-to-image translation is used to solve problems in image processing, computer graphics, and computer vision. It requires learning a mapping from one visual representation of a given input to another representation. Image-to-image translation with generative adversarial networks (GANs) has been studied extensively, and the methods studied have been applied to many areas, such as multimodal translation, super-resolution, and object transfiguration. However, image-to-image translation techniques suffer from problems such as instability and lack of diversity. This study aims to provide a comprehensive overview of image-to-image translation methods based on GAN algorithms and their variants. In addition, image-to-image translation techniques are discussed and analyzed. Finally, future research directions are summarized and discussed.

Keywords:

Image-to-image translation, generative adversarial networks, deep learning

