Peer-Reviewed

Scaling up an Unsupervised Image-to-Image Translation Framework from Basic to Complex Scenes

Received: 27 July 2021    Accepted: 31 August 2021    Published: 29 October 2021
Abstract

Unsupervised image-to-image translation methods have received a lot of attention in the last few years. Multiple techniques have emerged that tackle the initial challenge from different perspectives. Some focus on learning as much as possible from the target style, using several images of that style for each translation, while others use object detection to produce more realistic results on content-rich scenes. In this paper, we explore multiple frameworks that rely on different paradigms and assess how one of them, initially developed for single-object translation, performs on more diverse and content-rich images. Our work builds on an existing framework. We explore its versatility by training it on a more diverse dataset than the one it was designed and tuned for, which helps us understand how such methods behave beyond their original application. We also explore how to make the most of the datasets despite our limited computational power. We present a way to extend a dataset by passing it through an object detector, which provides us with new and diverse dataset classes. Moreover, we propose a way to adapt the framework to leverage the power of object detection by integrating it into the architecture, as is done in other methods.
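
The dataset-extension idea mentioned in the abstract (passing images through an object detector to obtain new classes) can be illustrated with a minimal sketch. The abstract does not name the detector or describe the cropping procedure, so everything below is an assumption made for illustration: the `Detection` record, the `extend_dataset` helper, the score and size thresholds, and the toy images stored as nested lists are all hypothetical and stand in for a real detector and real image data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: class label, confidence, pixel box."""
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut the box out of the image, clamping it to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = box
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]


def extend_dataset(
    samples: Sequence[Tuple[Image, Sequence[Detection]]],
    min_score: float = 0.5,  # assumed confidence threshold
    min_side: int = 2,       # assumed minimum crop side, in pixels
) -> Dict[str, List[Image]]:
    """Group confident, large-enough object crops by detected class."""
    classes: Dict[str, List[Image]] = defaultdict(list)
    for image, detections in samples:
        for det in detections:
            if det.score < min_score:
                continue  # skip uncertain detections
            patch = crop(image, det.box)
            if not patch or len(patch) < min_side or len(patch[0]) < min_side:
                continue  # skip crops that are empty or too small
            classes[det.label].append(patch)
    return dict(classes)


# Usage on one 4x6 toy image with three hypothetical detections.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
detections = [
    Detection("car", 0.9, (0, 0, 3, 2)),
    Detection("person", 0.3, (2, 1, 5, 4)),  # dropped: below min_score
    Detection("person", 0.8, (4, 0, 9, 3)),  # box is clamped to the image width
]
new_classes = extend_dataset([(image, detections)])
```

Each key of `new_classes` is a detected label and each value is the list of crops for that label, so every label can serve as one additional dataset class. Filtering by score and crop size is one plausible way to keep the new classes clean; the thresholds used here are placeholders, not values from the paper.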

Published in American Journal of Computer Science and Technology (Volume 4, Issue 4)
DOI 10.11648/j.ajcst.20210404.12
Page(s) 97-105
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Unsupervised Deep Learning, Image-To-Image Translation, Style Transfer, Object Detection, Dataset Augmentation

Cite This Article
  • APA Style

    Daniel Filipe Nunes Silva, Samuel Chassot, Luca Barras, Deblina Bhattacharjee, Sabine Süsstrunk. (2021). Scaling up an Unsupervised Image-to-Image Translation Framework from Basic to Complex Scenes. American Journal of Computer Science and Technology, 4(4), 97-105. https://doi.org/10.11648/j.ajcst.20210404.12


    ACS Style

    Daniel Filipe Nunes Silva; Samuel Chassot; Luca Barras; Deblina Bhattacharjee; Sabine Süsstrunk. Scaling up an Unsupervised Image-to-Image Translation Framework from Basic to Complex Scenes. Am. J. Comput. Sci. Technol. 2021, 4(4), 97-105. doi: 10.11648/j.ajcst.20210404.12


    AMA Style

    Daniel Filipe Nunes Silva, Samuel Chassot, Luca Barras, Deblina Bhattacharjee, Sabine Süsstrunk. Scaling up an Unsupervised Image-to-Image Translation Framework from Basic to Complex Scenes. Am J Comput Sci Technol. 2021;4(4):97-105. doi: 10.11648/j.ajcst.20210404.12


  • BibTeX

    @article{10.11648/j.ajcst.20210404.12,
      author = {Daniel Filipe Nunes Silva and Samuel Chassot and Luca Barras and Deblina Bhattacharjee and Sabine Süsstrunk},
      title = {Scaling up an Unsupervised Image-to-Image Translation Framework from Basic to Complex Scenes},
      journal = {American Journal of Computer Science and Technology},
      volume = {4},
      number = {4},
      pages = {97-105},
      doi = {10.11648/j.ajcst.20210404.12},
      url = {https://doi.org/10.11648/j.ajcst.20210404.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20210404.12},
     year = {2021}
    }
    


  • RIS

    TY  - JOUR
    T1  - Scaling up an Unsupervised Image-to-Image Translation Framework from Basic to Complex Scenes
    AU  - Daniel Filipe Nunes Silva
    AU  - Samuel Chassot
    AU  - Luca Barras
    AU  - Deblina Bhattacharjee
    AU  - Sabine Süsstrunk
    Y1  - 2021/10/29
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ajcst.20210404.12
    DO  - 10.11648/j.ajcst.20210404.12
    T2  - American Journal of Computer Science and Technology
    JF  - American Journal of Computer Science and Technology
    JO  - American Journal of Computer Science and Technology
    SP  - 97
    EP  - 105
    PB  - Science Publishing Group
    SN  - 2640-012X
    UR  - https://doi.org/10.11648/j.ajcst.20210404.12
    VL  - 4
    IS  - 4
    ER  - 


Author Information
  • Daniel Filipe Nunes Silva, School of Computer and Communication Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland

  • Samuel Chassot, School of Computer and Communication Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland

  • Luca Barras, School of Computer and Communication Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland

  • Deblina Bhattacharjee, School of Computer and Communication Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland

  • Sabine Süsstrunk, School of Computer and Communication Sciences, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
