Attentive Sequential Auto-Encoding Towards Unsupervised Object-centric Scene Modeling

This paper describes an unsupervised sequential auto-encoding model for multi-object scenes. The model uses an attention-based formulation trained with reconstruction-driven losses. At its core, it iteratively writes regions onto a canvas in a differentiable manner. To direct attention to objects and/or parts, the model combines a convolutional localization network, a region-level bottleneck auto-encoder, and a loss term that encourages reconstruction within a limited number of iterations. An extended version adds a background modeling component aimed at handling scenes with complex backgrounds. The model is evaluated on two datasets: a synthetic dataset constructed by composing MNIST digit instances, and the MS-COCO dataset. It achieves high reconstruction ability on the MNIST-based scenes, and the extended model shows promising results on the complex and challenging MS-COCO scenes.
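
As a rough illustration of the sequential canvas-writing idea described in the abstract, the toy sketch below pastes fixed patches onto an empty canvas one step at a time and tracks a per-step loss. It is not the authors' implementation: the function names (`write_patch`, `mse`, `reconstruct`), the hard integer placement (which, unlike the model in the paper, is not differentiable), and the form of the loss (reconstruction error plus a linear penalty on the step index, weighted by a hypothetical `step_penalty`) are all assumptions made for illustration. In the actual model, the patches and their locations would come from the learned region auto-encoder and localization network.

```python
def write_patch(canvas, patch, top, left):
    """Return a copy of canvas with patch added at (top, left), clipped to 1.0."""
    out = [row[:] for row in canvas]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            r, c = top + i, left + j
            # Ignore patch pixels that fall outside the canvas.
            if 0 <= r < len(out) and 0 <= c < len(out[0]):
                out[r][c] = min(1.0, out[r][c] + value)
    return out


def mse(a, b):
    """Mean squared error between two equally sized 2D lists."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def reconstruct(target, steps, step_penalty=0.01):
    """Write each (patch, top, left) step onto an empty canvas.

    The per-step loss is an assumed form: reconstruction error plus a
    penalty that grows with the step index, so that finishing the
    reconstruction in fewer iterations is cheaper.
    """
    canvas = [[0.0] * len(target[0]) for _ in target]
    losses = []
    for t, (patch, top, left) in enumerate(steps, start=1):
        canvas = write_patch(canvas, patch, top, left)
        losses.append(mse(canvas, target) + step_penalty * t)
    return canvas, losses


# Toy scene: two 2x2 "objects" on a 4x4 image.
target = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
block = [[1.0, 1.0], [1.0, 1.0]]
canvas, losses = reconstruct(target, [(block, 0, 0), (block, 2, 2)])
```

In this toy scene the reconstruction error drops to zero once both blocks have been written, while the assumed step penalty keeps growing with each iteration, which is the trade-off a step-limiting loss term is meant to capture.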

Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji
  • Publication frequency: 4
  • First published: 2013
  • Publisher: Gazi Üniversitesi, Fen Bilimleri Enstitüsü

Attentive Sequential Auto-Encoding Towards Unsupervised Object-centric Scene Modeling

Yarkın Deniz ÇETİN, Ramazan Gökberk CİNBİŞ
