A Region Covariances-based Visual Attention Model for RGB-D Images

Existing computational models of visual attention generally employ simple image features such as color, intensity, or orientation to generate a saliency map that highlights the image regions likely to attract human attention. Interestingly, most of these models do not process any depth information and operate only on standard two-dimensional RGB images. Depth perception through stereo vision, however, is a key characteristic of the human visual system. In line with this observation, in this study we propose to extend two state-of-the-art static saliency models based on region covariances so that they also process the depth information available in RGB-D images. We evaluate the proposed models on the NUS-3D benchmark dataset under several different evaluation metrics. Our results reveal that using the additional depth information improves saliency prediction in a statistically significant manner, yielding more accurate saliency maps.
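The core ingredients of the region covariance approach can be illustrated with a minimal sketch: each image region is summarized by the covariance matrix of its per-pixel feature maps (with depth simply stacked as an extra channel in the RGB-D setting), and regions are compared with the covariance metric of Förstner and Moonen (1999). The feature choice and region size below are illustrative assumptions, not the exact configuration of the proposed models.

```python
import numpy as np
from scipy.linalg import eigh


def region_covariance(features):
    """d x d covariance descriptor of a region given an (H, W, d) stack
    of per-pixel feature maps (e.g. color, orientation, and depth)."""
    d = features.shape[-1]
    return np.cov(features.reshape(-1, d), rowvar=False)


def covariance_distance(C1, C2):
    """Förstner-Moonen metric between two covariance matrices:
    sqrt(sum_i ln^2 lambda_i), where lambda_i are the generalized
    eigenvalues of the pair (C1, C2)."""
    lam = eigh(C1, C2, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))


# Toy example: two 16x16 regions with 4 feature channels each
# (here random stand-ins for intensity, two color channels, and depth).
rng = np.random.default_rng(0)
C_a = region_covariance(rng.random((16, 16, 4)))
C_b = region_covariance(rng.random((16, 16, 4)))

print(C_a.shape)                       # (4, 4)
print(covariance_distance(C_a, C_a))   # ~0: a region is not dissimilar to itself
print(covariance_distance(C_a, C_b))   # > 0: dissimilarity between the regions
```

In a saliency model of this kind, a region whose descriptor is far (in this metric) from those of its surrounding regions is assigned a high saliency value; adding depth as a feature channel lets depth discontinuities contribute to that dissimilarity.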

___

  • Y. Benjamini, and Y. Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), pages 289-300.
  • A. Borji, and L. Itti (2013). State-of-the-art in Visual Attention Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35(1), pages 185-207.
  • N. D. Bruce, and J. K. Tsotsos (2005). An attentional framework for stereo vision. In Proc. IEEE Canadian Conference on Computer and Robot Vision, pages 88-95.
  • N. Bruce, and J. Tsotsos (2006). Saliency based on information maximization. In Proc. Advances in Neural Information Processing Systems (NIPS), pages 155-162.
  • N. Bruce, and J. Tsotsos (2009). Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, Vol. 9(3):5, pages 1-24.
  • Z. Bylinskii, T. Judd, A. Borji, L. Itti, F. Durand, A. Oliva, and A. Torralba (accessed 2016). MIT Saliency Benchmark, http://saliency.mit.edu.
  • Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand (2016). What do different evaluation metrics tell us about saliency models? arXiv preprint arXiv:1604.03605.
  • E. Erdem, and A. Erdem (2013). Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision, Vol. 13(4):1, pages 1-20.
  • W. Förstner, and B. Moonen (1999). A metric for covariance matrices (Tech. Rep.). Department of Geodesy and Geoinformatics, Stuttgart University, Germany.
  • D. Gao, and N. Vasconcelos (2007). Bottom-up saliency is a discriminant process. In Proc. IEEE International Conference on Computer Vision (ICCV), pages 1-6.
  • S. Goferman, L. Zelnik-Manor, and A. Tal (2010). Context-aware saliency detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2376-2383.
  • J. Harel, C. Koch, and P. Perona (2007). Graph-based visual saliency. In Proc. Advances in Neural Information Processing Systems (NIPS), pages 545-552.
  • X. Hong, H. Chang, S. Shan, X. Chen, and W. Gao (2009). Sigma Set: A small second order statistical region descriptor. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1802-1809.
  • X. Hou, and L. Zhang (2007). Saliency detection: A spectral residual approach. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8.
  • B. Hu, R. Kane-Jackson, and E. Niebur (2016). A proto-object based saliency model in three-dimensional space. Vision Research, Vol. 119, pages 42-49.
  • H. Hügli, T. Jost, and N. Ouerhani (2005). Model performance for visual attention in real 3D color scenes. In Proc. Artificial Intelligence and Knowledge Engineering Applications: A Bioinspired Approach, pages 469-478.
  • I. Iatsun, M.-C. Larabi, and C. Fernandez-Maloigne (2015). A visual attention model for stereoscopic 3D images using monocular cues. Signal Processing: Image Communication, Vol. 38, pages 70-83.
  • L. Itti, C. Koch, and E. Niebur (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20(11), pages 1254-1259.
  • T. Jost, N. Ouerhani, R. von Wartburg, R. Müri, and H. Hügli (2004). Contribution of depth to visual attention: Comparison of a computer model and human. In Proc. Early Cognitive Vision Workshop, pages 1-4.
  • T. Judd, K. Ehinger, F. Durand, and A. Torralba (2009). Learning to predict where humans look. In Proc. IEEE International Conference on Computer Vision (ICCV), pages 2106-2113.
  • S. S. S. Kruthiventi, V. Gudisa, J. H. Dholakiya, and R. V. Babu (2016). Saliency Unified: A Deep Architecture for Simultaneous Eye Fixation Prediction and Salient Object Segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5781-5790.
  • C. Lang, T. V. Nguyen, H. Katti, K. Yadati, M. Kankanhalli, and S. Yan (2012). Depth matters: Influence of depth cues on visual saliency. In Proc. European Conference on Computer Vision (ECCV), pages 101-115.
  • C.-Y. Ma, and H.-M. Hang (2015). Learning-based saliency model with depth information. Journal of Vision, Vol. 15(6):19, pages 1-22.
  • R. Margolin, A. Tal, and L. Zelnik-Manor (2013). What makes a patch distinct? In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1139-1146.
  • N. Ouerhani, and H. Hügli (2000). Computing visual attention from scene depth. In Proc. International Conference on Pattern Recognition, pages 375-378.
  • J. Pan, E. Sayrol, X. Giro-i Nieto, K. McGuinness, and N. O’Connor (2016). Shallow and deep convolutional networks for saliency prediction. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 598-606.
  • S. Ramenahalli, and E. Niebur (2013). Computing 3D saliency from a 2D image. In Proc. Annual conference on information sciences and systems (CISS), pages 1-5.
  • A. F. Russell, S. Mihalas, R. von der Heydt, E. Niebur, and R. Etienne-Cummings (2014). A model of proto-object based saliency. Vision Research, Vol. 94, pages 1-15.
  • H. J. Seo, and P. Milanfar (2009). Static and space-time visual saliency detection by self-resemblance. Journal of Vision, Vol. 9(12):15, pages 1-27.
  • O. Tuzel, F. Porikli, and P. Meer (2006). Region covariance: A fast descriptor for detection and classification. In Proc. European Conference on Computer Vision (ECCV), pages 589-600.
  • J. Wang, M. P. DaSilva, P. LeCallet, and V. Ricordel (2013). Computational model of stereoscopic 3D visual saliency. IEEE Transactions on Image Processing, Vol. 22(6), pages 2151-2165.
  • L. Zhang, M. H. Tong, T. K. Marks, H. Shan, and G. W. Cottrell (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, Vol. 8(7):32, pages 1-20.
  • Y. Zhang, G. Jiang, M. Yu, and K. Chen (2010). Stereoscopic visual attention model for 3D video. In Proc. Advances in Multimedia Modeling, pages 314-324.