Combined feature compression encoding in image retrieval
Recently, features extracted by convolutional neural networks (CNNs) have become popular for image retrieval. In CNN representations, high-level features are usually chosen to represent images in coarse-grained datasets, while mid-level features have been successfully applied to describe images in fine-grained datasets. In this paper, we combine these different levels of features into a joint feature to obtain a robust representation that is suitable for both coarse-grained and fine-grained image retrieval datasets. In addition, because the efficiency of image retrieval depends on the dimensionality of the index, we apply a unified subspace learning model named spectral regression (SR). We combine SR and the robust CNN representation to form a combined feature compression encoding (CFCE) method. CFCE preserves the information in the features without noticeably impacting image retrieval accuracy. We examine how image retrieval performance changes with the compressed dimensionality of the features, and we further identify a reasonable indexing dimensionality for image retrieval. Experiments demonstrate that our model provides state-of-the-art performance across datasets.
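The pipeline summarized above (joint mid- and high-level feature, then SR compression) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the supervised variant of spectral regression, in which the regression targets are class-indicator vectors orthogonalized against the all-ones vector and the projection is obtained by ridge regression. The function names, the L2 normalization before concatenation, the regularizer `alpha`, and the random stand-in features are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def joint_feature(mid, high):
    """Concatenate normalized mid- and high-level descriptors (assumed fusion rule)."""
    return np.hstack([l2_normalize(mid), l2_normalize(high)])


def sr_responses(labels):
    """Supervised SR targets: class indicators orthogonalized against the ones vector."""
    classes = np.unique(labels)
    n = labels.shape[0]
    indicators = (labels[:, None] == classes[None, :]).astype(float)
    basis = np.hstack([np.ones((n, 1)), indicators])
    q, _ = np.linalg.qr(basis)
    # Column 0 spans the ones vector; the next c-1 columns are the responses.
    return q[:, 1:len(classes)]


def fit_sr(x, labels, alpha=0.1):
    """Ridge-regress the centered features onto the SR responses."""
    mean = x.mean(axis=0)
    xc = x - mean
    y = sr_responses(labels)
    gram = xc.T @ xc + alpha * np.eye(x.shape[1])
    projection = np.linalg.solve(gram, xc.T @ y)
    return mean, projection


def compress(x, mean, projection):
    """Project features into the low-dimensional SR subspace."""
    return (x - mean) @ projection


# Random stand-ins for CNN activations: 30 images, 3 classes.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
mid = rng.normal(size=(30, 8))    # hypothetical mid-level features
high = rng.normal(size=(30, 6))   # hypothetical high-level features

x = joint_feature(mid, high)
mean, projection = fit_sr(x, labels)
codes = compress(x, mean, projection)
print(x.shape, codes.shape)
```

In this supervised formulation the compressed code has at most c - 1 dimensions for c classes, so the joint 14-dimensional feature is reduced to 2 dimensions here. Retrieval would then rank database images by distance between their compressed codes and the query's code.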