A Hybrid Framework for Matching Printing Design Files to Product Photos


We propose a real-time image matching framework that is hybrid in the sense that it uses both hand-crafted features and deep features obtained from a well-tuned deep convolutional network. The matching problem we concentrate on is specific to a particular application: matching printing design files to product photos. Printing designs are template image files of any kind, created using a design tool, and are therefore perfect, noise-free image signals. For this purpose, we create an image set of printing designs and their corresponding product photos in collaboration with an actual printing facility. Using this image set, we benchmark various hand-crafted features (SIFT, SURF, GIST, HoG) and deep features for matching performance. Various segmentation algorithms, including deep learning-based segmentation methods, are applied to select feature regions. Results show that SIFT features selected from deep-segmented regions achieve up to 96% product-photo-to-design-file matching success on our dataset. We propose a framework in which deep learning contributes as much as possible without preventing real-time operation on an ordinary desktop computer.

References

  • [1] T. Dharani, I. L. Aroquiaraj, "A survey on content based image retrieval," International Conference on Pattern Recognition, Informatics and Mobile Engineering, Tamilnadu, India, pp 485-490, 2013.
  • [2] Y. Liu, D. Zhang, G. Lu, W. Y. Ma, "A survey of content-based image retrieval with high-level semantics," Pattern Recognition, vol. 40, no. 1, pp 262-282, 2007.
  • [3] J. Sivic, A. Zisserman, "Video Google: a text retrieval approach to object matching in videos," International Conference on Computer Vision, 9th IEEE, Nice, France, vol. 2, pp 1470-1477, 2003.
  • [4] H. Wang, Y. Cai, Y. Zhang, H. Pan, W. Lv, H. Han, "Deep learning for image retrieval: What works and what doesn't," International Conference on Data Mining Workshop, Washington, DC, US, pp 1576-1583, 2015.
  • [5] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, H. Lipson, "Understanding neural networks through deep visualization," Deep Learning Workshop, International Conference on Machine Learning, Lille, France, 2015.
  • [6] K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition," International Conference on Learning Representations Workshops, San Diego, CA, US, 2015.
  • [7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, "Going deeper with convolutions," IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, US, pp 1-9, June 2015.
  • [8] A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, US, pp 1097-1105, 2012.
  • [9] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, "Overfeat: Integrated recognition, localization and detection using convolutional networks," International Conference on Learning Representations, ICLR, Banff, Canada, 2014.
  • [10] A. Babenko, A. Slesarev, A. Chigorin, V. Lempitsky, "Neural codes for image retrieval," European Conference in Computer Vision, Zurich, Switzerland, pp 584-599, 2014.
  • [11] V. Chandrasekhar, J. Lin, O. Morère, H. Goh, A. Veillard, "A practical guide to CNNs and fisher vectors for image instance retrieval," Signal Processing, vol. 128, pp 426-439, 2016.
  • [12] I. Melekhov, J. Kannala, and E. Rahtu, "Siamese network features for image matching," International Conference on Pattern Recognition (ICPR), Cancún, Mexico, pp 378-383, 2016.
  • [13] Y. Taigman, M. Yang, M. Ranzato, L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, US, pp 1701-1708, 2014.
  • [14] T. Lin, Y. Cui, S. Belongie, J. Hays, "Learning deep representations for ground-to-aerial geolocalization," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, US, pp 5007-5015, 2015.
  • [15] D. Cai, X. Gu, C. Wang, "A revisit on deep hashings for large-scale content based image retrieval," ArXiv.CoRR, vol. abs/1711.06016, pp 1-11, 2017.
  • [16] R. Datta, J. Li, J. Z. Wang, "Content-based image retrieval: Approaches and trends of the new age," ACM SIGMM International Workshop on Multimedia Information Retrieval, New York, NY, USA, pp. 253-262, 2005.
  • [17] P. Clough, H. Müller, T. Deselaers, M. Grubinger, T. Martin Lehmann, J. R. Jensen, W. Hersh, "The CLEF 2005 Cross-Language Image Retrieval track," International Conference of the Cross-Language Evaluation Forum for European Languages, Vienna, Austria, vol. 1171, pp. 535-557, 2005.
  • [18] G. Schaefer, "UCID-RAW - a colour image database in raw format," European Congress on Computational Methods in Applied Sciences and Engineering, Porto, Portugal, pp 179-184, 2017.
  • [19] T. Ahmad, P. Campr, M. Cadik, G. Bebis, "Comparison of semantic segmentation approaches for horizon/sky line detection," International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, US, pp 4436-4443, 2017.
  • [20] F. Jiang, A. Grigorev, S. Rho, Z. Tian, Y. Fu, W. Jifara, A. Khan, S. Liu, "Medical image semantic segmentation based on deep learning," Neural Computing and Applications, vol. 29, no. 5, pp 1257-1265, 2018.
  • [21] M. Siam, S. Elkerdawy, M. Jägersand, S. Yogamani, "Deep semantic segmentation for automated driving: Taxonomy, roadmap and challenges," IEEE International Conference on Intelligent Transportation Systems, Yokohama, Japan, pp. 1-8, 2017.
  • [22] I. Ulku, E. Akagunduz, "A survey on deep learning-based architectures for semantic segmentation on 2D images," ArXiv.CoRR, vol. abs/1912.10230, pp 1-20, 2019.
  • [23] M. H. Saffar, M. Fayyaz, M. Sabokrou, M. Fathy, "Semantic video segmentation: A review on recent approaches," ArXiv.CoRR, vol. abs/1806.06172, pp 1-24, 2018.
  • [24] H. Yu, Z. Yang, L. Tan, Y. Wang, W. Sun, M. Sun, Y. Tang, "Methods and datasets on semantic segmentation: A review," Neurocomputing, vol. 304, pp. 82 - 103, 2018.
  • [25] Y. Guo, Y. Liu, T. Georgiou, M. S. Lew, "A review of semantic segmentation using deep neural networks," International Journal of Multimedia Information Retrieval, vol. 7, pp. 87-93, Jun 2018.
  • [26] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. G. Rodríguez, "A review on deep learning techniques applied to semantic segmentation," ArXiv.CoRR, vol. abs/1704.06857, pp 1-19, 2017.
  • [27] E. Shelhamer, J. Long, T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 640-651, Apr. 2017.
  • [28] A. Vedaldi, K. Lenc, "MatConvNet: Convolutional neural networks for Matlab," ACM International Conference on Multimedia, Brisbane, Australia, pp. 689-692, 2015.
  • [29] H. Zhang, J. E. Fritts, S. A. Goldman, "Image segmentation evaluation: A survey of unsupervised methods," Computer Vision and Image Understanding, vol. 110, no. 2, pp. 260-280, 2008.
  • [30] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," Advances in Neural Information Processing Systems, Vancouver, Canada, pp 545-552, 2006.
  • [31] Y. Xu, J. Li, J. Chen, G. Shen, Y. Gao, "A novel approach for visual saliency detection and segmentation based on objectness and top-down attention," International Conference on Image, Vision and Computing, Chengdu, China, pp 361-365, 2017.
  • [32] J. Lankinen, V. Kangas, J. Kamarainen, "A comparison of local feature detectors and descriptors for visual object categorization by intra-class repeatability and matching," International Conference on Pattern Recognition, Tsukuba, Japan, pp 780-783, 2012.
  • [33] A. Oliva, A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," International Journal of Computer Vision, vol. 42, pp. 145-175, 2001.
  • [34] N. Dalal, B. Triggs, "Histograms of oriented gradients for human detection," IEEE Computer Vision and Pattern Recognition, San Diego, CA, US, pp. 886-893, 2005.
  • [35] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91-110, 2004.
  • [36] H. Bay, T. Tuytelaars, L. Van Gool, "SURF: Speeded up robust features," European Conference on Computer Vision, Graz, Austria, pp 404-417, 2006.
  • [37] K. He, X. Zhang, S. Ren, J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," ArXiv.CoRR, vol. abs/1406.4729, pp 1-13, 2014.
  • [38] C. Hentschel, H. Sack, "Does one size really fit all?: Evaluating classifiers in bag-of-visual-words classification," International Conference on Knowledge Technologies and Data-driven Business, New York, NY, USA, pp. 7:1-7:8, 2014.
  • [39] L. I. Kuncheva, "On the optimality of naïve Bayes with dependent binary features," Pattern Recognition Letters, vol. 27, pp. 830-837, 2006.
  • [40] K. Lenc, A. Vedaldi, "Understanding image representations by measuring their equivariance and equivalence," IEEE Conference On Computer Vision and Pattern Recognition, Boston, MA, US, pp. 991-999, 2015.