Exhaustive hard triplet mining loss for Person Re-Identification

Person re-identification (Re-ID) is an important task in computer vision and has many applications in video-based surveillance. Recently, the triplet loss has become popular in deep learning frameworks for person Re-ID. Notably, the selection of hard triplets has a significant influence on the performance of the learned deep model. However, existing triplet losses focus only on some specific forms of hard triplets, which leads to weaker generalization capability. To address this issue, we propose a novel variant of the triplet loss, named exhaustive hard triplet mining loss (EHTM), which is able to deal with various forms of hard triplets in a comprehensive manner. Moreover, the proposed loss comprises a term that facilitates distinguishing different identities by directly narrowing intraclass distances and indirectly enlarging interclass distances. We also provide an effective training strategy to further enhance model performance. Extensive experiments on several benchmark datasets show that our method outperforms state-of-the-art approaches by a large margin.
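
For readers unfamiliar with hard triplet mining, the following minimal sketch may help make the two ingredients mentioned above concrete. It is not the EHTM formulation, which the abstract does not specify. It assumes the common batch-hard selection rule (farthest positive and closest negative per anchor) together with a simple centroid-based intraclass term. The function names, the margin of 0.3, and the weight of 0.1 are all hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_distances(embeddings):
    """Euclidean distance between every pair of rows."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss (an assumed stand-in, not EHTM itself).

    For each anchor, the hardest positive is the farthest sample with the
    same identity and the hardest negative is the closest sample with a
    different identity.
    """
    dist = pairwise_distances(embeddings)
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    hardest_pos = np.where(same & not_self, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    # Skip anchors that lack a positive or a negative in this batch.
    valid = np.isfinite(hardest_pos) & np.isfinite(hardest_neg)
    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return float(losses.mean()) if losses.size else 0.0


def intraclass_compactness(embeddings, labels):
    """Mean squared distance of each sample to its identity centroid."""
    total = 0.0
    for identity in np.unique(labels):
        members = embeddings[labels == identity]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total / len(labels))


def combined_loss(embeddings, labels, margin=0.3, weight=0.1):
    """Triplet term plus a weighted intraclass term (weight is hypothetical)."""
    return (batch_hard_triplet_loss(embeddings, labels, margin)
            + weight * intraclass_compactness(embeddings, labels))


# Toy batch: two identities with two 2-D embeddings each.
toy_embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 3.0]])
toy_labels = np.array([0, 0, 1, 1])
print(combined_loss(toy_embeddings, toy_labels))
```

In this sketch the intraclass term only pulls samples toward their own identity centroid. Any enlargement of interclass distances comes indirectly, through the triplet term, which loosely mirrors the "directly narrowing, indirectly enlarging" description in the abstract.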
