A supervised learning approach for detecting erroneous samples in embeddings

Visualizing multidimensional data has become a crucial task in recent years, given the growing amount of data from various sources. To achieve this, dimensionality reduction algorithms are used to reduce the number of dimensions so that the data can be displayed on a screen. However, these algorithms may fail to represent high-dimensional data faithfully in lower dimensions, which leads to erroneous visualizations. In this work, we propose an error detection algorithm for dimensionality reduction algorithms, based on recently developed error prediction algorithms for medical image registration. The proposed algorithm matches the neighborhoods of the high- and low-dimensional data using different similarity measures and predicts the errors using a random forest classifier. The results on three datasets show that the proposed algorithm can successfully detect errors, with an accuracy of up to 86% and an area under the curve (AUC) score of 0.81.
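The neighborhood-matching idea can be illustrated with a minimal sketch. The abstract does not state which similarity measures are used, so the Jaccard overlap between k-nearest-neighbor sets below is an assumption, and the names `knn_indices` and `neighbourhood_overlap` are hypothetical. The sketch covers only the computation of one per-sample feature; in the setting described above, such features would then be passed to a random forest classifier, which is not shown here.

```python
import math
import random


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, sorted ascending.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def neighbourhood_overlap(high, low, k):
    """Jaccard overlap between each sample's k-NN set in the high- and low-dimensional data.

    Assumption: Jaccard overlap stands in for the (unspecified) similarity measures.
    A value near 1 means the neighbourhood is preserved; a value near 0 suggests
    the sample may be misplaced in the embedding.
    """
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    return [len(a & b) / len(a | b) for a, b in zip(nn_high, nn_low)]


# Toy data (hypothetical): 30 points lying on a 2-D plane inside a 5-D space.
random.seed(0)
plane = [[random.random(), random.random()] for _ in range(30)]
high = [[x, y, 0.0, 0.0, 0.0] for x, y in plane]

low_good = [row[:] for row in plane]  # embedding that preserves all distances
low_bad = [[random.random(), random.random()] for _ in range(30)]  # unrelated layout

scores_good = neighbourhood_overlap(high, low_good, k=5)
scores_bad = neighbourhood_overlap(high, low_bad, k=5)

print("mean overlap, faithful embedding:", sum(scores_good) / len(scores_good))
print("mean overlap, random embedding:  ", sum(scores_bad) / len(scores_bad))
```

In this toy setup the distance-preserving embedding keeps every neighborhood intact, while the unrelated layout scrambles them, so the overlap score separates the two cases. The brute-force neighbor search is quadratic in the number of samples and is meant for illustration only.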

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK