A supervised learning approach for detecting erroneous samples in embeddings

Visualizing multidimensional data has become a crucial task in recent years, given the growing amount of data from various sources. Dimensionality reduction algorithms are used to reduce the number of dimensions so that the data can be visualized on a screen. However, these algorithms may fail to represent high-dimensional data faithfully in lower dimensions, which leads to erroneous visualizations. In this work, we propose an error detection algorithm for dimensionality reduction based on recently developed error prediction algorithms for medical image registration. The proposed algorithm matches the neighborhoods of the high- and low-dimensional data using different similarity measures and predicts the errors with a random forest classifier. Results on three datasets show that the proposed algorithm can detect errors with an accuracy of up to 86% and an area under the curve score of 0.81.
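The neighborhood-matching idea can be illustrated with a minimal Python sketch. The abstract does not state which similarity measures are used, so the Jaccard overlap of k-nearest-neighbor sets below is an assumption, as are the function names and the toy data. The random forest step is omitted; the sketch only shows how a per-sample feature comparing the two spaces might be computed.

```python
import math

def knn_indices(points, i, k):
    """Return the indices of the k nearest neighbors of points[i]
    (Euclidean distance), excluding the point itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return {j for _, j in dists[:k]}

def neighborhood_overlap(high, low, k):
    """Per-sample Jaccard overlap between the k-NN set in the
    high-dimensional space and the k-NN set in the embedding.
    (Assumed similarity measure, chosen for illustration only.)"""
    scores = []
    for i in range(len(high)):
        nh = knn_indices(high, i, k)
        nl = knn_indices(low, i, k)
        scores.append(len(nh & nl) / len(nh | nl))
    return scores

# Hypothetical toy data: four 3-D points forming two tight pairs.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 5.0, 0.0), (1.0, 5.0, 0.0)]
# A 1-D embedding that keeps the pairs together ...
good_low = [(0.0,), (1.0,), (10.0,), (11.0,)]
# ... and one that swaps partners, breaking every neighborhood.
bad_low = [(0.0,), (10.0,), (1.0,), (11.0,)]

good_scores = neighborhood_overlap(high, good_low, k=1)
bad_scores = neighborhood_overlap(high, bad_low, k=1)
```

A low overlap score flags a sample whose neighborhood was not preserved by the embedding. In a supervised setting such scores, computed with several measures, could serve as input features for a classifier that predicts whether a sample is embedded erroneously.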

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6
  • Publisher: TÜBİTAK