DEEP LEARNING-BASED APPROACH FOR MISSING DATA IMPUTATION

Missing values in datasets are a problem that degrades machine learning performance. New methods are continually proposed to overcome this problem, including statistical, machine learning, evolutionary, and deep learning methods. Although deep learning is one of today's popular topics, studies applying it to missing data imputation are limited. Several deep learning techniques have been used to handle missing data; one of them is the auto-encoder, along with its denoising and stacked variants. In this study, the missing values in three different real-world datasets were estimated using the denoising auto-encoder (DAE), k-nearest neighbor (kNN), and multivariate imputation by chained equations (MICE) methods. The estimation success of the methods was compared according to the root mean square error (RMSE) criterion. The DAE method was observed to be more successful than the other methods in estimating the missing values.

Keywords:

Deep learning, auto-encoder, denoising auto-encoder, missing data
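
The evaluation protocol summarized in the abstract (hide known values, impute them, and score each method by RMSE on the hidden cells) can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a simple kNN imputer, adds a column-mean baseline that the study does not mention, and omits DAE and MICE. The data are synthetic rather than the study's three datasets, and all function names (rmse, mask_values, mean_impute, knn_impute, score) are hypothetical.

```python
import math
import random

def rmse(true_vals, pred_vals):
    """Root mean square error between two equal-length sequences."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals)) / n)

def mask_values(rows, frac, rng):
    """Hide a fraction of cells (set to None); return the masked copy and hidden positions."""
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    hidden = rng.sample(cells, int(frac * len(cells)))
    masked = [list(r) for r in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden

def mean_impute(masked):
    """Baseline (an assumption, not from the study): fill with the column mean of observed values."""
    means = []
    for j in range(len(masked[0])):
        obs = [r[j] for r in masked if r[j] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in masked]

def knn_impute(masked, k=3):
    """Fill each missing cell with the average over the k nearest rows that observe that column."""
    fallback = mean_impute(masked)
    filled = [list(r) for r in masked]
    for i, row in enumerate(masked):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(masked):
                if m == i or other[j] is None:
                    continue
                # Mean squared difference over features observed in both rows.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((sum(shared) / len(shared), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [val for _, val in candidates[:k]]
            # Fall back to the column mean when no comparable neighbor exists.
            filled[i][j] = sum(nearest) / len(nearest) if nearest else fallback[i][j]
    return filled

def score(original, imputed, hidden):
    """RMSE computed only on the cells that were deliberately hidden."""
    return rmse([original[i][j] for i, j in hidden],
                [imputed[i][j] for i, j in hidden])

# Synthetic, strongly correlated features (illustrative only).
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append([x, 2 * x + rng.gauss(0, 0.1), -x + rng.gauss(0, 0.1)])

masked, hidden = mask_values(data, 0.10, rng)
mean_score = score(data, mean_impute(masked), hidden)
knn_score = score(data, knn_impute(masked, k=3), hidden)
print(f"mean RMSE={mean_score:.3f}  kNN RMSE={knn_score:.3f}")
```

Scoring only the hidden cells keeps the comparison fair: observed cells are copied through unchanged by every imputer, so including them would only dilute the differences between methods.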
