Kemal AKYOL, Ümit ATİLA

Comparing The Effect of Under-Sampling and Over-Sampling on Traditional Machine Learning Algorithms for Epileptic Seizure Detection

Epilepsy, a neurological disorder that causes recurrent and sudden seizures, strikes at unpredictable times. This study presents the classification of electroencephalogram (EEG) signals for epileptic seizure prediction. The performance of several machine learning algorithms is evaluated on a dataset extracted from EEG signals. The dataset consists of 500 instances, each containing 4097 data points recorded over 23.5 seconds. Since the dataset is unbalanced, Random Under-Sampling and Random Over-Sampling are applied to it. The study is therefore conducted on three datasets: the original unbalanced dataset and its under-sampled and over-sampled versions. Each dataset is split under three scenarios: 60% training - 40% test, 70% training - 30% test, and 80% training - 20% test. The performance of the Diagonal Linear Discriminant Analysis, Linear Discriminant Analysis, Logistic Regression, and Random Forest algorithms on these datasets is assessed and discussed. Overall, Random Forest is the superior algorithm on all datasets in terms of accuracy, sensitivity, and specificity.
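
The two resampling strategies named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `random_under_sample` and `random_over_sample`, the toy feature matrix, and the 20/80 class ratio are all assumptions made for illustration only.

```python
import numpy as np


def random_under_sample(X, y, rng):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep = rng.permutation(keep)  # shuffle so classes are not grouped together
    return X[keep], y[keep]


def random_over_sample(X, y, rng):
    """Randomly duplicate rows so every class has as many rows as the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_target = counts.max()
    parts = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        # Draw the missing rows with replacement from the same class.
        extra = rng.choice(idx, size=n_target - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    keep = rng.permutation(np.concatenate(parts))
    return X[keep], y[keep]


# Hypothetical toy data: 20 positive and 80 negative rows with 5 features each.
rng = np.random.default_rng(0)
y = np.array([1] * 20 + [0] * 80)
X = rng.normal(size=(100, 5))

X_under, y_under = random_under_sample(X, y, rng)
X_over, y_over = random_over_sample(X, y, rng)

print(np.bincount(y_under), np.bincount(y_over))  # prints [20 20] [80 80]
```

Both helpers select row indices first and index `X` and `y` with the same array, so features and labels stay aligned. Under-sampling shrinks the toy set to 40 rows, while over-sampling grows it to 160 rows by repeating minority-class rows.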

Keywords:

Epileptic seizure, machine learning, unbalanced and balanced dataset, over-sampling, under-sampling.
