Evaluation of Machine Learning Hyperparameters Performance for Mice Protein Expression Data in Different Situations

This study assessed the effect and importance of hyperparameters for two machine-learning methods, support vector machines and artificial neural networks, on four datasets that differ in the number of observations and the number of variables. The source data comprised 15 repeated measurements of 77 protein levels from 38 healthy mice and 34 mice with Down syndrome. A total of 138 models were obtained from the datasets using different combinations of hyperparameters, and classification performance criteria were computed for each model. The models were compared using classification accuracy, the kappa statistic, mean absolute error, and root mean squared error. For the first dataset (1080 observations x 77 variables), the support vector machine with a polynomial kernel function reached 71.30% classification accuracy with default parameters, and changing the hyperparameter values raised this to 99.44%. Similarly, for the second dataset, the artificial neural network with a single hidden layer of 2 neurons reached 50.65% classification accuracy, and changing the hyperparameter values raised this to 90.46%. In conclusion, the machine-learning methods performed worse when the numbers of variables and observations were low. Nevertheless, setting hyperparameters according to the dataset is very important for the classification performance of artificial neural networks and of support vector machines, especially with polynomial and radial basis function kernels. The effect of hyperparameters was found to be especially important when the number of variables is low.
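The four comparison criteria named above can be illustrated with a short calculation. The following is a minimal sketch, not the study's own implementation. It assumes a two-class problem with labels coded 0 and 1, and it assumes the two error measures are computed on the predicted probability of class 1; the function name `classification_metrics` and the toy inputs are hypothetical.

```python
import math


def classification_metrics(y_true, y_pred, p_pos):
    """Compute accuracy, Cohen's kappa, MAE and RMSE for a binary classifier.

    y_true: true labels (0 or 1)
    y_pred: predicted labels (0 or 1)
    p_pos:  predicted probability of class 1 (assumption: the error
            measures are taken on probabilities, not on hard labels)
    """
    n = len(y_true)

    # Fraction of observations classified correctly.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the marginal class frequencies.
    true_rate = sum(y_true) / n
    pred_rate = sum(y_pred) / n
    expected = true_rate * pred_rate + (1 - true_rate) * (1 - pred_rate)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    # Error of the predicted probability against the 0/1 label.
    errors = [p - t for t, p in zip(y_true, p_pos)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    return {"accuracy": accuracy, "kappa": kappa, "mae": mae, "rmse": rmse}


# Toy example with made-up values, for illustration only.
result = classification_metrics(
    y_true=[1, 1, 0, 0],
    y_pred=[1, 0, 0, 0],
    p_pos=[0.9, 0.4, 0.2, 0.1],
)
print(result)
```

Accuracy and kappa reward correct labels, while the two error measures also penalize correct predictions made with low confidence, which is one reason to report all four when comparing hyperparameter settings.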
