An Ordinal Classification Approach for Software Bug Prediction

Software bug prediction is the use of classification and/or regression algorithms to predict the presence of possible errors (or defects) in source code. However, classification studies in the literature treat the target attribute in the datasets as either binary (buggy or non-buggy) or unordered, so they ignore the inherent order among class values such as zero, few, and many bugs. To overcome this drawback, this study proposes a novel approach that applies ordinal classification methods to the software bug prediction problem. The article compares ordinal and nominal versions of several classification algorithms (random forest, support vector machine, Naive Bayes, and k-nearest neighbor) in terms of classification performance on 38 real-world software engineering datasets. The results indicate that the ordinal classification approach achieves better classification accuracy on average than the traditional (nominal) solutions.
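The abstract does not state which ordinal method the study applies, so the following is only a minimal sketch of one well-known option: the decomposition of Frank and Hall (2001), which turns a K-class ordinal problem into K-1 binary questions of the form "is the class greater than level t?". The class name `OrdinalClassifier`, the helper `knn_probability`, the single feature, and the toy values are all hypothetical and chosen for illustration; they are not taken from the paper or its datasets.

```python
from typing import List, Sequence

Vector = Sequence[float]


def knn_probability(train_x: List[Vector], train_y: List[int],
                    query: Vector, k: int = 3) -> float:
    """Fraction of the k nearest training points whose binary label is 1."""
    def squared_distance(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))

    nearest = sorted(range(len(train_x)), key=squared_distance)[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


class OrdinalClassifier:
    """Hypothetical wrapper: Frank and Hall style ordinal decomposition.

    Labels must be integers 0..num_classes-1 whose numeric order matches
    the ordinal order (for example 0 = zero, 1 = few, 2 = many bugs).
    """

    def __init__(self, num_classes: int, k: int = 3) -> None:
        self.num_classes = num_classes
        self.k = k

    def fit(self, X: List[Vector], y: List[int]) -> "OrdinalClassifier":
        self.X = list(X)
        # One binary target per threshold t: 1 if the label is greater than t.
        self.binary_targets = [
            [1 if label > t else 0 for label in y]
            for t in range(self.num_classes - 1)
        ]
        return self

    def predict_proba(self, x: Vector) -> List[float]:
        # greater[t] estimates P(class > t) for each threshold t.
        greater = [
            knn_probability(self.X, target, x, self.k)
            for target in self.binary_targets
        ]
        probs = [1.0 - greater[0]]                      # P(lowest class)
        for t in range(1, self.num_classes - 1):
            probs.append(greater[t - 1] - greater[t])   # P(middle class t)
        probs.append(greater[-1])                       # P(highest class)
        # Guard against small negative values from inconsistent estimates.
        return [max(p, 0.0) for p in probs]

    def predict(self, x: Vector) -> int:
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)


# Toy data (invented): one size-like metric per module, ordinal bug level.
train_x = [[10], [12], [15], [40], [45], [50], [90], [95], [100]]
train_y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

model = OrdinalClassifier(num_classes=3, k=3).fit(train_x, train_y)
print([model.predict(q) for q in ([11], [47], [97])])  # prints [0, 1, 2]
```

A nominal classifier would treat the three levels as unrelated labels, whereas each binary sub-problem here uses the ordering directly. The k-nearest neighbor base learner is only a stand-in; any classifier that outputs class probabilities could fill the same role.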

___

Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi
  • ISSN: 1302-9304
  • Publication frequency: 3 issues per year
  • First published: 1999
  • Publisher: Dokuz Eylül Üniversitesi Mühendislik Fakültesi