Çağatay ÇATAL, Can MURATLI, Egemen ERTUĞRUL, Zakir BAYTAR

Performance tuning for machine learning-based software development effort prediction models

Software development effort estimation is a critical activity of the project management process. In this study, machine learning algorithms were investigated in conjunction with feature transformation, feature selection, and parameter tuning techniques to estimate the development effort accurately, and a new model was proposed as part of an expert system. We preferred the most general-purpose algorithms and applied a parameter optimization technique (GridSearch), feature transformation techniques (binning and one-hot encoding), and a feature selection algorithm (principal component analysis). All the models were trained on the ISBSG datasets and implemented using the scikit-learn package in Python. The proposed model uses a multilayer perceptron as its underlying algorithm, applies binning to transform continuous features and one-hot encoding to transform categorical data into numerical values, performs feature selection based on principal component analysis, and tunes parameters with the GridSearch algorithm. We demonstrate that our effort prediction model mostly outperforms the other existing models in terms of prediction accuracy, as measured by the mean absolute residual.
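
The pipeline described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the data are synthetic stand-ins (not ISBSG), and the feature names, bin count, grid values, and network sizes are assumptions chosen only to keep the example small. It chains binning, one-hot encoding, principal component analysis, and a multilayer perceptron, tunes them with `GridSearchCV`, and reports the mean absolute residual (the mean of |actual - predicted|) on held-out data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Synthetic stand-in data (assumption, NOT ISBSG): two continuous features
# and one categorical feature stored as an integer code.
rng = np.random.default_rng(0)
n = 200
size = rng.uniform(10, 1000, n)   # hypothetical functional size
team = rng.uniform(1, 20, n)      # hypothetical team size
lang = rng.integers(0, 3, n)      # hypothetical language category
X = np.column_stack([size, team, lang])
y = 0.02 * size + 0.5 * team + 2.0 * lang + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature transformation: bin the continuous columns (one-hot encoded bins)
# and one-hot encode the categorical column. sparse_threshold=0 forces a
# dense output, which PCA requires.
transform = ColumnTransformer(
    [
        ("bins", KBinsDiscretizer(n_bins=5, strategy="uniform"), [0, 1]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), [2]),
    ],
    sparse_threshold=0,
)

pipeline = Pipeline(
    [
        ("transform", transform),
        ("pca", PCA()),
        ("mlp", MLPRegressor(solver="lbfgs", max_iter=500, random_state=0)),
    ]
)

# Parameter tuning: an illustrative grid; the values are assumptions.
param_grid = {
    "pca__n_components": [5, 8],
    "mlp__hidden_layer_sizes": [(8,), (16,)],
    "mlp__alpha": [1e-4, 1e-2],
}
search = GridSearchCV(
    pipeline, param_grid, scoring="neg_mean_absolute_error", cv=3
)
search.fit(X_train, y_train)

# Mean absolute residual on the held-out projects.
mar = mean_absolute_error(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print("mean absolute residual:", mar)
```

Placing every step inside one `Pipeline` means the bin edges, category vocabulary, and principal components are re-fitted on each training fold during the search, so the cross-validated scores are not contaminated by the validation folds.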

