A case study on player selection and team formation in football with machine learning

Machine learning has been widely used in many domains to extract information from raw data, and sports has recently become a popular domain for researchers. Although match score prediction is the most common application of artificial intelligence in sports, player selection and team formation are also worth studying. Existing studies on player selection and team formation are examined in this study. The study makes two main contributions. The first is applying seven different machine learning algorithms to our dataset to find the best player combination for the U13 team of Altınordu Football Academy and comparing the results with the coach’s lineup and with the lineups of 20 matches played in the 2019–2020 season. The second is combining the data obtained from the players’ training sessions with the coach evaluations of the players and feeding the combined data to the learning algorithms to make more accurate predictions. The training data were gathered with Hit/it Assistant, and the coach evaluations of the players, made according to eighteen criteria stated in the literature, serve as the gold standard. Synthetically generated data are also included in the final dataset to obtain more accurate classification results. Another notable aspect of the study is that no match data are used to form the team proposed for the next match; real match data are used only for evaluation. The results show that machine learning algorithms can be used for player selection and team formation: the random forest algorithm, run in the WEKA environment, makes player selections with 93.93% reliability; the lineup suggestions of these algorithms are 97.16% similar to the coach’s ideal team; and the best-performing algorithm achieves an average performance of 89.36% for team formation when compared with the match lineups of the 2019–2020 football season.
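The abstract reports lineup similarity as a percentage but does not say how it is measured. As an illustration only, the sketch below assumes a set-overlap measure (the Jaccard index) over player identifiers. The function name and the player IDs are hypothetical, and the study itself may compute similarity differently.

```python
def jaccard_similarity(lineup_a, lineup_b):
    """Return |A ∩ B| / |A ∪ B| for two lineups given as iterables of player IDs."""
    a, b = set(lineup_a), set(lineup_b)
    if not a and not b:
        # Two empty lineups are treated as identical (an assumption of this sketch).
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical player IDs; not taken from the study's dataset.
coach_lineup = {f"P{i:02d}" for i in range(1, 12)}           # P01 .. P11
model_lineup = {f"P{i:02d}" for i in range(1, 11)} | {"P12"}  # P01 .. P10 plus P12

similarity = jaccard_similarity(coach_lineup, model_lineup)
print(f"Lineup similarity: {similarity:.2%}")
```

Here the two lineups share 10 of 12 distinct players, so the index is 10/12. A position-aware comparison would need a different measure, since a set ignores where each player is fielded.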
