Supervised Learning Approaches to Flight Delay Prediction

Delays in flights and other airline operations have significant consequences for quality of service, operational costs, and customer satisfaction. It is therefore important to predict delays and take the necessary actions in advance. In this study, we addressed the flight delay prediction problem from a supervised machine learning perspective. Using a real-world airline operations dataset provided by a leading airline company, we identified the dataset features that yield the best prediction accuracy. In addition, we trained and tested 11 machine learning models on the datasets that we created from the original dataset via feature selection and transformation. CART and KNN performed consistently well in almost all cases, achieving F-Scores of 0.816 and 0.807, respectively. Similarly, GBM, XGB, and LGBM performed very well in most cases, achieving F-Scores around 0.810.

Keywords: air transportation, flight delay prediction, machine learning, data science

Mehmet Cemal ATLIOĞLU¹, Mustafa BOLAT¹, Murat ŞAHİN², Volkan TUNALI*³, Deniz KILINÇ⁴

¹ Tav Technology, İstanbul. E-Mail: MehmetCemal.Atlioglu@tav.aero, Mustafa.Bolat@tav.aero. ORCID: https://orcid.org/0000-0003-1289-2715, https://orcid.org/0000-0001-8169-0629
² Manisa Celal Bayar University, Faculty of Technology. E-Mail: muratpq@gmail.com. ORCID: https://orcid.org/0000-0002-2866-8796
³ Maltepe University, Faculty of Engineering and Natural Sciences. ORCID: https://orcid.org/0000-0002-2735-7996
⁴ İzmir Bakırçay University, Faculty of Engineering and Architecture
* Corresponding Author: volkan.tunali@gmail.com
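The F-Scores reported in the abstract can be read with the following minimal sketch in mind. It assumes the binary F1 definition (the harmonic mean of precision and recall for the "delayed" class); the abstract does not say which F-Score variant or averaging scheme the study used, so this is an illustration rather than the authors' evaluation code. The function name `f_score` and the toy labels are hypothetical and are not taken from the airline dataset.

```python
def f_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall.

    Assumption: binary labels, with `positive` marking a delayed flight.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correctly predicted delays: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration: 1 = delayed, 0 = on time.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

score = f_score(y_true, y_pred)
print(round(score, 3))
```

With these toy labels there are four true positives, one false positive, and one false negative, so precision and recall are both 0.8 and the F-Score is 0.8. A score near 0.81, as reported for CART and KNN, would indicate a similar balance between missed delays and false alarms under this definition.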

___

Sakarya Üniversitesi Fen Bilimleri Enstitüsü Dergisi
  • ISSN: 1301-4048
  • Publication frequency: 6 issues per year
  • First published: 1997
  • Publisher: Sakarya Üniversitesi Fen Bilimleri Enstitüsü