Classification and Regression Using Automatic Machine Learning (AutoML) – Open Source Code for Quick Adaptation and Comparison

This paper explores automatic machine learning (AutoML) tools for classification and regression tasks. The focus is on illustrating how these tools can accelerate and optimize the machine learning workflow, thereby making it more accessible to non-experts. We first discuss the fundamental principles of AutoML, including its key features: automated data preprocessing, feature engineering, model selection, hyperparameter tuning, and model validation. We then turn to hands-on application, implementing classification and regression tasks with several popular open-source AutoML tools and giving illustrative examples of their use. We provide open-source code samples for two data scenarios covering classification and regression, designed to help readers quickly adapt AutoML tools to their own projects and compare the performance of different tools. We believe this contribution will help both practitioners and researchers use AutoML for efficient and effective machine learning model development.
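The automation steps named above (model selection, hyperparameter tuning, and model validation) can be illustrated with a toy search loop. The following is a minimal sketch in plain Python under stated assumptions, not the API or search strategy of any AutoML tool covered in the paper: the synthetic data set, the candidate models, and all function names (`make_data`, `knn_fit`, `cv_accuracy`, `auto_select`) are hypothetical and chosen only for illustration.

```python
import random


def make_data(n=120, seed=0):
    """Hypothetical toy data: two features in [0, 1); label is 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [1 if a + b > 1.0 else 0 for a, b in X]
    return X, y


def majority_fit(X, y):
    """Baseline candidate: always predict the most frequent training label."""
    label = 1 if sum(y) * 2 >= len(y) else 0
    return lambda x: label


def knn_fit(k):
    """Return a fit function for a k-nearest-neighbour classifier (k is the hyperparameter)."""
    def fit(X, y):
        def predict(x):
            # Rank training points by squared Euclidean distance to x.
            order = sorted(
                range(len(X)),
                key=lambda i: (X[i][0] - x[0]) ** 2 + (X[i][1] - x[1]) ** 2,
            )
            votes = sum(y[i] for i in order[:k])
            return 1 if votes * 2 > k else 0
        return predict
    return fit


def cv_accuracy(fit, X, y, folds=5):
    """Model validation: mean accuracy over interleaved cross-validation folds."""
    scores = []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        model = fit(train_X, train_y)
        hits = sum(1 for i in test_idx if model(X[i]) == y[i])
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def auto_select(X, y):
    """Model selection plus a small hyperparameter grid, ranked by validation score."""
    candidates = {"majority": majority_fit}
    for k in (1, 3, 5, 7):
        candidates[f"knn(k={k})"] = knn_fit(k)
    board = [(cv_accuracy(fit, X, y), name) for name, fit in candidates.items()]
    return sorted(board, reverse=True)  # best candidate first


X, y = make_data()
leaderboard = auto_select(X, y)
for score, name in leaderboard:
    print(f"{name:12s} {score:.3f}")
```

Real AutoML tools replace the fixed candidate list with far larger search spaces and smarter search strategies, and they add automated preprocessing and feature engineering, but the resulting leaderboard plays the same role: it ranks candidates by a validation score so that tools and models can be compared on equal terms.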
