A New Perspective on Machine Learning: Automated Machine Learning (AutoML)

In extracting value from data, a high-quality machine learning life cycle depends not only on obtaining sound data but also on the collaboration of the right tools and the right people. Although technological advances have made many new and successful tools available for this cycle, the shortage of skilled practitioners creates a significant bottleneck. Automated Machine Learning (AutoML) is used to overcome this bottleneck by making a process that depends on human experience more independent and democratic. This study covers the concept of AutoML and the basic approaches used in the tools developed for it. It also describes the scope of several tools: open-source tools, startup-backed tools, and tools developed by technology giants. The success that AutoML can achieve in collaboration with humans is presented through an experiment carried out with one data set and three teams. The results are consistent with the AutoML versus human competition held in May 2019 by Kaggle, an organizer of machine learning competitions.

Keywords:

AutoML, Machine Learning, Data

