An Educational Approach to Higgs Boson Hunting Using Machine Learning Classification Algorithms on ATLAS Open Data

This study investigates the performance of several classification algorithms for separating the H → ττ signal from background. The data set is the publicly available ATLAS Open Data sample released for the Higgs Boson Machine Learning (ML) Challenge; it was produced with the full ATLAS simulation of proton-proton collisions. It contains 250 thousand events, 70% of which were used to train the algorithms. The primary objective is to distinguish signal events from background events using various ML methods in the context of high-energy physics. Six classification algorithms were applied to this binary classification problem and their performance was compared: Linear Support Vector Machine (SVM), Radial SVM, Logistic Regression, K-Nearest Neighbours, the XGBoost Classifier, and the AdaBoost Classifier. The best results were obtained with the XGBoost Classifier, which reached an area under the ROC curve (AUC) of 0.84 ± 1.9 × 10⁻³, followed by the AdaBoost Classifier with an AUC of 0.82 ± 2.5 × 10⁻³.
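The AUC figure of merit quoted above can be illustrated with a minimal, standard-library Python sketch. It computes the AUC as the probability that a randomly chosen signal event receives a higher classifier score than a randomly chosen background event, and then reports a mean and spread over repeated 30% hold-out samples. The toy Gaussian scores, the signal fraction, the helper names `roc_auc` and `auc_over_holdouts`, and the use of the hold-out spread as the "±" value are all assumptions made for illustration; the abstract does not state how the quoted uncertainties were obtained, and this is not the study's own code.

```python
import random
import statistics


def roc_auc(labels, scores):
    """Rank-based AUC: P(score of signal > score of background), ties count 1/2."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both signal (1) and background (0) events are required")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def auc_over_holdouts(labels, scores, n_repeats=5, test_fraction=0.3, seed=0):
    """Mean and sample standard deviation of the AUC over random hold-out sets."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_test = int(len(indices) * test_fraction)
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test = indices[:n_test]
        aucs.append(roc_auc([labels[k] for k in test], [scores[k] for k in test]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Toy stand-in for classifier output (hypothetical numbers, not ATLAS data):
# signal scores are drawn from a Gaussian shifted upward by one unit.
toy_rng = random.Random(42)
toy_labels = [1 if toy_rng.random() < 0.34 else 0 for _ in range(1000)]
toy_scores = [toy_rng.gauss(1.0 if y else 0.0, 1.0) for y in toy_labels]

mean_auc, std_auc = auc_over_holdouts(toy_labels, toy_scores)
print(f"AUC = {mean_auc:.3f} ± {std_auc:.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the values reported for XGBoost and AdaBoost sit well above chance. In a real analysis the scores would come from a classifier trained on the 70% training portion and evaluated on the remaining events.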
