CLASSIFICATION OF GALAXIES IN THE SHAPLEY CONCENTRATION REGION WITH MACHINE LEARNING

Galaxies are systems of stars, gas, dust, and dark matter held together by gravity. There are billions of galaxies in the universe. Since examining each galaxy individually is costly, galaxy classification occupies an important place in astronomical data analysis. Galaxies are classified according to their morphologies and spectral properties. Machine learning methods, which aim to uncover hidden patterns within a data set, can analyze the available data and be used to predict which group galaxies whose natural groups have not yet been determined belong to. This saves both researchers and astronomers time and cost. In this study, 4215 galaxies in the Shapley Concentration region were classified using five variables (right ascension, declination, magnitude, velocity, and velocity uncertainty). The galaxies, whose natural groups were identified with IDL, were then classified with machine learning algorithms in the Weka program. The Bayesian classifiers Naive Bayes and BayesNet; the decision tree algorithms J48, LMT, and Random Forest; the Multilayer Perceptron artificial neural network; and support vector classifiers were used. The resulting classifications were compared with the natural groups, and the predictive performance of the methods was evaluated.

CLASSIFICATION OF GALAXIES IN THE SHAPLEY CONCENTRATION REGION WITH MACHINE LEARNING

Galaxies are systems consisting of stars, gas, dust, and dark matter bound together by gravity. There are billions of galaxies in the universe. Since the cost of examining each galaxy one by one is high, galaxy classification is an important part of astronomical data analysis. Galaxies are classified according to their morphologies and spectral properties. Machine learning methods, which aim to reveal hidden patterns within a data set by analyzing the available data, can be used to predict the groups of galaxies whose natural groups have not yet been identified. This saves time and cost for both researchers and astronomers. In this study, 4215 galaxies in the Shapley Concentration region were classified using five variables (right ascension, declination, magnitude, velocity, and sigma of velocity). Galaxies whose natural groups were determined with IDL were classified using machine learning algorithms in the Weka program. The Bayesian classifier methods Naive Bayes and BayesNet; the decision tree methods J48, LMT, and Random Forest; the Multilayer Perceptron artificial neural network; and support vector classifier methods were used. The resulting classifications were compared with the natural groups, and the predictive performance of the methods was evaluated.
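The workflow described above (load the five-variable catalogue, treat the IDL-derived natural group as the class label, train several Weka classifiers, and compare their predictions against the natural groups) can also be scripted instead of driven through the Weka GUI. The sketch below is a minimal illustration, not the authors' actual code: it assumes the catalogue has been exported to an ARFF file (the file name shapley.arff is hypothetical) whose last attribute is the natural-group label, and it uses 10-fold cross-validation simply as a common default, since the abstract does not state the validation protocol. SMO is Weka's support vector classifier.

    // Minimal sketch, assuming a hypothetical shapley.arff export of the
    // five-variable galaxy table with the natural group as the last attribute.
    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.BayesNet;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.trees.J48;
    import weka.classifiers.trees.LMT;
    import weka.classifiers.trees.RandomForest;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ShapleyClassification {
        public static void main(String[] args) throws Exception {
            // Load the galaxy data and mark the natural group as the class attribute.
            Instances data = DataSource.read("shapley.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // The seven classifiers compared in the study.
            Classifier[] models = {
                new NaiveBayes(), new BayesNet(),
                new J48(), new LMT(), new RandomForest(),
                new MultilayerPerceptron(), new SMO()  // SMO = Weka's SVM classifier
            };

            for (Classifier model : models) {
                // 10-fold cross-validation (an assumed default); accuracy and kappa
                // summarize agreement with the natural groups.
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(model, data, 10, new Random(1));
                System.out.printf("%-25s accuracy = %.2f%%  kappa = %.3f%n",
                        model.getClass().getSimpleName(), eval.pctCorrect(), eval.kappa());
            }
        }
    }

Running the class prints one accuracy and one kappa value per algorithm, which is the kind of head-to-head comparison of predictive performance summarized in the study.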
