Sigmoid-Gumbel: A New Hybrid Activation Function

This article proposes a new hybrid activation function, called Sigmoid-Gumbel (SG), that combines the strengths of previously published activation functions and outperforms them. Four experiments were conducted to evaluate the performance of the proposed function, using Sigmoid, Gumbel, ReLU, and Adaptive Gumbel as comparison functions, and MLP and CNN neural network models. The MLP network was used for the class-imbalance problem in binary classification with deep learning, while the CNN network was used for image classification. In the first experiment, 25 imbalanced datasets were run on the MLP network to demonstrate the effectiveness of the proposed function; SG achieved the highest mean AUC, 0.9013. In the second experiment, the proposed function was compared with the Sigmoid and Gumbel functions on the MNIST dataset using the CNN network; SG obtained the highest mean accuracy, 0.9921. In the third experiment, three variants of the proposed function were compared on the Fashion-MNIST dataset using the CNN network; SGv3 achieved the highest mean accuracy, 0.9351. In the fourth experiment, the proposed function was compared with the ReLU and Adaptive Gumbel functions on the MNIST dataset using the CNN network; the highest accuracy, 0.9926, was obtained by SG. Overall, the experimental results show that the proposed activation function is generally more successful than the compared functions.
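The abstract does not give SG's closed form. As a minimal sketch, assuming the hybrid blends the logistic sigmoid, sigmoid(x) = 1/(1 + exp(-x)), with the Gumbel activation taken as the standard Gumbel CDF, gumbel(x) = exp(-exp(-x)), the snippet below illustrates the idea with an equal-weight average. The function name sigmoid_gumbel and the 0.5 weighting are assumptions for illustration only, not the paper's exact SG definition or its SGv1-SGv3 variants.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def gumbel(x):
    """Gumbel activation (standard Gumbel CDF): exp(-exp(-x))."""
    return np.exp(-np.exp(-x))

def sigmoid_gumbel(x):
    # Hypothetical equal-weight blend; the paper's actual SG formula
    # (and its SGv1-SGv3 variants) may combine the terms differently.
    return 0.5 * (sigmoid(x) + gumbel(x))

# Sanity check: the blend stays in (0, 1), like both of its components.
xs = np.linspace(-4.0, 4.0, 9)
print(np.round(sigmoid_gumbel(xs), 4))
```

Because both components are smooth, monotone, and bounded in (0, 1), any convex combination of them inherits those properties, which is what makes this style of hybrid a drop-in replacement for Sigmoid in MLP and CNN layers.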
