A cluster-based approach to reduce the pattern layer size of the generalized regression neural network

The Generalized Regression Neural Network (GRNN) is a supervised-learning Artificial Neural Network (ANN) model based on radial basis functions and commonly used for prediction. In addition to its simple modelling, its speed and consistent results are the strong points of this algorithm. However, the GRNN prediction mechanism keeps one pattern-layer neuron for each sample in the training data set. Therefore, in studies with very large training sets, the pattern layer grows in proportion to the number of training samples, and the number of operations and the memory requirement both increase. In this study, the k-means clustering algorithm, which has previously been tried in the literature, is used as a pre-processor to reduce the number of operations of the GRNN algorithm; unlike earlier studies, the formation of outliers, which negatively affected their performance, is prevented by identifying test data that fall between clusters. Thus, while the memory requirement and the number of operations in the pattern layer are reduced, the negative effect on performance introduced by the clustering algorithm is largely eliminated, and almost the same prediction results are obtained with approximately 90% less training data.

A cluster based approach to reduce pattern layer size for generalized regression neural network

The Generalized Regression Neural Network (GRNN) is a radial-basis-function-based, supervised-learning Artificial Neural Network (ANN) that is commonly used for data prediction. In addition to its simple modelling structure, its speed and accurate results are its other strong features. On the other hand, GRNN employs one pattern-layer neuron for each sample in the training data set. Therefore, for huge data sets the pattern layer size increases in proportion to the number of training samples, and the memory requirement and computation time also increase excessively. In this study, in order to reduce the space and time complexity of GRNN, the k-means clustering algorithm, which had previously been used as a pre-processor in the literature, is utilized; the emergence of outlier data, which affected the performance of previous studies negatively, is prevented by identifying test data located between clusters. Hence, while the memory requirement and the number of calculations in the pattern layer are reduced, the negative effect on performance introduced by the clustering algorithm is largely removed, and prediction performance almost identical to that of the standard GRNN is achieved using 90% fewer training samples.
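The core mechanism described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact procedure: it shows a standard GRNN (one pattern neuron per training sample) next to a cluster-reduced variant in which k-means centroids, paired with per-cluster mean targets, replace the full pattern layer. The paper's additional step of detecting test data that fall between clusters is omitted here, and all function names and parameter values are illustrative assumptions.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.1):
    """Standard GRNN: kernel-weighted average over every training sample."""
    # Squared Euclidean distances: (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))          # pattern-layer activations
    return (w @ y_train) / w.sum(axis=1)          # summation / division layers

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means (illustrative; a library version would also work)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():               # keep old center if cluster empties
                centers[j] = X[labels == j].mean(0)
    return centers, labels

def reduced_grnn(X_train, y_train, X_query, k=20, sigma=0.1):
    """Cluster-reduced GRNN: centroids + per-cluster mean targets as pattern neurons."""
    centers, labels = kmeans(X_train, k)
    nonempty = [j for j in range(k) if (labels == j).any()]
    y_c = np.array([y_train[labels == j].mean() for j in nonempty])
    return grnn_predict(centers[nonempty], y_c, X_query, sigma)
```

With, say, 200 training samples and k=20 clusters, the reduced pattern layer holds 90% fewer neurons while the two predictors remain close on smooth targets; the paper's between-cluster test-data handling would further control the accuracy loss.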

___

Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi (Pamukkale University Journal of Engineering Sciences)
  • ISSN: 1300-7009
  • First published: 1995
  • Publisher: PAMUKKALE ÜNİVERSİTESİ