A collective learning approach for semi-supervised data classification

Semi-supervised data classification is an important field of study in machine learning and data mining because it deals with datasets that contain a small number of labeled and a large number of unlabeled examples. Since most real-life datasets have this property, many researchers are interested in this field. In this paper, a collective method is proposed for solving semi-supervised data classification problems. To gain a better understanding of the subject, datasets defined in R1 are constructed and the proposed algorithms are applied to them. The well-known WEKA machine learning tool is used for comparison with state-of-the-art techniques. The experiments are carried out on real-life datasets from the UCI dataset repository. The evaluation results, obtained using 10-fold cross-validation, are presented in tables.

A collective learning approach for semi-supervised data classification

Semi-supervised data classification is a significant field of study in machine learning and data mining, since it deals with datasets that consist of both a few labeled and many unlabeled examples. Researchers are interested in this field because most real-life datasets have this property. In this paper we propose a collective method for solving semi-supervised data classification problems. Examples in R1 are presented and solved to gain a clear understanding. The well-known machine learning tool WEKA is used for comparison with state-of-the-art methods. Experiments are carried out on real-world datasets from the UCI dataset repository. Results are reported in tables as test accuracies obtained with ten-fold cross-validation.
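
A minimal, hedged sketch of the evaluation protocol described above, assuming Python with scikit-learn: a generic self-training classifier stands in for the paper's collective method (whose details are not given here), the bundled breast-cancer data stands in for a UCI dataset, and roughly 90% of the training labels are hidden to simulate the semi-supervised setting; test accuracy is then averaged over ten stratified folds. Marking unlabeled examples with -1 follows scikit-learn's convention for semi-supervised estimators.

```python
# Hedged sketch: a generic self-training baseline (NOT the collective
# method of the paper) evaluated with 10-fold cross-validation, in the
# spirit of the protocol described in the abstract.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for a UCI dataset
rng = np.random.RandomState(0)

accuracies = []
for train_idx, test_idx in StratifiedKFold(
        n_splits=10, shuffle=True, random_state=0).split(X, y):
    X_train, y_train = X[train_idx], y[train_idx].copy()
    X_test, y_test = X[test_idx], y[test_idx]

    # Simulate the semi-supervised setting: hide ~90% of the training
    # labels; -1 is scikit-learn's marker for "unlabeled".
    y_train[rng.rand(len(y_train)) > 0.1] = -1

    model = SelfTrainingClassifier(DecisionTreeClassifier(random_state=0))
    model.fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))

print(f"mean 10-fold test accuracy: {np.mean(accuracies):.3f}")
```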
