Combining classification algorithms using the Dempster-Shafer algorithm

The continuously growing volume of data makes it impossible to find and analyze the valuable information within large amounts of data using existing statistical methods. Because existing analysis tools are inadequate, new solutions have been developed for finding and extracting the valuable but hidden information within very large amounts of data. These solutions are data mining and data fusion. Data mining is the extraction of previously unknown but useful information from large amounts of data. It uses analysis tools that make it possible to discover patterns in the data and to extract relationships that can be used to make predictions about the future. Data fusion, in turn, is the process of combining information coming from different sensors. Data fusion algorithms are used in the defense sector for target tracking and target identification in intelligence, reconnaissance, and surveillance operations. Although data mining and data fusion are complementary processes, researchers work in these two fields independently of each other, without any interaction. Very few studies combine the techniques used in these fields to increase the effectiveness of classification. This study proposes a new method for improving classification results. The method combines the results obtained from different classification algorithms using Dempster's Rule of Combination while also taking the accuracy of the classifiers into account. Experiments on different data sets taken from the UCI repository showed that combination using Dempster's Rule of Combination gives more accurate results than each individual classification algorithm used in the combination and than the existing hybrid algorithms.

Combining classification algorithms using Dempster's rule of combination

The constantly growing volume of data makes it impossible to capture and analyze the valuable knowledge hidden in large amounts of data using current statistical methods. Because current analysis tools are insufficient, new solutions have been developed for extracting valuable but hidden knowledge from very large data sets. These solutions are data mining and data fusion. Data mining tries to extract implicit, previously unknown, and potentially useful information from large amounts of data. It is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions. Data fusion, on the other hand, is the process of combining information coming from different sensors. Data fusion algorithms are mostly used for target tracking and target identification in intelligence, surveillance, and reconnaissance operations in the defense sector. Although data mining and data fusion are complementary processes, researchers generally work in these two areas independently, without any interaction. Few studies combine the techniques used in these areas in order to improve classification performance. What is missing in most of the studies performed on this issue is uncertainty management: every classifier has uncertainty to some extent. The second issue with current classification algorithms is the lack of a degree of confidence. Degree of confidence is the success that a certain classifier has displayed on similar data sets in the past. A classification algorithm must be able to use degree of confidence in order to give more precise classification results. Dempster-Shafer's Method, in other words the evidence combination rule, is able to handle uncertainty. It is widely used in data fusion for combining evidence obtained from different sensors.
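For reference, the evidence combination rule mentioned above can be written in its standard textbook form. The notation is not taken from this abstract and is assumed here: m_1 and m_2 are the mass functions of two evidence sources over a frame of discernment Θ, and K is the total mass assigned to conflicting (disjoint) pairs of hypotheses.

```latex
% Dempster's rule of combination for two mass functions m_1, m_2 on \Theta
K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C), \qquad
(m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C)
\quad \text{for } A \neq \emptyset, \qquad
(m_1 \oplus m_2)(\emptyset) = 0 .
```

The rule is defined only when K < 1, that is, when the two sources are not in total conflict.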
Dempster-Shafer's Method does not require exact probability values in order to combine evidence. Pieces of information obtained from different sources, some of them incomplete, can be combined using Dempster's Rule of Combination. However, in its current form Dempster's Rule of Combination does not take degree of confidence into account. In our proposed method of combining classification algorithms using Dempster's Rule of Combination, we use degree of confidence during the combination in order to improve classification accuracy. The proposed method makes the following contributions: use of degree of confidence during the combination, uncertainty management in combining classifiers, and better classification results. In the proposed method, we first perform classification with different classification algorithms. Treating the results of the classifiers as beliefs, we calculate mass functions for each classifier. We then combine the mass values pairwise using Dempster's Rule of Combination. When there are more than two classifiers, we first combine the first two, then combine the result with the third classifier, and continue in this way until no classifiers remain. We perform different experiments using data sets taken from the UCI machine learning repository. First, we test the success rate of the current classifiers in WEKA on 10 different data sets from the UCI machine learning repository, using the default parameter values of the classifiers. We then check the success rate of the current hybrid classifiers on the same data sets, again with default values. Afterwards we test the proposed method of combining classifiers using Dempster's Rule of Combination on the same data sets.
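The steps described above (mass functions per classifier, then sequential pairwise combination) can be illustrated with a minimal Python sketch. The abstract does not state how the degree of confidence enters the combination, so the sketch assumes Shafer's classical discounting: each classifier's masses are scaled by its confidence and the remainder is assigned to the whole frame (total ignorance). All function names, the example class labels, and the example numbers are hypothetical and are not taken from the study.

```python
from itertools import product


def discount(mass, confidence, frame):
    """Shafer discounting (an assumption, not the paper's stated scheme):
    scale every mass by `confidence` and move the rest to the full frame."""
    out = {focal: confidence * value
           for focal, value in mass.items() if focal != frame}
    out[frame] = 1.0 - confidence + confidence * mass.get(frame, 0.0)
    return out


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.
    Focal elements are frozensets of class labels."""
    combined = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb  # mass falling on the empty set
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {focal: value / (1.0 - conflict)
            for focal, value in combined.items()}


def fuse(classifier_outputs, confidences, classes):
    """Turn each classifier's class probabilities into a discounted mass
    function, then combine them one after another (first two, then the
    result with the third, and so on)."""
    frame = frozenset(classes)
    masses = [
        discount({frozenset([c]): p for c, p in probs.items() if p > 0},
                 conf, frame)
        for probs, conf in zip(classifier_outputs, confidences)
    ]
    result = masses[0]
    for mass in masses[1:]:
        result = combine(result, mass)
    return result


# Hypothetical example: two classifiers voting on classes "a" and "b".
fused = fuse(
    [{"a": 0.8, "b": 0.2}, {"a": 0.6, "b": 0.4}],
    confidences=[0.9, 0.8],
    classes=["a", "b"],
)
predicted = max(fused, key=fused.get)
print(sorted(predicted), fused[predicted])
```

Under discounting, a less reliable classifier contributes more mass to the whole frame and therefore has less influence on the fused result. Other ways of weighting the sources are possible and may be closer to what the study actually uses.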
In the experiments we combine, in several combinations, four different classification algorithms selected as representatives of each class of classifiers. To allow a one-to-one comparison of the proposed method with the current hybrid classification algorithms, we also run experiments on the same data sets with the current hybrid algorithms that are capable of combining multiple classifiers. The results show that combining classifiers using Dempster's Rule of Combination with degree of confidence performs better not only than each of the classifiers taking part in the combination but also than the current hybrid classifiers. The results also show that using degree of confidence during the combination gives more precise classification results and decreases uncertainty in the combination.
