Bayesian Networks and Association Analysis in the Knowledge Discovery Process

Data mining is a statistical process for extracting useful information, previously unknown patterns and interesting relationships from large databases. Many statistical methods can be used in this process; two of them are Bayesian networks and association analysis. Bayesian networks are graphical models that encode the probabilistic relationships among a set of random variables in a database. Because they have both causal and probabilistic aspects, they make it easy to combine information from data with expert knowledge. Bayesian networks can also represent knowledge about an uncertain domain and support strong inferences. Association analysis is a useful technique for detecting hidden associations and rules in large databases; it extracts previously unknown and surprising patterns from already available data. A drawback of association analysis is that it generates a large number of patterns even when the data set is very small. Suitable interestingness measures must therefore be applied to eliminate uninteresting patterns. Bayesian networks and association analysis can be used together in knowledge discovery: association rules can be used to build Bayesian networks, and, conversely, Bayesian networks can be used to define interestingness measures that identify interesting patterns. This study explains this mutual use of Bayesian networks and association analysis and presents an illustration on a real-life problem.
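The idea that a Bayesian network can serve as background knowledge for judging which association patterns are interesting can be illustrated with a small sketch. It is not the method of this study: the toy records, the binary variables A, B and C, the assumed network structure A -> B -> C and all helper names (`fit_cpts`, `bn_itemset_prob`, `bn_interestingness`, and so on) are hypothetical. The sketch computes the usual support, confidence and lift of association rules, fits the network's conditional probability tables by maximum likelihood, and scores an itemset by the absolute difference between its observed support and the probability the network assigns to it. This difference is one simple choice of measure, assumed here for illustration.

```python
from itertools import combinations, product

# Hypothetical toy data set: each record assigns 0/1 to three binary attributes.
RECORDS = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 1},
    {"A": 0, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 0},
]
VARS = ["A", "B", "C"]

# Assumed background network structure A -> B -> C (parents of each node),
# listed in topological order.
PARENTS = {"A": [], "B": ["A"], "C": ["B"]}


def support(itemset, records=RECORDS):
    """Fraction of records in which every attribute in `itemset` equals 1."""
    hits = sum(all(r[v] == 1 for v in itemset) for r in records)
    return hits / len(records)


def confidence(lhs, rhs):
    """Confidence of the rule lhs => rhs: supp(lhs and rhs) / supp(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)


def lift(lhs, rhs):
    """Lift of lhs => rhs; values far from 1 suggest dependence."""
    return confidence(lhs, rhs) / support(rhs)


def fit_cpts(records=RECORDS, parents=PARENTS):
    """Maximum-likelihood estimate of P(var = 1 | parent values) for each node."""
    cpts = {}
    for var, pa in parents.items():
        table = {}
        for pa_vals in product([0, 1], repeat=len(pa)):
            rows = [r for r in records if all(r[p] == v for p, v in zip(pa, pa_vals))]
            # Fall back to 0.5 when a parent configuration never occurs in the data.
            table[pa_vals] = sum(r[var] for r in rows) / len(rows) if rows else 0.5
        cpts[var] = table
    return cpts


def joint_prob(assignment, cpts, parents=PARENTS):
    """Probability of a full assignment: product of the local conditional probabilities."""
    p = 1.0
    for var, pa in parents.items():
        p1 = cpts[var][tuple(assignment[q] for q in pa)]
        p *= p1 if assignment[var] == 1 else 1 - p1
    return p


def bn_itemset_prob(itemset, cpts):
    """P(all items = 1) under the network, summing the joint over the other variables."""
    others = [v for v in VARS if v not in itemset]
    total = 0.0
    for vals in product([0, 1], repeat=len(others)):
        assignment = dict(zip(others, vals))
        assignment.update({v: 1 for v in itemset})
        total += joint_prob(assignment, cpts)
    return total


def bn_interestingness(itemset, cpts):
    """Absolute gap between observed support and the support the network predicts."""
    return abs(support(itemset) - bn_itemset_prob(itemset, cpts))


if __name__ == "__main__":
    cpts = fit_cpts()
    for k in (2, 3):
        for itemset in combinations(VARS, k):
            print(itemset,
                  "observed:", round(support(itemset), 3),
                  "network:", round(bn_itemset_prob(itemset, cpts), 3),
                  "interestingness:", round(bn_interestingness(itemset, cpts), 3))
    print("A => B confidence:", round(confidence({"A"}, {"B"}), 3),
          "lift:", round(lift({"A"}, {"B"}), 3))
```

With these toy records, the rule A => B has confidence 0.8 and lift 1.28. The itemsets {A, B} and {B, C} get an interestingness of (numerically) zero, because the assumed network already explains them. By contrast, {A, B, C} occurs in 25% of the records while the network predicts 30%, so it stands out as a pattern the background knowledge does not account for. In this way the network filters out patterns that are merely consequences of what is already known. Exact marginalisation by enumeration is used only because the example has three variables; larger networks would need a proper inference algorithm.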
