Differential privacy-based feature selection for classification

One of the most important preliminary steps in data mining and machine learning solutions is determining a suitable subset of the features of the data to be analyzed. For classification methods, this is done by measuring how strongly each feature is related to the class attribute. Many privacy-preserving classification solutions exist, but no feature selection solutions have been developed for them. This study presents novel feature selection methods based on differential privacy, the most comprehensive and secure solution known in statistical database security. The proposed methods have been integrated with WEKA, a widely used data mining library, and experimental results show that they have a positive effect on classification performance.
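The abstract does not describe the proposed algorithms, so the following is only a minimal sketch of one generic way to rank features by their relation to the class attribute under differential privacy. It is not the authors' method and does not use the WEKA API. It assumes categorical features with publicly known value domains, perturbs each feature-by-class count table with Laplace noise, and computes information gain from the noisy counts. All function names, the even split of the privacy budget across features, and the choice of information gain as the relevance score are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def entropy(counts):
    # Shannon entropy (in bits) of a list of non-negative counts.
    total = sum(counts)
    if total <= 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def noisy_info_gain(column, labels, values, classes, epsilon, rng):
    # Count (feature value, class) pairs, then add Laplace noise to every
    # cell of the table. Adding or removing one record changes one cell by 1,
    # so noise with scale 1/epsilon is used (assumption: add/remove neighbors).
    raw = Counter(zip(column, labels))
    noisy = {
        (v, c): max(0.0, raw[(v, c)] + laplace_noise(1.0 / epsilon, rng))
        for v in values  # iterate over the public domain, not observed values
        for c in classes
    }
    class_totals = [sum(noisy[(v, c)] for v in values) for c in classes]
    total = sum(class_totals)
    if total <= 0:
        return 0.0
    conditional = 0.0
    for v in values:
        row = [noisy[(v, c)] for c in classes]
        conditional += (sum(row) / total) * entropy(row)
    # Everything after the noise is added is post-processing.
    return entropy(class_totals) - conditional


def select_features(rows, labels, domains, classes, k, epsilon, seed=None):
    # Hypothetical helper: split the total budget evenly across features
    # (sequential composition) and keep the k highest-scoring feature indices.
    rng = random.Random(seed)
    eps_each = epsilon / len(domains)
    scores = {}
    for j, values in enumerate(domains):
        column = [row[j] for row in rows]
        scores[j] = noisy_info_gain(column, labels, values, classes, eps_each, rng)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Here `rows` is a list of tuples of categorical values, `domains[j]` lists the allowed values of feature `j`, and `epsilon` is the total privacy budget. Smaller values of `epsilon` add more noise, so the ranking becomes less reliable on small data sets.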

Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi (Journal of the Faculty of Engineering and Architecture of Gazi University)
  • ISSN: 1300-1884
  • Publication frequency: 4
  • First published: 1986
  • Publisher: Oğuzhan YILMAZ