Yavuz CANBAY, Yılmaz VURAL, Şeref SAĞIROĞLU

OAN: an outlier-oriented utility-based privacy preservation model

Data privacy is a difficult problem: it seeks the best balance between privacy risks and the utility obtained from the data. Anonymization is one of the most widely used utility-based approaches to data privacy. Outliers increase privacy risks and degrade data utility, so they must be managed during anonymization. Traditional approaches detect outliers after anonymization and then remove them, partially or completely, from the dataset to be published in order to reduce privacy risks. Removing outliers from the published dataset lowers the total utility that can be obtained from the data, and detecting them only after anonymization increases the computational cost. This study proposes a new outlier-oriented, utility-based privacy preservation model called OAN. OAN reduces computational cost by detecting outliers before anonymization, and it increases data utility by using all records. A comparison with the first utility-based model shows that OAN is computationally efficient. The experimental results indicate that the proposed model increases total data utility while preserving data privacy.
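The abstract does not say which outlier detector or grouping procedure OAN uses. The sketch below therefore only illustrates the general idea of flagging outliers *before* anonymization and then keeping every record instead of suppressing outliers. It rests on several assumptions. The quasi-identifiers are numeric. A k-nearest-neighbour distance score with a median-based threshold stands in for the detector. A naive sort-and-chunk grouping stands in for the anonymization step, which generalizes each group to value ranges. All names and parameters (`flag_outliers`, `anonymize`, `k_anon`, `k_nn`, `factor`) are hypothetical and are not taken from the paper.

```python
import math
import statistics


def knn_distance(points, i, k):
    """Euclidean distance from points[i] to its k-th nearest other point."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]


def flag_outliers(points, k=2, factor=2.0):
    """Hypothetical detector: a record is an outlier when its k-NN distance
    exceeds `factor` times the median k-NN distance of the whole table."""
    scores = [knn_distance(points, i, k) for i in range(len(points))]
    threshold = factor * statistics.median(scores)
    return [s > threshold for s in scores]


def centroid(points):
    """Component-wise mean of a list of numeric tuples."""
    return tuple(statistics.fmean(col) for col in zip(*points))


def anonymize(points, k_anon=3, k_nn=2, factor=2.0):
    """Return one generalized record (a tuple of (min, max) ranges) per input
    record, in the original order. No record is suppressed.
    Assumes the table contains at least k_anon inliers."""
    # Step 1: detect outliers BEFORE any grouping or generalization.
    is_outlier = flag_outliers(points, k_nn, factor)
    inliers = sorted((i for i, flag in enumerate(is_outlier) if not flag),
                     key=lambda i: points[i])
    outliers = [i for i, flag in enumerate(is_outlier) if flag]

    # Step 2: chunk the sorted inliers into groups of k_anon records;
    # a short tail group is merged into the previous group.
    groups = [inliers[s:s + k_anon] for s in range(0, len(inliers), k_anon)]
    if len(groups) > 1 and len(groups[-1]) < k_anon:
        groups[-2].extend(groups.pop())

    # Step 3: attach each outlier to the group with the nearest centroid
    # instead of removing it from the published table.
    for i in outliers:
        best = min(groups,
                   key=lambda g: math.dist(points[i], centroid([points[j] for j in g])))
        best.append(i)

    # Step 4: generalize every group to per-attribute (min, max) ranges.
    result = [None] * len(points)
    for g in groups:
        ranges = tuple((min(col), max(col)) for col in zip(*(points[j] for j in g)))
        for j in g:
            result[j] = ranges
    return result


records = [(25, 40), (27, 38), (26, 42), (30, 40), (31, 45), (29, 39), (90, 5)]
print(flag_outliers(records))  # only the last record, (90, 5), is flagged
for rec, gen in zip(records, anonymize(records)):
    print(rec, "->", gen)
```

In this toy table of hypothetical (age, weekly hours) pairs, all seven records are published and every generalized record is shared by at least three records. However, the outlier stretches its group's ranges to (29, 90) and (5, 45). This shows why outliers must be handled carefully if utility is to be preserved. It does not reproduce how OAN itself resolves that trade-off.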

Keywords:

Data utility, outlier management, privacy preservation

