A hybrid approach of homomorphic encryption and differential privacy for privacy preserving classification

Privacy preserving data mining is a substantial research area that aims at protecting the privacy of individuals while enabling to perform data mining techniques. In this study, we propose a secure protocol that fulfils the privacy restriction by combining homomorphic encryption with differential privacy and integrate this protocol into Holte’s One Rule which is a simple, but accurate and efficient classification algorithm. The proposed method allows a researcher to get the answers of his/her queries to build One Rule classifier by processing the encrypted training dataset under Paillier’s cryptosystem and also applies differential privacy to minimize the privacy leakage of individuals as much as possible in this training dataset. Therefore, both of security and privacy of the individuals in the training dataset for classification are provided thanks to our proposed method; since neither the parties, nor the researcher attain any information about the individuals in the database. Besides the One Rule classifier, we apply our proposed privacy preservation model to Naïve Bayes classification algorithm for the performance comparison, and show the efficiency of the proposed method through experiments on real data from UCI repository.

___

  • Vaghashia H. and Ganatra A., 2015. A survey: Privacy preservation techniques in data mining. International Journal of Computer Applications., vol. 119, no. 4, pp. 20-26.
  • Holte R. C., 1993. Very simple classification rules perform well on most commonly used datasets. Machine Learning., vol. 11, pp. 63-90.
  • Paillier P., 1999. Public key cryptosytems based on composite degree residosity classes. In Advances in Cryptology-Proceedings Eurocrypt ’99. (Lecture Notes in Computer Science, no. 1592). New York: Springer-Verlag, 1999, pp.223-238.
  • Dwork C., McSherry F., Nissim K., and Smith A., 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography., pp. 265-284, Springer.
  • Dwork C., 2008. Differential privacy: A survey of results. In Proc. 5th International Conference on Theory and Applications of Models of Computation, Xi’an, China.
  • Dwork C. and Roth A., 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, Vol. 9, Nos. 3-4 pp. 211-407.
  • Kantarcioglu M., Jiang W., and Malin B., 2008. A cryptographic approach to securely share and query genomic sequences. IEEE Transactions on Information Technology in Biomedicine, Vol. 12, No. 5.
  • Canim, M., Kantarcioglu, M., and Malin, B. (2011). Secure management of biomedical data with cryptographic hardware. IEEE Transactions on Information Technology in Biomedicine, 16(1), 166-175.
  • Faramarzi, N. S., Ayday, E., & Guvenir, H. A. (2016, December). A Privacy-Preserving Solution for the Bipartite Ranking Problem. In 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 375-380). IEEE.
  • Hasan, M. Z., Mahdi, M. S. R., Sadat, M. N., and Mohammed, N. (2018). Secure count query on encrypted genomic data. Journal of biomedical informatics, 81, 41-52.
  • Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426.
  • Pappas, V., Krell, F., Vo, B., Kolesnikov, V., Malkin, T., Choi, S. G., ... & Bellovin, S. (2014, May). Blind seer: A scalable private dbms. In 2014 IEEE Symposium on Security and Privacy (pp. 359-374). IEEE.
  • Vijarayani S., and Prabha M. S., 2011. Association rule hiding using artificial bee colony algorithm. International Journal of Computer Applications, 33(2), pp. 41-47.
  • Preethi P., K., Kumar P., Ullhaq M. R., Naveen A., and Janapana H., 2018. Privacy preserving data clustering using a heteregeneous data distortion. Smart Intelligent Computing and Applications, pp. 477-486.
  • Inan A., Kaya S. V., Saygın Y., Savaş E., Hintoğlu A. A., and Levi A., 2007. Privacy preserving clustering on horizontally partioned data. Data and Knowledge Engineering, 63(3), pp.646-666.
  • Hyma, J., Varma, P. S., Gupta, S. N. K., & Salini, R. (2019). Heterogeneous Data Distortion for Privacy-Preserving SVM Classification. In Smart Intelligent Computing and Applications (pp. 459-468). Springer, Singapore.
  • Kantarcioglu, M., & Clifton, C. (2004, September). Privately computing a distributed k-nn classifier. In European conference on principles of data mining and knowledge discovery (pp. 279-290). Springer, Berlin, Heidelberg.
  • Rubinstein B. I. P., Bartlett P. L., Huang L., and Taft N., 2009. Learning in a large function space: Privacy preserving mechanisms for SVM learning. Computing Research Repository.
  • Chaudhuri, K., & Monteleoni, C. (2009). Privacy-preserving logistic regression. In Advances in neural information processing systems (pp. 289-296).
  • Friedman A., and Schuster A., 2010. Data mining with differential privacy. In Proc. 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA.
  • Jagannathan G., Pillaipakkamnatt K., and Wright R. N., 2012. A practical differentially private random decision tree classifier. Transactions on Data Privacy., no. 5, pp. 273-295.
  • Jagannathan G., Monteleoni C., and Pillaipakkamnatt K., 2013. A semi-supervised learning approach to differential privacy. In Proc. 13th International Conference on Data Mining Workshops, TX, USA.
  • Vaidya J., Shafiq B., Basu A., and Hong Y., 2013. Differentially private naïve bayes classification. In Proc. IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technologies, pp. 571–576.
  • Fletcher S., and Islam M. Z., 2015. A differentially private decision forest. In Proc. 13th Australasian Data Mining Conference, Sydney, Australia.
  • Bojarski M., Choromanska A., and Choromanski K., 2015. Differentially-and non-differentially private random decision trees. arXiv preprint arXiv:1410.6973v2.
  • Su, D., Cao, J., Li, N., Bertino, E., & Jin, H. (2016, March). Differentially private k-means clustering. In Proceedings of the sixth ACM conference on data and application security and privacy (pp. 26-37).
  • Gursoy M. E., Inan A., Nergiz M. E., and Saygın Y., 2017. Differentially private nearest neighbor classification. Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1544-1575.
  • Goldreich O., 2004. General cryptographic protocols. The Foundations of Cryptography. Cambridge, U.K.:Cambridge Univ. Press.