Privacy preserving in association rules using a genetic algorithm

Association rule mining is a data mining technique used to extract hidden knowledge from large datasets. This knowledge can include useful but confidential information that users want to keep private. Privacy preserving data mining techniques are used to protect such confidential information, or restrictive patterns, from unauthorized access. A pattern can be represented as a frequent itemset or an association rule, and it is marked as sensitive if its disclosure risk is above a given threshold. Numerous techniques hide sensitive association rules by modifying the original dataset. These modifications have side effects: some nonrestrictive patterns may disappear (lost rules), and new patterns that were not present in the original data may be generated (ghost rules). In this work, a genetic algorithm is used to counter the side effects of lost rules and ghost rules. The technique can be applied to small as well as large datasets in the medical, military, and business domains.
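
To make the terms lost rule and ghost rule concrete, the following is a minimal Python sketch that compares the rules mined before and after a dataset is modified. It is not the genetic algorithm of this work: the brute-force miner, the toy transactions, the thresholds, and the names `mine_rules`, `side_effects`, and `unhidden` are all assumptions introduced for illustration, and the sanitization step is a single hand-picked item deletion.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_sup, min_conf):
    """Brute-force miner (illustrative only, exponential in the item count).

    Returns a set of rules, each a pair (antecedent, consequent) of frozensets.
    """
    items = sorted(set().union(*transactions))
    rules = set()
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            sup = support(itemset, transactions)
            if sup == 0 or sup < min_sup:
                continue
            # Try every non-empty proper subset as the antecedent.
            for k in range(1, size):
                for ante in combinations(itemset, k):
                    conf = sup / support(ante, transactions)
                    if conf >= min_conf:
                        ante = frozenset(ante)
                        rules.add((ante, frozenset(itemset) - ante))
    return rules


def side_effects(original, sanitized, sensitive, min_sup, min_conf):
    """Compare rule sets before and after sanitization.

    unhidden: sensitive rules that can still be mined (hypothetical name)
    lost:     non-sensitive rules that disappeared
    ghost:    rules that exist only in the sanitized data
    """
    before = mine_rules(original, min_sup, min_conf)
    after = mine_rules(sanitized, min_sup, min_conf)
    unhidden = sensitive & after
    lost = (before - sensitive) - after
    ghost = after - before
    return unhidden, lost, ghost


# Toy data (assumed): five transactions over the items a, b, c.
original = [frozenset(t) for t in ("abc", "ab", "abc", "bc", "ac")]
# Hide the sensitive rule a -> b by deleting item b from the second transaction.
sanitized = [frozenset(t) for t in ("abc", "a", "abc", "bc", "ac")]
sensitive = {(frozenset("a"), frozenset("b"))}

unhidden, lost, ghost = side_effects(
    original, sanitized, sensitive, min_sup=0.3, min_conf=0.7
)
```

In this toy case the sensitive rule a -> b is hidden, but the deletion also removes the non-sensitive rule b -> a (a lost rule) and makes the new rule {a, b} -> c minable (a ghost rule). Counts of this kind are the side effects that a hiding technique tries to keep small.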
