Authentication of uncertain data based on k-means clustering
Authentication of uncertain data based on k-means clustering
Probabilistic databases and database outsourcing are two recent and important applications of database systems. In this paper, we introduce authenticated query processing in outsourced probabilistic databases. We have proposed a novel authenticated data structure (ADS) called PH-tree, which is a combination of PDR-tree and MH-tree. We have also implemented k-means clustering as a preprocessor. We have compared our algorithm with an existing ADS called MR-tree and showed that PH-trees outperform MR-trees significantly.
___
- [1] Almeida TV, G¨uting RH. Supporting uncertainty in moving objects in network databases. In: 2005 Annual International Workshop on Geographic Information Systems; 31 October05 November 2005; Bremen, Germany. New York, NY, USA: ACM. pp. 31-40.
- [2] Andritsos P, Fuxman A, Miller RJ. Clean answers over dirty databases: a probabilistic approach. In: IEEE 2006 International Conference on Data Engineering; 37 April 2006; Georgia, USA. New York, NY, USA: IEEE. pp. 30.
- [3] Silberstein AS, Braynard R, Ellis C, Munagala K, Yang J. A sampling-based approach to optimizing top-k queries in sensor networks. In: IEEE 2006 International Conference on Data Engineering; 37 April 2006; Georgia, USA. New York, NY, USA: IEEE. pp. 68.
- [4] Antova L, Koch C, Olteanu D. (2007) 10(106) Worlds and beyond: efficient representation and processing of incomplete information. In: IEEE 2007 International Conference on Data Engineering; 1520 April 2007; Istanbul, Turkey. New York, NY, USA: IEEE. pp. 606-615.
- [5] Lian X, Chen L. Probabilistic top-k dominating queries in uncertain databases. Inform Sciences 2013: 23-46.
- [6] Kanagal B, Deshpande A. Indexing correlated probabilistic databases. In: ACM SIGMOD 2009 International Conference on Management of Data; 29 June02 July 2009; Rhode Island, USA. New York, NY, USA: ACM. pp. 455-468.
- [7] Soliman MA, Ilyas IF, Chang KCC. Top-k query processing in uncertain databases. In: IEEE 2007 International Conference on Data Engineering; 1520 April 2007; Istanbul, Turkey. New York, NY, USA: IEEE. pp. 896-905.
- [8] Hacigumus H, Iyer B, Mehrotra S. Providing database as a service. In: IEEE 2002 International Conference on Data Engineering; 26 February1 March 2002; Istanbul, Turkey. New York, NY, USA: IEEE. pp. 29-38.
- [9] Yang Y, Papadias D, Papadopoulos S, Kalnis P. Authenticated join processing in outsourced databases. In: ACM SIGMOD 2009 International Conference on Management of Data; 29 June02 July 2009; Rhode Island, USA. New York, NY, USA: ACM. pp. 5-18.
- [10] Singh S, Mayfield C, Prabhakar S, Shah R, Hambrusch SE. Indexing uncertain categorical data. In: IEEE 2007 International Conference on Data Engineering; 1520 April 2007; Istanbul, Turkey. New York, NY, USA: IEEE. pp. 616-625.
- [11] Merkle RC. A certified digital signature. In: 1989 International Conference on Advances in Cryptology; 2024 August 1989; California, USA. pp. 218-238.
- [12] Dalvi N, Suciu D. Management of probabilistic data foundations and challenges. In: ACM SIGMOD 2007 Symposium on Principles of Database Systems; 1214 June 2007; Beijing, China. New York, NY, USA: ACM. pp.1-12.
- [13] AbdulAzeem Y, ElDesouky AI, Ali HA. A framework for ranking uncertain distributed database. Data Knowl Eng 2014; 92: 1-19.
- [14] Frank A, Asuncion A. UCI Machine Learning Repository: Adult data set, 2010.
- [15] Kullback S, Leibler RA. On information and sufficiency. Ann Math 1951; 22: 79-86.
- [16] Pereira FCN, Tishby N, Lee L. Distributional clustering of English words. In: 1993 Meeting of the Association for Computational Linguistics; 2226 June 1993; Ohio, USA. pp. 183-190.
- [17] Ge T, Zdonik SB, Madden S. Top-k queries on uncertain data: on score distribution and typical answers. In: ACM SIGMOD 2009 International Conference on Management of Data; 29 June02 July 2009; Rhode Island, USA. New York, NY, USA: ACM. pp. 375-388.
- [18] Li F, Yi K, Jestes J. Ranking distributed probabilistic data. In: ACM SIGMOD 2009 International Conference on Management of Data; 29 June02 July 2009; Rhode Island, USA. New York, NY, USA: ACM. pp. 361-374.
- [19] Liu D, Wan C, Xiong N, Park JH, Rho S. Top-k entities query processing on uncertainly fused multi-sensory data. Pers Ubiquit Comput 2013; 17: 951-963.
- [20] Jensen FV, Jensen F. Optimal junction trees. In: 1994 Conference on Uncertainty in Artificial Intelligence; 2931 July 1994; Washington, USA. pp. 360-366.
- [21] Angiulli F, Fassetti F. Indexing uncertain data in general metric spaces. IEEE T Knowl Data En 2012; 24: 1640- 1657.
- [22] Zhang Y, Zhang W, Lin Q, Lin X, Shen HT. Effectively indexing the multidimensional uncertain objects. IEEE T Knowl Data En 2013; 26: 608-622.
- [23] Guttman A. R-Trees: a dynamic index structure for spatial searching. In: ACM SIGMOD 1984 International Conference on Management of Data; 1821 June 1984; Massachusetts, USA. New York, NY, USA: ACM. pp. 47-57.
- [24] Li F, Hadjieleftheriou M, Kollios G, Reyzin L. Dynamic authenticated index structure for outsourced databases. In: ACM SIGMOD 2006 International Conference on Management of Data; 2729 June 2006; Illinois, USA. New York, NY, USA: ACM. pp. 121-132.
- [25] Cheng W, Pang H, Tan KL. Authenticating multi dimensional query results in data publishing. In: 2006 International Federation for Information Processing Workshop on Data and Applications Security; 31 July2 August 2006; Sophia Antipolis, France. pp. 60-73.
- [26] Yang Y, Papadopoulos S, Papadias D, Kollios G. Spatial outsourcing for location based services. In: IEEE 2008 International Conference on Data Engineering; 712 April 2008; Cancun, Mexico. New York, NY, USA: IEEE. pp. 1082-1091.
- [27] Ku WS, Hu L, Shahabi C, Wang H. A query integrity assurance scheme for accessing outsourced spatial databases. Geoinformatica 2013; 17: 97-124.
- [28] Evans A, Kantrowitz W, Weiss E. A user authentication scheme not requiring secrecy in the computer. Commun ACM 1974; 17: 437-442.
- [29] Wilkes MV. Time sharing computer systems. New York, NY, USA: Elsevier Science, 1975.
- [30] Anderson R, Biham W. Tiger: A fast new hash function. In: 1996 International Workshop on Fast Software Encryption; 2123 February 1996; Cambridge, UK. pp. 89-97.
- [31] Mendel F. Two passes of Tiger are not one-way. In: 2009 International Conference on Cryptology; 2125 June 2009; Gammarth, TUN. pp. 29-40.
- [32] Ishai Y, Kushilevitz E, Ostrovsky R. Sufficient conditions for collision-resistant hashing. In: 2005 Theory of Cryptography Conference; 1012 February 2005; Cambridge, MA, USA. pp. 445-456.
- [33] Rogaway P, Shrimpton T. Cryptographic hash-function basics: definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance. In: 2004 International Workshop on Fast Software Encryption; 57 February 2004; Delhi, India. pp. 371-388.
- [34] Lloyd SP. Least squares quantization in PCM. IEEE T Inform Theory 1982; 28: 129-137.
- [35] Cai X, Li W. A spectral analysis approach to document summarization: Clustering and ranking sentences simultaneously. Inform Sciences 2011; 181: 3816-3827.
- [36] Ghosh A, Mishra NS, Ghosh S. Fuzzy clustering algorithms for unsupervised change detection in remote sensing images. Inform Sciences 2011; 181: 699-715.
- [37] Huang X, Zheng X, Yuan W, Wang F, Zhu S, Witten IH. Enhanced clustering of biomedical documents using ensemble non-negative matrix factorization. Inform Sciences 2011; 181: 2293-2302.
- [38] Tsai YS, Tsai P. Adaptive data hiding for vector quantization images based on overlapping codeword clustering. Inform Sciences 2011; 181: 3188-3198.
- [39] Oh SK, Kim WD, Pedrycz W, Joo SC. Design of k-means clustering-based polynomial radial basis function neural networks (pRBF NNs) realized with the aid of article swarm optimization and differential evolution. Neurocomputing 2012; 78: 121-132.
- [40] An F, Mattausch HR. K-means clustering algorithm for multimedia applications with flexible HW/SW co-design. J Syst Architect 2013; 59: 155-164.
- [41] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B Met 1977; 39: 1-38.
- [42] Agrawal R, Ikant R, Thomas D. Privacy preserving OLAP. In: ACM SIGMOD 2005 International Conference on Management of Data; 1416 June 2005; Maryland, USA. New York, NY, USA: ACM. pp. 251-262.
- [43] Zadrozny B. Learning and evaluating classifiers under sample selection bias. In: 2004 International Conference on Machine Learning; 48 July 2004; Alberta, Canada. New York, NY, USA: ACM. pp. 114.
- [44] Saharon R. Model selection via the AUC. In: 2004 International Conference on Machine Learning; 48 July 2004; Alberta, Canada. New York, NY, USA: ACM. pp. 89.
- [45] Freund Y, Mason L. The alternating decision tree learning algorithm. In: 1999 International Conference on Machine Learning; 2730 June 1999; Bled, Slovenia. New York, NY, USA: ACM. pp. 124-133.
- [46] Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA Data Mining Software: An Update. SIGKDD Explor 2009; 11: 10-18.