An efficient storage-optimizing tick data clustering model

Tick data is a large volume of data related to a phenomenon, such as a stock market or weather changes, whose values change rapidly over time. An important issue is to store tick data tables so that they occupy minimum storage space while still supporting fast query execution. In this paper, a mathematical model is proposed to partition tick data tables into clusters with the aim of minimizing the required storage space. Because this model is in essence a clustering model, a genetic algorithm is then used to solve it. The proposed method has been evaluated on a real-world weather tick dataset and compared with the storage-optimizing hierarchical agglomerative clustering (SOHAC) algorithm. The experiments show that the proposed method substantially outperforms SOHAC, achieving lower compression ratios while also reducing execution time for a small number of clusters.
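The abstract does not spell out the model's cost function or the genetic operators, so the following is only a minimal sketch under stated assumptions, not the paper's formulation. It assumes that the columns (variables) of a tick table are partitioned into at most k clusters, that each cluster is stored as its own sub-table in which a row is written only when at least one of the cluster's columns changed since the previous tick, and that each stored row costs one timestamp cell plus one cell per column. The compression ratio is taken here as clustered storage divided by storing every row with all columns, so smaller is better. A plain genetic algorithm (tournament selection, uniform crossover, per-gene mutation, elitism) then searches over column-to-cluster labellings. All names and parameters (`cluster_cost`, `storage_cost`, `compression_ratio`, `genetic_clustering`, `pop_size`, `p_mut`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical layout: one row per tick, one column per variable (e.g. a sensor).
Table = List[Sequence[float]]


def cluster_cost(table: Table, cols: List[int]) -> int:
    """Cells needed to store the sub-table formed by `cols` (assumed model).

    The first row is always stored; a later row is stored only if at least one
    column in `cols` changed since the previous tick. Each stored row costs one
    timestamp cell plus len(cols) value cells.
    """
    if not cols:
        return 0  # an empty cluster stores nothing
    kept = 1
    for prev, cur in zip(table, table[1:]):
        if any(prev[c] != cur[c] for c in cols):
            kept += 1
    return kept * (len(cols) + 1)


def storage_cost(table: Table, labels: Sequence[int], k: int) -> int:
    """Total cells when column c is placed in cluster labels[c] (0..k-1)."""
    clusters = [[c for c, lab in enumerate(labels) if lab == j] for j in range(k)]
    return sum(cluster_cost(table, cols) for cols in clusters)


def compression_ratio(table: Table, labels: Sequence[int], k: int) -> float:
    """Clustered storage divided by storing every row with all columns."""
    n, m = len(table), len(table[0])
    return storage_cost(table, labels, k) / (n * (m + 1))


def genetic_clustering(table: Table, k: int, pop_size: int = 30,
                       generations: int = 100, p_mut: float = 0.1,
                       seed: int = 0) -> Tuple[List[int], int]:
    """Search for a column-to-cluster labelling that minimizes storage_cost.

    Tournament selection, uniform crossover, per-gene mutation, and elitism
    (the best individual is always copied into the next generation).
    """
    rng = random.Random(seed)
    m = len(table[0])

    def fitness(ind: Sequence[int]) -> int:
        return storage_cost(table, ind, k)  # lower is better

    pop = [[rng.randrange(k) for _ in range(m)] for _ in range(pop_size)]

    def tournament() -> List[int]:
        a, b = rng.sample(pop, 2)  # binary tournament on the current population
        return a if fitness(a) <= fitness(b) else b

    best = min(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: keep the best labelling found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each gene comes from either parent.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Mutation: reassign a gene to a random cluster with probability p_mut.
            child = [rng.randrange(k) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy table: columns 0-1 change together, columns 2-3 change together.
    table = [
        (1, 10, 5, 50),
        (2, 11, 5, 50),
        (2, 11, 6, 51),
        (3, 12, 6, 51),
        (3, 12, 7, 52),
    ]
    print(storage_cost(table, [0, 0, 1, 1], 2))       # 18
    print(compression_ratio(table, [0, 0, 1, 1], 2))  # 0.72
    labels, cost = genetic_clustering(table, k=2)
    print(labels, cost)
```

On the toy table, storing all four columns together keeps every row (25 cells), whereas grouping the two co-changing column pairs stores 18 cells, a compression ratio of 0.72 under this assumed model. The GA explores such groupings without enumerating all of them; a faithful implementation would need the paper's own cost model, constraints, and chromosome encoding.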
