Time Series Performance and Limitations with SARIMAX: An Application with Retail Data

The sales data of the world's giant retailers rivals stock exchange data in density. In this world, where the number one reason is the number of transactions, the refresh rate runs at around a few hundred per second, which raises the idea of making sales forecasting an ideal application area for time series, for which examples are scarce. Time series are known to have a long list of predictive models, but almost all of them are based on regression, and regression is not always the leading model. Here we investigated possible performance bottlenecks with the most popular time series techniques, such as ARIMA/SARIMA derivatives, and tried to answer whether regression limits the foundations of time series. This created two new groups of questions for future work. The first concerned the possibility that time series, not yet exemplified in the world, could be based on classification models rather than regression. In other words, we examined the possible positive effect on performance of fuzzy logic applications that match the size of classification models producing output in binary number order with turnover size in preliminary output estimates based on probability percentages. The second concerned the view that time series, which by definition accept no data other than time, will not produce an important factor, and the extent to which this effort may be needed. As a result, the tacitly accepted truth that time series perform better only with more time data has been called into question and has raised new dilemmas, which can be seen as an opportunity for further research.

Time Series Performance and Limitations with SARIMAX: An Application with Retail Store Data

The sales data of the world's giant retailers competes with stock exchange data with respect to latency. The main reason is the number of transactions, which is around a few hundred per second and only grows over time. This suggests retail sales as an ideal application area for time series, yet the field appears to lack comparisons. This study attempts to address its dynamics using generic reference data published with Walmart retail figures. Time series have a long list of predictive models; however, they are all based on regression, and regression is not always the leading model. Here we explored the possible performance bottlenecks of the most popular techniques, such as ARIMA/SARIMA derivatives, and tried to answer whether regression limits the foundations of time series. Does the best fit with ARIMA derivatives always give the best scores? In other words, in terms of per-time-series performance, is this a lagging or a leading factor? This situation created two new question groups for future studies. The first concerned the possibility that time series, which have not yet been exemplified in the world, could be based on classification models rather than regression. In other words, it concerned the possible positive effect on performance of fuzzy logic applications that match the size of classification models producing output in binary number order with the size of turnover in preliminary estimates based on probability percentages. The second concerned the view that time series, which by definition accept no data other than time, will not produce an important factor, and the extent to which this effort might be needed.
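
One way to probe the question of whether the best fit always gives the best score is to compare in-sample error with hold-out error. The sketch below is a minimal illustration and not the study's method: it swaps in two simple stand-in forecasters (seasonal naive and a least-squares AR(1)) instead of SARIMAX, and it uses a synthetic weekly series rather than the Walmart data. All function names, the 52-week season, and the 26-week hold-out are assumptions made for illustration.

```python
import math
import random


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive(history, steps, season):
    """Forecast each future point with the value observed one season earlier."""
    extended = list(history)
    for _ in range(steps):
        extended.append(extended[-season])
    return extended[len(history):]


def fit_ar1(history):
    """Least-squares AR(1) on the mean-centred series; returns (mean, phi)."""
    mean = sum(history) / len(history)
    x = [v - mean for v in history]
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return mean, num / den


def ar1_forecast(mean, phi, last_value, steps):
    """Recursive multi-step AR(1) forecast that decays toward the mean."""
    return [mean + (phi ** h) * (last_value - mean) for h in range(1, steps + 1)]


def compare(series, holdout, season):
    """Return in-sample and hold-out MAE for both stand-in forecasters."""
    train, test = series[:-holdout], series[-holdout:]

    # In-sample errors on the training window (the "fit").
    sn_fit = mae(train[season:], train[:-season])
    mean, phi = fit_ar1(train)
    ar_fit = mae(train[1:], [mean + phi * (v - mean) for v in train[:-1]])

    # Hold-out errors on data neither forecaster has seen (the "score").
    sn_out = mae(test, seasonal_naive(train, holdout, season))
    ar_out = mae(test, ar1_forecast(mean, phi, train[-1], holdout))

    return {
        "seasonal_naive": {"fit_mae": sn_fit, "holdout_mae": sn_out},
        "ar1": {"fit_mae": ar_fit, "holdout_mae": ar_out},
    }


def make_series(n=156, season=52, seed=0):
    """Synthetic weekly 'sales': level + yearly cycle + noise (assumed data)."""
    rng = random.Random(seed)
    return [
        100.0 + 20.0 * math.sin(2 * math.pi * t / season) + rng.gauss(0.0, 5.0)
        for t in range(n)
    ]


results = compare(make_series(), holdout=26, season=52)
for name, scores in results.items():
    print(f"{name}: fit MAE={scores['fit_mae']:.2f}, "
          f"hold-out MAE={scores['holdout_mae']:.2f}")
```

Reading the two columns side by side shows whether the ranking by fit agrees with the ranking by hold-out error for this particular series. The same comparison could be repeated per store or per department, with an ARIMA-family model in place of the stand-ins.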
