Filtre Tabanlı Nitelik Seçimi ve Topluluk Öğrenme Yaklaşımlarıyla Borsa İstanbul Enerji Endeksi Yön Tahmini

Borsa Istanbul Energy Index Direction Prediction with Filter-Based Feature Selection and Ensemble Learning Approaches

In this study, the daily price change direction of XKMYA (energy), one of the major indexes of Borsa Istanbul, was predicted using economy news published on financial news websites. Informative words in the news texts were used as features for predicting the price changes. From nearly 13,000 words extracted from the news texts, the words that influence the direction of the index were selected with two filter-based feature selection methods, Symmetrical Uncertainty (SU) and Fisher Score (F-P). The selected words were given as input to LightGBM, an ensemble learning classifier, and classifier performance was evaluated with the Macro-Averaged (MA) F-measure and accuracy. The daily direction of the XKMYA index was predicted with an MA F-measure of 0.68. Features selected with the F-P method yielded higher performance than those selected with the SU method. In addition, combining five successful individual models with the stacking ensemble learning approach increased the MA F-measure by 1% and accuracy by 2%.
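The two filter scores and the evaluation metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary word-presence features and binary up/down labels, and the toy words and values are hypothetical. The LightGBM classifier and the stacking step are omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    # Mutual information via I(X; Y) = H(X) + H(Y) - H(X, Y).
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fisher_score(x, y):
    """Between-class scatter of the class means over within-class scatter."""
    overall_mean = sum(x) / len(x)
    between = within = 0.0
    for label in set(y):
        group = [xi for xi, yi in zip(x, y) if yi == label]
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall_mean) ** 2
        within += sum((xi - mean) ** 2 for xi in group)
    if within == 0:
        # A constant feature carries no information; a perfect one is unbounded.
        return float("inf") if between > 0 else 0.0
    return between / within


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in set(y_true):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: word presence per day, and index direction (1 = up).
direction = [1, 1, 1, 0, 0, 0]
words = {
    "growth": [1, 1, 0, 0, 0, 0],
    "loss": [0, 0, 0, 1, 1, 0],
    "the": [1, 1, 1, 1, 1, 1],
}

# Rank words by each filter score; the top-k words would become the features.
by_fisher = sorted(words, key=lambda w: fisher_score(words[w], direction), reverse=True)
by_su = sorted(words, key=lambda w: symmetrical_uncertainty(words[w], direction), reverse=True)
print("Fisher ranking:", by_fisher)
print("SU ranking:", by_su)
```

Both scores are computed one feature at a time, without training a classifier, which is what makes them filter methods and keeps them cheap on a vocabulary of thousands of words.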
