Stock daily return prediction using expanded features and feature selection

Stock market prediction is a very noisy problem, so additional information is needed to increase accuracy. In this paper, for the stock daily return prediction problem, the feature set is expanded to include indicators not only for the stock being predicted but also for a set of other stocks and currencies. Different feature selection and classification methods are then applied for prediction. The daily close returns of the 3 most traded stocks in Borsa İstanbul (BIST), namely GARAN, THYAO, and ISCTR, are predicted using indicators computed on those stocks, indicators for all the other stocks listed in the BIST100 index, and indicators on the dollar-gold prices. Twenty-five different indicators on daily stock prices are computed to form a feature vector for each trading day, and each feature vector is assigned a class label according to the daily close return. Expanding the feature space with BIST100 stock features results in a high-dimensional feature space that may contain noisy or irrelevant features. Feature selection is therefore used to keep the most informative features, with relevance scores computed by two fast filter-based methods, gain ratio and relief. Experiments are performed on individual stock features, dollar-gold features (DG), BIST100 stock features (BIST100), and a combination of BIST100 and DG, each with and without feature selection. Using gain ratio feature selection with a gradient boosting machine (GBM), the movements of GARAN are predicted with an accuracy of 0.599 and an F-measure of 0.614. For THYAO, relief feature selection with the GBM gives an accuracy of 0.558, and for ISCTR, gain ratio feature selection with logistic regression achieves an accuracy of 0.581. Using BIST100 stock features improves classification accuracy for all three stocks.
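
As an illustration of the labeling and filter-based selection steps described above, the following is a minimal sketch and not the authors' implementation. It assumes binary up/down labels derived from the next day's close, and equal-frequency binning of continuous indicators before scoring; neither detail is stated in the abstract. The function names (`daily_return_labels`, `discretize`, `gain_ratio`, `rank_features`) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def daily_return_labels(closes):
    """Assumed labeling: 1 if the next close is higher than today's, else 0."""
    return [1 if closes[t + 1] > closes[t] else 0 for t in range(len(closes) - 1)]


def discretize(values, n_bins=3):
    """Assumed preprocessing: equal-frequency binning by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def gain_ratio(feature_bins, labels):
    """Information gain of the feature about the label, divided by the
    entropy of the feature itself (the split information)."""
    n = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_bins)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(columns, labels, n_bins=3):
    """Return feature names sorted from most to least relevant."""
    scores = {
        name: gain_ratio(discretize(col, n_bins), labels)
        for name, col in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented purely for illustration.
closes = [10, 11, 10.5, 11.5, 11, 12, 11.8, 12.5, 12.1]
labels = daily_return_labels(closes)
columns = {
    "informative": [0.9, 0.1, 0.8, 0.2, 0.95, 0.15, 0.85, 0.05],
    "noise": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
}
ranking = rank_features(columns, labels, n_bins=2)
```

In a setup like the one described, the top-ranked features would then be passed to a classifier such as a GBM or logistic regression; that step is omitted here to keep the sketch free of external dependencies.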