A GENETIC ALGORITHM APPROACH FOR THE BEST SUBSET SELECTION IN LINEAR REGRESSION

Given a data set with many explanatory variables and a response variable, choosing the model that best predicts the response is known as "variable selection" or "selection of the best subset model". Many methods for variable selection have been proposed. Unfortunately, when the explanatory variables are highly correlated, the methods in current use are mostly unsuccessful. Moreover, because the number of possible subsets grows exponentially with the number of explanatory variables, all-possible-subsets methods have difficulty handling high-dimensional data sets. In this study, a new stochastic optimization method based on a Genetic Algorithm (GA) is proposed for variable selection in linear regression. The performance of the proposed method is compared with that of classical variable selection methods on data sets commonly used in the literature.

Key Words: Linear regression, variable selection, stochastic optimization, genetic algorithm.
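
To make the idea of GA-based subset selection concrete, the following is a minimal sketch and not the method proposed in this study. The abstract does not specify the encoding, fitness criterion, or genetic operators, so everything here is an assumption chosen for illustration: each candidate subset is a Boolean mask over the explanatory variables, fitness is the AIC of an ordinary least squares fit with intercept, and the search uses binary tournament selection, uniform crossover, bit-flip mutation, and elitism. The function names `aic` and `ga_select` and all parameter defaults are hypothetical.

```python
import numpy as np


def aic(X, y, mask):
    """AIC (up to an additive constant) of an OLS fit with intercept
    on the columns of X selected by the Boolean mask. Assumed fitness."""
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Search for a low-AIC subset of columns with a simple GA (illustrative)."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    # Each individual is a Boolean mask: True means the variable is included.
    pop = rng.random((pop_size, k)) < 0.5
    best_mask, best_fit = None, np.inf
    for _ in range(generations):
        fits = np.array([aic(X, y, m) for m in pop])
        i = int(np.argmin(fits))
        if fits[i] < best_fit:
            best_fit, best_mask = float(fits[i]), pop[i].copy()
        # Binary tournament selection: the lower AIC of two random individuals wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] < fits[b], a, b)]
        # Uniform crossover between each parent and its neighbour in the pool.
        mates = np.roll(parents, 1, axis=0)
        take_parent = rng.random((pop_size, k)) < 0.5
        children = np.where(take_parent, parents, mates)
        # Bit-flip mutation: each gene flips with probability p_mut.
        children ^= rng.random((pop_size, k)) < p_mut
        # Elitism: keep the best subset found so far in the population.
        children[0] = best_mask
        pop = children
    return best_mask, best_fit
```

With k explanatory variables there are 2^k candidate masks, so exhaustive evaluation becomes impractical quickly, whereas this sketch evaluates only `pop_size * generations` subsets per run. Other fitness criteria (for example adjusted R-squared or BIC) could be substituted in `aic` without changing the search loop.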