Genetic Algorithm Based Outlier Detection Using Bayesian Information Criterion in Multiple Regression Models Having Multicollinearity Problems

Multiple linear regression (MLR) models are among the most widely used applied statistical techniques, and they are valuable tools for extracting and understanding the essential features of datasets. However, problems arise in MLR models when serious outliers or multicollinearity are present in the data. In regression the situation is somewhat complex, in the sense that some outlying points have more influence on the fit than others. An important problem with outliers is that they can strongly influence the estimated model, especially when the least squares method is used. Nevertheless, outliers are often the special points of interest in many practical situations. A second problem is multicollinearity, defined as linear dependencies among the independent variables. The purpose of this study is to present an outlier detection method based on a Genetic Algorithm (GA) and the Bayesian Information Criterion (BIC) for multiple regression models with multicollinearity problems. The algorithm is illustrated with real and simulated data.

Key Words: Bayesian Information Criterion, Genetic Algorithms, Multicollinearity, Multiple Linear Regression, Outlier Detection.
___

  • Acuna, E., Rodriguez, C., “On Detection of Outliers and Their Effect in Supervised Classification”, http://academic.uprm.edu/~eacuna/vene31.pdf
  • Amidan, B., Ferryman, T., Cooley, S., “Data Outlier Detection Using the Chebyshev Theorem”, IEEE Aerospace Conference Proceedings, IEEE, Piscataway NJ USA, 3814-3819 (2005).
  • Atkinson, A.C., “Influential Observations, High Leverage Points, and Outliers in Linear Regression”, Statistical Science, 1: 397-402 (1986).
  • Barnett, V., Lewis, T., “Outliers in Statistical Data”, 3rd ed., John Wiley and Sons, USA (1994).
  • Belsley, D.A., “Conditioning Diagnostics”, Wiley, New York (1991).
  • Barker, M., “A Comparisons of Principal Component Regression and Partial Least Squares Regression”, Multivariate Project (1997).
  • Birkes, D., Dodge, Y., “Alternative Methods of Regression”, 3rd ed., John Wiley & Sons, Canada (1993).
  • Davies, L., Gather, U., “The Identification of Multiple Outliers”, Journal of the American Statistical Association, 88(423): 797-801 (1993).
  • Dempster, A.P., Schatzoff, M., Wermuth, N., “A Simulation Study of Alternatives to Ordinary Least Squares”, Journal of the American Statistical Association, 72: 77-91 (1977).
  • Fox, J., “Applied Regression Analysis, Linear Models and Related Methods”, 3rd ed., Sage Publication, USA (1997).
  • Garthwaite, P.H., “An Interpretation of Partial Least Squares”, Journal of the American Statistical Association, 89: 122-127 (1994).
  • Goldberg, D.E., “Genetic Algorithms in Search, Optimization, and Machine Learning”, Addison- Wesley, USA (1989).
  • Hoerl, A.E., Kennard, R.W., “Ridge Regression: Biased Estimation for Nonorthogonal Problems”, Technometrics, 12: 56-67 (1970).
  • Hoeting, J., Raftery, A.E., Madigan, D., “A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression”, Computational Statistics and Data Analysis, 22: 251-270 (1996).
  • Ishibuchi, H., Nakashima, T., Nii, M., “Genetic Algorithm Based Instance and Feature Selection”, In: Liu, H., Motoda, H. (eds.), Instance Selection and Construction for Data Mining, Kluwer Academic (2001).
  • Kullback, S., “Information Theory and Statistics”, Dover Publications, USA (1996).
  • Neter, J., Wasserman, W., Kutner, M.H., “Applied Linear Regression Models”, 2nd ed., Irwin, Homewood, IL (1989).
  • McDonald, G.C., Galarneau, D.I., “A Monte Carlo Evaluation of Some Ridge-type Estimators”, Journal of the American Statistical Association, 70: 407-416 (1975).
  • Mason, R.L., Gunst, R.F., Webster, J.T., “Regression and Problem of Multicollinearity”, Communication in Statistics, 4(3): 277-292 (1975).
  • Montgomery, D.C., Peck, E.A., “Introduction to Linear Regression Analysis”, 2nd ed., John Wiley, New York (1992).
  • Papadimitriou, S., Kitagawa, H., Gibbons, P.G., and Faloutsos, C., “LOCI: Fast Outlier Detection Using the Local Correlation Integral”, Intel Research Laboratory Technical Report No. IRP-TR-02-09 (2002).
  • Rousseeuw, P., Leroy, A., “Robust Regression and Outlier Detection”, Wiley Series in Probability and Statistics (1987).
  • Schwarz, G., “Estimating the Dimension of a Model”, The Annals of Statistics, 6(2): 461-464 (1978).
  • Tolvi, J., “Genetic Algorithms for Outlier Detection and Variable Selection in Linear Regression Models”, Soft Computing, Springer, 527-533 (2004).
  • Wold, H., “Estimation of Principal Components and Related Models by Iterative Least Squares”, In: Multivariate Analysis, Ed. Krishnaiah, P.R., New York: Academic Press, 391-420 (1966).