A Classification of Single Influential Observation Statistics in Regression Analysis

The results of a least squares fit of the general linear model Y = Xβ + ε to a given data set can be substantially influenced by the omission or addition of one or a few observations. The least squares method therefore does not by itself ensure that the proposed regression model is fully acceptable from the statistical and physical points of view. One of the main problems is that not all observations have an equal influence on the least squares fit and on the conclusions that result from such an analysis. The detection, assessment, and understanding of influential observations are major areas of interest in regression model building. It is important for a data analyst to be able to identify influential observations and to assess and understand their effects on various aspects of the analysis. Influence diagnostics are rapidly gaining recognition and acceptance by practitioners as supplements to the traditional analysis of residuals. Residuals play an important role in regression diagnostics; no analysis is complete without a thorough examination of the residuals. The standard analysis of regression results is based on certain assumptions; for more information we refer to [1], [2]. It is necessary to check the validity of these assumptions before drawing conclusions from an analysis [3].
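To make the quantities discussed above concrete, the following is a minimal sketch (not taken from the paper) of how the standard single-observation diagnostics for an ordinary least squares fit of Y = Xβ + ε, namely the leverages from the hat matrix, the internally studentized residuals, and Cook's distance, can be computed with NumPy. The function name, the simulated data, and the 4/n cut-off are illustrative assumptions, not part of the original text.

    # Minimal sketch: single-observation influence diagnostics for an OLS fit
    # of Y = X b + e, using only NumPy.  All names below are illustrative.
    import numpy as np

    def influence_diagnostics(X, y):
        """Return leverages h_ii, internally studentized residuals r_i,
        and Cook's distances D_i for the least squares fit of y on X."""
        n, p = X.shape                        # p includes the intercept column
        XtX_inv = np.linalg.inv(X.T @ X)
        h = np.diag(X @ XtX_inv @ X.T)        # diagonal of the hat matrix
        beta = XtX_inv @ (X.T @ y)            # least squares estimate
        e = y - X @ beta                      # ordinary residuals
        s2 = e @ e / (n - p)                  # residual variance estimate
        r = e / np.sqrt(s2 * (1.0 - h))       # internally studentized residuals
        D = (r ** 2 / p) * (h / (1.0 - h))    # Cook's distance
        return h, r, D

    # Illustrative use: simulate a fit, perturb one response, and flag points
    # whose Cook's distance exceeds the common (assumed) 4/n rule of thumb.
    rng = np.random.default_rng(0)
    n = 30
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
    y[5] += 5.0                               # make one observation influential
    h, r, D = influence_diagnostics(X, y)
    print(np.where(D > 4.0 / n)[0])           # indices of flagged observations

In practice, an observation flagged by a large leverage, a large studentized residual, or a large Cook's distance relative to such conventional cut-offs is examined further rather than discarded automatically.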

___

  • SHAPIRO, S.S., WILK, M.B. (1965) An Analysis of Variance Test for Normality. Biometrika, 52, 591-611.
  • WONNACOTT, T.H., WONNACOTT, R.J. (1981) Regression: A Second Course in Statistics. Wiley. New York.
  • DRAPER, N.R., SMITH, H. (1981) Applied Regression Analysis. John Wiley and Sons, Inc. New York.
  • BELSLEY, D.A., KUH, E., WELSCH, R.E. (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley. New York.
  • CHATTERJEE, S., HADI, A.S. (1986) Influential Observations, High Leverage Points, and Outliers in Linear Regression. Statistical Science, 1, 379-393.
  • CHATTERJEE, S., HADI, A.S. (1988) Sensitivity Analysis in Linear Regression. John Wiley and Sons, Inc. Canada.
  • MELOUN, M., MILITKY, J. (2001) Detection of Influential Points in OLS Regression Model Building. Analytica Chimica Acta, 439, 169-191.
  • ATKINSON, A.C. (1981) Two Graphical Displays for Outlying and Influential Observations in Regression. Biometrika, 13-20.
  • MYERS, R.H. (1990) Classical and Modern Regression with Applications. Duxbury. New York.
  • HOAGLIN, D.C., WELSCH, R.E. (1978) The Hat Matrix in Regression and ANOVA. The American Statistician, 32, 17-22.
  • VELLEMAN, P.F., WELSCH, R.E. (1981) Efficient Computing of Regression Diagnostics. The American Statistician, 35, 234-242.
  • FARBER, O., KADMON, R. (2003) Assessment of Alternative Approaches for Bioclimatic Modeling with Special Emphasis on the Mahalanobis Distance. Ecological Modelling, 160, 115-130.
  • FERNANDEZ PIERNA, J.A., WAHL, F., DE NOORD, O.E., MASSART, D.L. (2002) Methods for Outlier Detection in Prediction. Chemometrics and Intelligent Laboratory Systems, 63, 27-39.
  • COOK, R.D. (1977) Detection of Influential Observations in Linear Regression. Technometrics, 19, 15-18.
  • McDONALD, B. (2002) A Teaching Note on Cook's Distance: A Guideline. Res. Lett. Inf. Math. Sci., 3, 127-128.
  • MONTGOMERY, D.C., PECK, E.A. (1992) Introduction to Linear Regression Analysis. John Wiley and Sons, Inc. Canada.
  • COOK, R.D., WEISBERG, S. (1982) Residuals and Influence in Regression. Chapman and Hall. London.
  • WELSCH, R.E. (1982) Influence Functions and Regression Diagnostics. In Modern Data Analysis (R.L. Launer and A.F. Siegel, eds.). Academic Press. New York.
  • HOAGLIN, D.C., KEMPTHORNE, P.J. (1986) Comment. Statistical Science, 1, 408-412.
  • COOK, R.D. (1986) Comment. Statistical Science, 1, 393-397.