Confirmation, Correction and Improvement for Outlier Validation using Dummy Variables

Dummy variables can be used to detect, validate and measure the impact of outliers in data. This paper uses a model to evaluate the effectiveness of dummy variables in detecting outliers. While generally confirming some findings in the literature, the model refutes the presumption that the t-statistic or the F-incremental statistic is enough to validate an observation as an outlier. To rectify this fallacy, this paper recommends an easily calculable robust standardized residual statistic that is more compatible with the definition of outliers. The robust standardized residual statistic suggested herein is still used in many robust regression methods and is more effective than the t-statistic or the F-incremental statistic in validating outliers with dummy variables. The results of this study suggest some practical recommendations for dealing with outliers and improvements in maintaining the integrity of data. We recommend that all previous studies using these statistics be revised in light of the findings presented in this paper.
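To make the two kinds of statistic concrete, the following is a minimal sketch on simulated data, not the paper's own model or results. It fits an ordinary least squares regression with a dummy variable for one suspect observation and reports the dummy's t-statistic, then computes a robust standardized residual for comparison. The abstract does not define the robust statistic, so the sketch assumes one common form: residuals from a least-median-of-squares line (found by brute force over all pairs of points) divided by 1.4826 times the median absolute residual, with a cutoff of 2.5. The data, the index `k`, the scale formula and the cutoff are all illustrative assumptions.

```python
import numpy as np

# Simulated data (hypothetical): a straight line with unit-variance noise.
rng = np.random.default_rng(0)
n, k = 30, 15                      # k: index of the suspect observation (assumed)
x = np.linspace(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
y[k] += 20.0                       # plant one gross outlier

# --- Dummy-variable approach: OLS with an indicator for observation k ---
d = np.zeros(n)
d[k] = 1.0
X = np.column_stack([np.ones(n), x, d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])      # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance matrix
t_dummy = beta[2] / np.sqrt(cov[2, 2])         # t-statistic of the dummy

# The dummy coefficient equals the prediction error for observation k
# from a fit that leaves observation k out.
keep = np.arange(n) != k
b_del, *_ = np.linalg.lstsq(X[keep, :2], y[keep], rcond=None)
deleted_resid = y[k] - X[k, :2] @ b_del

# --- Robust standardized residuals (assumed form, see lead-in) ---
# Least median of squares line, searched over all lines through two points.
best_crit, best_line = np.inf, None
for i in range(n - 1):
    for j in range(i + 1, n):
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = np.median((y - intercept - slope * x) ** 2)
        if crit < best_crit:
            best_crit, best_line = crit, (intercept, slope)

intercept, slope = best_line
r = y - intercept - slope * x
scale = 1.4826 * np.median(np.abs(r))          # robust scale estimate
robust_z = r / scale
flagged = np.flatnonzero(np.abs(robust_z) > 2.5)

print("dummy t-statistic:", t_dummy)
print("robust standardized residual at k:", robust_z[k])
print("observations flagged by the robust rule:", flagged)
```

In this sketch the dummy's t-statistic tests a single pre-selected observation against a fit that the remaining data determine, whereas the robust standardized residuals are computed for every observation at once from a fit that a minority of aberrant points cannot dominate.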
