Robust variable selection for mixture linear regression models

In this paper, we propose a robust variable selection procedure that estimates the parameters and selects the relevant covariates in finite mixtures of linear regression models. The procedure assumes that the error terms follow a Laplace distribution and is applied to the data after the high leverage points have been trimmed. We introduce a revised expectation-maximization (EM) algorithm for numerical computation. Simulation studies indicate that the proposed method is robust to both high leverage points and outliers in the y-direction, and that it achieves consistent variable selection in the presence of outliers or heavy-tailed error distributions. Finally, we apply the proposed methodology to a real data set.
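To make the modelling assumption concrete, the following is a minimal sketch of the E-step for a mixture of linear regressions with Laplace errors. It is an illustration under stated assumptions, not the revised EM algorithm of the paper: the Laplace density is parametrized by a scale `b` as `exp(-|r| / b) / (2b)`, the function names `laplace_pdf` and `responsibilities` are hypothetical, and the penalty term and the trimming of high leverage points are omitted.

```python
import math

def laplace_pdf(r, scale):
    """Density of a zero-mean Laplace distribution at residual r.

    Assumed parametrization: exp(-|r| / scale) / (2 * scale).
    """
    return math.exp(-abs(r) / scale) / (2.0 * scale)

def responsibilities(x, y, pis, betas, scales):
    """E-step: posterior probability that (x, y) belongs to each component.

    x      : covariate vector (with a leading 1.0 for the intercept)
    pis    : mixing proportions, one per component
    betas  : regression coefficient vectors, one per component
    scales : Laplace scale parameters, one per component
    """
    weighted = []
    for pi_j, beta_j, b_j in zip(pis, betas, scales):
        fitted = sum(c * v for c, v in zip(beta_j, x))
        weighted.append(pi_j * laplace_pdf(y - fitted, b_j))
    total = sum(weighted)
    return [w / total for w in weighted]

# Toy two-component example (illustrative values, not from the paper).
pis = [0.5, 0.5]
betas = [[0.0, 1.0], [0.0, -1.0]]
scales = [1.0, 1.0]
resp = responsibilities([1.0, 2.0], 2.0, pis, betas, scales)
```

In this toy example the observation lies exactly on the first component's regression line, so the first responsibility dominates. Because the Laplace density decays exponentially in the absolute residual rather than in the squared residual, a large residual is penalized less severely than under a normal error model, which is the usual motivation for using it in robust fitting.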
