Robust variable selection in the logistic regression model

In this paper, we propose an adaptive robust variable selection procedure for the logistic regression model. The proposed method is robust to outliers and accounts for the goodness-of-fit of the regression model. We apply an MM algorithm to solve the resulting optimization problem. Monte Carlo studies evaluate the finite-sample performance of the proposed method. The results show that when the dataset contains outliers or the distribution of the covariates deviates from the normal distribution, the proposed method performs better in finite samples than existing methods. Finally, the proposed methodology is applied to the analysis of Parkinson's disease data.
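
As a rough illustration of the MM (majorize-minimize) principle mentioned above, the following Python sketch applies it to the ordinary, non-robust and unpenalized logistic log-likelihood. It is not the procedure proposed in the paper: the function names (`mm_logistic`, `neg_log_lik`), the toy data, and the choice of a simple quadratic majorizer with curvature bound `L` are all assumptions made for illustration. The bound relies on the standard fact that the Hessian of the logistic negative log-likelihood is dominated by one quarter of X^T X, which in turn is dominated by one quarter of its trace times the identity.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def neg_log_lik(X, y, beta):
    # Negative log-likelihood of plain (non-robust) logistic regression.
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        # log(1 + exp(eta)), computed in a stable way, minus y * eta.
        total += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    return total

def mm_logistic(X, y, n_iter=200):
    # Hypothetical helper: MM iterations with a quadratic majorizer.
    p = len(X[0])
    # Curvature bound (assumption of this sketch):
    # Hessian <= (1/4) X^T X <= (1/4) trace(X^T X) * I.
    L = 0.25 * sum(x * x for xi in X for x in xi)
    beta = [0.0] * p
    history = [neg_log_lik(X, y, beta)]
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(sum(b * x for b, x in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += r * xi[j]
        # Exact minimizer of the quadratic majorizer built at the current beta.
        beta = [b - g / L for b, g in zip(beta, grad)]
        history.append(neg_log_lik(X, y, beta))
    return beta, history

# Toy, non-separable data: first column is the intercept.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 0.5]]
y = [0, 0, 1, 0, 1, 1]
beta, history = mm_logistic(X, y)
```

Because each update minimizes a surrogate that lies above the objective and touches it at the current iterate, the values stored in `history` should never increase. This monotone descent is the property that makes MM schemes attractive for more complicated objectives, such as robust or penalized losses.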
