Robust variable selection in the logistic regression model
In this paper, we propose an adaptive robust variable selection procedure for the logistic regression model. The proposed method is robust to outliers while accounting for the goodness of fit of the regression model, and we apply an MM algorithm to solve the resulting optimization problem. Monte Carlo studies are conducted to evaluate the finite-sample performance of the proposed method. The results show that when the dataset contains outliers, or when the distribution of a covariate deviates from the normal distribution, the proposed method outperforms existing methods in finite samples. Finally, the proposed methodology is applied to the analysis of Parkinson's disease data.
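The abstract mentions solving a penalized robust estimation problem with an MM (majorize-minimize) algorithm. The paper's exact loss and penalty are not reproduced here, but as a rough, illustrative sketch of the MM idea in this setting, the following uses the standard quadratic majorizer of the logistic negative log-likelihood (its Hessian X'WX, with weights p_i(1-p_i) ≤ 1/4, is bounded by (L)I with L = ||X||²/4), so each MM step reduces to a gradient step followed by soft-thresholding. The L1 penalty and all function names are assumptions for illustration, not the paper's estimator.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mm_lasso_logistic(X, y, lam=1.0, n_iter=1000):
    """L1-penalized logistic regression via an MM scheme (illustrative only).

    The negative log-likelihood is majorized at the current iterate by a
    quadratic with curvature L = ||X||_2^2 / 4, which bounds the Hessian
    X'WX since the logistic variance p(1-p) never exceeds 1/4.  Minimizing
    the surrogate plus the L1 penalty gives a gradient step followed by
    coordinate-wise soft-thresholding, and each step decreases the
    penalized objective (the MM descent property).
    """
    n, p = X.shape
    beta = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / 4.0      # Lipschitz bound on the Hessian
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (prob - y)              # gradient of the neg. log-lik.
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```

For a robust variant in the spirit of the paper, the unbounded deviance terms would be replaced by a bounded loss (e.g. a weighted or trimmed likelihood), with the same majorize-then-threshold structure carried over.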