Aslı YAMAN, Mehmet Ali CENGİZ

LASSO Estimator in Logistic Regression for Small Data Sets

Variable selection is an important problem in regression analysis. The LASSO (Least Absolute Shrinkage and Selection Operator) produces sparse solutions and therefore performs variable selection. It is a useful tool because it achieves shrinkage and variable selection simultaneously: the LASSO penalty can shrink some parameter estimates to exactly zero. The LASSO is generally used with large data sets. In this article, however, we consider the variable selection problem for multivariate Bernoulli logistic models in small data sets, using several information criteria. Simulation results obtained under four different model-selection criteria are compared.
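The abstract does not write out the objective or name its four criteria. A common formulation is sketched below for the simpler case of a single binary response y_i in {0, 1} with predictors x_i; the multivariate Bernoulli model extends this to several binary responses. The soft-thresholding operator S explains why LASSO estimates can be exactly zero. AIC and BIC appear only as two illustrative criteria. Counting nonzero coefficients as the degrees of freedom is a widely used approximation, not a result stated in the article. Here ℓ(λ) denotes the Bernoulli log-likelihood evaluated at the coefficients fitted with penalty λ.

```latex
\begin{aligned}
(\hat\beta_0, \hat\beta)(\lambda) &= \arg\min_{\beta_0,\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\big[y_i\eta_i - \log(1 + e^{\eta_i})\big] + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert, \qquad \eta_i = \beta_0 + x_i^{\top}\beta,\\
S(z, t) &= \operatorname{sign}(z)\max(\lvert z\rvert - t,\, 0), \qquad S(z, t) = 0 \text{ whenever } \lvert z\rvert \le t,\\
\mathrm{AIC}(\lambda) &= -2\,\ell(\lambda) + 2\,\mathrm{df}(\lambda), \qquad \mathrm{BIC}(\lambda) = -2\,\ell(\lambda) + \log(n)\,\mathrm{df}(\lambda),\\
\mathrm{df}(\lambda) &= 1 + \#\{\, j : \hat\beta_j(\lambda) \neq 0 \,\}.
\end{aligned}
```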

Keywords:

LASSO, Bernoulli distribution, Logistic regression, Feature selection
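The procedure described in the abstract fits the LASSO over a grid of penalty values and then lets an information criterion choose among the fitted models. The minimal pure-Python sketch below illustrates it. It is not the authors' code, and it makes several simplifications:

* It handles a single binary response rather than the multivariate Bernoulli model.
* It fits the penalized logistic regression by proximal gradient descent (ISTA) with soft-thresholding. This is a swapped-in solver, not the coordinate descent used by many packages.
* It uses AIC and BIC as two assumed example criteria, with the nonzero-count approximation to the degrees of freedom.
* It runs on hypothetical simulated data (n = 40, p = 5, only two nonzero true coefficients).

All function names are illustrative.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrinks z toward 0 and returns exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def linear_predictor(b0, beta, xi):
    return b0 + sum(bj * xj for bj, xj in zip(beta, xi))


def log_likelihood(X, y, b0, beta):
    """Bernoulli log-likelihood: sum of y*eta - log(1 + exp(eta)), computed stably."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = linear_predictor(b0, beta, xi)
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return ll


def fit_lasso_logistic(X, y, lam, b0=0.0, beta=None, n_iter=1000):
    """Proximal gradient (ISTA) for
    -(1/n) * loglik(b0, beta) + lam * sum_j |beta_j|, with the intercept b0 unpenalized."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    # Step 1/L, where L = ||[1, X]||_F^2 / (4n) is an upper bound on the gradient's Lipschitz constant.
    step = 4.0 * n / sum(1.0 + sum(v * v for v in xi) for xi in X)
    for _ in range(n_iter):
        # Gradient of the average negative log-likelihood at the current point.
        resid = [sigmoid(linear_predictor(b0, beta, xi)) - yi for xi, yi in zip(X, y)]
        grad = [sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(p)]
        b0 -= step * sum(resid) / n  # plain gradient step: intercept is not penalized
        beta = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(beta, grad)]
    return b0, beta


def information_criteria(X, y, b0, beta):
    """AIC and BIC with df = 1 (intercept) + number of nonzero coefficients."""
    ll = log_likelihood(X, y, b0, beta)
    df = 1 + sum(1 for bj in beta if bj != 0.0)
    return {"AIC": -2.0 * ll + 2.0 * df, "BIC": -2.0 * ll + math.log(len(y)) * df, "df": df}


def lasso_path(X, y, lambdas):
    """Fit from the largest to the smallest lambda, warm-starting each fit from the previous one."""
    path, b0, beta = [], 0.0, None
    for lam in sorted(lambdas, reverse=True):
        b0, beta = fit_lasso_logistic(X, y, lam, b0, beta)
        path.append({"lam": lam, "b0": b0, "beta": beta, **information_criteria(X, y, b0, beta)})
    return path


def simulate(n=40, p=5, seed=1):
    """Hypothetical small data set: only the first two of p predictors affect the response."""
    rng = random.Random(seed)
    true_beta = [1.5, -1.0] + [0.0] * (p - 2)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in range(p)]
        X.append(xi)
        y.append(1 if rng.random() < sigmoid(linear_predictor(0.0, true_beta, xi)) else 0)
    return X, y


X, y = simulate()
path = lasso_path(X, y, [0.3, 0.2, 0.1, 0.05, 0.02, 0.01])
best_aic = min(path, key=lambda m: m["AIC"])
best_bic = min(path, key=lambda m: m["BIC"])
print("AIC choice:", best_aic["lam"], [round(b, 3) for b in best_aic["beta"]])
print("BIC choice:", best_bic["lam"], [round(b, 3) for b in best_bic["beta"]])
```

The soft-thresholding step returns exactly 0.0 whenever a coordinate's update falls inside [-step·λ, step·λ]. The fitted coefficients therefore contain genuine zeros, which makes counting nonzeros for the degrees of freedom meaningful. Because log(40) ≈ 3.69 > 2, BIC charges more than AIC for each extra coefficient. Along the same path, it can never select a model with more nonzero coefficients than AIC does.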
