Bayesian and frequentist approaches on estimation and testing for a zero-inflated binomial distribution

To analyze discrete count data with excess zeros, various zero-inflated statistical models that accommodate frequent zero-valued observations have been developed. When the non-zero values arise as the number of successes in a sequence of independent Bernoulli trials, the zero-inflated binomial distribution may be an adequate model. In this paper, we discuss statistical inference for the zero-inflated binomial distribution using objective Bayesian and frequentist approaches. We develop point and interval estimation of the model parameters, together with hypothesis tests for excess zeros. A Monte Carlo simulation study assesses the performance of the estimation and testing procedures, and the objective Bayesian and frequentist approaches are compared. The proposed inferential methods are illustrated with an earthquake dataset and a baseball dataset.
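
For readers unfamiliar with the model, the following is a minimal sketch of a zero-inflated binomial distribution and one standard way to obtain frequentist point estimates. It is not taken from the paper. The parameter names `omega` (extra probability mass at zero) and `theta` (Bernoulli success probability), the function names, and the choice of an EM iteration for the maximum likelihood estimates are all assumptions made for illustration; the authors' own estimation procedures may differ.

```python
import math
import random


def zib_pmf(k, n, omega, theta):
    """P(X = k): extra zero mass `omega`, Binomial(n, theta) otherwise (assumed notation)."""
    binom = math.comb(n, k) * theta**k * (1 - theta) ** (n - k)
    return omega * (k == 0) + (1 - omega) * binom


def zib_sample(size, n, omega, theta, rng):
    """Draw `size` observations: a structural zero with probability omega, else a binomial count."""
    out = []
    for _ in range(size):
        if rng.random() < omega:
            out.append(0)
        else:
            out.append(sum(rng.random() < theta for _ in range(n)))
    return out


def zib_em(data, n, iters=200):
    """Illustrative EM iteration for (omega, theta); a standard technique, not the paper's code."""
    size = len(data)
    n_zero = sum(1 for x in data if x == 0)
    total = sum(data)
    if n_zero == 0:
        # No zeros observed: the likelihood is maximized with no zero inflation.
        return 0.0, total / (n * size)
    omega, theta = 0.5, max(total / (n * size), 1e-6)
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero.
        z = omega / (omega + (1 - omega) * (1 - theta) ** n)
        s = n_zero * z  # expected number of structural zeros
        # M-step: update the mixing weight and the binomial success probability.
        omega = s / size
        theta = total / (n * (size - s))
    return omega, theta


# Simulate data with known parameters and check that the estimates land nearby.
rng = random.Random(0)
sample = zib_sample(5000, n=10, omega=0.3, theta=0.2, rng=rng)
omega_hat, theta_hat = zib_em(sample, n=10)
print(omega_hat, theta_hat)
```

Under this parameterization, the probability of observing a zero is `omega + (1 - omega) * (1 - theta)**n`, so testing for excess zeros amounts to testing whether `omega` equals zero, which lies on the boundary of the parameter space.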
