Ferhan ELMALI, Cengiz BAL, Canan BAYDEMİR, Kazım ÖZDAMAR, Ertuğrul ÇOLAK, Hayati DEMİRASLAN

Comparison of significance values obtained from asymptotic, exact and Monte Carlo methods in cross tables

This study aimed to compare the p values obtained from the asymptotic, exact and Monte Carlo (MC) methods, based on the chi-square test statistic, as the number of units in the dataset and the dimension of the cross table increase. The simulation study was carried out for 2x2, 3x3, 4x4, 5x5 and 10x10 cross tables. For the 2x2 tables, data were generated with sample sizes of N=500, 5000 and 50000. For the other tables, data were generated according to the proportion of cells with expected values less than five rather than according to sample size. The asymptotic and exact methods were compared for 2x2 tables; the asymptotic, exact and MC methods for 3x3 tables; and the asymptotic and MC methods for the remaining tables. In the MC method, the number of samples was set to M=10000, 100000 and 250000. The paired t test and two-way analysis of variance were used to compare the methods. For 2x2 cross tables, a difference was found between the p values obtained from the asymptotic and exact methods. For 3x3 cross tables, no difference was found between the p values obtained from the exact and MC methods, whereas the values from the asymptotic method differed from those of the other methods. For 4x4, 5x5 and 10x10 cross tables, no difference was found among the MC values, while the asymptotic value differed from the MC values. When comparing two categorical variables in cross tables, it is recommended to use the exact method of the chi-square test when it can be computed, and otherwise the MC method with at least 10000 samples.
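The three kinds of p value compared in the study can be illustrated with a minimal, self-contained sketch in Python (standard library only). All function names are hypothetical, and the sketch is not the authors' procedure. The asymptotic p value refers the Pearson chi-square statistic to a chi-square distribution with (r-1)(c-1) degrees of freedom. The exact p value, implemented here only for 2x2 tables, sums the hypergeometric probabilities of all tables with the observed margins whose chi-square is at least the observed one. The MC p value estimates that same exact p value from M random tables with fixed margins. For the MC step the sketch shuffles unit-level column labels instead of using Patefield's algorithm or a commercial exact-test routine; this is an assumed, swapped-in sampler that targets the same conditional distribution. It also assumes that every row and column total is positive, and it reports the MC estimate as hits/M.

```python
import math
import random


def _at_least(value, ref):
    # Tolerant ">=" so tables tied with the observed statistic survive rounding.
    return value >= ref - 1e-9 * max(1.0, abs(ref))


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]


def pearson_chi2(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(table, exp)
               for o, e in zip(orow, erow))


def share_expected_below_5(table):
    """Proportion of cells whose expected count is less than five."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < 5 for e in cells) / len(cells)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_j x^((2j-1)/2) / (1*3*...*(2j-1))
    term, extra = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        extra += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(x / 2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * extra


def asymptotic_p(table):
    """Asymptotic p value: chi-square distribution with (r-1)(c-1) df."""
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2_sf(pearson_chi2(table), df)


def exact_p_2x2(table):
    """Exact conditional p value for a 2x2 table: total hypergeometric
    probability of the tables with the observed margins whose chi-square
    is at least the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    observed = pearson_chi2(table)
    p = 0.0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        candidate = [[x, r1 - x], [c1 - x, r2 - c1 + x]]
        if _at_least(pearson_chi2(candidate), observed):
            p += math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)
    return p


def monte_carlo_p(table, m=10000, seed=None):
    """MC estimate of the exact p value from m random tables with the
    observed row and column totals (shuffling the column labels of the
    N units keeps both margins fixed)."""
    rng = random.Random(seed)
    row_labels, col_labels = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            row_labels += [i] * count
            col_labels += [j] * count
    observed = pearson_chi2(table)
    hits = 0
    for _ in range(m):
        rng.shuffle(col_labels)
        sim = [[0] * len(table[0]) for _ in table]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if _at_least(pearson_chi2(sim), observed):
            hits += 1
    return hits / m


if __name__ == "__main__":
    t = [[3, 7], [9, 2]]
    print("share of cells with E < 5:", share_expected_below_5(t))
    print("asymptotic p:", asymptotic_p(t))
    print("exact p:", exact_p_2x2(t))
    print("MC p (M=10000):", monte_carlo_p(t, m=10000, seed=1))
```

In the small example table, half of the cells have expected counts below five, which is the situation in which the asymptotic approximation may drift away from the exact value. Because the MC estimate is a proportion of M draws, its standard error is roughly sqrt(p(1 - p)/M), so larger M values such as those used in the study mainly reduce this sampling noise.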
