Agreement models for multiraters

Agreement between 2 or more independent raters who evaluate the same items on the same scale can be measured by the kappa coefficient. In recent years, modeling the agreement among raters has been preferred to summarizing it with a single index. In this study, the disadvantages of kappa are reviewed, agreement models are introduced, and these models are applied to a real data set.

Materials and methods: Three pathologists classified each of 118 slides for carcinoma in situ of the uterine cervix, based on the most involved lesions. Agreement among the 3 pathologists' evaluations was investigated using log-linear agreement models.

Results: The kappa coefficient among the 3 pathologists was 0.48, which indicates moderate agreement. Log-linear agreement models were fitted to the data, and the agreement parameter was estimated for the best-fitting model. According to this model, the 3 pathologists were 2.5 times more likely to give the same decision than a different one.

Conclusion: Log-linear models can be used to measure agreement among more than 2 raters. Modeling agreement can provide more information than kappa.
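
To make the multi-rater kappa idea concrete, the sketch below computes Fleiss' kappa, one common generalization of kappa to more than 2 raters. The abstract does not say which multi-rater kappa variant was used, so this is an illustration and not the author's computation. The function name `fleiss_kappa` and the four-item example data are hypothetical; the 118-slide data set is not reproduced here.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    `ratings` is a list of per-item lists; each inner list holds the category
    chosen by every rater for that item (hypothetical input layout).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item must be rated by the same number of raters")

    categories = sorted({c for item in ratings for c in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: for each item, the share of rater pairs that agree.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the pooled category proportions.
    pooled = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_expected = sum(p * p for p in pooled)

    # Undefined (division by zero) when all ratings fall in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings: 4 items, 3 raters, 2 categories.
example = [[1, 1, 1], [1, 1, 2], [2, 2, 2], [1, 2, 2]]
print(round(fleiss_kappa(example), 3))
```

A single number like this summarizes agreement but does not show where or how the raters disagree, which is the limitation that motivates agreement models.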

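As a rough illustration of what a log-linear agreement model looks like for 3 raters, one simple form adds a single agreement parameter to the independence model. The abstract does not state which model fitted best or how it was parameterized, so the symbols below ($m_{ijk}$, $\lambda$, $\delta$) are assumptions made for this sketch.

```latex
% m_{ijk}: expected count of slides placed in category i by rater A,
% j by rater B, and k by rater C (notation assumed for illustration).
\log m_{ijk} = \mu + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k}
             + \delta \, I(i = j = k)
% I(.) equals 1 when all three raters choose the same category, 0 otherwise.
% exp(delta) is then the factor by which full agreement exceeds what
% independence alone would predict.
```

Under this assumed form, a factor of 2.5 such as the one reported above would correspond to $\delta = \ln 2.5 \approx 0.92$.
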
___

Turkish Journal of Medical Sciences
  • ISSN: 1300-0144
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK
Agreement models for multiraters

Tülay SARAÇBAŞI
