Machine Learning Approach for Thyroid Cancer Diagnosis Using Clinical Data

Objective: Thyroid cancer is one of the world's most significant health problems, and early diagnosis makes it feasible to treat nodules before malignant thyroid gland cells spread. Developing models that predict thyroid cancer has therefore become crucial. The purpose of this study is to develop a clinical decision support model for the prediction of thyroid cancer using Bagged CART, a machine learning (ML) model.

Methods: The data set comprised 724 patients admitted to China Medical University Shengjing Hospital between 2010 and 2012, all of whom underwent thyroidectomy. It contains information on nodule malignancy, demographic characteristics, ultrasound characteristics, and blood test results. The Bagged CART modeling technique was applied to this open-access data set. The model's predictive performance was evaluated with accuracy (ACC), balanced accuracy (BACC), sensitivity (Sen), specificity (Spe), positive predictive value (PPV), negative predictive value (NPV), and F1-score. A 10-fold cross-validation method was used to assess the validity of the model. Variable importance, which indicates how much each input variable affects the output variable, was also computed.

Results: The model achieved ACC, BACC, Sen, Spe, PPV, NPV, and F1-score values of 99.1%, 98.7%, 99.7%, 97.7%, 99.1%, 99.2%, and 99.4%, respectively. According to the variable importance values obtained for the input variables in the data set, the seven most important variables were, in order: size, TSH, blood flow: enriched, multilateral: yes, FT4, site: isthmus, and age.

Conclusion: Based on these findings, the Bagged CART model was effective at predicting thyroid cancer. The study also evaluated risk factors for thyroid cancer and reported their importance values. These results can accelerate decision-making about the disease and can therefore be useful in preventive medicine practice.
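The seven performance metrics named in the abstract all follow from the four cells of a binary confusion matrix. The block below is a minimal sketch of those standard definitions, not the authors' code. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return the seven metrics listed in the abstract from confusion-matrix counts.

    tp/fn: malignant cases predicted malignant/benign.
    tn/fp: benign cases predicted benign/malignant.
    Assumes every denominator is non-zero (both classes present and predicted).
    """
    sen = tp / (tp + fn)                   # sensitivity (recall)
    spe = tn / (tn + fp)                   # specificity
    ppv = tp / (tp + fp)                   # positive predictive value (precision)
    npv = tn / (tn + fn)                   # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    bacc = (sen + spe) / 2                 # balanced accuracy
    f1 = 2 * ppv * sen / (ppv + sen)       # harmonic mean of PPV and Sen
    return {"ACC": acc, "BACC": bacc, "Sen": sen, "Spe": spe,
            "PPV": ppv, "NPV": npv, "F1": f1}


# Hypothetical counts for illustration only (not the study's results).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under 10-fold cross-validation, one common choice is to pool the held-out predictions from all folds into a single confusion matrix before applying these formulas. Averaging per-fold metrics is another. The abstract does not state which was used.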




Vancouver: Balıkçı Çiçek İ, Küçükakçalı Z. Machine Learning Approach for Thyroid Cancer Diagnosis Using Clinical Data. Middle Black Sea Journal of Health Science. 2023;9(3):440-452.
Middle Black Sea Journal of Health Science
  • Publication frequency: 3 issues per year
  • Publisher: Ordu University

