Mostafa BOROUMANDZADEH, Elham PARVINNIA

Automated classification of BI-RADS in textual mammography reports

The main purpose of this paper is to process key information in medical text records and to classify patients according to the different levels of the Breast Imaging-Reporting and Data System (BI-RADS), a scheme for standardizing breast imaging reports. To this end, medical text mining is employed to classify mammography reports based on BI-RADS. In this research, a new method is proposed for automatically extracting BI-RADS classifications from textual reports, with the aim of improving therapeutic procedures. First, a mammography lexicon is used to select keywords from the medical text reports. Word2vec and term frequency-inverse document frequency (TF-IDF) techniques are then used to extract features, and these features are combined with hospital information system (HIS) reports; this configuration is called With-HIS. Several classifiers, namely a multiclass support vector machine (SVM), naïve Bayes (NB), extreme gradient boosting (XGBoost), and a multilevel fuzzy min-max neural network (MLF), are used to compare the accuracy of With-HIS with that of the configuration without HIS (called Without-HIS). The results confirm that using HIS alongside the proposed approach (Word2vec + TF-IDF) has a significant effect on the accuracy of medical text classification. With the MLF classifier, the proposed method achieves an accuracy of 0.89, compared with 0.85 for Without-HIS.
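The abstract names the stages of the pipeline (lexicon-based keyword selection, Word2vec and TF-IDF features, concatenation with HIS data) but not exactly how the two text representations are fused. The following is a minimal sketch assuming one common fusion, a TF-IDF-weighted average of word vectors. Everything concrete in it is an assumption for illustration: the eight-term `LEXICON`, the hand-made 2-dimensional `VECTORS` standing in for trained Word2vec embeddings, the hypothetical HIS fields (normalized age and a family-history flag), and the toy reports. A nearest-centroid classifier stands in for the SVM, NB, XGBoost, and MLF classifiers evaluated in the paper; it is none of them.

```python
import math
from collections import Counter

# Hypothetical mini-lexicon; the paper's mammography lexicon is not reproduced here.
LEXICON = {"mass", "calcification", "spiculated", "benign",
           "asymmetry", "irregular", "oval", "circumscribed"}

# Toy 2-d "word vectors" standing in for trained Word2vec embeddings (assumption).
VECTORS = {
    "spiculated": [0.9, 0.1], "irregular": [0.8, 0.2], "mass": [0.5, 0.5],
    "calcification": [0.6, 0.4], "asymmetry": [0.4, 0.6],
    "oval": [0.2, 0.8], "circumscribed": [0.15, 0.85], "benign": [0.1, 0.9],
}
DIM = 2


def select_keywords(report):
    """Tokenize a report and keep only tokens found in the lexicon."""
    cleaned = report.lower().replace(",", " ").replace(".", " ")
    return [tok for tok in cleaned.split() if tok in LEXICON]


def fit_idf(docs):
    """Smoothed IDF, log((1 + n) / (1 + df)) + 1, computed over keyword lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}


def tfidf_weighted_embedding(doc, idf):
    """Average the word vectors of a document, weighting each term by TF * IDF."""
    counts = Counter(doc)
    total = sum(counts.values())
    acc, weight_sum = [0.0] * DIM, 0.0
    for term, c in counts.items():
        if term not in VECTORS or term not in idf:
            continue  # terms unseen during IDF fitting are skipped
        w = (c / total) * idf[term]
        for i, v in enumerate(VECTORS[term]):
            acc[i] += w * v
        weight_sum += w
    return [a / weight_sum for a in acc] if weight_sum else acc


def build_features(report, his, idf, with_his=True):
    """Text features alone (Without-HIS) or concatenated with HIS fields (With-HIS)."""
    text_vec = tfidf_weighted_embedding(select_keywords(report), idf)
    return text_vec + list(his) if with_his else text_vec


def fit_centroids(X, y):
    """Nearest-centroid stand-in classifier: mean feature vector per label."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))


# Hypothetical training data: (report text, HIS fields [age / 100, family history], label).
train = [
    ("Oval circumscribed mass, benign appearance.", [0.45, 0.0], "BI-RADS 2"),
    ("Irregular spiculated mass.", [0.62, 1.0], "BI-RADS 5"),
]
idf = fit_idf([select_keywords(r) for r, _, _ in train])
labels = [lab for _, _, lab in train]

cent_with = fit_centroids([build_features(r, h, idf) for r, h, _ in train], labels)
cent_without = fit_centroids(
    [build_features(r, h, idf, with_his=False) for r, h, _ in train], labels)

new_report, new_his = "Spiculated irregular mass with calcification.", [0.58, 1.0]
print(predict(cent_with, build_features(new_report, new_his, idf)))  # BI-RADS 5
print(predict(cent_without, build_features(new_report, new_his, idf, with_his=False)))  # BI-RADS 5
```

Because only the feature construction differs between the two configurations, the With-HIS versus Without-HIS comparison reduces to training the same classifier twice on the two feature sets; replacing the centroid step with a multiclass SVM or a gradient-boosted model would leave `build_features` unchanged.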

