Chemical disease relation extraction task using genetic algorithm with two novel voting methods for classifier subset selection

Biomedical relation extraction is an important preliminary step for knowledge discovery in the biomedical domain. This paper proposes a multiple classifier system (MCS) for the extraction of chemical-induced disease relations. A genetic algorithm (GA) is employed to select classifier ensembles from a pool of base classifiers, and the voting method used to combine the members of each ensemble is also selected during evolution within the GA framework. The performance of an MCS is determined by the algorithm used for selecting the classifiers, the diversity among the selected classifiers, and the voting method used in the classifier combination. Each candidate ensemble is represented as a chromosome that contains all information on the ensemble: the subset of classifiers that vote and the voting method. The chromosomes are evolved using a variety of genetic selection, mating, and mutation techniques in search of an optimal solution. The aim of the proposed system is to select a subset of classifiers with diverse abilities while maximizing the strengths of the best classifiers in the ensemble for a given voting method. The two main contributions of this work are the evolution of the voting bit as part of the GA and the novel use of two different decision-making-under-uncertainty techniques as voting methods. Furthermore, two different selection algorithms and crossover operators are employed to increase variation during evolution. We validated the proposed method in nine different experimental settings; the results are comparable to those of state-of-the-art systems, which supports our approach.
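
The chromosome encoding described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The abstract does not name the two decision-making-under-uncertainty techniques, so the Laplace and Hurwicz criteria are assumed here purely as stand-ins. Tournament selection, one-point crossover, bit-flip mutation, elitism, the F1-based fitness, and all function names and parameter values are likewise assumptions made for illustration.

```python
import random

def laplace_vote(scores):
    """Assumed stand-in: pick the class with the highest mean score."""
    n_classes = len(scores[0])
    means = [sum(s[c] for s in scores) / len(scores) for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__)

def hurwicz_vote(scores, alpha=0.5):
    """Assumed stand-in: blend the best and worst score of each class."""
    n_classes = len(scores[0])
    values = []
    for c in range(n_classes):
        column = [s[c] for s in scores]
        values.append(alpha * max(column) + (1 - alpha) * min(column))
    return max(range(n_classes), key=values.__getitem__)

VOTERS = (laplace_vote, hurwicz_vote)  # indexed by the voting bit

def predict(chromosome, all_scores):
    """chromosome = [one bit per base classifier..., voting bit]."""
    *mask, vote_bit = chromosome
    selected = [s for bit, s in zip(mask, all_scores) if bit]
    if not selected:
        return 0  # assumption: an empty ensemble predicts the negative class
    return VOTERS[vote_bit](selected)

def fitness(chromosome, dataset):
    """F1 score of the positive class (label 1) over (scores, label) pairs."""
    tp = fp = fn = 0
    for all_scores, label in dataset:
        pred = predict(chromosome, all_scores)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def evolve(dataset, n_classifiers, pop_size=20, generations=30,
           p_mut=0.05, seed=0):
    """Evolve classifier subset and voting bit together."""
    rng = random.Random(seed)
    length = n_classifiers + 1  # classifier mask plus one voting bit
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    score = lambda c: fitness(c, dataset)
    for _ in range(generations):
        next_pop = sorted(pop, key=score, reverse=True)[:2]  # elitism
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (size 3, an assumption).
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation; the voting bit mutates like any other gene.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)
```

Because the voting bit is the last gene of the chromosome, crossover and mutation act on it exactly as they act on the classifier mask, so the voting method is chosen jointly with the subset rather than fixed in advance. In this sketch, each dataset item pairs the per-classifier class scores for one instance with its gold label.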
