Toxicity prediction of small drug molecules of aryl hydrocarbon receptor using a proposed ensemble model

Quantitative structure–activity relationships and quantitative structure–property relationships have proved useful for predicting the toxicity of drug molecules from their biological activities. In silico toxicity prediction techniques are essential for reducing in vivo testing on rodents and offer a faster, more cost-efficient way to identify toxic effects at an early stage of drug development. We aim to build a prediction model that quickly and efficiently assesses whether certain chemical compounds have the potential to disrupt processes in the human body in ways that may adversely affect human health. Here, we propose a computational (in silico) method for predicting the toxicity of small drug molecules that can bind to the aryl hydrocarbon receptor, using their various physicochemical properties (molecular descriptors). Pharmaceutical Data Exploration Laboratory (PaDEL) software is used to extract the features of the drug molecules. The aryl hydrocarbon receptor dataset contains 9008 drug molecules, of which 1063 are active and 7945 are inactive, and each drug molecule is described by 1444 features. The proposed method is a novel prediction model based on ensemble learning that can efficiently classify the active (binding) and inactive (nonbinding) compounds of the dataset. In the proposed ensemble model, we first performed feature selection using the Boruta library in R. We then resolved the class imbalance problem through the ensemble learning itself, dividing the dataset into seven data frames that each have approximately equal numbers of active and inactive drug molecules. The resulting ensemble model, based on the votes of seven random forest models, gives an accuracy of 93.76%. K-fold cross-validation is conducted to measure the consistency of the model. Finally, the validity of the proposed ensemble model is demonstrated for some drug molecules used in acquired immune deficiency syndrome therapy and for the androgen receptor.
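
The balancing and voting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the abstract mentions R). It assumes that every data frame reuses all 1063 active molecules and receives one of seven disjoint chunks of the 7945 inactive molecules, which is one way to obtain approximately equal class sizes; the abstract does not state how the frames were built. The function names `balanced_partitions` and `majority_vote` are hypothetical, and the model training step (one random forest per frame) is omitted.

```python
import random
from collections import Counter


def balanced_partitions(active_ids, inactive_ids, n_parts=7, seed=0):
    """Build n_parts roughly balanced frames.

    Assumption: each frame contains every active molecule plus one
    disjoint chunk of the shuffled inactive molecules.
    """
    rng = random.Random(seed)
    shuffled = list(inactive_ids)
    rng.shuffle(shuffled)
    # Slicing with a stride deals the inactives out evenly across frames.
    chunks = [shuffled[i::n_parts] for i in range(n_parts)]
    return [(list(active_ids), chunk) for chunk in chunks]


def majority_vote(votes):
    """Return the label predicted by most base models.

    With seven binary votes a tie cannot occur.
    """
    return Counter(votes).most_common(1)[0][0]


# Molecule indices matching the class counts reported in the abstract.
active_ids = range(1063)
inactive_ids = range(1063, 9008)

frames = balanced_partitions(active_ids, inactive_ids)
frame_sizes = [(len(act), len(inact)) for act, inact in frames]

# Hypothetical votes from seven base models for a single molecule
# (1 = active, 0 = inactive).
example_votes = [1, 1, 0, 1, 0, 0, 1]
example_label = majority_vote(example_votes)
```

Under this assumption each frame holds 1063 active and 1135 inactive molecules, since 7945 divides evenly by seven. A classifier trained on each frame would then contribute one vote per molecule to `majority_vote`.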

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK