A comparative review of regression ensembles on drug design datasets

Drug design datasets are usually hard to model, having a large number of features and a small number of samples. Regression-type problems are common in the drug design area. Committee machines (ensembles) have become popular in machine learning because of their good performance. In this study, the dynamics of ensembles used in regression-related drug design problems are investigated with a collection of drug design datasets. The study tries to determine the most successful ensemble algorithm, the base algorithm–ensemble pair having the best/worst results, the most successful single algorithm, and the similarities of the algorithms according to their performances. We also discuss whether ensembles always generate better results than single algorithms.


Table 3. The drug design datasets used in the experiments (number of samples, original features, and selected features after dimension reduction).

| No. | Name | Samples | Features | Selected features |
|---|---|---|---|---|
| 1 | benzo | 195 | 32 | 32 |
| 2 | carbolenes | 37 | 1142 | 15 |
| 3 | chang | 34 | 1142 | 7 |
| 4 | cristalli | 32 | 1142 | 14 |
| 5 | depreux | 26 | 1142 | 12 |
| 6 | mtp | 274 | 1142 | 24 |
| 7 | pah | 80 | 112 | 10 |
| 8 | pdgfr | 79 | 320 | 11 |
| 9 | phen | 22 | 110 | 6 |
| 10 | phenetyl | 22 | 628 | 7 |
| 11 | qsbr y | 15 | 9 | 3 |
| 12 | qsfsr | 19 | 9 | 3 |
| 13 | selwood | 31 | 53 | 5 |
| 14 | strupcz | 34 | 1142 | 15 |
| 15 | yokohoma | 12 | 1142 | 11 |
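The exact feature selection method behind the "selected features" column is not restated in this excerpt. As a purely illustrative stand-in for such a dimension reduction step, the sketch below keeps the k features most correlated with the output; the function name and the choice of a simple correlation filter are assumptions, not the procedure of the study.

```python
# Illustrative stand-in for producing a "selected features" view of a dataset:
# keep the k features with the largest absolute Pearson correlation with y.
import numpy as np

def select_top_k_features(X, y, k):
    """Return the indices of the k features most correlated with the output y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    Xc = X - X.mean(axis=0)                 # center each feature column
    yc = y - y.mean()                       # center the output
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.where(denom > 0, (Xc.T @ yc) / np.where(denom > 0, denom, 1.0), 0.0)
    return np.argsort(-np.abs(corr))[:k]

# e.g. reduce a 1142-feature dataset to 15 selected features:
# selected = select_top_k_features(X, y, k=15); X_reduced = X[:, selected]
```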
Experimental results

Seven base regressors were used together with each ensemble algorithm on 15 regression-type drug design problems. The experiments were designed to answer the following questions for drug design problems:

- Do the ensemble algorithms generate more successful results than a single algorithm?
- What is the most successful ensemble algorithm?
- What is the base algorithm–ensemble pair with the best results?
- Which base algorithm performs well with the ensembles?
- What is the most successful single algorithm?
- How are the algorithms and datasets grouped according to their performances?
- How does the dimension reduction process affect the results?

To answer these questions, 36 algorithms ((4 ensemble + 1 single) × 7 base algorithms + the Zero Rule algorithm = 36) were employed on the 15 drug design datasets described in Table 3 and on their dimensionally reduced versions. A 5 × 2 cross-validation was used and the RMSE results were averaged. The RMSE is defined as

RMSE_{alg.name} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y^{i}_{alg.name} - y^{i}_{actual} \right)^{2} },   (1)

where y^{i}_{alg.name} is the prediction of alg.name for the ith test sample, y^{i}_{actual} is the actual output value of the ith test sample, and N is the number of test samples. The Zero Rule algorithm measures the default error of a dataset. The RMSE value of the Zero Rule is calculated as

RMSE_{ZeroRule} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y_{m} - y^{i}_{actual} \right)^{2} }, \qquad y_{m} = \frac{1}{T} \sum_{j=1}^{T} y^{j}_{actual},   (2)

where y^{j}_{actual} is the actual output value of the jth training sample, T is the number of training samples, and N is the number of test samples.

Our base learners and ensemble algorithms have some hyperparameters to optimize, and we used 2-fold cross-validation to optimize them. In bagging, we optimized the bag size by trying values of 50%, 75%, and 100%. In additive regression, we optimized the shrinkage by trying values of 0.1, 0.5, and 1. In random subspace, we optimized the subspace size by trying values of 25%, 50%, and 75%. In rotation forest, we optimized the remove percentage by trying values of 25%, 50%, and 75%. In the M5P and REP trees, we optimized the minimum number of instances by trying values of 1, 2, 3, 4, and 5. In K nearest neighbor, we optimized K by trying values of 1, 3, and 5. In support vector regression, we optimized C by trying values of 0.01, 0.1, 1, 10, and 100.

In the 5 × 2 cross-validation methodology, the dataset is shuffled and randomly divided into 2 halves. One half is used for training and the other for testing, and vice versa. This procedure is repeated 5 times, so 10 test RMSE estimates are obtained for each algorithm on each dataset. In some experiments, very high RMSE values were obtained, especially with the simple linear regression algorithm, which distorted the overall averages. Because of this, the performance comparisons of the algorithms were done with the algorithms' success rankings instead of the averaged RMSEs. In each experiment, the averaged 5 × 2 cross-validation RMSEs were sorted in ascending order; the algorithm with the lowest RMSE got the 1st rank and the worst got the 36th rank. These success rankings are given in Tables 4 and 5. In Table 4, the results with the original datasets are shown; in Table 5, the results with the dimensionally reduced datasets are shown. The 15 datasets are ordered along the columns of the tables and the algorithms along the rows.
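To make the evaluation protocol concrete, the following is a minimal sketch of the 5 × 2 cross-validation with the RMSE of Eq. (1) and the Zero Rule baseline of Eq. (2). It uses Python with scikit-learn purely for illustration (the study itself relied on WEKA implementations), and the synthetic data is only a stand-in for a small, wide drug design dataset.

```python
# Sketch of the 5x2 cross-validation protocol with RMSE and a Zero Rule baseline.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.cross_decomposition import PLSRegression

def rmse(y_pred, y_true):
    """Eq. (1): root mean squared error over the N test samples."""
    return np.sqrt(np.mean((np.ravel(y_pred) - np.ravel(y_true)) ** 2))

def zero_rule_rmse(y_train, y_test):
    """Eq. (2): the default error; predict the training mean for every test sample."""
    return np.sqrt(np.mean((np.mean(y_train) - np.ravel(y_test)) ** 2))

def five_by_two_cv_rmse(model, X, y, seed=0):
    """Return the 10 test RMSE estimates of a 5x2 cross-validation."""
    estimates = []
    for repetition in range(5):
        folds = KFold(n_splits=2, shuffle=True, random_state=seed + repetition)
        for train_idx, test_idx in folds.split(X):
            model.fit(X[train_idx], y[train_idx])
            estimates.append(rmse(model.predict(X[test_idx]), y[test_idx]))
    return np.array(estimates)

# Synthetic stand-in for a small, wide drug design dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(34, 50))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=34)

pls_scores = five_by_two_cv_rmse(PLSRegression(n_components=2), X, y)
train_idx, test_idx = next(KFold(n_splits=2, shuffle=True, random_state=0).split(X))
print("PLS averaged 5x2 RMSE:", pls_scores.mean())
print("Zero Rule RMSE (one split):", zero_rule_rmse(y[train_idx], y[test_idx]))
```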
The average success ranking and its standard deviation for each algorithm are shown in the last 2 columns. In Tables 6 and 7, the summaries of Tables 4 and 5 are given, respectively. Each cell is the averaged success ranking of the experiments with the base algorithm in the cell's row and the ensemble algorithm in the cell's column. The averaged success rankings of the single algorithms are given in the 'Single' column. In the 'Avg.' column, the averaged success rankings of the experiments with respect to the base algorithms are given; in the 'Avg.' row, those with respect to the ensemble algorithms are given.

Table 4. The success ranking of the 36 algorithms on the 15 original drug datasets (best to worst, 1 to 36).

Table 5. The success ranking of the 36 algorithms on the 15 dimensionally reduced drug datasets (best to worst, 1 to 36).

The Nemenyi test [23] was also applied to determine whether there is a statistically significant difference between the algorithms' average ranks. According to the Nemenyi test for 15 datasets, 36 algorithms, and a significance level of 5%, two algorithms are significantly different if the difference between their average ranks is at least 14.76. In Figures 1 and 2, the graphical representation of the Nemenyi test results is shown.
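For reference, the critical difference (CD) behind this threshold follows from the standard formulation in [23]; taking k = 36 algorithms, N = 15 datasets, and q_{0.05} ≈ 3.84 (the critical value of the Studentized range statistic for 36 groups divided by \sqrt{2}), the quoted value is recovered up to rounding:

CD = q_{0.05} \sqrt{ \frac{k(k+1)}{6N} } = 3.84 \sqrt{ \frac{36 \cdot 37}{6 \cdot 15} } \approx 3.84 \times 3.85 \approx 14.8 .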
Table 6. The averaged success rankings of the algorithms on the original datasets (best to worst, 1 to 36).

Table 7. The averaged success rankings of the algorithms on the dimensionally reduced datasets (best to worst, 1 to 36).

Figure 1. Graphical representation of the Nemenyi test results of the compared methods with the ranks given in Table 6 (on the original datasets). The numbers on the line represent the average ranks. Bold lines connect the algorithms that have no significant difference.

Figure 2. Graphical representation of the Nemenyi test results of the compared methods with the ranks given in Table 7 (on the dimensionally reduced datasets).

When Tables 4, 5, 6, and 7 and Figures 1 and 2 are investigated, the following conclusions are reached. For the experiments with the original datasets (Tables 4 and 6):

- The best ranking performance (6.00) is obtained with the additive regression–partial least squares (AR-PLS) algorithm.
- The best-performing ensemble algorithms are additive regression (AR) and bagging (BG).
- The best-performing base algorithm is partial least squares (PLS).
- Additive regression and bagging increase the performance of every base algorithm. Rotation forest increases the performances of REP, decision stump (DS), and nearest neighbor (NN), but decreases the performance of partial least squares. Random subspace (RS) generally increases performance.
- The M5P, PLS, and SLR base algorithms achieved their best performances with additive regression; REP and DS with rotation forest; and SVR with random subspace (a sketch of such a base learner–ensemble pairing is given after this list).
- Rotation forest and random subspace achieved their best performances with NN; additive regression and bagging with PLS.
- According to the Nemenyi test, there is no statistically significant difference between the best algorithm (AR-PLS) and the algorithms having average ranks below 20.76 (= 6.00 + 14.76).
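To make the base learner–ensemble pairings concrete, the sketch below builds a bagging ensemble around a PLS base regressor (the BG-PLS pairing). It uses scikit-learn's BaggingRegressor and PLSRegression purely as stand-ins for the WEKA implementations used in the study; the data and hyperparameter values are illustrative, not those of the experiments.

```python
# BG-PLS sketch: a bagging ensemble whose base regressor is partial least squares.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.cross_decomposition import PLSRegression

# Synthetic stand-in data: few samples, comparatively many features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 30))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=40)

bg_pls = BaggingRegressor(
    PLSRegression(n_components=2),  # base learner (PLS)
    n_estimators=25,                # number of bootstrap models (illustrative)
    max_samples=0.75,               # roughly analogous to the bag size option
    random_state=0,
)
bg_pls.fit(X, y)
print(np.round(np.ravel(bg_pls.predict(X[:5])), 2))
```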
For the experiments with the dimensionally reduced datasets (Tables 5 and 7):

- The best ranking performance (7.13) is obtained with the BG-PLS algorithm.
- The best-performing ensemble algorithm is rotation forest.
- The best-performing base algorithm is partial least squares.
- All of the ensemble algorithms generally increased the performance of each base algorithm; the exceptions are AR-PLS and RF-PLS.
- The M5P and SVR base algorithms achieved their best performances with random subspace; REP, SLR, DS, and NN with rotation forest; and PLS with bagging.
- Rotation forest achieved its best performance with NN; additive regression, random subspace, and bagging with PLS.
- According to the Nemenyi test, there is no statistically significant difference between the best algorithm (BG-PLS) and the algorithms having average ranks below 21.89 (= 7.13 + 14.76).

The average successes of the algorithms were investigated above. Next, the best-performing algorithm is investigated for each individual dataset. In Table 8, the dataset name, the error of the Zero Rule algorithm, and the error and the name of the best-performing algorithm are shown for both the original and the dimensionally reduced datasets. The Zero Rule predicts a single value for all of the test samples: the mean of all of the training samples' outputs. It considers only the outputs of the samples, so it can be thought of as the default error of a dataset; consequently, the Zero Rule errors are the same for the original and dimensionally reduced datasets. Comparing the Zero Rule error with the other algorithms' errors shows whether the algorithms can decrease the default error.

Table 8. The best-performing algorithms on the original and dimensionally reduced datasets.

| Dataset name | Zero Rule error | Best algorithm (all features) | RMSE | Best algorithm (selected features) | RMSE |
|---|---|---|---|---|---|
| benzo | 0.25 | RF-M5P | 0.21 | RF-M5P | 0.21 |
| carbolenes | 0.23 | RF-DS | 0.22 | BG-PLS | 0.15 |
| chang | 0.20 | ZeroR | 0.20 | BG-NN | 0.18 |
| cristalli | 0.28 | RS-NN | 0.24 | RS-PLS | 0.18 |
| depreux | 0.20 | RF-REP | 0.20 | RS-DS | 0.16 |
| mtp | 0.18 | AR-PLS | 0.16 | RF-NN | 0.15 |
| pah | 0.20 | AR-PLS | 0.10 | RF-PLS | 0.10 |
| pdgfr | 0.23 | RF-REP | 0.20 | RF-PLS | 0.17 |
| phen | 0.27 | AR-PLS | 0.13 | PLS | 0.14 |
| phenetyl | 0.27 | PLS | 0.10 | RF-PLS | 0.06 |
| qsbr y | 0.27 | BG-NN | 0.25 | RS-REP | 0.26 |
| qsfsr | 0.27 | AR-M5P | 0.19 | AR-M5P | 0.17 |
| selwood | 0.30 | RF-DS | 0.25 | RF-NN | 0.21 |
| strupcz | 0.22 | RF-NN | 0.21 | RF-DS | 0.16 |
| yokohoma | 0.28 | RF-DS | 0.27 | RF-PLS | 0.20 |

When Table 8 is investigated, the following conclusions are reached:

- The best-performing algorithms are generally ensemble algorithms, in agreement with the average success of the algorithms.
- The experiments with the dimensionally reduced datasets have equal or better results than those with the original datasets, except for 2 datasets (phen, qsbr y).
- The dimension reduction process changes the best-performing algorithm, except for 2 datasets (benzo, qsfsr).

The experiments with the dimensionally reduced datasets were further investigated in detail. The results of the best 10 algorithms and the Zero Rule were compared using the paired t-test [24]. In Table 9, the wins and significant wins are shown between each pair of these 11 algorithms.
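As an illustration of how one such pairwise comparison might be computed, the sketch below applies a paired t-test to the 10 RMSE estimates that the 5 × 2 cross-validation yields for two algorithms on a single dataset. The scipy call, the function name, and the numbers are purely illustrative; they are not the exact procedure or data of the study, which cites [24] for the paired-samples t-test.

```python
# Paired t-test on the 10 per-fold RMSE estimates of two algorithms, one dataset.
import numpy as np
from scipy import stats

def compare_on_dataset(rmse_a, rmse_b, alpha=0.05):
    """Return (a_wins, significant_win) for algorithm A vs. B on one dataset."""
    rmse_a, rmse_b = np.asarray(rmse_a), np.asarray(rmse_b)
    a_wins = rmse_a.mean() < rmse_b.mean()
    t_stat, p_value = stats.ttest_rel(rmse_a, rmse_b)
    return a_wins, bool(a_wins and p_value < alpha)

# Hypothetical 5x2 CV RMSEs of two algorithms on the same folds of one dataset.
a = [0.15, 0.17, 0.14, 0.16, 0.18, 0.15, 0.16, 0.17, 0.14, 0.15]
b = [0.21, 0.20, 0.19, 0.22, 0.21, 0.20, 0.23, 0.19, 0.20, 0.22]
print(compare_on_dataset(a, b))   # (True, True): a win counted as significant
```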
The results are given in X(Y) form, meaning that the algorithm in the corresponding row has better results than the algorithm in the corresponding column on X datasets out of 15. The number in brackets (Y) is the number of significant wins of the row with regard to the column. A 0 means that the scheme in the corresponding column did not score a single (significant) win with regard to the scheme in the row. For example, the RF-PLS algorithm has a better result than the Zero Rule on 10 datasets, and the differences are significant for 5 of those 10 datasets.

Table 9. The significant differences of the algorithms' performances.

|  | RF-PLS | RF-DS | RF-NN | AR-PLS | BG-PLS | BG-NN | RS-M5P | RS-PLS | RS-NN | PLS | ZeroR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RF-PLS | - | 9(2) | 7(0) | 7(0) | 4(0) | 7(0) | 9(0) | 5(0) | 8(0) | 2(0) | 10(5) |
| RF-DS | 6(0) | - | 4(0) | 5(0) | 3(0) | 5(0) | 7(0) | 4(0) | 5(0) | 4(0) | 13(6) |
| RF-NN | 8(0) | 11(0) | - | 5(0) | 3(0) | 7(0) | 8(0) | 3(0) | 9(0) | 4(0) | 14(5) |
| AR-PLS | 8(0) | 10(4) | 10(0) | - | 5(0) | 9(0) | 12(0) | 3(0) | 10(1) | 4(0) | 12(6) |
| BG-PLS | 11(0) | 12(3) | 12(0) | 10(0) | - | 11(0) | 12(0) | 4(0) | 12(0) | 7(0) | 14(7) |
| BG-NN | 8(0) | 10(1) | 8(0) | 6(0) | 4(0) | - | 8(0) | 4(0) | 8(0) | 5(0) | 15(7) |
| RS-M5P | 6(0) | 8(2) | 7(0) | 3(0) | 3(0) | 7(0) | - | 2(0) | 7(0) | 2(0) | 11(5) |
| RS-PLS | 10(0) | 11(3) | 12(0) | 12(0) | 11(0) | 11(0) | 13(0) | - | 13(0) | 9(0) | 13(7) |
| RS-NN | 7(0) | 10(0) | 6(0) | 5(0) | 3(0) | 7(0) | 8(0) | 2(0) | - | 4(0) | 15(6) |
| PLS | 13(0) | 11(2) | 11(1) | 11(0) | 8(0) | 10(0) | 13(0) | 6(0) | 11(0) | - | 13(7) |
| ZeroR | 5(0) | 2(0) | 1(0) | 3(0) | 1(0) | 0(0) | 4(0) | 2(0) | 0(0) | 2(0) | - |

When Table 9 is investigated, the following conclusions are reached:

- The BG-PLS, BG-NN, RS-PLS, and PLS algorithms significantly win over the Zero Rule on the largest number of datasets (7 each).
- The RF-PLS, AR-PLS, BG-PLS, BG-NN, RS-M5P, RS-PLS, and PLS algorithms have no significant losses.
- The AR-PLS algorithm has the largest total number of significant wins (11).

In Figures 3 and 4, the hierarchical clusters of the algorithms and of the datasets are given, respectively. The closer the connection point of two clusters is to the left side, the more similar the clustered algorithms/datasets are. When the algorithms are clustered, each algorithm is represented by a point with 15 features (one per dataset); when the datasets are clustered, each dataset is represented by a point with 36 features (one per algorithm). A sketch of how such a clustering can be produced is given after the lists below.

According to Figure 3, the following conclusions are reached:

- In both figures, the ensemble–base algorithm pairs are generally clustered with their single base algorithms.
- The feature selection process does not dramatically affect the similarities of the algorithms.

According to Figure 4, the following conclusions are reached:

- On the left side of Figure 4, the datasets having 1142 features are generally clustered together.
- On the right side of Figure 4, there is no obvious pattern between the clusters and the number of features/samples.
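A minimal sketch of producing such a dendrogram, assuming a matrix of averaged RMSE values with one row per algorithm and one column per dataset. The linkage method, distance metric, algorithm subset, and placeholder data below are assumptions for illustration; the clustering settings used in the study are not restated in this excerpt.

```python
# Hierarchical clustering of algorithms by their RMSE profiles across datasets.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Rows: algorithms, columns: the 15 datasets; values: averaged 5x2 CV RMSEs.
algorithms = ["PLS", "BG-PLS", "AR-PLS", "NN", "RS-NN"]            # hypothetical subset
rng = np.random.default_rng(1)
rmse_matrix = rng.uniform(0.1, 0.3, size=(len(algorithms), 15))    # placeholder data

Z = linkage(rmse_matrix, method="average", metric="euclidean")     # assumed settings
dendrogram(Z, labels=algorithms, orientation="left")
plt.xlabel("cluster distance")
plt.show()
```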
Previous works

The selected previous studies in this area, for both classification and regression, are shown comparatively in Table 10. It is observed that a larger number of datasets was used in the classification problems. However, the number of chemical/drug design datasets used is not sufficient to reach general conclusions.

Figure 3. The hierarchical clusters of the algorithms according to their RMSE values on the original (left) and dimensionally reduced (right) 15 datasets.

Table 10. The comparison of selected previous studies. The compared methods include PLS, bagging with PLS, and PLS ensembles with and without noise; decision trees, bagging, boosting, random forest, and SVR on 8 chemical classification-type datasets (where SVR and random forest were better than the other algorithms); and boosting, random subspace, random trees, bagging, and random forest with one base learner (C4.5).

Table 11. The questions and their answers obtained with the experimental studies on drug datasets.

| Question | Answer (based on our drug design experiments) |
|---|---|
| Do the ensemble algorithms generate more successful results than a single algorithm? | Generally, yes. |
| How are the most successful ensemble algorithms ranked? | Success ranking on the original datasets: AR > BG > RF > RS > single. On the dimensionally reduced datasets: RF > RS > BG > AR > single. |
| What is the base algorithm–ensemble pair having the best results? | On the original datasets: AR with PLS. On the dimensionally reduced datasets: BG with PLS. |
| Which ensemble algorithm works well with which base algorithms? | On the original datasets: RF and RS with NN; AR and BG with PLS. On the dimensionally reduced datasets: RF with NN; AR, BG, and RS with PLS. |
| Which base algorithm works well with which ensemble algorithms? | On the original datasets: M5P, PLS, and SLR with AR; REP and DS with RF; SVR with RS. On the dimensionally reduced datasets: M5P and SVR with RS; REP, SLR, DS, and NN with RF; PLS with BG. |

References

[1] G. Brown, J.L. Wyatt, P. Tino, "Managing diversity in regression ensembles", Journal of Machine Learning Research, Vol. 6, pp. 1621–1650, 2005.
[2] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, San Francisco, Morgan Kaufmann, 2005.
[3] T.G. Dietterich, "An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization", Machine Learning, Vol. 40, pp. 139–157, 2000.
[4] J.H. Friedman, "Greedy function approximation: a gradient boosting machine", Technical Report, Department of Statistics, Stanford University, 1999.
[5] T.K. Ho, "The random subspace method for constructing decision forests", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, pp. 832–844, 1998.
[6] J.J. Rodríguez, L.I. Kuncheva, C.J. Alonso, "Rotation forest: a new classifier ensemble method", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, pp. 1619–1630, 2006.
[7] E. Frank, Y. Wang, S. Inglis, G. Holmes, I.H. Witten, "Using model trees for classification", Machine Learning, Vol. 32, pp. 63–76, 1998.
[8] B.H. Mevik, V.H. Segtnan, T. Næs, "Ensemble methods and partial least squares regression", Journal of Chemometrics, Vol. 18, pp. 498–507, 2004.
[9] S.K. Shevade, S.S. Keerthi, C. Bhattacharyya, K.R.K. Murthy, "Improvements to the SMO algorithm for SVM regression", IEEE Transactions on Neural Networks, Vol. 11, pp. 1188–1193, 2000.
[10] M.A. Hall, "Correlation-based feature selection for machine learning", PhD thesis, Department of Computer Science, University of Waikato, 1998.
[11] ADRIANA.Code, Molecular Networks, Germany; www.mol-net.de.
[12] WEKA Collections of Datasets; www.cs.waikato.ac.nz/ml/weka/index_datasets.html.
[13] M. Karthikeyan, R.C. Glen, A. Bender, "General melting point prediction based on a diverse compound dataset and artificial neural networks", Journal of Chemical Information and Modeling, Vol. 45, pp. 581–590, 2005.
[14] B.D. Silverman, D.E. Platt, "Comparative molecular moment analysis (CoMMA): 3D-QSAR without molecular superposition", Journal of Medicinal Chemistry, Vol. 39, pp. 2129–2140, 1996.
[15] D.E. Patterson, R.D. Cramer, A.M. Ferguson, R.D. Clark, L.W. Weinberger, "Neighborhood behavior: a useful concept for validation of molecular diversity descriptors", Journal of Medicinal Chemistry, Vol. 39, pp. 3049–3059, 1996.
[16] R. Todeschini, P. Gramatica, E. Marengo, R. Provenzani, "Weighted holistic invariant molecular descriptors", Chemometrics and Intelligent Laboratory Systems, Vol. 27, pp. 221–229, 1995.
[17] R. Guha, P. Jurs, "The development of linear, ensemble and non-linear models for the prediction and interpretation of the biological activity of a set of PDGFR inhibitors", Journal of Chemical Information and Computer Sciences, Vol. 44, pp. 2179–2189, 2004.
[18] A. Cammarata, "Interrelationship of the regression models used for structure-activity analyses", Journal of Medicinal Chemistry, Vol. 15, pp. 573–577, 1972.
[19] H. Kubinyi, QSAR: Hansch Analysis and Related Approaches, New York, VCH Publishers/Weinheim, VCH Verlagsgesellschaft, pp. 57–68, 1993.
[20] J. Damborský, K. Manova, M. Kuty, Biodegradability Prediction, Dordrecht, Kluwer Academic Publishers, pp. 75–92, 1996.
[21] J. Damborský, "Quantitative structure-function and structure-stability relationships of purposely modified proteins", Protein Engineering, Vol. 11, pp. 21–30, 1998.
[22] D.L. Selwood, D.J. Livingstone, J.C. Comley, A.B. O'Dowd, A.T. Hudson, P. Jackson, K.S. Jandu, V.S. Rose, J.N. Stables, "Structure-activity relationships of antifilarial antimycin analogues: a multivariate pattern recognition study", Journal of Medicinal Chemistry, Vol. 33, pp. 136–142, 1990.
[23] J. Demsar, "Statistical comparisons of classifiers over multiple data sets", Journal of Machine Learning Research, Vol. 7, pp. 1–30, 2006.
[24] D.W. Zimmerman, "A note on interpretation of the paired-samples t test", Journal of Educational and Behavioral Statistics, Vol. 22, pp. 349–360, 1997.
[25] H. Shinzawa, J.H. Jiang, P. Ritthiruangdej, Y. Ozaki, "Investigations of bagged kernel partial least squares (KPLS) and boosting KPLS with applications to near-infrared (NIR) spectra", Journal of Chemometrics, Vol. 20, pp. 436–444, 2006.
[26] V. Svetnik, T. Wang, C. Tong, A. Liaw, R.P. Sheridan, Q. Song, "Boosting: an ensemble learning tool for compound classification and QSAR modeling", Journal of Chemical Information and Modeling, Vol. 45, pp. 786–799, 2005.
[27] C. Merkwirth, H. Mauser, T. Schulz-Gasch, O. Roche, M. Stahl, T. Lengauer, "Ensemble methods for classification in cheminformatics", Journal of Chemical Information and Computer Sciences, Vol. 44, pp. 1971–1978, 2004.
[28] D.K. Agrafiotis, W. Cedeño, V.S. Lobanov, "On the use of neural network ensembles in QSAR and QSPR", Journal of Chemical Information and Computer Sciences, Vol. 42, pp. 903–911, 2002.
[29] C.L. Bruce, J.L. Melville, S.D. Pickett, J.D. Hirst, "Contemporary QSAR classifiers compared", Journal of Chemical Information and Modeling, Vol. 47, pp. 219–227, 2007.
[30] R.E. Banfield, L.O. Hall, K.W. Bowyer, D. Bhadoria, W.P. Kegelmeyer, S. Eschrich, "A comparison of ensemble creation techniques", The 5th International Conference on Multiple Classifier Systems, pp. 223–232, 2004.
[31] R.E. Banfield, L.O. Hall, K.W. Bowyer, W.P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, pp. 173–180, 2007.