Taxonomic diversity-based domain interaction prediction

Identifying protein domain-domain interactions (DDIs) is an essential step in understanding the functional and structural roles of proteins. MirrorTree is a DDI prediction method based on the principle that interacting proteins co-evolve. However, the method is sensitive to the taxonomic diversity and evolutionary span of the two protein homolog sets that are compared to predict a DDI. In this work, we propose a new MirrorTree-based DDI prediction method, Taxonomic Diversity-based Domain Interaction Prediction (TAXDIP). TAXDIP improves on MirrorTree by adding a sampling step that favors representation of higher-level taxonomic ranks (e.g. family over species) in the two protein homolog sets before they are compared. This additional step increases the evolutionary span within the homolog sets. TAXDIP was first assessed on a set containing 6,514 experimentally verified positive (interacting) domain pairs and an equal number of negative (non-interacting) pairs, generated randomly from domain pairs with no known interactions. On this set, TAXDIP achieved 71.0% sensitivity and 63.0% specificity. Next, a benchmark set of 500 interacting and 500 non-interacting domain pairs was used to compare TAXDIP against the DDI prediction methods ME and RDFF. TAXDIP showed better sensitivity and specificity than RDFF. TAXDIP’s sensitivity was better than ME’s, but its specificity was lower. In conclusion, TAXDIP’s performance makes it a viable alternative to existing prediction methods. Furthermore, because TAXDIP’s correct predictions both overlap with and complement those of other DDI prediction methods, it is well positioned to become part of a meta-DDI prediction method that combines multiple methods to build a consensus prediction.
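The idea summarized above can be illustrated with a minimal Python sketch. It is not the TAXDIP implementation: the abstract does not give the sampling rule, so the sketch assumes the simplest one (keep one randomly chosen organism per family) and scores the two homolog sets with the Pearson correlation of their pairwise distance matrices, as in the general mirrortree approach. All function names, organism labels, and distances are hypothetical toy values.

```python
from itertools import combinations
from math import sqrt
import random


def sample_one_per_taxon(organisms, taxon_of, rng):
    """Keep one randomly chosen organism per higher-level taxon (assumed rule)."""
    groups = {}
    for org in sorted(organisms):  # sorted for reproducible grouping order
        groups.setdefault(taxon_of[org], []).append(org)
    return [rng.choice(members) for members in groups.values()]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def mirrortree_score(dist_a, dist_b, organisms):
    """Correlate the pairwise distances of two homolog sets over shared organisms."""
    pairs = [frozenset(p) for p in combinations(sorted(organisms), 2)]
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])


# Toy data (hypothetical): four organisms, two of which share a family.
family_of = {"org1": "FamA", "org2": "FamA", "org3": "FamB", "org4": "FamC"}
position = {"org1": 0.0, "org2": 1.0, "org3": 3.0, "org4": 7.0}

# Distances for domain A; domain B is a linear function of A, i.e. the two
# toy domains "co-evolve" perfectly.
dist_a = {
    frozenset(p): abs(position[p[0]] - position[p[1]])
    for p in combinations(family_of, 2)
}
dist_b = {pair: 2.0 * d + 1.0 for pair, d in dist_a.items()}

sampled = sample_one_per_taxon(family_of, family_of, random.Random(0))
score = mirrortree_score(dist_a, dist_b, sampled)
print(sampled, score)
```

In this toy case the sampled set always contains one organism from each of the three families, and the score is close to 1 because the second distance matrix is a linear function of the first. With real data, a threshold on the score would separate predicted interacting from non-interacting domain pairs; how TAXDIP chooses that threshold is not stated in the abstract.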
