Working with Proteins in silico: A Review of Online Available Tools for Basic Identification of Proteins

The growing number of bioinformatics tools available online for protein research gives scientists an important opportunity to characterize a protein of interest starting only from its predicted or known amino acid sequence, without depending fully on experimental approaches. Many sophisticated tools exist for diverse purposes. However, few reviews cover tips and tricks for selecting and using the right tools, because the literature mainly promotes new ones. In this review, we summarize tools for protein annotation, aiming to give young scientists with no specific experience in protein work a reliable starting point for in silico analysis of a protein of interest. Annotation here includes identification of motifs and domains and determination of isoelectric point, molecular weight, subcellular localization, and post-translational modifications. We focus on the important points to consider when selecting among the tools available online.
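As a rough illustration of what sequence-only determination of molecular weight and isoelectric point involves, the following minimal Python sketch sums average residue masses and finds the pH at which the Henderson-Hasselbalch net charge is zero. It is not the method of any specific tool covered in the review. The pKa values are illustrative approximations chosen for this sketch, and the function names are hypothetical. Online tools use different pKa sets, so their predicted pI values for the same sequence will differ.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini

# ASSUMPTION: approximate, illustrative pKa values; real tools use other sets.
PKA_POS = {"K": 10.5, "R": 12.5, "H": 6.0}            # groups that carry +1 when protonated
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # groups that carry -1 when deprotonated
PKA_NTERM, PKA_CTERM = 9.0, 2.0


def molecular_weight(seq):
    """Average molecular weight in Da; raises KeyError on non-standard residues."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))
    for aa in seq.upper():
        if aa in PKA_POS:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisection on pH 0-14; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0
```

Calling `molecular_weight(seq)` and `isoelectric_point(seq)` on a one-letter amino acid string gives theoretical values only. Post-translational modifications, signal peptide cleavage, and the local environment of ionizable groups are ignored, which is one reason predictions should be compared across tools.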

___

Türk Tarım - Gıda Bilim ve Teknoloji Dergisi (Turkish Journal of Agriculture - Food Science and Technology)
  • ISSN: 2148-127X
  • Publication frequency: Monthly
  • First published: 2013
  • Publisher: Turkish Science and Technology Publishing (TURSTEP)