Effect of Normalization Methods on the Performance of Gene Co-expression Networks Inferred from RNA-Seq Data

Several studies have reported that alterations in protein synthesis contribute to complex diseases such as cancer and metabolic disorders like diabetes. Understanding these alterations requires identifying the genes that encode the affected proteins and uncovering their relations with other genes. Next-generation sequencing techniques have made it easier to determine disease-related relations accurately at the molecular level. Gene co-expression networks (GCNs) show relations between genes that take part in similar biological processes, without regulator-regulatee information. In this study, we used RNA-Seq data to infer GCNs associated with prostate cancer. Because RNA-Seq data contain genes of different nucleotide lengths and samples with different numbers of reads, normalization is important for inferring robust and reliable molecular relations. We inferred GCNs separately from the raw data and from data normalized with two different methods, Trimmed Mean of M-values (TMM) and Relative Log Expression (RLE), and evaluated them with overlap analysis and topological assessment. In the overlap analysis, GCNs inferred from normalized RNA-Seq data predicted more gene-gene relations reported in the literature than the GCN inferred from raw data. The overlap performance metrics of the GCNs obtained with the two normalization methods were close to each other. In the topological assessment, GCNs derived from TMM- and RLE-normalized data resembled a scale-free topology more closely than the GCN derived from raw data. We also compared the performance of the network inference algorithms C3NET, ARACNE and WGCNA on the raw and normalized datasets.
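To make the role of normalization concrete, the following is a minimal sketch of RLE-style size factors, assuming the common median-of-ratios definition (each sample is compared against a pseudo-reference built from per-gene geometric means). The function names `rle_size_factors` and `rle_normalize` and the toy count matrix are illustrative assumptions, not the study's pipeline or data, and the sketch is not a substitute for an established package.

```python
import numpy as np


def rle_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix.

    Hypothetical helper for illustration only.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep genes with nonzero counts in every sample so all logs are finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_reference = log_counts.mean(axis=1, keepdims=True)
    # A sample's size factor is the median ratio of its counts to the reference.
    return np.exp(np.median(log_counts - log_reference, axis=0))


def rle_normalize(counts):
    """Divide each sample (column) by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / rle_size_factors(counts)


# Toy data (made up): sample 2 was sequenced twice as deeply as sample 1.
toy_counts = np.array([
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],  # excluded from the size-factor estimate because of the zero
])
size_factors = rle_size_factors(toy_counts)
normalized = rle_normalize(toy_counts)
```

In the toy matrix the second sample's size factor comes out twice the first's, so after normalization the genes expressed in both samples have equal values. This is the sequencing-depth effect that would otherwise distort the correlations a co-expression network is built from.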

___

Düzce Üniversitesi Bilim ve Teknoloji Dergisi
  • Publication frequency: 4 issues per year
  • First published: 2013
  • Publisher: Düzce Üniversitesi Fen Bilimleri Enstitüsü