Evaluation of genome scaffolding tools using pooled clone sequencing
DNA sequencing technologies hold great promise for generating information that will help scientists understand how
the genome affects human health and organismal evolution. Generating raw genome sequence data is becoming cheaper
and faster, but also more error-prone. Assembling such data into high-quality finished genome sequences remains challenging. Many
genome assembly tools are available, but they differ in terms of their performance and their final output. More importantly, it remains
largely unclear how best to assess the quality of assembled genome sequences. Here we evaluate the accuracy of several genome
scaffolding algorithms using two different types of data generated from the genome of the same human individual: whole genome
shotgun sequencing (WGS) and pooled clone sequencing (PCS). We observe that better assemblies can be obtained when PCS data
are used than with WGS data alone. However, current scaffolding algorithms are developed only for WGS, and designing PCS-aware
scaffolding algorithms remains an open problem.