Available tools¶
All the programs are available on the genotoul platform.
- BreakDancer http://breakdancer.sourceforge.net
  - Supported SV: DEL, INV
  - No reference genome needed
  - Works in multiple sample mode
  - Necessary insert sizes are computed by the script bam2cfg.pl
  - Parallelization can be obtained at the chromosome level using -o
- CNVnator https://github.com/abyzovlab/CNVnator
  - Supported SV: DEL, DUP
  - No specific index file for the reference genome
  - Works in multiple sample mode
  - A list of chromosomes must be provided
  - Chromosomes are specified using the -chrom option
  - No multi-threading mode mentioned
  - A bin size must be selected, see CNVnatorbinsize
- Delly https://github.com/tobiasrausch/delly
  - Supported SV: DEL, INV, DUP
  - No specific index files for the reference genome
  - Processes each SV type independently
  - Works in multiple sample mode
  - Works in multi-threading mode by setting OMP_NUM_THREADS (#threads <= #samples)
  - Single chromosome run is not implemented
  - Uses paired-end and split-read information to detect SV
- Lumpy https://github.com/arq5x/lumpy-sv
  - Supported SV: DEL, INV, DUP, BND
  - No reference genome needed
  - Works in multiple sample mode
  - The bam file needs to be split into 3 bam files: full, discordants and splitters
  - Single chromosome run is not implemented
  - No multi-threading mode mentioned
  - svtyper is used to genotype the individuals
- Pindel http://gmt.genome.wustl.edu/packages/pindel
  - Supported SV: DEL, INV, DUP, INS
  - No specific index file for the reference genome
  - Works in multiple sample mode
  - The estimated insert size has to be given in a config file
  - Single chromosome run is implemented using the -c option
  - Works in multi-threading mode using the -T option
- GenomeStrip http://software.broadinstitute.org/software/genomestrip
  - Supported SV: DEL, multi-copy CNV
  - A reference bundle has to be created
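
The chromosome-level and thread-level options listed above can be sketched as a small command builder. This is only an illustration, not the pipeline's actual code: the `-o` (BreakDancer), `-c` and `-T` (Pindel) options and the `OMP_NUM_THREADS` variable (Delly) come from the list above, while the executable names, the additional Pindel options `-f`, `-i` and `-o`, and all file names are assumptions to be checked against each tool's documentation.

```python
import os


def breakdancer_command(config, chromosome):
    """BreakDancer restricted to one chromosome with -o.

    `config` is assumed to be the file produced by bam2cfg.pl;
    the executable name `breakdancer-max` is an assumption.
    """
    return ["breakdancer-max", "-o", chromosome, config]


def pindel_command(reference, config, chromosome, threads, out_prefix):
    """Pindel on a single chromosome (-c) using several threads (-T).

    -f (reference), -i (config file with insert sizes) and -o (output
    prefix) are assumed from the Pindel usage message.
    """
    return ["pindel", "-f", reference, "-i", config,
            "-c", chromosome, "-T", str(threads), "-o", out_prefix]


def delly_environment(n_threads, n_samples, base_env=None):
    """Environment for Delly: OMP_NUM_THREADS capped at the sample count."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(min(n_threads, n_samples))
    return env


def per_chromosome_jobs(chromosomes, reference, pindel_cfg, breakdancer_cfg, threads):
    """One BreakDancer job and one Pindel job per chromosome."""
    jobs = []
    for chrom in chromosomes:
        jobs.append(breakdancer_command(breakdancer_cfg, chrom))
        jobs.append(pindel_command(reference, pindel_cfg, chrom,
                                   threads, f"pindel/{chrom}"))
    return jobs
```

Each returned list can be passed to `subprocess.run` or written to a cluster job file; Delly and Lumpy are left out of the per-chromosome loop because single chromosome runs are not implemented for them.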
Encoding structural variants in vcf¶
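As a rough illustration of what such an encoding can look like, the sketch below formats a deletion as a VCF line with a symbolic `<DEL>` allele and the `SVTYPE`, `END` and `SVLEN` INFO keys defined in the VCF specification. The helper names and the choice to leave QUAL and FILTER empty are assumptions made for this example; they do not describe the pipeline's own writer.

```python
def deletion_record(chrom, pos, end, ref_base="N", sv_id=".", genotypes=()):
    """Format one VCF data line for a deletion with a symbolic allele.

    pos is the 1-based position of the base preceding the deletion,
    end is the last deleted base; SVLEN is negative for deletions.
    """
    svlen = -(end - pos)
    info = f"SVTYPE=DEL;END={end};SVLEN={svlen}"
    fields = [chrom, str(pos), sv_id, ref_base, "<DEL>", ".", ".", info]
    if genotypes:
        fields.append("GT")       # FORMAT column
        fields.extend(genotypes)  # one genotype string per sample
    return "\t".join(fields)


def parse_info(info):
    """Turn an INFO string into a dict; flags without '=' map to True."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


if __name__ == "__main__":
    line = deletion_record("2", 321682, 321887, "T", genotypes=["0/1", "0/0"])
    print(line)
    print(parse_info(line.split("\t")[7]))
```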
The pipeline software¶
- Preliminary step for genomeSTRIP: ReferenceBundel
- The software
  - Delly https://github.com/tobiasrausch/delly
  - Lumpy https://github.com/arq5x/lumpy-sv
  - Pindel http://gmt.genome.wustl.edu/packages/pindel
  - GenomeStrip http://software.broadinstitute.org/software/genomestrip
The process is then divided into 7 steps:
- prepare
  Compute library parameters and read depth (RD) statistics
  - cnvnatorStats for read depth statistics
  - breakdancerStats for insert size statistics
  - bamstats for insert size statistics
  - results in the stats dir, one directory for each animal
  - TODO: a summary table with the statistics for each individual
- preprocess
  Preprocess the data for genomeSTRIP, see
  http://software.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_SVPreprocess.html
  - results in the genomestrip/metadata dir
- run
  Run the SV detection pipeline for a list of software
  - results in each tool-specific dir
- parse
  Parse the SV tool output files and write the corresponding vcf files
  - parsing is done for a given SV type (DEL for the moment)
  - TODO: add parsing for DUP and INV
- merge
  Merge the SV for
- genotype
  Genotype a set of SV given in a vcf file
  - genotyping is done with genomestrip
  - limited to DEL for the moment
- filter
  Select a set of SV based on genotype qualities, redundancy, variation status, ...
  Good practice for filtering SV calls: SVFiltering
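
To make the filter step more concrete, here is a minimal sketch of a site-level filter on genotype quality and variation status. It assumes genotyped VCF lines that carry the standard `GT` and `GQ` FORMAT fields; the function names, the thresholds (`min_gq`, `min_call_rate`) and the definition of "variable" (at least two distinct confident genotypes) are hypothetical choices for illustration, not the criteria recommended in SVFiltering.

```python
def parse_genotypes(vcf_line):
    """Return one (GT, GQ) tuple per sample from a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:GQ
    samples = []
    for sample in fields[9:]:
        values = dict(zip(keys, sample.split(":")))
        gq = values.get("GQ", ".")
        samples.append((values.get("GT", "./."),
                        None if gq == "." else float(gq)))
    return samples


def keep_site(vcf_line, min_gq=20.0, min_call_rate=0.8):
    """Keep a site if enough samples are confidently genotyped and variable.

    Thresholds are illustrative assumptions, not recommended values.
    """
    samples = parse_genotypes(vcf_line)
    confident = [gt for gt, gq in samples
                 if gq is not None and gq >= min_gq and "." not in gt]
    if not samples or len(confident) / len(samples) < min_call_rate:
        return False
    # Variation status: at least two distinct genotypes among confident calls
    distinct = {tuple(sorted(gt.replace("|", "/").split("/")))
                for gt in confident}
    return len(distinct) > 1
```

A redundancy filter (removing overlapping calls that describe the same event) would need an additional pass over sorted sites and is not covered by this sketch.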