Available tools

All the programs are available on the genotoul platform

  • BreakDancer http://breakdancer.sourceforge.net
    • supported SV : DEL, INV
    • No reference genome needed
    • Works in multiple sample mode
    • Necessary insert sizes are computed by the script bam2cfg.pl
    • Parallelization can be obtained at the chromosome level using -o
  • CNVnator https://github.com/abyzovlab/CNVnator
    • supported SV : DEL, DUP
    • No specific index file for the reference genome
    • Works in multiple sample mode
    • a list of chromosomes must be provided
    • chromosomes are specified using the -chrom option
    • No multi-threading mode mentioned
    • A bin size must be selected, see CNVnatorbinsize
  • Delly https://github.com/tobiasrausch/delly
    • supported SV : DEL, INV, DUP
    • No specific index files for the reference genome
    • Works on each SV type independently
    • Works in multiple sample mode
    • Works in multi-threading mode by setting OMP_NUM_THREADS (#threads <= #samples)
    • Single chromosome run is not implemented
    • Uses paired-end and split reads to detect SV
  • Lumpy https://github.com/arq5x/lumpy-sv
    • supported SV : DEL, INV, DUP, BND
    • No reference genome needed
    • Works in multiple sample mode
    • The bam file needs to be split into 3 bam files : full, discordants and splitters
    • Single chromosome run is not implemented
    • No multi-threading mode mentioned
    • svtyper is used to genotype the individuals
  • Pindel http://gmt.genome.wustl.edu/packages/pindel
    • supported SV : DEL, INV, DUP, INS
    • No specific index file for the reference genome
    • Works in multiple sample mode
    • Estimated insert size has to be given in a config file
    • Single chromosome run is implemented using the -c option
    • Works in multi-threading mode using the -T option
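
The capabilities listed above can be summarized in a small lookup table, which helps when choosing which tools to run for a given SV type. The following is only a sketch: the dictionary layout and the helper names (TOOLS, tools_for, per_chromosome_tools) are hypothetical and are not part of the pipeline; the values are transcribed from the list above.

```python
# Capabilities transcribed from the tool list above.
# The structure and key names are illustrative, not the pipeline's own.
TOOLS = {
    "breakdancer": {"sv_types": {"DEL", "INV"}, "chrom_option": "-o"},
    "cnvnator": {"sv_types": {"DEL", "DUP"}, "chrom_option": "-chrom"},
    "delly": {"sv_types": {"DEL", "INV", "DUP"}, "chrom_option": None},
    "lumpy": {"sv_types": {"DEL", "INV", "DUP", "BND"}, "chrom_option": None},
    "pindel": {"sv_types": {"DEL", "INV", "DUP", "INS"}, "chrom_option": "-c"},
}


def tools_for(sv_type):
    """Return the sorted names of the tools that report the given SV type."""
    return sorted(name for name, caps in TOOLS.items() if sv_type in caps["sv_types"])


def per_chromosome_tools():
    """Return the sorted names of the tools that accept a chromosome option."""
    return sorted(name for name, caps in TOOLS.items() if caps["chrom_option"])


if __name__ == "__main__":
    for sv_type in ("DEL", "INV", "DUP", "INS", "BND"):
        print(sv_type, tools_for(sv_type))
    print("per-chromosome runs:", per_chromosome_tools())
```

Keeping this information in one table makes it easy to see, for instance, that only DEL is reported by all five tools.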

Encoding structural variants in vcf

see
http://www.internationalgenome.org/wiki/Analysis/Variant%20Call%20Format/VCF%20(Variant%20Call%20Format)%20version%204.0/encoding-structural-variants/
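
As a rough illustration of this encoding, the sketch below reads the SVTYPE, END and SVLEN keys from the INFO column of one VCF data line. The function name parse_sv_record and the example record are illustrative only; a real pipeline would more likely rely on a dedicated VCF library.

```python
def parse_sv_record(line):
    """Parse one tab-separated VCF data line into a small dict describing the SV."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, _ref, alt, _qual, _filter, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_map = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_map[key] = value if sep else True  # flags such as IMPRECISE carry no value

    start = int(pos)
    end = int(info_map["END"]) if "END" in info_map else start
    # SVLEN may hold one value per ALT allele; only the first is kept in this sketch.
    svlen = int(info_map["SVLEN"].split(",")[0]) if "SVLEN" in info_map else None
    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        "alt": alt,
        "svtype": info_map.get("SVTYPE"),
        "svlen": svlen,
        "imprecise": "IMPRECISE" in info_map,
    }


# Illustrative record: a deletion described with a symbolic <DEL> allele.
record = "2\t321682\t.\tT\t<DEL>\t6\tPASS\tSVTYPE=DEL;END=321887;SVLEN=-205;IMPRECISE"

if __name__ == "__main__":
    print(parse_sv_record(record))
```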

The pipeline software

  1. Preliminary step for genomeSTRIP
    ReferenceBundel
  2. The software

The process is then divided into 7 steps

  1. prepare
    compute library parameters and RD statistics
    • cnvnatorStats for read depth statistics
    • breakdancerStats for insert size statistics
    • bamstats for insert size statistics
    • results in the stats dir, one directory for each animal
    • TODO : a summary table with the statistics for each individual
  2. preprocess
    preprocess the data for genomeSTRIP, see
    http://software.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_SVPreprocess.html
    • results in genomestrip/metadata dir
  3. run
    Run the SV detection pipeline for a list of software
    • results in each tool specific dir
  4. parse
    Parse the SV tool output files and write corresponding vcf files
    • Parsing is done for a given SV type (DEL for the moment)
    • TODO: add parse for DUP and INV
  5. merge
    Merge the SV for
  6. genotype
    Genotype a set of sv given in a vcf file
    • genotyping is done with genomestrip
    • limited to DEL for the moment
  7. filter
    Select a set of SV based on genotype qualities, redundancy, variation status, etc.
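
To make the filter step more concrete, here is a minimal sketch of a filter on genotype quality and variation status. It is not the pipeline's implementation: the function names, the thresholds (min_gq=20, min_call_rate=0.8) and the rule itself are assumptions chosen for illustration, and redundancy between overlapping calls is not handled.

```python
def normalize(genotype):
    """Make '1/0', '0|1' and '0/1' compare equal by sorting the alleles."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def confident_genotypes(genotypes, qualities, min_gq):
    """Keep genotypes that are fully called and reach the minimum quality."""
    return [
        normalize(gt)
        for gt, gq in zip(genotypes, qualities)
        if gq is not None and gq >= min_gq and "." not in gt
    ]


def keep_sv(genotypes, qualities, min_gq=20, min_call_rate=0.8):
    """Return True if the SV is well genotyped and variable among samples.

    Assumed rule (hypothetical thresholds): at least min_call_rate of the
    samples have a confident genotype, and at least two distinct genotypes
    are observed among them.
    """
    if not genotypes:
        return False
    confident = confident_genotypes(genotypes, qualities, min_gq)
    if len(confident) / len(genotypes) < min_call_rate:
        return False
    return len(set(confident)) > 1


if __name__ == "__main__":
    print(keep_sv(["0/0", "0/1", "1/1", "0/0", "0/1"], [40, 35, 50, 30, 25]))
    print(keep_sv(["0/0"] * 5, [40] * 5))
```

The first call keeps the SV (all samples confidently genotyped, three distinct genotypes); the second rejects it because every sample carries the same genotype.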

Good practice for filtering SV calls :
SVFiltering

The jflow pipeline
JflowPipeline

The simulations
JflowSimulations