ReferenceBundle

Constructing a custom reference metadata bundle
see http://www.broadinstitute.org/software/genomestrip/node_ReferenceMetadata.html

  1. reference.fasta : the genome file in fasta format (and associated .fai index file)
    provided by the user
  2. reference.fasta.length : ...
    computed using fastalength
  3. reference.ploidymap : expected ploidy of the different chromosomal regions
    provided by the user
  4. reference.gendermask.bed : regions with unreliable read depth on human sex chromosomes (e.g. pseudoautosomal regions)
    provided by the user
  5. reference.svmask.fasta : genome alignability mask (in indexed fasta format) that is used to determine whether reference sequence should be considered uniquely alignable
    computed per chromosome with Genome STRiP's ComputeGenomeMask :
    java -cp ${classpath} -Xmx4g org.broadinstitute.sv.apps.ComputeGenomeMask \
        -R ${reference} -O svmask_${chr}.fasta -readLength ${readLength} \
        -sequence ${chr}
    then concatenating the per-chromosome masks into a single file and indexing it (see also README_GenomeSTRIPreferenceBundle)
  6. reference.rdmask.bed : bed file specifying a set of regions to use for overall sequencing depth estimation
    keeping only autosomes
  7. reference.gcmask.fasta : genome mask (in indexed fasta format) that is used for gc-bias estimation in each sequencing library
    removing sex chromosomes and repeats (see README_GenomeSTRIPreferenceBundle)
  8. reference.lcmask.fasta : genome mask (in indexed fasta format) that masks low-complexity sequence and is used in read-depth analysis
    removing simple repeats (see README_GenomeSTRIPreferenceBundle)
  9. reference.dict : sequence dictionary for the reference genome
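
Steps 5 and 6 above both involve small file manipulations that can be sketched in shell. The helpers below are an illustration only, not part of Genome STRiP: the function names merge_masks and keep_autosomes are hypothetical, the per-chromosome masks are assumed to be named svmask_${chr}.fasta as in the command above, and autosomes are assumed to be named 1..N or chr1..chrN.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not Genome STRiP tools).

# merge_masks FAI PREFIX OUT
# Concatenate per-chromosome masks (PREFIX<chr>.fasta) in the order the
# sequences appear in the reference .fai index, so that the merged mask
# lists sequences in the same order as the reference.
merge_masks() {
    fai=$1
    prefix=$2
    out=$3
    : > "$out" || return 1
    # The first column of a .fai file is the sequence name.
    while read -r chr _rest; do
        part="${prefix}${chr}.fasta"
        if [ ! -f "$part" ]; then
            echo "missing mask for ${chr}: ${part}" >&2
            return 1
        fi
        cat "$part" >> "$out"
    done < "$fai"
}

# keep_autosomes BED
# Print only the records whose first column is an autosome name
# (assumed here to be 1..N or chr1..chrN). Any other line, including
# a header line, is dropped.
keep_autosomes() {
    awk '$1 ~ /^(chr)?[0-9]+$/' "$1"
}
```

A possible use, assuming samtools is installed for the indexing step:

    merge_masks reference.fasta.fai svmask_ reference.svmask.fasta
    samtools faidx reference.svmask.fasta

Following the .fai order rather than a shell glob avoids lexical ordering (chr1, chr10, chr11, ...) in the merged file. Whether the filtered intervals are suitable as reference.rdmask.bed depends on the input bed file and on the format expected by the Genome STRiP version in use.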