ReferenceBundle
Constructing a custom reference metadata bundle
see http://www.broadinstitute.org/software/genomestrip/node_ReferenceMetadata.html
- reference.fasta : the genome file in FASTA format (and the associated .fai index file)
  provided by the user
- reference.fasta.length : ...
  computed using fastalength
- reference.ploidymap : expected ploidy of the different chromosomal regions
  provided by the user
- reference.gendermask.bed : regions with unreliable read depth on human sex chromosomes (e.g. pseudoautosomal regions)
  provided by the user
- reference.svmask.fasta : genome alignability mask (in indexed FASTA format) that is used to determine whether reference sequence should be considered uniquely alignable
  computed per chromosome using ComputeGenomeMask from gatk: java -cp ${classpath} -Xmx4g org.broadinstitute.sv.apps.ComputeGenomeMask -R ${reference} -O svmask_${chr}.fasta -readLength ${readLength} -sequence ${chr}
  then putting everything back into a single file and indexing it (see also README_GenomeSTRIPreferenceBundle)
- reference.rdmask.bed : BED file specifying a set of regions to use for overall sequencing depth estimation
  keeping only autosomes
- reference.gcmask.fasta : genome mask (in indexed FASTA format) that is used for GC-bias estimation in each sequencing library
  removing sex chromosomes and repeats (see README_GenomeSTRIPreferenceBundle)
- reference.lcmask.fasta : genome mask (in indexed FASTA format) that masks low-complexity sequence and is used in read-depth analysis
  removing simple repeats (see README_GenomeSTRIPreferenceBundle)
- reference.dict : a dictionary for the reference genome
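
The per-chromosome svmask step and the merge back into a single file could be scripted roughly as follows. This is a minimal Python sketch, not the procedure from README_GenomeSTRIPreferenceBundle: the helper names (compute_mask_command, read_fai_names, merge_masks) are hypothetical, the command line is copied from the svmask entry, and it assumes chromosomes are merged in the order of the reference .fai index and that each per-chromosome mask ends with a newline.

```python
import shutil
from pathlib import Path


def compute_mask_command(classpath, reference, chrom, read_length, java_heap="4g"):
    """Build the ComputeGenomeMask command line for one chromosome.

    Mirrors the command quoted in the svmask entry; the function itself
    is a hypothetical helper, not part of Genome STRiP.
    """
    return [
        "java", "-cp", classpath, f"-Xmx{java_heap}",
        "org.broadinstitute.sv.apps.ComputeGenomeMask",
        "-R", str(reference),
        "-O", f"svmask_{chrom}.fasta",
        "-readLength", str(read_length),
        "-sequence", chrom,
    ]


def read_fai_names(fai_path):
    """Return the sequence names listed in a .fai index, in file order."""
    with open(fai_path) as handle:
        return [line.split("\t")[0] for line in handle if line.strip()]


def merge_masks(chroms, out_path, mask_dir="."):
    """Concatenate svmask_<chrom>.fasta files into a single FASTA file.

    Assumption: files are appended in the order given by `chroms` and
    each one already ends with a newline.
    """
    with open(out_path, "wb") as out:
        for chrom in chroms:
            with open(Path(mask_dir) / f"svmask_{chrom}.fasta", "rb") as part:
                shutil.copyfileobj(part, out)
```

Each command list can be run with subprocess.run(cmd, check=True), one chromosome at a time or as separate cluster jobs. After merge_masks has written reference.svmask.fasta, the merged file still needs an index, for example with samtools faidx.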