- Presentation
- Installation
- Documentation
- Build the Parameter files and the input files from examples
- Support
Presentation¶
QTLMap is software dedicated to the detection of QTL from experimental designs in outbred populations. It is developed by the Animal Genetics Division at INRA (French National Institute for Agronomical Research). The statistical techniques used are linkage analysis (LA) and linkage disequilibrium linkage analysis (LDLA), both using interval mapping. Several versions of LA are offered, ranging from a quasi-maximum-likelihood approach to a fully linear (regression) model. LDLA is a regression approach (Legarra and Fernando, 2009). The population may consist of half-sib families or a mixture of full- and half-sib families. The computation of phase and transmission probabilities is optimized to be fast and as exact as possible. QTLMap can handle large numbers of markers (SNP) and traits (eQTL).
The aim of QTLMap developers is to propose various genetic models depending on
- the number of QTL alleles segregating (biallelic in crosses between monomorphic breeds, biallelic without hypothesis on the origin, multiallelic, haplotype identity),
- the number of QTL segregating (one, two linked, several unlinked),
- the number of traits under the QTL influence.
The trait determinism may vary depending on
- the trait distribution (gaussian trait, survival trait or threshold distribution),
- the interactions between the QTL and fixed effects or other loci,
- the residual variance structure (homo- or heteroskedasticity for half-sib families).
Because the test conditions depart from the asymptotic conditions of chi-square theory, the significance of the test statistics is evaluated either through numerical approximations or empirically, through permutations or simulations under the null hypothesis.
Up to now, the following functionalities have been implemented:
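The idea of an empirical threshold obtained by permutation can be sketched as follows. This is a minimal illustration, not QTLMap's implementation: the function names `max_statistic` and `permutation_threshold` are hypothetical, the statistic (largest squared correlation between the trait and a per-position score) is only a stand-in for the real test, and the sketch shuffles trait values across all animals without respecting family structure.

```python
import random


def max_statistic(trait, scores_by_position):
    """Toy genome-wide statistic: the largest squared correlation between
    the trait and the scores at any tested position (assumed stand-in)."""
    n = len(trait)
    mean_t = sum(trait) / n
    var_t = sum((t - mean_t) ** 2 for t in trait)
    best = 0.0
    for scores in scores_by_position:
        mean_s = sum(scores) / n
        var_s = sum((s - mean_s) ** 2 for s in scores)
        if var_t == 0 or var_s == 0:
            continue  # a constant vector carries no information
        cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(trait, scores))
        best = max(best, cov * cov / (var_t * var_s))
    return best


def permutation_threshold(trait, scores_by_position,
                          n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic obtained
    after shuffling the trait values, i.e. under the null hypothesis."""
    rng = random.Random(seed)
    shuffled = list(trait)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any trait/position association
        null_maxima.append(max_statistic(shuffled, scores_by_position))
    null_maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_maxima[index]


if __name__ == "__main__":
    trait = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    scores = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]]
    observed = max_statistic(trait, scores)
    threshold = permutation_threshold(trait, scores, n_perm=200)
    print("observed:", observed, "threshold:", threshold)
```

Taking the maximum over all positions within each permutation is what makes the resulting threshold apply to the whole scan rather than to a single position.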
- QTL detection in half-sib families or mixture of full- and half-sib families
- One or several linked QTL segregating in the population
- Single trait or multiple trait
- Nuisance parameters (e.g. sex, batch, weight...) and their interactions with QTL can be included in the analysis
- Gaussian, discrete or survival (Cox model) data
- Familial heterogeneity of variances (heteroscedasticity)
- Can handle eQTL analyses
- Computation of transmission and phase probabilities adapted to high throughput genotyping (SNP)
- Empirical thresholds are estimated using simulations under the null hypothesis or permutations of trait values
- Computation of power and accuracy of your design or any simulated design
The main source code of QTLMap is written in Fortran 95 and uses the OpenMP API (parallel programming).
The regression approach for LA, LD and LDLA analyses is also implemented in CUDA, which speeds up the computation of empirical thresholds for large data sets.
Installation¶
Prerequisite¶
- Linear Algebra PACKage LAPACK
- CMake
- GCC or TDM-GCC (recommended, all packages)
Optional Prerequisite¶
Download the QTLMap source code¶
Download the latest release here
Compilation¶
- tar xvf qtlmap-X.X.X-Source.tar.gz
- cd qtlmap-X.X.X-Source
- mkdir build
- cd build
- build the Makefile with cmake :
- Release (Multi-threading Environment)
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER=gfortran ../
- Release (GPGPU-CUDA Environment)
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER=gfortran -DCUDA_IMPL=true -DTYPE_ARCH_CUDA="compute_20" ../
- Single precision support: add
-DCUDA_SP=true
- make
- make install
The binaries are in the qtlmap-X.X.X-Source/bin directory.
To specify the path of the LAPACK libraries, add:
-DLAPACK_LIBRARY_PATH=/path/lapack
Documentation¶
Complete documentation is available in PDF format: qtlmap-doc-0.8
Build the Parameter files and the input files from examples¶
PorcQTL (Microsatellites example)¶
Description of the dataset¶
- 5 half-sib families
- 236 ind. with 5 phenotypes and genotypes
- 10 microsatellites
download source:dataset-porcqtl.tar.gz
Pedigree
910014 880607 890755 1
910016 880607 890755 1
910020 880607 890755 1
...
...
944217 910001 910014 2
944220 910001 910014 2
952658 910001 910014 2
952659 910001 910014 2
952660 910001 910014 2
...
...
The file contains pedigree information for the last 2 generations of a 3-generation design,
i.e. parents and progeny. It must not contain the grandparental pedigree information.
Each line is made of an alphanumeric ID triplet (individual, sire, dam). A fourth field gives
the generation number: « 1 » for the parental generation, « 2 » for the progeny generation. An
animal with one or both parent IDs missing must not be included in the file. The file
must be sorted by generation, sire ID and dam ID.
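The constraints above can be checked before running an analysis. The following is a minimal sketch, not part of QTLMap: the function names are hypothetical, and "sorted" is interpreted here as "generations in increasing order, with the records of each sire, and of each dam within a sire, stored contiguously", which is an assumption about the intended ordering.

```python
def read_pedigree(lines):
    """Parse 'individual sire dam generation' records.
    Lines that do not have exactly 4 fields (blank, '...') are skipped."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        ind, sire, dam, generation = fields
        records.append((ind, sire, dam, int(generation)))
    return records


def is_grouped(keys):
    """True if equal keys are always adjacent (no group is split)."""
    seen = set()
    previous = None
    for key in keys:
        if key != previous:
            if key in seen:
                return False
            seen.add(key)
            previous = key
    return True


def check_pedigree_order(records):
    """Check generation order and contiguity of sire and dam groups."""
    generations = [gen for _, _, _, gen in records]
    if generations != sorted(generations):
        return False
    by_sire = [(gen, sire) for _, sire, _, gen in records]
    by_dam = [(gen, sire, dam) for _, sire, dam, gen in records]
    return is_grouped(by_sire) and is_grouped(by_dam)


if __name__ == "__main__":
    sample = [
        "910014 880607 890755 1",
        "910016 880607 890755 1",
        "...",
        "944217 910001 910014 2",
        "944220 910001 910014 2",
    ]
    print(check_pedigree_order(read_pedigree(sample)))
```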
Marker map
S0383 7 0.01 0.01 0.01 1
S0025 7 0.05 0.05 0.05 1
SW1354 7 0.23 0.22 0.26 1
SW1369 7 0.52 0.54 0.50 1
SLA 7 0.62 0.62 0.63 1
S0102 7 0.74 0.72 0.82 1
SW352 7 1.01 0.91 1.19 1
SW63 7 1.19 1.01 1.46 1
S0101 7 1.48 1.32 1.73 1
SW764 7 1.70 1.50 2.01 1
This file gives the locations of the markers on the chromosome(s). Each line corresponds to a single marker and gives, in this order:
- marker name (alphanumeric);
- name of the chromosome carrying the marker (alphanumeric);
- position of the marker on the average map (in Morgan);
- position of the marker on the male map (in Morgan);
- position of the marker on the female map (in Morgan);
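A marker map in this layout can be read with a few lines of code. The sketch below is illustrative only: the `Marker` class and `read_marker_map` function are hypothetical names, and any column after the fifth (the example lines carry a sixth value that is not described in this section) is ignored.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    name: str
    chromosome: str
    pos_average: float  # position on the average map, in Morgan
    pos_male: float     # position on the male map, in Morgan
    pos_female: float   # position on the female map, in Morgan


def read_marker_map(lines):
    """Parse marker map lines; columns beyond the fifth are ignored."""
    markers = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or incomplete line
        name, chromosome = fields[0], fields[1]
        average, male, female = (float(x) for x in fields[2:5])
        markers.append(Marker(name, chromosome, average, male, female))
    return markers


if __name__ == "__main__":
    sample = [
        "S0383 7 0.01 0.01 0.01 1",
        "SW764 7 1.70 1.50 2.01 1",
    ]
    for marker in read_marker_map(sample):
        print(marker.name, marker.chromosome, marker.pos_average)
```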
Genotypes
S0383 S0025 SW1354 SW1369 SLA S0102 SW352 SW63 S0101 SW764
890761 9 10 3 4 2 8 5 2 4 12 3 3 2 1 13 13 2 5 2 2
880607 6 6 2 2 3 4 2 2 1 2 2 6 2 1 7 6 2 4 4 6
890769 16 10 1 1 8 5 2 6 4 12 4 3 3 4 11 13 5 5 2 2
...
This file contains the animals' phenotypes at the markers. The first line gives the marker names; the
markers must belong to the marker map file. For each animal, a line gives its ID (as described in the
pedigree file) followed by the marker phenotypes, in the order given on the first line. Each
phenotype is made of 2 alleles, unordered. When an animal has no phenotype for a marker, both
alleles must be set to the missing value code given in the parametrisation of the analysis.
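The layout described above (a header line of marker names, then one line per animal with two alleles per marker) can be read as follows. This is a sketch under assumptions: `read_genotypes` is a hypothetical helper, and the missing value code defaults to `"0"` only because the example parameter file on this page sets `opt_unknown_char=0`.

```python
def read_genotypes(lines, missing="0"):
    """Return (marker_names, genotypes) where genotypes[animal][marker]
    is a pair of alleles, or None when both alleles are the missing code."""
    rows = [line.split() for line in lines
            if line.strip() and line.strip() != "..."]
    marker_names = rows[0]
    genotypes = {}
    for fields in rows[1:]:
        animal, alleles = fields[0], fields[1:]
        if len(alleles) != 2 * len(marker_names):
            raise ValueError(
                f"animal {animal}: expected {2 * len(marker_names)} alleles, "
                f"found {len(alleles)}")
        per_marker = {}
        for i, name in enumerate(marker_names):
            first, second = alleles[2 * i], alleles[2 * i + 1]
            if first == missing and second == missing:
                per_marker[name] = None
            else:
                per_marker[name] = (first, second)
        genotypes[animal] = per_marker
    return marker_names, genotypes


if __name__ == "__main__":
    sample = [
        "S0383 S0025 SW1354 SW1369 SLA S0102 SW352 SW63 S0101 SW764",
        "890761 9 10 3 4 2 8 5 2 4 12 3 3 2 1 13 13 2 5 2 2",
        "...",
    ]
    names, genotypes = read_genotypes(sample)
    print(genotypes["890761"]["SLA"])
```

Checking the allele count per line catches the most common formatting error, a marker column that is missing or duplicated for one animal.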
Traits
944217 7.1350 1 1 1.9320 1 1 .8730 1 1 35.8630 1 1 31.3220 1 1
944220 6.4060 1 1 2.0680 1 1 .8460 1 1 30.2670 1 1 26.6590 1 1
952658 6.5000 1 1 2.1440 1 1 .9060 1 1 30.6870 1 1 26.7960 1 1
...
This file gives the phenotypes of the traits to be analysed.
Only the progeny performances are considered in the analysis, and they must be given in the file.
For each animal, its ID (identical to the ID given in the pedigree file) is followed by information
about nuisance effects (fixed effect levels, covariable values) and then by three fields for each
trait: the performance; a 0/1 variable IP, which indicates whether (IP=1) or not (IP=0) the trait was
measured for this animal and must be included in the analysis; and a 0/1 variable IC, which indicates
whether the performance was censored (IC=0) or not (IC=1). The IC information is needed for survival
analysis (by default IC=1).
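The per-trait triplets (performance, IP, IC) can be unpacked as in the sketch below. It is illustrative only: `read_traits` and its `n_nuisance` argument (the number of nuisance columns between the ID and the first trait, 0 in the PorcQTL example) are hypothetical names introduced here.

```python
def read_traits(lines, n_nuisance=0):
    """Return {animal: (nuisance_values, traits)} where traits is a list of
    (performance, measured, uncensored) tuples, one per trait.
    measured is True when IP=1; uncensored is True when IC=1."""
    data = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 1 + n_nuisance + 3:
            continue  # blank line or elision marker such as '...'
        animal = fields[0]
        nuisance = fields[1:1 + n_nuisance]
        rest = fields[1 + n_nuisance:]
        if len(rest) % 3 != 0:
            raise ValueError(f"animal {animal}: incomplete trait triplet")
        traits = []
        for i in range(0, len(rest), 3):
            performance = float(rest[i])
            measured = rest[i + 1] == "1"
            uncensored = rest[i + 2] == "1"
            traits.append((performance, measured, uncensored))
        data[animal] = (nuisance, traits)
    return data


if __name__ == "__main__":
    sample = [
        "944217 7.1350 1 1 1.9320 1 1 .8730 1 1 35.8630 1 1 31.3220 1 1",
        "...",
    ]
    nuisance, traits = read_traits(sample)["944217"]
    print(len(traits), traits[0])
```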
The parameter files¶
the main parameter file : p_analyse
#input files
in_map=carte # the marker map
in_genealogy=genea # pedegree
in_genotype=typage # genotypes
in_traits=perf # phenotypes of F2
#an other parameter file describing the statistic model of each trait
in_model=model
#analysis options
opt_step=0.05 # analysis step : 5 cMorgan
opt_ndmin=20 # number of progeny to consider full-sib families
opt_chromosome=7 # chromosome to analyze.
opt_unknown_char=0 # unknown genotype char
opt_minsirephaseproba = 0.80 # minimal paternal phase probability
opt_mindamphaseproba = 0.10 # minimal maternal phase probability
#output/results
out_output=./OUTPUT/result # main results : residuals variances, estimation of qtl,haplotype effects found under the tested hypothesis
out_summary=./OUTPUT/summary # summary
out_phases=./OUTPUT/phases # parental phases
out_lrtsires=./OUTPUT/lrts # likelihood ratio test found amongst the linkage group (main and by half-sib fam.)
out_lrtdams=./OUTPUT/lrtd # likelihood ratio test found amongst the linkage group (by full-sib fam.)
out_pded=./OUTPUT/pded # grand parental segment transmission marginal probabilities
out_pdedjoin=./OUTPUT/pded_join # grand parental segment transmission joint probabilities
out_pateff=./OUTPUT/pateff # Sire QTL effects estimations amongst the linkage group
out_mateff=./OUTPUT/mateff # Dam QTL effects estimations amongst the linkage group
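The parameter file is a list of `key=value` entries with `#` comments, so it is easy to inspect from a script, for example to verify that the input files it names exist. The sketch below assumes one entry per line and that `#` always starts a comment; `parse_parameters` is a hypothetical helper, not a QTLMap tool.

```python
def parse_parameters(text):
    """Parse 'key=value  # comment' lines into a dict of strings.
    Comment-only lines and lines without '=' are ignored."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


if __name__ == "__main__":
    example = """\
#input files
in_map=carte # the marker map
opt_step=0.05 # analysis step
opt_minsirephaseproba = 0.80 # minimal paternal phase probability
out_output=./OUTPUT/result # main results
"""
    params = parse_parameters(example)
    input_files = {k: v for k, v in params.items() if k.startswith("in_")}
    print(input_files, float(params["opt_step"]))
```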
the model file
5 # number of traits
0 0 # number of fixed effect , number of covariable
nofe nocov # names of fixed effects and covariable
Bardiere r 0 0 # name of trait 1 : model : fixe, cov, QTL*ef fixe / r : real i: integer c : categorial
Imf r 0 0 # ...
Panne r 0 0
x2 r 0 0
x4 r 0 0
Analyse the dataset¶
- Unitrait - Linkage Analysis without a model description (as in the example, no trait has any fixed effect or covariate)
qtlmap --calcul=1
- Multitraits - Linkage Analysis without a model description (as in the example, no trait has any fixed effect or covariate)
qtlmap --calcul=5
- Multitraits/Discriminant function Analysis - Linkage Analysis without a model description (as in the example, no trait has any fixed effect or covariate)
qtlmap --calcul=6
QTLMAS 2011 Datasets (SNP example)¶
Description of the dataset¶
- 20 half-sib families
- 2000 progenies with phenotypes
- 3200 progenies with genotypes
- 5 chromosomes
- 1998 SNP by chromosome
download source:dataset-qtlmas-2011.tar.gz
Build the parameter files¶
Analyse the dataset¶
- Unitrait - Linkage Analysis with SNP
qtlmap --calcul=4 --snp
- Unitrait - Linkage Disequilibrium with SNP
qtlmap --calcul=26 --snp
- Unitrait - Linkage Disequilibrium Linkage Analysis with SNP
qtlmap --calcul=28 --snp
Support¶
Subscribe and post any message/question to the qtlmap-users list : qtlmap-users@listes.inra.fr