Quick start
Below we illustrate how to estimate population size history using PopSizeABC. The real dataset used in this example is a VCF file containing the sequences of 234 bulls for a 2 Mb region; only 15 of these animals (from the Jersey breed) are actually used in our analysis.
1: Install PopSizeABC. Create a directory named PopSizeABC (or any other name of your choice), download the files comp_stat.zip (or comp_stat.1.zip), estim.zip and cattle_data.tar.gz into this directory and uncompress them. Open a terminal and go into the PopSizeABC directory. Create a subdirectory named res, where output files will be stored.
2: Compute observed summary statistics using the command 'python comp_stat/stat_from_vcf_ex1.py'. This command creates a file named ex1_Jersey_n30_mac6_macld6.stat.
3: Simulate summary statistics using the command 'python comp_stat/simul_data_ex1.py'. This command creates two files named ex1_simu_n30_s1.params (the parameter values of simulated samples) and ex1_simu_n30_s1_mac6_macld6.stat (the summary statistics of simulated samples).
4: Perform ABC estimation using the command 'R -f estim/abc_ex1.R'. This command produces two pdf files, ex1_Jersey_mac6_macld6_estim.pdf (the median and 90% credible interval of population size along time) and ex1_Jersey_mac6_macld6_detailed.pdf (quality check of ABC estimation, see the documentation of the R package abc for more details).
5: Test the expected accuracy of ABC estimation using the command 'R -f estim/abc_cv_ex1.R'. This command produces one pdf file, ex1_Jersey_mac6_macld6_cv.pdf, showing the expected prediction error of population size along time.
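Steps 2 to 5 can also be chained from a single script. The following is only a sketch and is not part of the PopSizeABC distribution: the commands and output file names are copied from the steps above, while the helper names (STEPS, missing_outputs, run_quick_start) are hypothetical. It assumes that python and R are available on the PATH and that the archives from step 1 have already been uncompressed. Because the exact location of each output file is not stated here, the check simply looks for the file names anywhere below the working directory.

```python
import subprocess
from pathlib import Path

# Commands and expected output names, copied from steps 2 to 5.
STEPS = [
    (["python", "comp_stat/stat_from_vcf_ex1.py"],
     ["ex1_Jersey_n30_mac6_macld6.stat"]),
    (["python", "comp_stat/simul_data_ex1.py"],
     ["ex1_simu_n30_s1.params", "ex1_simu_n30_s1_mac6_macld6.stat"]),
    (["R", "-f", "estim/abc_ex1.R"],
     ["ex1_Jersey_mac6_macld6_estim.pdf", "ex1_Jersey_mac6_macld6_detailed.pdf"]),
    (["R", "-f", "estim/abc_cv_ex1.R"],
     ["ex1_Jersey_mac6_macld6_cv.pdf"]),
]


def missing_outputs(root, names):
    """Return the names that match no file anywhere below root."""
    found = {p.name for p in Path(root).rglob("*") if p.is_file()}
    return [name for name in names if name not in found]


def run_quick_start(root="."):
    """Run steps 2 to 5 in order, stopping at the first failure."""
    root = Path(root)
    # Step 1 asks for a 'res' subdirectory where output files are stored.
    (root / "res").mkdir(exist_ok=True)
    for command, outputs in STEPS:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(command, cwd=root, check=True)
        missing = missing_outputs(root, outputs)
        if missing:
            raise FileNotFoundError(
                "expected output not found after %s: %s"
                % (" ".join(command), ", ".join(missing))
            )
```

Calling run_quick_start() from inside the PopSizeABC directory runs the four commands one after the other and reports the first expected output file that cannot be found.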
To ensure that these commands finish in a few minutes, we modified several important parameters of the analysis compared to what is done in the paper. In particular, observed and simulated summary statistics were computed from a 2 Mb region, only 1,000 histories were simulated, and the prior distributions of recombination rate and population sizes were narrowed in order to avoid histories with long simulation times. With such parameter values, the expected accuracy of PopSizeABC is very poor, as can be seen from ex1_Jersey_mac6_macld6_cv.pdf. However, these parameter values (and many others, such as the sample size, the minor allele frequency thresholds, and the number and shape of time windows) can easily be modified in the Python and R scripts; see for instance the other scripts provided in the "Files" tab.
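To make the role of the number of simulated histories and of the prior range more concrete, here is a minimal sketch of rejection ABC on a toy model. It is not the code of PopSizeABC nor of the R package abc: the simulator, the prior bounds, the tolerance and all function names are assumptions chosen for illustration. It only shows the general principle of drawing parameters from a prior, keeping the draws whose simulated statistic is closest to the observed one, and summarizing them by a median and a 90% credible interval.

```python
import random
import statistics


def simulate_stat(log10_size, rng):
    """Toy simulator (assumption): a noisy one-dimensional summary statistic."""
    return log10_size + rng.gauss(0.0, 0.3)


def rejection_abc(observed, n_sim, tol, rng):
    """Keep the fraction tol of prior draws whose statistic is closest to observed."""
    draws = []
    for _ in range(n_sim):
        # Illustrative uniform prior on log10 population size.
        theta = rng.uniform(2.0, 5.0)
        distance = abs(simulate_stat(theta, rng) - observed)
        draws.append((distance, theta))
    draws.sort()
    n_keep = max(2, int(tol * n_sim))
    return [theta for _, theta in draws[:n_keep]]


def summarize(accepted):
    """Posterior median and 90% credible interval (5% and 95% quantiles)."""
    cuts = statistics.quantiles(accepted, n=20, method="inclusive")
    return statistics.median(accepted), cuts[0], cuts[-1]


rng = random.Random(0)
accepted = rejection_abc(observed=3.5, n_sim=1000, tol=0.1, rng=rng)
median, lower, upper = summarize(accepted)
print("median=%.2f, 90%% interval=[%.2f, %.2f]" % (median, lower, upper))
```

With a fixed tolerance, simulating more histories means that the accepted draws lie closer to the observed statistic, which is one reason why a reference table of only 1,000 simulations gives coarse estimates.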