01.06.2015

Simulating a single population

We use Hudson's MS simulator. This program simulates data sets with a pure neutral model of evolution, under various demographic models. The 'ms' program takes 2 arguments: - A number of individuals (samples) to simulate - A number of data sets (replicates) to simulate

In addition, it takes some switch to specify the demography model. The simplest one is the -t switch, which provides a value of \(\theta = 4 \cdot Ne \cdot u\). The following command generates one sample of 10 individuals with \(\theta = 4\).

ms 10 1 -t 4

A value of theta = 4 corresponds, for instance, to a mutation rate of 1e-6 and a population size of 1e6.

Exercise:

Run MS and note the number of segregating sites. Compare to the expected number of seggregating sites:

\(E(S) = L \cdot \theta \cdot a_n\)

with \(a_n = \sum_{i = 1}^{n-1} 1/i\)

Note: in our example, \(L = 1\).

Tip: automatic extraction using unix command line:

ms 10 1000 -t 4 | sed -n "s/segsites: \(\d*\)/\1/p" > nbseg.txt

The resulting numbers can be read in R:

numbers <- read.table("nbseg.txt")

Results for n=10

e <- sum(1/(1:9)) * 4
print(e)
## [1] 11.31587
summary(numbers)
##        V1       
##  Min.   : 1.00  
##  1st Qu.: 7.00  
##  Median :11.00  
##  Mean   :11.44  
##  3rd Qu.:15.00  
##  Max.   :39.00

Results for n=10 (suite)

hist(numbers$V1, col="cornsilk3", main="",
     xlab="Number of segregating sites", nclass=20)
abline(v=e, lwd=2, col="steelblue")