Simulating Brownian motion in R
This short tutorial gives some simple approaches that can be used to simulate Brownian evolution in continuous and discrete time, in the absence of and on a phylogenetic tree.
Brownian motion is a stochastic continuous-time random walk model in which changes from one time to the next are random draws from some distribution with mean 0.0 and variance σ2. The expected variance under Brownian motion increases linearly through time with instantaneous rate σ2.
Brownian motion is very easy to simulate. To start off, let's simulate a single instance of Brownian motion for 100 generations of discrete time in which the variance of the diffusion process is σ2 = 0.01 per generation. In this case, we will draw our evolutionary changes from a normal distribution; however it's worth noting that (due to the CLT) regardless of the distribution, evolution will proceed by Brownian motion as the width of our timesteps decrease towards zero!
t<-0:100 # time
sig2<-0.01
## first, simulate a set of random deviates
x<-rnorm(n=length(t)-1,sd=sqrt(sig2))
## now compute their cumulative sum
x<-c(0,cumsum(x))
plot(t,x,type="l",ylim=c(-2,2))
The trick here is that we draw random normal deviates for the change over each
t time intervals; and then to get the state of our chain at each time
interval we just compute the cumulative sum of all the individual changes
using cumsum
. We can do this because the distribution of the changes
under Brownian motion is invariant and does not depend on the state of the
chain.
We can also easily do a whole bunch of simulations like this at once, using the same conditions:
nsim<-100
X<-matrix(rnorm(n=nsim*(length(t)-1),sd=sqrt(sig2)),nsim,length(t)-1)
X<-cbind(rep(0,nsim),t(apply(X,1,cumsum)))
plot(t,X[1,],xlab="time",ylab="phenotype",ylim=c(-2,2),type="l")
apply(X[2:nsim,],1,function(x,t) lines(t,x),t=t)
## NULL
To see how the outcome depends on σ2, let's compare the result
when we divide sig2
by 10:
X<-matrix(rnorm(n=nsim*(length(t)-1),sd=sqrt(sig2/10)),nsim,length(t)-1)
X<-cbind(rep(0,nsim),t(apply(X,1,cumsum)))
plot(t,X[1,],xlab="time",ylab="phenotype",ylim=c(-2,2),type="l")
apply(X[2:nsim,],1,function(x,t) lines(t,x),t=t)
## NULL
There are a number of different ways we could've done this. For example, instead
of using apply
, which economizes on code, we could have used a
for
loop as follows:
X<-matrix(0,nsim,length(t))
for(i in 1:nsim) X[i,]<-c(0,cumsum(rnorm(n=length(t)-1,sd=sqrt(sig2))))
plot(t,X[1,],xlab="time",ylab="phenotype",ylim=c(-2,2),type="l")
for(i in 1:nsim) lines(t,X[i,])
The expected variance under Brownian motion is just σ2 × t. To see this easiest, we can just do the following. Here I will use 10,000 simulations for 100 generations under the same conditions to “smooth out” our result:
nsim<-10000
X<-matrix(rnorm(n=nsim*(length(t)-1),sd=sqrt(sig2)),nsim,length(t)-1)
X<-cbind(rep(0,nsim),t(apply(X,1,cumsum)))
v<-apply(X,2,var)
plot(t,v,type="l",xlab="time",ylab="variance among simulations")
var(X[,length(t)]) # this should be about 1.00
## [1] 1.009997
OK, now let's try to simulate using Brownian motion up the branches of a tree.
Sticking with discrete time, we first need a discrete time phylogeny, which we can
simulate using pbtree
in phytools:
library(phytools)
t<-100 # total time
n<-30 # total taxa
b<-(log(n)-log(2))/t
tree<-pbtree(b=b,n=n,t=t,type="discrete")
## simulating with both taxa-stop (n) and time-stop (t) is
## performed via rejection sampling & may be slow
##
## 28 trees rejected before finding a tree
plotTree(tree)
Now, to simulate on all the branches of the tree we just need to simulate on each branch separately, and then “shift” each daughter branch by the end state of it's parent. We can do this, remember, because the outcome of Brownian evolution at each time step is independent of all other time steps.
## simulate evolution along each edge
X<-lapply(tree$edge.length,function(x) c(0,cumsum(rnorm(n=x,sd=sqrt(sig2)))))
## reorder the edges of the tree for pre-order traversal
cw<-reorder(tree)
## now simulate on the tree
ll<-tree$edge.length+1
for(i in 1:nrow(cw$edge)){
pp<-which(cw$edge[,2]==cw$edge[i,1])
if(length(pp)>0) X[[i]]<-X[[i]]+X[[pp]][ll[pp]]
else X[[i]]<-X[[i]]+X[[1]][1]
}
## get the starting and ending points of each edge for plotting
H<-nodeHeights(tree)
## plot the simulation
plot(H[1,1],X[[1]][1],ylim=range(X),xlim=range(H),xlab="time",ylab="phenotype")
for(i in 1:length(X)) lines(H[i,1]:H[i,2],X[[i]])
## add tip labels if desired
yy<-sapply(1:length(tree$tip.label),function(x,y) which(x==y),y=tree$edge[,2])
yy<-sapply(yy,function(x,y) y[[x]][length(y[[x]])],y=X)
text(x=max(H),y=yy,tree$tip.label)
In reality, most simulations of Brownian motion are conducted using continuous rather than discrete time. This is because the additivity of Brownian motion means that the expected variances among & covariances between species are the same in whether we simulate t steps each with variance σ2, or one big step with variance σ2t. (And, in fact, the CLT assures us that the sum of changes will be normal, regardless of the distribution of each infinitesimal change.)
Here is an example of a continuous time simulation & visualization using canned functions.
## simulate Brownian evolution on a tree with fastBM
x<-fastBM(tree,sig2=sig2,internal=TRUE)
## visualize Brownian evolution on a tree
phenogram(tree,x,spread.labels=TRUE,spread.cost=c(1,0))
Some additional points
A few additional points about Brownian evolution might be interesting to consider since it is such an important model.
It is possible that there might be different rates of Brownian evolution in different parts of the tree, instead of a constant rate. We'll learn more about this later; but it is straightforward to simulate Brownian motion with different rates on different branches. For example, in the following example, I first simulate the history of a discretely valued character on the tree, and then a continuous trait with a rate that changes as a function of the mapped discrete character. I'll then create a traitgram that projects the phylogeny onto the phenotypic trait axis and overlays the mapped discrete character.
set.seed(100)
## transition matrix for the discrete trait simulation
Q<-matrix(c(-1,1,1,-1),2,2)
## simulated tree
tree<-pbtree(n=30,scale=1)
## simulate discrete character history
tree<-sim.history(tree,Q,anc="1")
## Done simulation(s).
## plot discrete character history on the tree
plotSimmap(tree,setNames(c("blue","red"),1:2),pts=F)
x<-sim.rates(tree,setNames(c(1,20),1:2),internal=TRUE)
phenogram(tree,x,colors=setNames(c("blue","red"),1:2),spread.labels=TRUE,spread.cost=c(1,0))
We discussed the fact that shared ancestry creates “covariance” among the tips
of the tree. This is a very important concept, but it is one that can be
a little bit challenging to grasp. One way of thinking about this is as the
covariance among species across a large number of replicates of the
evolutionary process. The following is an illustration of that point.
(This requires the package "car"
.)
## simulate 5 taxon tree
tree<-pbtree(n=5,tip.label=LETTERS[1:5])
## 500 BM simulations
X<-fastBM(tree,nsim=500)
## plot tree
plot(rotateNodes(tree,"all"),edge.width=2,no.margin=TRUE)
## plot distribution across tips from 500 simulations
library(car)
scatterplotMatrix(t(X))
As we've already noted, Brownian motion does not assume that the process under which the individual lineages are moving is Gaussian. The result will be Gaussian - due to the Central Limit Theorem. Here is an illustration of that:
t <- 0:100 # time
sig2 <- 0.01
nsim <- 1000
## we'll simulate the steps from a uniform distribution
## with limits set to have the same variance (0.01) as before
X<-matrix(runif(n=nsim*(length(t)-1),min=-sqrt(3*sig2),max=sqrt(3*sig2)),nsim,length(t)-1)
X <- cbind(rep(0, nsim), t(apply(X, 1, cumsum)))
plot(t, X[1, ], xlab = "time", ylab = "phenotype", ylim = c(-2, 2), type = "l")
apply(X[2:nsim, ], 1, function(x, t) lines(t, x), t = t)
## NULL
var(X[, length(t)])
## [1] 0.9689945
hist(X[,length(t)])
plot(density(X[,length(t)]))
Brownian motion is generally assumed to be trendless; however it is possible to simulate and (under some limited conditions) fit a model of Brownian evolution with a trend. Here is an example of a simulation (using the same general approach as above) with a trend.
nsim=100
X <- matrix(rnorm(mean=0.02,n = nsim * (length(t) - 1), sd = sqrt(sig2/4)), nsim, length(t) - 1)
X <- cbind(rep(0, nsim), t(apply(X, 1, cumsum)))
plot(t, X[1, ], xlab = "time", ylab = "phenotype", ylim = c(-1, 3), type = "l")
apply(X[2:nsim, ], 1, function(x, t) lines(t, x), t = t)
## NULL
Written by Liam J. Revell. Last updated July 1, 2015