Simulating Brownian motion in R


Simulating Brownian motion in R

This short tutorial gives some simple approaches that can be used to simulate Brownian evolution in continuous and discrete time, in the absence of and on a phylogenetic tree.

Brownian motion is a stochastic continuous-time random walk model in which changes from one time to the next are random draws from some distribution with mean 0.0 and variance σ2. The expected variance under Brownian motion increases linearly through time with instantaneous rate σ2.

Brownian motion is very easy to simulate. To start off, let's simulate a single instance of Brownian motion for 100 generations of discrete time in which the variance of the diffusion process is σ2 = 0.01 per generation. In this case, we will draw our evolutionary changes from a normal distribution; however it's worth noting that (due to the CLT) regardless of the distribution, evolution will proceed by Brownian motion as the width of our timesteps decrease towards zero!

t<-0:100 # time
sig2<-0.01
## first, simulate a set of random deviates
x<-rnorm(n=length(t)-1,sd=sqrt(sig2))
## now compute their cumulative sum
x<-c(0,cumsum(x))
plot(t,x,type="l",ylim=c(-2,2))

plot of chunk unnamed-chunk-1

The trick here is that we draw random normal deviates for the change over each t time intervals; and then to get the state of our chain at each time interval we just compute the cumulative sum of all the individual changes using cumsum. We can do this because the distribution of the changes under Brownian motion is invariant and does not depend on the state of the chain.

We can also easily do a whole bunch of simulations like this at once, using the same conditions:

nsim<-100
X<-matrix(rnorm(n=nsim*(length(t)-1),sd=sqrt(sig2)),nsim,length(t)-1)
X<-cbind(rep(0,nsim),t(apply(X,1,cumsum)))
plot(t,X[1,],xlab="time",ylab="phenotype",ylim=c(-2,2),type="l")
apply(X[2:nsim,],1,function(x,t) lines(t,x),t=t)

plot of chunk unnamed-chunk-2

## NULL

To see how the outcome depends on σ2, let's compare the result when we divide sig2 by 10:

X<-matrix(rnorm(n=nsim*(length(t)-1),sd=sqrt(sig2/10)),nsim,length(t)-1)
X<-cbind(rep(0,nsim),t(apply(X,1,cumsum)))
plot(t,X[1,],xlab="time",ylab="phenotype",ylim=c(-2,2),type="l")
apply(X[2:nsim,],1,function(x,t) lines(t,x),t=t)

plot of chunk unnamed-chunk-3

## NULL

There are a number of different ways we could've done this. For example, instead of using apply, which economizes on code, we could have used a for loop as follows:

X<-matrix(0,nsim,length(t))
for(i in 1:nsim) X[i,]<-c(0,cumsum(rnorm(n=length(t)-1,sd=sqrt(sig2))))
plot(t,X[1,],xlab="time",ylab="phenotype",ylim=c(-2,2),type="l")
for(i in 1:nsim) lines(t,X[i,])

plot of chunk unnamed-chunk-4

The expected variance under Brownian motion is just σ2 × t. To see this easiest, we can just do the following. Here I will use 10,000 simulations for 100 generations under the same conditions to “smooth out” our result:

nsim<-10000
X<-matrix(rnorm(n=nsim*(length(t)-1),sd=sqrt(sig2)),nsim,length(t)-1)
X<-cbind(rep(0,nsim),t(apply(X,1,cumsum)))
v<-apply(X,2,var)
plot(t,v,type="l",xlab="time",ylab="variance among simulations")

plot of chunk unnamed-chunk-5

var(X[,length(t)]) # this should be about 1.00
## [1] 1.009997

OK, now let's try to simulate using Brownian motion up the branches of a tree. Sticking with discrete time, we first need a discrete time phylogeny, which we can simulate using pbtree in phytools:

library(phytools)
t<-100 # total time
n<-30 # total taxa
b<-(log(n)-log(2))/t
tree<-pbtree(b=b,n=n,t=t,type="discrete")
## simulating with both taxa-stop (n) and time-stop (t) is
## performed via rejection sampling & may be slow
## 
##   28 trees rejected before finding a tree
plotTree(tree)

plot of chunk unnamed-chunk-6

Now, to simulate on all the branches of the tree we just need to simulate on each branch separately, and then “shift” each daughter branch by the end state of it's parent. We can do this, remember, because the outcome of Brownian evolution at each time step is independent of all other time steps.

## simulate evolution along each edge
X<-lapply(tree$edge.length,function(x) c(0,cumsum(rnorm(n=x,sd=sqrt(sig2)))))
## reorder the edges of the tree for pre-order traversal
cw<-reorder(tree)
## now simulate on the tree
ll<-tree$edge.length+1
for(i in 1:nrow(cw$edge)){
    pp<-which(cw$edge[,2]==cw$edge[i,1])
    if(length(pp)>0) X[[i]]<-X[[i]]+X[[pp]][ll[pp]]
    else X[[i]]<-X[[i]]+X[[1]][1]
}
## get the starting and ending points of each edge for plotting
H<-nodeHeights(tree)
## plot the simulation
plot(H[1,1],X[[1]][1],ylim=range(X),xlim=range(H),xlab="time",ylab="phenotype")
for(i in 1:length(X)) lines(H[i,1]:H[i,2],X[[i]])
## add tip labels if desired
yy<-sapply(1:length(tree$tip.label),function(x,y) which(x==y),y=tree$edge[,2])
yy<-sapply(yy,function(x,y) y[[x]][length(y[[x]])],y=X)
text(x=max(H),y=yy,tree$tip.label)

plot of chunk unnamed-chunk-7

In reality, most simulations of Brownian motion are conducted using continuous rather than discrete time. This is because the additivity of Brownian motion means that the expected variances among & covariances between species are the same in whether we simulate t steps each with variance σ2, or one big step with variance σ2t. (And, in fact, the CLT assures us that the sum of changes will be normal, regardless of the distribution of each infinitesimal change.)

Here is an example of a continuous time simulation & visualization using canned functions.

## simulate Brownian evolution on a tree with fastBM
x<-fastBM(tree,sig2=sig2,internal=TRUE)
## visualize Brownian evolution on a tree
phenogram(tree,x,spread.labels=TRUE,spread.cost=c(1,0))

plot of chunk unnamed-chunk-8

Some additional points

A few additional points about Brownian evolution might be interesting to consider since it is such an important model.

It is possible that there might be different rates of Brownian evolution in different parts of the tree, instead of a constant rate. We'll learn more about this later; but it is straightforward to simulate Brownian motion with different rates on different branches. For example, in the following example, I first simulate the history of a discretely valued character on the tree, and then a continuous trait with a rate that changes as a function of the mapped discrete character. I'll then create a traitgram that projects the phylogeny onto the phenotypic trait axis and overlays the mapped discrete character.

set.seed(100)
## transition matrix for the discrete trait simulation
Q<-matrix(c(-1,1,1,-1),2,2)
## simulated tree
tree<-pbtree(n=30,scale=1)
## simulate discrete character history
tree<-sim.history(tree,Q,anc="1")
## Done simulation(s).
## plot discrete character history on the tree
plotSimmap(tree,setNames(c("blue","red"),1:2),pts=F)

plot of chunk unnamed-chunk-9

x<-sim.rates(tree,setNames(c(1,20),1:2),internal=TRUE)
phenogram(tree,x,colors=setNames(c("blue","red"),1:2),spread.labels=TRUE,spread.cost=c(1,0))

plot of chunk unnamed-chunk-10

We discussed the fact that shared ancestry creates “covariance” among the tips of the tree. This is a very important concept, but it is one that can be a little bit challenging to grasp. One way of thinking about this is as the covariance among species across a large number of replicates of the evolutionary process. The following is an illustration of that point. (This requires the package "car".)

## simulate 5 taxon tree
tree<-pbtree(n=5,tip.label=LETTERS[1:5])
## 500 BM simulations
X<-fastBM(tree,nsim=500)
## plot tree
plot(rotateNodes(tree,"all"),edge.width=2,no.margin=TRUE)

plot of chunk unnamed-chunk-11

## plot distribution across tips from 500 simulations
library(car)
scatterplotMatrix(t(X))

plot of chunk unnamed-chunk-11

As we've already noted, Brownian motion does not assume that the process under which the individual lineages are moving is Gaussian. The result will be Gaussian - due to the Central Limit Theorem. Here is an illustration of that:

t <- 0:100  # time
sig2 <- 0.01
nsim <- 1000
## we'll simulate the steps from a uniform distribution
## with limits set to have the same variance (0.01) as before
X<-matrix(runif(n=nsim*(length(t)-1),min=-sqrt(3*sig2),max=sqrt(3*sig2)),nsim,length(t)-1)
X <- cbind(rep(0, nsim), t(apply(X, 1, cumsum)))
plot(t, X[1, ], xlab = "time", ylab = "phenotype", ylim = c(-2, 2), type = "l")
apply(X[2:nsim, ], 1, function(x, t) lines(t, x), t = t)

plot of chunk unnamed-chunk-12

## NULL
var(X[, length(t)])
## [1] 0.9689945
hist(X[,length(t)])

plot of chunk unnamed-chunk-13

plot(density(X[,length(t)]))    

plot of chunk unnamed-chunk-14

Brownian motion is generally assumed to be trendless; however it is possible to simulate and (under some limited conditions) fit a model of Brownian evolution with a trend. Here is an example of a simulation (using the same general approach as above) with a trend.

nsim=100
X <- matrix(rnorm(mean=0.02,n = nsim * (length(t) - 1), sd = sqrt(sig2/4)), nsim, length(t) - 1)
X <- cbind(rep(0, nsim), t(apply(X, 1, cumsum)))
plot(t, X[1, ], xlab = "time", ylab = "phenotype", ylim = c(-1, 3), type = "l")
apply(X[2:nsim, ], 1, function(x, t) lines(t, x), t = t)

plot of chunk unnamed-chunk-15

## NULL

Written by Liam J. Revell. Last updated July 1, 2015