Diversification analysis with BAMM and RPANDA
July/2015
BAMM
Downloading and installing BAMM
BAMM (Bayesian Analysis of Macroevolutionary Mixtures) is a piece of software that uses rjMCMC to model complex dynamics of speciation, extinction and trait evolution on phylogenetic trees. However, the following tutorial will focus only on speciation and extinction rates estimation.
The first step to use BAMM is to download the executable file and examples from the program’s website. To install the program, you must follow the operating system’s specific instructions on the website.
As previously commented, the ideal way of working with BAMM analysis is to create a separate folder for each independent run/project/analysis. This way, you can have all necessary files in the right place, and you keep your results organized.
There are three mandatory files for running BAMM:
BAMM executable
Control file
Tree file (in newick format)
Whale tree
We will use the phylogeny of whales as an example. You can get this phylogeny, along with other files, from this link.
The next step is to edit the control file.
Control file
The control file contains all the information you need to run BAMM including the names of the input files you want to analyze. Control file has several options and priors which are all comented in the example and you can check more detailed explanation at BAMM website.
Running BAMM
Navigate to the folder containing the BAMM executable file, the control file and the tree.
./bamm -c divcontrol.txt ## copy and paste into the terminal
BAMMtools R Package
install.packages("BAMMtools")
library(BAMMtools)
tree <- read.tree("whaletree.tre")
plot(tree, cex = 0.35)
Read the file “eventdata.txt” which contains the information that BAMMtools needs to analyze BAMM output.
events <- read.csv("event_data.txt")
However, to explore some of the functions in BAMMtools, let’s use an eventfile from a BAMM analysis that ran with a longer chain length.
data(whales, events.whales)
Here we will use the function “getEventData” to generate an object of the class “bammdata”. “bammdata” object is a complex data structure that includes a phylogenetic tree and a mapping of all macroevolutionary rate parameters sampled using BAMM. Many of the methods in BAMMtools operate directly on objects of class bammdata“.
ed <- getEventData(whales, events.whales, burnin=0.25)
## Processing event data from data.frame
##
## Discarded as burnin: GENERATIONS < 1245000
## Analyzing 751 samples from posterior
##
## Setting recursive sequence on tree...
##
## Done with recursive sequence
head(ed$eventData)
## [[1]]
## node time lam1 lam2 mu1 mu2 index
## 1 88 0.0000 0.0971445 0.00000 0.0199072 0 1
## 2 141 25.6934 0.5699400 -0.13386 0.1152300 0 2
##
## [[2]]
## node time lam1 lam2 mu1 mu2 index
## 1 88 0.0000 0.169028 -0.0207249 0.045860 0 1
## 2 141 25.9095 1.288720 -0.2392280 0.045262 0 2
##
## [[3]]
## node time lam1 lam2 mu1 mu2 index
## 1 88 0.0000 0.479322 -0.0565555 0.1176540 0 1
## 2 141 25.2372 0.672619 -0.1634640 0.0190236 0 2
##
## [[4]]
## node time lam1 lam2 mu1 mu2 index
## 1 88 0.0000 0.1971050 -0.0302915 0.05377420 0 1
## 2 140 24.5441 0.7476610 -0.1280560 0.00978705 0 2
## 3 1 33.4932 0.0535886 0.0000000 1.69524000 0 3
##
## [[5]]
## node time lam1 lam2 mu1 mu2 index
## 1 88 0.0000 0.228074 -0.0324925 0.0828498 0 1
## 2 141 26.3576 0.889405 -0.2088700 0.0212998 0 2
##
## [[6]]
## node time lam1 lam2 mu1 mu2 index
## 1 88 0.0000 0.153257 -0.026538 0.00371713 0 1
## 2 141 26.3608 0.580143 -0.127939 0.02964140 0 2
bamm.whales <- plot.bammdata(ed, lwd=2, labels = T, cex = 0.5)
addBAMMshifts(ed, cex=2)
addBAMMlegend(bamm.whales, corners = c(-1, -0.5, 40, 80), nTicks = 4, side = 4, las = 1)
Rate through time plots
We can also plot the rate through time estimated by BAMM.
plot.new()
plotRateThroughTime(ed, intervalCol="red", avgCol="red", ylim=c(0,1), cex.axis=2)
text(x=30, y= 0.8, label="All whales", font=4, cex=2.0, pos=4)
Or plot the rate through time estimated for a given clade:
plotRateThroughTime(ed, intervalCol="blue", avgCol="blue", node=141, ylim=c(0,1),cex.axis=1.5)
text(x=30, y= 0.8, label="Dolphins only", font=4, cex=2.0, pos=4)
Or exclude this given clade and thus plot the rate through time for the “background clade”.
plotRateThroughTime(ed, intervalCol="darkgreen", avgCol="darkgreen", node=141, nodetype = "exclude", ylim=c(0,1), cex.axis=1.5)
text(x=30, y= 0.8, label="Non-dolphins", font=4, cex=2.0, pos=4)
Extracting tip rates
Bamm estimates speciation and extinction rates for each branch on the tree. Here we will extract the rates estimated for each tip and plot in different ways.
tip.rates <- getTipRates(ed)
str(tip.rates)
## List of 4
## $ lambda : num [1:87, 1:751] 0.0971 0.0971 0.0971 0.0971 0.0971 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:87] "Balaena_mysticetus" "Eubalaena_australis" "Eubalaena_glacialis" "Eubalaena_japonica" ...
## .. ..$ : NULL
## $ mu : num [1:87, 1:751] 0.0199 0.0199 0.0199 0.0199 0.0199 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:87] "Balaena_mysticetus" "Eubalaena_australis" "Eubalaena_glacialis" "Eubalaena_japonica" ...
## .. ..$ : NULL
## $ lambda.avg: Named num [1:87] 0.0794 0.0871 0.0871 0.0871 0.0713 ...
## ..- attr(*, "names")= chr [1:87] "Balaena_mysticetus" "Eubalaena_australis" "Eubalaena_glacialis" "Eubalaena_japonica" ...
## $ mu.avg : Named num [1:87] 0.0773 0.0602 0.0602 0.0602 0.0442 ...
## ..- attr(*, "names")= chr [1:87] "Balaena_mysticetus" "Eubalaena_australis" "Eubalaena_glacialis" "Eubalaena_japonica" ...
You can explore the tip rates with a histogram:
hist(tip.rates$lambda.avg,xlab="average lambda")
hist(tip.rates$mu.avg,xlab="average mu")
Or you can separate different groups of taxa to compare their tip rates:
boxplot(tip.rates$lambda.avg[53:87], tip.rates$lambda.avg[1:52], col = c("red", "blue"), names = c("dolphins", "other cetaceans"))
RPANDA
RPANDA is another recently developed R package that allows us to fit different models of rate variation through time and select the best fitting model using maximum likelihood analysis.
RPANDA shows basically two main differences from BAMM:
In RPANDA, the user must inform which models are going to be tested, whereas in BAMM the program itself will average among the rates for each clade;
In RPANDA, the user must know a priori which are the clades that might show a particular diversification regime, while BAMM will estimate where rate shifts are positioned using a rjMCMC algorithm.
The package also contains some simuation functions as well as datasets. You can find more details on CRAN.
Model selection
For this exercise, we will test four different scenarios with all combinations of constant and variable speciation and extinction rates as follows (based on Morlon et al. 2011, PNAS) :
lambda.cst <- function(x,y){y}
lambda.var <- function(x,y){y[1]*exp(y[2]*x)}
mu.cst <- function(x,y){y}
mu.var <- function(x,y){y[1]*exp(y[2]*x)}
The rate variations for the four possible pairwise combinations look like this:
Having decided which models one would like to test, the next step is to extract the clades that most likely share similar diversification regimes. As said previously, RPANDA needs the user to a priori indicate these clades. In this example, we will use the four main cetacean families plus the rest of cetaceans as a fifth clade.
delphinidae.tree <- drop.tip(whales,whales$tip.label[-match(delphinidae,whales$tip.label)])
phocoenidae.tree <- drop.tip(whales,whales$tip.label[-match(phocoenidae,whales$tip.label)])
ziphidae.tree <- drop.tip(whales,whales$tip.label[-match(ziphidae,whales$tip.label)])
othercetaceans.tree <- drop.tip(whales,c(balaenopteridae,delphinidae, phocoenidae, ziphidae))
fit.multi.rpanda <- function(tree,par)
{
bcstdcst <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.cst, f.mu=mu.cst, lamb_par=par[[1]][1],mu_par=par[[1]][2],cst.lamb=TRUE,cst.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
bvardcst <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.var, f.mu=mu.cst, lamb_par=par[[2]][c(1,2)],mu_par=par[[2]][3],expo.lamb=TRUE,cst.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
bcstdvar <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.cst, f.mu=mu.var, lamb_par=par[[3]][1],mu_par=par[[3]][c(2,3)],cst.lamb=TRUE,expo.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
bvardvar <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.var, f.mu=mu.var, lamb_par=par[[4]][c(1,2)],mu_par=par[[4]][c(3,4)],expo.lamb=TRUE,expo.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
return(list("bcstdcst"=bcstdcst,"bvardcst"=bvardcst,"bcstdvar"=bcstdvar,"bvardvar"=bvardvar))
}
whales.par <- list(c(0.4,0),c(0.4,-0.05,0),c(0.4,0.1,0.05),c(0.4,-0.05,0.1,0.05))
Estimation of model parameters
In the posession of all models and clades, we can finally estimate the parameters of all the four models to each of the five clades and create an AICc table in order to select which one is the best model that describes the changes in both diversification rates (speciation and extinction) rates. (This part of the code take a while to complete)
results <- list("balaenopteridae.res"=fit.multi.rpanda(balaenopteridae.tree,whales.par),
"delphinidae.res" = fit.multi.rpanda(delphinidae.tree,whales.par),
"phocoenidae.res" = fit.multi.rpanda(phocoenidae.tree,whales.par),
"ziphidae.res" = fit.multi.rpanda(ziphidae.tree,whales.par),
"othercetaceans.res"= fit.multi.rpanda(othercetaceans.tree,whales.par))
aic.table <- matrix(nrow=4,ncol=5,NA)
for(i in 1:5)
{
for(j in 1:4)
{
aic.table[j,i] <- results[[i]][[j]]$aicc
}
}
colnames(aic.table) <- c("Balaenopteridae","Delphinidae","Phocoenidae","Ziphidae","Other Cetaceans")
rownames(aic.table) <- c("bcstdcst","bvardcst","bcstdvar","bvardvar")
aic.table
## Balaenopteridae Delphinidae Phocoenidae Ziphidae Other Cetaceans
## bcstdcst 56.73336 171.4150 31.79207 132.6440 119.4690
## bvardcst 58.44456 170.0338 34.13348 130.1843 117.5562
## bcstdvar 58.32515 170.6453 41.79207 127.6094 115.3723
## bvardvar 65.67244 171.3258 64.13348 133.2251 118.9789
par.table <- data.frame("Balaenopteridae"=c(results[[1]]$bcstdcst$lamb_par[1:2],results[[1]]$bcstdcst$mu_par[1:2]),"Delphinidae"=c(results[[2]]$bvardcst$lamb_par[1:2],results[[2]]$bvardcst$mu_par[1:2]),"Phocoenidae"=c(results[[3]]$bcstdcst$lamb_par[1:2],results[[3]]$bcstdcst$mu_par[1:2]),"Ziphidae"=c(results[[4]]$bcstdcst$lamb_par[1:2],results[[4]]$bcstdcst$mu_par[1:2]),"Other Cetaceans"=c(results[[5]]$bcstdvar$lamb_par[1:2],results[[5]]$bcstdvar$mu_par[1:2]))
par.table
## Balaenopteridae Delphinidae Phocoenidae Ziphidae Other.Cetaceans
## 1 7.343080e-02 1.405102e-01 1.410234e-01 9.485149e-02 0.1857764
## 2 NA 1.257642e-01 NA NA NA
## 3 -6.391602e-10 -1.521115e-07 -6.012454e-08 7.545982e-08 0.8313274
## 4 NA NA NA NA -0.1742352
Plotting diversity through time
After selecting which model best fits the trees, we can estimate the diversity trajectory through time for each of the five clades.
# Function to calculate species richness in a given point in time
div.times <- c(max(branching.times(balaenopteridae.tree)),max(branching.times(delphinidae.tree)),max(branching.times(phocoenidae.tree)),max(branching.times(ziphidae.tree)),max(branching.times(othercetaceans.tree)))
# Function modified from plot_dtt from RPANDA package
plotdtt <- function (fit.bd, tot_time, N0, col=1, add=FALSE, div.time, xlim, ylim)
{
if (!inherits(fit.bd, "fit.bd"))
stop("object \"fit.bd\" is not of class \"fit.bd\"")
t <- seq(tot_time-div.time, tot_time, 0.01)
if ("f.mu" %in% attributes(fit.bd)$names) {
r <- function(t) {
-fit.bd$f.lamb(t) + fit.bd$f.mu(t)
}
R <- function(s) {
RPANDA:::.Integrate(Vectorize(r), 0, s)
}
N <- N0 * exp(Vectorize(R)(t))
#dev.new()
if(add==FALSE)
{
plot(-t, N, type = "l", xlab = "time", ylab = "Number of species",
main = "Diversity Through Time", col=col, xlim=xlim, ylim=ylim)
}
else
{
lines(-t, N, type = "l", xlab = "time", ylab = "Number of species",
main = "Diversity Through Time", col=col, xlim=xlim, ylim=ylim)
}
}
else {
r <- function(t) {
-fit.bd$f.lamb(t)
}
R <- function(s) {
RPANDA:::.Integrate(Vectorize(r), 0, s)
}
N <- N0 * exp(Vectorize(R)(t))
#dev.new()
if(add==FALSE)
{
plot(-t, N, type = "l", xlab = "time", ylab = "Number of species",
main = "Diversity Through Time",col=col, xlim=xlim, ylim=ylim)
}
else
{
lines(-t, N, type = "l", xlab = "time", ylab = "Number of species",
main = "Diversity Through Time",col=col, xlim=xlim, ylim=ylim)
}
}
}
plotdtt(results$balaenopteridae$bcstdcst,div.times[1],N0=Ntip(balaenopteridae.tree),xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[1])
plotdtt(results$delphinidae$bvardcst,div.times[2],N0=Ntip(delphinidae.tree),col=6,add=TRUE,xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[2])
plotdtt(results$phocoenidae$bcstdcst,div.times[3],N0=Ntip(phocoenidae.tree),col="goldenrod",add=TRUE,xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[3])
plotdtt(results$ziphidae$bcstdcst,div.times[4],N0=Ntip(ziphidae.tree),col=4,add=TRUE,xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[4])
plotdtt(results$othercetaceans$bcstdvar,div.times[5],N0=Ntip(othercetaceans.tree),col="darkred",add=TRUE,xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[5])
legend("topleft",legend=c("Balaenopteridae","Delphinidae","Phocoenidae","Ziphidae","Other Cetaceans"),text.col=c(1,6,"goldenrod",4,"darkred"))
Challenge
Fit models using RPANDA for the entire whale phylogeny, and plot its diversity through time using parts of the previous codes chunks.
Run BAMM in the Anoles tree (provided in the example file) and plot the tree with the rates for each branch.