Diversification analysis with BAMM and RPANDA


Diversification analysis with BAMM and RPANDA

BAMM

Downloading and installing BAMM

BAMM (Bayesian Analysis of Macroevolutionary Mixtures) is a piece of software that uses rjMCMC to model complex dynamics of speciation, extinction and trait evolution on phylogenetic trees. However, the following tutorial will focus only on speciation and extinction rates estimation.

The first step to use BAMM is to download the executable file and examples from the program’s website. To install the program, you must follow the operating system’s specific instructions on the website.

As previously commented, the ideal way of working with BAMM analysis is to create a separate folder for each independent run/project/analysis. This way, you can have all necessary files in the right place, and you keep your results organized.

There are three mandatory files for running BAMM:

  • BAMM executable

  • Control file

  • Tree file (in newick format)

Whale tree

We will use the phylogeny of whales as an example. You can get this phylogeny, along with other files, from this link.

The next step is to edit the control file.

Control file

The control file contains all the information you need to run BAMM including the names of the input files you want to analyze. Control file has several options and priors which are all comented in the example and you can check more detailed explanation at BAMM website.

Running BAMM

Navigate to the folder containing the BAMM executable file, the control file and the tree.

./bamm -c divcontrol.txt ## copy and paste into the terminal

BAMMtools R Package

install.packages("BAMMtools")
library(BAMMtools)

tree <- read.tree("whaletree.tre")
plot(tree, cex = 0.35)

Read the file “eventdata.txt” which contains the information that BAMMtools needs to analyze BAMM output.

events <- read.csv("event_data.txt")

However, to explore some of the functions in BAMMtools, let’s use an eventfile from a BAMM analysis that ran with a longer chain length.

data(whales, events.whales)

Here we will use the function “getEventData” to generate an object of the class “bammdata”. “bammdata” object is a complex data structure that includes a phylogenetic tree and a mapping of all macroevolutionary rate parameters sampled using BAMM. Many of the methods in BAMMtools operate directly on objects of class bammdata“.

ed <- getEventData(whales, events.whales, burnin=0.25)
## Processing event data from data.frame
##
## Discarded as burnin: GENERATIONS <  1245000
## Analyzing  751  samples from posterior
##
## Setting recursive sequence on tree...
##
## Done with recursive sequence
head(ed$eventData)
## [[1]]
##   node    time      lam1     lam2       mu1 mu2 index
## 1   88  0.0000 0.0971445  0.00000 0.0199072   0     1
## 2  141 25.6934 0.5699400 -0.13386 0.1152300   0     2
##
## [[2]]
##   node    time     lam1       lam2      mu1 mu2 index
## 1   88  0.0000 0.169028 -0.0207249 0.045860   0     1
## 2  141 25.9095 1.288720 -0.2392280 0.045262   0     2
##
## [[3]]
##   node    time     lam1       lam2       mu1 mu2 index
## 1   88  0.0000 0.479322 -0.0565555 0.1176540   0     1
## 2  141 25.2372 0.672619 -0.1634640 0.0190236   0     2
##
## [[4]]
##   node    time      lam1       lam2        mu1 mu2 index
## 1   88  0.0000 0.1971050 -0.0302915 0.05377420   0     1
## 2  140 24.5441 0.7476610 -0.1280560 0.00978705   0     2
## 3    1 33.4932 0.0535886  0.0000000 1.69524000   0     3
##
## [[5]]
##   node    time     lam1       lam2       mu1 mu2 index
## 1   88  0.0000 0.228074 -0.0324925 0.0828498   0     1
## 2  141 26.3576 0.889405 -0.2088700 0.0212998   0     2
##
## [[6]]
##   node    time     lam1      lam2        mu1 mu2 index
## 1   88  0.0000 0.153257 -0.026538 0.00371713   0     1
## 2  141 26.3608 0.580143 -0.127939 0.02964140   0     2
bamm.whales <- plot.bammdata(ed, lwd=2, labels = T, cex = 0.5)
addBAMMshifts(ed, cex=2)
addBAMMlegend(bamm.whales, corners = c(-1, -0.5, 40, 80), nTicks = 4, side = 4, las = 1)

Rate through time plots

We can also plot the rate through time estimated by BAMM.

plot.new()
plotRateThroughTime(ed, intervalCol="red", avgCol="red", ylim=c(0,1), cex.axis=2)
text(x=30, y= 0.8, label="All whales", font=4, cex=2.0, pos=4)

Or plot the rate through time estimated for a given clade:

plotRateThroughTime(ed, intervalCol="blue", avgCol="blue", node=141, ylim=c(0,1),cex.axis=1.5)
text(x=30, y= 0.8, label="Dolphins only", font=4, cex=2.0, pos=4)

Or exclude this given clade and thus plot the rate through time for the “background clade”.

plotRateThroughTime(ed, intervalCol="darkgreen", avgCol="darkgreen", node=141, nodetype = "exclude", ylim=c(0,1), cex.axis=1.5)
text(x=30, y= 0.8, label="Non-dolphins", font=4, cex=2.0, pos=4)

Extracting tip rates

Bamm estimates speciation and extinction rates for each branch on the tree. Here we will extract the rates estimated for each tip and plot in different ways.

tip.rates <- getTipRates(ed)
str(tip.rates)
## List of 4
##  $ lambda    : num [1:87, 1:751] 0.0971 0.0971 0.0971 0.0971 0.0971 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:87] "Balaena_mysticetus" "Eubalaena_australis" "Eubalaena_glacialis" "Eubalaena_japonica" ...
##   .. ..$ : NULL
##  $ mu        : num [1:87, 1:751] 0.0199 0.0199 0.0199 0.0199 0.0199 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:87] "Balaena_mysticetus" "Eubalaena_australis" "Eubalaena_glacialis" "Eubalaena_japonica" ...
##   .. ..$ : NULL
##  $ lambda.avg: Named num [1:87] 0.0794 0.0871 0.0871 0.0871 0.0713 ...
##   ..- attr(*, "names")= chr [1:87] "Balaena_mysticetus" "Eubalaena_australis" "Eubalaena_glacialis" "Eubalaena_japonica" ...
##  $ mu.avg    : Named num [1:87] 0.0773 0.0602 0.0602 0.0602 0.0442 ...
##   ..- attr(*, "names")= chr [1:87] "Balaena_mysticetus" "Eubalaena_australis" "Eubalaena_glacialis" "Eubalaena_japonica" ...

You can explore the tip rates with a histogram:

hist(tip.rates$lambda.avg,xlab="average lambda")

hist(tip.rates$mu.avg,xlab="average mu")

Or you can separate different groups of taxa to compare their tip rates:

boxplot(tip.rates$lambda.avg[53:87], tip.rates$lambda.avg[1:52], col = c("red", "blue"), names = c("dolphins", "other cetaceans"))

RPANDA

RPANDA is another recently developed R package that allows us to fit different models of rate variation through time and select the best fitting model using maximum likelihood analysis.

RPANDA shows basically two main differences from BAMM:

  1. In RPANDA, the user must inform which models are going to be tested, whereas in BAMM the program itself will average among the rates for each clade;

  2. In RPANDA, the user must know a priori which are the clades that might show a particular diversification regime, while BAMM will estimate where rate shifts are positioned using a rjMCMC algorithm.

The package also contains some simuation functions as well as datasets. You can find more details on CRAN.

Model selection

For this exercise, we will test four different scenarios with all combinations of constant and variable speciation and extinction rates as follows (based on Morlon et al. 2011, PNAS) :

lambda.cst <- function(x,y){y}
lambda.var <- function(x,y){y[1]*exp(y[2]*x)}
mu.cst <- function(x,y){y}
mu.var <- function(x,y){y[1]*exp(y[2]*x)}

The rate variations for the four possible pairwise combinations look like this:

Having decided which models one would like to test, the next step is to extract the clades that most likely share similar diversification regimes. As said previously, RPANDA needs the user to a priori indicate these clades. In this example, we will use the four main cetacean families plus the rest of cetaceans as a fifth clade.

delphinidae.tree <- drop.tip(whales,whales$tip.label[-match(delphinidae,whales$tip.label)])
phocoenidae.tree <- drop.tip(whales,whales$tip.label[-match(phocoenidae,whales$tip.label)])
ziphidae.tree <- drop.tip(whales,whales$tip.label[-match(ziphidae,whales$tip.label)])
othercetaceans.tree <- drop.tip(whales,c(balaenopteridae,delphinidae, phocoenidae, ziphidae))
fit.multi.rpanda <- function(tree,par)
    {
        bcstdcst <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.cst, f.mu=mu.cst, lamb_par=par[[1]][1],mu_par=par[[1]][2],cst.lamb=TRUE,cst.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
        bvardcst <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.var, f.mu=mu.cst, lamb_par=par[[2]][c(1,2)],mu_par=par[[2]][3],expo.lamb=TRUE,cst.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
        bcstdvar <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.cst, f.mu=mu.var, lamb_par=par[[3]][1],mu_par=par[[3]][c(2,3)],cst.lamb=TRUE,expo.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
        bvardvar <- fit_bd(tree, max(branching.times(tree)), f.lamb=lambda.var, f.mu=mu.var, lamb_par=par[[4]][c(1,2)],mu_par=par[[4]][c(3,4)],expo.lamb=TRUE,expo.mu=TRUE,cond="crown",f=87/89,dt=1e-3)
        return(list("bcstdcst"=bcstdcst,"bvardcst"=bvardcst,"bcstdvar"=bcstdvar,"bvardvar"=bvardvar))
    }
whales.par <- list(c(0.4,0),c(0.4,-0.05,0),c(0.4,0.1,0.05),c(0.4,-0.05,0.1,0.05))

Estimation of model parameters

In the posession of all models and clades, we can finally estimate the parameters of all the four models to each of the five clades and create an AICc table in order to select which one is the best model that describes the changes in both diversification rates (speciation and extinction) rates. (This part of the code take a while to complete)

results <- list("balaenopteridae.res"=fit.multi.rpanda(balaenopteridae.tree,whales.par),
                "delphinidae.res" = fit.multi.rpanda(delphinidae.tree,whales.par),
                "phocoenidae.res" = fit.multi.rpanda(phocoenidae.tree,whales.par),
                "ziphidae.res" = fit.multi.rpanda(ziphidae.tree,whales.par),
                "othercetaceans.res"= fit.multi.rpanda(othercetaceans.tree,whales.par))
aic.table <- matrix(nrow=4,ncol=5,NA)
for(i in 1:5)
    {
        for(j in 1:4)
            {
                aic.table[j,i] <- results[[i]][[j]]$aicc
            }
    }
colnames(aic.table) <- c("Balaenopteridae","Delphinidae","Phocoenidae","Ziphidae","Other Cetaceans")
rownames(aic.table) <- c("bcstdcst","bvardcst","bcstdvar","bvardvar")
aic.table
##          Balaenopteridae Delphinidae Phocoenidae Ziphidae Other Cetaceans
## bcstdcst        56.73336    171.4150    31.79207 132.6440        119.4690
## bvardcst        58.44456    170.0338    34.13348 130.1843        117.5562
## bcstdvar        58.32515    170.6453    41.79207 127.6094        115.3723
## bvardvar        65.67244    171.3258    64.13348 133.2251        118.9789
par.table <- data.frame("Balaenopteridae"=c(results[[1]]$bcstdcst$lamb_par[1:2],results[[1]]$bcstdcst$mu_par[1:2]),"Delphinidae"=c(results[[2]]$bvardcst$lamb_par[1:2],results[[2]]$bvardcst$mu_par[1:2]),"Phocoenidae"=c(results[[3]]$bcstdcst$lamb_par[1:2],results[[3]]$bcstdcst$mu_par[1:2]),"Ziphidae"=c(results[[4]]$bcstdcst$lamb_par[1:2],results[[4]]$bcstdcst$mu_par[1:2]),"Other Cetaceans"=c(results[[5]]$bcstdvar$lamb_par[1:2],results[[5]]$bcstdvar$mu_par[1:2]))
par.table
##   Balaenopteridae   Delphinidae   Phocoenidae     Ziphidae Other.Cetaceans
## 1    7.343080e-02  1.405102e-01  1.410234e-01 9.485149e-02       0.1857764
## 2              NA  1.257642e-01            NA           NA              NA
## 3   -6.391602e-10 -1.521115e-07 -6.012454e-08 7.545982e-08       0.8313274
## 4              NA            NA            NA           NA      -0.1742352

Plotting diversity through time

After selecting which model best fits the trees, we can estimate the diversity trajectory through time for each of the five clades.

# Function to calculate species richness in a given point in time
div.times <- c(max(branching.times(balaenopteridae.tree)),max(branching.times(delphinidae.tree)),max(branching.times(phocoenidae.tree)),max(branching.times(ziphidae.tree)),max(branching.times(othercetaceans.tree)))

# Function modified from plot_dtt from RPANDA package
plotdtt <- function (fit.bd, tot_time, N0, col=1, add=FALSE, div.time, xlim, ylim)
{
    if (!inherits(fit.bd, "fit.bd"))
        stop("object \"fit.bd\" is not of class \"fit.bd\"")
    t <- seq(tot_time-div.time, tot_time, 0.01)
    if ("f.mu" %in% attributes(fit.bd)$names) {
        r <- function(t) {
            -fit.bd$f.lamb(t) + fit.bd$f.mu(t)
        }
        R <- function(s) {
            RPANDA:::.Integrate(Vectorize(r), 0, s)
        }
        N <- N0 * exp(Vectorize(R)(t))
                                        #dev.new()
        if(add==FALSE)
            {
        plot(-t, N, type = "l", xlab = "time", ylab = "Number of species",
             main = "Diversity Through Time", col=col, xlim=xlim, ylim=ylim)
    }
        else
            {
                lines(-t, N, type = "l", xlab = "time", ylab = "Number of species",
                     main = "Diversity Through Time", col=col, xlim=xlim, ylim=ylim)
            }
    }
    else {
        r <- function(t) {
            -fit.bd$f.lamb(t)
        }
        R <- function(s) {
            RPANDA:::.Integrate(Vectorize(r), 0, s)
        }
        N <- N0 * exp(Vectorize(R)(t))
                                        #dev.new()
        if(add==FALSE)
            {
        plot(-t, N, type = "l", xlab = "time", ylab = "Number of species",
             main = "Diversity Through Time",col=col, xlim=xlim, ylim=ylim)
    }
        else
            {
                lines(-t, N, type = "l", xlab = "time", ylab = "Number of species",
                     main = "Diversity Through Time",col=col, xlim=xlim, ylim=ylim)
            }
    }
}

plotdtt(results$balaenopteridae$bcstdcst,div.times[1],N0=Ntip(balaenopteridae.tree),xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[1])
plotdtt(results$delphinidae$bvardcst,div.times[2],N0=Ntip(delphinidae.tree),col=6,add=TRUE,xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[2])
plotdtt(results$phocoenidae$bcstdcst,div.times[3],N0=Ntip(phocoenidae.tree),col="goldenrod",add=TRUE,xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[3])
plotdtt(results$ziphidae$bcstdcst,div.times[4],N0=Ntip(ziphidae.tree),col=4,add=TRUE,xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[4])
plotdtt(results$othercetaceans$bcstdvar,div.times[5],N0=Ntip(othercetaceans.tree),col="darkred",add=TRUE,xlim=c(-max(div.times),0),ylim=c(0,150),div.time=div.times[5])
legend("topleft",legend=c("Balaenopteridae","Delphinidae","Phocoenidae","Ziphidae","Other Cetaceans"),text.col=c(1,6,"goldenrod",4,"darkred"))

Challenge

  1. Fit models using RPANDA for the entire whale phylogeny, and plot its diversity through time using parts of the previous codes chunks.

  2. Run BAMM in the Anoles tree (provided in the example file) and plot the tree with the rates for each branch.