First steps

You need to install diversitree and hisse for this exercise if you haven’t already.

install.packages(c("diversitree", "hisse"))

BiSSE

library(ape)
library(TreeSim)
## Loading required package: geiger
## Loading required package: laser
library(diversitree)
## Loading required package: deSolve
## Loading required package: subplex
## Loading required package: Rcpp
simTree1 <- sim.bd.age(age = 10, numbsim = 1, lambda = 0.4, mu = 0)[[1]]

# first fit a Yule model
pbModel <- make.yule(simTree1)
pbMLFit <- find.mle(pbModel, 0.1)

# next fit a Birth-death model
bdModel <- make.bd(simTree1)
bdMLFit <- find.mle(bdModel, c(0.1, 0.05), method = "optim", lower = 0)

# compare models
anova(bdMLFit, pure.birth = pbMLFit)
##            Df  lnLik     AIC     ChiSq Pr(>|Chi|)
## full        2 67.524 -131.05                     
## pure.birth  1 67.523 -133.05 0.0012572     0.9717

The beauty of diversitree is that we can very easily run a Bayesian analysis of diversification rates.

bdSamples <- mcmc(bdModel, bdMLFit$par, nsteps = 1e+05, lower = c(0, 0), upper = c(Inf,  Inf), w = c(0.1, 0.1), fail.value = -Inf, print.every = 10000)
## 10000: {1.0878, 0.9068} -> 62.21496
## 20000: {0.4890, 0.1769} -> 67.01388
## 30000: {0.4475, 0.1049} -> 67.16241
## 40000: {0.4819, 0.0809} -> 67.47058
## 50000: {0.5041, 0.0672} -> 67.36711
## 60000: {0.8432, 0.6183} -> 64.70752
## 70000: {0.4860, 0.0378} -> 67.40862
## 80000: {0.5202, 0.0581} -> 67.16739
## 90000: {0.4817, 0.1863} -> 66.82557
## 100000: {0.7515, 0.4010} -> 65.71970

Lets visulaize these results

postSamples <- bdSamples[c("lambda", "mu")]
profiles.plot(postSamples, col.line = c("red", "blue"), las = 1, legend = "topright")

often estimates of r (= lambda-mu) are more precise than either lambda and mu

postSamples$r <- with(bdSamples, lambda - mu)
postSamples$eps <- with(bdSamples, mu/lambda)

profiles.plot(postSamples[, c("r", "eps")], col.line = c("red", "blue"), las = 1,  legend = "topright")

Relating diversification rates to character state

We can test for a relationship between character state and diversification rates using BiSSE (and related approaches). The syntax for BiSSE is consistent with other aspects of diversitree that we learned above. We will try this with anole ecomorph data. BiSSE can only handle binary traits, so we will have to change our data over to a binary (0-1) character.

let’s try a simulated data set to see if we can really detect a true effect

simPars <- c(0.4, 0.2, 0.05, 0.05, 0.05, 0.05)
set.seed(3)
simBisseData <- tree.bisse(simPars, max.t = 14, x0 = 0)

hist <- history.from.sim.discrete(simBisseData, 0:1)
plot(hist, simBisseData)

nbModel <- make.bisse(simBisseData, simBisseData$tip.state)
p <- starting.point.bisse(simBisseData)
nbModelMLFit <- find.mle(nbModel, p)

rbind(real = simPars, estimated = round(coef(nbModelMLFit), 2))
##           lambda0 lambda1  mu0  mu1  q01  q10
## real          0.4    0.20 0.05 0.05 0.05 0.05
## estimated     0.4    0.23 0.00 0.47 0.14 0.00

we can test a constrained model where the character does not affect diversification rates

cnbModel <- constrain(nbModel, lambda1 ~ lambda0)
cnbModel <- constrain(cnbModel, mu1 ~ mu0)

cnbModelMLFit <- find.mle(cnbModel, p[c(-1, -3)])

compare models

anova(nbModelMLFit, constrained = cnbModelMLFit)
##             Df   lnLik    AIC ChiSq Pr(>|Chi|)
## full         6 -212.54 437.09                 
## constrained  4 -213.86 435.72 2.632     0.2682

let’s try a Bayesian analysis

prior <- make.prior.exponential(1/(2 * 0.4))

this is not long enough but OK for now. You might want more like 1000000 generations.

mcmcRun <- mcmc(nbModel, nbModelMLFit$par, nsteps = 1000, prior = prior, w = 0.1, print.every = 100)
## 100: {0.4764, 0.2365, 0.1580, 0.4976, 0.1726, 0.0994} -> -214.90686
## 200: {0.5063, 0.2716, 0.0322, 0.2143, 0.1433, 0.2130} -> -215.33748
## 300: {0.5721, 0.5181, 0.4216, 0.1710, 0.1484, 0.6746} -> -219.73947
## 400: {0.4758, 0.1728, 0.0528, 0.0562, 0.1831, 0.4297} -> -214.32404
## 500: {0.4167, 0.4111, 0.0252, 0.5534, 0.2293, 0.4296} -> -215.97337
## 600: {0.2608, 0.3070, 0.0342, 0.2933, 0.2619, 0.3821} -> -222.18071
## 700: {0.4375, 0.7609, 0.1531, 0.5333, 0.8031, 2.9827} -> -225.98707
## 800: {0.4527, 0.1641, 0.3006, 0.3028, 0.0885, 0.0087} -> -218.23327
## 900: {0.4327, 0.2812, 0.1906, 0.1734, 0.1400, 0.3540} -> -215.15756
## 1000: {0.4310, 0.2367, 0.0852, 0.4269, 0.1284, 0.0251} -> -213.39588

plot it

col <- c("blue", "red")
profiles.plot(mcmcRun[, c("lambda0", "lambda1")], col.line = col, las = 1, legend = "topright")

profiles.plot(mcmcRun[, c("mu0", "mu1")], col.line = col, las = 1, legend = "topright")

# looks like speciation rate differs, but not extinction. can we confirm?
sum(mcmcRun$lambda0 > mcmcRun$lambda1)/length(mcmcRun$lambda1)
## [1] 0.875
sum(mcmcRun$mu0 > mcmcRun$mu1)/length(mcmcRun$mu1)
## [1] 0.212

Challenge problem

Repeat the above analyses with anoles (using the trait below). Fit a BiSSE model using both ML and Bayesian methods. What do you conclude?

You need a binary trait, so use the following

anoleData <- read.csv("~/Documents/teaching/revellClass/2014bogota/continuousModels/anolisDataAppended.csv", 
    row.names = 1)
anoleTree <- read.tree("~/Documents/teaching/revellClass/2014bogota/continuousModels/anolis.phy")

ecomorph <- anoleData[, "ecomorph"]
names(ecomorph) <- rownames(anoleData)

isTG <- as.numeric(ecomorph == "TG")
names(isTG) <- names(ecomorph)
isTG

HiSSE: The hidden state speciation and extinction model

This model, proposed by Beaulieu and O’Meara, offers one possible solution to the problem identified by Rabosky and Goldberg with the SSE methods. The idea is to provide an additional model to test a state dependent model against. In this model, the change in diversification is attributed to a second, unobserved character that co-occurs with the observed character identified in a typical BiSSE analysis. In this example, we are going to ask if transitions to coral reefs in haemulids change the diversification rate.

First lets get the data read.

library(hisse)
## Loading required package: GenSA
library(geiger)
##get the data

grunt.tree <- read.tree("/Users/michael_alfaro/Dropbox/content/hisse/haemulidae/grunts.phy")

grunt.data <- read.csv("/Users/michael_alfaro/Dropbox/content/hisse/haemulidae/grunts.csv", row.names = 1)
head(grunt.data)
##                          habitat habitat.names     trait1       trait2
## Pomadasys_panamensis           0      non-reef -0.1097252 -0.221171872
## Pomadasys_macracanthus         0      non-reef  0.1290682 -0.006984002
## Anisotremus_moricandi          1          reef  0.2853245  0.071222353
## Anisotremus_virginicus         1          reef  0.6064791  0.186049096
## Anisotremus_caesius            1          reef  0.4850071  0.076585754
## Anisotremus_surinamensis       1          reef  0.3441719  0.192564145
##                               trait3
## Pomadasys_panamensis      0.04161698
## Pomadasys_macracanthus   -0.04354035
## Anisotremus_moricandi     0.15465876
## Anisotremus_virginicus    0.23311505
## Anisotremus_caesius       0.08355851
## Anisotremus_surinamensis  0.20246620
name.check(grunt.tree, grunt.data)
## [1] "OK"

hisse expects a 2 column data frame with species names in column 1 and trait values in column 2

hd<-cbind(rownames(grunt.data), grunt.data[,"habitat"])

A better null model

Let’s create a null model in the HiSSE framework that allows the diversification rate on the tree to change independently from habitat. This model allows speciation and extinction to change as a function of an unobserved hidden character.

trans.rates.hisse <- TransMatMaker(hidden.states=TRUE)
trans.rates.hisse <- ParDrop(trans.rates.hisse, c(3,5,8,10))
trans.rates.hisse[!is.na(trans.rates.hisse) & !trans.rates.hisse == 0] = 1
null.2.hisse <- hisse(grunt.tree, hd, hidden.states=TRUE, turnover.anc=c(1,1,2,2), eps.anc=c(1,1,2,2), trans.rate=trans.rates.hisse)
## Initializing... 
## Finished. Beginning subplex routine... 
## Finished. Summarizing results...

Let’s store the likelihood and AIC to compare to a model where diversification rate in grunts depends on habitat.

null.logL <- null.2.hisse$loglik
null.AIC <- null.2.hisse$AIC

BiSSE in a HiSSE framework

Now lets compare the null to a model where grunt diversification changes with habitat. Although HiSSE is built upon the diversitree framework, the parameterizations are somewhat different from you used in BiSSE. We will use the following conventions:

state 0: marine state 1: reef

In addition, in HISSE we could consider the possibility that changes in reef diversification are caused by an unobserved hidden state with reef. We will designate this as

state 1A: reef, hidden character state not present state 1B: reef, hidden character state present

A standard BiSSE model does not include a hidden state, so for this step, we want a model that does not allow transitions to the hidden states 0B or 1B

We first create the HiSSE transition matrix

trans.rates.bisse <- TransMatMaker(hidden.states=FALSE)
trans.rates.bisse
##     (0) (1)
## (0)  NA   2
## (1)   1  NA

The hisse function takes turnover.anc and eps.anc as arguments. These arguments control the number of rate classes for the HiSSE model. The order of classes in these arguments corresponds to 0A, 1A, 0B, 1B. Since we want states 0 and 1 (marine and reef) to have different states, and no hidden state B we will pass (1,2,0,0) to these arguments. This gives rate class 1 to marine, rate class 2 to reef, and no rate classes for 0B or 1B).

pp.bisse.no.hidden <- hisse(grunt.tree, hd, hidden.states=FALSE, turnover.anc=c(1,2,0,0), eps.anc=c(1,2,0,0), trans.rate=trans.rates.bisse, output.type="raw")
## Initializing... 
## Finished. Beginning subplex routine... 
## Finished. Summarizing results...
pp.bisse.no.hidden
## 
## Fit
##      -lnL       AIC      AICc ntax
##  127.9808 -243.9616 -242.0081   50
## 
## Probability of extinction: 
## 
## Diversification Rates
##                     lambda           mu
## rate0             36.18318 1.468841e-13
## alpha0             1.00000 1.000000e+00
## beta0              1.00000 1.000000e+00
## timeslice.factor0  1.00000 1.000000e+00
## 
##                    lambda           mu
## rate1             63.1213 1.873791e-15
## alpha1             1.0000 1.000000e+00
## beta1              1.0000 1.000000e+00
## timeslice.factor1  1.0000 1.000000e+00
## 
## Transition Rates
##              (0)      (1)
## (0) 0.000000e+00 7.676702
## (1) 1.236898e-13 0.000000

Notice that the hisse object contains rates for the two states as well as the likelihood and AIC scores for the model. Lets grab the AIC and likelihoods again and compare to the null.

alt.logL <- pp.bisse.no.hidden$loglik
alt.AIC <- pp.bisse.no.hidden$AIC

logL <- c(null.logL, alt.logL)
AIC <- c(null.AIC, alt.AIC)

res <- as.data.frame(logL, row.names = c("null", "BiSSE"))
res$AIC <- AIC
res
##           logL       AIC
## null  126.6359 -243.2717
## BiSSE 127.9808 -243.9616

Allowing the diversification rate to simply change on the tree is better than invoking habitat. dependent diversification The hisse null is more complex than a typical BiSSE null that says that all rates are equal across the tree.

trans.rates.bisse <- TransMatMaker(hidden.states=FALSE)
bisse.null <- hisse(grunt.tree, hd, hidden.states=FALSE, turnover.anc=c(1,1,0,0), eps.anc=c(1,1,0,0), trans.rate=trans.rates.bisse, output.type="raw")
## Initializing... 
## Finished. Beginning subplex routine... 
## Finished. Summarizing results...
bisse.null
## 
## Fit
##      -lnL      AIC      AICc ntax
##  126.2475 -244.495 -243.6061   50
## 
## Probability of extinction: 
## 
## Diversification Rates
##                     lambda           mu
## rate0             48.00023 2.368666e-15
## alpha0             1.00000 1.000000e+00
## beta0              1.00000 1.000000e+00
## timeslice.factor0  1.00000 1.000000e+00
## 
##                     lambda           mu
## rate1             48.00023 2.368666e-15
## alpha1             1.00000 1.000000e+00
## beta1              1.00000 1.000000e+00
## timeslice.factor1  1.00000 1.000000e+00
## 
## Transition Rates
##              (0)      (1)
## (0) 0.000000e+00 7.755563
## (1) 1.247009e-14 0.000000
bisse.null.logL <- bisse.null$loglik
bisse.null.AIC <- bisse.null$AIC


res <- rbind(data.frame(logL = bisse.null.logL, AIC = bisse.null.AIC), res)

row.names(res) <- c("bisse null", "hisse null", "habitat dependent")
res
##                       logL       AIC
## bisse null        126.2475 -244.4950
## hisse null        126.6359 -243.2717
## habitat dependent 127.9808 -243.9616

Notice that if you had just compared the bisse null to the habitat dependent model you might have convinced yourself that habitat had a weak effect on diversification rate in grunts. The HiSSE null shows that almost all of this fit is due simply to diversification rate heterogeneity in the tree that is not related to habitat.

The hidden state model

If we did find a strong effect of the BiSSE model, we might want to know if an additional, unobserved state is a better explanation of diversification on reefs rather than simply a transition to reefs themselves. We might expect that this is driving a significant BiSSE result if only a single subclade of reef species has diversified more quickly. Let’s fit this hidden state model.

Th step adds a hidden state to state 1 and drops other possible transitions

trans.rates.hisse <- TransMatMaker(hidden.states=TRUE)
trans.rates.hisse <- ParDrop(trans.rates.hisse, c(2,3,5,7,8,9,10,12))

When we call HiSSE with a hidden state we want to add state 1B as a separate rate class. Recall that turnover.anc and eps.anc accept rate class designations in this order: 0A, 1A, 0B, 1B so turnover.anc=c(1,2,0,3) says we want separate rate classes for 0A, 1A, and 1B and will specify that 0B (marine, hidden) does not exist.

hisse.grunts <- hisse(grunt.tree, hd, hidden.states=TRUE, turnover.anc=c(1,2,0,3), eps.anc=c(1,2,0,3), trans.rate=trans.rates.hisse, output.type="raw")
## Initializing... 
## Finished. Beginning subplex routine... 
## Finished. Summarizing results...
hisse.logL <- hisse.grunts$loglik
hisse.AIC <- hisse.grunts$AIC
res <- rbind(res, data.frame(logL = hisse.logL, AIC = hisse.AIC, row.names = "habitat with hidden state"))

res
##                               logL       AIC
## bisse null                126.2475 -244.4950
## hisse null                126.6359 -243.2717
## habitat dependent         127.9808 -243.9616
## habitat with hidden state 128.2453 -236.4905

The addition of the hidden state gives the highest likelihood but this is offset by the number of parameters we add, so the AIC score is worse than the other models.