BiSSE
library(ape)
library(TreeSim)
## Loading required package: geiger
## Loading required package: laser
library(diversitree)
## Loading required package: deSolve
## Loading required package: subplex
## Loading required package: Rcpp
simTree1 <- sim.bd.age(age = 10, numbsim = 1, lambda = 0.4, mu = 0)[[1]]
pbModel <- make.yule(simTree1)
pbMLFit <- find.mle(pbModel, 0.1)
bdModel <- make.bd(simTree1)
bdMLFit <- find.mle(bdModel, c(0.1, 0.05), method = "optim", lower = 0)
anova(bdMLFit, pure.birth = pbMLFit)
## Df lnLik AIC ChiSq Pr(>|Chi|)
## full 2 67.524 -131.05
## pure.birth 1 67.523 -133.05 0.0012572 0.9717
The beauty of diversitree is that we can very easily run a Bayesian analysis of diversification rates.
bdSamples <- mcmc(bdModel, bdMLFit$par, nsteps = 1e+05, lower = c(0, 0), upper = c(Inf, Inf), w = c(0.1, 0.1), fail.value = -Inf, print.every = 10000)
## 10000: {1.0878, 0.9068} -> 62.21496
## 20000: {0.4890, 0.1769} -> 67.01388
## 30000: {0.4475, 0.1049} -> 67.16241
## 40000: {0.4819, 0.0809} -> 67.47058
## 50000: {0.5041, 0.0672} -> 67.36711
## 60000: {0.8432, 0.6183} -> 64.70752
## 70000: {0.4860, 0.0378} -> 67.40862
## 80000: {0.5202, 0.0581} -> 67.16739
## 90000: {0.4817, 0.1863} -> 66.82557
## 100000: {0.7515, 0.4010} -> 65.71970
Lets visulaize these results
postSamples <- bdSamples[c("lambda", "mu")]
profiles.plot(postSamples, col.line = c("red", "blue"), las = 1, legend = "topright")
often estimates of r (= lambda-mu) are more precise than either lambda and mu
postSamples$r <- with(bdSamples, lambda - mu)
postSamples$eps <- with(bdSamples, mu/lambda)
profiles.plot(postSamples[, c("r", "eps")], col.line = c("red", "blue"), las = 1, legend = "topright")
Relating diversification rates to character state
We can test for a relationship between character state and diversification rates using BiSSE (and related approaches). The syntax for BiSSE is consistent with other aspects of diversitree that we learned above. We will try this with anole ecomorph data. BiSSE can only handle binary traits, so we will have to change our data over to a binary (0-1) character.
let’s try a simulated data set to see if we can really detect a true effect
simPars <- c(0.4, 0.2, 0.05, 0.05, 0.05, 0.05)
set.seed(3)
simBisseData <- tree.bisse(simPars, max.t = 14, x0 = 0)
hist <- history.from.sim.discrete(simBisseData, 0:1)
plot(hist, simBisseData)
nbModel <- make.bisse(simBisseData, simBisseData$tip.state)
p <- starting.point.bisse(simBisseData)
nbModelMLFit <- find.mle(nbModel, p)
rbind(real = simPars, estimated = round(coef(nbModelMLFit), 2))
## lambda0 lambda1 mu0 mu1 q01 q10
## real 0.4 0.20 0.05 0.05 0.05 0.05
## estimated 0.4 0.23 0.00 0.47 0.14 0.00
we can test a constrained model where the character does not affect diversification rates
cnbModel <- constrain(nbModel, lambda1 ~ lambda0)
cnbModel <- constrain(cnbModel, mu1 ~ mu0)
cnbModelMLFit <- find.mle(cnbModel, p[c(-1, -3)])
compare models
anova(nbModelMLFit, constrained = cnbModelMLFit)
## Df lnLik AIC ChiSq Pr(>|Chi|)
## full 6 -212.54 437.09
## constrained 4 -213.86 435.72 2.632 0.2682
let’s try a Bayesian analysis
prior <- make.prior.exponential(1/(2 * 0.4))
this is not long enough but OK for now. You might want more like 1000000 generations.
mcmcRun <- mcmc(nbModel, nbModelMLFit$par, nsteps = 1000, prior = prior, w = 0.1, print.every = 100)
## 100: {0.4764, 0.2365, 0.1580, 0.4976, 0.1726, 0.0994} -> -214.90686
## 200: {0.5063, 0.2716, 0.0322, 0.2143, 0.1433, 0.2130} -> -215.33748
## 300: {0.5721, 0.5181, 0.4216, 0.1710, 0.1484, 0.6746} -> -219.73947
## 400: {0.4758, 0.1728, 0.0528, 0.0562, 0.1831, 0.4297} -> -214.32404
## 500: {0.4167, 0.4111, 0.0252, 0.5534, 0.2293, 0.4296} -> -215.97337
## 600: {0.2608, 0.3070, 0.0342, 0.2933, 0.2619, 0.3821} -> -222.18071
## 700: {0.4375, 0.7609, 0.1531, 0.5333, 0.8031, 2.9827} -> -225.98707
## 800: {0.4527, 0.1641, 0.3006, 0.3028, 0.0885, 0.0087} -> -218.23327
## 900: {0.4327, 0.2812, 0.1906, 0.1734, 0.1400, 0.3540} -> -215.15756
## 1000: {0.4310, 0.2367, 0.0852, 0.4269, 0.1284, 0.0251} -> -213.39588
plot it
col <- c("blue", "red")
profiles.plot(mcmcRun[, c("lambda0", "lambda1")], col.line = col, las = 1, legend = "topright")
profiles.plot(mcmcRun[, c("mu0", "mu1")], col.line = col, las = 1, legend = "topright")
sum(mcmcRun$lambda0 > mcmcRun$lambda1)/length(mcmcRun$lambda1)
## [1] 0.875
sum(mcmcRun$mu0 > mcmcRun$mu1)/length(mcmcRun$mu1)
## [1] 0.212
Challenge problem
Repeat the above analyses with anoles (using the trait below). Fit a BiSSE model using both ML and Bayesian methods. What do you conclude?
You need a binary trait, so use the following
anoleData <- read.csv("~/Documents/teaching/revellClass/2014bogota/continuousModels/anolisDataAppended.csv",
row.names = 1)
anoleTree <- read.tree("~/Documents/teaching/revellClass/2014bogota/continuousModels/anolis.phy")
ecomorph <- anoleData[, "ecomorph"]
names(ecomorph) <- rownames(anoleData)
isTG <- as.numeric(ecomorph == "TG")
names(isTG) <- names(ecomorph)
isTG
HiSSE: The hidden state speciation and extinction model
This model, proposed by Beaulieu and O’Meara, offers one possible solution to the problem identified by Rabosky and Goldberg with the SSE methods. The idea is to provide an additional model to test a state dependent model against. In this model, the change in diversification is attributed to a second, unobserved character that co-occurs with the observed character identified in a typical BiSSE analysis. In this example, we are going to ask if transitions to coral reefs in haemulids change the diversification rate.
First lets get the data read.
library(hisse)
## Loading required package: GenSA
library(geiger)
grunt.tree <- read.tree("/Users/michael_alfaro/Dropbox/content/hisse/haemulidae/grunts.phy")
grunt.data <- read.csv("/Users/michael_alfaro/Dropbox/content/hisse/haemulidae/grunts.csv", row.names = 1)
head(grunt.data)
## habitat habitat.names trait1 trait2
## Pomadasys_panamensis 0 non-reef -0.1097252 -0.221171872
## Pomadasys_macracanthus 0 non-reef 0.1290682 -0.006984002
## Anisotremus_moricandi 1 reef 0.2853245 0.071222353
## Anisotremus_virginicus 1 reef 0.6064791 0.186049096
## Anisotremus_caesius 1 reef 0.4850071 0.076585754
## Anisotremus_surinamensis 1 reef 0.3441719 0.192564145
## trait3
## Pomadasys_panamensis 0.04161698
## Pomadasys_macracanthus -0.04354035
## Anisotremus_moricandi 0.15465876
## Anisotremus_virginicus 0.23311505
## Anisotremus_caesius 0.08355851
## Anisotremus_surinamensis 0.20246620
name.check(grunt.tree, grunt.data)
## [1] "OK"
hisse expects a 2 column data frame with species names in column 1 and trait values in column 2
hd<-cbind(rownames(grunt.data), grunt.data[,"habitat"])
A better null model
Let’s create a null model in the HiSSE framework that allows the diversification rate on the tree to change independently from habitat. This model allows speciation and extinction to change as a function of an unobserved hidden character.
trans.rates.hisse <- TransMatMaker(hidden.states=TRUE)
trans.rates.hisse <- ParDrop(trans.rates.hisse, c(3,5,8,10))
trans.rates.hisse[!is.na(trans.rates.hisse) & !trans.rates.hisse == 0] = 1
null.2.hisse <- hisse(grunt.tree, hd, hidden.states=TRUE, turnover.anc=c(1,1,2,2), eps.anc=c(1,1,2,2), trans.rate=trans.rates.hisse)
## Initializing...
## Finished. Beginning subplex routine...
## Finished. Summarizing results...
Let’s store the likelihood and AIC to compare to a model where diversification rate in grunts depends on habitat.
null.logL <- null.2.hisse$loglik
null.AIC <- null.2.hisse$AIC
BiSSE in a HiSSE framework
Now lets compare the null to a model where grunt diversification changes with habitat. Although HiSSE is built upon the diversitree framework, the parameterizations are somewhat different from you used in BiSSE. We will use the following conventions:
state 0: marine state 1: reef
In addition, in HISSE we could consider the possibility that changes in reef diversification are caused by an unobserved hidden state with reef. We will designate this as
state 1A: reef, hidden character state not present state 1B: reef, hidden character state present
A standard BiSSE model does not include a hidden state, so for this step, we want a model that does not allow transitions to the hidden states 0B or 1B
We first create the HiSSE transition matrix
trans.rates.bisse <- TransMatMaker(hidden.states=FALSE)
trans.rates.bisse
## (0) (1)
## (0) NA 2
## (1) 1 NA
The hisse function takes turnover.anc and eps.anc as arguments. These arguments control the number of rate classes for the HiSSE model. The order of classes in these arguments corresponds to 0A, 1A, 0B, 1B. Since we want states 0 and 1 (marine and reef) to have different states, and no hidden state B we will pass (1,2,0,0) to these arguments. This gives rate class 1 to marine, rate class 2 to reef, and no rate classes for 0B or 1B).
pp.bisse.no.hidden <- hisse(grunt.tree, hd, hidden.states=FALSE, turnover.anc=c(1,2,0,0), eps.anc=c(1,2,0,0), trans.rate=trans.rates.bisse, output.type="raw")
## Initializing...
## Finished. Beginning subplex routine...
## Finished. Summarizing results...
pp.bisse.no.hidden
##
## Fit
## -lnL AIC AICc ntax
## 127.9808 -243.9616 -242.0081 50
##
## Probability of extinction:
##
## Diversification Rates
## lambda mu
## rate0 36.18318 1.468841e-13
## alpha0 1.00000 1.000000e+00
## beta0 1.00000 1.000000e+00
## timeslice.factor0 1.00000 1.000000e+00
##
## lambda mu
## rate1 63.1213 1.873791e-15
## alpha1 1.0000 1.000000e+00
## beta1 1.0000 1.000000e+00
## timeslice.factor1 1.0000 1.000000e+00
##
## Transition Rates
## (0) (1)
## (0) 0.000000e+00 7.676702
## (1) 1.236898e-13 0.000000
Notice that the hisse object contains rates for the two states as well as the likelihood and AIC scores for the model. Lets grab the AIC and likelihoods again and compare to the null.
alt.logL <- pp.bisse.no.hidden$loglik
alt.AIC <- pp.bisse.no.hidden$AIC
logL <- c(null.logL, alt.logL)
AIC <- c(null.AIC, alt.AIC)
res <- as.data.frame(logL, row.names = c("null", "BiSSE"))
res$AIC <- AIC
res
## logL AIC
## null 126.6359 -243.2717
## BiSSE 127.9808 -243.9616
Allowing the diversification rate to simply change on the tree is better than invoking habitat. dependent diversification The hisse null is more complex than a typical BiSSE null that says that all rates are equal across the tree.
trans.rates.bisse <- TransMatMaker(hidden.states=FALSE)
bisse.null <- hisse(grunt.tree, hd, hidden.states=FALSE, turnover.anc=c(1,1,0,0), eps.anc=c(1,1,0,0), trans.rate=trans.rates.bisse, output.type="raw")
## Initializing...
## Finished. Beginning subplex routine...
## Finished. Summarizing results...
bisse.null
##
## Fit
## -lnL AIC AICc ntax
## 126.2475 -244.495 -243.6061 50
##
## Probability of extinction:
##
## Diversification Rates
## lambda mu
## rate0 48.00023 2.368666e-15
## alpha0 1.00000 1.000000e+00
## beta0 1.00000 1.000000e+00
## timeslice.factor0 1.00000 1.000000e+00
##
## lambda mu
## rate1 48.00023 2.368666e-15
## alpha1 1.00000 1.000000e+00
## beta1 1.00000 1.000000e+00
## timeslice.factor1 1.00000 1.000000e+00
##
## Transition Rates
## (0) (1)
## (0) 0.000000e+00 7.755563
## (1) 1.247009e-14 0.000000
bisse.null.logL <- bisse.null$loglik
bisse.null.AIC <- bisse.null$AIC
res <- rbind(data.frame(logL = bisse.null.logL, AIC = bisse.null.AIC), res)
row.names(res) <- c("bisse null", "hisse null", "habitat dependent")
res
## logL AIC
## bisse null 126.2475 -244.4950
## hisse null 126.6359 -243.2717
## habitat dependent 127.9808 -243.9616
Notice that if you had just compared the bisse null to the habitat dependent model you might have convinced yourself that habitat had a weak effect on diversification rate in grunts. The HiSSE null shows that almost all of this fit is due simply to diversification rate heterogeneity in the tree that is not related to habitat.
The hidden state model
If we did find a strong effect of the BiSSE model, we might want to know if an additional, unobserved state is a better explanation of diversification on reefs rather than simply a transition to reefs themselves. We might expect that this is driving a significant BiSSE result if only a single subclade of reef species has diversified more quickly. Let’s fit this hidden state model.
Th step adds a hidden state to state 1 and drops other possible transitions
trans.rates.hisse <- TransMatMaker(hidden.states=TRUE)
trans.rates.hisse <- ParDrop(trans.rates.hisse, c(2,3,5,7,8,9,10,12))
When we call HiSSE with a hidden state we want to add state 1B as a separate rate class. Recall that turnover.anc and eps.anc accept rate class designations in this order: 0A, 1A, 0B, 1B so turnover.anc=c(1,2,0,3) says we want separate rate classes for 0A, 1A, and 1B and will specify that 0B (marine, hidden) does not exist.
hisse.grunts <- hisse(grunt.tree, hd, hidden.states=TRUE, turnover.anc=c(1,2,0,3), eps.anc=c(1,2,0,3), trans.rate=trans.rates.hisse, output.type="raw")
## Initializing...
## Finished. Beginning subplex routine...
## Finished. Summarizing results...
hisse.logL <- hisse.grunts$loglik
hisse.AIC <- hisse.grunts$AIC
res <- rbind(res, data.frame(logL = hisse.logL, AIC = hisse.AIC, row.names = "habitat with hidden state"))
res
## logL AIC
## bisse null 126.2475 -244.4950
## hisse null 126.6359 -243.2717
## habitat dependent 127.9808 -243.9616
## habitat with hidden state 128.2453 -236.4905
The addition of the hidden state gives the highest likelihood but this is offset by the number of parameters we add, so the AIC score is worse than the other models.