This tutorial is about fitting multi-rate Brownian motion models (using phytools), multi-regime OU models (using the OUwie package), and multivariate Brownian models (using phytools).
Multi-rate Brownian evolution
The first exercise is designed to explore a model developed by O'Meara et al. (2006; Evolution) in which the rate of Brownian evolution (σ2) takes different values in different parts of a phylogeny.
For this analysis we will use the function brownie.lite
in the
phytools package. This function allows us to input a tree with painted rate
“regimes” - for instance, from a stochastically mapped discrete character -
and then fit a model in which the rate differs depending on the mapped
regime.
First, let's read simulated tree & dataset from file. These data will be used
to illustrate fitting a two-rate Brownian model using brownie.lite
in the phytools package. The tree is
simulated.tre
and the data vector is in
simulated.csv.
## load phytools
library(phytools)
Now, let's load the single tree with painted regimes from file. We can do this
by downloading the file from the web, and the loading it using the phytools
function read.simmap
; or we can load the file directly froom the
web as we learned yesterday.
## read simulated tree from file
tree<-read.simmap("simulated.tre",format="phylip")
## or
tree<-read.simmap("http://www.phytools.org/Ilhabela2015/data/simulated.tre",
format="phylip")
Now, let's plot the tree with painted regimes:
## plot tree with regimes
plotSimmap(tree,lwd=3)
## no colors provided. using the following legend:
## 1 2
## "black" "red"
add.simmap.legend(colors=setNames(c("black","red"),1:2),prompt=FALSE,
x=0.05*max(nodeHeights(tree)),y=0.1*Ntip(tree))
Next, we can read our data from file. These are data for a single continuous character:
## read data from file
x<-read.csv("simulated.csv",header=TRUE,row.names=1)
## convert to vector with names
x<-as.matrix(x)[,1]
x
## A B C D E F
## 1.32028474 0.61662698 -1.00911612 -0.70287046 1.56931705 1.64972830
## G H I J K L
## 0.94406092 1.23534571 2.05738145 3.16707622 -0.39868954 -1.40144281
## M N O P Q R
## -0.89405959 -0.76103676 -0.76097347 -0.81459239 -0.19720860 -0.06206985
## S T U V W X
## 0.06946520 -0.18486903 -1.77787925 -0.79620907 -0.39166148 -0.63321831
## Y Z
## -0.48235455 -0.46151922
Now we are ready to fit our multi-rate
## fit multi-rate Brownian model
fitBM<-brownie.lite(tree,x)
fitBM
## ML single-rate model:
## s^2 se a k logL
## value 2.1036 0.5834 -0.1898 2 -31.5992
##
## ML multi-rate model:
## s^2(1) se(1) s^2(2) se(2) a k logL
## value 0.6659 0.2398 5.7837 2.7182 -0.4222 3 -26.2556
##
## P-value (based on X^2): 0.0011
##
## R thinks it has found the ML solution.
This result shows (1) that a two-rate model fits the data highly significantly
better than a one-rate model; and (2) the rate of evolution in the fitted model
is much higher for state "2"
than for state "1"
.
Multi-optimum OU evolution
In addition to this approach in which the rate of evolution differs between different parts of the tree, we can also fit an Ornstein-Uhlenbeck model in which the pull or selection regimes different (i.e., have different optimums) in different parts of the phylogeny.
To explore this model, let's read in the Anolis tree & data for body size, and the fit a multi-optimum OU model in which the regime shifts are associated with the ecomorph state of different anole species. Keep in mind, that our data are merely for overall size, while the ecomorph convergence is multivariate.
In this case, our tree is Anolis.tre and our data is Anolis.csv.
First, let's load the package “OUwie”. If you do not have OUwie, then you should first install it from CRAN.
## load OUwie
library(OUwie)
Now let's read our data from the input file:
## read anole data from file
X<-read.csv("Anolis.csv",row.names=1,header=TRUE)
## read anole tree
tree<-read.tree("Anolis.tre")
plotTree(tree,type="fan",fsize=0.8)
## setEnv=TRUE for this type is experimental. please be patient with bugs
Next, to work in OUwie we need to make a special data frame for our analysis, as follows:
## make analysis input data.frame
ecomorph<-as.matrix(X)[,"ecomorph"]
svl<-as.matrix(X)[,"svl"]
data<-data.frame(Genus_species=rownames(X),Reg=ecomorph,X=as.numeric(svl))
This data frame contains our trait and the regimes (in this case,
"ecomorph"
) for the tips, but we also need a reconstruction of the
regimes across the branches and nodes of the tree. For this, we will use the
method of stochastic mapping. Here, I will just use one stochastic map - but
normally we would want to integrate across a set of stochastic maps.
## perform & plot stochastic maps (we would normally do this x100)
tree<-make.simmap(tree,ecomorph)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
## CG GB TC TG Tr Tw
## CG -0.3933357 0.0000000 0.19321489 0.0000000 0.00000000 0.2001209
## GB 0.0000000 -0.5024038 0.00000000 0.1959991 0.00000000 0.3064047
## TC 0.1932149 0.0000000 -0.74116782 0.0000000 0.09187982 0.4560731
## TG 0.0000000 0.1959991 0.00000000 -0.4428234 0.00000000 0.2468243
## Tr 0.0000000 0.0000000 0.09187982 0.0000000 -0.23442525 0.1425454
## Tw 0.2001209 0.3064047 0.45607311 0.2468243 0.14254542 -1.3519684
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
plotSimmap(tree,type="fan",fsize=0.8,ftype="i")
## no colors provided. using the following legend:
## CG GB TC TG Tr Tw
## "black" "red" "green3" "blue" "cyan" "magenta"
## setEnv=TRUE for this type is experimental. please be patient with bugs
Now, unfortunately, although OUwie takes the stochastic map format, it uses a slightly different convention so we also need to set a vector of node states for our phylogeny:
## hack
tree$node.label<-getStates(tree,"nodes")
Now we are finally ready to fit our models. We will first fit a single-rate Brownian model and then we will also fit a multi-rate OU model for comparison:
## fit Brownian & multi-optimum OU models
fitBM<-OUwie(tree,data,model="BM1",simmap.tree=TRUE)
## Initializing...
## Finished. Begin thorough search...
## Finished. Summarizing results.
fitOU<-OUwie(tree,data,model="OUM",simmap.tree=TRUE)
## Initializing...
## Finished. Begin thorough search...
## Finished. Summarizing results.
fitBM
##
## Fit
## -lnL AIC AICc model ntax
## 5.255984 -6.511968 -6.36007 BM1 82
##
## Rates
## CG GB TC TG Tr Tw
## alpha NA NA NA NA NA NA
## sigma.sq 0.10934 0.10934 0.10934 0.10934 0.10934 0.10934
##
## Optima
## CG GB TC TG Tr Tw
## estimate 4.0535069 4.0535069 4.0535069 4.0535069 4.0535069 4.0535069
## se 0.1010042 0.1010042 0.1010042 0.1010042 0.1010042 0.1010042
##
## Arrived at a reliable solution
fitOU
##
## Fit
## -lnL AIC AICc model ntax
## 41.00746 -66.01491 -64.04231 OUM 82
##
##
## Rates
## CG GB TC TG Tr Tw
## alpha 1.35699400 1.35699400 1.35699400 1.35699400 1.35699400 1.35699400
## sigma.sq 0.08367758 0.08367758 0.08367758 0.08367758 0.08367758 0.08367758
##
## Optima
## CG GB TC TG Tr Tw
## estimate 5.5411250 3.67381482 4.07460796 4.12751620 3.8246868 3.91769612
## se 0.1352433 0.07595605 0.08757856 0.07087277 0.1662112 0.06446996
##
## Arrived at a reliable solution
Note that the -lnL is actually the log-likelihood (not negative), I believe.
Multivariate Brownian evolution
The last thing we are going to do is fit a multivariate Brownian model in which the evolutinary covariance (and thus correlation) between characters can be different in different parts of the tree. This is a method based on Revell & Collar (2009; Evolution).
For this example we will use data and a phylogeny for centrarchid fishes. The data are in Centrarchidae.csv and the tree is in Centrarchidae.tre.
## load phytools
library(phytools)
Now let's read our tree & data:
## read centrarchid tree
tree<-read.tree("Centrarchidae.tre") ## or
tree<-read.tree("http://www.phytools.org/Ilhabela2015/data/Centrarchidae.tre")
## read centrarchid data
X<-read.csv("Centrarchidae.csv",header=TRUE,row.names=1) ## or
X<-read.csv("http://www.phytools.org/Ilhabela2015/data/Centrarchidae.csv",
header=TRUE,row.names=1)
X
## feeding.mode gape.width buccal.length
## Acantharchus_pomotis pisc 0.114 -0.009
## Lepomis_gibbosus non -0.133 -0.009
## Lepomis_microlophus non -0.151 0.012
## Lepomis_punctatus non -0.103 -0.019
## Lepomis_miniatus non -0.134 0.001
## Lepomis_auritus non -0.222 -0.039
## Lepomis_marginatus non -0.187 -0.075
## Lepomis_megalotis non -0.073 -0.049
## Lepomis_humilis non 0.024 -0.027
## Lepomis_macrochirus non -0.191 0.002
## Lepomis_gulosus pisc 0.131 0.122
## Lepomis_symmetricus non 0.013 -0.025
## Lepomis_cyanellus pisc -0.002 -0.009
## Micropterus_coosae pisc 0.045 -0.009
## Micropterus_notius pisc 0.097 -0.009
## Micropterus_treculi pisc 0.056 0.001
## Micropterus_salmoides pisc 0.056 -0.059
## Micropterus_floridanus pisc 0.096 0.051
## Micropterus_punctulatus pisc 0.092 0.002
## Micropterus_dolomieu pisc 0.035 -0.069
## Centrarchus_macropterus non -0.007 -0.055
## Enneacantus_obesus non 0.016 -0.005
## Pomoxis_annularis pisc -0.004 -0.019
## Pomoxis_nigromaculatus pisc 0.105 0.041
## Archolites_interruptus pisc -0.024 0.051
## Ambloplites_ariommus pisc 0.135 0.123
## Ambloplites_rupestris pisc 0.086 0.041
## Ambloplites_cavifrons pisc 0.130 0.040
Now let's pull out the feeding mode from this data frame. This is the character that we are going to map on the tree for our different regimes. We can then use this character to generate a set of stochastic maps on the phylogeny:
fmode<-as.matrix(X)[,1]
fmode
## Acantharchus_pomotis Lepomis_gibbosus Lepomis_microlophus
## "pisc" "non" "non"
## Lepomis_punctatus Lepomis_miniatus Lepomis_auritus
## "non" "non" "non"
## Lepomis_marginatus Lepomis_megalotis Lepomis_humilis
## "non" "non" "non"
## Lepomis_macrochirus Lepomis_gulosus Lepomis_symmetricus
## "non" "pisc" "non"
## Lepomis_cyanellus Micropterus_coosae Micropterus_notius
## "pisc" "pisc" "pisc"
## Micropterus_treculi Micropterus_salmoides Micropterus_floridanus
## "pisc" "pisc" "pisc"
## Micropterus_punctulatus Micropterus_dolomieu Centrarchus_macropterus
## "pisc" "pisc" "non"
## Enneacantus_obesus Pomoxis_annularis Pomoxis_nigromaculatus
## "non" "pisc" "pisc"
## Archolites_interruptus Ambloplites_ariommus Ambloplites_rupestris
## "pisc" "pisc" "pisc"
## Ambloplites_cavifrons
## "pisc"
## stochastic mapping of feeding mode on the tree (we would normally do x100)
tree<-make.simmap(tree,fmode,model="ARD")
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
## non pisc
## non -6.087788 6.087788
## pisc 3.048905 -3.048905
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## non pisc
## 0.5 0.5
## Done.
Plot our tree & mapped regimes:
cols<-setNames(c("blue","red"),c("non","pisc"))
plotSimmap(tree,colors=cols,ftype="i")
add.simmap.legend(colors=cols,prompt=FALSE,x=0,y=Ntip(tree))
Next, we can use the two continuous characters to test our hypothesis that the
feeding mode affects the evolutionary correlation/covariance between traits. For
this analysis we will use evol.vcv
in the phytools package.
## data
X<-as.matrix(X[,2:3])
## fit model
fit<-evol.vcv(tree,X)
fit
## ML single-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## fitted 0.114 0.033 0.0556 5 72.1893
##
## ML multi-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## non 0.1892 -0.0056 0.0149 8 82.8025
## pisc 0.0566 0.0498 0.0757
##
## P-value (based on X^2): 1e-04
##
## R thinks it has found the ML solution.
This shows us that the two covariance model fits significantly better than the one covariance model. We can also easily extract the evolutionary correlations from these two alernative models:
## now let's look at the correlation matrices
cov2cor(fit$R.single)
## gape.width buccal.length
## gape.width 1.000000 0.414274
## buccal.length 0.414274 1.000000
cov2cor(fit$R.multiple[,,"non"])
## gape.width buccal.length
## gape.width 1.0000000 -0.1059162
## buccal.length -0.1059162 1.0000000
cov2cor(fit$R.multiple[,,"pisc"])
## gape.width buccal.length
## gape.width 1.0000000 0.7604935
## buccal.length 0.7604935 1.0000000
Challenge problem
Repeat the analysis of parts 2 & 3 (Multi-optimum OU evolution and Multivariate Brownian evolution) but across a stochastic mapping sample of trees. Do the results change? Why?
Note, for the OUwie example we would recommend using a relatively small sample (say, N = 10) of stochastic map trees as the model fitting is slow.
Written by Liam J. Revell. Last updated July 4, 2015