Multi-rate, multi-regime, and multivariate models for continuous traits


Multi-rate Brownian evolution

This tutorial is about fitting multi-rate Brownian motion models (using phytools), multi-regime OU models (using the OUwie package), and multivariate Brownian models (using phytools).

Multi-rate Brownian evolution

The first exercise is designed to explore a model developed by O'Meara et al. (2006; Evolution) in which the rate of Brownian evolution (σ2) takes different values in different parts of a phylogeny.

For this analysis we will use the function brownie.lite in the phytools package. This function allows us to input a tree with painted rate “regimes” - for instance, from a stochastically mapped discrete character - and then fit a model in which the rate differs depending on the mapped regime.

First, let's read simulated tree & dataset from file. These data will be used to illustrate fitting a two-rate Brownian model using brownie.lite in the phytools package. The tree is simulated.tre and the data vector is in simulated.csv.

## load phytools
library(phytools)

Now, let's load the single tree with painted regimes from file. We can do this by downloading the file from the web, and the loading it using the phytools function read.simmap; or we can load the file directly froom the web as we learned yesterday.

## read simulated tree from file
tree<-read.simmap("simulated.tre",format="phylip")
## or
tree<-read.simmap("http://www.phytools.org/Ilhabela2015/data/simulated.tre",
    format="phylip")

Now, let's plot the tree with painted regimes:

## plot tree with regimes
plotSimmap(tree,lwd=3)
## no colors provided. using the following legend:
##       1       2 
## "black"   "red"
add.simmap.legend(colors=setNames(c("black","red"),1:2),prompt=FALSE,
    x=0.05*max(nodeHeights(tree)),y=0.1*Ntip(tree))

plot of chunk unnamed-chunk-3

Next, we can read our data from file. These are data for a single continuous character:

## read data from file
x<-read.csv("simulated.csv",header=TRUE,row.names=1)
## convert to vector with names
x<-as.matrix(x)[,1]
x
##           A           B           C           D           E           F 
##  1.32028474  0.61662698 -1.00911612 -0.70287046  1.56931705  1.64972830 
##           G           H           I           J           K           L 
##  0.94406092  1.23534571  2.05738145  3.16707622 -0.39868954 -1.40144281 
##           M           N           O           P           Q           R 
## -0.89405959 -0.76103676 -0.76097347 -0.81459239 -0.19720860 -0.06206985 
##           S           T           U           V           W           X 
##  0.06946520 -0.18486903 -1.77787925 -0.79620907 -0.39166148 -0.63321831 
##           Y           Z 
## -0.48235455 -0.46151922

Now we are ready to fit our multi-rate

## fit multi-rate Brownian model
fitBM<-brownie.lite(tree,x)
fitBM
## ML single-rate model:
##  s^2 se  a   k   logL
## value    2.1036  0.5834  -0.1898 2   -31.5992    
## 
## ML multi-rate model:
##  s^2(1)  se(1)   s^2(2)  se(2)   a   k   logL    
## value    0.6659  0.2398  5.7837  2.7182  -0.4222 3   -26.2556
## 
## P-value (based on X^2): 0.0011 
## 
## R thinks it has found the ML solution.

This result shows (1) that a two-rate model fits the data highly significantly better than a one-rate model; and (2) the rate of evolution in the fitted model is much higher for state "2" than for state "1".

Multi-optimum OU evolution

In addition to this approach in which the rate of evolution differs between different parts of the tree, we can also fit an Ornstein-Uhlenbeck model in which the pull or selection regimes different (i.e., have different optimums) in different parts of the phylogeny.

To explore this model, let's read in the Anolis tree & data for body size, and the fit a multi-optimum OU model in which the regime shifts are associated with the ecomorph state of different anole species. Keep in mind, that our data are merely for overall size, while the ecomorph convergence is multivariate.

In this case, our tree is Anolis.tre and our data is Anolis.csv.

First, let's load the package “OUwie”. If you do not have OUwie, then you should first install it from CRAN.

## load OUwie
library(OUwie)

Now let's read our data from the input file:

## read anole data from file
X<-read.csv("Anolis.csv",row.names=1,header=TRUE)
## read anole tree
tree<-read.tree("Anolis.tre")
plotTree(tree,type="fan",fsize=0.8)

plot of chunk unnamed-chunk-7

## setEnv=TRUE for this type is experimental. please be patient with bugs

Next, to work in OUwie we need to make a special data frame for our analysis, as follows:

## make analysis input data.frame
ecomorph<-as.matrix(X)[,"ecomorph"]
svl<-as.matrix(X)[,"svl"]
data<-data.frame(Genus_species=rownames(X),Reg=ecomorph,X=as.numeric(svl))

This data frame contains our trait and the regimes (in this case, "ecomorph") for the tips, but we also need a reconstruction of the regimes across the branches and nodes of the tree. For this, we will use the method of stochastic mapping. Here, I will just use one stochastic map - but normally we would want to integrate across a set of stochastic maps.

## perform & plot stochastic maps (we would normally do this x100)
tree<-make.simmap(tree,ecomorph)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##            CG         GB          TC         TG          Tr         Tw
## CG -0.3933357  0.0000000  0.19321489  0.0000000  0.00000000  0.2001209
## GB  0.0000000 -0.5024038  0.00000000  0.1959991  0.00000000  0.3064047
## TC  0.1932149  0.0000000 -0.74116782  0.0000000  0.09187982  0.4560731
## TG  0.0000000  0.1959991  0.00000000 -0.4428234  0.00000000  0.2468243
## Tr  0.0000000  0.0000000  0.09187982  0.0000000 -0.23442525  0.1425454
## Tw  0.2001209  0.3064047  0.45607311  0.2468243  0.14254542 -1.3519684
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
plotSimmap(tree,type="fan",fsize=0.8,ftype="i")
## no colors provided. using the following legend:
##        CG        GB        TC        TG        Tr        Tw 
##   "black"     "red"  "green3"    "blue"    "cyan" "magenta"

plot of chunk unnamed-chunk-9

## setEnv=TRUE for this type is experimental. please be patient with bugs

Now, unfortunately, although OUwie takes the stochastic map format, it uses a slightly different convention so we also need to set a vector of node states for our phylogeny:

## hack
tree$node.label<-getStates(tree,"nodes")

Now we are finally ready to fit our models. We will first fit a single-rate Brownian model and then we will also fit a multi-rate OU model for comparison:

## fit Brownian & multi-optimum OU models
fitBM<-OUwie(tree,data,model="BM1",simmap.tree=TRUE)
## Initializing... 
## Finished. Begin thorough search... 
## Finished. Summarizing results.
fitOU<-OUwie(tree,data,model="OUM",simmap.tree=TRUE)
## Initializing... 
## Finished. Begin thorough search... 
## Finished. Summarizing results.
fitBM
## 
## Fit
##      -lnL       AIC     AICc model ntax
##  5.255984 -6.511968 -6.36007   BM1   82
## 
## Rates
##               CG      GB      TC      TG      Tr      Tw
## alpha         NA      NA      NA      NA      NA      NA
## sigma.sq 0.10934 0.10934 0.10934 0.10934 0.10934 0.10934
## 
## Optima
##                 CG        GB        TC        TG        Tr        Tw
## estimate 4.0535069 4.0535069 4.0535069 4.0535069 4.0535069 4.0535069
## se       0.1010042 0.1010042 0.1010042 0.1010042 0.1010042 0.1010042
## 
## Arrived at a reliable solution
fitOU
## 
## Fit
##      -lnL       AIC      AICc model ntax
##  41.00746 -66.01491 -64.04231   OUM   82
## 
## 
## Rates
##                  CG         GB         TC         TG         Tr         Tw
## alpha    1.35699400 1.35699400 1.35699400 1.35699400 1.35699400 1.35699400
## sigma.sq 0.08367758 0.08367758 0.08367758 0.08367758 0.08367758 0.08367758
## 
## Optima
##                 CG         GB         TC         TG        Tr         Tw
## estimate 5.5411250 3.67381482 4.07460796 4.12751620 3.8246868 3.91769612
## se       0.1352433 0.07595605 0.08757856 0.07087277 0.1662112 0.06446996
## 
## Arrived at a reliable solution

Note that the -lnL is actually the log-likelihood (not negative), I believe.

Multivariate Brownian evolution

The last thing we are going to do is fit a multivariate Brownian model in which the evolutinary covariance (and thus correlation) between characters can be different in different parts of the tree. This is a method based on Revell & Collar (2009; Evolution).

For this example we will use data and a phylogeny for centrarchid fishes. The data are in Centrarchidae.csv and the tree is in Centrarchidae.tre.

## load phytools
library(phytools)

Now let's read our tree & data:

## read centrarchid tree
tree<-read.tree("Centrarchidae.tre") ## or
tree<-read.tree("http://www.phytools.org/Ilhabela2015/data/Centrarchidae.tre")
## read centrarchid data
X<-read.csv("Centrarchidae.csv",header=TRUE,row.names=1) ## or
X<-read.csv("http://www.phytools.org/Ilhabela2015/data/Centrarchidae.csv",
    header=TRUE,row.names=1)
X
##                         feeding.mode gape.width buccal.length
## Acantharchus_pomotis            pisc      0.114        -0.009
## Lepomis_gibbosus                 non     -0.133        -0.009
## Lepomis_microlophus              non     -0.151         0.012
## Lepomis_punctatus                non     -0.103        -0.019
## Lepomis_miniatus                 non     -0.134         0.001
## Lepomis_auritus                  non     -0.222        -0.039
## Lepomis_marginatus               non     -0.187        -0.075
## Lepomis_megalotis                non     -0.073        -0.049
## Lepomis_humilis                  non      0.024        -0.027
## Lepomis_macrochirus              non     -0.191         0.002
## Lepomis_gulosus                 pisc      0.131         0.122
## Lepomis_symmetricus              non      0.013        -0.025
## Lepomis_cyanellus               pisc     -0.002        -0.009
## Micropterus_coosae              pisc      0.045        -0.009
## Micropterus_notius              pisc      0.097        -0.009
## Micropterus_treculi             pisc      0.056         0.001
## Micropterus_salmoides           pisc      0.056        -0.059
## Micropterus_floridanus          pisc      0.096         0.051
## Micropterus_punctulatus         pisc      0.092         0.002
## Micropterus_dolomieu            pisc      0.035        -0.069
## Centrarchus_macropterus          non     -0.007        -0.055
## Enneacantus_obesus               non      0.016        -0.005
## Pomoxis_annularis               pisc     -0.004        -0.019
## Pomoxis_nigromaculatus          pisc      0.105         0.041
## Archolites_interruptus          pisc     -0.024         0.051
## Ambloplites_ariommus            pisc      0.135         0.123
## Ambloplites_rupestris           pisc      0.086         0.041
## Ambloplites_cavifrons           pisc      0.130         0.040

Now let's pull out the feeding mode from this data frame. This is the character that we are going to map on the tree for our different regimes. We can then use this character to generate a set of stochastic maps on the phylogeny:

fmode<-as.matrix(X)[,1]
fmode
##    Acantharchus_pomotis        Lepomis_gibbosus     Lepomis_microlophus 
##                  "pisc"                   "non"                   "non" 
##       Lepomis_punctatus        Lepomis_miniatus         Lepomis_auritus 
##                   "non"                   "non"                   "non" 
##      Lepomis_marginatus       Lepomis_megalotis         Lepomis_humilis 
##                   "non"                   "non"                   "non" 
##     Lepomis_macrochirus         Lepomis_gulosus     Lepomis_symmetricus 
##                   "non"                  "pisc"                   "non" 
##       Lepomis_cyanellus      Micropterus_coosae      Micropterus_notius 
##                  "pisc"                  "pisc"                  "pisc" 
##     Micropterus_treculi   Micropterus_salmoides  Micropterus_floridanus 
##                  "pisc"                  "pisc"                  "pisc" 
## Micropterus_punctulatus    Micropterus_dolomieu Centrarchus_macropterus 
##                  "pisc"                  "pisc"                   "non" 
##      Enneacantus_obesus       Pomoxis_annularis  Pomoxis_nigromaculatus 
##                   "non"                  "pisc"                  "pisc" 
##  Archolites_interruptus    Ambloplites_ariommus   Ambloplites_rupestris 
##                  "pisc"                  "pisc"                  "pisc" 
##   Ambloplites_cavifrons 
##                  "pisc"
## stochastic mapping of feeding mode on the tree (we would normally do x100)
tree<-make.simmap(tree,fmode,model="ARD")
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##            non      pisc
## non  -6.087788  6.087788
## pisc  3.048905 -3.048905
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##  non pisc 
##  0.5  0.5
## Done.

Plot our tree & mapped regimes:

cols<-setNames(c("blue","red"),c("non","pisc"))
plotSimmap(tree,colors=cols,ftype="i")
add.simmap.legend(colors=cols,prompt=FALSE,x=0,y=Ntip(tree))

plot of chunk unnamed-chunk-15

Next, we can use the two continuous characters to test our hypothesis that the feeding mode affects the evolutionary correlation/covariance between traits. For this analysis we will use evol.vcv in the phytools package.

## data
X<-as.matrix(X[,2:3])
## fit model
fit<-evol.vcv(tree,X)
fit
## ML single-matrix model:
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)
## fitted   0.114   0.033   0.0556  5   72.1893 
## 
## ML multi-matrix model:
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)
## non  0.1892  -0.0056 0.0149  8   82.8025 
## pisc 0.0566  0.0498  0.0757  
## 
## P-value (based on X^2): 1e-04 
## 
## R thinks it has found the ML solution.

This shows us that the two covariance model fits significantly better than the one covariance model. We can also easily extract the evolutionary correlations from these two alernative models:

## now let's look at the correlation matrices
cov2cor(fit$R.single)
##               gape.width buccal.length
## gape.width      1.000000      0.414274
## buccal.length   0.414274      1.000000
cov2cor(fit$R.multiple[,,"non"])
##               gape.width buccal.length
## gape.width     1.0000000    -0.1059162
## buccal.length -0.1059162     1.0000000
cov2cor(fit$R.multiple[,,"pisc"])
##               gape.width buccal.length
## gape.width     1.0000000     0.7604935
## buccal.length  0.7604935     1.0000000

Challenge problem

Repeat the analysis of parts 2 & 3 (Multi-optimum OU evolution and Multivariate Brownian evolution) but across a stochastic mapping sample of trees. Do the results change? Why?

Note, for the OUwie example we would recommend using a relatively small sample (say, N = 10) of stochastic map trees as the model fitting is slow.

Written by Liam J. Revell. Last updated July 4, 2015