For better or for worse, the estimation of phenotypic trait values for ancestral nodes in the tree continues to be an important goal in phylogenetic comparative biology.
This tutorial focuses on the estimation of ancestral character states under Brownian motion & OU, including when some values for internal nodes are known.
Anole body size
Let's start by estimating ancestral states for a a continuous character, in this case overall body size, on a phylogeny of Anolis lizards from the Caribbean.
To estimate the states for a continuously valued character at ancestral nodes, we need find the states that have the maximum probability of having arisen under our model. These will be our maximum likelihood estimates.
Ancestral character esimation is implemented in a variety of different R
functions. The most commonly used is ace
. Here, we start with
the phytools function fastAnc
.
The files you will need are here: anole.tre svl.csv. If you have a good web connection, it is also always possible to read in your files directly from the web, too.
## load libraries
library(phytools)
## read tree from file
tree<-read.tree("anole.tre") ## or
tree<-read.tree("http://www.phytools.org/Ilhabela2015/data/anole.tre")
## plot tree
plotTree(tree,type="fan",ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs
## read data
svl<-read.csv("svl.csv",row.names=1) ## or
svl<-read.csv("http://www.phytools.org/Ilhabela2015/data/svl.csv",
row.names=1)
## change this into a vector
svl<-as.matrix(svl)[,1]
svl
## ahli alayoni alfaroi aliniger
## 4.039125 3.815705 3.526655 4.036557
## allisoni allogus altitudinalis alumina
## 4.375390 4.040138 3.842994 3.588941
## alutaceus angusticeps argenteolus argillaceus
## 3.554891 3.788595 3.971307 3.757869
## armouri bahorucoensis baleatus baracoae
## 4.121684 3.827445 5.053056 5.042780
## barahonae barbatus barbouri bartschi
## 5.076958 5.003946 3.663932 4.280547
## bremeri breslini brevirostris caudalis
## 4.113371 4.051111 3.874155 3.911743
## centralis chamaeleonides chlorocyanus christophei
## 3.697941 5.042349 4.275448 3.884652
## clivicola coelestinus confusus cooki
## 3.758726 4.297965 3.938442 4.091535
## cristatellus cupeyalensis cuvieri cyanopleurus
## 4.189820 3.462014 4.875012 3.630161
## cybotes darlingtoni distichus dolichocephalus
## 4.210982 4.302036 3.928796 3.908550
## equestris etheridgei eugenegrahami evermanni
## 5.113994 3.657991 4.128504 4.165605
## fowleri garmani grahami guafe
## 4.288780 4.769473 4.154274 3.877457
## guamuhaya guazuma gundlachi haetianus
## 5.036953 3.763884 4.188105 4.316542
## hendersoni homolechis imias inexpectatus
## 3.859835 4.032806 4.099687 3.537439
## insolitus isolepis jubar krugi
## 3.800471 3.657088 3.952605 3.886500
## lineatopus longitibialis loysiana lucius
## 4.128612 4.242103 3.701240 4.198915
## luteogularis macilentus marcanoi marron
## 5.101085 3.715765 4.079485 3.831810
## mestrei monticola noblei occultus
## 3.987147 3.770613 5.083473 3.663049
## olssoni opalinus ophiolepis oporinus
## 3.793899 3.838376 3.637962 3.845670
## paternus placidus poncensis porcatus
## 3.802961 3.773967 3.820378 4.258991
## porcus pulchellus pumilis quadriocellifer
## 5.038034 3.799022 3.466860 3.901619
## reconditus ricordii rubribarbus sagrei
## 4.482607 5.013963 4.078469 4.067162
## semilineatus sheplani shrevei singularis
## 3.696631 3.682924 3.983003 4.057997
## smallwoodi strahmi stratulus valencienni
## 5.035096 4.274271 3.869881 4.321524
## vanidicus vermiculatus websteri whitemani
## 3.626206 4.802849 3.916546 4.097479
Now we can estimate ancestral states. We will also compute variances & 95% confidence intervals for each node:
fit<-fastAnc(tree,svl,vars=TRUE,CI=TRUE)
fit
## $ace
## 101 102 103 104 105 106 107 108
## 4.065918 4.045601 4.047993 4.066923 4.036743 4.049514 4.054528 4.045218
## 109 110 111 112 113 114 115 116
## 3.979493 3.952440 3.926814 3.964414 3.995835 3.948034 3.955203 3.977842
## 117 118 119 120 121 122 123 124
## 4.213995 4.240867 4.248450 4.257574 4.239061 4.004120 4.007024 4.015249
## 125 126 127 128 129 130 131 132
## 4.006720 3.992617 3.925848 3.995759 4.049619 3.930793 3.908343 3.901518
## 133 134 135 136 137 138 139 140
## 3.885086 4.040356 4.060970 3.987789 3.951878 3.814014 3.707733 3.926147
## 141 142 143 144 145 146 147 148
## 3.988745 4.167983 4.157021 4.166146 4.146362 4.146132 4.159052 4.113267
## 149 150 151 152 153 154 155 156
## 4.133069 4.222223 4.461376 4.488496 4.437250 4.512071 4.953146 5.033253
## 157 158 159 160 161 162 163 164
## 4.962377 5.009025 5.018389 3.974900 3.880482 3.859268 3.859265 3.871895
## 165 166 167 168 169 170 171 172
## 3.899914 3.807943 3.826619 4.151598 3.760446 3.654324 3.676713 3.825559
## 173 174 175 176 177 178 179 180
## 3.747726 3.813974 3.803077 3.702349 3.566476 3.678236 3.675620 3.662452
## 181 182 183 184 185 186 187 188
## 3.563181 3.645426 4.017460 4.113689 4.229393 4.381609 4.243153 5.041281
## 189 190 191 192 193 194 195 196
## 5.051563 5.051719 5.057591 4.183653 4.151505 4.113279 3.969795 3.905122
## 197 198 199
## 4.191810 4.161419 4.092141
##
## $var
## 101 102 103 104 105 106
## 0.011775189 0.008409641 0.009452076 0.011870028 0.014459672 0.018112713
## 107 108 109 110 111 112
## 0.011685860 0.007268889 0.014487712 0.012814461 0.010852276 0.010048935
## 113 114 115 116 117 118
## 0.006240382 0.009816255 0.008377720 0.006250414 0.015406961 0.015306088
## 119 120 121 122 123 124
## 0.008914737 0.008485896 0.016998370 0.014850114 0.012880197 0.011964472
## 125 126 127 128 129 130
## 0.010525923 0.010528040 0.013248168 0.012789504 0.013568841 0.013971730
## 131 132 133 134 135 136
## 0.010048937 0.009535820 0.008387455 0.008359158 0.010126605 0.012608132
## 137 138 139 140 141 142
## 0.013951936 0.018502968 0.014105896 0.018182598 0.017745869 0.012861889
## 143 144 145 146 147 148
## 0.014828558 0.011891297 0.008732027 0.008604618 0.010009986 0.007660375
## 149 150 151 152 153 154
## 0.006817879 0.012378867 0.015540878 0.014154484 0.012761730 0.011952420
## 155 156 157 158 159 160
## 0.005064274 0.002463170 0.006924118 0.003953467 0.003509809 0.011102798
## 161 162 163 164 165 166
## 0.011378573 0.011045003 0.011682628 0.011560333 0.012242267 0.011462597
## 167 168 169 170 171 172
## 0.008888509 0.014206315 0.014405121 0.006290501 0.005434707 0.014714164
## 173 174 175 176 177 178
## 0.010884130 0.014842361 0.011333759 0.012932194 0.007449222 0.011094079
## 179 180 181 182 183 184
## 0.011015666 0.012850744 0.005326252 0.012876076 0.021615405 0.014608411
## 185 186 187 188 189 190
## 0.013155037 0.021043877 0.012658226 0.004540141 0.003540634 0.002698421
## 191 192 193 194 195 196
## 0.001277775 0.011837709 0.012449062 0.013536624 0.015848988 0.008788712
## 197 198 199
## 0.015789566 0.013529210 0.009534335
##
## $CI95
## [,1] [,2]
## 101 3.853231 4.278604
## 102 3.865861 4.225341
## 103 3.857439 4.238548
## 104 3.853382 4.280465
## 105 3.801056 4.272430
## 106 3.785731 4.313298
## 107 3.842649 4.266406
## 108 3.878112 4.212323
## 109 3.743577 4.215408
## 110 3.730566 4.174314
## 111 3.722632 4.130995
## 112 3.767935 4.160893
## 113 3.841003 4.150668
## 114 3.753844 4.142225
## 115 3.775804 4.134601
## 116 3.822885 4.132799
## 117 3.970710 4.457279
## 118 3.998380 4.483354
## 119 4.063391 4.433509
## 120 4.077021 4.438127
## 121 3.983520 4.494601
## 122 3.765272 4.242968
## 123 3.784582 4.229467
## 124 3.800860 4.229639
## 125 3.805632 4.207808
## 126 3.791509 4.193725
## 127 3.700250 4.151445
## 128 3.774102 4.217417
## 129 3.821307 4.277930
## 130 3.699117 4.162469
## 131 3.711864 4.104822
## 132 3.710121 4.092915
## 133 3.705584 4.064589
## 134 3.861156 4.219555
## 135 3.863733 4.258207
## 136 3.767708 4.207869
## 137 3.720366 4.183390
## 138 3.547403 4.080624
## 139 3.474947 3.940519
## 140 3.661855 4.190439
## 141 3.727646 4.249843
## 142 3.945699 4.390267
## 143 3.918347 4.395695
## 144 3.952414 4.379879
## 145 3.963210 4.329515
## 146 3.964320 4.327943
## 147 3.962955 4.355150
## 148 3.941721 4.284813
## 149 3.971230 4.294907
## 150 4.004152 4.440293
## 151 4.217036 4.705715
## 152 4.255309 4.721682
## 153 4.215833 4.658667
## 154 4.297789 4.726352
## 155 4.813665 5.092627
## 156 4.935977 5.130528
## 157 4.799283 5.125471
## 158 4.885787 5.132263
## 159 4.902272 5.134507
## 160 3.768375 4.181425
## 161 3.671408 4.089556
## 162 3.653282 4.065255
## 163 3.647416 4.071113
## 164 3.661158 4.082632
## 165 3.683050 4.116777
## 166 3.598098 4.017787
## 167 3.641833 4.011406
## 168 3.917985 4.385211
## 169 3.525204 3.995688
## 170 3.498871 3.809777
## 171 3.532221 3.821205
## 172 3.587807 4.063310
## 173 3.543245 3.952207
## 174 3.575189 4.052760
## 175 3.594415 4.011738
## 176 3.479458 3.925240
## 177 3.397311 3.735642
## 178 3.471792 3.884680
## 179 3.469907 3.881332
## 180 3.440264 3.884639
## 181 3.420138 3.706225
## 182 3.423020 3.867833
## 183 3.729297 4.305622
## 184 3.876793 4.350585
## 185 4.004590 4.454196
## 186 4.097282 4.665937
## 187 4.022636 4.463670
## 188 4.909215 5.173347
## 189 4.934937 5.168189
## 190 4.949904 5.153533
## 191 4.987529 5.127654
## 192 3.970403 4.396903
## 193 3.932817 4.370192
## 194 3.885239 4.341319
## 195 3.723046 4.216545
## 196 3.721376 4.088869
## 197 3.945523 4.438096
## 198 3.933441 4.389397
## 199 3.900759 4.283523
## as we discussed in class, 95% CIs can be broad
fit$CI[1,]
## [1] 3.853231 4.278604
range(svl)
## [1] 3.462014 5.113994
We will learn more about this later, but there are also some methods for visualizing the ancestral state reconstructions for continuous traits on the tree. For example:
## projection of the reconstruction onto the edges of the tree
obj<-contMap(tree,svl,plot=FALSE)
plot(obj,type="fan",legend=0.7*max(nodeHeights(tree)))
## projection of the tree into phenotype space
phenogram(tree,svl,fsize=0.6,spread.costs=c(1,0))
## Optimizing the positions of the tip labels...
Properties of ancestral states of ancestral state reconstruction
Next, we can explore some of the properties of ancestral state reconstruction of continuous traits in general, starting wih our Brownian model.
First, let's simulate some data:
## simulate a tree & some data
tree<-pbtree(n=26,scale=1,tip.label=LETTERS[26:1])
## simulate with ancestral states
x<-fastBM(tree,internal=TRUE)
## ancestral states
a<-x[as.character(1:tree$Nnode+Ntip(tree))]
## tip data
x<-x[tree$tip.label]
Now, let's estimate ancestral states using fastAnc
which uses
the re-rooting algorithm discussed in class:
fit<-fastAnc(tree,x,CI=TRUE)
fit
## $ace
## 27 28 29 30 31 32
## -0.01383743 -0.43466511 -0.78496842 -0.92652372 -0.97992735 -1.06606765
## 33 34 35 36 37 38
## -1.19242145 -0.77435425 -1.33427168 -0.77148395 0.37519774 0.35486511
## 39 40 41 42 43 44
## 0.78011325 0.63488391 1.19427496 -0.20165383 -0.17792873 1.07419593
## 45 46 47 48 49 50
## 1.45924828 1.55175294 1.75289132 2.16445039 1.81192553 0.68319366
## 51
## 0.81241807
##
## $CI95
## [,1] [,2]
## 27 -0.8879801 0.8603052
## 28 -1.2581222 0.3887920
## 29 -1.4652988 -0.1046380
## 30 -1.5538351 -0.2992123
## 31 -1.4449545 -0.5149002
## 32 -1.4966267 -0.6355086
## 33 -1.4920621 -0.8927808
## 34 -1.1230122 -0.4256963
## 35 -1.4863921 -1.1821513
## 36 -1.2746445 -0.2683234
## 37 -0.3797317 1.1301272
## 38 -0.4031915 1.1129217
## 39 0.1541249 1.4061016
## 40 0.5162435 0.7535243
## 41 1.0781703 1.3103796
## 42 -0.7364614 0.3331537
## 43 -0.6011442 0.2452867
## 44 0.4157610 1.7326308
## 45 0.8942088 2.0242878
## 46 0.9951790 2.1083269
## 47 1.2382347 2.2675479
## 48 1.9117253 2.4171755
## 49 1.3260228 2.2978283
## 50 0.3463241 1.0200632
## 51 0.5937687 1.0310674
We can easily compare these estimates to the (normally unknown) generating values for simulated data:
plot(a,fit$ace,xlab="true states",ylab="estimated states")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red") ## 1:1 line
One of the most common critiques of ancestral state estimation is that the uncertainty around ancestral states can be large; and, furthermore, that this uncertainty is often ignored. Let's address this by obtaining 95% CIs on ancestral values, and then let's show that the 95% CIs include the generating values around 95% of the time:
## first, let's go back to our previous dataset
print(fit)
## $ace
## 27 28 29 30 31 32
## -0.01383743 -0.43466511 -0.78496842 -0.92652372 -0.97992735 -1.06606765
## 33 34 35 36 37 38
## -1.19242145 -0.77435425 -1.33427168 -0.77148395 0.37519774 0.35486511
## 39 40 41 42 43 44
## 0.78011325 0.63488391 1.19427496 -0.20165383 -0.17792873 1.07419593
## 45 46 47 48 49 50
## 1.45924828 1.55175294 1.75289132 2.16445039 1.81192553 0.68319366
## 51
## 0.81241807
##
## $CI95
## [,1] [,2]
## 27 -0.8879801 0.8603052
## 28 -1.2581222 0.3887920
## 29 -1.4652988 -0.1046380
## 30 -1.5538351 -0.2992123
## 31 -1.4449545 -0.5149002
## 32 -1.4966267 -0.6355086
## 33 -1.4920621 -0.8927808
## 34 -1.1230122 -0.4256963
## 35 -1.4863921 -1.1821513
## 36 -1.2746445 -0.2683234
## 37 -0.3797317 1.1301272
## 38 -0.4031915 1.1129217
## 39 0.1541249 1.4061016
## 40 0.5162435 0.7535243
## 41 1.0781703 1.3103796
## 42 -0.7364614 0.3331537
## 43 -0.6011442 0.2452867
## 44 0.4157610 1.7326308
## 45 0.8942088 2.0242878
## 46 0.9951790 2.1083269
## 47 1.2382347 2.2675479
## 48 1.9117253 2.4171755
## 49 1.3260228 2.2978283
## 50 0.3463241 1.0200632
## 51 0.5937687 1.0310674
mean(((a>=fit$CI95[,1]) + (a<=fit$CI95[,2]))==2)
## [1] 0.92
One small tree doesn't tell us much, so let's repeat for a sample of trees & reconstructions:
## custom function that conducts a simulation, estimates ancestral
## states, & returns the fraction on 95% CI
foo<-function(){
tree<-pbtree(n=100)
x<-fastBM(tree,internal=TRUE)
fit<-fastAnc(tree,x[1:length(tree$tip.label)],CI=TRUE)
mean(((x[1:tree$Nnode+length(tree$tip.label)]>=fit$CI95[,1]) +
(x[1:tree$Nnode+length(tree$tip.label)]<=fit$CI95[,2]))==2)
}
## conduct 100 simulations
pp<-replicate(100,foo())
mean(pp)
## [1] 0.9448485
Cool. Well, this shows us that although CIs can be large, when the model is correct they will include the generating value (1-α)% of the time.
Ornstein-Uhlenbeck model
It is also possible to estimate ancestral states using other models of character evolution other than Brownian motion. It is possible to, for instance, estimate ancestral states under the OU model that we have already learned. One note of caution: when the selection parameter (α) is very large, then the ancestral states will all tend towards the same value (the fitted optimum, θ).
For this example, we will use the function anc.ML
which permits
us to specify an OU model:
## simulate tree & data under the OU model
alpha<-2
tree<-pbtree(n=100,scale=1)
x<-fastBM(tree,internal=TRUE,model="OU",alpha=alpha,sig2=0.2)
## Warning: alpha but not theta specified in OU model, setting theta to a.
a<-x[as.character(1:tree$Nnode+Ntip(tree))]
x<-x[tree$tip.label]
phenogram(tree,c(x,a),ftype="off",spread.labels=FALSE)
title("Evolution under OU")
## fit the model (this may be slow)
fit<-anc.ML(tree,x,model="OU")
fit
## $sig2
## [1] 0.05573428
##
## $alpha
## [1] 1e-08
##
## $ace
## 101 102 103 104 105
## 9.963462e-03 2.244757e-02 2.240573e-02 -5.931035e-02 -4.329498e-02
## 106 107 108 109 110
## -3.480964e-02 -4.081294e-02 -1.143122e-01 -2.263273e-02 4.155330e-02
## 111 112 113 114 115
## 4.264017e-02 -1.810736e-02 -9.407912e-02 -1.363360e-01 -6.611306e-02
## 116 117 118 119 120
## -1.392603e-02 -3.488464e-02 -9.974915e-02 -1.471439e-01 -1.667432e-01
## 121 122 123 124 125
## -1.680456e-01 2.899822e-02 4.833528e-02 3.506143e-02 1.868645e-02
## 126 127 128 129 130
## 4.748435e-02 3.448893e-02 3.250233e-02 -1.538064e-03 5.460770e-02
## 131 132 133 134 135
## 1.287799e-01 1.331156e-01 1.469378e-01 8.486024e-03 6.439102e-03
## 136 137 138 139 140
## -5.997895e-02 -2.624823e-02 1.156061e-01 1.450449e-01 1.643474e-01
## 141 142 143 144 145
## 8.871516e-02 -1.500705e-01 -2.088446e-01 -1.485247e-01 -9.546454e-02
## 146 147 148 149 150
## -1.648921e-01 -1.003483e-01 -1.122847e-01 -9.953868e-02 2.319007e-01
## 151 152 153 154 155
## 4.114431e-02 3.261996e-02 5.558858e-02 -1.093121e-03 -2.343046e-03
## 156 157 158 159 160
## -9.063816e-05 -3.395186e-02 -7.084372e-02 -2.180441e-03 -9.891307e-02
## 161 162 163 164 165
## -3.951237e-02 -3.874053e-02 -2.391682e-02 9.737577e-02 -3.801571e-02
## 166 167 168 169 170
## -3.578902e-02 -2.162303e-02 -5.260689e-03 -1.308687e-02 -2.051804e-02
## 171 172 173 174 175
## 1.033658e-02 2.185436e-03 4.877736e-02 9.016846e-02 -6.244681e-03
## 176 177 178 179 180
## 2.600085e-03 -2.081362e-02 -6.850276e-02 -1.012200e-01 -9.815220e-02
## 181 182 183 184 185
## -9.432228e-02 -9.482774e-02 -1.266490e-01 -3.097925e-01 -1.008618e-01
## 186 187 188 189 190
## -1.202127e-02 -1.335903e-02 -3.088787e-04 -1.270292e-01 7.848848e-03
## 191 192 193 194 195
## 2.054886e-01 2.441548e-01 -4.855077e-02 -1.614209e-02 -3.053233e-02
## 196 197 198 199
## -5.844232e-02 2.570525e-02 2.987429e-02 -1.579888e-01
##
## $logLik
## [1] 263.5128
##
## $counts
## function gradient
## 8 8
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
##
## $model
## [1] "OU"
##
## attr(,"class")
## [1] "anc.ML"
plot(a,fit$ace,xlab="true values",ylab="estimated under OU")
lines(range(x),range(x),lty="dashed",col="red")
title("ancestral states estimated under OU")
Ancestral state estimates when some nodes are known
We can theoretically fix the state for any internal node during MLE of ancestral states. In practice, we would do this by attaching a terminal edge of zero length to the internal node we wanted to fix, and then treat it like another tip value in our analyses.
Another possibility, which also allows for the possibility that ancestral states are uncertain, is to use an informative prior probability density on one or multiple states at internal nodes, and then estimate ancestral states using Bayesian MCMC.
Let's try this using a really bad case for ancestral character estimation from contemporaneous tip data: Brownian evolution with a trend. Note that although we theoretically could do so - we are not fitting a trended model here.
tree<-pbtree(n=100,scale=1)
## simulate data with a trend
x<-fastBM(tree,internal=TRUE,mu=3)
phenogram(tree,x,ftype="off",spread.labels=FALSE)
a<-x[as.character(1:tree$Nnode+Ntip(tree))]
x<-x[tree$tip.label]
## let's see how bad we do if we ignore the trend
plot(a,fastAnc(tree,x),xlab="true values",
ylab="estimated states under BM")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red")
title("estimated without prior information")
## incorporate prior knowledge
pm<-setNames(c(1000,rep(0,tree$Nnode)),
c("sig2",1:tree$Nnode+length(tree$tip.label)))
## the root & two randomly chosen nodes
nn<-as.character(c(length(tree$tip.label)+1,
sample(2:tree$Nnode+length(tree$tip.label),2)))
pm[nn]<-a[as.character(nn)]
## prior variance
pv<-setNames(c(1000^2,rep(1000,length(pm)-1)),names(pm))
pv[as.character(nn)]<-1e-100
## run MCMC
mcmc<-anc.Bayes(tree,x,ngen=100000,
control=list(pr.mean=pm,pr.var=pv,
a=pm[as.character(length(tree$tip.label)+1)],
y=pm[as.character(2:tree$Nnode+length(tree$tip.label))]))
## Control parameters (set by user or default):
## List of 7
## $ sig2 : num 0.95
## $ a : Named num 0
## ..- attr(*, "names")= chr "101"
## $ y : Named num [1:98] 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "names")= chr [1:98] "102" "103" "104" "105" ...
## $ pr.mean: Named num [1:100] 1000 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "names")= chr [1:100] "sig2" "101" "102" "103" ...
## $ pr.var : Named num [1:100] 1e+06 1e-100 1e+03 1e+03 1e+03 ...
## ..- attr(*, "names")= chr [1:100] "sig2" "101" "102" "103" ...
## $ prop : num [1:100] 0.0095 0.0095 0.0095 0.0095 0.0095 ...
## $ sample : num 100
## Starting MCMC...
## Done MCMC.
anc.est<-colMeans(mcmc[201:1001,
as.character(1:tree$Nnode+length(tree$tip.label))])
plot(a[names(anc.est)],anc.est,xlab="true values",
ylab="estimated states using informative prior")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red")
title("estimated using informative prior")
Written by Liam J. Revell. Last updated July 3, 2015