Phylogenetically independent contrasts
In this tutorial we will learn to apply the independent contrasts method of Felsenstein (1985) for estimating the evolutionary correlation between characters.
Felsenstein (1985) identified (and, to a large extent, solved) a problem that had previously recognized, but that was underappreciated: and that is, that the data for species cannot be treated as independent data points from the point of view of statistical analysis.
To learn the contrasts method, we'll first need to learn some basics in linear model fitting in the R language.
## linear regression model is y = b0 + b*x + e
b0<-5.3
b<-0.75
x<-rnorm(n=100,sd=1)
y<-b0+b*x+rnorm(n=100,sd=0.2)
plot(x,y)
fit<-lm(y~x)
abline(fit)
fit
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 5.3321 0.7377
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52591 -0.12877 -0.01271 0.12295 0.57436
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.33205 0.02129 250.39 <2e-16 ***
## x 0.73766 0.02103 35.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2129 on 98 degrees of freedom
## Multiple R-squared: 0.9263, Adjusted R-squared: 0.9255
## F-statistic: 1231 on 1 and 98 DF, p-value: < 2.2e-16
anova(fit)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x 1 55.804 55.804 1230.9 < 2.2e-16 ***
## Residuals 98 4.443 0.045
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## fit is an object of class "lm"
class(fit)
## [1] "lm"
## coefficients of the fitted model
fit$coefficients
## (Intercept) x
## 5.3320548 0.7376578
## p-value of the fitted model
anova(fit)[["Pr(>F)"]][1]
## [1] 2.758908e-57
We can see that when the residual error in y given x are normal & independent, hypothesis tests about our fitted model using ANOVA are unbiased & have the correct type I error rate:
foo<-function(){
x<-rnorm(n=100)
y<-rnorm(n=100)
fit<-lm(y~x)
setNames(c(fit$coefficients[2],anova(fit)[["Pr(>F)"]][1]),c("beta","p"))
}
X<-t(replicate(1000,foo()))
hist(X[,1],xlab="fitted beta",main="fitted beta")
hist(X[,2],xlab="p-value",main="p-value")
## expected value is the nominal type I error rate, 0.05
mean(X[,2]<=0.05)
## [1] 0.056
This simulation shows very simply that when standard statistical assumptions are satisfied, our parameter estimates are unbiased and type I error is at or around the nominal level.
The difficulty with phylogenetic data is that the observations for y are no longer independent & identically distributed. Let's consider one case:
## load phytools
library(phytools)
## simulate a coalescent shaped tree
tree<-rcoal(n=100)
plotTree(tree,ftype="off")
## simulate uncorrelated Brownian evolution
x<-fastBM(tree)
y<-fastBM(tree)
par(mar=c(5.1,4.1,2.1,2.1))
plot(x,y)
fit<-lm(y~x)
abline(fit)
fit
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 0.7647 0.5534
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8952 -0.3452 -0.1002 0.2604 1.2855
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.76471 0.08144 9.39 2.56e-15 ***
## x 0.55339 0.02494 22.19 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4964 on 98 degrees of freedom
## Multiple R-squared: 0.834, Adjusted R-squared: 0.8323
## F-statistic: 492.2 on 1 and 98 DF, p-value: < 2.2e-16
anova(fit)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x 1 121.269 121.269 492.23 < 2.2e-16 ***
## Residuals 98 24.144 0.246
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can see from this example, that it is not difficult for phylogeny to induce a type I error.
Here it is because clusters of closely related taxa have highly similar phenotypes. In other words, they are not independent data points about the evolutionary process for x and y on the tree:
## this is a projection of the tree into morphospace
phylomorphospace(tree,cbind(x,y),label="off",node.size=c(0.5,0.7))
abline(fit)
Felsenstein's (1985) algorithm substitutes the contrasts between species (& nodes) for the original values. Let's see if it resolves our type I error in this case:
ix<-pic(x,tree)
iy<-pic(y,tree)
fit<-lm(iy~ix-1) ## we have to fit the model without an intercept term
fit
##
## Call:
## lm(formula = iy ~ ix - 1)
##
## Coefficients:
## ix
## 0.1785
summary(fit)
##
## Call:
## lm(formula = iy ~ ix - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.79992 -0.78645 0.03339 0.53096 2.27406
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## ix 0.17852 0.08959 1.993 0.0491 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9858 on 98 degrees of freedom
## Multiple R-squared: 0.03894, Adjusted R-squared: 0.02913
## F-statistic: 3.971 on 1 and 98 DF, p-value: 0.04908
anova(fit)
## Analysis of Variance Table
##
## Response: iy
## Df Sum Sq Mean Sq F value Pr(>F)
## ix 1 3.859 3.8585 3.9706 0.04908 *
## Residuals 98 95.233 0.9718
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Investigators sometimes get the naive impression that the contrasts algorithm 'weakens' the power of their statistical analysis. What we're seeing here is actually a spurious result being re-identified as such - not a loss of power.
Finally, IC is unbiased when we do have an effect:
foo<-function(beta,tree){
tree<-pbtree(n=100,scale=1)
x<-fastBM(tree)
y<-beta*x+fastBM(tree,sig2=0.2)
ix<-pic(x,tree)
iy<-pic(y,tree)
fit<-lm(iy~ix-1)
fit$coefficients[1]
}
bhat<-replicate(1000,foo(0.75,tree))
hist(bhat,xlab="fitted beta",main="fitted beta")
Work on your own:
Do as much as you can of the following exercise on your own. We are here to help!
Show that OLS for data simulated on the tree results in elevated type I error. How is the type I error rate affected by tree shape. (Hint, use
pbtree(...,d=0)
,pbtree(...,d>0)
, andrcoal
.)Show that phylogenetic independent contrasts results in the correct type I error.
Show that OLS is unbiased.
Written by Liam J. Revell. Last updated July 2, 2015