Maximum likelihood

Maximum likelihood is a method that we can use to fit models to data. ML can allow us to estimate parameters, as well as compare the fit of various models to data.

In this exercise, we will do four different things.

Build a likelihood function from first principles
Simulate data and use the likelihood function to make a likelihood surface
Maximize the likelihood
Use our function to compare models using a LR test and AIC

Our toy example will use the following scenario. You measure the mean trait value of a species in one generation, then come back and measure it again after a hurricane.

The model you will fit is a stupid one: you think that the change in trait values, leg lengths of lizards, from one generation to the next can be predicted from a normal distribution with some unknown variance \(\sigma^2\). For now, we will be completely agnostic about the process that results in this expectation.

Calculating a likelihood

The likelihood is the probability of observing the data under the model. So, let’s start with some data. Say that you observe a change in leg length of 2.1 cm. What is the likelihood of this data under the model described above with \(\sigma^2 = 1.0\)?

To calculate this we can use the normal distribution formula.

nlik<-function(x, data=2.1) {
  1/ sqrt(2 * pi * x) * exp(-data^2/(2*x))
}

nlik(1.0)

## [1] 0.0439836

Also R has a function that does this but it is lame

dnorm(2.1, mean=0, sd=1)

## [1] 0.0439836

Maximizing the likelihood

One way to find the MLE is to evaluate that function across a range of possible parameter values, and make a plot!

testVals<-seq(0.5, 4, by=0.1)
likelihoodRes<-numeric(length(testVals))
for(i in 1:length(testVals)) {
  likelihoodRes[i]<-nlik(testVals[i])
}
plot(testVals, likelihoodRes, type="l")

# not there yet

testVals<-seq(2, 10, by=0.5)
likelihoodRes<-numeric(length(testVals))
for(i in 1:length(testVals)) {
  likelihoodRes[i]<-nlik(testVals[i])
}
plot(testVals, likelihoodRes, type="l")

We could zoom in to get a better idea of the correct answer

testVals<-seq(4, 4.5, by=0.01)
likelihoodRes<-numeric(length(testVals))
for(i in 1:length(testVals)) {
  likelihoodRes[i]<-nlik(testVals[i])
}
plot(testVals, likelihoodRes, type="l")

But there are ways to do even better!

my_guess <- 4
eps <- 0.01
plot(my_guess, nlik(my_guess), xlim=c(4, 5), ylim=c(0.114, 0.116))
for(i in 1:1000) {
  f1 <- nlik(my_guess)
  move<-rnorm(1)*eps
  f2 <- nlik(my_guess+move) 
  if(f2 > f1) my_guess<-my_guess+move
  points(my_guess, nlik(my_guess))
}

#MLE estimate
my_guess

## [1] 4.409952

What I just did was a hack, and not really a very good one. If you run it a few times you will see that we get the right answer only to a few decimal places. Let’s see what R can do to solve this problem.

res<-optimize(nlik, interval=c(1, 5), maximum=T)
res

## $maximum
## [1] 4.409998
## 
## $objective
## [1] 0.1152242

# did this do better than ours?
nlik(res$maximum)-nlik(my_guess)

## [1] 3.399447e-12

Imagine You do a new experiment and find a change in size of -0.3 cm between two generations. What is the ML estimate for sigma squared?

Most simple solution, taking advantage of the optimize function:

res<-optimize(nlik, interval=c(1, 5), maximum=T, data=-0.3)
res

## $maximum
## [1] 1.00006
## 
## $objective
## [1] 0.3813774

Now you try!

Let’s try a challenge. Imagine you take data across 5 different intervals. Consider each interval as independent. Construct a likelihood function that includes all five of these changes - that is, calculate the joint probability of obtaining all five of these measurements given some value for sigma squared. Then maximize that function.

data <- c(2.3, 2.1, -0.5, -0.7, 1.1)

Solution

nlik_joint<-function(x, data=c(2.3, 2.1, -0.5, -0.7, 1.1)) {
  res<-1
  for(i in 1:length(data)) {
    res <- res * nlik(x, data[i])
  }
  return(res)
}

nlik_joint(1.0)

## [1] 2.983905e-05

res<-optimize(nlik_joint, interval=c(1, 10), maximum=T)
res

## $maximum
## [1] 2.329999
## 
## $objective
## [1] 0.0001000978

# we are really just calculating a variance here...
xx<-c(2.3, 2.1, -0.5, -0.7, 1.1)
sum(xx^2)/5

## [1] 2.33

Model selection

Now consider a second model, one of genetic drift. Say that we go into our population and measure that our trait of interest has an additive genetic variance of 10, and an effective population size of 1000 individuals. Is the amount of change you observed, 2.1, consistent with a hypothesis of genetic drift?

It turns out that under drift we expect change to follow a normal distribution with variance \(\frac{\sigma_A ^ 2}{N_e} = \frac{10}{1000} = 0.01\). We can calculate the likelihood of this model, which has no free parameters, as:

nlik(0.01)

## [1] 6.902029e-96

We now have two models, genetic drift (no free parameters) and our model where variance is estimated (one free parameter). We can compare these two models using a likelihood ratio test.

res<-optimize(nlik, interval=c(1, 5), maximum=T)
delta <- 2 * (log(res$objective) - log(nlik(0.01)))
delta

## [1] 433.911

# p-value
pchisq(delta, 1, lower.tail=F)

## [1] 2.289169e-96

This is a very significant result, and we can reject the null hypothesis of genetic drift.

Let’s try AIC

aic_1 <- 2 * 1 - 2 * log(res$objective)
aic_2 <- 2 * 0 - 2 * log(nlik(0.01))

c(aic_1, aic_2)

## [1]   6.321752 438.232707

delta_aic <- aic_2 - aic_1
delta_aic