Scenario: you are studying a plant. You measure the average length of a leaf in your plant before and after a hurricane.
Model: Assume that the difference in trait values between the two measurements follows a normal distribution with mean zero and some unknown variance \(\sigma ^2\).
Measurement before = 5.3 mm, after is 7.4 mm.
We want P(D|H). We have data, the trait difference 7.4 - 5.3 = 2.1 mm. Let’s use the normal distribution function to calculate a likelihood.
# denote our unknown parameter sigma^2 as x
nlik <- function(x, data=2.1) {
1/sqrt(2 * pi * x) * exp(-data^2/(2*x))
}
nlik(1)
## [1] 0.0439836
nlik(2000)
## [1] 0.008910791
dnorm(2.1, mean=0, sd=1)
## [1] 0.0439836
Run the function across a range of values and find the maximum.
testVals <- seq(0.5, 10, by=0.1)
testLikelihoods <- numeric(length(testVals))
for(i in 1:length(testVals)) {
testLikelihoods[i]<-nlik(testVals[i])
}
plot(testVals, testLikelihoods, type="l")
testVals <- seq(4, 5, by=0.01)
testLikelihoods <- numeric(length(testVals))
for(i in 1:length(testVals)) {
testLikelihoods[i]<-nlik(testVals[i])
}
plot(testVals, testLikelihoods, type="l")
# what is my maximm?u
index<-which(testLikelihoods==max(testLikelihoods))
testLikelihoods[index] # this is the maximum of the likelihood
## [1] 0.1152242
testVals[index] # this is my MLE
## [1] 4.41
my_guess <- 1.0
eps <- 0.1
for(i in 1:1000) {
leftGuess <- my_guess - runif(1, min=0, max=eps)
rightGuess <- my_guess + runif(1, min=0, max=eps)
leftL <- nlik(leftGuess)
currentL <- nlik(my_guess)
rightL <- nlik(rightGuess)
if(leftL > currentL) {
my_guess <- leftGuess
} else if(rightL > currentL) {
my_guess <- rightGuess
}
if(i%%100 == 0) cat("Current guess: ", my_guess, "\n")
}
## Current guess: 4.407499
## Current guess: 4.409618
## Current guess: 4.410108
## Current guess: 4.410108
## Current guess: 4.410108
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
for(i in 1:1000) {
leftGuess <- my_guess - runif(1, min=0, max=eps)
rightGuess <- my_guess + runif(1, min=0, max=eps)
leftL <- nlik(leftGuess)
currentL <- nlik(my_guess)
rightL <- nlik(rightGuess)
if(leftL > currentL) {
my_guess <- leftGuess
} else if(rightL > currentL) {
my_guess <- rightGuess
}
if(i%%100 == 0) cat("Current guess: ", my_guess, "\n")
}
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.409952
## Current guess: 4.410043
In general R has a function called optim that climbs peaks and finds minimums - but we can make it work to maximize too.
In this case we have only one parameter so we can use optimize which is set up for this one-D case.
res <- optimize(nlik, interval=c(1, 5), maximum=T)
res
## $maximum
## [1] 4.409998
##
## $objective
## [1] 0.1152242
# who won?
res$objective - nlik(my_guess)
## [1] 2.703116e-12
Now you do it.
Redo our experiment and get the following results.
Leaf size before hurricane = 7.3 mm Leaf size after hurricane = 7.1 mm
What is the ML estimate for sigma ^2 under my specified model?
# method 1
nlik <- function(x, data=0.2) {
1/sqrt(2 * pi * x) * exp(-data^2/(2*x))
}
res <- optimize(nlik, interval=c(0.01, 5), maximum=T)
res
## $maximum
## [1] 0.03999487
##
## $objective
## [1] 1.209854
# method 2
res <- optimize(nlik, interval=c(0.01, 5), maximum=T, data=0.2)
res
## $maximum
## [1] 0.03999487
##
## $objective
## [1] 1.209854
# method 3
foo <- function(x) {dnorm(0.2, mean=0, sd=sqrt(x))}
res <- optimize(foo, interval=c(0.02, 5), maximum=T)
Alternative model: the trait changes due to genetic drift. Additional information: Ne = 1000, va = 10
Under drift, each generation we expect trait change to follow a normal distribution with variance va / Ne = 10 / 1000 = 0.01.
Under a drift hypothesis we have the same model but with \(\sigma^2 = 0.01\) as a fixed parameter.
This is a model with no free parameters.
Calculate the likelihood for this model when our amount of change is 0.2 mm.
# drift model
driftL <- nlik(0.1, data=0.2)
# alternative model
res <- optimize(nlik, interval=c(0.01, 5), maximum=T, data=0.2)
res
## $maximum
## [1] 0.03999487
##
## $objective
## [1] 1.209854
# likelihood ratio test
delta <- 2*(log(res$objective) - log(driftL))
pchisq(delta, 1, lower.tail=F)
## [1] 0.5738454
# conclusion: fail to reject the drift model