Metropolis-Hastings

Today: Metropolis-Hastings

Reading:

Our goals

Metropolis-Hastings: The Idea

We want samples from a target distribution \(\pi\) that we can evaluate (at least up to a normalizing constant) but cannot sample from directly. Metropolis-Hastings builds a Markov chain whose stationary distribution is \(\pi\): from the current state we propose a move, then accept or reject it with a probability chosen so that, in the long run, the chain visits each region in proportion to its probability under \(\pi\).

Metropolis-Hastings: Algorithm

Given:

A target distribution \(\pi\) that we can evaluate, at least up to a normalizing constant.

A proposal distribution \(q(\cdot \mid x)\) that we can sample from for any current state \(x\).

Pick a starting value for the chain \(X_0\).

For \(i = 1, \ldots, n\):

Propose \(X^* \sim q(\cdot \mid X_{i-1})\).

Compute the acceptance probability \(a = \min\left(1, \frac{\pi(X^*)\, q(X_{i-1} \mid X^*)}{\pi(X_{i-1})\, q(X^* \mid X_{i-1})}\right)\).

With probability \(a\), set \(X_i = X^*\); otherwise set \(X_i = X_{i-1}\).

When the proposal distribution is symmetric, \(q(X_{i-1} \mid X^*) = q(X^* \mid X_{i-1})\), so the acceptance probability simplifies to \(\min\left(1, \pi(X^*) / \pi(X_{i-1})\right)\).
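The function below is a sketch of a single Metropolis-Hastings step that keeps the proposal-density correction for asymmetric proposals; the names sample_mh_step, proposal_sampler, and proposal_density are ours for illustration, not part of the notes' code. With a symmetric proposal the density ratio cancels and this reduces to the function we use in the examples below.

## Sketch of one general Metropolis-Hastings step (illustrative).
## proposal_sampler(x) draws a proposal given current state x;
## proposal_density(to, from) evaluates the proposal density q(to | from).
sample_mh_step =
    function(proposal_sampler, proposal_density, target_distribution, current_state) {
        proposal = proposal_sampler(current_state)
        acceptance_probability =
            (target_distribution(proposal) * proposal_density(current_state, proposal)) /
            (target_distribution(current_state) * proposal_density(proposal, current_state))
        if(runif(1) <= acceptance_probability) {
            return(proposal)
        } else {
            return(current_state)
        }
    }
## Example call with a symmetric normal proposal (the density ratio cancels):
## sample_mh_step(function(x) rnorm(1, mean = x, sd = .3),
##                function(to, from) dnorm(to, mean = from, sd = .3),
##                dnorm, -10)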

Simple Example: Normal Distribution

## One Metropolis-Hastings step with a symmetric proposal: the acceptance
## probability is just the ratio of target densities at the proposal and the
## current state.
sample_with_symmetric_proposal =
    function(proposal_function, target_distribution, current_state) {
        proposal = proposal_function(current_state)
        acceptance_probability = target_distribution(proposal) / target_distribution(current_state)
        if(runif(1) <= acceptance_probability) {
            return(proposal)
        } else {
            return(current_state)
        }
    }
## The proposal distribution is normal, centered at the current state, standard deviation .3
proposal_function = function(x) rnorm(n = 1, mean = x, sd = .3)
## The target distribution is N(0,1)
target_distribution = dnorm
## check the sampling:
sample_with_symmetric_proposal(proposal_function, target_distribution, -10)
## [1] -10
n_samples = 1000
chain_output = numeric(n_samples)
chain_output[1] = -10
for(i in 2:n_samples) {
    chain_output[i] = sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
plot(chain_output)
mean(chain_output)
## [1] -0.9162604
sd(chain_output)
## [1] 1.871455

Notice:

The chain started at \(-10\), far out in the tail of the \(N(0,1)\) target, so the early iterations are spent traveling toward the region where the target puts its mass.

Those early samples drag the estimates away from the true values (mean 0, sd 1): here we got a mean of about \(-0.92\) and an sd of about 1.87.

Let's run the chain longer and discard the first 200 steps as "burn-in"

n_samples = 10000
chain_output = numeric(n_samples)
chain_output[1] = -10
for(i in 2:n_samples) {
    chain_output[i] = sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
chain_no_burnin = chain_output[201:n_samples]
plot(chain_no_burnin)
mean(chain_no_burnin)
## [1] -0.03140313
sd(chain_no_burnin)
## [1] 1.019056

Note:

After discarding the burn-in, the estimates are close to the true values: a mean of about \(-0.03\) (truth 0) and an sd of about 1.02 (truth 1).
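As an extra check (not in the original notes), we can compare the post-burn-in samples to the target density directly:

## Compare the post-burn-in samples to the N(0,1) target density.
hist(chain_no_burnin, breaks = 50, freq = FALSE, main = "Samples vs. target density")
curve(dnorm(x), from = -4, to = 4, add = TRUE, col = "red")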

Example 2: Mixture distributions

## The proposal distribution is normal, centered at the current state, standard deviation .3
proposal_function = function(x) rnorm(n = 1, mean = x, sd = .3)
## The target distribution is a mixture: N(1, sd = 1) with weight .25 and N(5, sd = .2) with weight .75
target_distribution = function(x) .25 * dnorm(x, mean = 1, sd = 1) + .75 * dnorm(x, mean = 5, sd = .2)
n_samples = 1000
chain_output = numeric(n_samples)
chain_output[1] = -10
for(i in 2:n_samples) {
    chain_output[i] = sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
plot(chain_output)
mean(chain_output)
## [1] 0.3720607
sd(chain_output)
## [1] 2.266502

Notice:

The true mean of this mixture is \(.25 \cdot 1 + .75 \cdot 5 = 4\) and its true sd is about 1.8, but after 1000 steps we estimate a mean of about 0.37 and an sd of about 2.27.

The chain is nowhere near its stationary distribution: with a proposal sd of .3 it spends most of these 1000 steps just traveling from the starting value of \(-10\), and it has had little or no chance to visit the mode near 5.

Let's try running the chain a lot longer:

n_samples = 100000
chain_output = numeric(n_samples)
chain_output[1] = -10
for(i in 2:n_samples) {
    chain_output[i] = sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
plot(chain_output)
mean(chain_output)
## [1] 4.042973
sd(chain_output)
## [1] 1.780207

What's happening?

With 100,000 steps the estimates are finally close to the truth (mean 4, sd about 1.8), but the trace plot shows the chain sticking near one mode for long stretches at a time.

To move between the mode near 1 and the mode near 5, the chain has to cross a region of very low target density, and with a proposal sd of .3 that requires a long run of small, mostly "downhill" steps, so crossings between the modes are rare.

Running the chain for a very long time is one way to compensate for this slow mixing.
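To see how rarely the chain switches modes (a check that is not in the original notes; 3 is roughly the midpoint between the two modes), we can count crossings of that cutoff:

## Count how often the chain crosses between the two modes.
in_upper_mode = chain_output > 3
sum(diff(in_upper_mode) != 0)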

Another way to fix this: change the proposal distribution.

proposal_function = function(x) rnorm(n = 1, mean = x, sd = 2)
n_samples = 1000
chain_output = numeric(n_samples)
chain_output[1] = -10
for(i in 2:n_samples) {
    chain_output[i] = sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
plot(chain_output)
mean(chain_output)
## [1] 3.376655
sd(chain_output)
## [1] 2.2654

Why not always have a really diffuse proposal distribution?

With a very diffuse proposal, most proposed moves land in regions where the target density is tiny, so they are rejected and the chain stays put for many steps in a row, which again means highly correlated samples.

The plot below shows how far the chain moved on each step when we used the diffuse proposal distribution; the stretches at zero are rejected proposals.

plot(diff(chain_output))
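A quick way to quantify this (not in the original notes) is to estimate the acceptance rate as the fraction of steps on which the chain actually moved:

## Estimate the acceptance rate: the fraction of steps where the chain moved.
mean(diff(chain_output) != 0)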

Overall:

If the proposal distribution is too concentrated, the chain takes tiny steps, explores the space slowly, and can get stuck near its starting value or in one mode.

If the proposal distribution is too diffuse, most proposals are rejected and the chain stays put for long stretches.

In practice we want something in between, tuned so that a reasonable fraction of proposals is accepted.

Summing up

Metropolis-Hastings is a simple, general-purpose method for creating a Markov chain with a specified stationary distribution. It is particularly useful when:

We can evaluate the target distribution, at least up to a normalizing constant, but have no direct way to sample from it.

The distribution is nonstandard, so off-the-shelf samplers are not available.

You should be scared because:

The early part of the chain depends on the starting value, and it is hard to know how much burn-in to discard.

The chain can get stuck in one region (for example, one mode of a mixture) for a long time, and the output alone may not reveal that other regions exist.

Successive samples are correlated, so a chain of length \(n\) carries less information than \(n\) independent draws.

The results depend on tuning choices such as the spread of the proposal distribution.
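One simple (and not foolproof) sanity check, not from the original notes: run several chains from different starting values and see whether they give similar answers. The helper run_chain below is ours for illustration and reuses the functions defined above.

## Run a few chains from different starting values and compare their means
## (illustrative sanity check; uses the most recent proposal_function and
## target_distribution defined above).
run_chain = function(start, n_samples) {
    chain = numeric(n_samples)
    chain[1] = start
    for(i in 2:n_samples) {
        chain[i] = sample_with_symmetric_proposal(proposal_function, target_distribution, chain[i-1])
    }
    chain
}
starts = c(-10, 0, 10)
sapply(starts, function(s) mean(run_chain(s, 10000)[-(1:200)]))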

Next time: