Metropolis-Hastings

Today: Metropolis-Hastings

Reading:

Our goals

Metropolis-Hastings: The Idea

Metropolis-Hastings: Algorithm with a symmetric proposal distribution

Given:

- A target distribution \(\pi\), which we only need to be able to evaluate up to a normalizing constant.
- A symmetric proposal distribution \(q(\cdot \mid x)\), meaning \(q(y \mid x) = q(x \mid y)\) for all \(x, y\).

Pick a starting value for the chain \(X_0\).

For \(i = 1, \ldots, n\):

- Draw a proposal \(X^* \sim q(\cdot \mid X_{i-1})\).
- Compute the acceptance probability \(a = \min\left(1, \frac{\pi(X^*)}{\pi(X_{i-1})}\right)\).
- With probability \(a\), set \(X_i = X^*\) (accept); otherwise set \(X_i = X_{i-1}\) (reject and stay put).

Note:

Aside: Reversible chains and detailed balance

A Markov chain with transition probabilities \(P(x \to y)\) is reversible with respect to \(\pi\) if it satisfies detailed balance: \(\pi(x) P(x \to y) = \pi(y) P(y \to x)\) for all states \(x\) and \(y\). Summing both sides over \(x\) gives \(\sum_x \pi(x) P(x \to y) = \pi(y)\), so detailed balance implies that \(\pi\) is an invariant distribution of the chain.

Checking that the Metropolis algorithm has the correct invariant distribution

Let \(X_i\) and \(X_j\) be any two elements in the state space and let \(a_{ij}\) be the acceptance probability given that we start at \(X_i\) and propose \(X_j\).

Suppose further (without loss of generality) that \(\pi(X_j) \le \pi(X_i)\), so that \(a_{ij} = \pi(X_j) / \pi(X_i) \le 1\) and \(a_{ji} = 1\).

For \(X_i \ne X_j\), the Metropolis chain moves from \(X_i\) to \(X_j\) with probability \(q(X_j \mid X_i) a_{ij}\), so we can check that \(\pi\) is the invariant distribution by checking detailed balance:

\[ \begin{align*} \pi(X_i)q(X_j \mid X_i) a_{ij} &= \pi(X_i) q(X_j \mid X_i) \frac{\pi(X_j)}{\pi(X_i)} \\ &= q(X_j \mid X_i) \pi(X_j) \\ &= \pi(X_j) q(X_i \mid X_j) a_{ji} \end{align*} \]

The last line follows because \(a_{ji} = 1\) (since \(\pi(X_i) \ge \pi(X_j)\)) and because the proposal distribution is symmetric, so \(q(X_j \mid X_i) = q(X_i \mid X_j)\).
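As a quick numerical sanity check (this is an addition, not part of the lecture's derivation), we can build the Metropolis transition matrix for a small made-up discrete target and verify detailed balance and invariance directly:

pi_unnorm <- c(1, 3, 2, 4)              ## made-up unnormalized target weights
pi_target <- pi_unnorm / sum(pi_unnorm)
n_states <- length(pi_target)
## Symmetric proposal: from state i, propose one of the other states uniformly at random
q <- matrix(1 / (n_states - 1), n_states, n_states)
diag(q) <- 0
## Metropolis transition matrix: propose j, then accept with probability min(1, pi_j / pi_i)
P <- matrix(0, n_states, n_states)
for(i in 1:n_states) {
    for(j in 1:n_states) {
        if(i != j) P[i, j] <- q[i, j] * min(1, pi_target[j] / pi_target[i])
    }
    P[i, i] <- 1 - sum(P[i, ])
}
## Detailed balance: pi_i * P[i, j] should equal pi_j * P[j, i] for all i, j
flow <- diag(pi_target) %*% P
max(abs(flow - t(flow)))                ## should be (numerically) 0
## Detailed balance implies invariance: pi %*% P should equal pi
max(abs(pi_target %*% P - pi_target))   ## should be (numerically) 0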

Metropolis-Hastings with non-symmetric proposal distribution

Given:

- A target distribution \(\pi\), known up to a normalizing constant.
- A proposal distribution \(q(\cdot \mid x)\), which no longer needs to be symmetric.

Pick a starting value for the chain \(X_0\).

For \(i = 1, \ldots, n\):

- Draw a proposal \(X^* \sim q(\cdot \mid X_{i-1})\).
- Compute the acceptance probability \(a = \min\left(1, \frac{\pi(X^*)\, q(X_{i-1} \mid X^*)}{\pi(X_{i-1})\, q(X^* \mid X_{i-1})}\right)\).
- With probability \(a\), set \(X_i = X^*\); otherwise set \(X_i = X_{i-1}\).

Notes:
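The examples below only implement the symmetric case, so here is a minimal sketch of a single step with a non-symmetric proposal. This code is an addition, not from the lecture: it assumes the proposal is supplied both as a sampler and as a density, with proposal_density(x, given) being the density of proposing x from the state given; the names and the log-normal example are purely illustrative.

sample_with_asymmetric_proposal <-
    function(proposal_sampler, proposal_density, target_distribution, current_state) {
        proposal <- proposal_sampler(current_state)
        ## Hastings correction: weight the target ratio by the ratio of the
        ## reverse and forward proposal densities
        acceptance_probability <- min(1,
            (target_distribution(proposal) * proposal_density(current_state, proposal)) /
            (target_distribution(current_state) * proposal_density(proposal, current_state)))
        if(runif(1) <= acceptance_probability) {
            return(proposal)
        } else {
            return(current_state)
        }
    }
## Illustrative non-symmetric proposal: a log-normal centered (on the log scale) at the
## current state; this choice only makes sense for targets on the positive reals
proposal_sampler <- function(x) rlnorm(n = 1, meanlog = log(x), sdlog = .3)
proposal_density <- function(x, given) dlnorm(x, meanlog = log(given), sdlog = .3)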

Simple Example: Normal Distribution

## Take one Metropolis step: propose a new state, then accept or reject it
sample_with_symmetric_proposal <-
    function(proposal_function, target_distribution, current_state) {
        proposal <- proposal_function(current_state)
        ## With a symmetric proposal, the acceptance probability only involves
        ## the ratio of target densities at the proposed and current states
        acceptance_probability <- min(1, target_distribution(proposal) / target_distribution(current_state))
        if(runif(1) <= acceptance_probability) {
            return(proposal)
        } else {
            return(current_state)
        }
    }
## The proposal distribution is normal, centered at the current state, standard deviation .3
proposal_function <- function(x) rnorm(n = 1, mean = x, sd = .3)
## The target distribution is N(0,1)
target_distribution <- dnorm
## check the sampling:
sample_with_symmetric_proposal(proposal_function, target_distribution, -10)
## [1] -10
n_samples <- 1000
chain_output <- numeric(n_samples)
chain_output[1] <- -10
for(i in 2:n_samples) {
    chain_output[i] <- sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
plot(chain_output)

mean(chain_output)
## [1] -0.9162604
sd(chain_output)
## [1] 1.871455

Notice:

- The chain starts at \(-10\), far from where the \(N(0,1)\) target puts essentially all of its mass, so it spends its first steps just wandering toward the high-probability region.
- Those early samples drag the estimates off: the sample mean (about \(-0.92\)) and sd (about 1.87) are far from the target values of 0 and 1.

Let’s run the chain longer and discard the first 200 steps as “burn-in”

n_samples <- 10000
chain_output <- numeric(n_samples)
chain_output[1] <- -10
for(i in 2:n_samples) {
    chain_output[i] <- sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
chain_no_burnin <- chain_output[201:n_samples]
plot(chain_no_burnin)

mean(chain_no_burnin)
## [1] -0.03140313
sd(chain_no_burnin)
## [1] 1.019056

Note:

- After discarding the first 200 steps as burn-in, the sample mean (about \(-0.03\)) and sd (about 1.02) are close to the target values of 0 and 1.
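As an extra check (an addition to the notes), we can overlay the \(N(0,1)\) density on a histogram of the post-burn-in samples:

hist(chain_no_burnin, breaks = 30, freq = FALSE,
     main = "Post-burn-in samples vs. N(0,1) density", xlab = "x")
curve(dnorm(x), from = -4, to = 4, add = TRUE, col = "red")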

Example 2: Mixture distributions

## The proposal distribution is normal, centered at the current state, standard deviation .3
proposal_function <- function(x) rnorm(n = 1, mean = x, sd = .3)
## The target distribution is a mixture of N(1,1) and N(5,.2)
target_distribution <- function(x) .25 * dnorm(x, mean = 1, sd = 1) + .75 * dnorm(x, mean = 5, sd = .2)
n_samples <- 1000
chain_output <- numeric(n_samples)
chain_output[1] <- -10
for(i in 2:n_samples) {
    chain_output[i] <- sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
plot(chain_output)

mean(chain_output)
## [1] 0.3720607
sd(chain_output)
## [1] 2.266502

Notice:

- The true mean of this mixture is \(.25 \times 1 + .75 \times 5 = 4\), but the sample mean is only about 0.37.
- With a proposal sd of .3 the chain takes small steps: it wanders in from \(-10\), settles around the mode near 1, and in 1,000 steps rarely (if ever) makes it across the low-density region to the mode near 5.

Let’s try running the chain a lot longer:

n_samples <- 100000
chain_output <- numeric(n_samples)
chain_output[1] <- -10
for(i in 2:n_samples) {
    chain_output[i] <- sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
plot(chain_output)

mean(chain_output)
## [1] 4.042973
sd(chain_output)
## [1] 1.780207

What’s happening?

- To move between the two modes, the chain has to cross the low-density region between them. With a proposal sd of .3, proposals that would carry it across are almost always rejected, so the chain switches modes only rarely.
- Running the chain much longer does work: after 100,000 steps the sample mean (about 4.04) and sd (about 1.78) are close to the true values of \(.25 \times 1 + .75 \times 5 = 4\) and roughly 1.81. But it is an expensive fix.
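A rough way to quantify how rarely the chain switches modes (an added check, not in the original notes; the cutoff of 3 is just a convenient point in the valley between the two modes):

## Count how many times the 100,000-step chain crossed x = 3,
## i.e. how often it actually switched between the two modes
sum(abs(diff(chain_output > 3)) == 1)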

Another way to fix this: change the proposal distribution so that it takes bigger steps.

proposal_function <- function(x) rnorm(n = 1, mean = x, sd = 2)
n_samples <- 1000
chain_output <- numeric(n_samples)
chain_output[1] <- -10
for(i in 2:n_samples) {
    chain_output[i] <- sample_with_symmetric_proposal(proposal_function, target_distribution, chain_output[i-1])
}
plot(chain_output)

mean(chain_output)
## [1] 3.376655
sd(chain_output)
## [1] 2.2654

Why not always have a really diffuse proposal distribution?

The plot below shows how far the chain moved on each step when we used the diffuse proposal distribution:

plot(diff(chain_output))
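One way to make this concrete (an added check, not in the original notes): with a continuous proposal the chain moves by exactly zero only when a proposal is rejected, so the fraction of nonzero moves is the acceptance rate.

## Acceptance rate for the diffuse (sd = 2) proposal
mean(diff(chain_output) != 0)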

Overall:

- A diffuse proposal can jump between the modes in a single step, but most of its proposals land in low-density regions and are rejected, so the chain sits still for long stretches (the runs of zeros in the plot above).
- There is a trade-off: small steps give high acceptance rates but slow exploration, while big steps can explore quickly in principle but are rejected often. Tuning the scale of the proposal matters.

Summing up

Metropolis-Hastings is a simple, general-purpose method for creating a Markov chain with a specified invariant distribution. It is particularly useful when:

- we can evaluate the target density (at least up to a normalizing constant) but have no direct way to sample from it.

You should be scared because:

- It can be hard to tell how long to run the chain, how much burn-in to discard, and whether the chain is actually mixing well. (Remember the mixture example: the 1,000-step run looked stable but had not yet found the second mode.)
- The draws are correlated, so \(n\) steps of the chain carry less information than \(n\) independent samples.

Next time: