Monte Carlo methods: Approximate Bayesian Computation

Today: Approximate Bayesian Computation

Reading: Sisson, Fan, Beaumont, “Overview of Approximate Bayesian Computation”

Our goals

Next two weeks:

Bayesian Statistics

Suppose we have data \(y_1,\ldots, y_n\) that we believe can be described by a probability model with parameters \(\theta\).

We also have a prior distribution on the parameters \(\theta\), describing our belief about the values of those parameters before seeing any of the data.

\[ \begin{align*} y_i \mid \theta &\sim P(y_i \mid \theta), \quad i = 1,\ldots, n\\ \theta & \sim \pi(\theta) \end{align*} \]
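Concretely, one draw from this joint model is a two-step simulation: first draw \(\theta\) from the prior, then draw data given \(\theta\). A minimal R sketch (the Gamma prior and Poisson likelihood here are placeholder choices, matching the example we use later):

theta = rgamma(n = 1, shape = 1, rate = 1)  ## draw theta from the prior
y = rpois(n = 10, lambda = theta)           ## draw a dataset of 10 points given theta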

For example: \(y_i \mid \theta \sim N(\theta, \sigma^2)\) with a normal prior \(\theta \sim N(\mu_0, \tau_0^2)\).

The resulting posterior distribution on \(\theta\), and in particular an interval containing most of its mass, is the Bayesian analog of a confidence interval for a normal mean.

Some more complicated examples

Finance:

Epidemiology:

Posterior distribution

By applying Bayes’ rule, we can compute the posterior distribution of the parameters given the data: \[ \begin{align*} P(\theta \mid y_1,\ldots, y_n) &= \frac{P(y_1,\ldots, y_n \mid \theta)\pi(\theta)}{P(y_1,\ldots, y_n)} \end{align*} \]
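The denominator is the marginal likelihood of the data, \[ P(y_1,\ldots, y_n) = \int P(y_1,\ldots, y_n \mid \theta)\, \pi(\theta)\, d\theta, \] and outside of special conjugate cases this integral has no closed form. ABC is aimed at the even harder setting where the likelihood \(P(y_1,\ldots, y_n \mid \theta)\) itself cannot be evaluated, only simulated from.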

One way of drawing samples from the posterior

Inputs: the observed data \(y_1, \ldots, y_n\); a prior distribution \(\pi(\theta)\) we can sample from; and a function that simulates data from the model given a value of \(\theta\).

Sampling: for \(i = 1,\ldots, N\): repeat drawing \(\theta \sim \pi(\theta)\) and simulating a dataset \(y \sim P(y \mid \theta)\) until \(y\) exactly matches the observed data, then set \(\theta_i = \theta\).

Why does this work? A proposed \(\theta\) is accepted with probability \(P(y_1, \ldots, y_n \mid \theta)\), so the density of the accepted draws is proportional to \(\pi(\theta) P(y_1, \ldots, y_n \mid \theta)\), which by Bayes' rule is proportional to the posterior \(P(\theta \mid y_1, \ldots, y_n)\).

ABC: Simple Example

Bayesian analysis for a Poisson random variable.

Model is: \[ \begin{align*} Y_i &\sim \text{Poisson}(\theta), \quad i = 1,\ldots, n \\ \theta &\sim \text{Gamma}(\alpha, \beta) \end{align*} \]

By Bayes' rule, we can find in closed form that the posterior, \(P(\theta \mid Y_1, \ldots, Y_n)\), has a \(\text{Gamma}(\sum_{i=1}^n Y_i + \alpha, n + \beta)\) distribution.
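To see where this comes from, multiply the likelihood by the prior and drop the factors that do not involve \(\theta\):

\[ \begin{align*} P(\theta \mid Y_1, \ldots, Y_n) &\propto \left(\prod_{i=1}^n e^{-\theta} \theta^{Y_i}\right) \theta^{\alpha - 1} e^{-\beta\theta} = \theta^{\sum_{i=1}^n Y_i + \alpha - 1} e^{-(n + \beta)\theta}, \end{align*} \]

which is the kernel of a \(\text{Gamma}(\sum_{i=1}^n Y_i + \alpha, n + \beta)\) density.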

Let’s pretend we can’t do that though, and try using ABC.

Set up the function:

generate_abc_sample = function(observed_data,
    prior_distribution,
    data_generating_function) {
    ## Keep proposing until a simulated dataset matches the observed data exactly
    while(TRUE) {
        theta = prior_distribution()          ## draw a parameter from the prior
        y = data_generating_function(theta)   ## simulate data given that parameter
        if(all(y == observed_data)) {         ## accept only on an exact match
            return(theta)
        }
    }
}

Analysis for a single observed value \(y = 3\), with prior parameters \(\alpha = 1\), \(\beta = 1\):

prior_distribution = function() rgamma(n = 1, shape = 1, rate = 1)
data_generating_function = function(theta) rpois(n = 1, lambda = theta)
observed_data = 3
generate_abc_sample(observed_data, prior_distribution, data_generating_function)
## [1] 3.463678
posterior_samples = replicate(n = 1000, generate_abc_sample(observed_data, prior_distribution, data_generating_function))
## our posterior should be gamma(y + alpha, 1 + beta) or gamma(4, 2)
## The mean of a gamma distribution is alpha / beta, so should be 2
mean(posterior_samples)
## [1] 2.037032
## The variance of a gamma distribution is alpha / beta^2, so should be 1
var(posterior_samples)
## [1] 1.10589
library(ggplot2)
qplot(posterior_samples)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

theta_vec = seq(0, 6, length.out = 1000)
plot(dgamma(theta_vec, shape = 4, rate = 2) ~ theta_vec, type = 'l', ylab="true posterior density", xlab = "theta")
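To compare the two directly, one option (a base-graphics sketch using the quantities already defined above) is to overlay the true Gamma(4, 2) density on a density-scaled histogram of the ABC samples:

hist(posterior_samples, freq = FALSE, breaks = 30,
     xlab = "theta", main = "ABC samples vs. true posterior")
lines(theta_vec, dgamma(theta_vec, shape = 4, rate = 2))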

What if you have more than one sample?

We still have the same prior and model; the data-generating function now produces n_samples observations, all of which must match the observed data exactly:

n_samples = 2
data_generating_function = function(theta) rpois(n = n_samples, lambda = theta)
observed_data = rep(3, n_samples)
generate_abc_sample(observed_data, prior_distribution, data_generating_function)
## [1] 1.445692
system.time(replicate(n = 1000, generate_abc_sample(observed_data, prior_distribution, data_generating_function)))
##    user  system elapsed 
##   0.481   0.045   0.529

(Try changing n_samples to something bigger on your own computer…)
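As a sketch of what you would see, the loop below (an illustration, with a small replicate count since the run time grows quickly) times exact-match sampling as n_samples increases:

for (k in 1:4) {
    n_samples = k
    data_generating_function = function(theta) rpois(n = n_samples, lambda = theta)
    observed_data = rep(3, n_samples)
    ## time 20 accepted draws at this sample size
    print(system.time(replicate(n = 20,
        generate_abc_sample(observed_data, prior_distribution, data_generating_function))))
}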

Problems

As the timing experiment suggests, requiring an exact match quickly becomes impractical: the probability of matching the observed data exactly shrinks rapidly as the sample size grows, and for continuous data an exact match has probability zero.

Therefore: instead of requiring the simulated data to equal the observed data, we compare them through a summary statistic \(s(\cdot)\) and accept whenever \(|s(y) - s(y_{\text{obs}})| < \epsilon\) for some tolerance \(\epsilon > 0\).

ABC: The algorithm

Inputs: the observed data; a summary statistic \(s(\cdot)\); a prior distribution \(\pi(\theta)\) we can sample from; a function that simulates data from the model given \(\theta\); and a tolerance \(\epsilon > 0\).

Sampling: for \(i = 1,\ldots, N\): repeat drawing \(\theta \sim \pi(\theta)\) and simulating \(y \sim P(y \mid \theta)\) until \(|s(y) - s(y_{\text{obs}})| < \epsilon\), then set \(\theta_i = \theta\).

This method generates approximate samples from the posterior distribution. The approximation improves as \(\epsilon \to 0\), and if \(s\) is a sufficient statistic the samples are draws from the exact posterior in that limit.

Set up a function for the approximate version of ABC:

generate_abc_sample_2 = function(observed_data,
    summary_statistic,
    prior_distribution,
    data_generating_function,
    epsilon) {
    ## Accept when the simulated summary statistic lands within epsilon of the observed one
    while(TRUE) {
        theta = prior_distribution()          ## draw a parameter from the prior
        y = data_generating_function(theta)   ## simulate data given that parameter
        if(abs(summary_statistic(y) - summary_statistic(observed_data)) < epsilon) {
            return(theta)
        }
    }
}

Let’s see what happens with the approximate version:

We still have the same prior and model; now we use n_samples = 10 observations, the sample mean as the summary statistic, and a tolerance of 0.1:

n_samples = 10
prior_distribution = function() rgamma(n = 1, shape = 1, rate = 1)
data_generating_function = function(theta) rpois(n = n_samples, lambda = theta)
observed_data = rep(3, n_samples)
summary_statistic = mean
epsilon = .1
generate_abc_sample_2(observed_data, summary_statistic, prior_distribution, data_generating_function, epsilon)
## [1] 1.7603
posterior_samples = replicate(n = 1000,
    generate_abc_sample_2(observed_data,
                          summary_statistic,
                          prior_distribution,
                          data_generating_function,
                          epsilon))

Checking on the posterior means and variances:

(true_posterior_mean = (1 + sum(observed_data)) / (n_samples + 1))
## [1] 2.818182
mean(posterior_samples)
## [1] 2.81916
(true_posterior_variance = (1 + sum(observed_data)) / (n_samples + 1)^2)
## [1] 0.2561983
var(posterior_samples)
## [1] 0.2605666

Checking on the posterior distributions:

qplot(posterior_samples) + xlim(c(0, max(theta_vec)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot(dgamma(theta_vec, shape = 1 + sum(observed_data), rate = n_samples + 1) ~ theta_vec, type = 'l', ylab = "True posterior density", xlab = "theta")
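The tolerance \(\epsilon\) controls the tradeoff between accuracy and run time: smaller values give samples closer to the true posterior but take longer per accepted draw. A quick sketch of this (the \(\epsilon\) values here are chosen just for illustration):

for (eps in c(0.05, 0.2, 0.5)) {
    samples = replicate(n = 1000,
        generate_abc_sample_2(observed_data, summary_statistic,
                              prior_distribution, data_generating_function, eps))
    ## larger eps should inflate the spread relative to the true posterior
    cat("epsilon =", eps, "mean =", round(mean(samples), 3),
        "variance =", round(var(samples), 3), "\n")
}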

ABC: Some notes