Monte Carlo methods: Approximate Bayesian Computation

Today: Approximate Bayesian Computation

Reading:

Our goals

Next two weeks:

Bayesian Statistics

Suppose we have data \(y_1,\ldots, y_n\) that we believe can be described by a probability model with parameters \(\theta\).

We also have a prior distribution on the parameters \(\theta\), describing our belief about the values of those parameters before seeing any of the data.

\[ \begin{align*} y_i \mid \theta &\sim P(y_i \mid \theta), \quad i = 1,\ldots, n\\ \theta & \sim \pi \end{align*} \]

For example:

A credible interval computed from this posterior distribution is the Bayesian analog of a confidence interval for a normal mean.

Some more complicated examples

Finance:

Epidemiology

Posterior distribution

By applying Bayes’ rule, we can compute the posterior distribution of the parameters given the data: \[ \begin{align*} P(\theta \mid y_1,\ldots, y_n) &= \frac{P(y_1,\ldots, y_n \mid \theta)P(\theta)}{P(y_1,\ldots, y_n)} \end{align*} \]
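The denominator is the marginal likelihood, obtained by integrating the likelihood over the prior: \[ P(y_1,\ldots, y_n) = \int P(y_1,\ldots, y_n \mid \theta)\,\pi(\theta)\, d\theta. \] Outside of conjugate models this integral is usually intractable, which is one motivation for sampling-based methods.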

One way of drawing samples from the posterior

Inputs:

Sampling: for \(i = 1,\ldots, N\):

Why does this work?
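A sketch of the argument (assuming discrete data, so that an exact match has positive probability): a draw \(\theta\) from the prior is accepted with probability \(P(y_1,\ldots,y_n \mid \theta)\), so the density of accepted values satisfies \[ P(\theta \mid \text{accepted}) \propto \pi(\theta)\, P(y_1,\ldots, y_n \mid \theta) \propto P(\theta \mid y_1,\ldots, y_n), \] which is exactly the posterior.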

(Non-)Approximate Bayesian Computation: Simple Example

Bayesian analysis for a Gamma-Poisson model:

Model is: \[ \begin{align*} Y_i &\sim \text{Poisson}(\theta), \quad i = 1,\ldots, n \\ \theta &\sim \text{Gamma}(\alpha, \beta) \end{align*} \]

By Bayes’ rule, we can find in closed form that the posterior, \(P(\theta \mid Y_1, \ldots, Y_n)\), has a \(\text{Gamma}(\sum_{i=1}^n Y_i + \alpha, n + \beta)\) distribution.
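The steps behind that closed form: the posterior is proportional to the prior times the likelihood, and dropping factors that do not involve \(\theta\) leaves a Gamma kernel: \[ \begin{align*} P(\theta \mid Y_1,\ldots, Y_n) &\propto \pi(\theta) \prod_{i=1}^n P(Y_i \mid \theta) \\ &\propto \theta^{\alpha - 1} e^{-\beta\theta} \prod_{i=1}^n \theta^{Y_i} e^{-\theta} \\ &= \theta^{\sum_{i=1}^n Y_i + \alpha - 1} e^{-(n + \beta)\theta}, \end{align*} \] which is the kernel of a \(\text{Gamma}(\sum_{i=1}^n Y_i + \alpha, n + \beta)\) density.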

Let’s pretend we can’t do that though, and try using the sampling method we described on the previous slide.

Set up the function:

## We call this "ebc" because it is exactly sampling from the
## posterior ("exact Bayesian computation"). ABC will be a small
## modification of this method.
generate_ebc_sample <- function(observed_data,
                                prior_distribution,
                                data_generating_function) {
    while(TRUE) {
        theta <- prior_distribution()
        y <- data_generating_function(theta)
        if(all(y == observed_data)) {
            return(theta)
        }
    }
}

Sampling from:

prior_distribution <- function() rgamma(n = 1, shape = 1, rate = 1)
data_generating_function <- function(theta) rpois(n = 1, lambda = theta)
observed_data <- 3
generate_ebc_sample(observed_data, prior_distribution, data_generating_function)
## [1] 3.463678
posterior_samples <- replicate(n = 1000,
    generate_ebc_sample(observed_data, prior_distribution, data_generating_function))
## our posterior should be gamma(y + alpha, 1 + beta) or gamma(4, 2)
## The mean of a gamma distribution is alpha / beta, so should be 2
mean(posterior_samples)
## [1] 2.037032
## The variance of a gamma distribution is alpha / beta^2, so should be 1
var(posterior_samples)
## [1] 1.10589
library(ggplot2)
qplot(posterior_samples)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

theta_vec <- seq(0, 6, length.out = 1000)
plot(dgamma(theta_vec, shape = 4, rate = 2) ~ theta_vec, type = 'l', ylab="true posterior density", xlab = "theta")

What if you have more than one sample?

We still have

n_samples <- 4
data_generating_function <- function(theta) rpois(n = n_samples, lambda = theta)
observed_data <- rep(3, n_samples)
generate_ebc_sample(observed_data, prior_distribution, data_generating_function)
## [1] 2.662207
system.time(replicate(n = 1000, generate_ebc_sample(observed_data, prior_distribution, data_generating_function)))
##    user  system elapsed 
##   5.544   0.011   5.556

(Try changing n_samples to something bigger on your own computer…)
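To see why exact matching breaks down, we can estimate the acceptance probability directly (a rough sketch; `n_trials` and the sample sizes below are arbitrary choices): draw \(\theta\) from the prior, simulate data, and count how often the simulated data exactly matches `rep(3, n_samples)`.

```r
## Estimate the probability that a single prior draw produces data
## exactly matching rep(3, n_samples), by simple Monte Carlo.
acceptance_rate <- function(n_samples, n_trials = 10000) {
    matches <- replicate(n_trials, {
        theta <- rgamma(n = 1, shape = 1, rate = 1)
        y <- rpois(n = n_samples, lambda = theta)
        all(y == 3)
    })
    mean(matches)
}

## Acceptance probability for a few sample sizes:
sapply(c(1, 2, 4, 8), acceptance_rate)
```

The acceptance probability shrinks roughly geometrically in `n_samples`, so the expected number of proposals per accepted sample (and hence the runtime) grows exponentially.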

Problems

Therefore:

ABC: The algorithm

Inputs:

Sampling: for \(i = 1,\ldots, N\):

This method generates approximate samples from the posterior distribution
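Why “approximate”: the accepted draws are distributed as \[ P\big(\theta \;\big|\; |s(y) - s(y_{\text{obs}})| < \epsilon\big), \] which tends to \(P(\theta \mid s(y) = s(y_{\text{obs}}))\) as \(\epsilon \to 0\). This limit equals the exact posterior \(P(\theta \mid y_{\text{obs}})\) only when \(s\) is a sufficient statistic; otherwise both the tolerance \(\epsilon\) and the choice of summary statistic introduce approximation error.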

Set up a function for the approximate version, ABC:

generate_abc_sample <- function(observed_data,
                                summary_statistic,
                                prior_distribution,
                                data_generating_function,
                                epsilon) {
    while(TRUE) {
        theta <- prior_distribution()
        y <- data_generating_function(theta)
        if(abs(summary_statistic(y) - summary_statistic(observed_data)) < epsilon) {
            return(theta)
        }
    }
}

Let’s see what happens with the approximate version:

We still have

n_samples <- 10
prior_distribution <- function() rgamma(n = 1, shape = 1, rate = 1)
data_generating_function <- function(theta) rpois(n = n_samples, lambda = theta)
observed_data <- rep(3, n_samples)
summary_statistic <- mean
epsilon <- .1
generate_abc_sample(observed_data, summary_statistic, prior_distribution, data_generating_function, epsilon)
## [1] 2.226695
posterior_samples <- replicate(n = 1000,
    generate_abc_sample(observed_data,
                          summary_statistic,
                          prior_distribution,
                          data_generating_function,
                          epsilon))
system.time(
replicate(n = 1000,
    generate_abc_sample(observed_data,
                          summary_statistic,
                          prior_distribution,
                          data_generating_function,
                          epsilon))
)
##    user  system elapsed 
##   0.931   0.002   0.933

Checking on the posterior means and variances:

(true_posterior_mean <- (1 + sum(observed_data)) / (n_samples + 1))
## [1] 2.818182
mean(posterior_samples)
## [1] 2.84378
(true_posterior_variance <- (1 + sum(observed_data)) / (n_samples + 1)^2)
## [1] 0.2561983
var(posterior_samples)
## [1] 0.2641091

Checking on the posterior distributions:

qplot(posterior_samples) + xlim(c(0, max(theta_vec)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot(dgamma(theta_vec, shape = 1 + sum(observed_data), rate = n_samples + 1) ~ theta_vec, type = 'l', ylab = "True posterior density", xlab = "theta")

ABC: Some notes