Lab 6/Homework 9

Part 1

A random walk is a sequence of random variables \(x_1,...,x_T\) where each step is generated by adding a random increment to the previous value: \[ x_t = x_{t-1} + y_t, t = 1,..., T \] with \(x_0\) the starting point and \(y_1,...,y_T\) independent, identically distributed increments. The simplest random walk uses \[ y_i = \begin{cases} 1 & \text{w.p. } 1/2 \\ -1 & \text{w.p. } 1/2 \end{cases} \]

We want to be able to sample random walks, conditional on the walk staying in a certain region, say, in the interval [-1, 5].

How might we do this? For a random walk where each increment is given by random variable that takes values -1 and 1 with equal probability, the probability mass function for \(x_1,...,x_T\) is given by \[ P(x_1,...,x_T) = P(x_1) P(x_2 | x_1) ... P(x_T | x_{T-1}) = P(x_1) \prod_{i=2}^T P(x_i | x_{i-1}) \] where \(P(x_i | x_{i-1}) = 1/2\) if \(|x_i - x_{i-1}| = 1\) and 0 otherwise.

It is easy to simulate this sort of a random walk, we just simulate T random variables \(y_1,...,y_T\), each of which takes value -1 w.p. 1/2 and takes value 1 w.p. 1/2, and let \(x_t = \sum_{i=1}^t y_i\).

It is trickier to sample from the conditional distribution, given that the walk stays in the interval.

One thing we could try is standard accept-reject. Let \(Z\) be the probability that the random walk stays in the interval. The conditional distribution we want is then \[ P(x_1,...,x_T | x_1 \in [-1, 5], ..., x_T \in [-1, 5]) = \begin{cases} P(x_1,...,x_T) / Z & x_1 \in [-1,5], ..., x_T \in [-1,5] \\ 0 & \text{o.w.} \end{cases} \]

If we take the unconditional walk, with pmf \(P(x_1,...,x_T)\) to be the proposal distribution \(g\), the conditional distribution \(P(x_1,...,x_T | x_1 \in [-1,5],...,x_T \in [-1,5])\) to be the target distribution \(f\), and \(1/Z\) to be the constant multiplier, the accept-reject ratio is \[ P(x_1,...,x_T | x_1 \in [-1,5],...,x_T \in [-1,5]) / [P(x_1,...,x_T) / Z] \] which is equal to 1 if \(x_1 \in [-1, 5], ...,x_T \in [-1,5]\) and 0 otherwise.

In other words, we accept if the proposed path stays in the desired interval the whole time and reject otherwise.

As we will see, for long walks, the acceptance probability becomes very small. Fill out the following functions, and then examine how many proposals are accepted for different values of \(T\).

#' Samples the increments for the random walk. You can change if you want a different kind of walk.
simulate_increment <- function() {
    return(sample(c(-1, 1), size = 1))
}
#' Simulates a walk of length T using simulate_increment for the increments
simulate_walk <- function(T) {
    
}
#' Takes a walk (a vector of length T), and checks whether all of the elements are in the interval [low, hi]. Returns TRUE if the walk stays in the interval
check_walk_in_interval <- function(walk, low = -1, hi = 5) {
    
}
#' Implements accept-reject for sampling from the conditional distribution of the random walks: we sample random walks until we get one that satisfies the conditions
accept_reject_for_conditional_walk <- function(T, low, hi) {
    while(TRUE) {
        proposed_walk <- simulate_walk(T)
        if(check_walk_in_interval(proposed_walk)) {
            return(proposed_walk)
        }
    }
}

Part 2

We can build a more efficient way of sampling from this conditional distribution by doing something a little different. The ideas here are related to sequential Monte Carlo.

The overall idea here is: We want to simulate from \(P(x_1,...,x_T | x_1 \in [-1, 5], ...,x_T \in [-1,5])\). Accept/reject won’t work well for large \(T\), however, we can start with short paths and build up to long paths.

First, we notice that we can simulate from short paths very efficiently. For instance, our naive accept/reject algorithm will do a good job at getting samples from \(P(x_1 | x_1 \in [-1,5])\) (a length-1 random walk).

Then, suppose we already have samples from the conditional length-\(t\) random walk, that is, paths \(x_1, \dots, x_t\) that stay in the interval \([-1,5]\). We can generate samples for a length-\(t+1\) random walk by adding one step at a time as follows:

\[ P(x_1, \dots, x_{t+1} \mid x_1, \dots, x_{t+1} \in [-1,5]) = P(x_{t+1} \mid x_1, \dots, x_t, x_1, \dots, x_t \in [-1,5], x_{t+1} \in [-1,5]) \cdot P(x_1, \dots, x_t \mid x_1, \dots, x_t \in [-1,5]). \]

The second term is the conditional distribution of the first \(t\) steps, which we already have. The first term is the conditional distribution of the next step given the previous steps and the constraint that \(x_{t+1}\) stays in the interval.

Suppose our proposal distribution for a conditional length-\(t+1\) random walk is given by taking a draw from the conditional length-\(t\) random walk, and proposing \(x_{t+1} = x_t + y_{t+1}\). The pmf for the proposal is \(P(x_1, ..., x_T | x_1, ..., x_T \in [-1, 5])P(y_{t+1})\).

As before, we know that the ratio between the \(g\) (the pmf of the proposal distribution) and \(f\) (the pmf of the target distribution) is given by \[ \text{acceptance probability} =\frac{ P(x_{t+1} \mid x_1, \dots, x_t, x_1, \dots, x_t \in [-1,5], x_{t+1} \in [-1,5]) \cdot P(x_1, \dots, x_t \mid x_1, \dots, x_t \in [-1,5])}{cP(x_1, ..., x_T | x_1, ..., x_T \in [-1, 5])P(y_{t+1})} \] where \(c = 1/Z = 1/ P(x_{t+1} \in [-1,5] | x_1,...,x_t \in [-1,5])\). This turns out to simplify to \[ \text{acceptance probability} = \begin{cases} 1 & x_t + y_{t+1} \in [-1,5] \\ 0 & x_t + y_{t+1} \not \in [-1,5] \end{cases} \]

In other words:

Start from a valid length-\(t\) path.
Propose the next step \(x_{t+1} = x_t + y_{t+1}\) with \(y_{t+1} = \pm 1\) equally likely.
Accept the step if it stays in \([-1,5]\); otherwise, reject it and try again.

By repeating this process for each step, we build the random walk one step at a time, checking the constraint as we go. This is the sequential accept–reject method: instead of generating an entire length-\(T\) walk and hoping it satisfies the constraints, we enforce the constraint incrementally, which is much more efficient for long walks.

#' Proposes the next step given the current position.
#' Returns a random next position that satisfies the interval constraint.
propose_next_step <- function(current, low = -1, hi = 5) {
    # TODO: simulate y = +1 or -1
    # TODO: check if current + y is within [low, hi]
    # If yes, return new position; if not, try again (loop until valid)
}

#' Builds a single conditional random walk of length T using sequential accept-reject
simulate_conditional_walk <- function(T, low = -1, hi = 5) {
    walk <- numeric(T)
    # TODO: initialize first step with accept-reject
    # for t = 2 to T:
    #   use propose_next_step to generate the next step
    #   append to walk
    return(walk)
}

#' Generates n_conditional walks of length T
simulate_conditional_walks <- function(n, T, low = -1, hi = 5) {
    walks <- matrix(NA, nrow = T, ncol = n)
    for (i in 1:n) {
        walks[, i] <- simulate_conditional_walk(T, low, hi)
    }
    return(walks)
}

Check on your walks by plotting them, and check how long it takes accept_reject_for_conditional_walk and simulate_conditional_walk to run for the same parameters using bench::mark or microbenchmark.

walks <- simulate_conditional_walks(n = 20, T = 50, low = -1, hi = 5)
par(mfrow = c(4,5))
for(i in 1:20) {
    plot(walks[,i], type = 'l', ylim = c(-1, 5))
}

What to turn in

Rmd/html documents showing your functions and samples from your simulators, and your checks.