Reading: Matloff, Chapter 14
Logistics:
One more lecture on programming, then on to algorithms.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
(Sir Tony Hoare, popularized by Donald Knuth)
You have:
Code that is correct
Code that is slow for the application you want
You want:
Code that is correct
Code that runs in a reasonable amount of time
Two steps:
Find what the bottleneck is
Rewrite the code for the bottleneck to make it run faster
What to do?
Make a smaller example that runs quickly
Write tests (formal or informal) for the output of the function
Make a new function to implement the same algorithm, rewritten to be (hopefully) faster
Check on the test cases that it gives the same output
Check whether the new version is faster
The base R function for this task is Rprof.
Many other packages, e.g. profvis (which I believe RStudio now uses), are built on top of Rprof, usually with more nicely formatted output.
How does Rprof work?
Once every set interval (0.02 seconds by default), it stops the interpreter and records which function is currently being called.
The output is a list of function calls, formatted to indicate which take the most time.
The results are non-deterministic: they depend a bit on what else your computer is doing.
If we don't sample frequently enough, we don't get very useful output.
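A minimal sketch of a profiling session (slow_function is a hypothetical stand-in for the code you want to profile):
Rprof(interval = 0.02)  # start sampling; 0.02 seconds is the default interval
slow_function()         # run the code being profiled
Rprof(NULL)             # stop profiling
summaryRprof()          # summarise the time spent in each function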
There are several packages for testing alternative implementations:
The microbenchmark function in the microbenchmark package runs the same code a large number of times.
The mark function in the bench package does the same thing, but with a little more information and some extra plotting facilities.
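A quick sketch of the two interfaces (x here is just placeholder data for a toy comparison):
x <- runif(100)
library(microbenchmark)
microbenchmark(sqrt(x), x^0.5, times = 100)
library(bench)
bench::mark(sqrt(x), x^0.5)  # also checks that both expressions return the same result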
Remove extra work
Vectorise
Avoid copies
Rewrite your R code in C/C++/Fortran
If you know something about the input to a function, you can speed up your code by skipping some of the steps R takes to check what kind of data it is working with.
Here we bypass the method dispatch for mean:
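The timings below were produced by a call along these lines (assuming x is a small numeric vector; exact numbers will vary by machine):
library(microbenchmark)
x <- runif(100)
microbenchmark(mean(x), mean.default(x), times = 1000)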
## Unit: microseconds
## expr min lq mean median uq max neval
## mean(x) 2.048 2.152 2.310585 2.2135 2.2855 19.078 1000
## mean.default(x) 1.091 1.154 1.203793 1.1820 1.2290 2.989 1000
The as.data.frame function performs many checks and is not optimized for performance.
Suppose we have a list and want to convert it to a data frame.
If we’re sure it’s formatted correctly, we can use the quickdf function defined below:
quickdf <- function(l) {
  ## convert a well-formed list directly to a data frame,
  ## skipping all of as.data.frame's checks
  class(l) <- "data.frame"
  attr(l, "row.names") <- 1:length(l[[1]])
  l
}
l <- lapply(1:26, function(i) runif(1e3))
names(l) <- letters
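The timings below come from a comparison along these lines:
microbenchmark(
  quick_df = quickdf(l),
  as.data.frame = as.data.frame(l)
)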
## Unit: microseconds
## expr min lq mean median uq max neval
## quick_df 10.597 11.918 164.2065 13.7245 17.432 14912.223 100
## as.data.frame 997.955 1014.611 1102.2581 1028.5585 1069.960 4473.228 100
But we need to be sure the input is formatted correctly, because it does no checks:
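For example, a list whose columns have unequal lengths slips through silently (this call is a reconstruction consistent with the output below):
quickdf(list(x = 1, y = 1:2))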
## Warning in format.data.frame(if (omit) x[seq_len(n0), , drop = FALSE] else x, :
## corrupt data frame: columns will be truncated or padded with NAs
## x y
## 1 1 1
random_string <- function() {
  paste(sample(letters, 50, replace = TRUE), collapse = "")
}
strings10 <- replicate(10, random_string())
strings100 <- replicate(100, random_string())
collapse <- function(xs) {
  out <- ""
  for (x in xs) {
    out <- paste0(out, x)  # grows the result one element at a time
  }
  out
}
microbenchmark(
  loop10 = collapse(strings10),
  loop100 = collapse(strings100),
  vec10 = paste(strings10, collapse = ""),
  vec100 = paste(strings100, collapse = "")
)
## Unit: microseconds
## expr min lq mean median uq max neval
## loop10 21.934 22.348 24.02720 22.6275 23.3945 58.403 100
## loop100 582.833 584.588 636.88791 586.5090 606.5575 3034.728 100
## vec10 4.275 4.508 5.17937 4.7045 5.0455 19.173 100
## vec100 28.946 29.338 31.04682 29.8150 30.4745 74.487 100
Vectorising will sometimes give speed improvements and sometimes will not.
The speed improvements happen primarily when vectorising leads to calling compiled C code.
x <- matrix(rnorm(15), nrow = 3, ncol = 5)
row_means_loop <- function(x) {
  row_means <- numeric(nrow(x))
  for (i in 1:nrow(x)) {
    row_sum <- 0
    for (j in 1:ncol(x)) {
      row_sum <- row_sum + x[i, j]
    }
    row_means[i] <- row_sum / ncol(x)
  }
  return(row_means)
}
microbenchmark(
  no_vectorization = row_means_loop(x),
  apply = apply(x, 1, mean),
  c_version = rowMeans(x),
  times = 1000
)
## Unit: microseconds
## expr min lq mean median uq max neval
## no_vectorization 5.199 5.607 16.300027 6.0025 6.4065 7169.270 1000
## apply 22.297 23.259 28.150100 23.8715 25.7180 115.169 1000
## c_version 2.889 3.333 4.273167 3.7890 4.1200 26.165 1000
Another example: a problem in probability.
Urn 1 contains 10 blue marbles and 8 yellow marbles
Urn 2 contains 6 blue marbles and 6 yellow marbles
We draw one marble uniformly at random from urn 1, place it in urn 2, and then draw a marble uniformly at random from urn 2.
What is the probability that the second marble is blue?
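For reference, the exact answer follows from the law of total probability, conditioning on the colour of the first marble:
P(second blue) = P(first blue) * 7/13 + P(first yellow) * 6/13
               = (10/18) * (7/13) + (8/18) * (6/13)
               = 118/234 ≈ 0.504
The simulations below estimate this value.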
First attempt:
# perform nreps repetitions of the marble experiment, to estimate
# P(pick blue from Urn 2)
sim1 <- function(nreps) {
  nb1 <- 10   # 10 blue marbles in Urn 1
  n1 <- 18    # number of marbles in Urn 1 at 1st pick
  n2 <- 13    # number of marbles in Urn 2 at 2nd pick
  count <- 0  # number of repetitions in which we get blue from Urn 2
  for (i in 1:nreps) {
    nb2 <- 6  # 6 blue marbles originally in Urn 2
    ## pick from Urn 1 and put in Urn 2; is it blue?
    if (runif(1) < nb1 / n1) nb2 <- nb2 + 1
    ## pick from Urn 2; is it blue?
    if (runif(1) < nb2 / n2) count <- count + 1
  }
  return(count / nreps)  # est. P(pick blue from Urn 2)
}
Let’s try pre-generating all of our random numbers and removing the for loop:
sim2 <- function(nreps) {
  nb1 <- 10
  nb2 <- 6
  n1 <- 18
  n2 <- 13
  ## pre-generate all our random numbers, one row per repetition
  u <- matrix(runif(2 * nreps), nrow = nreps, ncol = 2)
  ## define simfun for use in apply(); simulates one repetition
  simfun <- function(rw) {
    ## rw ("row") is a pair of random numbers
    ## choose from Urn 1
    if (rw[1] < nb1 / n1) nb2 <- nb2 + 1
    ## choose from Urn 2, and return boolean on choosing blue
    return(rw[2] < nb2 / n2)
  }
  z <- apply(u, 1, simfun)
  return(mean(z))
}
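The timings below compare the two versions on a small run:
microbenchmark(sim1(100), sim2(100))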
## Unit: microseconds
## expr min lq mean median uq max neval
## sim1(100) 260.834 425.9855 579.7945 437.471 454.409 10682.013 100
## sim2(100) 154.577 159.9060 245.1615 164.410 177.711 7149.775 100
The timings here are different from those in Matloff. I believe this is because more recent versions of R byte-compile the functions that you write, which was not true for the version of R used in the book.
Take 3:
sim3 <- function(nreps) {
  nb1 <- 10
  nb2 <- 6
  n1 <- 18
  n2 <- 13
  u <- matrix(runif(2 * nreps), nrow = nreps, ncol = 2)
  ## set up the condition vector: blue on the second draw,
  ## whether or not the first draw was blue
  cndtn <-
    u[, 1] <= nb1 / n1 & u[, 2] <= (nb2 + 1) / n2 |
    u[, 1] > nb1 / n1 & u[, 2] <= nb2 / n2
  return(mean(cndtn))
}
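And comparing all three versions:
microbenchmark(sim1(100), sim2(100), sim3(100))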
## Unit: microseconds
## expr min lq mean median uq max neval
## sim1(100) 257.505 281.790 379.3203 426.5785 453.1620 662.683 100
## sim2(100) 152.142 158.667 176.3794 167.3800 187.6515 301.012 100
## sim3(100) 19.917 24.228 109.0899 26.2305 29.4800 8148.984 100
If you know something about your input that R has to check for, you can save time by bypassing the checks
Vectorised functions are sometimes faster, sometimes not
Matrix operations generally call C and are therefore fast
Compiled C code is faster than R code