Reading: Matloff, Chapter 14
Logistics:
Midterm next week,
One more lecture on programming,
Then on to algorithms
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Sir Tony Hoare, popularized by Donald Knuth
You have:
Code that is correct
Code that is slow for the application you want
You want:
Code that is correct
Code that runs in a reasonable amount of time
Two steps:
Find what the bottleneck is
Rewrite the code for the bottleneck to make it run faster
What to do?
Make a smaller example that runs quickly
Write tests (formal or informal) for the output of the function
Make a new function to implement the same algorithm, rewritten to be (hopefully) faster
Check on the test cases that it gives the same output
Check whether the new version is faster
Base R function for this task is Rprof.
Many other packages, e.g. profvis (I believe what RStudio uses now), are based on Rprof, usually with more nicely formatted output.
How does Rprof work?
Once every set amount of time (default .02 seconds), it stops the interpreter and checks what function is being called.
Output is a list of function calls.
Output is formatted to indicate what takes the most time.
The result is non-deterministic; it depends a bit on what else your computer is doing.
If we don’t sample frequently enough, we don’t get very useful output.
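A minimal sketch of calling Rprof directly (the output file name and the workload below are arbitrary placeholders); summaryRprof tabulates where the sampled time went:
Rprof("profile.out", interval = 0.02)  # start sampling every 0.02 seconds
for (i in 1:200) {
  x = matrix(rnorm(1e4), nrow = 100)
  xtx_inv = solve(crossprod(x))        # some work to profile
}
Rprof(NULL)                            # stop profiling
summaryRprof("profile.out")            # which functions took the most time?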
Several packages for testing alternative implementations:
The microbenchmark function in the microbenchmark package runs the same code a large number of times.
The mark function in the bench package does the same thing but with a little more information and some extra plotting facilities.
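A small sketch of what calls to each look like, assuming both packages are installed; the expressions being compared here are just placeholders:
library(microbenchmark)
library(bench)
x = runif(1e4)
microbenchmark(sqrt(x), x^0.5, times = 100)  # repeated timings, summarised
bench::mark(sqrt(x), x^0.5)                  # also checks that the results match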
Remove extra work
Vectorise
Avoid copies
Rewrite your R code in C/C++/Fortran
If you know something about the input to a function, you can speed up your code by skipping some of the steps R takes in checking what kind of data is used.
Here we’re bypassing the method dispatch for mean.
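A sketch of a call that could produce output like the one below (the vector x is a placeholder):
library(microbenchmark)
x = runif(100)
microbenchmark(mean(x), mean.default(x))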
## Unit: nanoseconds
## expr min lq mean median uq max neval cld
## mean(x) 2123 2222 2719.09 2273.0 2361.0 15633 100 b
## mean.default(x) 977 1023 1259.76 1059.5 1101.5 15034 100 a
The as.data.frame function performs many checks and is not optimized for performance.
Suppose that we have a list and want to change it to a data frame.
If we’re sure it’s formatted correctly, we can use the quickdf function.
quickdf = function(l) {
  class(l) = "data.frame"
  attr(l, "row.names") = 1:length(l[[1]])
  l
}
l = lapply(1:26, function(i) runif(1e3))
names(l) = letters
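A sketch of the comparison that could produce output like the one below:
microbenchmark(
  quick_df = quickdf(l),
  as.data.frame = as.data.frame(l)
)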
## Unit: microseconds
##          expr     min       lq       mean   median        uq      max neval cld
##      quick_df   9.795  11.2755   37.25248  16.4895   19.2925 2100.902   100   a
## as.data.frame 834.235 898.4920 1026.75537 958.2835 1039.0330 2727.082   100   b
But we need to be sure the input is formatted correctly, because it does no checks:
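For example (a sketch using one possible malformed input, with columns of unequal length):
quickdf(list(x = 1, y = 1:2))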
## Warning in format.data.frame(x, digits = digits, na.encode = FALSE):
## corrupt data frame: columns will be truncated or padded with NAs
## x y
## 1 1 1
Growing a string with paste0 inside a loop copies the accumulated result on every iteration; a single vectorised paste call avoids this.
random_string = function() {
  paste(sample(letters, 50, replace = TRUE), collapse = "")
}
strings10 = replicate(10, random_string())
strings100 = replicate(100, random_string())
collapse = function(xs) {
  out = ""
  for (x in xs) {
    out = paste0(out, x)
  }
  out
}
microbenchmark(
  loop10 = collapse(strings10),
  loop100 = collapse(strings100),
  vec10 = paste(strings10, collapse = ""),
  vec100 = paste(strings100, collapse = "")
)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## loop10 14.630 15.3790 17.57578 16.324 17.5530 48.325 100 a
## loop100 517.291 523.8855 605.96058 568.129 597.0755 2793.477 100 b
## vec10 3.596 3.9195 4.70854 4.357 5.1320 17.976 100 a
## vec100 28.755 30.3115 33.98348 31.691 34.5485 102.933 100 a
Vectorisation will sometimes give speed improvements and sometimes will not.
Speed improvements happen primarily when it leads to calling compiled C code.
x = matrix(rnorm(15), nrow = 3, ncol = 5)
row_means_loop = function(x) {
  row_means = numeric(nrow(x))
  for (i in 1:nrow(x)) {
    row_sum = 0
    for (j in 1:ncol(x)) {
      row_sum = row_sum + x[i, j]
    }
    row_means[i] = row_sum / ncol(x)
  }
  return(row_means)
}
microbenchmark(
  no_vectorization = row_means_loop(x),
  apply = apply(x, 1, mean),
  c_version = rowMeans(x),
  times = 1000
)
## Unit: microseconds
##              expr    min      lq      mean median      uq      max neval cld
##  no_vectorization  4.524  5.7855 13.008270  6.874  7.5575 6113.275  1000   a
##             apply 18.465 21.7525 27.769083 28.535 30.0870   97.173  1000   b
##         c_version  2.348  3.2730  3.939602  3.823  4.4555   25.342  1000   a
Another example: a problem in probability.
Urn 1 contains 10 blue marbles and 8 yellow marbles
Urn 2 contains 6 blue marbles and 6 yellow marbles
We draw one marble uniformly at random from urn 1, place it in urn 2, and then draw a marble uniformly at random from urn 2.
What is the probability that the second marble is blue?
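As a check on the simulations below, the exact answer follows by conditioning on the first draw:
## P(second blue) = P(first blue) * 7/13 + P(first yellow) * 6/13
(10/18) * (7/13) + (8/18) * (6/13)  # = 118/234, approximately 0.504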
First attempt:
# perform nreps repetitions of the marble experiment, to estimate
# P(pick blue from Urn 2)
sim1 = function(nreps) {
  nb1 = 10   # 10 blue marbles in Urn 1
  n1 = 18    # number of marbles in Urn 1 at 1st pick
  n2 = 13    # number of marbles in Urn 2 at 2nd pick
  count = 0  # number of repetitions in which we get blue from Urn 2
  for (i in 1:nreps) {
    nb2 = 6  # 6 blue marbles originally in Urn 2
    ## pick from Urn 1 and put in Urn 2; is it blue?
    if (runif(1) < nb1/n1) nb2 = nb2 + 1
    ## pick from Urn 2; is it blue?
    if (runif(1) < nb2/n2) count = count + 1
  }
  return(count/nreps)  # est. P(pick blue from Urn 2)
}
Let’s try pre-generating all of our random numbers and removing the for loop:
sim2 = function(nreps) {
  nb1 = 10
  nb2 = 6
  n1 = 18
  n2 = 13
  ## pre-generate all our random numbers, one row per repetition
  u = matrix(c(runif(2 * nreps)), nrow = nreps, ncol = 2)
  ## define simfun for use in apply(); simulates one repetition
  simfun = function(rw) {
    ## rw ("row") is a pair of random numbers
    ## choose from Urn 1
    if (rw[1] < nb1 / n1) nb2 = nb2 + 1
    ## choose from Urn 2, and return boolean on choosing blue
    return(rw[2] < nb2 / n2)
  }
  z = apply(u, 1, simfun)
  return(mean(z))
}
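A sketch of the comparison that could produce output like the one below:
microbenchmark(sim1(100), sim2(100))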
## Unit: microseconds
## expr min lq mean median uq max neval cld
## sim1(100) 259.107 271.3620 478.8515 285.0615 341.2680 11297.506 100 a
## sim2(100) 128.224 135.2935 235.9363 141.4335 158.3995 8446.768 100 a
The timings here are different from those in Matloff. I believe this is because more recent versions of R byte-compile functions that you write, and this was not true for the version of R used in the book.
Take 3:
sim3 = function(nreps) {
  nb1 = 10
  nb2 = 6
  n1 = 18
  n2 = 13
  u = matrix(c(runif(2 * nreps)), nrow = nreps, ncol = 2)
  ## set up the condition vector: either the first draw is blue (so Urn 2 has
  ## nb2 + 1 blue marbles) and the second draw is blue, or the first draw is
  ## yellow (Urn 2 still has nb2 blue marbles) and the second draw is blue
  cndtn =
    u[, 1] <= nb1 / n1 & u[, 2] <= (nb2 + 1) / n2 |
    u[, 1] >  nb1 / n1 & u[, 2] <= nb2 / n2
  return(mean(cndtn))
}
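A sketch of the comparison that could produce output like the one below:
microbenchmark(sim1(100), sim2(100), sim3(100))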
## Unit: microseconds
## expr min lq mean median uq max neval cld
## sim1(100) 261.157 279.9745 484.1770 411.6000 459.8785 10179.709 100 b
## sim2(100) 127.339 137.5345 212.5064 185.0070 249.0700 1054.157 100 a
## sim3(100) 19.062 23.3220 122.1441 28.4635 36.6295 8939.858 100 a
If you know something about your input that R has to check for, you can save time by bypassing the checks
Vectorised functions are sometimes faster, sometimes not
Matrix operations generally call C and are therefore fast
Compiled C code is generally faster than equivalent R code