Debugging
Reading: Matloff Chapter 13
Today: Debugging
Principle of confirmation
You wrote a function, and it does something you don't think it should. Debugging is figuring out why this is.
Aside from syntax errors, bugs are assumptions you made when writing the code that aren't actually true.
This is Matloff's principle of confirmation:
Fixing a buggy program is a process of confirming, one by one, that the many things you believe to be true about the code actually are true. When you find that one of your assumptions is not true, you have found a clue about the location (if not the exact nature) of a bug.
Some common causes of bugs
Syntax problems:
- Parentheses mismatches, particularly when nesting functions.
library(ggplot2)  # the diamonds data set ships with ggplot2
data(diamonds)
mean(subset(diamonds$carat), cut == "Ideal")
## Error in subset.default(diamonds$carat): argument "subset" is missing, with no default
mean(subset(diamonds$carat, cut = "Ideal"))
## Error in subset.default(diamonds$carat, cut = "Ideal"): argument "subset" is missing, with no default
- Arguments to functions given in the wrong order.
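As a hedged sketch of what the two failing calls above were presumably meant to compute, the following uses a small made-up data frame (`toy`, a hypothetical stand-in for diamonds so that it runs without ggplot2). It also shows how naming arguments guards against wrong-order mistakes, using `seq` as an arbitrary example.

```r
## 'toy' is a made-up stand-in for diamonds, used only for illustration
toy = data.frame(carat = c(0.2, 0.3, 0.5, 0.7),
                 cut = c("Ideal", "Good", "Ideal", "Fair"))

## subset the data frame first, then pull out the column
ideal_mean = mean(subset(toy, cut == "Ideal")$carat)

## equivalent form with logical indexing, which avoids nested parentheses
ideal_mean2 = mean(toy$carat[toy$cut == "Ideal"])

## arguments matched by position depend on their order ...
by_position = seq(1, 10, 3)
## ... while named arguments can be given in any order
by_name = seq(by = 3, to = 10, from = 1)
```

Both `ideal_mean` and `ideal_mean2` are the mean of the carat values in the rows where cut is "Ideal".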
Inputs to functions are of a type you didn't expect:
[[]] vs. []: element of a list vs. sublist.
Vectors vs. single values: you assume a single value but have a vector with more than one element, which leads to unexpected recycling.
Silent type conversions: when converting from a data frame to an array, or when creating a data frame.
NA values in data where they're not allowed.
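The type surprises listed above can be reproduced in a few lines. This is a minimal sketch; all object names (`lst`, `df`, and so on) are made up for illustration.

```r
## [[ ]] extracts the element itself; [ ] returns a sublist
lst = list(a = 1:3, b = "x")
elem = lst[["a"]]   # an integer vector
sub = lst["a"]      # a list of length 1 that contains the vector

## recycling: the shorter vector is silently reused
recycled = c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

## silent type conversion: a mixed data frame becomes a character matrix
df = data.frame(x = 1:2, y = c("a", "b"))
mat = as.matrix(df)

## NA propagates unless it is handled explicitly
m_na = mean(c(1, NA, 3))                 # NA
m_ok = mean(c(1, NA, 3), na.rm = TRUE)   # 2
```

Checking `class()`, `length()`, or `str()` on an input is a quick way to confirm which of these cases you are in.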
Scope issues/global variables:
Function relies on a global variable with the wrong value.
You tried to use a function to change a global variable.
Confusion between arguments to the function and global variables.
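A minimal sketch of the three scope problems above. The names `counter`, `increment`, `rate`, `rescale`, `threshold`, and `above` are hypothetical and chosen only for this illustration.

```r
counter = 0

## '=' inside the function creates a local 'counter';
## the global one is read but never modified
increment = function() {
    counter = counter + 1
    return(counter)
}
result = increment()   # 1; the global counter is still 0

## an argument shadows a global variable of the same name
rate = 100
rescale = function(x, rate) {
    return(x * rate)   # uses the argument, not the global 'rate'
}
rescaled = rescale(2, rate = 3)   # 6, not 200

## relying on a global instead of an argument
threshold = 5
above = function(x) {
    return(x > threshold)   # result changes whenever the global does
}
```

Passing everything a function needs as an argument, and returning everything it produces, avoids all three problems.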
Bug processing
Once you realize you have a bug, there are three steps:
Characterize the bug
Localize the bug
Fix the bug
Localizing the bug
First find the function that caused the problem (traceback helps here).
Then find the line in that function that caused the problem, either by single-stepping through the function or by adding print statements in places you think are likely to have gone wrong.
Main debugging operations
Stepping through the source code
In R, you can use the debug function or set breakpoints.
Pretty good control over how to step through: options for line by line, step into functions, continue through loops.
Inspecting variables
Setting up a function for debugging
f = function(y, z) {
    x = y^2 - 3 * z^2
    w = 28
    if (x > 0 && a > 0) {
        u = 1 + x
    } else {
        u = 10
    }
    return(u)
}
f(0, 1)
## [1] 10
f(1, 0)
## Error in f(1, 0): object 'a' not found
## try:
## debug(f)
## f(1, 0)
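debug(f) flags f so that every later call opens the browser. A brief sketch of managing that flag follows; `isdebugged`, `undebug`, and `debugonce` are base R functions that these notes do not otherwise cover, and `h` is a hypothetical stand-in function.

```r
h = function(y) {
    return(y^2)
}

debug(h)                   # every later call to h() enters the browser
flag_on = isdebugged(h)    # TRUE
undebug(h)                 # remove the flag again
flag_off = isdebugged(h)   # FALSE

## debugonce(h) flags h for the next call only:
## debugonce(h)
## h(2)
```

Forgetting to call undebug is a common reason for a function to keep dropping into the browser unexpectedly.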
The browser function:
Syntax: browser(expr=condition)
You enter the browser if the condition is true.
If you don't specify a condition, the function stops executing and you enter the browser when you reach the browser line in the function.
f = function(y, z) {
    x = y^2 - 3 * z^2
    w = 28
    if (x > 0 && a > 0) {
        u = 1 + x
    } else {
        u = 10
    }
    return(u)
}
f(0, 1)
f(1, 0)
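The listing above does not show where a browser call would go. As a sketch, assuming it is placed just before the if statement and made conditional on x being positive (the name `f_browse` and the `interactive()` guard are additions for illustration):

```r
f_browse = function(y, z) {
    x = y^2 - 3 * z^2
    w = 28
    ## pause only when x is positive, i.e. on the branch that fails;
    ## interactive() keeps the call from pausing in batch runs
    browser(expr = interactive() && x > 0)
    if (x > 0 && a > 0) {
        u = 1 + x
    } else {
        u = 10
    }
    return(u)
}
res = f_browse(0, 1)   # x is -3, so the browser is skipped
## try:
## f_browse(1, 0)
```

A conditional browser call is useful when the function is called many times and only some inputs trigger the problem.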
The setBreakpoint function:
Syntax: setBreakpoint(filename, linenumber)
Takes you to the browser at the breakpoint.
This is similar to what the RStudio breakpoints do.
Undo a breakpoint set in the function g with untrace(g).
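A hedged sketch of the full cycle, assuming the function lives in a file that was sourced with source references kept. The temporary file and the function `h` are made up for illustration, and the call that would actually stop in the browser is left commented out.

```r
## write a small function to a temporary file and source it with
## source references kept, which setBreakpoint() needs to find lines
src = tempfile(fileext = ".R")
writeLines(c("h = function(a) {",
             "    y = a^2 + 3",
             "    return(y - 10)",
             "}"), src)
source(src, keep.source = TRUE)

## request a breakpoint at line 3 of that file
setBreakpoint(src, 3)
traced = inherits(h, "functionWithTrace")   # TRUE once the breakpoint is set

## h(1) would now stop in the browser at line 3
untrace(h)                                  # remove the breakpoint again
```

setBreakpoint works through trace, which is why untrace removes the breakpoint.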
Commands once you're in the browser
n: Execute the next command.
s: Step into the next function.
f: Finish the current loop or function.
c: Continue execution normally.
Q: Stop the function and return to console.
g = function(a) {
    y = a^2 + 3
    if (y - 10 > 2) {
        return(y)
    } else {
        return(a)
    }
}
f = function(y, z) {
    x = y^2 - 3 * g(z)
    w = 28
    if (x > 0 && a > 0) {
        u = 1 + x
    } else {
        u = 10
    }
    return(u)
}
f(0,1)
debug(f)
f(0, 1)
Debugging after an error
The traceback function:
Syntax: traceback(), called after a crash (in RStudio the traceback is printed automatically after an error).
Provides the list of function calls leading to the error, which allows you to localize the bug.
If the traceback has a combination of functions that you wrote and functions in base R, focus your attention on those that you wrote.
f = function(y, z) {
    x = y^2 - 3 * z^2
    w = 28
    if (x > 0 && a > 0) {
        u = 1 + x
    } else {
        u = 10
    }
    return(u)
}
f(1, 0)
traceback()
g = function(a) {
    y = a^2 + 3
    if (y - 10 > 2) {
        return(y)
    } else {
        return(z)
    }
}
f = function(y, z) {
    x = y^2 - 3 * g(z)
    w = 28
    if (x > 0 && a > 0) {
        u = 1 + x
    } else {
        u = 10
    }
    return(u)
}
f(1, 0)
traceback()
f(10, 3.1)
traceback()
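traceback() is meant for interactive use. In a script, a rough substitute is to catch the error and ask which call raised it. This is a sketch, not part of the notes' workflow: `tryCatch`, `conditionMessage`, and `conditionCall` are base R functions not mentioned above, `g2` and `f2` are compact renamed copies of g and f, and the code assumes no global variable named z exists.

```r
## compact copies of g and f from above, renamed so the sketch is self-contained
g2 = function(a) {
    y = a^2 + 3
    if (y - 10 > 2) {
        return(y)
    } else {
        return(z)   # bug: z is not defined inside g2
    }
}
f2 = function(y, z) {
    x = y^2 - 3 * g2(z)
    return(x)
}

## catch the error instead of letting it stop the script
err = tryCatch(f2(1, 0), error = function(e) e)
msg = conditionMessage(err)           # the error message text
where = deparse(conditionCall(err))   # the call in which the error arose
```

Unlike traceback(), this reports only the innermost call, not the whole chain of calls leading to it.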
The debugger function:
Syntax: debugger()
You need to have set options(error=dump.frames) for this to work.
If your function crashes and you call debugger(), you can inspect variables in any of the function environments.
options(error = dump.frames)
f(1,0)
debugger()
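Setting options(error = dump.frames) changes behaviour for the rest of the session. A small sketch of switching it on and restoring the previous setting afterwards; the variable name `old` is made up, and the interactive steps are left commented out.

```r
## remember the current setting so it can be restored afterwards
old = options(error = dump.frames)

## interactive use, after a call such as f(1, 0) has failed:
## debugger()

## restore whatever error handler was in place before
options(old)
```

options() returns the previous values of the options it changes, which is what makes the restore step possible.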
Example 1
findruns is supposed to find the starting positions of all the runs of 1's of length k in x:
findruns = function(x, k) {
    n = length(x)
    runs = NULL
    for (i in 1:(n - k)) {
        if (all(x[i:(i + k - 1)] == 1)) {
            runs = c(runs, i)
        }
    }
    return(runs)
}
findruns(c(1,0,0,1,1,0,1,1,1),2)
## [1] 4 7
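Applying the principle of confirmation: counting by hand, the runs of two 1's in this vector start at positions 4, 7, and 8, so the output above is missing the last one. One possible fix, sketched under the assumption that the loop bound is the only problem, lets i run up to n - k + 1 (the name `findruns_fixed` and the guard for k larger than n are additions for illustration).

```r
findruns_fixed = function(x, k) {
    n = length(x)
    runs = NULL
    ## the last possible starting position is n - k + 1;
    ## max(..., 0) gives an empty loop when k is larger than n
    for (i in seq_len(max(n - k + 1, 0))) {
        if (all(x[i:(i + k - 1)] == 1)) {
            runs = c(runs, i)
        }
    }
    return(runs)
}
fixed = findruns_fixed(c(1, 0, 0, 1, 1, 0, 1, 1, 1), 2)   # 4 7 8
```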
Example 2
## returns the minimum value of d[i,j], i != j, and
## the row/col attaining that minimum, for square
## symmetric matrix d; no special policy on ties;
## motivated by distance matrices
mind = function(d) {
    n = nrow(d)
    ## add a column to identify row number for apply()
    dd = cbind(d, 1:n)
    wmins = apply(dd[-n, ], 1, imin)
    ## wmins will be 2xn, 1st row being indices and 2nd being values
    i = which.min(wmins[1, ])
    j = wmins[2, i]
    return(c(d[i, j], i, j))
}
## finds the location, value of the minimum in a row x
imin = function(x) {
    n = length(x)
    i = x[n]
    j = which.min(x[(i + 1):(n - 1)])
    return(c(j, x[j]))
}
m = rbind(c(0, 12, 5), c(12, 0, 8), c(5, 8, 0))
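For the matrix m, the smallest off-diagonal entry is 5, at row 1 and column 3, so a working version should return 5 1 3. Confirming the assumptions one at a time points to two suspects: imin returns a position relative to the part of the row it searched, and mind reads indices and values from the wrong rows of wmins. One possible corrected version, offered as a sketch rather than the only fix (the `_fixed` names are made up, and like the original it assumes at least three rows):

```r
imin_fixed = function(x) {
    n = length(x)
    i = x[n]                            # row number, stored in the last slot
    j = which.min(x[(i + 1):(n - 1)])   # position within the searched part
    k = i + j                           # shift back to a column of the full row
    return(c(k, x[k]))
}
mind_fixed = function(d) {
    n = nrow(d)
    dd = cbind(d, 1:n)
    wmins = apply(dd[-n, ], 1, imin_fixed)
    i = which.min(wmins[2, ])   # values are in the 2nd row
    j = wmins[1, i]             # column indices are in the 1st row
    return(c(d[i, j], i, j))
}
m = rbind(c(0, 12, 5), c(12, 0, 8), c(5, 8, 0))
row1 = imin_fixed(c(m[1, ], 1))   # column 3, value 5
res = mind_fixed(m)               # 5 1 3
```

Testing imin_fixed on a single row before running mind_fixed follows the same localize-then-fix order described earlier.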
Antibugging or defensive coding
When you're writing code, think about what you're assuming the inputs should be.
Write those assumptions in explicitly and test for them, with things like stopifnot, stop, and warning.
Write small functions with well-defined inputs and outputs. This makes it easier to check whether they're doing the right thing during debugging.
Write code so that it is clear what each line/function is expected to do. This makes it easier to check whether what you assume it does is the same as what it actually does.
This includes: comments, descriptive names for variables, descriptive names for functions.
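A minimal sketch of these ideas in one small function. The function `safe_mean` and its particular checks are hypothetical, chosen only to show stopifnot, stop, and warning side by side.

```r
## mean of a numeric vector, with the input assumptions made explicit
safe_mean = function(x) {
    ## assumptions about the input, checked before any work is done
    stopifnot(is.numeric(x), length(x) > 0)
    if (all(is.na(x))) {
        stop("x contains only NA values")
    }
    if (anyNA(x)) {
        warning("dropping NA values from x")
    }
    return(mean(x, na.rm = TRUE))
}
ok = safe_mean(c(1, 2, 3))   # 2
```

A failed check stops the function at the point where the assumption breaks, which does most of the localizing for you.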