Writing and calling functions

Reading:

Today: Nuts and bolts, technical aspects of functions in R.

Next time: More practical advice on how to write functions, best practices, etc.

Why write functions?

What is a function?

Function creation

Syntax for function creation:

f = function(arguments) {
    body
}

So for example, if we return to our steak-cooking example from the first week, we might define the following function:

steak_directions = function(temp, steak_type) {
    if(steak_type == "rare" & temp > 115) {
        return("take your steak off!")
    } else if(steak_type == "med_rare" & temp > 125) {
        return("take your steak off!")        
    } 
    "you can keep cooking"
}

We can see the arguments and body of the function using formals and body, respectively.

formals(steak_directions)
## $temp
## 
## 
## $steak_type
body(steak_directions)
## {
##     if (steak_type == "rare" & temp > 115) {
##         return("take your steak off!")
##     }
##     else if (steak_type == "med_rare" & temp > 125) {
##         return("take your steak off!")
##     }
##     "you can keep cooking"
## }

Function arguments

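Once you have a function, you call it by specifying values for its arguments. The values can be specified in two ways:

By position: unnamed values are matched to arguments in the order they appear in the function definition.

By name: values given as name = value are matched to the argument with that name, in any order.

The two can be combined.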

So for example, the following are all the same:

steak_directions(temp = 120, steak_type = "rare")
## [1] "take your steak off!"
steak_directions(steak_type = "rare", temp = 120)
## [1] "take your steak off!"
steak_directions(120, "rare")
## [1] "take your steak off!"
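
The combined form is not among the calls above. As a minimal sketch, using a hypothetical function describe_cut (not part of the steak example) so that the block runs on its own: named values are matched first, and the remaining unnamed values fill the remaining arguments in order.

```r
# Hypothetical function, for illustration only
describe_cut = function(temp, steak_type) {
    paste(steak_type, "at", temp, "degrees")
}

# Fully positional and fully named calls
by_position = describe_cut(120, "rare")
by_name     = describe_cut(steak_type = "rare", temp = 120)

# Mixed call: steak_type is matched by name first,
# then 120 fills the only remaining argument, temp
mixed       = describe_cut(steak_type = "rare", 120)

identical(by_position, by_name)
identical(by_position, mixed)
```

Both comparisons should come out TRUE, since all three calls bind the same values to the same arguments.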

But swapping the positional arguments is of course different. The call below does not fail; it silently matches "rare" to temp and 120 to steak_type, so neither condition is met and we get the wrong answer:

steak_directions("rare", 120)
## [1] "you can keep cooking"

Default arguments

When you define a function, you can set default values for any/all of the arguments.

When you call such a function without specifying a value for an argument that has a default, the argument takes its default value.

For example, in the following function the default argument for steak_type is "rare".

steak_directions = function(temp, steak_type = "rare") {
    if(steak_type == "rare" & temp > 115) {
        return("take your steak off!")
    } else if(steak_type == "med_rare" & temp > 125) {
        return("take your steak off!")        
    } 
    "you can keep cooking"
}

If we don't specify steak_type, we get the same result as if we had set it to "rare", but we can also override the default by setting the argument explicitly:

steak_directions(120, "rare")
## [1] "take your steak off!"
steak_directions(120)
## [1] "take your steak off!"
steak_directions(120, steak_type = "med_rare")
## [1] "you can keep cooking"
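
A default value is stored as part of the function definition, so it can be inspected with formals. The sketch below assumes a hypothetical function cook_time with an invented default rate; the names and numbers are for illustration only.

```r
# Hypothetical function with one default argument, for illustration only
cook_time = function(weight, minutes_per_pound = 20) {
    weight * minutes_per_pound
}

# The default is recorded in the function's formal arguments
default_rate = formals(cook_time)$minutes_per_pound

time_default  = cook_time(3)                          # uses the default of 20
time_override = cook_time(3, minutes_per_pound = 15)  # overrides the default

default_rate
time_default
time_override
```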

Return values

When a function is called, the commands in the body of the function are executed, and a value is returned.

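The return value is either:

The value passed to return, if a return statement is executed.

The value of the last expression evaluated in the body, otherwise.

The commands in the body of the function are executed until a return statement is encountered or the end of the body is reached, whichever comes first.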

Let's think through what happens when we call the function these two ways:

steak_directions = function(temp, steak_type = "rare") {
    if(steak_type == "rare" & temp > 115) {
        return("take your steak off!")
    } else if(steak_type == "med_rare" & temp > 125) {
        return("take your steak off!")        
    } 
    "you can keep cooking"
}
steak_directions(steak_type = "rare", temp = 120)
## [1] "take your steak off!"
steak_directions(steak_type = "med_rare", temp = 120)
## [1] "you can keep cooking"
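
To make the order of execution visible, here is a small sketch with a hypothetical function check_sign that records the steps it passes through. It is only an illustration of the rule above, not part of the steak example.

```r
# Hypothetical function that records which parts of its body were executed
check_sign = function(x) {
    steps = "start"
    if (x > 0) {
        steps = c(steps, "early return")
        return(steps)   # execution stops here; nothing below runs
    }
    # reached only when the return() above was not executed
    steps = c(steps, "reached end of body")
    steps               # value of the last expression is returned
}

positive_case = check_sign(1)
negative_case = check_sign(-1)

positive_case
negative_case
```

In the first call the function stops at the return statement, so "reached end of body" is never recorded; in the second call the if branch is skipped and the last expression supplies the return value.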

Invisible return

Invisible return is a bit R-specific:

square_invisible = function(x) invisible(x^2)
square = function(x) x^2

If we call square(4) we get output: 16

square(4)
## [1] 16

But if we call square_invisible(4), we don't see any output!

square_invisible(4)

The square was computed though, and we can see this if we assign the output:

xsquared = square_invisible(4)
xsquared
## [1] 16
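
Two base R tools help when checking whether a value is returned invisibly: withVisible reports the value together with a visibility flag, and wrapping a call in parentheses forces the result to print. A short sketch, redefining square_invisible so the block runs on its own:

```r
square_invisible = function(x) invisible(x^2)

# withVisible() returns a list with the value and whether it would auto-print
result = withVisible(square_invisible(4))
result$value
result$visible

# Wrapping the call in parentheses makes the value print at the console
(square_invisible(4))
```

Here result$value is 16 and result$visible is FALSE.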

Another example: compare the two versions of oddcount:

oddcount = function(x) {
    k = 0
    for(n in x) {
        if (n %% 2 == 1) k = k + 1
    }
    return(k)
}
oddcount(c(0, 5))
## [1] 1
oddcount = function(x) {
    k = 0
    for(n in x) {
        if (n %% 2 == 1) k = k + 1
    }
}
oddcount(c(0, 5))
oddcount_output = oddcount(c(0, 5))
oddcount_output
## NULL

We get NULL for the value of oddcount_output because the last expression evaluated in oddcount was the for loop, and in R for is itself a function with a return value.

for returns NULL invisibly, so the second version of oddcount also returns NULL invisibly.
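
We can check the value of a for loop directly by assigning it. The sketch below also shows a variant, under the hypothetical name oddcount_implicit, that drops the explicit return but ends the body with k so the count is still returned.

```r
# The value of a for loop is NULL
loop_value = for (i in 1:3) i
is.null(loop_value)

# Hypothetical variant of oddcount: no return(), but k is the last expression
oddcount_implicit = function(x) {
    k = 0
    for (n in x) {
        if (n %% 2 == 1) k = k + 1
    }
    k   # last expression evaluated, so this is the return value
}
oddcount_implicit(c(0, 5))
```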

Return values can be anything

g = function() {
    t = function(x) x^2
    return(t)
}
g
## function() {
##     t = function(x) x^2
##     return(t)
## }
g()
## function(x) x^2
## <environment: 0x7fef6276d980>
formals(g)
## NULL
formals(g())
## $x
body(g)
## {
##     t = function(x) x^2
##     return(t)
## }
body(g())
## x^2
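
Returning a function is more useful when the returned function depends on the arguments of the outer one. A minimal sketch, assuming a hypothetical function make_power (not part of the example above):

```r
# Hypothetical function factory: returns a function that raises x to `exponent`
make_power = function(exponent) {
    function(x) x^exponent
}

to_square = make_power(2)
to_cube   = make_power(3)

to_square(4)
to_cube(2)

# The returned functions have their own formals, just like g() above
names(formals(to_square))
```

The two calls give 16 and 8: each returned function remembers the value of exponent from the call that created it.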

Environments and scope

When you call a function, the commands in the function body are executed, but not in exactly the same way they would be if you simply ran them one at a time in an interactive R session.

The commands are executed in the function's environment.

Environments

Ok, so what is an environment?

What are they good for?

For example, have you ever wondered how R finds functions?

The function lm is not in the global environment, as we can see if we just call ls:

ls()
##  [1] "corner"           "f"                "g"               
##  [4] "h"                "multiplot"        "oddcount"        
##  [7] "oddcount_output"  "square"           "square_invisible"
## [10] "steak_directions" "t"                "w"               
## [13] "xsquared"

But we are able to access it and, for instance, ask what its arguments are:

head(formals(lm))
## $formula
## 
## 
## $data
## 
## 
## $subset
## 
## 
## $weights
## 
## 
## $na.action
## 
## 
## $method
## [1] "qr"

Functions live in environments corresponding to the package they are defined in. For lm, this is stats.

environment(lm)
## <environment: namespace:stats>

Package environments are all ancestral to the global environment, so when R found that lm wasn't defined in the global environment, it looked through the packages until it found lm defined in stats.
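
The lookup described above can be checked with exists and search. This sketch assumes a fresh session in which we have not defined our own object called lm.

```r
# Name of the environment in which lm was defined
lm_home = environmentName(environment(lm))
lm_home

# lm is not bound in the global environment itself
# (assumption: no user-defined object named lm)
in_global = exists("lm", envir = globalenv(), inherits = FALSE)
in_global

# ...but it is found once R is allowed to look in the ancestor environments
found_by_search = exists("lm", envir = globalenv(), inherits = TRUE)
found_by_search

# The search path: the environments R looks through, starting from the global one
search()
```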

Function environments

When a function is called, its body is evaluated in an execution environment whose parent is the function's environment.

w = 12
f = function(y) {
    d = 8
    h = function() {
        return(d * (w + y))
    }
    cat("h's environment: ", "\n")
    print(environment(h))
    cat("h's parent environment:", "\n")
    print(parent.env(environment(h)))
    return(h())
}
f(1)
## h's environment:  
## <environment: 0x7fef668bbe18>
## h's parent environment: 
## <environment: R_GlobalEnv>
## [1] 104
environment(f)
## <environment: R_GlobalEnv>

Compare with:

f = function(y) {
    d = 8
    return(h())
}

h = function() {
    cat("h's environment:", "\n")
    print(environment(h))
    cat("h's parent environment:", "\n")
    print(parent.env(environment(h)))
    return(d * (w + y))
}
f(5)
## h's environment: 
## <environment: R_GlobalEnv>
## h's parent environment: 
## <environment: package:knitr>
## attr(,"name")
## [1] "package:knitr"
## attr(,"path")
## [1] "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/knitr"
## Error in h(): object 'd' not found
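
The second version fails because h was defined in the global environment, so R looks for d and y there rather than in the environment of the call to f. One way to make a separately defined helper work, sketched here with the hypothetical names f2 and h2, is to pass it everything it needs as arguments.

```r
w = 12

# h2 receives all of its inputs as arguments, so it does not depend on
# where it was defined
h2 = function(d, w, y) {
    d * (w + y)
}

f2 = function(y) {
    d = 8
    # d and y are local to f2; w is found in the environment where f2 was defined
    h2(d, w, y)
}

f2(1)   # 8 * (12 + 1) = 104
```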

This perhaps seems overly baroque, but the take-home points about environments (and the reason why they are set up the way they are) are:

Side effects

A function has a side effect if it does anything other than compute a return value, for instance, if it changes the values of other variables in the environment it is defined in, or adds variables to the environment.

We generally don't want functions to have side effects, because they make code more confusing and more difficult to test.

In R, functions can have side effects, but they are discouraged both by the language itself and by programming norms.

Remember that functions can see variables defined in the parent environments.

What they can't do is change the values of those variables (except with a special operator).

For example:

w = 12
f = function(y) {
   d = 8
   w = w + 1
   y = y - 2
   cat(sprintf("Value of w: %i", w))
   h = function() {
      return(d*(w+y))
   }
   return(h())
}
t = 4
f(t)
## Value of w: 13
## [1] 120
w
## [1] 12

It looks like the value of w changed inside the function, but the value in the global environment was not actually changed.
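
The special operator that does change a variable in an enclosing environment is the super-assignment operator <<-. A minimal sketch contrasting it with ordinary assignment; the function names here are hypothetical:

```r
counter = 0

# Ordinary assignment: creates a new local variable inside the function
increment_local = function() {
    counter = counter + 1
    counter
}

# Super-assignment: modifies the existing variable in an enclosing environment
increment_enclosing = function() {
    counter <<- counter + 1
    counter
}

local_result  = increment_local()       # 1, computed from a local copy
after_local   = counter                 # still 0
global_result = increment_enclosing()   # 1
after_global  = counter                 # now 1

c(after_local, after_global)
```

Because <<- is a side effect by design, it is usually best reserved for cases where changing outside state is the whole point of the function.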