Lab 2: Functions and Environments

We are interested in how different diets affect chicken growth. The ChickWeight dataset records the weights of chickens on each of four different diets over the first three weeks of life (Time is measured in days, from 0 to 21).

We can first look at the data and get a summary:

data(ChickWeight)
head(ChickWeight)
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1
summary(ChickWeight)
##      weight           Time           Chick     Diet   
##  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
##  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
##  Median :103.0   Median :10.00   20     : 12   3:120  
##  Mean   :121.8   Mean   :10.72   10     : 12   4:118  
##  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
##  Max.   :373.0   Max.   :21.00   19     : 12          
##                                  (Other):506

Questions:

We can also make a plot to get a better look at what's going on. Don't worry about ggplot2 for this class. It's a nice plotting package, and if you take Exploratory Data Analysis you'll use it more, but you won't need to know it for our class.

library(ggplot2)
ggplot(ChickWeight, aes(x = Time, y = weight, color = Diet, group = Chick)) +
    geom_point() +
    geom_line() +
    facet_wrap(~ Diet)

[Figure: weight against Time for each chick, drawn as points and lines, one panel per Diet]

Suppose we are interested in how much weight the chickens have put on after four days on diet 1. We write the following code:

time_zero_indices <- ChickWeight$Time == 0 & ChickWeight$Diet == 1
end_time_indices <- ChickWeight$Time == 4 & ChickWeight$Diet == 1
end_weight <- mean(ChickWeight$weight[end_time_indices])
initial_weight <- mean(ChickWeight$weight[time_zero_indices])
end_weight - initial_weight
## [1] 15.07368

We then decide that we want to be able to compute this value for any diet and any end time, so we write a function. Our first attempt looks like this:

n_diets <- length(unique(ChickWeight$Diet))
weight_gained_vector <- rep(NA, n_diets)
set_weight_gained <- function(diet, end_time) {
    time_zero_indices <- ChickWeight$Time == 0 & ChickWeight$Diet == diet
    end_time_indices <- ChickWeight$Time == end_time & ChickWeight$Diet == diet
    end_weight <- mean(ChickWeight$weight[end_time_indices])
    initial_weight <- mean(ChickWeight$weight[time_zero_indices])
    weight_gained_vector[diet] <- end_weight - initial_weight
}
for(diet in 1:4) {
    set_weight_gained(diet = diet, end_time = 4)
}

Questions:

Called with no arguments, the ls() function lists the names of all the variables in the environment it is called from. We can use it to see which variables exist inside a function call by printing the result of ls() partway through the function. Try that in the set_weight_gained function:

set_weight_gained <- function(diet, end_time) {
    time_zero_indices <- ChickWeight$Time == 0 & ChickWeight$Diet == diet
    end_time_indices <- ChickWeight$Time == end_time & ChickWeight$Diet == diet
    end_weight <- mean(ChickWeight$weight[end_time_indices])
    initial_weight <- mean(ChickWeight$weight[time_zero_indices])
    print(ls())
    weight_gained_vector[diet] <- end_weight - initial_weight
}
set_weight_gained(diet = diet, end_time = 4)
## [1] "diet"              "end_time"          "end_time_indices" 
## [4] "end_weight"        "initial_weight"    "time_zero_indices"
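
The same idea can be tried on a toy function. The sketch below uses a made-up function, show_locals, and a made-up variable, a_global_variable (neither is part of the lab), to contrast what ls() reports inside a function call with what the function is able to read from the global environment.

```r
# Hypothetical toy example; show_locals and the variable names are made up.
a_global_variable <- 10

show_locals <- function(x) {
    shifted <- x + a_global_variable   # the function can read the global...
    ls()                               # ...but ls() lists only the local names
}

locals <- show_locals(1)
print(locals)

# Is the global variable one of the function's local variables?
print("a_global_variable" %in% locals)
```

The names returned are the function's argument and the variable created inside it, which matches the pattern in the set_weight_gained output above.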

We will see better tools for investigating the variables available in function environments when we talk about debugging, but for now this gives us a peek at what is happening when the function is called.
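
One thing the ls() call does not show is what happens at the assignment on the last line of the function, because it runs before that line. As a minimal sketch, with a made-up vector results and a made-up function fill_result standing in for weight_gained_vector and set_weight_gained, we can return the function's own copy of the vector and compare it with the global one:

```r
# Hypothetical toy example; results and fill_result are made-up names.
results <- rep(NA, 3)      # global vector we would like to fill in

fill_result <- function(i) {
    results[i] <- i^2      # R copies results into the function's environment
                           # and modifies that local copy
    results                # return the local copy so we can inspect it
}

local_copy <- fill_result(2)
print(local_copy)          # the copy that lived inside the function call
print(results)             # the global vector after the call
```

Comparing the two printed vectors shows whether the global results was changed by the call.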

Your final task: Make a function that computes the average amount of weight gained by chickens on one of the diets after a given period of time. That is, make a function that takes diet and end_time, like set_weight_gained did, but that returns the weight gained instead of trying to assign the output to a variable in the global environment.
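
The general pattern, returning a value and letting the caller decide where to store it, can be sketched on a different quantity so that the task itself is still yours to do. The function mean_weight_at below is a made-up example, not the answer to the task: it returns the mean weight of the chickens on a given diet at a given time.

```r
data(ChickWeight)

# Hypothetical example function; the name mean_weight_at is made up.
mean_weight_at <- function(diet, time) {
    rows <- ChickWeight$Time == time & ChickWeight$Diet == diet
    mean(ChickWeight$weight[rows])   # the value of the last expression is returned
}

# The caller decides where the results go: one mean weight per diet at Time 4
mean_weights <- sapply(1:4, mean_weight_at, time = 4)
print(mean_weights)
```

Because the function only returns a value, nothing in the global environment changes unless the caller assigns the result, as the sapply line does here.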