Stat 610 Lecture 2: Flow control and looping

Today

Reading: Matloff Chapter 7.1, 7.2, 2.9

Last time/finish up today: Data structures, so that we have something to work on.

This time: Flow control, so we can actually do things.

Flow control and looping

if statements

Syntax

if (condition) {
    action1
} else {
    action2
}

So for example:

weather <- "sunny"
if(weather == "rainy") {
    print("Take your umbrella!")
} else {
    print("No need for an umbrella today...")
}
## [1] "No need for an umbrella today..."

You can make more complicated conditions using either else if or nested if statements:

weather <- "cloudy"
if(weather == "rainy") {
    print("Take your umbrella!")
} else if (weather == "cloudy") {
    print("Think about taking your umbrella")
} else {
    print("No need for an umbrella today...")
}
## [1] "Think about taking your umbrella"
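The example above uses else if. As a sketch of the nested alternative, the same decision can be written with a second if inside the else branch. The variable advice is introduced here only so the result can be inspected; it is not part of the original example.

```r
weather <- "cloudy"
if (weather == "rainy") {
    advice <- "Take your umbrella!"
} else {
    ## the second test lives inside the else branch
    if (weather == "cloudy") {
        advice <- "Think about taking your umbrella"
    } else {
        advice <- "No need for an umbrella today..."
    }
}
print(advice)
## [1] "Think about taking your umbrella"
```

Both forms behave the same way here; else if simply avoids the extra level of indentation.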

Some rules:

Combining booleans and lazy evaluation

We often want to combine conditions, which we can do with boolean operations.

Like most languages, R has AND and OR operators, but unlike many it has two of each: & and |, and && and ||.

So for example:

steak_type <- "med_rare"
temp <- 130
if(steak_type == "rare" & temp > 115) {
    print("take your steak off!")
} else if(steak_type == "med_rare" & temp > 125) {
    print("take your steak off!")        
} else {
    print("you can keep cooking")
}
## [1] "take your steak off!"

NB: As we’ll see shortly, & works here, but it would be better to use &&.

Or, in not so dire a situation:

steak_type <- "rare"
temp <- 110
if(steak_type == "rare" && temp > 115) {
    print("take your steak off!")
} else if(steak_type == "med_rare" && temp > 125) {
    print("take your steak off!")        
} else {
    print("you can keep cooking")
}
## [1] "you can keep cooking"

What is the difference between the two?
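One part of the answer can be checked directly at the console. The sketch below uses two made-up logical vectors, a and b, to show that & and | work element-by-element, while && and || are meant for single TRUE/FALSE values, which is what an if condition needs. How && treats longer vectors depends on the R version (recent versions stop with an error, older ones used only the first element), so the sketch sticks to length-one inputs for && and ||.

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

a & b    ## element-wise AND
## [1]  TRUE FALSE FALSE
a | b    ## element-wise OR
## [1]  TRUE  TRUE FALSE

a[1] && b[2]   ## single values in, single value out
## [1] FALSE
a[1] || b[2]
## [1] TRUE
```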

Lazy evaluation:

Try this on your computer. Which ones are fast and which ones are slow? Why?

(FALSE && all(rep(1, 10^8) == 1))
(FALSE & all(rep(1, 10^8) == 1))
(all(rep(1, 10^8) == 1) && FALSE)
(all(rep(1, 10^8) == 1) & FALSE)

Take-away:
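In short, && and || stop as soon as the result is known, while & and | always evaluate both sides. One way to see this without timing anything is to give the right-hand side a visible side effect. In this sketch, noisy_true is a hypothetical helper (not part of R) that counts how often it is called.

```r
calls <- 0
noisy_true <- function() {
    calls <<- calls + 1   ## count how many times we were evaluated
    TRUE
}

FALSE && noisy_true()   ## right-hand side is skipped
## [1] FALSE
calls
## [1] 0

FALSE & noisy_true()    ## right-hand side is evaluated anyway
## [1] FALSE
calls
## [1] 1
```

This is why && is the usual choice inside if: it returns as soon as the answer is known, which also lets an earlier test guard a later one.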

Iteration

Two types: for loops and while loops.

For loops

Syntax:

for(x in vector) {
    ...
}

Rules:

So for example:

x <- 1:5
for(i in x) {
    print(i^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
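The vector in a for loop does not have to be numeric, and sometimes it is the position rather than the value that is needed. The sketch below uses a made-up character vector, forecasts, and the base function seq_along, which generates the indices 1, 2, ..., length(x). Unlike 1:length(x), it produces no iterations when the vector is empty.

```r
forecasts <- c("sunny", "rainy", "cloudy")

## loop over the values directly
for (w in forecasts) {
    print(w)
}
## [1] "sunny"
## [1] "rainy"
## [1] "cloudy"

## loop over the positions instead
for (i in seq_along(forecasts)) {
    print(paste(i, forecasts[i]))
}
## [1] "1 sunny"
## [1] "2 rainy"
## [1] "3 cloudy"

## with an empty vector, 1:length(x) counts down from 1 to 0
empty <- character(0)
1:length(empty)
## [1] 1 0
seq_along(empty)
## integer(0)
```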

As with all the other flow control elements, for loops can be nested.

We can use this to do something slightly more complicated:

d <- 1:5
D <- matrix(NA, nrow = length(d), ncol = length(d))
D
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   NA   NA   NA   NA   NA
## [2,]   NA   NA   NA   NA   NA
## [3,]   NA   NA   NA   NA   NA
## [4,]   NA   NA   NA   NA   NA
## [5,]   NA   NA   NA   NA   NA
for(i in 1:nrow(D)) {
    for(j in 1:ncol(D)) {
        if(i == j) {
            D[i,j] <- d[i]
        } else {
            D[i,j] <- 0
        }
    }
}
D
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    0    2    0    0    0
## [3,]    0    0    3    0    0
## [4,]    0    0    0    4    0
## [5,]    0    0    0    0    5
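For this particular task R already has a built-in function: diag, given a vector of length greater than one, returns a matrix with that vector on the diagonal and zeros elsewhere. The sketch below rebuilds the loop result under a new name, D_loop (chosen here for illustration), and compares it with diag(d).

```r
d <- 1:5

## loop version, condensed from the example above
D_loop <- matrix(NA, nrow = length(d), ncol = length(d))
for (i in seq_along(d)) {
    for (j in seq_along(d)) {
        D_loop[i, j] <- if (i == j) d[i] else 0
    }
}

## built-in version
D_diag <- diag(d)

all(D_loop == D_diag)
## [1] TRUE
```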

They can also be combined with the other flow control elements:

Don’t worry about this part, just data setup:

## install.packages("Lahman")
## install.packages("pacman")
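library(Lahman)  ## provides the People data frame used below
library(pacman)
p_load(Lahman)   ## p_load installs a package if it is missing, then attaches it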

What the data looks like:

head(People)
##    playerID birthYear birthMonth birthDay birthCountry birthState  birthCity
## 1 aardsda01      1981         12       27          USA         CO     Denver
## 2 aaronha01      1934          2        5          USA         AL     Mobile
## 3 aaronto01      1939          8        5          USA         AL     Mobile
## 4  aasedo01      1954          9        8          USA         CA     Orange
## 5  abadan01      1972          8       25          USA         FL Palm Beach
## 6  abadfe01      1985         12       17         D.R.  La Romana  La Romana
##   deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst
## 1        NA         NA       NA         <NA>       <NA>      <NA>     David
## 2      2021          1       22          USA         GA   Atlanta      Hank
## 3      1984          8       16          USA         GA   Atlanta    Tommie
## 4        NA         NA       NA         <NA>       <NA>      <NA>       Don
## 5        NA         NA       NA         <NA>       <NA>      <NA>      Andy
## 6        NA         NA       NA         <NA>       <NA>      <NA>  Fernando
##   nameLast        nameGiven weight height bats throws      debut  finalGame
## 1  Aardsma      David Allan    215     75    R      R 2004-04-06 2015-08-23
## 2    Aaron      Henry Louis    180     72    R      R 1954-04-13 1976-10-03
## 3    Aaron       Tommie Lee    190     75    R      R 1962-04-10 1971-09-26
## 4     Aase   Donald William    190     75    R      R 1977-07-26 1990-10-03
## 5     Abad    Fausto Andres    184     73    L      L 2001-09-10 2006-04-13
## 6     Abad Fernando Antonio    235     74    L      L 2010-07-28 2021-10-01
##    retroID   bbrefID  deathDate  birthDate
## 1 aardd001 aardsda01       <NA> 1981-12-27
## 2 aaroh101 aaronha01 2021-01-22 1934-02-05
## 3 aarot101 aaronto01 1984-08-16 1939-08-05
## 4 aased001  aasedo01       <NA> 1954-09-08
## 5 abada001  abadan01       <NA> 1972-08-25
## 6 abadf001  abadfe01       <NA> 1985-12-17

And finally a for loop: What am I doing here?

for(i in 1:nrow(People)) {
    if(!is.na(People$height[i]) && People$height[i] <= 62) {
        print(People[i,])
    }
}
##       playerID birthYear birthMonth birthDay birthCountry birthState birthCity
## 6300 gaedeed01      1925          6        8          USA         IL   Chicago
##      deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst
## 6300      1961          6       18          USA         IL   Chicago     Eddie
##      nameLast   nameGiven weight height bats throws      debut  finalGame
## 6300   Gaedel Edward Carl     65     43    R      L 1951-08-19 1951-08-19
##       retroID   bbrefID  deathDate  birthDate
## 6300 gaede101 gaedeed01 1961-06-18 1925-06-08
##       playerID birthYear birthMonth birthDay birthCountry birthState birthCity
## 7928 healeto01      1853         NA       NA          USA         RI  Cranston
##      deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst
## 7928      1891          2        6          USA         ME  Lewiston       Tom
##      nameLast nameGiven weight height bats throws      debut  finalGame
## 7928   Healey Thomas F.    155     55 <NA>      R 1878-06-13 1878-09-09
##       retroID   bbrefID  deathDate birthDate
## 7928 healt101 healeto01 1891-02-06      <NA>

Not a data problem: Eddie Gaedel really was 43 inches (3 ft 7 in) tall.

Question: does it matter whether we check for NA first?
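A sketch for exploring this question at the console, using a single missing value h as a stand-in for a missing height. It relies on two facts about R: NA && FALSE evaluates to FALSE, and if stops with an error when its condition is NA.

```r
h <- NA_integer_   ## a stand-in for a missing height

!is.na(h) && h <= 62   ## NA check first: short-circuits, never looks at h <= 62
## [1] FALSE
h <= 62 && !is.na(h)   ## comparison first: NA && FALSE is still FALSE
## [1] FALSE

## with no NA check at all, the condition is NA and if() refuses to run
result <- tryCatch(
    if (h <= 62) "short" else "not short",
    error = function(e) "error"
)
result
## [1] "error"
```

So for this particular condition the order does not change the result, but the NA check itself is essential.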

While loops

Syntax:

while(condition) {
    ...
}

Rules:

If you don’t want your while loop to go forever, you have two options: make sure the condition eventually becomes FALSE, or exit the loop from the inside with break.

So for example, we could use a while loop to find the largest power of 2 less than 1000:

x <- 2
while(x * 2 < 1000) {
    x <- x * 2
}
x
## [1] 512
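A while loop can also be stopped from the inside with break rather than by its condition. As a sketch, here is the same computation written that way; the result should match the version above.

```r
x <- 2
while (TRUE) {
    if (x * 2 >= 1000) {
        break          ## doubling again would reach or pass 1000, so stop
    }
    x <- x * 2
}
x
## [1] 512
```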

Or for a slightly less silly example, we could use it to answer a modified birthday problem.

Suppose we want to know how many classes of randomly selected individuals we would have to attend before finding one in which at least two different days are each shared as a birthday by two or more people.

We could go through the math, or we could get partway to an answer with a while loop.

Here we draw sets of birthdays for classes of size 20, assuming that there are 365 days in a year:

days_in_year <- 365
class_size <- 20
num_classes <- 0
while(TRUE) {
    num_classes <- num_classes + 1
    birthdays <- sample(1:days_in_year, class_size, replace = TRUE)
    num_birthdays_per_day <- table(birthdays)
    days_with_match <- num_birthdays_per_day >= 2
    num_days_with_match <- sum(days_with_match)
    if(num_days_with_match >= 2) {
        break
    }
}
num_classes
## [1] 15

Notes:
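One point that can be made concrete with a sketch: a single run gives one random draw, so the answer changes from run to run. To get a more stable estimate, one could wrap the loop in a function and repeat it many times. The function name classes_until_two_matches, the seed, and the number of repetitions are arbitrary choices made for this illustration.

```r
## number of classes attended until one has at least two days
## that are each shared by two or more people
classes_until_two_matches <- function(class_size = 20, days_in_year = 365) {
    num_classes <- 0
    while (TRUE) {
        num_classes <- num_classes + 1
        birthdays <- sample(1:days_in_year, class_size, replace = TRUE)
        if (sum(table(birthdays) >= 2) >= 2) {
            break
        }
    }
    num_classes
}

set.seed(610)   ## arbitrary seed, only so the run is repeatable
runs <- replicate(500, classes_until_two_matches())
mean(runs)      ## average number of classes needed
```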

Vectorization

Most basic functions in R are vectorized, which means that they are applied to vectors element-by-element.

x <- rgamma(10, 1, .1)
x
##  [1] 13.88731699  0.38576359 14.78245675  0.01564977 17.45660198  6.29483769
##  [7] 11.87494519  7.47135398 25.78818403  6.68361488
log(x)
##  [1]  2.6309760 -0.9525305  2.6934411 -4.1572990  2.8597179  1.8397299
##  [7]  2.4744307  2.0110762  3.2499164  1.8996590
round(x)
##  [1] 14  0 15  0 17  6 12  7 26  7
floor(x)
##  [1] 13  0 14  0 17  6 11  7 25  6

Why vectorization? We’ll see more of its advantages later; the comparison below shows the most immediate one, shorter code.

Compare:

The for-loop way of computing the floor of every element of the vector x:

floor_of_x <- rep(NA, length(x)) ## pre-allocate a vector to hold our computations
for(i in seq_along(x)) {
    floor_of_x[i] <- floor(x[i])
}
floor_of_x
##  [1] 13  0 14  0 17  6 11  7 25  6

Versus the vectorized way:

floor(x)
##  [1] 13  0 14  0 17  6 11  7 25  6
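Comparisons are vectorized too, which means a loop with an if inside can often be replaced by a logical index. The sketch below uses a small made-up data frame, players, standing in for the People table from the for-loop example; the names and heights are invented for illustration.

```r
## made-up data standing in for the People table
players <- data.frame(
    name   = c("A", "B", "C", "D"),
    height = c(75, 43, NA, 55)
)

## one TRUE/FALSE per row; & is the element-wise AND
is_short <- !is.na(players$height) & players$height <= 62
is_short
## [1] FALSE  TRUE FALSE  TRUE

## keep only the rows where is_short is TRUE
players[is_short, ]
```

Here & is the right operator rather than &&, since we want one answer per row.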

Vectorized conditionals

Suppose we want to plot the following function.

\[ f(x) = \begin{cases} \frac{15}{16} (1 - x^2)^2 & |x| < 1\\ 0 & \text{o.w.} \end{cases} \]

Take 1:

x <- seq(-2, 2, length.out = 200) ## a vector with the values at which we want to evaluate f
fx <- rep(NA, length(x)) ## pre-allocate a vector in which to store the values of f(x)
for(i in seq_along(x)) {
    if(abs(x[i]) < 1) {
        fx[i] <- 15/16 * (1 - x[i]^2)^2
    } else {
        fx[i] <- 0
    }
}
plot(fx ~ x, type = 'l')

ifelse: Vectorized conditionals

ifelse syntax:

ifelse(condition, yes, no)

Rules:

ifelse goes element-by-element through condition, yes, and no.
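A small sketch of that behaviour on a made-up three-element vector. Note that yes and no may themselves be vectors, and that each is computed for the whole vector before the selection happens, which is why the second call wraps x in abs.

```r
x <- c(4, -1, 9)

## scalar yes and no are recycled to the length of the condition
ifelse(x >= 0, "pos", "neg")
## [1] "pos" "neg" "pos"

## yes is a full vector; abs() avoids a warning from sqrt(-1),
## since sqrt is applied to every element before ifelse picks
ifelse(x >= 0, sqrt(abs(x)), NA)
## [1]  2 NA  3
```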

Take 2:

x <- seq(-2, 2, length.out = 200)
y <- ifelse(abs(x) < 1, 15/16 * (1 - x^2)^2, 0)
plot(y ~ x, type = 'l')
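As a quick consistency check, the two takes can be compared numerically. This sketch repeats both computations so that it runs on its own, then uses all.equal to confirm they agree.

```r
x <- seq(-2, 2, length.out = 200)

## Take 1: loop with if/else
fx <- rep(NA, length(x))
for (i in seq_along(x)) {
    if (abs(x[i]) < 1) {
        fx[i] <- 15/16 * (1 - x[i]^2)^2
    } else {
        fx[i] <- 0
    }
}

## Take 2: ifelse
y <- ifelse(abs(x) < 1, 15/16 * (1 - x^2)^2, 0)

all.equal(fx, y)
## [1] TRUE
```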