Stat 610 Lecture 2: Flow control and looping

Reading: Matloff Chapter 7.1, 7.2, 2.9

Last time: Data structures, so that we have something to work on.

This time: Flow control, so we can actually do things.

if statements

Syntax

if (condition) {
    action1
} else {
    action2
}

So for example:

weather = "sunny"
if(weather == "rainy") {
    print("Take your umbrella!")
} else {
    print("No need for an umbrella today...")
}
## [1] "No need for an umbrella today..."

You can make more complicated conditions using either else if or nested if statements:

weather = "cloudy"
if(weather == "rainy") {
    print("Take your umbrella!")
} else if (weather == "cloudy") {
    print("Think about taking your umbrella")
} else {
    print("No need for an umbrella today...")
}
## [1] "Think about taking your umbrella"
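
The nested-if version is not shown above, so here is a minimal sketch of what it could look like. The variable advice is introduced only for this illustration; the messages are the same as in the else if example.

```r
weather = "cloudy"
if (weather == "rainy") {
    advice = "Take your umbrella!"
} else {
    # this inner test is only reached when the first condition is FALSE
    if (weather == "cloudy") {
        advice = "Think about taking your umbrella"
    } else {
        advice = "No need for an umbrella today..."
    }
}
print(advice)
```

Both forms make the same decision; else if mainly saves a level of indentation.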

Some rules:

Combining booleans and lazy evaluation

We often want to combine conditions, which we can do with boolean operations.

Like most languages, R has AND and OR operators, but unlike some other languages it has two of each: & and | work element-by-element on vectors, while && and || work on single TRUE/FALSE values.

So for example:

steak_type = "med_rare"
temp = 130
if(steak_type == "rare" & temp > 115) {
    print("take your steak off!")
} else if(steak_type == "med_rare" & temp > 125) {
    print("take your steak off!")        
} else {
    print("you can keep cooking")
}
## [1] "take your steak off!"

NB: As we’ll see in two slides, & works here but it would be better to use &&.

Or, in not so dire a situation:

steak_type = "rare"
temp = 110
if(steak_type == "rare" && temp > 115) {
    print("take your steak off!")
} else if(steak_type == "med_rare" && temp > 125) {
    print("take your steak off!")        
} else {
    print("you can keep cooking")
}
## [1] "you can keep cooking"

What is the difference between the two?

Lazy evaluation:

Try this on your computer. Which ones are fast and which ones are slow? Why?

(FALSE && all(rep(1, 10^8) == 1))
(FALSE & all(rep(1, 10^8) == 1))
(all(rep(1, 10^8) == 1) && FALSE)
(all(rep(1, 10^8) == 1) & FALSE)
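
If the timings are hard to tell apart on your machine, a small sketch like the following makes the difference visible by counting evaluations instead of timing them. The helper noisy() and the counter calls are made up for this illustration.

```r
# noisy() is a hypothetical helper: it records that it was evaluated,
# then returns its argument unchanged
calls = 0
noisy = function(value) {
    calls <<- calls + 1
    value
}

calls = 0
r1 = FALSE && noisy(TRUE)   # && stops at FALSE, so noisy() is never called
calls_double = calls

calls = 0
r2 = FALSE & noisy(TRUE)    # & evaluates both sides before combining them
calls_single = calls

c(calls_double, calls_single)
```

Both r1 and r2 are FALSE; the only difference is whether the right-hand side was evaluated at all.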

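Take-away: inside an if condition, use && and ||. They expect single TRUE/FALSE values and skip the right-hand side whenever the left-hand side already settles the answer; & and | work element-by-element on vectors and always evaluate both sides.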

Iteration

Two types: for loops and while loops.

For loops

Syntax:

for(x in vector) {
    ...
}

Rules:

So for example:

x = 1:5
for(i in x) {
    print(i^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25

As with all the other flow control elements, for loops can be nested.

We can use this to do something slightly more complicated:

d = 1:5
D = matrix(NA, nrow = length(d), ncol = length(d))
D
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   NA   NA   NA   NA   NA
## [2,]   NA   NA   NA   NA   NA
## [3,]   NA   NA   NA   NA   NA
## [4,]   NA   NA   NA   NA   NA
## [5,]   NA   NA   NA   NA   NA
for(i in 1:nrow(D)) {
    for(j in 1:ncol(D)) {
        if(i == j) {
            D[i,j] = d[i]
        } else {
            D[i,j] = 0
        }
    }
}
D
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    0    2    0    0    0
## [3,]    0    0    3    0    0
## [4,]    0    0    0    4    0
## [5,]    0    0    0    0    5
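
As a cross-check on the nested loop, base R's diag() builds the same kind of matrix directly from a vector. This is a sketch of an alternative, not a replacement for practicing loops; D_loop and D_diag are names chosen just for this comparison.

```r
d = 1:5

# loop version: start from zeros and fill in the diagonal
D_loop = matrix(0, nrow = length(d), ncol = length(d))
for (i in seq_along(d)) {
    D_loop[i, i] = d[i]
}

# diag() with a vector of length 2 or more puts that vector on the diagonal
# (careful: diag(5) with a single number gives a 5 x 5 identity matrix instead)
D_diag = diag(d)

all(D_loop == D_diag)
```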

They can also be combined with the other flow control elements:

Don’t worry about this part, just data setup:

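## install.packages("Lahman")
## install.packages("pacman")
library(pacman)
## p_load() installs Lahman if it is missing and then attaches it,
## so a separate library(Lahman) call is not needed
p_load(Lahman)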

What the data looks like (in more recent versions of the Lahman package this table is named People rather than Master):

head(Master)
##    playerID birthYear birthMonth birthDay birthCountry birthState
## 1 aardsda01      1981         12       27          USA         CO
## 2 aaronha01      1934          2        5          USA         AL
## 3 aaronto01      1939          8        5          USA         AL
## 4  aasedo01      1954          9        8          USA         CA
## 5  abadan01      1972          8       25          USA         FL
## 6  abadfe01      1985         12       17         D.R.  La Romana
##    birthCity deathYear deathMonth deathDay deathCountry deathState
## 1     Denver        NA         NA       NA         <NA>       <NA>
## 2     Mobile        NA         NA       NA         <NA>       <NA>
## 3     Mobile      1984          8       16          USA         GA
## 4     Orange        NA         NA       NA         <NA>       <NA>
## 5 Palm Beach        NA         NA       NA         <NA>       <NA>
## 6  La Romana        NA         NA       NA         <NA>       <NA>
##   deathCity nameFirst nameLast        nameGiven weight height bats throws
## 1      <NA>     David  Aardsma      David Allan    215     75    R      R
## 2      <NA>      Hank    Aaron      Henry Louis    180     72    R      R
## 3   Atlanta    Tommie    Aaron       Tommie Lee    190     75    R      R
## 4      <NA>       Don     Aase   Donald William    190     75    R      R
## 5      <NA>      Andy     Abad    Fausto Andres    184     73    L      L
## 6      <NA>  Fernando     Abad Fernando Antonio    220     73    L      L
##        debut  finalGame  retroID   bbrefID  deathDate  birthDate
## 1 2004-04-06 2015-08-23 aardd001 aardsda01       <NA> 1981-12-27
## 2 1954-04-13 1976-10-03 aaroh101 aaronha01       <NA> 1934-02-05
## 3 1962-04-10 1971-09-26 aarot101 aaronto01 1984-08-16 1939-08-05
## 4 1977-07-26 1990-10-03 aased001  aasedo01       <NA> 1954-09-08
## 5 2001-09-10 2006-04-13 abada001  abadan01       <NA> 1972-08-25
## 6 2010-07-28 2017-10-01 abadf001  abadfe01       <NA> 1985-12-17

And finally a for loop: What am I doing here?

for(i in 1:nrow(Master)) {
    if(!is.na(Master$height[i]) && Master$height[i] <= 62) {
        print(Master[i,])
    }
}
##       playerID birthYear birthMonth birthDay birthCountry birthState
## 5989 gaedeed01      1925          6        8          USA         IL
##      birthCity deathYear deathMonth deathDay deathCountry deathState
## 5989   Chicago      1961          6       18          USA         IL
##      deathCity nameFirst nameLast   nameGiven weight height bats throws
## 5989   Chicago     Eddie   Gaedel Edward Carl     65     43    R      L
##           debut  finalGame  retroID   bbrefID  deathDate  birthDate
## 5989 1951-08-19 1951-08-19 gaede101 gaedeed01 1961-06-18 1925-06-08
##       playerID birthYear birthMonth birthDay birthCountry birthState
## 7536 healeto01      1853         NA       NA          USA         RI
##      birthCity deathYear deathMonth deathDay deathCountry deathState
## 7536  Cranston      1891          2        6          USA         ME
##      deathCity nameFirst nameLast nameGiven weight height bats throws
## 7536  Lewiston       Tom   Healey Thomas F.    155     55 <NA>      R
##           debut  finalGame  retroID   bbrefID  deathDate birthDate
## 7536 1878-06-13 1878-09-09 healt101 healeto01 1891-02-06      <NA>

Not a data problem: Eddie Gaedel really was 43 inches tall.

For you to think about on your own: does it matter whether we check for NA first? What could potentially happen if we check for height first instead?
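
One way to start exploring this question is to look at how R treats a missing value in a comparison and in if. The sketch below uses a small made-up vector, heights, rather than the real data.

```r
heights = c(75, NA, 43)   # hypothetical values; the NA stands in for a missing height

# comparing NA with a number gives NA, not TRUE or FALSE
na_comparison = heights[2] <= 62

# if() cannot decide on NA, so using the comparison on its own is an error;
# tryCatch() is used here only so the script keeps running
if_result = tryCatch(
    if (heights[2] <= 62) "short" else "not short",
    error = function(e) "error"
)

# with the NA check first, && stops before the comparison is ever made
guarded = !is.na(heights[2]) && heights[2] <= 62

na_comparison
if_result
guarded
```

From here you can swap the two sides of && and see what R returns.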

While loops

Syntax:

while(condition) {
    ...
}

Rules:

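If you don’t want your while loop to go forever, you have two options: make sure the condition eventually becomes FALSE, or use break to leave the loop from inside. The two examples below show one of each.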

So for example, we could use a while loop to find the largest power of 2 less than 1000:

x = 2
while(x * 2 < 1000) {
    x = x * 2
}
x
## [1] 512

Or for a slightly less silly example, we could use it to answer a modified birthday problem.

Suppose we want to know how many classes filled with randomly selected individuals we would have to attend before we found one in which at least two different days of the year are each shared as a birthday by two or more people.

We could go through the math, or we could get partway to an answer with a while loop.

Here we draw sets of birthdays for classes of size 20, assuming that there are 365 days in a year:

days_in_year = 365
class_size = 20
num_classes = 0
while(TRUE) {
    num_classes = num_classes + 1
    birthdays = sample(1:days_in_year, class_size, replace = TRUE)
    num_birthdays_per_day = table(birthdays)
    days_with_match = num_birthdays_per_day >= 2
    num_days_with_match = sum(days_with_match)
    if(num_days_with_match >= 2) {
        break
    }
}
num_classes
## [1] 6
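
A single run gives only one random draw of the number of classes. To get closer to an answer, one option is to wrap the loop in a function and repeat it many times. The sketch below does this; the function name classes_until_two_matches, the seed, and the number of repetitions are arbitrary choices for illustration.

```r
# classes_until_two_matches() is a hypothetical wrapper around the while loop above
classes_until_two_matches = function(class_size = 20, days_in_year = 365) {
    num_classes = 0
    while (TRUE) {
        num_classes = num_classes + 1
        birthdays = sample(1:days_in_year, class_size, replace = TRUE)
        # number of days on which two or more people have a birthday
        num_days_with_match = sum(table(birthdays) >= 2)
        if (num_days_with_match >= 2) {
            break
        }
    }
    num_classes
}

set.seed(610)   # arbitrary seed, only so that the run is reproducible
draws = replicate(1000, classes_until_two_matches())
mean(draws)     # simulation estimate of the expected number of classes
```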

Notes:

Vectorization

Most basic functions in R are vectorized, which means that they are applied to vectors element-by-element.

x = rgamma(10, 1, .1)
x
##  [1]  2.0550638  4.9118493  9.1955684  7.4509114 14.5696261 17.1481348
##  [7]  0.4875719  2.3505114 14.1177033  2.0490359
log(x)
##  [1]  0.7203069  1.5916505  2.2187217  2.0083364  2.6789390  2.8418894
##  [7] -0.7183175  0.8546329  2.6474296  0.7173694
round(x)
##  [1]  2  5  9  7 15 17  0  2 14  2
floor(x)
##  [1]  2  4  9  7 14 17  0  2 14  2

More on vectorization and its advantages later.

Why vectorization?

Compare:

The for-loop way of computing the floor of every element of the vector x:

floor_of_x = rep(NA, length(x)) ## pre-allocate a vector to hold our computations
for(i in seq_along(x)) { ## seq_along(x) is 1:length(x), but is also safe when x is empty
    floor_of_x[i] = floor(x[i])
}
floor_of_x
##  [1]  2  4  9  7 14 17  0  2 14  2

Versus the vectorized way:

floor(x)
##  [1]  2  4  9  7 14 17  0  2 14  2
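
Besides being shorter, the vectorized call is usually faster. A rough way to see this on your own machine is to time both versions on a longer vector with system.time(). The vector x_big and its length are made up for this sketch, and the timings will differ from computer to computer.

```r
set.seed(1)                      # arbitrary seed for reproducibility
x_big = rgamma(1e6, 1, .1)       # a longer vector, so the difference is measurable

loop_time = system.time({
    floor_loop = rep(NA, length(x_big))
    for (i in seq_along(x_big)) {
        floor_loop[i] = floor(x_big[i])
    }
})

vec_time = system.time(floor_vec <- floor(x_big))

identical(floor_loop, floor_vec)   # same answer either way
loop_time["elapsed"]
vec_time["elapsed"]
```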

Vectorized conditionals

Suppose we want to plot the following function.

\[ f(x) = \begin{cases} \frac{15}{16} (1 - x^2)^2 & |x| < 1\\ 0 & \text{otherwise} \end{cases} \]

Take 1:

x = seq(-2, 2, length.out = 200) ## a vector with the values at which we want to evaluate f
fx = rep(NA, length(x)) ## pre-allocate a vector in which to store the values of f(x)
for(i in seq_along(x)) {
    if(abs(x[i]) < 1) {
        fx[i] = 15/16 * (1 - x[i]^2)^2
    } else {
        fx[i] = 0
    }
}
plot(fx ~ x, type = 'l')

ifelse: Vectorized conditionals

ifelse syntax:

ifelse(condition, yes, no)

Rules:

ifelse goes element-by-element through condition, yes, and no.
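
A small sketch of what element-by-element means here, using a made-up vector temps. The yes and no arguments are single strings, which ifelse recycles to the length of the condition.

```r
temps = c(110, 120, 130)   # hypothetical temperatures

# one answer per element of the condition temps > 125
status = ifelse(temps > 125, "take it off", "keep cooking")
status
```

An ordinary if would not work here, because it needs a single TRUE or FALSE rather than a vector of them.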

Take 2:

x = seq(-2, 2, length.out = 200)
y = ifelse(abs(x) < 1, 15/16 * (1 - x^2)^2, 0)
plot(y ~ x, type = 'l')
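
To convince yourself that the two takes compute the same thing, you can build both vectors and compare them. This is a sketch that repeats both computations so it can be run on its own.

```r
x = seq(-2, 2, length.out = 200)

# Take 1: loop with if / else
fx = rep(NA, length(x))
for (i in seq_along(x)) {
    if (abs(x[i]) < 1) {
        fx[i] = 15/16 * (1 - x[i]^2)^2
    } else {
        fx[i] = 0
    }
}

# Take 2: vectorized conditional
y = ifelse(abs(x) < 1, 15/16 * (1 - x^2)^2, 0)

all.equal(fx, y)
```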

Homework

I’ll post a homework on Sunday.

You’ll be able to start on it with the material we’ve covered so far, but it will also cover the text manipulation material we’ll go through next week.