Reading: Matloff Chapter 7.1, 7.2, 2.9
Last time/finish up today: Data structures, so that we have something to work on.
This time: Flow control, so we can actually do things.
Conditionals
Iteration
Vectorization
Syntax
You can make more complicated conditions using either else if or nested if statements:
Some rules:
if requires one boolean, not a vector. In R versions before 4.2.0 it will throw a warning if you give it a vector, and it will be evaluated based on just the first element of that vector; from R 4.2.0 onward, giving it a vector is an error.
else is optional.
If the action is just one line, you don’t need the braces (but you should be consistent about this: choose a way you like and stick to it).
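The syntax and rules above can be sketched as follows. This is only an illustration: the variable x and the labels "big", "medium", and "small" are hypothetical and not taken from the lecture.

```r
## Hypothetical value, chosen only for illustration
x <- 7

## else if: the conditions are checked in order, and the first TRUE one wins
if (x > 10) {
  size <- "big"
} else if (x > 5) {
  size <- "medium"
} else {
  size <- "small"
}
size

## The same logic written with a nested if instead of else if
if (x > 10) {
  size_nested <- "big"
} else {
  if (x > 5) {
    size_nested <- "medium"
  } else {
    size_nested <- "small"
  }
}
size_nested
```

Both versions give the same answer; else if simply keeps the nesting from growing as more cases are added.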
We often want to combine conditions, which we can do with boolean operations.
Like most other languages, R has AND and OR operators, but unlike some other languages it has two of each.
& and && both mean AND.
| and || both mean OR.
So for example:
## Example values (assumed here; the original setup is not shown)
steak_type <- "rare"
temp <- 120
if(steak_type == "rare" & temp > 115) {
  print("take your steak off!")
} else if(steak_type == "med_rare" & temp > 125) {
  print("take your steak off!")
} else {
  print("you can keep cooking")
}
## [1] "take your steak off!"
NB: As we’ll see in two slides, & works here, but it would be better to use &&.
Or, in not so dire a situation:
What is the difference between the two?
If you use && or || on a pair of vectors with length longer than 1, older versions of R evaluate the expression on just the first element of each vector; R 4.3.0 and later throw an error instead.
&& and || also support lazy evaluation.
Lazy evaluation:
FALSE followed by && doesn’t evaluate the second expression.
TRUE followed by || doesn’t evaluate the second expression.
This will occasionally make your code faster: if you remember to use && and || for flow control and put the expressions that are simpler to evaluate first, R can skip the more expensive expression whenever the first one already decides the result.
Try this on your computer. Which ones are fast and which ones are slow? Why?
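A minimal sketch of the kind of comparison meant here. The helper slow_check() and the counter n_calls are hypothetical names made up for this illustration, and the pause from Sys.sleep() stands in for any expensive computation.

```r
## Hypothetical helper: counts how often it is called and is deliberately slow
n_calls <- 0
slow_check <- function() {
  n_calls <<- n_calls + 1
  Sys.sleep(0.1)   # stand-in for an expensive computation
  TRUE
}

## Cheap expression first: && sees FALSE and never calls slow_check()
fast <- FALSE && slow_check()
calls_after_fast <- n_calls

## Expensive expression first: slow_check() has to run before && can decide
slow <- slow_check() && FALSE
calls_after_slow <- n_calls

c(calls_after_fast, calls_after_slow)
```

Both expressions evaluate to FALSE, but only the second one pays for the slow call. Wrapping each line in system.time() is one way to see the difference directly.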
Take-away:
Use && and || for flow control.
Use & and | for operations on vectors.
Put simpler operations first when using && and ||.
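As a small illustration of the take-away, here is a sketch with two made-up logical vectors, a and b. The single-character operators work element by element, and any() or all() can collapse the result to the single boolean that if needs.

```r
## Hypothetical logical vectors, for illustration only
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

## & and | are vectorized: one result per pair of elements
both   <- a & b    # TRUE FALSE FALSE
either <- a | b    # TRUE TRUE FALSE

## if needs a single boolean, so collapse the vector first
if (any(both)) {
  message_any <- "at least one position is TRUE in both vectors"
} else {
  message_any <- "no position is TRUE in both vectors"
}
all_either <- all(either)
```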
Two types
For loops: You know how many iterations you need in advance.
While loops: You’ll know when to stop when you see it, but you don’t know in advance.
Syntax:
Rules:
vector is a vector.
x is a variable, which will be set first to vector[1], then to vector[2], and so on, up to vector[n], where n is the length of vector.
The actions inside { and } will be performed for each value of x.
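A minimal sketch of these rules, assuming a made-up vector called values (playing the role of vector above) and a running total:

```r
## Hypothetical data: values plays the role of "vector" in the rules above
values <- c(2, 4, 6)
total <- 0

## x is set to values[1], then values[2], then values[3]
for (x in values) {
  total <- total + x
}
total
```

After the loop, total holds the sum of the elements, which is the same number sum(values) would return.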
As with all the other flow control elements, for loops can be nested.
We can use this to do something slightly more complicated:
## [,1] [,2] [,3] [,4] [,5]
## [1,] NA NA NA NA NA
## [2,] NA NA NA NA NA
## [3,] NA NA NA NA NA
## [4,] NA NA NA NA NA
## [5,] NA NA NA NA NA
for(i in 1:nrow(D)) {
  for(j in 1:ncol(D)) {
    if(i == j) {
      D[i,j] <- d[i]
    } else {
      D[i,j] <- 0
    }
  }
}
D
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 0 0 0 0
## [2,] 0 2 0 0 0
## [3,] 0 0 3 0 0
## [4,] 0 0 0 4 0
## [5,] 0 0 0 0 5
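The setup for this example is not shown, so the sketch below assumes d is the vector 1:5 and D starts as a 5 x 5 matrix of NA values, which is consistent with the printed output. It also compares the loop result with the base function diag(), which builds a diagonal matrix from a vector directly.

```r
## Assumed setup (not shown in the original): a vector for the diagonal
## and an empty 5 x 5 matrix
d <- 1:5
D <- matrix(NA, nrow = 5, ncol = 5)

## Nested loops: put d[i] on the diagonal and 0 everywhere else
for (i in 1:nrow(D)) {
  for (j in 1:ncol(D)) {
    if (i == j) {
      D[i, j] <- d[i]
    } else {
      D[i, j] <- 0
    }
  }
}

## diag() builds the same matrix in one call
same_as_diag <- isTRUE(all.equal(D, diag(d)))
same_as_diag
```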
They can also be combined with the other flow control elements:
Don’t worry about this part, just data setup:
## install.packages("Lahman")
## install.packages("pacman")
library(Lahman)
library(pacman)
p_load(Lahman)
What the data looks like:
## playerID birthYear birthMonth birthDay birthCountry birthState birthCity
## 1 aardsda01 1981 12 27 USA CO Denver
## 2 aaronha01 1934 2 5 USA AL Mobile
## 3 aaronto01 1939 8 5 USA AL Mobile
## 4 aasedo01 1954 9 8 USA CA Orange
## 5 abadan01 1972 8 25 USA FL Palm Beach
## 6 abadfe01 1985 12 17 D.R. La Romana La Romana
## deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst
## 1 NA NA NA <NA> <NA> <NA> David
## 2 2021 1 22 USA GA Atlanta Hank
## 3 1984 8 16 USA GA Atlanta Tommie
## 4 NA NA NA <NA> <NA> <NA> Don
## 5 NA NA NA <NA> <NA> <NA> Andy
## 6 NA NA NA <NA> <NA> <NA> Fernando
## nameLast nameGiven weight height bats throws debut finalGame
## 1 Aardsma David Allan 215 75 R R 2004-04-06 2015-08-23
## 2 Aaron Henry Louis 180 72 R R 1954-04-13 1976-10-03
## 3 Aaron Tommie Lee 190 75 R R 1962-04-10 1971-09-26
## 4 Aase Donald William 190 75 R R 1977-07-26 1990-10-03
## 5 Abad Fausto Andres 184 73 L L 2001-09-10 2006-04-13
## 6 Abad Fernando Antonio 235 74 L L 2010-07-28 2021-10-01
## retroID bbrefID deathDate birthDate
## 1 aardd001 aardsda01 <NA> 1981-12-27
## 2 aaroh101 aaronha01 2021-01-22 1934-02-05
## 3 aarot101 aaronto01 1984-08-16 1939-08-05
## 4 aased001 aasedo01 <NA> 1954-09-08
## 5 abada001 abadan01 <NA> 1972-08-25
## 6 abadf001 abadfe01 <NA> 1985-12-17
And finally a for loop: What am I doing here?
for(i in 1:nrow(People)) {
  if(!is.na(People$height[i]) && People$height[i] <= 62) {
    print(People[i,])
  }
}
## playerID birthYear birthMonth birthDay birthCountry birthState birthCity
## 6300 gaedeed01 1925 6 8 USA IL Chicago
## deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst
## 6300 1961 6 18 USA IL Chicago Eddie
## nameLast nameGiven weight height bats throws debut finalGame
## 6300 Gaedel Edward Carl 65 43 R L 1951-08-19 1951-08-19
## retroID bbrefID deathDate birthDate
## 6300 gaede101 gaedeed01 1961-06-18 1925-06-08
## playerID birthYear birthMonth birthDay birthCountry birthState birthCity
## 7928 healeto01 1853 NA NA USA RI Cranston
## deathYear deathMonth deathDay deathCountry deathState deathCity nameFirst
## 7928 1891 2 6 USA ME Lewiston Tom
## nameLast nameGiven weight height bats throws debut finalGame
## 7928 Healey Thomas F. 155 55 <NA> R 1878-06-13 1878-09-09
## retroID bbrefID deathDate birthDate
## 7928 healt101 healeto01 1891-02-06 <NA>
Not a data problem: Eddie Gaedel
Question: does it matter whether we check for NA first?
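One way to explore this question is the sketch below, which uses a single hypothetical missing height rather than the People data. It checks both orderings of the condition and what happens when the NA check is left out entirely.

```r
## Hypothetical missing height, for illustration only
h <- NA_real_

## NA check first: the left side is FALSE, so the right side is never evaluated
check_first <- !is.na(h) && h <= 62

## Comparison first: the left side is NA, but NA && FALSE is still FALSE
check_second <- h <= 62 && !is.na(h)

## No NA check at all: if() cannot handle an NA condition and throws an error
no_check <- tryCatch(
  if (h <= 62) "short" else "not short",
  error = function(e) "error"
)

c(check_first, check_second)
no_check
```

In this sketch the order does not change the result, because the NA check is FALSE either way; what matters is that the check is present. Putting it first still has the advantage that the comparison is skipped for missing values.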
Syntax:
Rules:
If condition is TRUE, the code inside { and } will be evaluated.
After the code inside { and } is evaluated, condition is checked again; if it is still TRUE, we go again.
This repeats until condition is FALSE.
If you don’t want your while loop to go forever, you have two options:
The value of condition needs to eventually be set to FALSE by the code inside { and }.
You have a break statement inside the { } that eventually gets you out of the loop.
So for example, we could use a while loop to find the largest power of 2 less than 1000:
## [1] 512
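The code for this example is not shown above; one possible way to write it, with a hypothetical variable name power_of_2, is:

```r
## Start from 2^0 and keep doubling while the next power is still below 1000
power_of_2 <- 1
while (power_of_2 * 2 < 1000) {
  power_of_2 <- power_of_2 * 2
}
power_of_2
## [1] 512
```

Here the condition itself eventually becomes FALSE (1024 is not less than 1000), so no break statement is needed.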
Or for a slightly less silly example, we could use it to answer a modified birthday problem.
Suppose we want to know how many classes filled with randomly selected individuals we would have to attend before we found one where there were at least two pairs of people with the same birthday.
We could go through the math, or we could get partway to an answer with a while loop.
Here we draw sets of birthdays for classes of size 20, assuming that there are 365 days in a year:
days_in_year <- 365
class_size <- 20
num_classes <- 0
while(TRUE) {
  num_classes <- num_classes + 1
  birthdays <- sample(1:days_in_year, class_size, replace = TRUE)
  num_birthdays_per_day <- table(birthdays)
  days_with_match <- num_birthdays_per_day >= 2
  num_days_with_match <- sum(days_with_match)
  if(num_days_with_match >= 2) {
    break
  }
}
num_classes
## [1] 15
Notes:
The while(TRUE) with a break statement is a common idiom for while loops.
break can also be used in for loops.
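A short sketch of break inside a for loop, using a made-up vector values and a made-up threshold of 10: the loop stops at the first element larger than the threshold and ignores the rest.

```r
## Hypothetical data and threshold, for illustration only
values <- c(3, 8, 12, 5, 20)
first_big <- NA

for (v in values) {
  if (v > 10) {
    first_big <- v
    break          # leave the loop; 5 and 20 are never looked at
  }
}
first_big
```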
Most basic functions in R are vectorized, which means that they are applied to vectors element-by-element.
We already saw this with vector operations.
Also true of many base functions.
## [1] 13.88731699 0.38576359 14.78245675 0.01564977 17.45660198 6.29483769
## [7] 11.87494519 7.47135398 25.78818403 6.68361488
## [1] 2.6309760 -0.9525305 2.6934411 -4.1572990 2.8597179 1.8397299
## [7] 2.4744307 2.0110762 3.2499164 1.8996590
## [1] 14 0 15 0 17 6 12 7 26 7
## [1] 13 0 14 0 17 6 11 7 25 6
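The calls that produced the output above are not shown. As a sketch of the same idea on a small made-up vector x_small, arithmetic and base functions such as round() and floor() are applied to every element at once:

```r
## Hypothetical vector, for illustration only
x_small <- c(0.4, 2.7, 13.9)

doubled <- x_small * 2      # arithmetic is element-by-element
rounded <- round(x_small)   # nearest integer for each element
floored <- floor(x_small)   # largest integer below each element

rounded
floored
```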
More on vectorization and its advantages later.
Why vectorization?
More readable code.
Instead of writing how you want the computer to perform the computations, you tell the computer what you want to do.
Less typing = less of an opportunity to introduce bugs.
Can be faster.
Compare:
for-loop way of computing the floor of all the elements in the vector x:
floor_of_x <- rep(NA, length(x)) ## pre-allocate a vector to hold our computations
for(i in seq_along(x)) { ## seq_along(x) is also safe when x is empty
  floor_of_x[i] <- floor(x[i])
}
floor_of_x
## [1] 13 0 14 0 17 6 11 7 25 6
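For comparison, here is a sketch of the vectorized way next to the loop, on a small made-up vector (the name x_demo and its values are assumptions for this illustration):

```r
## Hypothetical vector, for illustration only
x_demo <- c(13.9, 0.4, 14.8)

## for-loop way: pre-allocate, then fill in one element at a time
floor_loop <- rep(NA, length(x_demo))
for (i in seq_along(x_demo)) {
  floor_loop[i] <- floor(x_demo[i])
}

## vectorized way: one call handles the whole vector
floor_vec <- floor(x_demo)

same_result <- identical(floor_loop, floor_vec)
same_result
```

The two approaches return the same values; the vectorized call is shorter and says what is wanted rather than how to compute it.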
Suppose we want to plot the following function.
\[ f(x) = \begin{cases} \frac{15}{16} (1 - x^2)^2 & |x| < 1\\ 0 & \text{o.w.} \end{cases} \]
ifelse syntax:
Rules:
ifelse returns a vector.
condition is a vector of Booleans.
yes and no are vectors, and should be of the same type.
ifelse goes element-by-element through condition, yes, and no.
The ith element of the output is yes[i] if condition[i] is TRUE.
The ith element of the output is no[i] if condition[i] is FALSE.
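As a sketch of how these rules apply to the piecewise function f(x) defined above, one possible implementation uses ifelse with abs(x) < 1 as the condition. The function name f comes from the formula; the grid x_grid is a made-up set of evaluation points.

```r
## f(x) = 15/16 * (1 - x^2)^2 when |x| < 1, and 0 otherwise
f <- function(x) {
  ifelse(abs(x) < 1, 15 / 16 * (1 - x^2)^2, 0)
}

## Hypothetical grid of points at which to evaluate f
x_grid <- seq(-2, 2, by = 0.5)
f_values <- f(x_grid)
f_values

## To draw the function, something like this could be used:
## plot(x_grid, f_values, type = "l")
```

Because ifelse is vectorized, f accepts a whole vector of x values at once, which is what a plotting function needs.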