Chapter 5 Conditions and Loops
In programming, controlling the flow of execution is an essential task. As you start to write more sophisticated programs in R, you will often need to control which parts of your code run and in what order. This is where the concept of flow control comes in.
There are two types of typical scenarios where flow control is used:
- Execute some parts of the code or skip others, based on certain criteria or conditions (i.e., an expression that evaluates to TRUE or FALSE).
- Repeat a particular code chunk a certain number of times, which is often referred to as a loop.
In this chapter, we will explore these core programming techniques using:
- if-else statements
- for and while loops
However, in Chapter 11, we will talk about loops more and point you to the idiomatic ways of dealing with loops in R (i.e., with Tidyverse syntax).
5.1 Conditions
Before we talk about the flow controls, let’s deal with the more fundamental element of the control structures: what is a condition? To that end, let’s first explore a few important concepts:
- Boolean Values
- Comparison Operators
- Boolean Operators
Boolean Values
Unlike numbers or characters, the Boolean data type has only two values, TRUE and FALSE. In R, the Boolean values TRUE and FALSE lack the quotes you place around strings, and they are always uppercase.
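As a quick check (a minimal sketch; the object name x is only illustrative), class() reports the data type of a Boolean value:

```r
# Assign a Boolean value; note: no quotes, all uppercase
x <- TRUE

# Ask R for the data type of `x`
class(x)
```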
[1] "logical"
Comparison Operators
Comparison Operators are important in programming. They compare two values and evaluate down to a single Boolean value. They are also referred to as relational operators.
Operator | Meaning |
---|---|
== | Equal to |
!= | Not equal to |
> | Greater than |
< | Less than |
>= | Greater than or equal to |
<= | Less than or equal to |
X %in% Y | X is a member of Y |
These comparison operators return TRUE or FALSE depending on the values we give them.
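For instance, with a few numbers chosen purely for illustration, the following comparisons evaluate to TRUE, FALSE, and TRUE, respectively:

```r
5 > 3    # is 5 greater than 3?
5 == 3   # is 5 equal to 3?
5 != 3   # is 5 different from 3?
```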
[1] TRUE
[1] FALSE
[1] TRUE
Please note that these operators work not only with numbers but also with characters.
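A small sketch with character strings (the words are arbitrary). Note that the comparison is case-sensitive, so the three expressions evaluate to TRUE, FALSE, and TRUE:

```r
"apple" == "apple"    # identical strings
"apple" == "Apple"    # the two strings differ in case
"apple" != "banana"   # different strings
```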
[1] TRUE
[1] FALSE
[1] TRUE
When using comparison operators, be careful of the data types (i.e., numeric or character).
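As an illustration (values chosen for demonstration only), each comparison below mixes two data types, and each evaluates to TRUE because R converts one side before comparing:

```r
1 == "1"         # the number 1 is converted to the string "1"
TRUE == 1        # TRUE is converted to the number 1
TRUE == "TRUE"   # TRUE is converted to the string "TRUE"
```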
[1] TRUE
[1] TRUE
[1] TRUE
Please note from the above example that R does implicit data type conversion, which means it automatically converts one data type to another as needed. This can occur in a variety of situations, such as when performing arithmetic operations on different types of data or when assigning a value of one data type to a variable of another data type.
Please note the difference between == and =. The == operator asks whether two objects are equal to each other, while the = operator assigns the value/object on the right to the variable name on the left. In R, the preferred way of assignment is to use the <- operator instead.
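A minimal sketch of the three operators side by side (the object names x and y are illustrative):

```r
x <- 5   # preferred assignment operator in R
y = 5    # also assigns, but is less idiomatic in R

x == y   # comparison: asks whether x and y hold the same value
```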
Boolean Operators
There are four Boolean operators in R for working with conditions: & (AND), | (inclusive OR), xor() (exclusive OR), and ! (NOT). The first three combine two conditions, and ! negates a single condition, so we can build more complex conditions out of simpler ones.
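For example, with two simple numeric conditions (values chosen for illustration), the following expressions evaluate to TRUE, FALSE, TRUE, and FALSE:

```r
(5 > 3) & (2 > 1)   # AND: both conditions are TRUE
(5 > 3) & (2 < 1)   # AND: the second condition is FALSE
(5 > 3) | (2 < 1)   # OR: at least one condition is TRUE
!(5 > 3)            # NOT: negates TRUE
```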
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
## Create objects
a <- c(TRUE, FALSE, TRUE, FALSE)
b <- c(FALSE, TRUE, TRUE, FALSE)
## Exclusive vs. Inclusive OR
xor(a, b)
[1] TRUE TRUE FALSE FALSE
a | b
[1] TRUE TRUE TRUE FALSE
Can you analyze the following piece of R code and predict its output before running it?
# Exclusive OR
x <- c(1, 2, 3)
y <- c(1, 3, 5)
z <- xor(x %in% y, y %in% x)
print(z)
# Inclusive OR
z2 <- x %in% y | y %in% x
print(z2)
Like arithmetic operators, the Boolean operators follow an order of operations. That is, R evaluates the ! (NOT) operator first, then the & (AND) operator, and then the | (OR) operator.
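A short sketch of how precedence changes a result (the expressions are illustrative); parentheses override the default order, so the first expression is FALSE and the second is TRUE:

```r
# `!` binds tighter than `&`, so this reads as (!TRUE) & FALSE
!TRUE & FALSE

# Parentheses force `&` to be evaluated first, then `!`
!(TRUE & FALSE)
```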
[1] FALSE
[1] TRUE
Elements of Flow Control
Flow control statements often start with a condition and are followed by a block of code. A quick recap:
- Conditions: Any Boolean expression can serve as a condition, which evaluates down to a Boolean value (i.e., TRUE or FALSE). A flow control statement decides what to do and what to skip based on whether the condition is TRUE or FALSE.
- Block of Code: Lines of R code can be grouped together in blocks, using opening and closing curly brackets { and }. The beginning and end of the block of code are thus clearly indicated.
5.2 if Statements
The main purpose of if is to control precisely which operations are carried out in a given code chunk.
An if statement runs a code chunk only if a certain condition is true. This conditional expression allows R to respond differently depending on whether the condition is TRUE or FALSE.
The basic template of if is as follows:
## Simple Binary Condition
if (CONDITION IS TRUE) {
  DO THIS CODE CHUNK 1
} else {
  DO THIS CODE CHUNK 2
} ## endif
## More Complex Conditions
if (condition1) {
  expr1
} else if (condition2) {
  expr2
} else if (condition3) {
  expr3
} else {
  expr4
} ## endif
- The condition is placed within the parentheses () after if.
- The condition must be an expression that returns only a single logical value (TRUE or FALSE).
- If it is TRUE, code chunk 1 in the curly braces will be executed; if the condition is not satisfied, code chunk 2 in the curly braces after else will be executed.
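Because the condition must be a single logical value, a comparison on a whole vector needs to be reduced to one value first. The sketch below uses any() and all() for that purpose; the vector scores and the cutoff of 60 are made up for illustration. (In recent versions of R, passing a condition of length greater than one to if is an error.)

```r
scores <- c(55, 72, 90)   # hypothetical data

# `scores > 60` is a logical vector of length 3: FALSE TRUE TRUE
# any() collapses it to one value: is at least one element TRUE?
if (any(scores > 60)) {
  writeLines("At least one score is above 60.")
} else {
  writeLines("No score is above 60.")
}

# all() collapses it to one value: are all elements TRUE?
if (all(scores > 60)) {
  writeLines("Every score is above 60.")
} else {
  writeLines("Not every score is above 60.")
}
```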
Suppose you want to simulate a password checking system. The system requires you to enter a password to access it. The system has a password stored on the server. The gatekeeper of the system will check whether your input password matches the one stored on the server. If your input password does not match the one on the server, you will be banned from accessing the system.
correct_passcode <- 987 ## system correct passcode
input <- 113 ## assuming that you have the input 113
## Passcode checking & output
if (input == correct_passcode) {
  writeLines("Congratulations! Now you may get in!")
} else {
  writeLines("Sorry! Wrong password.")
}
Sorry! Wrong password.
If the input matches the system passcode, you will be granted access to the system.
correct_passcode <- 987 ## system correct passcode
input <- 987 ## assuming that you have the input 987
## Passcode checking & output
if (input == correct_passcode) {
  writeLines("Congratulations! Now you may get in!")
} else {
  writeLines("Sorry! Wrong password.")
}
Congratulations! Now you may get in!
Now we can ask R to prompt the user to input data directly in the R console, using the readline() function. (Note: Please run the entire code chunk all at once.)
## ------------------------------------- ##
## Run the entire code chunk all at once ##
## ------------------------------------- ##
## System Correct Passcode
correct_passcode <- 987
## User's input
input <- readline(prompt = "Please enter your password:")
## Passcode checking & output
if (input == correct_passcode) {
  writeLines("Congratulations! Now you may get in!")
} else {
  writeLines("Sorry! Wrong password.")
} ## endif
In R, there are two functions with very similar names, readline() and readLines(). Please check the documentation of both to make sure you understand their differences.
- readline() is used to read a single line of input from the user in the R console. It prompts the user for input and waits for them to enter a response, then returns that response as a character string.
- readLines() is used to read multiple lines of input from a file or other external source. It reads in all the lines of a file or input stream and returns them as a character vector.
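A minimal sketch of readLines() that does not need a file on disk: textConnection() makes a character vector behave like a text file. The two lines of text are invented for illustration.

```r
# Treat a character vector as if it were a two-line text file
con <- textConnection(c("first line", "second line"))

# readLines() returns one element per line
lines_read <- readLines(con)
close(con)

length(lines_read)   # number of lines read
lines_read[2]        # the second line
```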
5.3 for
The for loop is a core programming construct in R, which allows you to repeat a code chunk a certain number of times.
Typically, you would use a for loop to iterate over the elements of a vector, list, or data frame and/or to perform a certain operation a fixed number of times.
The general structure of the for loop statement is to repeat the operation included in a code chunk while incrementing an index or a counter.
The basic for loop template is as follows:
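The general shape can be sketched as below, with LOOP_INDEX and LOOP_VECTOR as placeholders. The concrete loop underneath (with the made-up names n and total) is only one possible instance.

```r
## Template (placeholders, not runnable as is):
## for (LOOP_INDEX in LOOP_VECTOR) {
##   CODE CHUNK
## }

## A concrete instance: n takes the values 1, 2, 3 in turn
total <- 0
for (n in 1:3) {
  total <- total + n   # add the current value to the running total
}
total
```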
The LOOP_INDEX is a placeholder that represents an element in the LOOP_VECTOR.
When the loop begins, the LOOP_INDEX starts off as the first element in the LOOP_VECTOR.
When the loop reaches the end of the brace, the LOOP_INDEX is incremented, taking on the next element in the LOOP_VECTOR.
This process continues until the loop reaches the final element of the LOOP_VECTOR. At this point, the code chunk is executed for the last time, and the loop exits.
For example, suppose we have a character vector with a few words in it. We can use a for loop to get the number of characters of each element (i.e., word) in the vector.
word_vec <- c("apple","banana","watermelon","papaya")
for (w in word_vec) {
  word_nchar <- nchar(w)
  writeLines(as.character(word_nchar))
}
5
6
10
6
For the above example, there is another way to write the for loop:
for (i in 1:length(word_vec)) {
  word_nchar <- nchar(word_vec[i])
  writeLines(as.character(word_nchar))
}
5
6
10
6
- In our first example, the LOOP_INDEX serves as the exact element object in the LOOP_VECTOR.
- In our second example, the LOOP_INDEX serves as the index of the element object in the LOOP_VECTOR.
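One caveat about the index-based form, offered as a side note: 1:length(x) counts backwards when x is empty, whereas seq_along(x) yields an empty sequence. A small sketch (the objects empty_vec and n_runs are illustrative):

```r
empty_vec <- character(0)   # a vector with no elements

1:length(empty_vec)     # 1:0, i.e., the two values 1 and 0
seq_along(empty_vec)    # an empty integer vector

# With seq_along(), the loop body is never run for an empty vector
n_runs <- 0
for (i in seq_along(empty_vec)) {
  n_runs <- n_runs + 1
}
n_runs
```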
Try the following code chunk and examine the differences in the outputs between print() and writeLines(). Do you know why the first for-loop does not work?
# writeLines() (This for-loop does not work!!)
for (i in 1:length(word_vec)) {
  word_nchar <- nchar(word_vec[i])
  writeLines(word_nchar)
}
# `print()`
for (i in 1:length(word_vec)) {
  word_nchar <- nchar(word_vec[i])
  print(word_nchar)
}
Please note that control structures direct the flow of execution of the code. Therefore, a control structure itself DOES NOT return a useful object. In R, a for loop always evaluates to NULL, so assigning a for-loop structure to an object is syntactically allowed but NOT meaningful: the object will contain NULL, not the values computed inside the loop.
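To keep the values computed inside a loop, a common pattern is to create a container before the loop and fill it one element at a time. A minimal sketch, reusing the idea of counting characters (word_vec is redefined here so the block is self-contained; loop_value and word_nchars are illustrative names):

```r
word_vec <- c("apple", "banana", "watermelon", "papaya")

# Assigning the loop itself only stores NULL
loop_value <- for (w in word_vec) nchar(w)
is.null(loop_value)

# Pre-allocate a container, then fill it inside the loop
word_nchars <- integer(length(word_vec))
for (i in seq_along(word_vec)) {
  word_nchars[i] <- nchar(word_vec[i])
}
word_nchars
```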
Exercise 5.1 Use the data set stringr::sentences for this exercise. Create a for-loop structure to get the number of characters of every sentence in stringr::sentences. Present your results in a data frame of three columns:
- Column 1 includes a unique integer ID for each sentence;
- Column 2 includes the texts of each sentence;
- Column 3 includes the number of characters of each sentence.
Note: A for-loop structure is needed for this exercise. But it may not necessarily be the most efficient way.
The data set stringr::sentences includes 720 English sentences. The first six sentences are shown here.
[1] "The birch canoe slid on the smooth planks."
[2] "Glue the sheet to the dark blue background."
[3] "It's easy to tell the depth of a well."
[4] "These days a chicken leg is a rare dish."
[5] "Rice is often served in round bowls."
[6] "The juice of lemons makes fine punch."
Your results may look like the following data frame:
Exercise 5.2 Use the data set stringr::fruit for this exercise. Create a for-loop with an if-statement to instruct R to go through each fruit name and print out only those fruit names that start with a vowel letter (i.e., a, e, i, o, u).
[1] "apple"
[1] "apricot"
[1] "avocado"
[1] "eggplant"
[1] "elderberry"
[1] "olive"
[1] "orange"
[1] "ugli fruit"
5.4 while loop
There is another type of loop. Unlike the for loop, which repeats a code chunk by going through every element in a vector/list/data frame, the while loop repeats a code chunk UNTIL a specific condition evaluates to FALSE. (It is like an if-statement whose condition is checked again after every run of the code chunk.)
The basic template is as follows:
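The general shape can be sketched as below, with LOOP_CONDITION as a placeholder. The concrete countdown underneath (with the made-up object counter) is just one possible instance.

```r
## Template (placeholders, not runnable as is):
## while (LOOP_CONDITION) {
##   CODE CHUNK
## }

## A concrete instance: count down from 3
counter <- 3
while (counter > 0) {
  writeLines(paste("counter is", counter))
  counter <- counter - 1   # this change eventually makes the condition FALSE
}
counter
```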
- Upon the start of a while loop, the LOOP_CONDITION is evaluated.
- If the condition is TRUE, the braced code chunk is executed line by line till the end of the chunk.
- At this point, the LOOP_CONDITION will be checked again.
- The loop terminates immediately when the condition is evaluated to be FALSE. That is, the code chunk will be repeated as long as the condition is true.
Based on the template above, it is important to note that the code chunk executed must somehow cause the loop to exit.
In particular, the code chunk needs to change the values of certain objects, which would eventually lead to a change in the LOOP_CONDITION.
If the LOOP_CONDITION never evaluates to FALSE, the loop will run forever, and R will appear to freeze until you interrupt it. As long as the code chunk modifies the objects that the loop condition depends on, the condition will eventually become FALSE, and the loop will exit.
Let’s come back to our password checker. This time let’s create a dumb password checker, i.e., app_v1().
The function app_v1() simulates a very basic (and intentionally flawed) password guessing system. When the user provides an incorrect guess that is smaller than the system’s actual answer, the system automatically increments the user’s guess until it matches the correct password. (Of course, this is not something any real-world application would actually do!)
## -------------------------------------------- ##
## Please run the entire code chunk all at once! ##
## -------------------------------------------- ##
## Create a program
app_v1 <- function(guess = 83) {
  ## Let's assume that system default ans is 90
  ans <- 90
  ## checking user's guess
  while (guess != ans) {
    cat("Your `guess` is too small! \nThe system will take care for you!\n")
    guess <- guess + 1
    cat("Now the system is adjusting your `guess` to ", guess, "\n\n")
  } ## endwhile
  ## got the answer
  cat('Great! The passcode is finally cracked.\n')
} ## endfunc
## Run the program
app_v1(guess = 83)
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 84
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 85
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 86
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 87
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 88
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 89
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 90
Great! The passcode is finally cracked.
Exercise 5.3 The current implementation of app_v1() works fine when the guess value is smaller than the actual ans value. However, if the guess value is larger than the actual ans value, the function will enter an infinite loop and never finish.
To solve this issue, we need to create an updated version of the program called app_v2(), which should automatically adjust the guessed number to approach the actual answer. This adjustment can be done by incrementally adding or subtracting one from the guessed number until it matches the actual answer.
The updated app_v2() program should take two parameters as input: ans and guess. It should then compare the guess value to the actual ans value. If the guess value is equal to ans, the program should print a message stating that the guess is correct. Examples are provided below.
NB: We assume that the system default password remains 90.
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 88
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 89
Your `guess` is too small!
The system will take care for you!
Now the system is adjusting your `guess` to 90
Your `guess` is too large!
The system will take care for you!
Now the system is adjusting your `guess` to 92
Your `guess` is too large!
The system will take care for you!
Now the system is adjusting your `guess` to 91
Your `guess` is too large!
The system will take care for you!
Now the system is adjusting your `guess` to 90
This is spooky. Your guess is correct!!
5.5 Toy Example
Now we are playing the Guess Game. The game is as follows:
The program will pick a random number from 1 to 100. A user has to guess which number the computer has picked. Every time the user makes a wrong guess, the computer will tell the user whether the correct answer is higher or lower.
We first pack the game as a small app, i.e., an R function object:
## -------------------------------------------- ##
## Please run the entire code chunk all at once! ##
## -------------------------------------------- ##
## Create a function object
guessMyNumber_v1 <- function() {
  ## Randomly select an integer from 1 to 100
  ans <- sample(1:100, size = 1)
  ## Initialize the variable `guess`
  guess <- -1
  ## Instructions for user
  writeLines("Now I am thinking of a number between 1 and 100.")
  ## As long as user's guess is not the answer
  while (guess != ans) {
    ## Read the prompt input from user
    guess <- readline(prompt = "Please guess my number(1~100):")
    ## Convert the input string into a number
    guess <- as.numeric(guess)
    ## if user's guess is smaller than the answer
    if (guess < ans) {
      writeLines("The answer is HIGHER.")
      ## if user's guess is larger than the answer
    } else if (guess > ans) {
      writeLines("The answer is LOWER")
      ## correct guess
    } else {
      writeLines(paste0("Good Job! You had the correct answer! My number is ", guess))
    } ## endif
  } ## endwhile
} ## endfunc
Please note that in the above function definition, we have included a line of code to initialize guess to -1 before the while loop:
## Initialize the variable `guess`
guess <- -1
This step is crucial because the while loop condition compares guess with ans before the user has entered anything. Without the initialization, guess would not exist at that point, and R would stop with an error such as object 'guess' not found. (A related error, missing value where TRUE/FALSE needed, occurs when guess is NA, for example after as.numeric() fails to convert a non-numeric input.)
To avoid the problem, guess needs to be initialized to some value before the while loop.
We initialize guess to -1 rather than any other number because the range of possible answers is 1 to 100, and -1 is not a valid guess within this range. This guarantees that the loop will run at least once, as the default first guess (-1) will always be different from the value of ans, which is randomly selected from the range of valid guesses.
Once the user enters a guess, guess is updated to that value, and the loop continues until the user’s guess matches the value of ans.
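Two things can go wrong with the loop condition, and both can be reproduced without playing the game: the variable may not exist at all, or it may hold a missing value (NA). The sketch below is only an illustration under stated assumptions: no_init() is a hypothetical function that skips the initialization, and undefined_guess is a deliberately unused name.

```r
# Hypothetical function that forgets to initialize its guess
no_init <- function() {
  while (undefined_guess != 90) {
    break
  }
}

# Capture the error message instead of stopping the script
msg <- tryCatch(no_init(), error = function(e) conditionMessage(e))
msg

# A separate pitfall: non-numeric input becomes NA after conversion,
# and comparing NA with a number gives NA rather than TRUE or FALSE
bad_guess <- suppressWarnings(as.numeric("abc"))
is.na(bad_guess)
bad_guess != 90
```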
After you run the above code chunk and create the guessMyNumber_v1() function in your current R environment, you can play the game by calling guessMyNumber_v1() in the R console.
The above code demonstrates how to create a guessing game in R using a function called guessMyNumber_v1(). As defined, the function does not accept any parameters. However, one can create a function with a parameter (e.g., guess), which allows the user to specify their initial guess for the game. For example:
## -------------------------------------------- ##
## Please run the entire code chunk all at once! ##
## -------------------------------------------- ##
## Create a function object
guessMyNumber_v2 <- function(guess) {
  ## Randomly select an integer from 1 to 100
  ans <- sample(1:100, size = 1)
  ## Instructions for user
  writeLines("Now I am thinking of a number between 1 and 100.")
  ## As long as user's guess is not the answer
  while (guess != ans) {
    ## Read the prompt input from user
    guess <- readline(prompt = "Please guess my number(1~100):")
    ## Convert the input string into a number
    guess <- as.numeric(guess)
    ## if user's guess is smaller than the answer
    if (guess < ans) {
      writeLines("The answer is HIGHER.")
      ## if user's guess is larger than the answer
    } else if (guess > ans) {
      writeLines("The answer is LOWER")
      ## correct guess
    } else {
      writeLines(paste0("Good Job! You had the correct answer! My number is ", guess))
    } ## endif
  } ## endwhile
} ## endfunc
The revised function guessMyNumber_v2() now accepts a parameter guess that enables the user to specify the initial guess value when they start playing the game. The user can start the game by running the function and specifying the initial guess value, like this: guessMyNumber_v2(guess = 15). However, the revised function has two potential issues:
- It is not very intuitive to ask the user to provide a guess value when they have not even started the game yet. (They probably do not even know the objective of the game.)
- If the user specifies a guess value that happens to be the computer’s selection, the game will not even start.
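The second issue can be seen in a stripped-down, non-interactive sketch. Here demo_loop() is a hypothetical stand-in for the game: the answer is fixed at 42, and the loop body simply counts how many times it runs.

```r
demo_loop <- function(guess, ans = 42) {
  rounds <- 0
  while (guess != ans) {
    rounds <- rounds + 1
    guess <- ans   # pretend the user guesses correctly right away
  }
  rounds           # number of times the loop body was executed
}

demo_loop(guess = 15)   # the loop body runs once
demo_loop(guess = 42)   # the condition is FALSE at the start: zero runs
```

When the initial guess already equals the answer, the while condition is FALSE on the first check, so nothing inside the loop is ever executed.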
To address these issues, we may also try the following revision (guessMyNumber_v3()):
## -------------------------------------------- ##
## Please run the entire code chunk all at once! ##
## -------------------------------------------- ##
## Create a function object
guessMyNumber_v3 <- function(guess = -1) {
  ## Randomly select an integer from 1 to 100
  ans <- sample(1:100, size = 1)
  ## Instructions for user
  writeLines("Now I am thinking of a number between 1 and 100.")
  ## As long as user's guess is not the answer
  while (guess != ans) {
    ## Read the prompt input from user
    guess <- readline(prompt = "Please guess my number(1~100):")
    ## Convert the input string into a number
    guess <- as.numeric(guess)
    ## if user's guess is smaller than the answer
    if (guess < ans) {
      writeLines("The answer is HIGHER.")
      ## if user's guess is larger than the answer
    } else if (guess > ans) {
      writeLines("The answer is LOWER")
      ## correct guess
    } else {
      writeLines(paste0("Good Job! You had the correct answer! My number is ", guess))
    } ## endif
  } ## endwhile
} ## endfunc
The revised guessMyNumber_v3() provides two ways for the user to start the game. The user can either start the game with an initial guess of their own (by specifying the value for the parameter guess), e.g., guessMyNumber_v3(guess = 15), or start the game by accepting the default value of the parameter guess (i.e., -1), e.g., guessMyNumber_v3().
The decision of whether or not to include parameters in a program or function ultimately depends on the developer’s design goals and the intended use of the program by its users. Parameters allow for greater flexibility and customization, as users can provide input values to the function to modify its behavior.
However, sometimes the program may be designed to have a fixed set of inputs and outputs, in which case parameters may not be necessary. The design choice should prioritize the usability and user experience of the program, taking into consideration the user’s level of expertise, potential use cases, and any potential issues that may arise from the inclusion or exclusion of parameters.
Exercise 5.4 The above guessMyNumber_v1() can be improved. Sometimes naughty (or careless) users do not input numbers as requested. Instead, they may accidentally (or on purpose) enter characters that are NOT digits at all, or numbers that are not within the range of 1 to 100.
Please improve the guessMyNumber_v1() function to handle non-digit and out-of-range inputs from users. If a user enters non-digit characters or a number outside the range of 1 to 100, the program should show a warning message. Also, once the user guesses the correct answer, the program should report the total number of guesses the user made. The revised function (called guessMyNumber_v4() in the example below) should produce output similar to the following.
> guessMyNumber_v4()
Now I am thinking of a number between 1 and 100.
Please guess my number(1~100):%
Invalid guess. Please enter a number between 1 and 100.
Please guess my number(1~100):1000000
Invalid guess. Please enter a number between 1 and 100.
Please guess my number(1~100):-100
Invalid guess. Please enter a number between 1 and 100.
Please guess my number(1~100):50
The answer is LOWER
Please guess my number(1~100):25
The answer is HIGHER.
Please guess my number(1~100):30
Good Job!
After 3 guess(es), you got the correct answer! My number is 30!
>