Chapter 6 Functions
We have been using R functions in the default base R package, such as c(), list(), and sample(). R provides a lot of useful built-in functions like these, but we can write our own task-specific functions as well. A function is like a mini-program within a program. In this unit, we discuss how to write functions in R.
6.1 Functions: Basics (Self-study)
6.1.1 A Quick Start
To better understand how a function works, let’s create a simple one. The following code chunk creates a function object named hello().
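The chunk that defines hello() is not reproduced in this text. A minimal sketch consistent with the output shown further below, assuming the body simply prints a fixed greeting, might look like this:

```r
## Assumed definition: the original chunk is not shown in the text
hello <- function() {
  print("How are you doing?")
}

## Nothing is printed until the function is called
hello()
hello()
```

Each call runs the body once, so two calls print the greeting twice.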
A function object includes several important elements:
- We use the function() keyword to define a new function object.
- After the function() keyword is the code block ({...}), which contains the body of the function. That is, the code block is the set of instructions that perform the desired task.
- A function defined as function(Parameter1, Parameter2, ...) can take arguments (inputs) that are passed to it when it is called (see the next section).
- Every function is assigned to a user-defined name (e.g., hello in the above example).
Once a function is defined using the function() keyword in R, the code within the body of the function will not be executed immediately.
Rather, the code will only be executed when the function is called. Whenever the function is called, it will execute the code within its body, carrying out the specific tasks defined in the function’s code block.
This allows for code reuse and modularity, as the same code can be called multiple times with different inputs, rather than having to rewrite the same code multiple times.
[1] "How are you doing?"
[1] "How are you doing?"
6.1.2 Why do we need functions?
A major advantage of creating functions in our programs is to group code that gets executed multiple times. Without a function, one may need to copy and paste the same code chunk many times.
Second, with functions, it is easier to update the programs. We try to avoid duplicating code because, when the code needs updating (e.g., to fix a bug), we would otherwise have to change it everywhere we have copied it.
In short, functions greatly reduce code duplication, making programs shorter and easier to read and update.
6.1.3 Functions with parameters
When we use the built-in R functions like cat(), length(), or matrix(), we can pass them values, called arguments, in the parentheses.
That is, some functions have parameters, and users can pass a value to each parameter as an argument.
Our own functions can also be defined to accept arguments.
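The chunk for this version of hello() is not reproduced here. A sketch that would produce the output below, assuming the body uses cat() (as the later discussion of return values suggests), is:

```r
## Assumed definition with one parameter, `name`
hello <- function(name) {
  ## cat() joins its arguments with a space by default
  cat("How are you doing,", name)
}

hello(name = "Alvin")
```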
How are you doing, Alvin
The new hello() function has a parameter called name. Parameters are variables that expect arguments in the function call. When a function is called with an argument (e.g., "Alvin"), this argument is stored in the parameter (e.g., name).
More specifically, when the function hello(name = 'Alvin') is called:
- The argument "Alvin" is assigned to the parameter name;
- The program then continues with the code block of the function;
- Within the code block, the parameter name is automatically set to "Alvin".
It is important to note that the value stored in the parameter is forgotten when the function returns. That is, we cannot access the parameter name in the main program:
Error in eval(expr, envir, enclos): object 'name' not found
In short, the parameters of a function are destroyed after the function call hello(name = 'Alvin') returns.
In the function definition, we can specify default values for the parameters. For example, if the function hello() is defined as follows, users can decide whether to accept the default argument or assign a new argument to the parameter name:
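The definition itself is missing from this text. A minimal sketch, assuming the default value is "Alvin" (which matches the first line of the output below), could be:

```r
## Assumed definition: `name` defaults to "Alvin"
hello <- function(name = "Alvin") {
  cat("How are you doing, ", name, "\n", sep = "")
}

hello()                   # uses the default argument
hello(name = "Superman")  # overrides the default
```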
How are you doing, Alvin
How are you doing, Superman
6.1.4 Recap of Important Concepts So Far
To utilize a function object, there are several key steps:
- We need to define the function by creating it with hello <- function(){...}, which assigns it to an object name like any other object in R.
- Then we can call the now-created function using hello().
- The function call starts the execution of the code block in the function by first assigning the arguments to the parameters of the function (e.g., hello(name = 'Alvin')).
  - A value being passed to a function in a function call is an argument (e.g., 'Alvin').
  - Variables that have arguments assigned to them are parameters (e.g., name).
6.1.5 RETURN Statements
When we define a function, we can specify what the return value should be using the return() statement.
The returned values, i.e., the output object of the function, can then be assigned to a new object name for later use in the program.
In R, there are many built-in functions that return values:
[1] 3 5 8
When a function has no meaningful value to return, its return value is typically NULL, a special object in R that represents the absence of a value (comparable to None in Python).
This is a sentence
NULL
Now how about the hello() function we created earlier? We didn’t specify the return() statement in the function definition.
How are you doing, John
NULL
In the definition of hello(), we did not call return(). In that case, an R function returns the value of the last expression evaluated in its body. Here that expression is the cat() call, and cat() itself returns NULL, so hello() returns NULL. But how come we can still see the output of the function?
In the code block of hello(), cat() only displays text on the R console. Displaying text in the R console and returning a value are two different things.
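The difference between displaying and returning can be checked directly. The sketch below assumes a hello() whose body ends with a cat() call, and adds a hypothetical helper, square(), to show that a function without return() still returns the value of its last expression:

```r
## Assumed definition of hello(), ending with a cat() call
hello <- function(name) {
  cat("How are you doing,", name, "\n")
}

out <- hello(name = "John")  # the text appears on the console ...
is.null(out)                 # ... but the returned value is NULL: TRUE

## Hypothetical helper: no return(), so the last value is returned
square <- function(x) {
  x^2
}
square(4)                    # 16
```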
Exercise 6.1 The function hello() prints a message to the console. Without changing the definition of hello(), how can we capture the messages it prints to the console and save them to an object named out?
Your task is to modify the following code chunk so that out stores the messages printed by hello(name="Alvin"). Keep in mind that you cannot modify the definition of hello() itself, so you will need to use a different approach to capture its output.
out <- hello(name="Alvin")
out
Exercise 6.2 Can you create a revised version of hello() that returns the string, so that the output of hello() can be assigned to another object name? (Please note that in the following example, the return value out is no longer NULL.)
[1] "How are you doing, Alvin"
[1] "How are you doing, John"
6.1.6 Parameters Order
We’ve seen functions with parameters. When a function has many parameters, there are two ways to assign the arguments to the parameters in the function call.
First, we can assign the arguments to the parameters by name in the function call:
[1] 3 10 2 8 6
Alternatively, we can assign the arguments to the parameters according to the order of the parameters in the function definition without specifying the parameter names:
[1] 3 10 2 8 6
In the function definition, we can also assign default values to the parameters. For example, in the documentation of sample(x, size, replace = FALSE, prob = NULL), we can see that the parameters replace and prob have default values. That means these defaults are used as the arguments whenever we do not specify them in the function call.
[1] 5 4 6 8 1
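The sample() calls behind the three outputs in this section are not shown. The following sketch illustrates the same three calling styles under an assumed input of 1:10 and size 5. Because sample() is random, it fixes the seed and compares the results instead of relying on particular numbers:

```r
## Named arguments: the order in the call does not matter
set.seed(123)
by_name <- sample(size = 5, x = 1:10)

## Positional arguments: matched in the order of the definition
set.seed(123)
by_position <- sample(1:10, 5)

## Same seed and same arguments, so both draws are identical
identical(by_name, by_position)  # TRUE

## `replace` and `prob` were left at their defaults above;
## a default is overridden by naming the parameter
with_replacement <- sample(1:10, 5, replace = TRUE)
```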
6.1.7 Stacking Functions
A function can also call another function within its code block. When this happens, the execution of the code would move to the called function before returning to the original function call.
For instance, we can define two functions in R, hello() and email().
## Main function
hello <- function(name) {
  user_email <- email(user = name)
  out <- paste0("How are you doing, ", name, ". ", user_email)
  return(out)
}
## Embedded function
email <- function(user) {
  out <- paste0("Your email is: ", tolower(user), "@whatever.org")
  return(out)
}
Within the code block of hello(), we make a function call to email(). This means that email() is embedded within hello(): when hello() is called, it starts executing its code block and, on reaching the call, moves on to execute the code block of email(). After the code block of email() has been executed, control moves back to hello() to complete its execution.
[1] "How are you doing, Alvin. Your email is: alvin@whatever.org"
This ability to call functions within functions is useful in situations where a function needs to perform multiple tasks, and each task can be implemented using a separate function. Rather than writing all the code in a single function, we can call other functions from within it to keep the code organized and easier to understand.
6.2 Local and Global Scope
Functions help us organize code into reusable blocks that perform specific tasks. You write the code once and can use it as many times as you need with different inputs.
When you run a function, it creates its own workspace called the local scope. Variables inside the function, known as local variables, only exist within that function and can’t be used outside it.
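On the other hand, variables you create outside functions are in the global scope. These global variables can be accessed anywhere in the program. Note that in R an ordinary assignment (<-) inside a function only creates or changes a local variable. To change a global variable from inside a function defined at the top level, you must do so explicitly, e.g., with the superassignment operator (<<-), and such a change is then visible everywhere.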
A scope is like a variable’s life span. A local scope is created when you call a function. Variables made inside the function exist only in that function’s local scope. Once the function finishes, the local scope is destroyed, and those local variables are forgotten (removed from memory).
The global scope starts when the main program runs (like your current R session). When the program ends, the global scope is destroyed, and all the global variables are forgotten too.
There are a few important considerations for variable scope:
- Code in the global scope (i.e., outside of all functions) cannot use any local variables (i.e., variables within functions).
- Code in a local scope can access global variables.
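- Code in a local scope can modify the values of global variables, but only explicitly (e.g., with <<-); a plain <- assignment inside a function creates a local variable instead.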
- Code in a function’s local scope cannot use variables in any other local scope.
- We can use the same name for different variables if they are in different scopes (e.g., they can be local variables within different functions).
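The rules about reading and modifying global variables can be sketched as follows. The names counter, bump_local(), and bump_global() are hypothetical and serve only as illustration:

```r
counter <- 0  # global variable

## Plain `<-` inside a function creates a local `counter`
bump_local <- function() {
  counter <- counter + 1  # reads the global, writes a local copy
  counter
}

## `<<-` modifies the variable in the enclosing (here: global) scope
bump_global <- function() {
  counter <<- counter + 1
  invisible(counter)
}

bump_local()   # returns 1
counter        # still 0: the global variable was not changed
bump_global()
counter        # now 1
```

The fact that bump_global() silently changes state outside itself is one reason relying on global variables makes larger programs harder to debug.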
While using global variables within functions in small programs may not cause significant issues, it is generally considered bad practice to rely on global variables in larger programs.
One issue with using global variables in local functions is that they can be accessed and modified by any part of the program, making it difficult to track changes and debug the code.
Another issue is that global variables can make it difficult to reuse functions in other parts of the program or in other programs.
In short, functions that rely on global variables are less modular and less portable, and can make code maintenance and updates more difficult in the long run.
Before running the following code demos, restart your R session to ensure there are no leftover objects from previous exercises.
You can also clear the environment with rm(list = ls(all = TRUE)), as the first line of each demo below does (use with caution!).
- The following code chunk shows that local variables cannot be accessed in the global scope.
rm(list = ls(all = TRUE))
customer <- function() {
  id <- 123
  age <- 25
  nation <- "TW"
}
customer()
cat(id)
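Error in cat(id): argument 1 (type 'closure') cannot be handled by 'cat'
(The message mentions a 'closure' because, in the session that produced this output, a function named id from a loaded package was found instead of the local variable. In a session with no such function attached, the error would read: object 'id' not found. Either way, the local variable id is not visible in the global scope.)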
- The following code chunk shows that local scopes cannot use variables in other local scopes.
rm(list = ls(all = TRUE))
customer <- function() {
  id <- 123
  age <- 25
  nation <- "TW"
  print(age)
}
client <- function() {
  age <- 50
}
client()   ## initialize a local var `age` in `client()`
customer() ## prints `age` from `customer()`, not from `client()`
[1] 25
- The following code chunk shows a local scope can access global variables.
rm(list = ls(all = TRUE))
customer <- function() {
  age <- 25
  cat("The customer works at", company)
}
company <- "NTNU"
customer()
The customer works at NTNU
6.3 Exception Handling
In programming, errors and warnings can happen during code execution, causing the program to crash or behave unexpectedly.
To prevent the program from stopping abruptly, developers use exception handling to manage these issues more smoothly.
The goals of exception handling are:
- Ensure the program keeps running without stopping halfway.
- Clearly inform users about any errors or warnings.
For example, if we create a function myLog() that calculates the logarithm of a number with a given base, there are cases where the result might be problematic:
[1] 2
[1] 3
Warning in log(x, myBase): NaNs produced
[1] NaN
Warning in log(x, myBase): NaNs produced
[1] NaN
Error in log(x, myBase): non-numeric argument to mathematical function
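The definition and calls behind the output above are not shown in this text. Judging from the messages, which mention log(x, myBase), a minimal sketch might be the following. The specific argument values are assumptions chosen to trigger each kind of result:

```r
## Assumed definition: a thin wrapper around log()
myLog <- function(x, myBase) {
  log(x, myBase)
}

myLog(100, 10)        # a valid call
myLog(8, 2)           # another valid call
myLog(-10, 10)        # NaN, with a warning
myLog(10, -10)        # NaN, with a warning
try(myLog("a", 10))   # error: non-numeric argument (try() lets the script continue)
```

Without try() around the last call, the error would stop a script at that point, which is the situation exception handling is meant to address.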
To make sure that the function myLog() does not terminate the main program when encountering errors or warnings, it is often a good idea to include exception handling in the function code block.
In R, exception handling can be achieved using the tryCatch() function. This function allows the programmer to specify what should happen when an error or warning occurs during the execution of a block of code.
The structure of tryCatch() includes a try block where the code is executed, and a catch block where the handling of errors and warnings is specified. If an error or warning occurs in the try block, the code in the catch block is executed instead of terminating the program.
Its structure is as follows:
result <- tryCatch({
  ##----- original_code -----##
}, warning = function(w) {
  ##----- warning_handler_code -----##
}, error = function(e) {
  ##----- error_handler_code -----##
}, finally = {
  ##----- cleanup_code -----##
}) ## endtry
tryCatch() includes the following important elements:
- expr: the expression/code to be evaluated.
- warning: When the expr causes a warning, the program execution immediately moves to the code in the warning code block.
- error: When the expr causes an error, the program execution immediately moves to the code in the error code block.
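To see these elements at work on a small case, consider the sketch below. The helper to_number() is hypothetical. It relies on the fact that the value returned by a handler becomes the value of the whole tryCatch() call, and that the finally block runs in every case:

```r
## Hypothetical helper: convert text to a number without stopping the program
to_number <- function(txt) {
  tryCatch({
    as.numeric(txt)                 # warns if `txt` is not a number
  }, warning = function(w) {
    NA_real_                        # value returned when a warning occurs
  }, error = function(e) {
    NA_real_                        # value returned when an error occurs
  }, finally = {
    cat("Finished converting", deparse(txt), "\n")  # always runs
  })
}

to_number("3.14")   # 3.14
to_number("abc")    # NA (the coercion warning is handled)
```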
Now let’s try to include tryCatch() in our code block of the function myLog():
myLog <- function(x, myBase) {
  tryCatch({
    ##----- original_code -----##
    return(log(x, myBase))
  }, warning = function(w) {
    ##----- warning_handler_code -----##
    ## log() warns (and yields NaN) when an argument is negative
    if (x < 0)
      print("WARNING!! `x` must be a positive number")
    if (myBase < 0)
      print("WARNING!! `myBase` must be a positive number")
  }, error = function(e) {
    ##----- error_handler_code -----##
    ## `||` is the scalar "or" intended for use inside `if`
    if (!is.numeric(x) || !is.numeric(myBase))
      print("ERROR!! Either `x` or `myBase` must be a positive number not a string")
  }, finally = {
    ##----- cleanup_code (optional) -----##
    ## print('Function completed!!')
  }) ## endtrycatch
} ## endfunc
[1] 2
[1] 4.60517
[1] "WARNING!! `myBase` must be a positive number"
[1] "ERROR!! Either `x` or `myBase` must be a positive number not a string"
[1] "ERROR!! Either `x` or `myBase` must be a positive number not a string"
In R, cat(), writeLines(), and print() are three functions that can be used to print output to the console (or other output devices). However, they differ in their syntax and output format. Here are the main differences between these functions:
cat()
This function is used to concatenate and print objects. It can take one or more objects as input and concatenates them with a separator (by default, a space). The output is not formatted and is printed as a single string.
1 2 3 4 5 6 7 8 9 10
a b c d e f g h i j
Please pay particular attention to how cat() prints the values of a factor:
1 2 3 4 5 6 7 8 9 10
Numbers: 1 2 3 4 5 6 7 8 9 10 Characters: a b c d e f g h i j
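The calls behind the cat() output above are not shown. A sketch that produces output of the same kind, assuming the three objects are named x, y, and z (names that also appear in the error messages later in this section), is:

```r
x <- 1:10            # numeric vector
y <- letters[1:10]   # character vector
z <- factor(y)       # factor built from `y`

cat(x, "\n")
cat(y, "\n")
cat(z, "\n")         # prints the underlying integer codes, not the labels
cat("Numbers:", x, "Characters:", y, "\n")
```

cat() does not add a line break on its own, which is why "\n" is passed explicitly here.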
writeLines()
This function writes character vectors to a connection, with one element per line. By default, each element is appended with a line break.
a
b
c
d
e
f
g
h
i
j
Please note that this function only accepts character vectors as input and does not convert other types implicitly (e.g., a numeric vector causes an error):
Error in writeLines(x): can only write character objects
This function cannot take a factor either:
Error in writeLines(z): can only write character objects
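Unlike cat(), it cannot concatenate several character vectors passed as separate arguments; its second argument is interpreted as the connection to write to, which causes the following error: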
Error in file(con, "w"): invalid 'description' argument
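The behaviour of writeLines() described above can be sketched as follows, assuming the same x and y as in the earlier examples. The failing call is wrapped in try() so that the remaining lines still run:

```r
x <- 1:10            # numeric vector
y <- letters[1:10]   # character vector

writeLines(y[1:3])                # one element per line
try(writeLines(x))                # error: numeric input is not accepted
writeLines(as.character(x[1:3]))  # convert explicitly first
writeLines(paste(y[1], y[2]))     # concatenate with paste() beforehand
```

Converting with as.character() and joining with paste() before the call are two simple ways around the restrictions.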
print()
print() is a generic function that prints its argument to the console. It is called implicitly whenever the value of an expression is displayed at the console (i.e., auto-printing). It applies formatting to the object being printed, such as adding quotes to character strings and displaying factors with their level labels rather than as numeric codes.
If the input is a numeric vector, print() prints the numeric values to the console, prefixed with an index such as [1].
[1] 1 2 3 4 5 6 7 8 9 10
If the input is a character vector, print() prints the elements to the console, each enclosed in double quotes to indicate its character data type.
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
This function prints a factor as well:
[1] a b c d e f g h i j
Levels: a b c d e f g h i j
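This function does not concatenate character strings either; a second argument is interpreted as the number of digits to print, hence the warning and error below.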
Warning in print.default(x, y): NAs introduced by coercion
Error in print.default(x, y): invalid printing digits -2147483648
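For comparison with cat() and writeLines(), here is a sketch of the corresponding print() calls, again assuming objects named x, y, and z:

```r
x <- 1:10            # numeric vector
y <- letters[1:10]   # character vector
z <- factor(y)       # factor built from `y`

print(x)                  # numbers, with the [1] index prefix
print(y)                  # strings are shown in double quotes
print(z)                  # factor labels plus a "Levels:" line
print(paste(y[1], y[2]))  # combine strings with paste() first
```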
Exercise 6.3 Create a function that produces a simple animation, i.e., the zigzag output shown below. The function should slowly draw a back-and-forth zigzag pattern, taking the number of laps and the indent size (i.e., the maximum number of spaces that the zigzag pattern can go) as its parameters.
The animation should be generated slowly, so that the pattern can be clearly seen as it is being created.
- Please note that the user can interrupt the function at any time by pressing CTRL+C or ESC, and your function should stop properly (using tryCatch).
Exercise 6.4 Create a function that allows the user to play a game of rock, paper, scissors against the computer. The function should have the following features:
- The user should be prompted to enter their move as a text input (i.e., rock, paper, or scissors).
- The computer should randomly select a move.
- The program should ensure that the user’s input is a valid move.
- After each round, the program should report the result of the game as a text output, either “You win!”, “You lose.”, or “It’s a tie.”
- The user should be able to play as many rounds as they wish until they choose to quit.
- When the user decides to quit, the program should provide a summary of the number of games won, lost, and tied.
An example of how the function works is provided below.