2.3 Using functions in R

Up until now we’ve been creating simple objects by directly assigning a single value to an object. It’s very likely that you’ll soon want to progress to creating more complicated objects as your R experience grows and the complexity of your tasks increases. Happily, R has a multitude of functions to help you do this. You can think of a function as an object which contains a series of instructions to perform a specific task. The base installation of R comes with many functions already defined, and you can increase the power of R by installing any of the thousands of packages now available. Once you get a bit more experience with using R you may want to define your own functions to perform tasks that are specific to your goals (more about this in Chapter 7).

 

See this video for a general introduction to using functions in R and this video on how to create vectors in R

 

The first function we will learn about is the c() function. The c() function is short for concatenate and we use it to join together a series of values and store them in a data structure called a vector (more on vectors in Chapter 3).

my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

In the code above we’ve created an object called my_vec and assigned it a value using the function c(). There are a few really important points to note here. Firstly, when you use a function in R, the function name is always followed by a pair of round brackets even if there’s nothing contained between the brackets. Secondly, the argument(s) of a function are placed inside the round brackets and are separated by commas. You can think of an argument as a way of customising the use or behaviour of a function. In the example above, the arguments are the numbers we want to concatenate. Finally, one of the tricky things when you first start using R is to know which function to use for a particular task and how to use it. Thankfully, each function has a help document associated with it which explains how to use the function (more on this later), and a quick Google search will also usually help you out.
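As a small illustration of these points, the sketch below calls one function with no arguments and one with two. Sys.Date() and round() are base R functions that are not otherwise used in this section; they are chosen here only as examples, and the object names today and rounded are made up for the illustration.

```r
# A function call always needs round brackets, even with no arguments
today <- Sys.Date()                     # returns the current date

# Arguments go inside the brackets, separated by commas
rounded <- round(3.14159, digits = 2)   # round to 2 decimal places
rounded
## [1] 3.14

# The help page for a function can be opened with ? or help()
# ?round
# help("round")
```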

To examine the value of our new object we can simply type out the name of the object as we did before

my_vec
## [1] 2 3 1 6 4 3 3 7

Now that we’ve created a vector we can use other functions to do useful stuff with this object. For example, we can calculate the mean, variance, standard deviation and number of elements in our vector by using the mean(), var(), sd() and length() functions

mean(my_vec)    # returns the mean of my_vec
## [1] 3.625
var(my_vec)     # returns the variance of my_vec
## [1] 3.982143
sd(my_vec)      # returns the standard deviation of my_vec
## [1] 1.995531
length(my_vec)  # returns the number of elements in my_vec
## [1] 8
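To see how these summary functions relate to each other, the sketch below recomputes the variance by hand and checks that the standard deviation is its square root. The object names n and var_by_hand are only for illustration.

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

# sample variance by hand: sum of squared deviations divided by (n - 1)
n <- length(my_vec)
var_by_hand <- sum((my_vec - mean(my_vec))^2) / (n - 1)
var_by_hand
## [1] 3.982143

# the standard deviation is the square root of the variance
sd_matches <- all.equal(sd(my_vec), sqrt(var(my_vec)))
sd_matches
## [1] TRUE
```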

If we wanted to use any of these values later on in our analysis we can just assign the resulting value to another object

vec_mean <- mean(my_vec)    # stores the mean of my_vec in vec_mean
vec_mean
## [1] 3.625
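As one possible use of a stored value, the sketch below subtracts vec_mean from every element of my_vec to centre the vector on zero. The object name centred is only for illustration.

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)
vec_mean <- mean(my_vec)

# subtract the stored mean from each element
centred <- my_vec - vec_mean
centred
## [1] -1.625 -0.625 -2.625  2.375  0.375 -0.625 -0.625  3.375

# the centred values now have a mean of zero
mean(centred)
## [1] 0
```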

Sometimes it can be useful to create a vector that contains a regular sequence of values in steps of one. Here we can make use of a shortcut using the : symbol.

my_seq <- 1:10     # create regular sequence
my_seq
##  [1]  1  2  3  4  5  6  7  8  9 10
my_seq2 <- 10:1    # in descending order
my_seq2
##  [1] 10  9  8  7  6  5  4  3  2  1
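One thing to watch out for with the : symbol is that it is evaluated before arithmetic such as + and -. The short sketch below shows the difference brackets make; the object names n, seq_a and seq_b are only for illustration.

```r
n <- 5

seq_a <- 1:n - 1      # same as (1:n) - 1
seq_a
## [1] 0 1 2 3 4

seq_b <- 1:(n - 1)    # brackets give the sequence from 1 to n - 1
seq_b
## [1] 1 2 3 4
```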

Other useful functions for generating vectors of sequences include the seq() and rep() functions. For example, to generate a sequence from 1 to 5 in steps of 0.5

my_seq2 <- seq(from = 1, to = 5, by = 0.5)
my_seq2
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

Here we’ve used the arguments from = and to = to define the limits of the sequence and the by = argument to specify the increment of the sequence. Play around with other values for these arguments to see their effect.
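As a starting point for experimenting, the sketch below shows two variations: using the length.out = argument to ask for a fixed number of values instead of a step size, and using a negative by = value to count downwards. The object names are only for illustration.

```r
# 9 evenly spaced values between 1 and 5 (same result as by = 0.5)
seq_len_out <- seq(from = 1, to = 5, length.out = 9)
seq_len_out
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

# a negative increment gives a descending sequence
seq_down <- seq(from = 10, to = 1, by = -3)
seq_down
## [1] 10  7  4  1
```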

The rep() function allows you to replicate (repeat) values a specified number of times. To repeat the value 2, 10 times

my_seq3 <- rep(2, times = 10)   # repeats 2, 10 times
my_seq3
##  [1] 2 2 2 2 2 2 2 2 2 2

You can also repeat non-numeric values

my_seq4 <- rep("abc", times = 3)    # repeats "abc" 3 times
my_seq4
## [1] "abc" "abc" "abc"

or a whole series of values

my_seq5 <- rep(1:5, times = 3)  # repeats the series 1 to 5, 3 times
my_seq5
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

or each element of a series in turn

my_seq6 <- rep(1:5, each = 3)   # repeats each element of the
                                # series 3 times
my_seq6
##  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
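The times = and each = arguments can also be used together, and times = can take a vector giving a separate count for each element. The sketch below shows both; the object names are only for illustration.

```r
# repeat each element twice, then repeat the whole result twice
rep_both <- rep(1:3, each = 2, times = 2)
rep_both
##  [1] 1 1 2 2 3 3 1 1 2 2 3 3

# a different number of repeats for each element: 1 three times,
# 2 twice and 3 once
rep_counts <- rep(1:3, times = 3:1)
rep_counts
## [1] 1 1 1 2 2 3
```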

We can also repeat a non-sequential series

my_seq7 <- rep(c(3, 1, 10, 7), each = 3) # repeats each 
                                         # element of the 
                                         # series 3 times
my_seq7
##  [1]  3  3  3  1  1  1 10 10 10  7  7  7

Note in the code above how we’ve used the c() function inside the rep() function. Nesting functions allows us to build quite complex commands within a single line of code and is a very common practice when using R. However, care needs to be taken as too many nested functions can make your code quite difficult for others to understand (or yourself some time in the future!). We could rewrite the code above to explicitly separate the two different steps to generate our vector. Either approach will give the same result, you just need to use your own judgement as to which is more readable.

in_vec <- c(3, 1, 10, 7)
my_seq7 <- rep(in_vec, each = 3)    # repeats each element of
                                    # the series 3 times
my_seq7
##  [1]  3  3  3  1  1  1 10 10 10  7  7  7
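The readability trade-off becomes clearer with more levels of nesting. The sketch below nests three functions in one line and then builds the same result one step at a time. The functions unique() and sort() are base R functions not used elsewhere in this section, and the object names are only for illustration.

```r
# three functions nested in one line (read from the inside out)
nested <- sort(unique(rep(c(3, 1, 10, 7), each = 3)))

# the same result built up one step at a time
in_vec   <- c(3, 1, 10, 7)
rep_vec  <- rep(in_vec, each = 3)   # repeat each element 3 times
uniq_vec <- unique(rep_vec)         # drop the duplicated values
stepwise <- sort(uniq_vec)          # sort into ascending order

identical(nested, stepwise)
## [1] TRUE
stepwise
## [1]  1  3  7 10
```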

2.3.1 Pipes: a better way

A more readable alternative to nesting functions is to use what are called pipes. Pipes are a way of passing the output of one function to another function without storing the values in an object in between. You can use two different pipes: the native R pipe (|>), which is available from R version 4.1.0 onwards, and the %>% pipe from the magrittr package.


my_seq8 <- c(3, 1, 10, 7) |>  # native R version
  rep(each = 3)

my_seq8
##  [1]  3  3  3  1  1  1 10 10 10  7  7  7

library(magrittr)
my_seq9 <- c(3, 1, 10, 7) %>%  # magrittr version
  rep(each = 3)

all.equal(my_seq8, my_seq9)
## [1] TRUE

You can build long chains of functions in this way, as you will do in the urban analytics course.
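As a minimal sketch of a longer chain, the code below pipes a vector through three functions in turn. It assumes R 4.1.0 or later for the native pipe; unique() and sort() are base R functions chosen only as examples, and my_chain is a made-up object name.

```r
my_chain <- c(3, 1, 10, 7) |>
  rep(each = 3) |>    # repeat each element 3 times
  unique() |>         # drop the duplicated values
  sort()              # sort into ascending order
my_chain
## [1]  1  3  7 10
```

Each step reads top to bottom in the order it is carried out, which is usually easier to follow than reading nested brackets from the inside out.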

Notice that in both the |> and %>% examples the first argument of the rep() function is not specified. That is because the output of the step before the pipe is automatically used as the first argument. This is how you will use pipes in many situations. However, in more advanced settings you can pass the output to a different argument of a later function. But that is for a different day!
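For the curious, here is a brief sketch of what passing the output to a different argument can look like. It assumes R 4.2.0 or later, where the native pipe accepts _ as a placeholder (which must be given to a named argument), and it assumes the magrittr package is installed, where . plays the same role. The object names are only for illustration.

```r
library(magrittr)

# native pipe: _ marks where the piped value goes
# (needs R >= 4.2.0 and must be used with a named argument)
native_each <- 3 |> rep(x = c(3, 1, 10, 7), each = _)

# magrittr pipe: . marks where the piped value goes
magrittr_each <- 3 %>% rep(c(3, 1, 10, 7), each = .)

identical(native_each, magrittr_each)
## [1] TRUE
native_each
##  [1]  3  3  3  1  1  1 10 10 10  7  7  7
```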