6.4 Loops

R is very good at performing repetitive tasks. If we want a set of operations to be repeated several times we use what’s known as a loop. When you create a loop, R will execute the instructions in the loop a specified number of times or until a specified condition is met. There are three main types of loop in R: the for loop, the while loop and the repeat loop.

Loops are one of the staples of all programming languages, not just R, and can be a powerful tool (although in our opinion, used far too frequently when writing R code).

6.4.1 For loop

The most commonly used loop structure when you want to repeat a task a defined number of times is the for loop. The most basic example of a for loop is:

for (i in 1:5) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

But what’s the code actually doing? This is a dynamic bit of code where an index i is iteratively replaced by each value in the vector 1:5. Let’s break it down. Because the first value in our sequence (1:5) is 1, the loop starts by replacing i with 1 and runs everything between the { }. Loops conventionally use i as the counter, short for iteration, but you are free to use whatever you like, even your pet’s name; it really does not matter (except when using nested loops, in which case the counters must be called different things, like SenorWhiskers and HerrFlufferkins).

So, if we were to do the first iteration of the loop manually, it would look like this:

i <- 1
print(i)
## [1] 1

Once this first iteration is complete, the for loop loops back to the beginning and replaces i with the next value in our 1:5 sequence (2 in this case):

i <- 2
print(i)
## [1] 2

This process is then repeated until the loop reaches the final value in the sequence (5 in this example) after which point it stops.
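The sequence a for loop walks through doesn’t have to be 1:5, or even numeric. As a small sketch (the cities vector is made up for illustration), the loop variable below takes each element of a character vector in turn:

```r
# A made-up character vector to loop over
cities <- c("porto", "aberdeen", "nairobi", "genoa")

name_lengths <- integer(0)  # empty integer vector to collect results
for (city_name in cities) {
  # city_name is "porto" on the first pass, "aberdeen" on the second, ...
  name_lengths[city_name] <- nchar(city_name)
}

# name_lengths is now c(porto = 5, aberdeen = 8, nairobi = 7, genoa = 5)
name_lengths
```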

To reinforce how for loops work, and to introduce a valuable feature of loops, we’ll use our counter in a calculation within the loop. This is useful, for example, if we’re using a loop to iterate through a vector but want to refer to the next element (or any other offset) rather than the current one. To show this we’ll simply add 1 to the value of our index every time we iterate our loop.

for (i in 1:5) {
  print(i + 1)
}
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6

As in the previous loop, the first value in our sequence is 1. The loop begins by replacing i with 1, but this time we’ve specified that 1 must be added to i in the expression, giving 1 + 1.

i <- 1
i + 1
## [1] 2

As before, once the iteration is complete, the loop moves onto the next value in the sequence and replaces i with the next value (2 in this case) so that i + 1 becomes 2 + 1.

i <- 2
i + 1
## [1] 3

And so on. We think you get the idea! In essence this is all a for loop is doing and nothing more.
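Using i + 1 inside the loop is what lets us reach the next element of a vector. A minimal sketch, using a made-up temps vector of readings, computes the change between consecutive values; base R’s diff() does the same job in a single vectorised call, which gives us a convenient check:

```r
# Hypothetical readings, made up for this example
temps <- c(12, 15, 11, 18, 20)

# There is one fewer change than there are readings
changes <- numeric(length(temps) - 1)
for (i in 1:(length(temps) - 1)) {
  # temps[i + 1] is the element after temps[i]
  changes[i] <- temps[i + 1] - temps[i]
}
changes
## [1]  3 -4  7  2

identical(changes, diff(temps))
## [1] TRUE
```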

Whilst above we have been using simple addition in the body of the loop, you can also combine loops with functions.

Let’s go back to our data frame city. Earlier in the chapter we created a function to multiply two columns and used it to create our porto_aberdeen, aberdeen_nairobi and nairobi_genoa objects. We could have used a loop for this. Let’s remind ourselves what our data look like and the code for the multiply_columns() function.

# Recreating our dataset
city <- data.frame(
  porto = rnorm(100),
  aberdeen = rnorm(100),
  nairobi = c(rep(NA, 10), rnorm(90)),
  genoa = rnorm(100)
)

# Our function
multiply_columns <- function(x, y) {
  temp <- x * y
  if (any(is.na(temp))) {
    warning("The function has produced NAs")
    return(temp)
  } else {
    return(temp)
  }
}

To use a loop to iterate over these columns we first need to create an empty list (remember lists?), which we call temp (short for temporary), to store the output of the for loop.

temp <- list()
for (i in 1:(ncol(city) - 1)) {
  temp[[i]] <- multiply_columns(x = city[, i], y = city[, i + 1])
}
## Warning in multiply_columns(x = city[, i], y = city[, i + 1]): The function has
## produced NAs

## Warning in multiply_columns(x = city[, i], y = city[, i + 1]): The function has
## produced NAs

When we specify our for loop notice how we subtracted 1 from ncol(city). The ncol() function returns the number of columns in our city data frame which is 4 and so our loop runs from i = 1 to i = 4 - 1 which is i = 3. We’ll come back to why we need to subtract 1 from this in a minute.

So in the first iteration of the loop, i takes the value 1. The multiply_columns() function multiplies the city[, 1] (porto) and city[, 1 + 1] (aberdeen) columns and stores the result in temp[[1]], the first element of the temp list.

In the second iteration, i takes the value 2. The multiply_columns() function multiplies the city[, 2] (aberdeen) and city[, 2 + 1] (nairobi) columns and stores the result in temp[[2]], the second element of the temp list.

In the third and final iteration, i takes the value 3. The multiply_columns() function multiplies the city[, 3] (nairobi) and city[, 3 + 1] (genoa) columns and stores the result in temp[[3]], the third element of the temp list. The two warnings come from these last two iterations, as both involve the nairobi column and its 10 missing values.

So can you see why we used ncol(city) - 1 when we set up our loop? As our city data frame has four columns, if we had used ncol(city) instead, the final iteration would have tried to multiply the 4th column by a non-existent 5th column, and R would have stopped with an error.
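To see what would go wrong, here is a sketch of the loop using ncol() without the - 1. It uses a small, made-up city_demo data frame and plain * in place of multiply_columns() so that it runs on its own; tryCatch() captures the error message instead of halting:

```r
set.seed(1)
city_demo <- data.frame(porto = rnorm(5), aberdeen = rnorm(5),
                        nairobi = rnorm(5), genoa = rnorm(5))

temp <- list()
result <- tryCatch({
  for (i in 1:ncol(city_demo)) {  # note: no "- 1"
    temp[[i]] <- city_demo[, i] * city_demo[, i + 1]
  }
  "loop finished"
}, error = function(e) conditionMessage(e))

result        # the error raised when i = 4 asks for column 5
length(temp)  # only the first three products were stored
## [1] 3
```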

Again, it’s a good idea to check that our loop is producing something sensible (remember: check, check and check again!). To do this we can use the identical() function to compare the objects we created by hand earlier with the result of running each iteration of the loop manually.

porto_aberdeen_func <- multiply_columns(city$porto, city$aberdeen)
i <- 1
identical(multiply_columns(city[, i], city[, i + 1]), porto_aberdeen_func)
## [1] TRUE

aberdeen_nairobi_func <- multiply_columns(city$aberdeen, city$nairobi)
## Warning in multiply_columns(city$aberdeen, city$nairobi): The function has
## produced NAs
i <- 2
identical(multiply_columns(city[, i], city[, i + 1]), aberdeen_nairobi_func)
## Warning in multiply_columns(city[, i], city[, i + 1]): The function has produced
## NAs
## [1] TRUE
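We can also check the list produced by the loop directly. The sketch below rebuilds city (with set.seed() added so the numbers are reproducible) and a condensed but equivalent multiply_columns(), reruns the loop with the expected NA warnings silenced by suppressWarnings(), and compares each element of temp with the corresponding columns multiplied by hand:

```r
set.seed(42)  # added so the random numbers are reproducible
city <- data.frame(
  porto = rnorm(100),
  aberdeen = rnorm(100),
  nairobi = c(rep(NA, 10), rnorm(90)),
  genoa = rnorm(100)
)

# Same behaviour as before: warn if NAs are produced, then return the product
multiply_columns <- function(x, y) {
  temp <- x * y
  if (any(is.na(temp))) {
    warning("The function has produced NAs")
  }
  return(temp)
}

temp <- list()
for (i in 1:(ncol(city) - 1)) {
  # suppressWarnings() only hides the expected NA warnings
  temp[[i]] <- suppressWarnings(multiply_columns(city[, i], city[, i + 1]))
}

# Compare each list element with the columns multiplied by hand
c(
  identical(temp[[1]], city$porto * city$aberdeen),
  identical(temp[[2]], city$aberdeen * city$nairobi),
  identical(temp[[3]], city$nairobi * city$genoa)
)
## [1] TRUE TRUE TRUE
```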

If you can follow the examples above, you’ll be in a good spot to begin writing some of your own for loops. That said, there are other types of loops available to you.

6.4.2 While loop

Another type of loop that you may use (albeit less frequently) is the while loop. The while loop is used when you want to keep looping until a specific logical condition is satisfied (contrast this with the for loop which will always iterate through an entire sequence).

The basic structure of the while loop is:

while (logical_condition) {
  expression
}

A simple example of a while loop is:

i <- 0
while (i <= 4) {
  i <- i + 1
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

Here the condition i <= 4 is checked at the start of each pass through the loop, and the main body of the loop (the expression) only runs while it is TRUE. Because i is increased before it is printed, the final pass starts with i equal to 4 and prints 5; the condition 5 <= 4 is then FALSE, so the loop stops.
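A while loop really earns its place when you don’t know in advance how many iterations you will need. As a small illustration (the numbers are arbitrary), this counts how many times we must double a value before it exceeds 1000:

```r
value <- 1
n_doublings <- 0
while (value <= 1000) {
  value <- value * 2              # double the current value
  n_doublings <- n_doublings + 1  # count the iteration
}
n_doublings
## [1] 10
value
## [1] 1024
```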

There is another, very rarely used type of loop: the repeat loop. The repeat loop has no conditional check, so it will keep iterating indefinitely unless a break (a “stop here” instruction) is coded into it. It’s worthwhile being aware of its existence, but for now we don’t think you need to worry about it; the for and while loops will see you through the vast majority of your looping needs.
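For completeness, here is a sketch of a repeat loop that reproduces the while loop above; the if () test containing break is the only thing that stops it:

```r
i <- 0
repeat {
  i <- i + 1
  print(i)
  if (i >= 5) {
    break  # leave the loop; without this it would run forever
  }
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
```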

6.4.3 When to use a loop?

Loops are fairly commonly used, though in our opinion sometimes overused. Equivalent tasks can often be performed with functions, which are frequently more efficient and easier to read than loops. This raises the question: when should you use a loop?

In general, loops in R run more slowly than vectorised alternatives (which do their looping in compiled code) and should be avoided when better options exist, especially when you’re working with large datasets. However, loops are sometimes the only way to achieve the result we want.
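To make the comparison concrete, here is a sketch that multiplies two vectors element by element, first with a loop and then with R’s vectorised * operator. Both give the same answer; system.time() lets you compare the run times on your own machine (we don’t quote timings, as they vary between computers):

```r
set.seed(123)
x <- rnorm(1e5)
y <- rnorm(1e5)

# Loop version: pre-allocate the result, then fill it one element at a time
loop_result <- numeric(length(x))
loop_time <- system.time(
  for (i in seq_along(x)) {
    loop_result[i] <- x[i] * y[i]
  }
)

# Vectorised version: one call that works on the whole vectors at once
vec_time <- system.time(vec_result <- x * y)

identical(loop_result, vec_result)
## [1] TRUE
loop_time
vec_time
```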

Some examples of when using loops can be appropriate:

  • Some simulations.

  • Recursive relationships (a relationship which depends on the value of the previous relationship [“to understand recursion, you must understand recursion”])

  • More complex problems that are not easily amenable to functions.

  • Problems where you don’t know in advance how many iterations are needed, i.e. while loops (keep jumping until you’ve reached the moon).
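A recursive relationship is a good example of where a loop is natural, because each value can only be calculated once the previous one is known. The sketch below uses the discrete logistic growth model; the starting population, growth rate and carrying capacity are made-up values for illustration:

```r
# Made-up parameters for illustration
n_years <- 10
growth_rate <- 0.5   # r: per-year growth rate
capacity <- 1000     # K: carrying capacity

pop <- numeric(n_years)  # pre-allocate storage
pop[1] <- 10             # starting population

for (year in 2:n_years) {
  # N[year] = N[year - 1] + r * N[year - 1] * (1 - N[year - 1] / K)
  pop[year] <- pop[year - 1] +
    growth_rate * pop[year - 1] * (1 - pop[year - 1] / capacity)
}
round(pop)
```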

6.4.4 If not loops, then what?

The pattern of looping over a vector, doing something to each element and saving the results is so common that the purrr package (part of the tidyverse) provides a family of functions to do it for you: the map_*() functions.

For example, column-wise summaries of the city data frame can be created with:

library(purrr)
map_dfr(city, function(x) { summary(x) })
## # A tibble: 4 × 7
##   Min.      `1st Qu.`  Median      Mean         `3rd Qu.` Max.     `NA's` 
##   <table>   <table>    <table>     <table>      <table>   <table>  <table>
## 1 -3.281362 -0.7580670 -0.09221508 -0.041098543 0.6769474 2.386683 NA     
## 2 -2.488330 -0.8160216 -0.06276359 -0.083801719 0.4851634 2.530844 NA     
## 3 -3.189245 -0.9550408 -0.24572891 -0.205044862 0.6328455 3.010478 10     
## 4 -2.407874 -0.6904893  0.01923070  0.006420597 0.6981329 2.849795 NA

Note that a data frame is really just a list in which each column is an element. To see this, use the [[ operator, which extracts an element from a list, to pull out a column by name:

city[["porto"]]
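A few quick checks confirm that a data frame behaves like a list. This sketch uses a tiny made-up data frame, city_small, so that it runs on its own:

```r
city_small <- data.frame(porto = c(1.2, -0.4, 0.7),
                         aberdeen = c(0.3, 2.1, -1.5))

is.list(city_small)
## [1] TRUE
length(city_small)  # one element per column
## [1] 2
names(city_small)   # the element names are the column names
identical(city_small[["porto"]], city_small$porto)
## [1] TRUE
```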

Here summary() operates on each element of the list (each column) and returns a named numeric vector, as requested by map. The _dfr suffix tells map to return a data frame created by row-binding these vectors. This requires dplyr to be installed.

You could have done the same thing with a for loop:

temp <- list() # Initialise an empty list to store each summary.

for (i in seq_along(city)) {
  # t() turns the named summary vector into a one-row matrix.
  temp[[i]] <- t(summary(city[, i]))
}

plyr::ldply(temp) # Row-bind the list elements, filling missing columns (e.g. NA's) with NA.
##        Min.    1st Qu.      Median         Mean   3rd Qu.     Max. NA's
## 1 -3.281362 -0.7580670 -0.09221508 -0.041098543 0.6769474 2.386683   NA
## 2 -2.488330 -0.8160216 -0.06276359 -0.083801719 0.4851634 2.530844   NA
## 3 -3.189245 -0.9550408 -0.24572891 -0.205044862 0.6328455 3.010478   10
## 4 -2.407874 -0.6904893  0.01923070  0.006420597 0.6981329 2.849795   NA

There is no real speed advantage to using map functions over for loops; the main benefit is that the code is more readable. A strategy worth keeping in the back of your mind is this: for every loop you write, try to rewrite it using a map function (lapply() or sapply() in base R will also work). There’s nothing worse than realising that a small, tiny, seemingly meaningless mistake in a loop has, weeks, months or years down the line, propagated into a huge mess. We strongly recommend using the map or apply functions whenever possible.
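As an example of that strategy, here is the multiply_columns() loop from earlier rewritten with lapply(). To keep the sketch self-contained it recreates city with set.seed() and uses a simplified multiply_columns() without the NA warning (an assumption made only to keep the output tidy):

```r
set.seed(42)
city <- data.frame(
  porto = rnorm(100),
  aberdeen = rnorm(100),
  nairobi = c(rep(NA, 10), rnorm(90)),
  genoa = rnorm(100)
)

# Simplified version of multiply_columns() without the NA warning
multiply_columns <- function(x, y) x * y

# Loop version
temp_loop <- list()
for (i in 1:(ncol(city) - 1)) {
  temp_loop[[i]] <- multiply_columns(city[, i], city[, i + 1])
}

# lapply() version: the anonymous function is applied to each index 1, 2, 3
# and the results are collected into a list for us
temp_lapply <- lapply(1:(ncol(city) - 1), function(i) {
  multiply_columns(city[, i], city[, i + 1])
})

identical(temp_loop, temp_lapply)
## [1] TRUE
```

The lapply() version needs no empty list to be set up beforehand and no indexing into temp, which removes two places where a small mistake could creep in.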