Exercise 6: Basic programming in R

Read Chapter 6 to help you complete the questions in this exercise. Click on the ‘code’ button to the right of each question to reveal the solutions code (but try to find a solution first!).

Create a function to calculate the area of a circle. Test the function by finding the area of a circle with a diameter of 3.4 cm. Can you use it on a vector of data?

# area of a circle
# the equation to calculate the area of a circle is pi * radius^2

circle.area <- function(d) {
  pi * (d / 2)^2
}

# to use your new function

circle.area(10)
# [1] 78.53982

# to test on a vector of diameters
# first create a vector with diameters ranging from 0 to 50 in steps of 10

cir.diam <- seq(from = 0, to = 50, by = 10)

# test your function

circle.area(cir.diam)
# [1]    0.00000   78.53982  314.15927  706.85835 1256.63706 1963.49541

Write a function to convert farenheit to centegrade (oC = (oF - 32) x 5/9). Get your function to print out your result in the following format: “Farenheit : value of oF is equivalent to value oC centigrade.”

far.cent <- function(a) {
    val <- (a - 32) * 5/9
    print(paste("Fahrenheit: ", round(a, digits = 3), "oF", sep = " "), quote = FALSE)  # round 3dp
    print(paste("Centigrade: ", round(val, digits = 3), "oC", sep = " "), quote = FALSE)  # round 3dp
}

# alternative Fahrenheit to centigrade using cat function

far.cent2 <- function(a) {
    val <- (a - 32) * 5/9  # calculation
    cat("Fahrenheit: ", round(a, digits = 3), "oF", "\n")  # use cat function
    cat("Centigrade: ", round(val, digits = 3), "oC", "\n")
}

Create a vector of normally distributed data, of length 100, mean 35 and standard deviation of 15. Write a function to calculate the mean, median, and range of the vector, print these values out with appropriate labels. Also get the function to plot a histogram (as a proportion) of the values and add a density curve.

# Create a vector of normally distributed data
# length 100, mean 35 and standard deviation of 29

vals <- rnorm(100, 35, 15) # create some norm dist data mean 35, sd = 15

summary.fun <- function(dat) {
  mymean <- round(mean(dat), digits = 4) # calc mean
  mymedian <- round(median(dat), digits = 4) # calc median
  mymin <- round(min(dat), digits = 4) # calc min
  mymax <- round(max(dat), digits = 4) # calc max
  print(paste("mean:", mymean, sep = " "), quote = FALSE) # print mean
  print(paste("median:", mymedian, sep = " "), quote = FALSE) # print median
  print(paste("range:", "from:", mymin, "to", mymax, sep = " "), quote = FALSE)
  dens <- density(dat) # estimate density curve
  hist(dat, main = "", type = "l", freq = FALSE) # plot histogram
  lines(dens, lty = 1, col = "red") # plot density curve
}

# use the function
summary.fun(vals)

Write a function to calculate the median value of a vector of numbers (yes I know there’s a median() function already but this is fun!). Be careful with vectors of an even sample size, as you will have to take the average of the two central numbers (hint: use modulo %%2 to determine whether the vector is an odd or an even size). Test your function on vectors with both odd and even sample sizes.

# calculate a median

ourmedian <- function(x) {
  n <- length(x)
  if (n %% 2 == 1) { # odd numbers
sort(x)[(n + 1) / 2]
  } # find the middle number by adding 1 to length and div 2
  else { # even numbers
middletwo <- sort(x)[(n / 2) + 0:1] # find the two middle numbers
mean(middletwo)
  }
}

# use the function
mydat <- sample(1:50, size = 10, replace = TRUE)

# our function
ourmedian(mydat)

# R median function
median(mydat)

You are a population ecologist for the day and wish to investigate the properties of the Ricker model. The Ricker model is defined as:

\[ N_{t+1} = N_{t} exp\left[r\left(1- \frac{N_{t}}{K} \right) \right] \]

(cont) Where N_t is the population size at time t, r is the population growth rate and K is the carrying capacity. Write a function to simulate this model so you can conveniently determine the effect of changing r and the initial population size N0. K is often set to 100 by default, but you want the option of being able to change this with your function. So, you will need a function with the following arguments; nzero which sets the initial population size, r which will determine the population growth rate, time which sets how long the simulation will run for and K which we will initially set to 100 by default.

# function to simulate Ricker model

Ricker.model <- function(nzero, r, time, K = 1) {
    # sets initial parameters
    N <- numeric(time + 1)  # creates a real vector of length time+1 to store values of Nt+1
    N[1] <- nzero  # sets initial population size in first element of N
    for (i in 1:time) {
        # loops over time
        N[i + 1] <- N[i] * exp(r * (1 - N[i]/K))
    }
    Time <- 0:time  # creates vector for x axis
    plot(Time, N, type = "o", xlim = c(0, time), xlab = "Time", ylab = "Population size (N)",
        main = paste("r =", r, sep = " "))  # plots     output
}

# To run play around with the different parameters, especially r
Ricker.model(nzero = 0.1, r = 1, time = 100)

One of the things computers can do much better than humans is to do repetitive things very fast. For example, LODES website has data on commuting patterns (simulated) from each Census tract to another for each state in the US. For most states in the US, it has annual data from 2002 to 2019. Unfortunately to downlaod this data through a browser means that we will have to click individual links in each of these state subdirectories and save them.

This can be made very easy with loops in R.

As an example, look at a the data for Connecticut. It is located at https://lehd.ces.census.gov/data/lodes/LODES7/ct/od/

and has files named ct_od_{main|aux}_JT0{0-5}_yyyy.csv.gz

where {main|aux} represent main or aux; 0-5 is a sequence of numbers from 0 to 5; yyyy is from 2002 to 2019.

Using download.file() function to download individual files from the web and nested for loops, download all the files for Connecticut.

baseurl <- "https://lehd-ces-census-gov/data/lodes/LODES7/ct/od/"

for (m in c("main", "aux")) {
    for (i in 0:5) {
        for (year in 2002:2019) {
            filename <- paste("ct_od_", m, "_JT0", i, "_", year, ".csv.gz", sep = "")
            fileurl <- paste(baseurl, filename)
            download.file(fileurl, destfile = paste("./data_download/", filename,
                sep = ""))
        }
    }
}

replace one or more of the nested for loop with walk() or map from purrr package.

.
baseurl <- "https://lehd.ces.census.gov/data/lodes/LODES7/ct/od/"

m <- c("main", "aux")
i <- 0:5
year <- 2002:2019

var_combos <- expand.grid(m, i, year)
names(var_combos) <- c("m", "i", "year")

filenames <- paste("ct_od_", var_combos$m, "_JT0", var_combos$i, "_", var_combos$year,
    ".csv.gz", sep = "")

walk(filenames, function(x) {
    download.file(paste(baseurl, x, sep = ""), destfile = paste("./data_downloads/",
        x))
})

Pick any 5 states in the US and using common two letter abbreviations, download all the files. Try different combinations of states and see if your assumptions about the file names are generalisable (Hint: Use MA)

baseurl <- "https://lehd.ces.census.gov/data/lodes/LODES7/"

st < tolower(state.abb[1:5])
m <- c("main", "aux")
i <- 0:5
year <- 2002:2019

var_combos <- expand.grid(m, i, year)
names(var_combos) <- c(, "m", "i", "year")


for (state in st) {
    # Why am I using a loop here instead of expanding the grid by combinations
    # that include state and construct the file name? (Hint: baseurl needs to
    # be changed as well)

    baseurl_st <- paste(baseurl, state, "/od/", sep = "")
    filenames <- paste(state, "_od_", var_combos$m, "_JT0", var_combos$i, "_", var_combos$year,
        ".csv.gz", sep = "")
    walk(filenames, function(x) {
        download.file(paste(baseurl_st, x, sep = ""), destfile = paste("./data_downloads/",
            x))
    })
}

# This fails for MA and few other states, why?

End of Exercise 6