Exercise 6: Basic programming in R
Read Chapter 6 to help you complete the questions in this exercise. Click on the ‘code’ button to the right of each question to reveal the solutions code (but try to find a solution first!).
Create a function to calculate the area of a circle. Test the function by finding the area of a circle with a diameter of 3.4 cm. Can you use it on a vector of data?
# area of a circle # the equation to calculate the area of a circle is pi * radius^2 <- function(d) { circle.area * (d / 2)^2 pi } # to use your new function circle.area(10) # [1] 78.53982 # to test on a vector of diameters # first create a vector with diameters ranging from 0 to 50 in steps of 10 <- seq(from = 0, to = 50, by = 10) cir.diam # test your function circle.area(cir.diam) # [1] 0.00000 78.53982 314.15927 706.85835 1256.63706 1963.49541
Write a function to convert farenheit to centegrade (oC = (oF - 32) x 5/9). Get your function to print out your result in the following format: “Farenheit : value of oF is equivalent to value oC centigrade.”
<- function(a) { far.cent <- (a - 32) * 5/9 val print(paste("Fahrenheit: ", round(a, digits = 3), "oF", sep = " "), quote = FALSE) # round 3dp print(paste("Centigrade: ", round(val, digits = 3), "oC", sep = " "), quote = FALSE) # round 3dp } # alternative Fahrenheit to centigrade using cat function <- function(a) { far.cent2 <- (a - 32) * 5/9 # calculation val cat("Fahrenheit: ", round(a, digits = 3), "oF", "\n") # use cat function cat("Centigrade: ", round(val, digits = 3), "oC", "\n") }
Create a vector of normally distributed data, of length 100, mean 35 and standard deviation of 15. Write a function to calculate the mean, median, and range of the vector, print these values out with appropriate labels. Also get the function to plot a histogram (as a proportion) of the values and add a density curve.
# Create a vector of normally distributed data # length 100, mean 35 and standard deviation of 29 <- rnorm(100, 35, 15) # create some norm dist data mean 35, sd = 15 vals <- function(dat) { summary.fun <- round(mean(dat), digits = 4) # calc mean mymean <- round(median(dat), digits = 4) # calc median mymedian <- round(min(dat), digits = 4) # calc min mymin <- round(max(dat), digits = 4) # calc max mymax print(paste("mean:", mymean, sep = " "), quote = FALSE) # print mean print(paste("median:", mymedian, sep = " "), quote = FALSE) # print median print(paste("range:", "from:", mymin, "to", mymax, sep = " "), quote = FALSE) <- density(dat) # estimate density curve dens hist(dat, main = "", type = "l", freq = FALSE) # plot histogram lines(dens, lty = 1, col = "red") # plot density curve } # use the function summary.fun(vals)
Write a function to calculate the median value of a vector of numbers (yes I know there’s a
median()
function already but this is fun!). Be careful with vectors of an even sample size, as you will have to take the average of the two central numbers (hint: use modulo %%2 to determine whether the vector is an odd or an even size). Test your function on vectors with both odd and even sample sizes.# calculate a median <- function(x) { ourmedian <- length(x) n if (n %% 2 == 1) { # odd numbers sort(x)[(n + 1) / 2] # find the middle number by adding 1 to length and div 2 } else { # even numbers <- sort(x)[(n / 2) + 0:1] # find the two middle numbers middletwo mean(middletwo) } } # use the function <- sample(1:50, size = 10, replace = TRUE) mydat # our function ourmedian(mydat) # R median function median(mydat)
- You are a population ecologist for the day and wish to investigate the properties of the Ricker model. The Ricker model is defined as:
\[ N_{t+1} = N_{t} exp\left[r\left(1- \frac{N_{t}}{K} \right) \right] \]
(cont) Where Nt is the population size at time t, r is the population growth rate and K is the carrying capacity. Write a function to simulate this model so you can conveniently determine the effect of changing r and the initial population size N0. K is often set to 100 by default, but you want the option of being able to change this with your function. So, you will need a function with the following arguments; nzero which sets the initial population size, r which will determine the population growth rate, time which sets how long the simulation will run for and K which we will initially set to 100 by default.
# function to simulate Ricker model <- function(nzero, r, time, K = 1) { Ricker.model # sets initial parameters <- numeric(time + 1) # creates a real vector of length time+1 to store values of Nt+1 N 1] <- nzero # sets initial population size in first element of N N[for (i in 1:time) { # loops over time + 1] <- N[i] * exp(r * (1 - N[i]/K)) N[i }<- 0:time # creates vector for x axis Time plot(Time, N, type = "o", xlim = c(0, time), xlab = "Time", ylab = "Population size (N)", main = paste("r =", r, sep = " ")) # plots output } # To run play around with the different parameters, especially r Ricker.model(nzero = 0.1, r = 1, time = 100)
- One of the things computers can do much better than humans is to do repetitive things very fast. For example, LODES website has data on commuting patterns (simulated) from each Census tract to another for each state in the US. For most states in the US, it has annual data from 2002 to 2019. Unfortunately to downlaod this data through a browser means that we will have to click individual links in each of these state subdirectories and save them.
This can be made very easy with loops in R.
As an example, look at a the data for Connecticut. It is located at https://lehd.ces.census.gov/data/lodes/LODES7/ct/od/
and has files named ct_od_{main|aux}_JT0{0-5}_yyyy.csv.gz
where {main|aux} represent main or aux; 0-5 is a sequence of numbers from 0 to 5; yyyy is from 2002 to 2019.
- Using
download.file()
function to download individual files from the web and nestedfor
loops, download all the files for Connecticut.
<- "https://lehd-ces-census-gov/data/lodes/LODES7/ct/od/"
baseurl
for (m in c("main", "aux")) {
for (i in 0:5) {
for (year in 2002:2019) {
<- paste("ct_od_", m, "_JT0", i, "_", year, ".csv.gz", sep = "")
filename <- paste(baseurl, filename)
fileurl download.file(fileurl, destfile = paste("./data_download/", filename,
sep = ""))
}
} }
- replace one or more of the nested
for
loop withwalk()
ormap
frompurrr
package.
.<- "https://lehd.ces.census.gov/data/lodes/LODES7/ct/od/"
baseurl
<- c("main", "aux")
m <- 0:5
i <- 2002:2019
year
<- expand.grid(m, i, year)
var_combos names(var_combos) <- c("m", "i", "year")
<- paste("ct_od_", var_combos$m, "_JT0", var_combos$i, "_", var_combos$year,
filenames ".csv.gz", sep = "")
walk(filenames, function(x) {
download.file(paste(baseurl, x, sep = ""), destfile = paste("./data_downloads/",
x)) })
- Pick any 5 states in the US and using common two letter abbreviations, download all the files. Try different combinations of states and see if your assumptions about the file names are generalisable (Hint: Use MA)
<- "https://lehd.ces.census.gov/data/lodes/LODES7/"
baseurl
< tolower(state.abb[1:5])
st <- c("main", "aux")
m <- 0:5
i <- 2002:2019
year
<- expand.grid(m, i, year)
var_combos names(var_combos) <- c(, "m", "i", "year")
for (state in st) {
# Why am I using a loop here instead of expanding the grid by combinations
# that include state and construct the file name? (Hint: baseurl needs to
# be changed as well)
<- paste(baseurl, state, "/od/", sep = "")
baseurl_st <- paste(state, "_od_", var_combos$m, "_JT0", var_combos$i, "_", var_combos$year,
filenames ".csv.gz", sep = "")
walk(filenames, function(x) {
download.file(paste(baseurl_st, x, sep = ""), destfile = paste("./data_downloads/",
x))
})
}
# This fails for MA and few other states, why?
End of Exercise 6