Introduction to R

Nihil Kaza

2024-08-17

Agenda


  1. Working with R & RStudio
  2. R Basics
  3. Built-in Functions
  4. Working with Vectors
  5. Sneak Peek

What is R? What is RStudio?

What is R?

R is an open-source (free!) scripting language for working with data.


It started out as a statistical analysis language, but it is now used for much more: data wrangling, visualisation, reporting, and beyond.

Getting R & RStudio

You need the R language

Organizing with projects

I highly recommend, nay require, using projects to stay organized.


Projects keep code files and data files together, which makes file paths easier to navigate and encourages reproducible work habits.


File -> New Project


more guidance: here and here

Organizing with projects

[Screenshot of RStudio, annotated: project files are here; imported data shows up here; code can also go here.]

Organizing with projects

.
└── my_awesome_project
    ├── my_awesome_project.Rproj
    ├── data
    │   ├── raw
    │   ├── temp
    │   └── processed
    ├── src
    │   ├── 01_first_script.R
    │   └── 02_second_script.R
    ├── R
    │   ├── useful_function_1.R
    │   └── useful_function_2.R
    ├── writing
    │   └── 01_chapter.Rmd
    ├── output
    └── README.md

Organizing with projects

# create directory called 'data'
dir.create("data")

# create subdirectory raw in the data directory
dir.create("data/raw")

# create subdirectory processed in the data directory
dir.create("data/processed")

# list the files and directories
list.files(recursive = TRUE, include.dirs = TRUE)

# [1] "data"  "data/processed"  "data/raw"  "my_awesome_project.Rproj"
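To build the whole layout from the tree above in one step, you can loop over dir.create() with recursive = TRUE. This is a sketch, not part of the original workflow. It writes into tempdir() so it is safe to try, and the folder list simply mirrors the example tree.

```r
# Sketch: create the example project layout in one go.
# Assumption: we build it under tempdir() so nothing in your real
# working directory is touched; inside a real project you would use ".".
root <- file.path(tempdir(), "my_awesome_project")

dirs <- c("data/raw", "data/temp", "data/processed",
          "src", "R", "writing", "output")

for (d in dirs) {
  # recursive = TRUE also creates missing parents (e.g. "data")
  # showWarnings = FALSE stays quiet if a folder already exists
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# list everything we just created, relative to root
list.files(root, recursive = TRUE, include.dirs = TRUE)
```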

Exercise 1

R Basics

Tentative steps

[Screenshot of RStudio, annotated: project files are here; visualisations show up here; imported data shows up here; code can go here.]

Tentative steps

2 + 2
[1] 4
sin(pi/2)
[1] 1
log10(3)
[1] 0.4771213
log(3)
[1] 1.098612
sqrt(4)
[1] 2
4^2
[1] 16
exp(4)
[1] 54.59815
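Note that log() is the natural logarithm (base e). Here is a short sketch of its base argument, and of exp() undoing log():

```r
# log() is the natural logarithm (base e) by default
log(exp(1))          # 1

# other bases via the base argument
log(100, base = 10)  # 2, same as log10(100)
log(8, base = 2)     # 3, same as log2(8)

# exp() undoes log()
exp(log(3))          # 3
```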

Assign to an object

<- is the assignment operator.

my_obj <- 48

my_obj
[1] 48
my_obj <- exp(4) * sin(2)^3

my_obj
[1] 41.04836
my_obj <- (exp(4)*sin(2))^3

my_obj
[1] 122363.4
my_obj <- "test"

my_obj
[1] "test"
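An object keeps no memory of its previous type: whatever you assign last wins. This sketch uses class() to show the change. my_num is a hypothetical name.

```r
my_obj <- 48
class(my_obj)   # "numeric"

my_obj <- "test"
class(my_obj)   # "character"

# `=` also assigns at the top level, but `<-` is the conventional R style
my_num = 10
my_num          # 10
```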

Manipulate objects

my_obj1 <- 48

my_obj1
[1] 48
my_obj2 <- exp(4) * sin(2)^3

my_obj2
[1] 41.04836
my_obj3 <- my_obj1 + my_obj2

my_obj3
[1] 89.04836
my_obj4  <- "test"

my_obj5 <- my_obj3 + my_obj4
Error in my_obj3 + my_obj4: non-numeric argument to binary operator

Here, + needs two numbers, but my_obj4 holds the character string "test". Google is your friend when deciphering errors like this!
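You can catch this kind of problem early by checking each object's type. The sketch below uses class(), is.numeric() and as.numeric(), with the values copied from the slide above.

```r
my_obj3 <- 89.04836
my_obj4 <- "test"

class(my_obj3)          # "numeric"
class(my_obj4)          # "character"
is.numeric(my_obj4)     # FALSE, so my_obj3 + my_obj4 cannot work

# text that looks like a number can be converted ...
as.numeric("42") + 1    # 43

# ... but text that does not becomes NA, with the warning
# "NAs introduced by coercion"
as.numeric("test")
```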

Objects may be modified

my_obj4 
[1] "test"
my_obj4 <- my_obj3

my_obj4
[1] 89.04836
my_obj5 <- my_obj3 + no_obj
Error in eval(expr, envir, enclos): object 'no_obj' not found
my_obj5
Error in eval(expr, envir, enclos): object 'my_obj5' not found

Pay attention to the errors! Because the assignment failed, my_obj5 was never created.
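exists() checks whether an object is defined without throwing an error, and ls() lists what is in your workspace. A small sketch:

```r
my_obj3 <- 89.04836

exists("my_obj3")   # TRUE
exists("no_obj")    # FALSE: asking does not throw an error
ls()                # character vector of object names in the workspace
```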

Built-in Functions

Functions

my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)
my_vec
[1] 2 3 1 6 4 3 3 7
length(my_vec)
[1] 8
mean(my_vec)
[1] 3.625
sd(my_vec)
[1] 1.995531
my_vec[5]
[1] 4
  • c is a built-in R function, short for combine (or concatenate).
  • Likewise, length, mean, sd and [ are built-in functions in R.
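Built-in functions save typing, but it helps to see what they compute. This sketch rebuilds mean() and sd() from sum() and length(). sd() uses the sample formula, with n - 1 in the denominator.

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)
n <- length(my_vec)

# mean: total divided by the number of values
my_mean <- sum(my_vec) / n
my_mean                                          # 3.625

# sample standard deviation: squared deviations, divided by n - 1
my_sd <- sqrt(sum((my_vec - my_mean)^2) / (n - 1))
my_sd                                            # 1.995531

all.equal(my_sd, sd(my_vec))                     # TRUE
```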

Anatomy of Functions

  • Functions can be broken down into three components: arguments (formals), body, and environment.
formals(sd) # What arguments can you pass to the function `sd`
$x


$na.rm
[1] FALSE
body(sd)
sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
    na.rm = na.rm))
environment(sd)
<environment: namespace:stats>
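Functions you write yourself have the same three components. The sketch below uses a hypothetical helper, cv(), which computes the coefficient of variation (standard deviation divided by the mean). The name and the choice of statistic are just for illustration.

```r
# cv() is a hypothetical example, not a built-in R function
cv <- function(x, na.rm = FALSE) {
  # stats::sd() is used explicitly so a masked sd() cannot interfere
  stats::sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

formals(cv)      # $x (no default) and $na.rm (FALSE)
body(cv)         # the code between the braces
environment(cv)  # <environment: R_GlobalEnv> when defined at the top level

cv(c(2, 3, 1, 6, 4, 3, 3, 7))
```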

Anatomy of Functions

  • Functions are objects, just as vectors are, so they can be overwritten. Be very careful!
sd(my_vec)
[1] 1.995531
sd <- function(x){x^2}

sd
function(x){x^2}
sd(my_vec)
[1]  4  9  1 36 16  9  9 49
stats::sd(my_vec)
[1] 1.995531
sd <- 8

sd
[1] 8
sd(my_vec)
[1] 1.995531
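If you mask a function by accident, rm() deletes your copy so R finds the original again. The sketch below assumes sd has not been redefined in any other way.

```r
sd <- function(x) x^2     # our own sd() now masks stats::sd()
sd(c(1, 2, 3))            # 1 4 9

exists("sd", envir = globalenv(), inherits = FALSE)  # TRUE: our copy

rm(sd)                    # delete our copy from the global environment
sd(c(1, 2, 3))            # 1, because stats::sd() is found again
```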

Getting Help

?mean

Ask for help!

Help Me Help You!

I am importing fairly large csv’s (2 - 3 million rows). When I import these using read_csv, it fails. Anyone know why?

This question is missing the key information required to reproduce and troubleshoot the problem:

  •   How is the data file delimited (CSV/comma-delimited, or something else)?
  •   Which operating system is involved (Linux, Windows)? Which locale?
  •   Which version of R is running? Which functions and packages are being used?
  •   The post does not include the R code, or the data, that led to the problem.
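One way to supply that missing information is sketched below. sessionInfo() reports the R version, operating system, locale and attached packages. dput() prints a small data set as code that others can paste. The data frame small is a made-up stand-in, not the poster's data.

```r
# 1. R version, operating system, locale and attached packages
sessionInfo()

# 2. A tiny, made-up data set that still shows the problem,
#    instead of a 3-million-row file nobody else has
small <- data.frame(id = 1:3, value = c(2.5, NA, 7.1))
dput(small)  # prints R code that recreates `small` exactly

# 3. The exact code that misbehaves, run on the small data
mean(small$value)
```

The reprex package can automate much of this.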

Working with Vectors

Extract Elements from Vectors

  • [ is a very useful extractor function.
  • Do not confuse it with (, which is used to call functions.
  • Extract elements based on their position:
my_vec
[1] 2 3 1 6 4 3 3 7
my_vec[3]
[1] 1
my_vec[c(3,5,7)]
[1] 1 4 3
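Here are a few more ways to extract by position, sketched on the same values of my_vec:

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

my_vec[2:4]             # 3 1 6          (a range of positions)
my_vec[-1]              # 3 1 6 4 3 3 7  (a negative index drops position 1)
my_vec[-c(1, 2)]        # 1 6 4 3 3 7    (drop positions 1 and 2)
my_vec[length(my_vec)]  # 7              (the last element)
my_vec[10]              # NA             (past the end)
```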

Extract Elements from Vectors

my_vec
[1] 2 3 1 6 4 3 3 7
my_vec > 4
[1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
my_vec[my_vec > 4]
[1] 6 7
# this is same as
my_vec[c(FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE)]
[1] 6 7
my_vec[my_vec >= 4]        # values greater than or equal to 4
[1] 6 4 7
my_vec[my_vec == 4]        # values equal to 4
[1] 4
my_vec[my_vec != 4]         # values not equal to 4
[1] 2 3 1 6 3 3 7
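You can combine conditions with & (and) and | (or). which() returns positions rather than values. A sketch on the original my_vec:

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

my_vec[my_vec > 2 & my_vec < 6]   # 3 4 3 3  (both conditions TRUE)
my_vec[my_vec < 2 | my_vec > 6]   # 1 7      (either condition TRUE)
which(my_vec > 4)                 # 4 8      (positions, not values)
sum(my_vec > 4)                   # 2        (each TRUE counts as 1)
```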

Replace Element(s)

my_vec[4] <- 500
my_vec
[1]   2   3   1 500   4   3   3   7
my_vec[c(6, 7)] <- 100
my_vec
[1]   2   3   1 500   4 100 100   7
my_vec[my_vec <= 4] <- 1000
my_vec
[1] 1000 1000 1000  500 1000  100  100    7
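Replacement can also lengthen a vector. If you assign to a position past the end, R fills the gap with NA. A small sketch with a throwaway vector v (a hypothetical name):

```r
v <- c(1, 2, 3)
v[5] <- 10
v           # 1 2 3 NA 10 : position 4 was filled with NA
length(v)   # 5
```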

Ordering Vectors

vec_sort <- sort(my_vec)
vec_sort
[1]    7  100  100  500 1000 1000 1000 1000
vec_sort2 <- sort(my_vec, decreasing = TRUE)
vec_sort2
[1] 1000 1000 1000 1000  500  100  100    7
height <- c(180, 155, 160, 167, 181)
p.names <- c("Joanna", "Charlotte", "Helen", "Karen", "Amy")
height_ord <- order(height)
height_ord
[1] 2 3 4 1 5
p.names[height_ord]
[1] "Charlotte" "Helen"     "Karen"     "Joanna"    "Amy"      
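order() also accepts decreasing = TRUE. rank() is easy to confuse with order(): rank() gives each element's place in the sorted order, whereas order() gives the positions that would sort the vector. A sketch with the same height data:

```r
height  <- c(180, 155, 160, 167, 181)
p.names <- c("Joanna", "Charlotte", "Helen", "Karen", "Amy")

# tallest first
p.names[order(height, decreasing = TRUE)]
# "Amy" "Joanna" "Karen" "Helen" "Charlotte"

# where does each person place, shortest = 1?
rank(height)
# 4 1 2 3 5
```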

Vectorisation

my_vec * 5
[1] 5000 5000 5000 2500 5000  500  500   35
my_vec3 <- c(17, 15, 13, 19, 11, 0)
my_vec + my_vec3
Warning in my_vec + my_vec3: longer object length is not a multiple of shorter object length
[1] 1017 1015 1013  519 1011  100  117   22

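Beware, though, of recycling: R silently repeats the shorter vector to match the longer one, and warns only when the longer length is not a multiple of the shorter one.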

my_vec4 <- c(5,10)

my_vec + my_vec4
[1] 1005 1010 1005  510 1005  110  105   17
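To make recycling visible, rep_len() repeats a vector to a given length. This sketch uses the values my_vec holds at this point in the slides. It shows that adding my_vec4 gives the same result as adding an explicitly repeated copy.

```r
my_vec  <- c(1000, 1000, 1000, 500, 1000, 100, 100, 7)
my_vec4 <- c(5, 10)

# what R does behind the scenes: repeat the shorter vector to the longer length
rep_len(my_vec4, length(my_vec))   # 5 10 5 10 5 10 5 10

# adding the explicitly repeated vector gives the same answer
identical(my_vec + my_vec4, my_vec + rep_len(my_vec4, length(my_vec)))  # TRUE
```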

Beware of Missing Data

temp  <- c(7.2, NA, 7.1, 6.9, 6.5, 5.8, 5.8, 5.5, NA, 5.5)
temp
 [1] 7.2  NA 7.1 6.9 6.5 5.8 5.8 5.5  NA 5.5
mean(temp)
[1] NA
mean(temp, na.rm = TRUE)
[1] 6.2875
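Before dropping missing values, it is worth knowing how many there are and where they are. A sketch using is.na(). Note that temp == NA does not work, because any comparison with NA is itself NA.

```r
temp <- c(7.2, NA, 7.1, 6.9, 6.5, 5.8, 5.8, 5.5, NA, 5.5)

temp == NA               # all NA: this is NOT how to find missing values
is.na(temp)              # TRUE where a value is missing
sum(is.na(temp))         # 2   (how many are missing)
which(is.na(temp))       # 2 9 (where they are)

temp_complete <- temp[!is.na(temp)]
mean(temp_complete)      # 6.2875, same as mean(temp, na.rm = TRUE)
```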

Not Everything Printed Is True

sd_temp <- sd(temp, na.rm = TRUE)
sd_temp
[1] 0.719995
sd_temp == 0.719995
[1] FALSE
options(digits = 10)
sd_temp
[1] 0.7199950397
options(digits = 20)
sd_temp
[1] 0.71999503966545297384
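Printed values are rounded, so compare decimals with a tolerance rather than ==. In this sketch, the tolerance of 1e-6 is an arbitrary choice for the example, and the last line resets digits to R's default of 7.

```r
sd_temp <- sd(c(7.2, NA, 7.1, 6.9, 6.5, 5.8, 5.8, 5.5, NA, 5.5), na.rm = TRUE)

# compare with an explicit tolerance instead of ==
abs(sd_temp - 0.719995) < 1e-6                          # TRUE
isTRUE(all.equal(sd_temp, 0.719995, tolerance = 1e-6))  # TRUE

# the classic example
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# restore the default number of printed digits
options(digits = 7)
```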

Exercise 2

Sneak Peek

Rectangular Data

Thank You!