4.1 Data Frames & Tibbles

We already saw in the previous chapter how rectangular data is stored in data frames. The tidyverse, in contrast, uses tibbles.

A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code.
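As a small illustration of the "lazy and surly" behaviour, the sketch below compares `$` on a data frame and on a tibble. The objects `df` and `tb` and the column `abc` are made up for this example and do not appear elsewhere in the chapter.

```r
library(tibble)

# Hypothetical one-column objects, used only for this illustration
df <- data.frame(abc = 1:3)
tb <- tibble(abc = 1:3)

# data.frame: `$` partially matches "a" to the column "abc"
df$a

# tibble: no partial matching; this returns NULL and emits a warning
tb$a
```

With the data frame, a mistyped or abbreviated name can silently return a column you did not intend. The tibble reports the problem at the point where it occurs.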

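library(tibble)  # provides tibble() and glimpse()

# A tibble keeps the column names exactly as written
data_tib <- tibble(
  `alphabet soup` = letters,
  `nums ints` = 1:26,
  `sample ints` = sample(100, 26)
)

# A data frame built from the same arguments
data_df <- data.frame(
  `alphabet soup` = letters,
  `nums ints` = 1:26,
  `sample ints` = sample(100, 26)
)

glimpse(data_tib)
glimpse(data_df)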
## Rows: 26
## Columns: 3
## $ `alphabet soup` <chr> "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",…
## $ `nums ints`     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ `sample ints`   <int> 46, 84, 69, 6, 42, 80, 25, 61, 56, 11, 16, 52, 99, 81,…
## Rows: 26
## Columns: 3
## $ alphabet.soup <chr> "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "…
## $ nums.ints     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ sample.ints   <int> 19, 40, 87, 49, 13, 92, 38, 4, 59, 96, 32, 62, 43, 46, 3…

Notice how data.frame() changes the variable names: by default it converts them to syntactically valid R names, so the spaces are replaced with dots. One advantage of tibbles is that column names need not be valid R variable names, as long as you enclose them in backticks when referring to them.
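The renaming can be reproduced and switched off in base R. The following is a minimal sketch; `df_raw` and `tb` are hypothetical names introduced only for this example.

```r
library(tibble)

# make.names() is the base R function that turns strings into valid names
make.names(c("alphabet soup", "nums ints"))

# Hypothetical example: check.names = FALSE keeps the original names
df_raw <- data.frame(`alphabet soup` = letters, check.names = FALSE)
names(df_raw)

# Non-syntactic names must be wrapped in backticks when used with `$`
tb <- tibble(`alphabet soup` = letters)
head(tb$`alphabet soup`, 3)
```

So a data frame can hold such names too, but only if you opt out of the default check. A tibble never renames columns unless asked to.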

You can use base R functions to work with tibbles, because a tibble is itself a data frame. The reverse is not guaranteed: functions written for tibbles may not work with plain data frames.
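A quick way to check this relationship is to inspect the class of a tibble and pass it to a few base R functions. The data in `tb` below is invented for the illustration.

```r
library(tibble)

tb <- tibble(x = 1:5, y = letters[1:5])  # hypothetical example data

is.data.frame(tb)  # a tibble passes the data frame check
class(tb)          # "data.frame" is one of its classes
nrow(tb)           # base R functions work as usual
mean(tb$x)

# Converting in either direction
df  <- as.data.frame(tb)  # tibble to plain data frame
tb2 <- as_tibble(df)      # plain data frame to tibble
```

If a function insists on one of the two types, `as.data.frame()` and `as_tibble()` are the usual ways to switch between them.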

data_tib[, 3]
## # A tibble: 26 × 1
##    `sample ints`
##            <int>
##  1            46
##  2            84
##  3            69
##  4             6
##  5            42
##  6            80
##  7            25
##  8            61
##  9            56
## 10            11
## # … with 16 more rows

data_tib[2:4, 1:3]
## # A tibble: 3 × 3
##   `alphabet soup` `nums ints` `sample ints`
##   <chr>                 <int>         <int>
## 1 b                         2            84
## 2 c                         3            69
## 3 d                         4             6
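Note that both subsetting calls above return a tibble, even when only one column is selected. A plain data frame behaves differently here, which the following sketch illustrates with made-up objects `tb` and `df`.

```r
library(tibble)

# Hypothetical example data
tb <- tibble(id = 1:3, value = c(10, 20, 30))
df <- as.data.frame(tb)

tb[, 2]  # still a tibble with one column
df[, 2]  # a bare numeric vector: the data frame structure is dropped

# To get a vector out of a tibble, ask for it explicitly
tb[[2]]
tb$value
```

This is another case of tibbles doing less: `[` always returns a tibble, so the type of the result does not depend on how many columns you happen to select.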