4.3 Wrangling Data
dplyr
is the workhorse package for exploratory data analysis. In
particular, select
, filter
and mutate
are useful.
select
is used to select on columns.filter
is used to select on rows.mutate
changes/adds columns
%>%
nola_str_tib select(c(`Permit Type`, `Residential Subtype`))
## # A tibble: 89 × 2
## `Permit Type` `Residential Subtype`
## <fct> <chr>
## 1 Short Term Rental Commercial Owner N/A
## 2 Short Term Rental Commercial Owner N/A
## 3 Short Term Rental Commercial Owner N/A
## 4 Short Term Rental Residential Owner Residential Partial Unit
## 5 Short Term Rental Residential Owner Residential Small Unit
## 6 Short Term Rental Residential Owner Residential Partial Unit
## 7 Commercial STR N/A
## 8 Short Term Rental Residential Owner Residential Small Unit
## 9 Short Term Rental Residential Owner Residential Partial Unit
## 10 Short Term Rental Residential Owner Residential Partial Unit
## # … with 79 more rows
%>%
nola_str_tib filter(`Permit Type` == "Short Term Rental Residential Owner")
## # A tibble: 53 × 12
## `Permit Number` Address `Permit Type` `Residential S…` `Current Status`
## <chr> <chr> <fct> <chr> <chr>
## 1 <NA> 3323 Rosalie… Short Term R… Residential Par… Pending
## 2 <NA> 1525 Melpome… Short Term R… Residential Sma… Pending
## 3 22-RSTR-15568 3112 Octavia… Short Term R… Residential Par… Issued
## 4 <NA> 1952 Treasur… Short Term R… Residential Sma… Pending
## 5 <NA> 4616 S Rober… Short Term R… Residential Par… Pending
## 6 <NA> 3043 St Phil… Short Term R… Residential Par… Pending
## 7 22-RSTR-15454 1731 Third S… Short Term R… Residential Lar… Issued
## 8 <NA> 1731 Third S… Short Term R… Residential Lar… Pending
## 9 <NA> 4632 Frankli… Short Term R… Residential Par… Pending
## 10 <NA> 621 Desire St Short Term R… Residential Par… Pending
## # … with 43 more rows, and 7 more variables: `Expiration Date` <chr>,
## # `Bedroom Limit` <dbl>, `Guest Occupancy Limit` <dbl>,
## # `Operator Name` <chr>, `License Holder Name` <chr>,
## # `Application Date` <date>, Issue_Date <date>
library(lubridate)
%>%
nola_str_tib mutate(Backlogged = if_else(
today() - `Application Date` >= 15) & is.na(Issue_Date),
(
T, F%>%
)) filter(Backlogged == T) %>%
select(`Address`, `Operator Name`, `License Holder Name`, `Application Date`)
## # A tibble: 42 × 4
## Address `Operator Name` `License Holde…` `Application D…`
## <chr> <chr> <chr> <date>
## 1 3149 Chartres St Bruce Michael Fe… Bruce Ferweda 2022-08-05
## 2 1952 Treasure St Zedrick Price Zedrick L Price 2022-08-05
## 3 4616 S Robertson St Paul Macres Paul Macres 2022-08-05
## 4 3043 St Philip St Rachel Wellons Rachel Wellons 2022-08-04
## 5 1727 Henriette Delille St Jane Chaisson David Trocquet … 2022-08-04
## 6 1731 Third St 1B Brenna White Nicolette Jones 2022-08-04
## 7 4859 Tchoupitoulas St Jane Chaisson David Trocquet 2022-08-04
## 8 3826 Canal St Jordan Jacobs 3828 Canal Stre… 2022-08-04
## 9 4632 Franklin Ave Cornelious Celes… Cornelious Cele… 2022-08-03
## 10 621 Desire St Angela Larson Angela Rosetta … 2022-08-02
## # … with 32 more rows
Few things to notice here.
There is no need to specify the tibble using
$
to access the column names.dplyr
assumes that you are working with the dataset that you specified earlier in the pipe. This makes the code cleaner.When there are no spaces in the column names you can omit the ticks.
You can chain a number of functions as in
mutate
,filter
andselect
to do complicated analyses in a simple way using pipe operator%>%
.We also used
if_else
to create a logical variable calledBacklogged
. Notice that mutate automatically creates and populates this new column. Also notice the date arithmetic as well as logical operator&
. Be careful to make sure that the new column is the same length as the rest of the dataset.Notice the use of
(today()...)
. They are used to improve readability of code but also to specify the order of operations.