Advanced Spatial Analysis in R

Nikhil Kaza

5/11/23

Recap from last week

  • Reconceptualising space
  • Spatial data types
  • Converting and casting
  • Visualising spatial data
  • Reading & writing external files

A reminder about the data

sm_tbl %>%
    head(5)
# A tibble: 5 × 10
  CardID tripid station_name_D station_name_O lon_D lon_O lat_D lat_O
   <dbl>  <dbl> <chr>          <chr>          <dbl> <dbl> <dbl> <dbl>
1   5690      1 宜山路         星中路          121.  121.  31.2  31.2
2   5690      2 星中路         宜山路          121.  121.  31.2  31.2
3   5951      1 南京东路       新村路          121.  121.  31.2  31.3
4   5951      2 新村路         南京东路        121.  121.  31.3  31.2
5   6032      1 锦绣路         广兰路          122.  122.  31.2  31.2
# ℹ 2 more variables: dat_time_D <dttm>, dat_time_O <dttm>
distr_sf %>%
    head(5)
Simple feature collection with 5 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 120.8527 ymin: 30.94312 xmax: 121.5703 ymax: 31.51392
Geodetic CRS:  WGS 84
  Id     city district citycode adcode Shape_Leng  Shape_Area
1  0 上海城区   青浦区      021 310118  1.9667873 0.063065413
2  0 上海城区   杨浦区      021 310110  0.3234397 0.005743019
3  0 上海城区   闵行区      021 310112  1.5004972 0.035197893
4  0 上海城区   宝山区      021 310113  0.9750052 0.029909024
5  0 上海城区   嘉定区      021 310114  1.2052217 0.043836862
                        geometry
1 MULTIPOLYGON (((121.0643 31...
2 MULTIPOLYGON (((121.5121 31...
3 MULTIPOLYGON (((121.2635 31...
4 MULTIPOLYGON (((121.5023 31...
5 MULTIPOLYGON (((121.3549 31...

A reminder about the data

subway_stations <- 
    here("data", "raw", "subway_smartcard", "SUB_Station.csv") %>%
    read_csv() %>%
    rename(Station="站点",
           Route="所属线路",
           District="所属区划") %>%
    select("Station", "District")

sm_tbl_2 <- sm_tbl%>%
      left_join(subway_stations, 
                by = c("station_name_D" = "Station")
                ) %>%
      rename(Dist_dest = "District") %>%
            left_join(subway_stations, 
                by = c("station_name_O" = "Station")
                ) %>%
      rename(Dist_origin = "District")
CardID tripid station_name_D station_name_O lon_D lon_O lat_D lat_O dat_time_D dat_time_O Dist_dest Dist_origin
5690 1 宜山路 星中路 121.4226 121.3643 31.18858 31.16001 2015-04-01 12:12:06 2015-04-01 11:52:57 徐汇区 闵行区
5690 2 星中路 宜山路 121.3643 121.4226 31.16001 31.18858 2015-04-01 18:39:48 2015-04-01 18:24:04 闵行区 徐汇区
5951 1 南京东路 新村路 121.4801 121.4180 31.24006 31.26578 2015-04-01 14:55:19 2015-04-01 14:24:06 黄浦区 普陀区

Recap

library(tmap)
tmap_mode("view")

sm_tbl_2 %>%
    group_by(Dist_origin) %>%
    summarise(numtrips = n()) %>%
    inner_join(distr_sf, 
               by=c("Dist_origin" = "district")) %>%
    st_as_sf(crs = 4326) %>%
    tm_shape()+
    tm_fill(col = "numtrips", 
            palette = "-viridis", 
            type = "seq",
            style = "quantile")

Recap

Code
tmap_mode("plot")

sm_tbl_2 %>%
    mutate(hr = hour(dat_time_O),
           time_of_day = hr %>% 
               cut(breaks=c(0,6,10,16,20,24), include.lowest = TRUE, 
                   labels=c("Midnight - 6AM", 
                            "6AM - 10AM", 
                            "10AM - 4PM", 
                            "4PM - 8PM", 
                            "8PM - Midnight"))
    ) %>%
group_by(Dist_origin, time_of_day) %>%
    summarise(numtrips = n()) %>%
    inner_join(distr_sf, by=c("Dist_origin" = "district")) %>%
    st_as_sf(crs = 4326) %>%
    tm_shape()+
    tm_fill(col = "numtrips", 
            palette = "-viridis", 
            type = "seq",
            style = "quantile") +
    tm_facets(by = "time_of_day")

Today’s plan

  • Understanding spatial relationships
  • DE-9IM
  • Space as a graph
  • Other geometric operations
  • Further Work

filter again

sm_tbl %>%
    filter(station_name_O %in%  c("星中路", "南京东路"))
CardID tripid station_name_D station_name_O lon_D lon_O lat_D lat_O dat_time_D dat_time_O
5690 1 宜山路 星中路 121.4226 121.3643 31.18858 31.16001 2015-04-01 12:12:06 2015-04-01 11:52:57
5951 2 新村路 南京东路 121.4180 121.4801 31.26578 31.24006 2015-04-01 16:21:04 2015-04-01 15:50:06
93715 2 枫桥路 南京东路 121.4068 121.4801 31.24368 31.24006 2015-04-01 19:02:47 2015-04-01 18:35:18
113573 2 江湾体育场 南京东路 121.5089 121.4801 31.30631 31.24006 2015-04-01 15:32:09 2015-04-01 15:12:06
129540 2 娄山关路 南京东路 121.3994 121.4801 31.21300 31.24006 2015-04-01 13:48:41 2015-04-01 13:30:15

Breaking it down. What’s happening with %in%?

sm_tbl$station_name_O %in% c("星中路", "南京东路")
    [1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
...

Breaking it down. What’s happening with filter?

sm_tbl %>%
    filter(c(TRUE, FALSE, FALSE, TRUE, ...))
CardID tripid station_name_D station_name_O lon_D lon_O lat_D lat_O dat_time_D dat_time_O
5690 1 宜山路 星中路 121.4226 121.3643 31.18858 31.16001 2015-04-01 12:12:06 2015-04-01 11:52:57
5690 2 星中路 宜山路 121.3643 121.4226 31.16001 31.18858 2015-04-01 18:39:48 2015-04-01 18:24:04
5951 1 南京东路 新村路 121.4801 121.4180 31.24006 31.26578 2015-04-01 14:55:19 2015-04-01 14:24:06
5951 2 新村路 南京东路 121.4180 121.4801 31.26578 31.24006 2015-04-01 16:21:04 2015-04-01 15:50:06
6032 1 锦绣路 广兰路 121.5357 121.6169 31.18973 31.21325 2015-04-01 14:42:53 2015-04-01 14:17:33
CardID tripid station_name_D station_name_O lon_D lon_O lat_D lat_O dat_time_D dat_time_O
5690 1 宜山路 星中路 121.4226 121.3643 31.18858 31.16001 2015-04-01 12:12:06 2015-04-01 11:52:57
5951 2 新村路 南京东路 121.4180 121.4801 31.26578 31.24006 2015-04-01 16:21:04 2015-04-01 15:50:06
93715 2 枫桥路 南京东路 121.4068 121.4801 31.24368 31.24006 2015-04-01 19:02:47 2015-04-01 18:35:18
113573 2 江湾体育场 南京东路 121.5089 121.4801 31.30631 31.24006 2015-04-01 15:32:09 2015-04-01 15:12:06
129540 2 娄山关路 南京东路 121.3994 121.4801 31.21300 31.24006 2015-04-01 13:48:41 2015-04-01 13:30:15

Same thing with spatial relationships

library(sf)

scard_sf <- 
    sm_tbl %>%
    st_as_sf(coords = c("lon_D", "lat_D"), crs = 4326) 

center_pt <- 
    st_point(c(121.469170, 31.224361)) %>% 
    st_sfc(crs=4326)

scard_sf %>%
    filter(st_distance(., center_pt)[,1]< units::set_units(4, "km"))
CardID tripid station_name_D station_name_O lon_O lat_O dat_time_D dat_time_O geometry
5951 1 南京东路 新村路 121.4180 31.26578 2015-04-01 14:55:19 2015-04-01 14:24:06 POINT (121.4801 31.24006)
6138 1 鲁班路 虹桥火车站 121.3144 31.19599 2015-04-01 10:16:49 2015-04-01 09:28:39 POINT (121.4706 31.20117)
6138 3 鲁班路 云山路 121.5686 31.25259 2015-04-01 15:07:16 2015-04-01 14:31:28 POINT (121.4706 31.20117)

Let’s break it down!

units::set_units(4, "km")
4 [km]
scard_sf %>%
    head(5)%>%
    st_distance(.,center_pt)
Units: [m]
          [,1]
[1,]  5956.142
[2,] 12277.587
[3,]  2033.359
[4,]  6696.058
[5,]  7410.832
st_distance(scard_sf[1:5,],center_pt)[,1] < units::set_units(4, "km")
[1] FALSE FALSE  TRUE FALSE FALSE
scard_sf %>%
    filter(st_distance(., center_pt)[,1]< units::set_units(4, "km"))
CardID tripid station_name_D station_name_O lon_O lat_O dat_time_D dat_time_O geometry
5951 1 南京东路 新村路 121.4180 31.26578 2015-04-01 14:55:19 2015-04-01 14:24:06 POINT (121.4801 31.24006)
6138 1 鲁班路 虹桥火车站 121.3144 31.19599 2015-04-01 10:16:49 2015-04-01 09:28:39 POINT (121.4706 31.20117)
6138 3 鲁班路 云山路 121.5686 31.25259 2015-04-01 15:07:16 2015-04-01 14:31:28 POINT (121.4706 31.20117)

Other spatial relationships

distr_sf <-
    here("data", "raw", "subway_smartcard", "districts.shp") %>%
    st_read()

scard_sf %>%
    filter(
        st_within(., distr_sf %>% filter(district == "嘉定区"), sparse = F)[,1]
    )
CardID tripid station_name_D station_name_O lon_O lat_O dat_time_D dat_time_O geometry
58829 1 丰庄 行知路 121.4168 31.28676 2015-04-01 19:30:01 2015-04-01 18:52:43 POINT (121.3505 31.24443)
85334 1 金运路 陕西南路 121.4542 31.21705 2015-04-01 08:11:52 2015-04-01 07:28:33 POINT (121.3148 31.24293)
85713 1 嘉定新城 祁连山路 121.3714 31.27349 2015-04-01 07:55:26 2015-04-01 07:37:33 POINT (121.2499 31.33204)

Spatial Relationships

Source: Michael Mann, Steven Chao, Jordan Graesser, Nina Feldman, https://pygis.io/docs/e_spatial_joins.html

Spatial Relationships

Source: Michael Mann, Steven Chao, Jordan Graesser, Nina Feldman, https://pygis.io/docs/e_spatial_joins.html

Spatial Relationships

Source: Michael Mann, Steven Chao, Jordan Graesser, Nina Feldman, https://pygis.io/docs/e_spatial_joins.html

Spatial Relationships

Source: Michael Mann, Steven Chao, Jordan Graesser, Nina Feldman, https://pygis.io/docs/e_spatial_joins.html

Spatial Relationships

Source: Michael Mann, Steven Chao, Jordan Graesser, Nina Feldman, https://pygis.io/docs/e_spatial_joins.html

Spatial Relationships

Source: Michael Mann, Steven Chao, Jordan Graesser, Nina Feldman, https://pygis.io/docs/e_spatial_joins.html

Dimensionally Extended 9-Intersection Model (DE-9IM)

Source: Paul Ramsey & Mark Leslie, https://postgis.net/workshops/postgis-intro/de9im.html

Dimensionally Extended 9-Intersection Model (DE-9IM)

Source: Paul Ramsey & Mark Leslie, https://postgis.net/workshops/postgis-intro/de9im.html

Using DE-9IM for filters

scard_sf %>%
    filter(
        st_relate(., 
                  distr_sf %>% filter(district == "嘉定区"), 
                  pattern = "T*F**F***",
                  sparse = F)[,1]
    )

is same as

scard_sf %>%
    filter(
        st_within(., 
                  distr_sf %>% filter(district == "嘉定区"), 
                  sparse = F)[,1]
    )

but can do lot more!

Using DE-9IM for filters

Treatment: Inside & touching

st_relate(A, B, pattern = "T*F*TF***")

Control: Outside & touching

st_relate(A, B, pattern = "F*T*T****")

When within is not within

https://twitter.com/RhoBott/status/788810834747154432/photo/1

Rethink polygons (digression)

Invalid polygons (digression)

Source: Hugo Ledoux

Try and make them valid with st_make_valid

Sometimes simple is better

Source: Robin Lovelace, Jakub Nowosad, Jannes Muenchow

Different types of distances

Source: Maarten Grootendorst

Distances depend on crs

EPSG (www.epsg.io)

Transform the geometry

distr_sf_transform <-
    st_transform(distr_sf, 32651)

st_distance(distr_sf_transform)
geometry
MULTIPOLYGON (((121.0643 31...
MULTIPOLYGON (((121.5121 31...
MULTIPOLYGON (((121.2635 31...
MULTIPOLYGON (((121.5023 31...
MULTIPOLYGON (((121.3549 31...
Units: [m]
          [,1]     [,2]      [,3]      [,4]     [,5]     [,6]      [,7]
[1,]     0.000 20431.79     0.000  8874.997     0.00 12884.10     0.000
[2,] 20431.793     0.00 11727.622     0.000 11246.64     0.00 40360.928
[3,]     0.000 11727.62     0.000  6349.408     0.00     0.00  7025.979
[4,]  8874.997     0.00  6349.408     0.000     0.00     0.00 39762.252
[5,]     0.000 11246.64     0.000     0.000     0.00 10951.17 34038.524
[6,] 12884.102     0.00     0.000     0.000 10951.17     0.00 22431.655
[7,]     0.000 40360.93  7025.979 39762.252 34038.52 22431.65     0.000
geometry
MULTIPOLYGON (((315569.8 34...
MULTIPOLYGON (((358318.1 34...
MULTIPOLYGON (((334652.9 34...
MULTIPOLYGON (((357628 3475...
MULTIPOLYGON (((343384.7 34...
Units: [m]
          [,1]     [,2]      [,3]      [,4]     [,5]     [,6]      [,7]
[1,]     0.000 20445.97     0.000  8881.829     0.00 12907.37     0.000
[2,] 20445.966     0.00 11731.071     0.000 11261.07     0.00 40268.432
[3,]     0.000 11731.07     0.000  6330.611     0.00     0.00  7004.715
[4,]  8881.829     0.00  6330.611     0.000     0.00     0.00 39647.414
[5,]     0.000 11261.07     0.000     0.000     0.00 10970.45 33945.272
[6,] 12907.367     0.00     0.000     0.000 10970.45     0.00 22432.682
[7,]     0.000 40268.43  7004.715 39647.414 33945.27 22432.68     0.000

What is a neighborhood?

Often defined with a spatial relationship

  • Distance \(< D\)
  • Touches \(==TRUE\)
  • Intersects \(==TRUE\)

Or combinations such as

  • Distance \(<D\) & Touches \(==TRUE\)
  • Distance \(<D\) | Touches \(==TRUE\)

Neighbors

library(spdep)

nb_matrix <- poly2nb(distr_sf_transform) %>%
                nb2mat(zero.policy = T, style = "B")

Neighborhood adjacency matrix

Code
#create unique district ids
distr_sf_transform$id_val <- paste0("D", 1:nrow(distr_sf_transform))

sm_tbl_3 <- sm_tbl_2 %>%
group_by(Dist_origin) %>%
    summarise(numtrips = n()) %>%
    inner_join(distr_sf_transform, by=c("Dist_origin" = "district")) 
    
nb_matrix <- poly2nb(distr_sf_transform, queen = F) %>%
                nb2mat(zero.policy = T, style = "B")

row.names(nb_matrix) <- colnames(nb_matrix) <- distr_sf_transform$id_val

# rearrange the nb_matrix rows and columns to match the order of district ids in sm_tbl_3
nb_matrix <-
    nb_matrix[sm_tbl_3$id_val,sm_tbl_3$id_val]

sm_tbl_3 <- sm_tbl_3 %>%
    mutate(Neigh_numtrips = nb_matrix %*% numtrips)
Dist_origin numtrips Neigh_numtrips
嘉定区 113070 1353674
宝山区 268179 1934761
徐汇区 564778 2266361
普陀区 294341 872935

Boundary precision matters!

st_overlaps(distr_sf_transform, 
            distr_sf_transform, 
            sparse = F )

Boundary precision matters!

poly2nb(distr_sf_transform) %>%
    nb2mat(zero.policy = T, style = "B") %>%
    as.logical() %>%
    matrix(nrow = nrow(distr_sf_transform), byrow = T)

st_relate(distr_sf_transform, 
          distr_sf_transform,
          pattern = "F*T*T****", 
          sparse = F)

Boundary precision matters!

temp_poly <- st_buffer(distr_sf_transform,.01)
st_relate(temp_poly, 
          temp_poly, 
          pattern = "T*T***T**", 
          sparse = F)

Boundary precision matters!

Rethinking joins

  • Join on any column including
    • number
    • name/string
    • date/time
    • geometry
  • Joining is fundamentally filtering and matching together

Other geometric operations

Source: Robin Lovelace, Jakub Nowosad, Jannes Muenchow

Other geometric operations

Source: Robin Lovelace, Jakub Nowosad, Jannes Muenchow

Conclusions: Spatial is somewhat special

  • Learn to think about spatial objects without plotting
  • Topological relationships are not embedded in simple features. They have to be inferred
  • Pay attention to units, especially for distance/areas
  • Pay attention to coordinate systems

Further Work

  • Find the nearest subway station to the air quality monitoring station.
  • Compute the trips in the neighbourhood of each station by restricting the neighborhood both by space and the transit line. Pay attention that stations might belong to multiple lines.
  • Instead of districts, make a regular grid and assign the number of trip ends to each grid.
  • Practise different choropleth map options using tmap.

Thank you