5/4/23
sf
for storing and processing spatial objectstmap
for visualising spatial objectssf
Close to 4.4 million trips
2.4 million unique individuals (card ids)
285 subway station locations
CardID | tripid | station_name_D | station_name_O | lon_D | lon_O | lat_D | lat_O | dat_time_D | dat_time_O |
---|---|---|---|---|---|---|---|---|---|
5690 | 1 | 宜山路 | 星中路 | 121.4226 | 121.3643 | 31.18858 | 31.16001 | 2015-04-01 12:12:06 | 2015-04-01 11:52:57 |
5690 | 2 | 星中路 | 宜山路 | 121.3643 | 121.4226 | 31.16001 | 31.18858 | 2015-04-01 18:39:48 | 2015-04-01 18:24:04 |
5951 | 1 | 南京东路 | 新村路 | 121.4801 | 121.4180 | 31.24006 | 31.26578 | 2015-04-01 14:55:19 | 2015-04-01 14:24:06 |
5951 | 2 | 新村路 | 南京东路 | 121.4180 | 121.4801 | 31.26578 | 31.24006 | 2015-04-01 16:21:04 | 2015-04-01 15:50:06 |
6032 | 1 | 锦绣路 | 广兰路 | 121.5357 | 121.6169 | 31.18973 | 31.21325 | 2015-04-01 14:42:53 | 2015-04-01 14:17:33 |
6032 | 2 | 广兰路 | 锦绣路 | 121.6169 | 121.5357 | 31.21325 | 31.18973 | 2015-04-01 15:34:24 | 2015-04-01 15:09:45 |
CardID | tripid | station_name_D | station_name_O | lon_D | lon_O | lat_D | lat_O | dat_time_D | dat_time_O |
---|---|---|---|---|---|---|---|---|---|
5690 | 1 | 宜山路 | 星中路 | 121.4226 | 121.3643 | 31.18858 | 31.16001 | 2015-04-01 12:12:06 | 2015-04-01 11:52:57 |
5690 | 2 | 星中路 | 宜山路 | 121.3643 | 121.4226 | 31.16001 | 31.18858 | 2015-04-01 18:39:48 | 2015-04-01 18:24:04 |
5951 | 1 | 南京东路 | 新村路 | 121.4801 | 121.4180 | 31.24006 | 31.26578 | 2015-04-01 14:55:19 | 2015-04-01 14:24:06 |
5951 | 2 | 新村路 | 南京东路 | 121.4180 | 121.4801 | 31.26578 | 31.24006 | 2015-04-01 16:21:04 | 2015-04-01 15:50:06 |
6032 | 1 | 锦绣路 | 广兰路 | 121.5357 | 121.6169 | 31.18973 | 31.21325 | 2015-04-01 14:42:53 | 2015-04-01 14:17:33 |
6032 | 2 | 广兰路 | 锦绣路 | 121.6169 | 121.5357 | 31.21325 | 31.18973 | 2015-04-01 15:34:24 | 2015-04-01 15:09:45 |
library(ggplot2)
library(ggthemes)
numtrips <- sm_tbl %>%
filter(station_name_O != station_name_D) %>%
mutate(hr = hour(dat_time_O),
time_of_day = hr %>% cut(breaks=c(0,6,10,16,20,24), include.lowest = TRUE, labels=c("Midnight - 6AM", "6AM - 10AM", "10AM - 4PM", "4PM - 8PM", "8PM - Midnight"))
) %>%
group_by(station_name_O, station_name_D, time_of_day) %>%
summarise(lon_O = first(lon_O),
lat_O = first(lat_O),
lon_D = first(lon_D),
lat_D = first(lat_D),
totaltrips = n()
)
numtrips %>%
ggplot()+
geom_segment(aes(x=lon_O, y=lat_O,xend=lon_D, yend=lat_D, alpha=totaltrips))+
#Here is the magic bit that sets line transparency - essential to make the plot readable
scale_alpha_continuous(range = c(0.005, 0.01), guide='none')+
facet_wrap(time_of_day~.)+
#Set black background, ditch axes
scale_x_continuous("", breaks=NULL)+
scale_y_continuous("", breaks=NULL) +
theme_tufte()
numtrips <- sm_tbl %>%
filter(station_name_O != station_name_D) %>%
mutate(hr = hour(dat_time_O),
time_of_day = hr %>% cut(breaks=c(0,6,10,16,20,24), include.lowest = TRUE, labels=c("Midnight - 6AM", "6AM - 10AM", "10AM - 4PM", "4PM - 8PM", "8PM - Midnight"))
) %>%
group_by(station_name_D, time_of_day) %>%
summarise(lon_D = first(lon_D),
lat_D = first(lat_D),
totaltrips = n()
)
numtrips %>%
ggplot()+
geom_point(aes(x=lon_D, y=lat_D, size=totaltrips))+
scale_size_continuous(range = c(0.01, 2), guide='none')+
facet_wrap(time_of_day~.)+
#Set black background, ditch axes
scale_x_continuous("", breaks=NULL)+
scale_y_continuous("", breaks=NULL) +
theme_tufte()+
labs(main = "Popularity of destination stations on Apr 1, 2015")
library(ggmap)
bbox <- c(left = 120.9892, bottom = 30.7151, right = 122.0228, top = 31.4177)
g1 <- get_stamenmap(bbox, zoom = 10, maptype = "toner-hybrid", color = "bw")
ggmap(g1)+
geom_point(aes(x=lon_D, y=lat_D, size=totaltrips), data = numtrips)+
scale_size_continuous(range = c(0.01, 2), guide='none', trans = )+
facet_wrap(time_of_day~.)+
#Set black background, ditch axes
scale_x_continuous("", breaks=NULL)+
scale_y_continuous("", breaks=NULL) +
theme_tufte()
tmap
(but need to get serious)sf
sf
sf
GEOS
(projected) and s2geometry
(unprojected)sf
ecosystem
CardID | tripid | station_name_D | station_name_O | lon_O | lat_O | dat_time_D | dat_time_O | geometry |
---|---|---|---|---|---|---|---|---|
5690 | 1 | 宜山路 | 星中路 | 121.3643 | 31.16001 | 2015-04-01 12:12:06 | 2015-04-01 11:52:57 | POINT (121.4226 31.18858) |
5690 | 2 | 星中路 | 宜山路 | 121.4226 | 31.18858 | 2015-04-01 18:39:48 | 2015-04-01 18:24:04 | POINT (121.3643 31.16001) |
5951 | 1 | 南京东路 | 新村路 | 121.4180 | 31.26578 | 2015-04-01 14:55:19 | 2015-04-01 14:24:06 | POINT (121.4801 31.24006) |
5951 | 2 | 新村路 | 南京东路 | 121.4801 | 31.24006 | 2015-04-01 16:21:04 | 2015-04-01 15:50:06 | POINT (121.418 31.26578) |
6032 | 1 | 锦绣路 | 广兰路 | 121.6169 | 31.21325 | 2015-04-01 14:42:53 | 2015-04-01 14:17:33 | POINT (121.5357 31.18973) |
6032 | 2 | 广兰路 | 锦绣路 | 121.5357 | 31.18973 | 2015-04-01 15:34:24 | 2015-04-01 15:09:45 | POINT (121.6169 31.21325) |
station_name_O | geometry |
---|---|
虹桥火车站 | POINT (121.4706 31.20117) |
鲁班路 | POINT (121.5686 31.25259) |
云山路 | POINT (121.4706 31.20117) |
鲁班路 | POINT (121.5776 31.25927) |
CardID | tripid | geometry | orig_geom |
---|---|---|---|
5690 | 1 | POINT (121.4226 31.18858) | POINT (121.3643 31.16001) |
5690 | 2 | POINT (121.3643 31.16001) | POINT (121.4226 31.18858) |
5951 | 1 | POINT (121.4801 31.24006) | POINT (121.418 31.26578) |
5951 | 2 | POINT (121.418 31.26578) | POINT (121.4801 31.24006) |
6032 | 1 | POINT (121.5357 31.18973) | POINT (121.6169 31.21325) |
6032 | 2 | POINT (121.6169 31.21325) | POINT (121.5357 31.18973) |
CardID | orig_geom | buffer_geom | geometry |
---|---|---|---|
5690 | POINT (121.3643 31.16001) | POLYGON ((121.3643 31.16001... | POINT (121.4226 31.18858) |
5690 | POINT (121.4226 31.18858) | POLYGON ((121.4226 31.18858... | POINT (121.3643 31.16001) |
5951 | POINT (121.418 31.26578) | POLYGON ((121.418 31.26578,... | POINT (121.4801 31.24006) |
5951 | POINT (121.4801 31.24006) | POLYGON ((121.4801 31.24006... | POINT (121.418 31.26578) |
6032 | POINT (121.6169 31.21325) | POLYGON ((121.6169 31.21325... | POINT (121.5357 31.18973) |
6032 | POINT (121.5357 31.18973) | POLYGON ((121.5357 31.18973... | POINT (121.6169 31.21325) |
station_name_D | dat_time_D | geometry |
---|---|---|
宜山路 | 2015-04-01 12:12:06 | POINT (121.4226 31.18858) |
星中路 | 2015-04-01 18:39:48 | POINT (121.3643 31.16001) |
南京东路 | 2015-04-01 14:55:19 | POINT (121.4801 31.24006) |
新村路 | 2015-04-01 16:21:04 | POINT (121.418 31.26578) |
CardID | tripid | hr | time_of_day | geometry |
---|---|---|---|---|
5690 | 1 | 11 | 10AM - 4PM | POINT (121.4226 31.18858) |
5690 | 2 | 18 | 4PM - 8PM | POINT (121.3643 31.16001) |
5951 | 1 | 14 | 10AM - 4PM | POINT (121.4801 31.24006) |
5951 | 2 | 15 | 10AM - 4PM | POINT (121.418 31.26578) |
6032 | 1 | 14 | 10AM - 4PM | POINT (121.5357 31.18973) |
6032 | 2 | 15 | 10AM - 4PM | POINT (121.6169 31.21325) |
scard_sf %>%
filter(station_name_O != station_name_D) %>%
mutate(hr = hour(dat_time_O),
time_of_day = hr %>% cut(breaks=c(0,6,10,16,20,24), include.lowest = TRUE, labels=c("Midnight - 6AM", "6AM - 10AM", "10AM - 4PM", "4PM - 8PM", "8PM - Midnight"))
) %>%
group_by(time_of_day) %>%
summarise(totaltrips = n())
totaltrips | time_of_day | geometry |
---|---|---|
3 | Midnight - 6AM | MULTIPOINT ((121.4111 31.21... |
10 | 6AM - 10AM | MULTIPOINT ((121.3331 31.15... |
18 | 10AM - 4PM | MULTIPOINT ((121.3807 31.11... |
18 | 4PM - 8PM | MULTIPOINT ((121.7573 31.05... |
CardID | tripid | station_name_D | station_name_O | lon_O | lat_O | dat_time_D | dat_time_O | orig_geom |
---|---|---|---|---|---|---|---|---|
5690 | 1 | 宜山路 | 星中路 | 121.3643 | 31.16001 | 2015-04-01 12:12:06 | 2015-04-01 11:52:57 | POINT (121.3643 31.16001) |
5690 | 2 | 星中路 | 宜山路 | 121.4226 | 31.18858 | 2015-04-01 18:39:48 | 2015-04-01 18:24:04 | POINT (121.4226 31.18858) |
5951 | 1 | 南京东路 | 新村路 | 121.4180 | 31.26578 | 2015-04-01 14:55:19 | 2015-04-01 14:24:06 | POINT (121.418 31.26578) |
5951 | 2 | 新村路 | 南京东路 | 121.4801 | 31.24006 | 2015-04-01 16:21:04 | 2015-04-01 15:50:06 | POINT (121.4801 31.24006) |
scard_sf %>%
filter(station_name_O != station_name_D) %>%
mutate(hr = hour(dat_time_O),
time_of_day = hr %>% cut(breaks=c(0,6,10,16,20,24), include.lowest = TRUE, labels=c("Midnight - 6AM", "6AM - 10AM", "10AM - 4PM", "4PM - 8PM", "8PM - Midnight"))
) %>%
st_drop_geometry() %>%
group_by(time_of_day) %>%
summarise(totaltrips = n())
time_of_day | totaltrips |
---|---|
Midnight - 6AM | 193219 |
6AM - 10AM | 1576447 |
10AM - 4PM | 1044716 |
4PM - 8PM | 1345731 |
8PM - Midnight | 188404 |
tmap
tmap
againtmap_mode("plot")
numtrips <- sm_tbl %>%
filter(station_name_O != station_name_D) %>%
mutate(hr = hour(dat_time_O),
time_of_day = hr %>% cut(breaks=c(0,6,10,16,20,24), include.lowest = TRUE, labels=c("Midnight - 6AM", "6AM - 10AM", "10AM - 4PM", "4PM - 8PM", "8PM - Midnight"))
) %>%
group_by(station_name_D, time_of_day) %>%
summarise(lon_D = first(lon_D),
lat_D = first(lat_D),
totaltrips = n()
) %>%
st_as_sf(coords = c("lon_D", "lat_D"), crs=4326)
numtrips %>%
tm_shape()+
tm_bubbles(size = "totaltrips",
col="red",
border.col = "red",
title.size = "# of trips",
legend.size.is.portrait = TRUE
)+
tm_facets(by = "time_of_day") +
tm_layout(
legend.outside.size = 0.2,
main.title= 'Popularity of trip destination',
main.title.position = c("center", 'top'))
tmap
can create interactive mapstmap_mode('view')
opts <- tmap_options(basemaps = c(Canvas = c("Esri.WorldGrayCanvas","Stamen.Toner"),
Imagery = "Esri.WorldImagery"),
overlays = c(Labels = paste0("http://services.arcgisonline.com/arcgis/rest/services/Canvas/",
"World_Light_Gray_Reference/MapServer/tile/{z}/{y}/{x}")))
numtrips %>%
filter(time_of_day == "4PM - 8PM") %>%
tm_shape(name = "Stations")+
tm_bubbles(size = "totaltrips",
col="red",
border.col = "red",
alpha = .5
)
sf
CardID | station_name_O | station_name_D | geometry |
---|---|---|---|
5690 | 星中路 | 宜山路 | LINESTRING (121.3643 31.160... |
5690 | 宜山路 | 星中路 | LINESTRING (121.4226 31.188... |
5951 | 新村路 | 南京东路 | LINESTRING (121.418 31.2657... |
5951 | 南京东路 | 新村路 | LINESTRING (121.4801 31.240... |
6032 | 广兰路 | 锦绣路 | LINESTRING (121.6169 31.213... |
6032 | 锦绣路 | 广兰路 | LINESTRING (121.5357 31.189... |
$`1`
# A tibble: 1 × 4
lon_O lat_O lon_D lat_D
<dbl> <dbl> <dbl> <dbl>
1 121. 31.2 121. 31.2
$`2`
# A tibble: 1 × 4
lon_O lat_O lon_D lat_D
<dbl> <dbl> <dbl> <dbl>
1 121. 31.2 121. 31.2
$`3`
# A tibble: 1 × 4
lon_O lat_O lon_D lat_D
<dbl> <dbl> <dbl> <dbl>
1 121. 31.3 121. 31.2
$`4`
# A tibble: 1 × 4
lon_O lat_O lon_D lat_D
<dbl> <dbl> <dbl> <dbl>
1 121. 31.2 121. 31.3
$`5`
# A tibble: 1 × 4
lon_O lat_O lon_D lat_D
<dbl> <dbl> <dbl> <dbl>
1 122. 31.2 122. 31.2
$`6`
# A tibble: 1 × 4
lon_O lat_O lon_D lat_D
<dbl> <dbl> <dbl> <dbl>
1 122. 31.2 122. 31.2
$`1`
LINESTRING (121.3643 31.16001, 121.4226 31.18858)
$`2`
LINESTRING (121.4226 31.18858, 121.3643 31.16001)
$`3`
LINESTRING (121.418 31.26578, 121.4801 31.24006)
$`4`
LINESTRING (121.4801 31.24006, 121.418 31.26578)
$`5`
LINESTRING (121.6169 31.21325, 121.5357 31.18973)
$`6`
LINESTRING (121.5357 31.18973, 121.6169 31.21325)
[,1] [,2]
[1,] 121.3643 31.16001
[2,] 121.4226 31.18858
LINESTRING (121.3643 31.16001, 121.4226 31.18858)
Geometry set for 6 features
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 121.3643 ymin: 31.16001 xmax: 121.6169 ymax: 31.26578
Geodetic CRS: WGS 84
First 5 geometries:
LINESTRING (121.3643 31.16001, 121.4226 31.18858)
LINESTRING (121.4226 31.18858, 121.3643 31.16001)
LINESTRING (121.418 31.26578, 121.4801 31.24006)
LINESTRING (121.4801 31.24006, 121.418 31.26578)
LINESTRING (121.6169 31.21325, 121.5357 31.18973)
CardID | station_name_O | station_name_D | geometry |
---|---|---|---|
5690 | 星中路 | 宜山路 | LINESTRING (121.3643 31.160... |
5690 | 宜山路 | 星中路 | LINESTRING (121.4226 31.188... |
5951 | 新村路 | 南京东路 | LINESTRING (121.418 31.2657... |
5951 | 南京东路 | 新村路 | LINESTRING (121.4801 31.240... |
6032 | 广兰路 | 锦绣路 | LINESTRING (121.6169 31.213... |
6032 | 锦绣路 | 广兰路 | LINESTRING (121.5357 31.189... |
Reading layer `districts' from data source
`/Users/kaza/Library/CloudStorage/Dropbox/shortcourses/data/raw/subway_smartcard/districts.shp'
using driver `ESRI Shapefile'
Simple feature collection with 16 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 120.8527 ymin: 30.67792 xmax: 122.2425 ymax: 31.87469
Geodetic CRS: WGS 84
Id | city | district | citycode | adcode | Shape_Leng | Shape_Area | geometry |
---|---|---|---|---|---|---|---|
0 | 上海城区 | 青浦区 | 021 | 310118 | 1.9667873 | 0.0630654 | MULTIPOLYGON (((121.0643 31... |
0 | 上海城区 | 杨浦区 | 021 | 310110 | 0.3234397 | 0.0057430 | MULTIPOLYGON (((121.5121 31... |
0 | 上海城区 | 闵行区 | 021 | 310112 | 1.5004972 | 0.0351979 | MULTIPOLYGON (((121.2635 31... |
0 | 上海城区 | 宝山区 | 021 | 310113 | 0.9750052 | 0.0299090 | MULTIPOLYGON (((121.5023 31... |
….. and many more
tmap
againTell a story about
In the raw dataset things are a little/lot more messy
Each row corresponds to a swipe of a card and some swipes are not properly registered. Clean the data to eliminate the trips that do not have a ‘Origin’ swipe.
Instead of trips, we ought to think of tours; collection of trips that are chained together. What percentage of the trips are taken up by tours in this day?
Is there a temporal pattern to these tours?
Is there a temporal pattern to breaks in how the chains are constructed?
Tell a story about these tours based on the origins (in space and in time)