Urban Analytics in R
Course Description & Objectives
Last updated: 2025-08-19
This course is about different techniques used in assembling, managing, analysing and predicting using heterogeneous data sets in urban environments. These datasets are inherently messy and incomplete. Types of data include point, polygon, raster, vector, text, image and network data, as well as data sets with high cadence and high spatial resolution.
This is a survey course for different techniques and approaches in dealing with these data in R. The objective of these analytical techniques is to inform both short term operational decisions and long term planning in cities. As such, the emphasis is on practical urban data analytics rather than in-depth discussion about the suitability and appropriateness of techniques and their associated theoretical assumptions.
Unlike other courses in a similar vein, I put inordinate emphasis on data visualisation and communication. The point of data analysis is to tell a compelling story, not to use the latest analytical techniques.
This is a companion course to PLAN 562: The Ethics and Politics of New Urban Analytics (Seminar), which deals with problems, opportunities and hidden agendas with data generation, analysis and visualisation in urban settings. Students are encouraged to take them both.
Course Details
- Instructor: Nikhil Kaza
- Classroom: New East 101
- Hours: T 17:00 - 19:30
- Office Hours: https://go.unc.edu/kaza
- Course Materials: https://nkaza.github.io/teaching/techniques-course/
- HW & Lab submissions: Canvas
- Troubleshooting & Collaboration: https://stackoverflowteams.com/c/plan672/questions
Prerequisites & Preparation
The course will move quickly and cover a large number of analytical techniques, data sets, use cases and disciplinary domains. It requires significant investment on the part of the students to learn the technical skills as well as to learn about substantive urban and regional analyses.
Much of the work in this course will be done using Open Source Software that is usually free.
Over the summer prior to the course, you are expected to review the materials in preparation for the course.
The course assumes a working knowledge of R. R is a programming language and a free software environment for statistical computing and graphics. There are a number of online resources that will help you get up to speed with R. You will have to make extensive use of the documentation, help and examples that the R environment provides. Do not be afraid to use, for example,
?qplot
??randomForest
to seek help for specific commands.
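Beyond `?` and `??`, base R has a few other ways to look things up. The following is a minimal sketch using only base R functions; `mean` and `read.csv` are picked purely as illustrations.

```r
# List objects on the search path whose names start with "read."
readers <- apropos("^read\\.")
print(readers)

# Inspect a function's arguments without opening its help page
print(args(read.csv))

# Run the examples that ship with a help page
example("mean")

# Open the full help page (equivalent to ?mean); only useful interactively
if (interactive()) help("mean")
```

`apropos()` is handy when you only remember part of a function name, and `example()` lets you see working code before you adapt it to your own data.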
One disadvantage with R is that it stores all its objects in memory. This means that your computer should have significant RAM to deal with large data sets.
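To get a feel for how much memory an object takes, you can measure it directly. This is a small sketch with made-up vectors; the exact sizes reported may differ slightly across R versions.

```r
# One million double-precision numbers (8 bytes each, plus a small header)
x <- numeric(1e6)
size_double <- object.size(x)
print(size_double, units = "MB")

# The same number of integers (4 bytes each) takes about half the space
y <- integer(1e6)
size_int <- object.size(y)
print(size_int, units = "MB")

# Remove objects you no longer need, then let R reclaim the memory
rm(x, y)
invisible(gc())
```

Checking `object.size()` on an intermediate result is a quick way to decide whether a data set will fit comfortably in RAM before you make several copies of it.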
Another disadvantage with R is that it has a steep learning curve, and it has some quirks. In particular, please pay attention to R-Inferno. However, persistence will have long term benefits.
You should have an aptitude for debugging computer code, thinking through edge cases in data sets, identifying and dealing with missing data and messy data sets.
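As a small illustration of what dealing with missing data looks like in practice, here is a sketch in base R. The `readings` data frame and its values are invented for this example.

```r
# Toy sensor readings with missing values (made up for illustration)
readings <- data.frame(
  sensor = c("a", "b", "c", "d"),
  pm25   = c(12.1, NA, 8.4, 15.0),
  temp   = c(21.5, 22.0, NA, 19.8)
)

# Count missing values in each column
missing_per_column <- colSums(is.na(readings))
print(missing_per_column)

# Option 1: keep only the rows with no missing values
complete_rows <- readings[complete.cases(readings), ]

# Option 2: keep every row and skip NAs when summarising
mean_pm25 <- mean(readings$pm25, na.rm = TRUE)
```

Which option is appropriate depends on why the values are missing; dropping rows is simple but can bias the results if missingness is not random.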
You should expect that the instructions and help provided may not work on your system due to different configurations, mismatched data types and differences in libraries. You should have an aptitude to troubleshoot the problems and figure out workarounds.
It may be helpful to go through the materials from STOR 320: Introduction to Data Science.
External IT Accounts
While most of the work in the class will be done in R with publicly available data sets, you will need to set up accounts with the following services. Some of them might require billing to be enabled and most of them will require 2FA. You are responsible for monitoring them and ensuring that the charges are within your budget.
- StackOverflow for Teams. This is a private StackOverflow team for the class. You can use this to ask questions and troubleshoot issues with your peers. You should have received an invite to join the team. If you haven’t, please let me know.
- GitHub Education. Free access to Copilot in RStudio and StackOverflow.
- RStudio Cloud. Free access to RStudio in the cloud. You can use this to share your code and troubleshoot issues with your peers.
- Claude. Free access to Claude 3.5 for R code generation and troubleshooting.
- US Census API. Free access to US Census data.
Others may be required for specific portions of the course.
Textbooks & Readings
The following books are used implicitly in the class. You are not required to buy any of them, but they are very useful to have on your bookshelf.
Brewer, Cynthia A. (2015). Designing Better Maps: A Guide for GIS Users. 2nd edition. Redlands, California: Esri Press. ISBN: 978-1-58948-440-5.
Few, Stephen (2015). Signal: Understanding What Matters in a World of Noise. Burlingame, California: Analytics Press. ISBN: 978-1-938377-05-1.
Tufte, E. R (2001). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Wickham, Hadley (2016). Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN: 978-3-319-24277-4. URL: https://ggplot2.tidyverse.org.
All the above books are about principles of information display and design rather than about data analysis techniques. Information visualisation is very important, much more so than analytical techniques, though it does not receive enough attention. While we may not be using these textbooks explicitly in weekly readings, you are expected to critically engage with the materials and thoughtfully follow the principles laid out in the books throughout the course.
For general purpose statistics, I have always enjoyed Tim Harford’s podcast called More or Less. He has a recent book that succinctly details the attitudes you want to take towards data analysis and telling stories with data. I highly recommend it.
Harford, Tim (2021). The Data Detective: Ten Easy Rules to Make Sense of Statistics. New York: Riverhead Books. ISBN: 978-0-593-08459-5.
The following books will get you started on some analytical techniques and can serve as a reference.
Bivand, Roger S., Edzer Pebesma, and Virgilio Gómez-Rubio (2013). Applied Spatial Data Analysis with R. 2nd edition. New York Heidelberg Dordrecht London: Springer. ISBN: 978-1-4614-7617-7.
Grolemund, Garrett and Hadley Wickham (2017). R for Data Science. 1st edition. Sebastopol, CA: O’Reilly. URL: http://r4ds.had.co.nz/ (visited on May. 25, 2018).
The following book is excellent for covering the latest techniques for geospatial data in R.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow (2019). Geocomputation with R. 1st edition. Boca Raton: Chapman and Hall/CRC. ISBN: 978-1-138-30451-2. URL: https://geocompr.robinlovelace.net/ (visited on Dec. 01, 2019).
Course Policies
The following set of course policies is not meant as an exhaustive list. If in doubt, ask for permission and clarification.
Logistics
This course relies on learning from one another. Your learning depends on how you help others troubleshoot their code and the problems they are having. We will use cooperative learning and over-the-shoulder learning techniques. This means you will have to work cooperatively to learn from one another, synchronously and asynchronously.
StackOverflow will be used for troubleshooting. You should have received an invite to join the PLAN 672 group on StackOverflow. You can sign in using your Github login. You can also access this from the Canvas website. You are required to ask questions using a Minimum Reproducible Example (MRE). To create an MRE, you can also use RStudio Cloud. Please create an account there (see below).
Your health and well-being are of paramount importance. You may also be primary care givers and might have substantial demands on your time. You may not be able to meet the requirements of the course for any number of other reasons. You may have differential aptitude and learning styles. Reach out to me early and often if you need any help. I will deal with these on an ad-hoc basis.
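A Minimum Reproducible Example usually has three parts: a tiny data set that anyone can recreate, the shortest code that shows the problem, and information about your R session. The following is one possible sketch using the built-in `mtcars` data; the `aggregate()` call simply stands in for whatever code is giving you trouble.

```r
# 1. Reduce the data to the few rows and columns needed to show the problem
small <- head(mtcars[, c("mpg", "cyl")], 3)

# 2. Print code that recreates the data exactly; paste its output into your question
dput(small)

# 3. Include the shortest piece of code that triggers the issue
result <- aggregate(mpg ~ cyl, data = small, FUN = mean)
print(result)

# 4. Report your R version, platform and loaded packages
print(sessionInfo())
```

Using `dput()` means that others do not need access to your files to reproduce the issue. The `reprex` package can automate much of this, if you have it installed.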
I don’t need any advance notification for intermittent absences. You should make appropriate judgements based on your health and your peers. However, you are responsible for keeping up with the material. Because the materials are posted on-line and in advance, you should be able to work through the code. If you have issues, please use StackOverflow, Office Hours and other resources available to you.
Canvas will continue to be used for HW, lab and assignment submissions.
You should expect that the datasets may not be available because of server outages or missing links. We will cross those bridges when we get to them.
Deadlines & Extension Requests
Completed lab session materials are due by the end of lab (8 PM) in Canvas. You only need to submit one lab work for each topic.
Homework assigned for the week is due on the deadline specified in Canvas.
If there is a reason to extend the deadline for the entire class, please discuss with me at least a week ahead and make a cogent case.
All labs and homework need to be submitted as two files: 1) an R Markdown file (*.Rmd) and 2) the html output (*.html) of the Markdown file.
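If you prefer the console to the Knit button in RStudio, the html file can be produced with `rmarkdown::render()`. This is a sketch only: the file name `lab01.Rmd` is hypothetical, and the code assumes the `rmarkdown` package is installed.

```r
# Hypothetical file name; replace with your own R Markdown file
rmd_file <- "lab01.Rmd"

# Render to html only if the package and the file are both available
if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(rmd_file)) {
  rmarkdown::render(rmd_file, output_format = "html_document")
}

# By default the html output sits next to the .Rmd file with the same base name
html_file <- sub("\\.Rmd$", ".html", rmd_file)
print(html_file)
```

Rendering from a fresh session is a useful check that the document does not depend on objects that exist only in your current workspace.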
Readings/Resources
The weekly readings are provided as resources and references. You are not required to read all or any of the materials in detail. But the readings are useful to learn the material in depth and troubleshoot some issues. In some cases, the software and techniques in the Resources may be dated. Please use the web to adapt and update them.
Tutorials
Often labs are accompanied by tutorials. The tutorials are usually self-contained and self-explanatory. In R, there are multiple ways to achieve the results, each with their own advantages and disadvantages. The tutorials may include different ways of data munging and analysis to expose you to different techniques. It is not implied that one is better than the others, though we all have our own preferences. If in doubt, rely on benchmarking.
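As a rough illustration of benchmarking two approaches, the sketch below times an explicit loop against a vectorised call using base R's `system.time()`. The functions `loop_sqrt` and `vec_sqrt` are invented for this example, and timings will vary by machine.

```r
x <- runif(1e5)

# Approach 1: explicit loop over a preallocated vector
loop_sqrt <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) out[i] <- sqrt(v[i])
  out
}

# Approach 2: vectorised call
vec_sqrt <- function(v) sqrt(v)

# First confirm that both approaches give the same answer
same_result <- isTRUE(all.equal(loop_sqrt(x), vec_sqrt(x)))

# Then time each one over several repetitions
time_loop <- system.time(for (k in 1:20) loop_sqrt(x))
time_vec  <- system.time(for (k in 1:20) vec_sqrt(x))
print(time_loop)
print(time_vec)
```

Always check that the alternatives produce identical results before comparing speed. For finer-grained comparisons, packages such as `microbenchmark` or `bench` are worth exploring.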
Equipment
We will conduct the class in the New East lab and you are expected to use the computers in the lab. Occasionally we will use other campus resources such as virtuallab and research computing.
Drive space for your use is accessible at \\storage.unc.edu\\cas_a\\City_Course\\PLAN672.
This space is accessible whether you are on campus or off campus (using VPN).
Each of you has a folder where you can store your work. Please do not store any files locally on the computer.
File paths are very important and have created more headaches than necessary in the past. Please pay attention to these! Please ensure that you are using this directory to store your data and files. Do not use Microsoft OneDrive or Dropbox. Multiple exasperated sighs will be used to express my displeasure if you do not follow this instruction.
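One way to reduce path headaches is to define the base directory once and build every other path from it with `file.path()`. The directory and file names below are placeholders, not the actual layout of the course drive.

```r
# Hypothetical base directory; replace with your own folder on the course drive
course_dir <- file.path("path", "to", "PLAN672", "your_folder")

# Build paths from pieces instead of pasting strings with slashes
data_file <- file.path(course_dir, "data", "air_quality.csv")
print(data_file)

# Check that the file exists before trying to read it
if (file.exists(data_file)) {
  air <- read.csv(data_file)
} else {
  message("File not found: ", data_file)
}

# Show the absolute path that R resolves the directory to
print(normalizePath(course_dir, mustWork = FALSE))
```

`file.path()` uses forward slashes, which R accepts on Windows as well, so the same script can run on the lab computers and on your own machine once `course_dir` is set.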
If you are using your own computer, you should have a computer with at least 16 GB of RAM and a 64-bit operating system. You will also need to install R and RStudio. RStudio is an integrated development environment (IDE) for R that makes it easier to write and run R code.
Accessing campus resources requires a VPN and MFA. You can find instructions on how to install and use VPN here. MFA instructions are here.
You will be collecting data using data loggers and sensors. I will provide them and you need to return them, when you are done with the assignment. You are responsible for the equipment and any damage to it. I will withhold grades for the assignments until the equipment is returned.
Grading
While all assignments are posted on this website, they are to be submitted exclusively on Canvas and on time. Please refrain from emailing your submissions to the instructor.
I am going to use ‘Specification Grading’ in this course. The deliverables are as follows:
Ongoing assignments
Lab reports to be submitted at the end of the class day for the Topic (due 8 PM on class days). (Individual/Collaborative)
(Mostly) Bimonthly homework (HW) programming assignments (due 5 PM on specific due dates) (Individual/Collaborative)
Ad-hoc major assignments
Weekly data visualisation critique (Individual)
Data collection and story telling assignment (Group)
Final term project (Individual)
The assignments will be graded on a Satisfactory/Unsatisfactory scale. A Satisfactory grade is equivalent to a B+ letter grade. The focus of these assignments is on learning outcomes such as mastery of the material, making innovative connections in the material and on-time submission.
Group assignments will get a single grade for the group.
You will need to achieve Satisfactory grades on at least 70% of the on-going assignments and 2 of the 3 major assignments to achieve a low passing grade (L/C). Fewer than 50% Satisfactory grades in the ongoing assignments will automatically result in a failing grade regardless of performance in other assignments. In addition to 80% of the on-going assignments, Satisfactory grades should be achieved in all three major assignments for a P/B+. Exceptional performance in the final term project, in addition to Satisfactory grades in 90% of other requirements, will result in an H/A grade.
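The thresholds above can be read as a simple decision rule. The sketch below is illustrative only and is not the official grade calculation: the function `grade_band` and its arguments are invented here, "90% of other requirements" is assumed to mean 90% of ongoing assignments plus all three major assignments, and combinations that the stated rules do not cover return `NA`. Discretionary points are ignored.

```r
# Illustrative mapping of the stated thresholds to a grade band.
# ongoing_share:       fraction of ongoing assignments graded Satisfactory (0 to 1)
# major_satisfactory:  number of the three major assignments graded Satisfactory
# exceptional_project: TRUE if the final term project was judged exceptional
grade_band <- function(ongoing_share, major_satisfactory, exceptional_project = FALSE) {
  # Fewer than 50% of ongoing assignments fails regardless of anything else
  if (ongoing_share < 0.5) return("Fail")
  # Assumed reading of the H/A rule (see the note above)
  if (exceptional_project && ongoing_share >= 0.9 && major_satisfactory == 3) return("H/A")
  if (ongoing_share >= 0.8 && major_satisfactory == 3) return("P/B+")
  if (ongoing_share >= 0.7 && major_satisfactory >= 2) return("L/C")
  # Combination not covered by the stated rules
  NA_character_
}

print(grade_band(0.85, 3))
print(grade_band(0.75, 2))
```

If your situation falls into one of the uncovered combinations, ask rather than guess.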
In addition, discretionary points will be awarded for enhancing collective learning. These include, but are not limited to:
- StackOverflow questions and answers
- Group participation and management
- Over the shoulder learning in class
This grading scale will be adjusted if the deliverables change depending on course progress. Equivalent grades for undergraduates are assigned accordingly.
Weekly data visualisation critique.
Every week, one of you will lead a discussion critiquing a data visualisation found in the wild. Signup sheet is here.
The purpose is to learn from others’ successes and failures in data visualisation. You will need to find an example of data visualisation in the wild (e.g. newspaper, magazine, website, social media). We will set aside 20 minutes of in-class time for this. There is no need to submit a written critique; the deliverable is leading an effective discussion in class. You will be graded on your ability to lead the discussion through pointed questions, to engage the class about the data visualisation and to draw out lessons.
Attendance and Participation
If you don’t attend classes, but submit the requirements on time, there is no penalty. Continuous absences that affect the progress in the course should be discussed with the instructor to figure out remedial action.
The Canvas messaging system should be the preferred way to communicate with the instructor. Before you email either of us about homework or lab sessions, you should use resources on the web and on Canvas. Google and StackOverflow are your friends.
Asynchronous Communication & Troubleshooting
We will use StackOverflow for asynchronous communication and troubleshooting. We can follow guidelines like these that allow you to get to answers quickly:
We could also use RStudio Cloud for troubleshooting in this course. Think of RStudio Cloud as an instance of RStudio in the cloud where you can share not only your script but also the whole environment. This increases the likelihood that others can replicate your results or troubles. Instructions are located here.
Academic Conduct
I firmly believe in learning from your peers and from others. All homework and lab submissions could benefit from collaboration; however, the submissions are individual. This means that interpreting the data and the results, producing the visualisations and drawing appropriate conclusions from the data are necessarily individual, even when the strategies can be discussed and developed with others. All help, including fragments of borrowed or AI generated code, should be explicitly acknowledged. Penalties are imposed for non-attribution. In particular, please pay attention to the copyright restrictions and attribution requirements associated with the R code that you might find elsewhere.
Additional Help
Please set up a time on my calendar to discuss any additional help you may require.
The Odum Institute has walk-in consultations, and some of the consultants have expertise in R.
Phil McDaniel and Amanda Henley are excellent resources for tracking down geospatial datasets and troubleshooting issues with them.
There are organisations that are devoted to ensuring diversity in the R community. See for example, R-ladies meetup groups and Slack channels. Local groups may or may not be active.
Schedule (Tentative)
Introductory materials
Aug 19 (Tue) Introduction. Telling Stories with data
Tutorials/Slides
HW1 is posted.
Tell a story about air quality data.
Aug 26 (Tue) Exploratory Data Analysis & Visualisation
Tutorials/Slides
Sep 2 (Tue) TinkerCAD
Anna Engelke will conduct a TinkerCAD tutorial on how to use the software to design and print 3D models. This is a hands-on session and you will need an account on TinkerCAD. You can sign up for free at TinkerCAD.
- You will need to have completed BEAM 101 training (both online and in person) by this date.
- Ideally, you will need to have finished the 3D printer quiz on BEAM Maker Space Trainings Canvas page.
Resources/Readings
Assignment 1 posted.
Sep 9 (Tue) Data Collection with Drones
Susan Cohen at Carolina Drone lab at UNC will provide an introductory session on how to use drones for data collection. This will be conducted off-site. Details will be provided in class.
Tutorials/Slides
Resources/Readings
Sep 16 (Tue) Analysing Raster Datasets
Tutorials/Slides
Homework/Deliverables
HW 2 posted
Sep 23 (Tue) Advanced Image Processing
Tutorials/Slides
- [Image Processing in R]
- [Object Detection in Street Scenes]
Sep 30 (Tue) Maps & Flows
Tutorials/Slides
Oct 14 (Tue) Group Assignment Presentations
Oct 21 (Tue) Use of Geospatial Databases
Tutorials/Slides
- [Using PostGRES with R]
Oct 28 (Tue) Scraping Web for (Un)Structured Data
Tutorials/Slides
Nov 4 (Tue) Analysing Text Data
Nov 11 (Tue) Networks
Tutorials/Slides
Resources/Readings
Nov 18 (Tue) Supervised Classification with Trees and Forests
Tutorials/Slides
Nov 25 (Tue) Clustering & Unsupervised Classification
Tutorials/Slides
Resources/Readings
- (Bates 2006)
- (Clapp and Wang 2006)
- Chapters 7 & 9 (Bivand, Pebesma, and Gómez-Rubio 2013)
Nov 25 (Tue) Neural Networks & Deep Learning
Tutorials/Slides
- [Deep learning for Image Classification]