Urban Analytics in R

Course Description & Objectives

Last updated: 2025-08-19

This course is about the techniques used in assembling, managing, analysing and predicting with heterogeneous data sets in urban environments. These data sets are inherently messy and incomplete. Types of data include point, polygon, raster, vector, text, image and network data, as well as data sets with high cadence and high spatial resolution.

This is a survey course for different techniques and approaches in dealing with these data in R. The objective of these analytical techniques is to inform both short term operational decisions and long term planning in cities. As such, the emphasis is on practical urban data analytics rather than in-depth discussion about the suitability and appropriateness of techniques and their associated theoretical assumptions.

Unlike other courses in a similar vein, I put inordinate emphasis on data visualisation and communication. The point of data analysis is to tell a compelling story, not to use the latest analytical techniques.

This is a companion course to PLAN 562: The Ethics and Politics of New Urban Analytics (Seminar), which deals with problems, opportunities and hidden agendas with data generation, analysis and visualisation in urban settings. Students are encouraged to take them both.

Course Details

Prerequisites & Preparation

The course will move quickly and cover a large number of analytical techniques, data sets, use cases and disciplinary domains. It requires significant investment on the part of the students to learn the technical skills as well as to learn about substantive urban and regional analyses.

Much of the work in this course will be done using Open Source Software that is usually free.

Over the summer prior to the course, you are expected to review the materials in preparation for the course.

The course assumes a working knowledge of R. R is a programming language and a free software environment for statistical computing and graphics. There are a number of online resources that will help you get up to speed with R. You will have to make extensive use of the documentation, help and examples that the R environment provides. Do not be afraid to use, for example,

?qplot
??randomForest

to seek help for specific commands.

One disadvantage with R is that it stores all its objects in memory. This means that your computer should have significant RAM to deal with large data sets.
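
To get a feel for how much memory an object takes up, a minimal sketch along these lines may help. The data frame and its column names are made up for illustration; object.size() and gc() are base R functions.

```r
# Hypothetical example data: one million rows with an integer and a numeric column.
n <- 1e6
readings <- data.frame(id = seq_len(n), value = rnorm(n))

# object.size() reports an estimate of the memory the object occupies.
size_bytes <- object.size(readings)
print(size_bytes, units = "MB")

# Removing objects you no longer need and calling gc() lets R release the memory.
rm(readings)
invisible(gc())
```

If an object is close to the size of your available RAM, consider reading only the columns or rows you need.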

Another disadvantage with R is that it has a steep learning curve, and it has some quirks. In particular, please pay attention to R-Inferno. However, persistence will have long-term benefits.

You should have an aptitude for debugging computer code, thinking through edge cases in data sets, identifying and dealing with missing data and messy data sets.
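
As a rough illustration of what identifying and dealing with missing data can look like in base R, consider the sketch below. The table and its values are invented for the example, and dropping incomplete rows is only one of several possible strategies.

```r
# Hypothetical sensor table with a few missing readings (NA).
sensors <- data.frame(
  site = c("A", "B", "C", "D"),
  pm25 = c(12.1, NA, 8.4, 15.0),
  temp = c(21.5, 22.0, NA, NA)
)

# Count the missing values in each column.
na_counts <- colSums(is.na(sensors))
print(na_counts)

# Keep only the rows that have no missing values at all.
complete_rows <- sensors[complete.cases(sensors), ]
print(complete_rows)

# Summaries return NA unless missing values are explicitly dropped.
mean_pm25 <- mean(sensors$pm25, na.rm = TRUE)
print(mean_pm25)
```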

You should expect that the instructions and help provided may not work on your system due to different configurations, mismatched data types and differences in libraries. You should have an aptitude to troubleshoot the problems and figure out workarounds.

It may be helpful to go through the materials from STOR 320: Introduction to Data Science.

External IT Accounts

While most of the work in the class will be done in R with publicly available data sets, you will need to set up accounts with the following services. Some of them might require billing to be enabled and most of them will require 2FA. You are responsible for monitoring them and ensuring that the charges are within your budget.

  • StackOverflow for Teams. This is a private StackOverflow team for the class. You can use this to ask questions and troubleshoot issues with your peers. You should have received an invite to join the team. If you haven’t, please let me know.
  • GitHub Education. Free access to Copilot in RStudio and StackOverflow.
  • RStudio Cloud. Free access to RStudio in the cloud. You can use this to share your code and troubleshoot issues with your peers.
  • Claude. Free access to Claude 3.5 for R code generation and troubleshooting.
  • US Census API. Free access to US Census data.

Others may be required for specific portions of the course.

Textbooks & Readings

The following books are used implicitly in the class. You are not required to buy any of them, but they are very useful to have on your bookshelf.

Brewer, Cynthia A. (2015). Designing Better Maps: A Guide for GIS Users. 2 edition. Redlands, California: Esri Press. ISBN: 978-1-58948-440-5.

Few, Stephen (2015). Signal: Understanding What Matters in a World of Noise. Burlingame, California: Analytics Press. ISBN: 978-1-938377-05-1.

Tufte, E. R (2001). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.

Wickham, Hadley (2016). Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN: 978-3-319-24277-4. URL: https://ggplot2.tidyverse.org.

All the above books are about principles of information display and design rather than about data analysis techniques. Information visualisation is very important, much more so than analytical techniques, though not enough attention is devoted to it. While we may not be using these textbooks explicitly in weekly readings, you are expected to critically engage with the materials and thoughtfully follow the principles laid out in the books throughout the course.

For general purpose statistics, I have always enjoyed Tim Harford’s podcast called More or Less. He has a recent book out that succinctly details the attitudes you want to take towards data analysis and telling stories with data. I highly recommend his new book.

Harford, Tim (2021). The Data Detective: Ten Easy Rules to Make Sense of Statistics. New York: Riverhead Books. ISBN: 978-0-593-08459-5.

The following books will get you started on some analytical techniques and can serve as a reference.

Bivand, Roger S., Edzer Pebesma, and Virgilio Gómez-Rubio (2013). Applied Spatial Data Analysis with R. 2nd ed. 2013 edition. New York Heidelberg Dordrecht London: Springer. ISBN: 978-1-4614-7617-7.

Grolemund, Garrett and Hadley Wickham (2017). R for Data Science. 1st edition. Sebastopol, CA: O’Reilly. URL: http://r4ds.had.co.nz/ (visited on May. 25, 2018).

The following book is excellent for covering the latest techniques for geospatial data in R.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow (2019). Geocomputation with R. 1 edition. Boca Raton: Chapman and Hall/CRC. ISBN: 978-1-138-30451-2. URL: https://geocompr.robinlovelace.net/ (visited on Dec. 01, 2019).

Course Policies

The following set of course policies is not meant as an exhaustive list. If in doubt, ask for permission and clarification.

Logistics

  • This course relies on learning from one another. Your learning depends on how you help others troubleshoot their code and the problems they are having. We will use cooperative learning and over-the-shoulder learning techniques. This means you will have to work cooperatively to learn from one another, synchronously and asynchronously.

  • StackOverflow will be used for troubleshooting. You should have received an invite to join the PLAN 672 group on StackOverflow. You can sign in using your GitHub login. You can also access this from the Canvas website. You are required to ask questions using a Minimal Reproducible Example (MRE). To create an MRE, you can also use RStudio Cloud. Please create an account there (see below).

  • Your health and well-being are of paramount importance. You may also be primary caregivers and might have substantial demands on your time. You may not be able to meet the requirements of the course, for any number of other reasons. You may have differential aptitudes and learning styles. Reach out to me early and often if you need any help. I will deal with these on an ad hoc basis.

  • I don’t need any advance notification for intermittent absences. You should make appropriate judgements based on your health and your peers. However, you are responsible for keeping up with the material. Because the materials are posted on-line and in advance, you should be able to work through the code. If you have issues, please use StackOverflow, Office Hours and other resources available to you.

  • Canvas will continue to be used for HW, lab and assignment submissions.

  • You should expect that the datasets may not be available because of server outages or missing links. We will cross those bridges when we get to them.
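
A Minimal Reproducible Example for a StackOverflow question might look like the sketch below. The data and the problem are invented for illustration; the idea is to include a tiny data set that others can recreate, the shortest code that shows the issue, and your version information. The reprex package is another option for formatting such examples, though it is not required here.

```r
# A small, self-contained data set that anyone can recreate (hypothetical values).
trips <- data.frame(
  origin = c("A", "A", "B"),
  minutes = c(12, 15, NA)
)

# dput() prints R code that rebuilds the object, ready to paste into a question.
dput(trips)

# The shortest piece of code that shows the problem:
# the mean is NA because one value is missing.
result <- mean(trips$minutes)
print(result)

# Version information helps others match your setup.
print(R.version.string)
```

State what you expected to see and what you got instead, and paste the code as text rather than as a screenshot.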

Deadlines & Extension Requests

Completed lab session materials are due by the end of lab (8 PM) in Canvas. You only need to submit one lab report for each topic.

Homework assigned for the week is due on the deadline specified in Canvas.

If there is a reason to extend the deadline for the entire class, please discuss with me at least a week ahead and make a cogent case.

All labs and homework need to be submitted as two files: 1) an R Markdown file (*.Rmd) and 2) the HTML output (*.html) of the R Markdown file.
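
As a hedged sketch of how the two files relate, the code below writes a tiny R Markdown file and renders it to HTML. The file name and contents are hypothetical, and rendering is attempted only if the rmarkdown package and pandoc are available on your machine. In RStudio, the Knit button does the same job.

```r
# Write a minimal R Markdown file to a temporary folder (hypothetical file name).
rmd_file <- file.path(tempdir(), "lab01.Rmd")
fence <- strrep("`", 3)  # three backticks, which open and close a code chunk

writeLines(c(
  "---",
  'title: "Lab 1"',
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_file)

# Render to HTML only if rmarkdown and pandoc are installed.
html_file <- NA_character_
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE)
}
print(html_file)
```

Both the .Rmd file and the resulting .html file are what you would upload to Canvas.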

Readings/Resources

The weekly readings are provided as resources and references. You are not required to read all or any of the materials in detail. But the readings are useful to learn the material in depth and troubleshoot some issues. In some cases, the software and techniques in the Resources may be dated. Please use the web to adapt and update them.

Tutorials

Often labs are accompanied by tutorials. The tutorials are usually self-contained and self-explanatory. In R, there are multiple ways to achieve the results, each with their own advantages and disadvantages. The tutorials may include different ways of data munging and analysis to expose you to different techniques. It is not implied that one is better than the others, though we all have our own preferences. If in doubt, rely on benchmarking.
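
A simple way to benchmark two approaches is sketched below using only base R. The task (squaring a vector) and the function names are made up for illustration. Packages such as microbenchmark and bench give more careful timings, but system.time() is often enough for a first comparison.

```r
# Two ways to compute the square of each element (hypothetical task).
x <- runif(1e5)

squares_loop <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) out[i] <- v[i]^2
  out
}
squares_vectorised <- function(v) v^2

# Check first that both approaches give the same answer.
same_result <- isTRUE(all.equal(squares_loop(x), squares_vectorised(x)))
print(same_result)

# system.time() reports elapsed seconds; repeat each call to get a measurable duration.
time_loop <- system.time(for (k in 1:20) squares_loop(x))[["elapsed"]]
time_vec  <- system.time(for (k in 1:20) squares_vectorised(x))[["elapsed"]]
print(c(loop = time_loop, vectorised = time_vec))
```

Timings vary between machines and runs, so compare orders of magnitude rather than small differences.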

Equipment

We will conduct the class in the New East lab and you are expected to use the computers in the lab. Occasionally we will use other campus resources such as virtuallab and research computing.

Your drive space is accessible at \\storage.unc.edu\\cas_a\\City_Course\\PLAN672.

This space is accessible whether you are on campus or off campus (using VPN).

Each of you has a folder where you can store your work. Please do not store any files locally on the computer.

File paths are very important and have created more headaches than necessary in the past. Please pay attention to these! Please ensure that you are using this directory to store your data and files. Do not use Microsoft OneDrive or Dropbox. Multiple exasperated sighs will be used to express my displeasure if you do not follow this instruction.
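
One way to keep file paths manageable is to define the base directory once and build every other path from it with file.path(). The sketch below assumes that the course share can be written with forward slashes in R, which is how UNC paths are commonly written on Windows. The folder and file names are hypothetical.

```r
# Assumption: the course share is reachable at this location from R.
base_dir <- "//storage.unc.edu/cas_a/City_Course/PLAN672"

# Hypothetical personal folder and file names.
my_dir    <- file.path(base_dir, "your_folder")
data_file <- file.path(my_dir, "data", "air_quality.csv")
print(data_file)

# Check before reading, so that a wrong path fails with a clear message.
if (file.exists(data_file)) {
  air <- read.csv(data_file)
} else {
  message("File not found: ", data_file)
}
```

If the share moves, or if you work from a different machine, only base_dir needs to change.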

If you are using your own computer, you should have a computer with at least 16 GB of RAM and a 64-bit operating system. You will also need to install R and RStudio. RStudio is an integrated development environment (IDE) for R that makes it easier to write and run R code.

Accessing campus resources requires a VPN and MFA. You can find instructions on how to install and use VPN here. MFA instructions are here.

You will be collecting data using data loggers and sensors. I will provide them and you need to return them, when you are done with the assignment. You are responsible for the equipment and any damage to it. I will withhold grades for the assignments until the equipment is returned.

Grading

While all assignments are posted on this website, they are to be submitted exclusively on Canvas and on time. Please refrain from emailing your submissions to the instructor.

I am going to use ‘Specification Grading’ in this course. The deliverables are as follows:

Ongoing assignments

  • Lab reports to be submitted at the end of the class day for the Topic (due 8 PM on class days). (Individual/Collaborative)

  • (Mostly) bi-monthly homework (HW) programming assignments (due 5 PM on specific due dates). (Individual/Collaborative)

Ad-hoc major assignments

  • Weekly data visualisation critique (Individual)

  • Data collection and story telling assignment (Group)

  • Final term project (Individual)

The assignments will be graded on a Satisfactory/Unsatisfactory scale. A Satisfactory grade is equivalent to a B+ letter grade. The focus of these assignments is on learning outcomes such as mastery of the material, making innovative connections in the material and on-time submission.

Group assignments will get a single grade for the group.

You will need to achieve Satisfactory grades on at least 70% of the ongoing assignments and 2 of the 3 major assignments to achieve a low passing grade (L/C). Fewer than 50% Satisfactory grades in the ongoing assignments will automatically result in a failing grade regardless of performance in other assignments. For a P/B+, Satisfactory grades are required on at least 80% of the ongoing assignments and on all three major assignments. Exceptional performance in the final term project, in addition to Satisfactory grades in 90% of other requirements, will result in an H/A grade.

In addition, discretionary points will be awarded for enhancing collective learning. These include, but are not limited to:

  • StackOverflow questions and answers
  • Group participation and management
  • Over the shoulder learning in class

This grading scale will be adjusted if the deliverables change depending on course progress. Equivalent grades for undergraduates are assigned accordingly.
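
The thresholds in the grading scale above can be read as a small decision rule. The sketch below is one possible reading, not an official calculator. It assumes that "90% of other requirements" means at least 90% of ongoing assignments plus all three major assignments, it ignores discretionary points, and it returns NA for combinations the stated rules do not cover. The function and argument names are hypothetical.

```r
# share_ongoing: fraction of ongoing assignments graded Satisfactory (0 to 1).
# n_major: number of major assignments (out of 3) graded Satisfactory.
# exceptional_final: TRUE if the final term project is judged exceptional.
course_grade <- function(share_ongoing, n_major, exceptional_final = FALSE) {
  if (share_ongoing < 0.5) return("Fail")
  # Assumed reading of the H/A rule; see the note above.
  if (share_ongoing >= 0.9 && n_major == 3 && exceptional_final) return("H/A")
  if (share_ongoing >= 0.8 && n_major == 3) return("P/B+")
  if (share_ongoing >= 0.7 && n_major >= 2) return("L/C")
  # Combinations that the stated rules do not cover are left undetermined.
  NA_character_
}

print(course_grade(0.95, 3, exceptional_final = TRUE))
print(course_grade(0.85, 3))
print(course_grade(0.75, 2))
print(course_grade(0.40, 3))
```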

Weekly data visualisation critique.

Every week, one of you will lead a discussion critiquing a data visualisation found in the wild. The signup sheet is here.

The purpose is to learn from others’ successes and failures in data visualisation. You will need to find an example of data visualisation in the wild (e.g. newspaper, magazine, website, social media). We will spare 20 minutes of in-class time for this. There is no need to submit a written critique, but the deliverable is leading an effective discussion in class. You will be graded on your ability to lead the discussion through pointed questions and your ability to engage the class about the data visualisation and drawing out lessons.

Attendance and Participation

If you don’t attend classes, but submit the requirements on time, there is no penalty. Continuous absences that affect the progress in the course should be discussed with the instructor to figure out remedial action.

E-mail

The Canvas messaging system should be the preferred way to communicate with the instructor. Before you email either of us about homework or lab sessions, you should use resources on the web and on Canvas. Google and StackOverflow are your friends.

Asynchronous Communication & Troubleshooting

We will use StackOverflow for asynchronous communication and troubleshooting. We can follow guidelines like these that allow you to get to answers quickly:

We could also use RStudio Cloud for troubleshooting in this course. Think of RStudio Cloud as an instance of RStudio in the cloud where you can share not only your script but also the whole environment. This increases the likelihood that others can replicate your results or troubles. Instructions are located here.

Academic Conduct

I firmly believe in learning from your peers and from others. All homework and lab submissions could benefit from collaboration; however, the submissions are individual. This means that interpreting the data and the results, producing the visualisations and drawing appropriate conclusions from the data are necessarily individual, even when the strategies can be discussed and developed with others. All help, including fragments of borrowed or AI-generated code, should be explicitly acknowledged. Penalties are imposed for non-attribution. In particular, please pay attention to the copyright restrictions and attribution requirements associated with the R code that you might find elsewhere.

Additional Help

Please set up a time on my calendar to discuss any additional help you may require.

The Odum Institute has walk-in consultations, and some of the consultants have expertise in R.

Phil McDaniel and Amanda Henley are excellent resources for tracking down geospatial datasets and troubleshooting issues with them.

There are organisations that are devoted to ensuring diversity in the R community. See for example, R-ladies meetup groups and Slack channels. Local groups may or may not be active.

Schedule (Tentative)


Introductory materials


Aug 19 (Tue) Introduction. Telling Stories with data

Tutorials/Slides

HW1 is posted.

Tell a story about air quality data.


Aug 26 (Tue) Exploratory Data Analysis & Visualisation

Resources/Readings


Sep 2 (Tue) TinkerCAD

Anna Engelke will conduct a TinkerCAD tutorial on how to use the software to design and print 3D models. This is a hands-on session and you will need an account on TinkerCAD. You can sign up for free at TinkerCAD.

  • You will need to have completed BEAM 101 training (both online and in person) by this date.
  • Ideally, you will need to have finished the 3D printer quiz on BEAM Maker Space Trainings Canvas page.

Resources/Readings

Assignment 1 posted.

Sep 9 (Tue) Data Collection with Drones

Susan Cohen at Carolina Drone lab at UNC will provide an introductory session on how to use drones for data collection. This will be conducted off-site. Details will be provided in class.

Tutorials/Slides

Resources/Readings

Sep 16 (Tue) Analysing Raster Datasets

Tutorials/Slides

Homework/Deliverables

HW 2 posted

Sep 23 (Tue) Advanced Image Processing

Tutorials/Slides

  • [Image Processing in R]
  • [Object Detection in Street Scenes]

Sep 30 (Tue) Maps & Flows

Tutorials/Slides

Resources/Readings


Oct 14 (Tue) Group Assignment Presentations


Oct 21 (Tue) Use of Geospatial Databases

Tutorials/Slides

  • [Using PostGRES with R]

Resources/Readings


Oct 28 (Tue) Scraping Web for (Un)Structured Data

Nov 18 (Tue) Supervised Classification with Trees and Forests

Nov 25 (Tue) Clustering & Unsupervised Classification

Nov 25 (Tue) Neural Networks & Deep Learning

Tutorials/Slides

  • [Deep learning for Image Classification]

Dec 2 (Tue) Individual Work on Final Project


Final Project Presentations (Exam Day)


References

Bates, Lisa K. 2006. “Does Neighborhood Really Matter?: Comparing Historically Defined Neighborhood Boundaries with Housing Submarkets.” Journal of Planning Education and Research 26 (1): 5–17. https://doi.org/10.1177/0739456X05283254.
Bivand, Roger S., Edzer Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. 2nd ed. 2013 edition. New York Heidelberg Dordrecht London: Springer.
Boeing, Geoff. 2019. “Urban Spatial Order: Street Network Orientation, Configuration, and Entropy.” Applied Network Science 4 (1): 1–19. https://doi.org/10.1007/s41109-019-0189-1.
Boeing, Geoff, and Paul Waddell. 2017. “New Insights into Rental Housing Markets Across the United States: Web Scraping and Analyzing Craigslist Rental Listings.” Journal of Planning Education and Research 37 (4): 457–76. https://doi.org/10.1177/0739456X16664789.
Chen, Yiqiao, Elisabete A. Silva, and José P. Reis. In press. “Measuring Policy Debate in a Regrowing City by Sentiment Analysis Using Online Media Data: A Case Study of Leipzig 2030.” Regional Science Policy & Practice, In press. https://doi.org/10.1111/rsp3.12292.
Chollet, Francois, and J. J. Allaire. 2018. Deep Learning with R. 1 edition. Shelter Island, NY: Manning Publications.
Clapp, John M., and Yazhen Wang. 2006. “Defining Neighborhood Boundaries: Are Census Tracts Obsolete?” Journal of Urban Economics 59 (2): 259–84. https://doi.org/10.1016/j.jue.2005.10.003.
Cline, Lydia Sloan. 2024. Make: The Complete Guide to Tinkercad: 17 Projects to Start Designing and Printing in the 3D World. Place of publication not identified: Make Community, LLC.
Erdreich, Jason. 2024. Taking Tinkercad to the Next Level: Enhance Your Ability to Design, Model, and 3D Print with One of the Most Intuitive CAD Programs. Place of publication not identified: Packt Publishing.
Frazier, Amy, and Kunwar Singh, eds. 2021. Fundamentals of Capturing and Processing Drone Imagery and Data. Boca Raton: CRC Press.
Gebru, Timnit, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, and Li Fei-Fei. 2017. “Using Deep Learning and Google Street View to Estimate the Demographic Makeup of Neighborhoods Across the United States.” Proceedings of the National Academy of Sciences 114 (50): 13108–13. https://doi.org/10.1073/pnas.1700035114.
Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science. 1st edition. Sebastopol, CA: O’Reilly. http://r4ds.had.co.nz/.
Hijmans, Robert J. 2017. Raster: Geographic Data Analysis and Modeling. https://CRAN.R-project.org/package=raster.
Kaza, Nikhil, and Katherine Nesse. 2021. “Characterizing the Regional Structure in the United States: A County-based Analysis of Labor Market Centrality.” International Regional Science Review 44 (5): 560–81. https://doi.org/10.1177/0160017620946082.
Law, Stephen, Brooks Paige, and Chris Russell. 2019. “Take a Look Around: Using Street View and Satellite Images to Estimate House Prices.” ACM Transactions on Intelligent Systems and Technology (TIST) 10 (5). https://dl.acm.org/doi/abs/10.1145/3342240.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. 1 edition. Boca Raton: Chapman and Hall/CRC. https://geocompr.robinlovelace.net/.
McCarty, J., and N. Kaza. 2015. “Urban Form and Air Quality in the United States.” Landscape and Urban Planning 139: 168–79. https://doi.org/10.1016/j.landurbplan.2015.03.008.
Munzert, Simon, Christian Rubba, Peter Meißner, and Dominic Nyhuis. 2014. Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. 1 edition. Chichester, West Sussex, United Kingdom: Wiley.
Nelson, Garrett Dash, and Alasdair Rae. 2016. “An Economic Geography of the United States: From Commutes to Megaregions.” PLOS ONE 11 (11): e0166083. https://doi.org/10.1371/journal.pone.0166083.
Reades, Jonathan, Jordan De Souza, and Phil Hubbard. 2019. “Understanding Urban Gentrification Through Machine Learning.” Urban Studies 56 (5): 922–42. https://doi.org/10.1177/0042098018789054.
Rigaux, Philippe, Michel Scholl, and Agnès Voisard. 2002. Spatial Databases: With Application to GIS. San Francisco: Morgan Kaufmann.
Rincón, Daniela, Usman T. Khan, and Costas Armenakis. 2018. “Flood Risk Mapping Using GIS and Multi-Criteria Analysis: A Greater Toronto Area Case Study.” Geosciences 8 (8): 275. https://doi.org/10.3390/geosciences8080275.
Schweitzer, Lisa. 2014. “Planning and Social Media: A Case Study of Public Transit and Stigma on Twitter.” Journal of the American Planning Association 80 (3): 218–38. https://doi.org/10.1080/01944363.2014.980439.
Stevens, Forrest R., Andrea E. Gaughan, Catherine Linard, and Andrew J. Tatem. 2015. “Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data.” PLOS ONE 10 (2): e0107042. https://doi.org/10.1371/journal.pone.0107042.
Tribby, Calvin P., Harvey J. Miller, Barbara B. Brown, Carol M. Werner, and Ken R. Smith. 2017. “Analyzing Walking Route Choice Through Built Environments Using Random Forests and Discrete Choice Techniques.” Environment & Planning B : Urban Analytics & City Science 44 (6): 1145–67. https://doi.org/10.1177/0265813516659286.
Tufte, E. R. 2001. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Watson, Joss J. W., and Malcolm D. Hudson. 2015. “Regional Scale Wind Farm and Solar Farm Suitability Assessment Using GIS-assisted Multi-Criteria Evaluation.” Landscape and Urban Planning 138 (June): 20–31. https://doi.org/10.1016/j.landurbplan.2015.02.001.
Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28. https://doi.org/10.1198/jcgs.2009.07098.
Nikhil Kaza
Professor

My research interests include urbanization patterns, local energy policy and equity