October 10, 2022
“Computational notebooks […] open up the world of analytics to […] disciplines that encompass diverse methodologies and skillsets [such as] urban planning […] Some urban planners focus on policymaking […] Others employ qualitative methods to work in and with vulnerable communities. Others develop simulation models to forecast urbanization patterns and infrastructure needs. Others intermingle these, and many more, different approaches to understanding and shaping the city. Yet all urban planners benefit [or should!] from basic quantitative literacy and an ability to reason critically with data. This scholarly and professional imperative aligns with the growing importance of computational thinking in the urban context and parallel trends in geocomputation […], geographic data science […], and the open-source/open-science movements […].” (Boeing, 2019, p. 40)
“toolkits relying on point-and-click interfaces are inefficient in the era of big data. Due to the limited scope for automation of tasks, not only is workflow efficiency reduced but also the reproducibility of the underlying research is compromised, because this largely depends on the (often undocumented) sequence of decisions manually operating the software. […] We then argue that the field [of urban morphology] needs a shift from dominant traditional geographic information system (GIS) environments based on a graphical user interface (GUI; e.g., QGIS or ArcMap) towards reproducible open code-based workflows.” (Fleischmann et al., 2022, p. 3)
Organization: tools to organize your projects so that you don’t have a single folder with hundreds of files
Automation: the power of scripting to create automated data analyses
Documentation: the difference between binary files (e.g., docx) and text files, and why text files are preferred for documentation
Dissemination: publishing is not the end of your analysis, rather it is a way station towards your future research and the future research of others
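As a rough illustration of the Organization point above, a project can be split into a few subfolders instead of one flat folder. This is only a sketch: apart from documents, which the workshop archive uses, the folder names and the helper create_project are hypothetical.

```r
# Hypothetical layout; only "documents" appears in the workshop material.
project_dirs <- c("data", "documents", "scripts", "output")

# Create the subfolders under `root` and return their paths invisibly.
create_project <- function(root, dirs = project_dirs) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Demonstrate in a temporary directory so nothing in your real files changes.
demo_root <- file.path(tempdir(), "my-project")
create_project(demo_root)
list.files(demo_root)
```

Running it against tempdir() keeps the demonstration away from your real project folder; point root at your own folder once you are happy with the layout.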
You could just type into the console…
… but that doesn’t help much with documentation
… but that doesn’t help much with automation
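By contrast, the same commands saved in a script file can be re-run in one step and read later as a record of what was done. A minimal sketch, assuming the built-in mtcars dataset as stand-in data; the function name summarise_by_group is made up for illustration.

```r
# Mean of one numeric column per group, using base R only.
summarise_by_group <- function(data, value, group) {
  aggregate(data[[value]], by = list(group = data[[group]]), FUN = mean)
}

# mtcars ships with R, so this runs without any downloads.
result <- summarise_by_group(mtcars, "mpg", "cyl")
names(result) <- c("cyl", "mean_mpg")
print(result)
```

Saved as a .R file, this can be re-run with source() whenever the data changes, which typing into the console does not offer.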
With RStudio you can combine your programming and your documentation
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and (often) sample data. (From: http://r-pkgs.had.co.nz)
We will use the ggplot2 package for plots and dplyr for data wrangling in this session.
If you have not yet done so, install these packages by running the following in the Console:
install.packages("ggplot2")
install.packages("dplyr")
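If you are unsure whether the packages are already installed, a slightly more defensive variant installs only what is missing. This is a sketch, not part of the workshop files: the helper name missing_packages is hypothetical, and the interactive() guard is an assumption so that nothing is installed when the code runs unattended.

```r
# Return the names in `pkgs` that are not installed in the current library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "dplyr"))

# Only install when a person is at the console (assumed guard).
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```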
The goal is NOT to understand all the R commands, but to get the big picture of how using R in this way facilitates reproducible analyses
Download the archive you received by email with the files we will use in the workshop.
Follow the instructions in the email to unzip the archive into a folder that you will use as your project folder.
Put a green sticky note on the front of your laptop if you are ready or a pink one if you need help and a helper will approach you.
Go to your project folder and double-click on documents/intro-demo.Rmd; it should open in RStudio
Click on Knit HTML to compile the document
Read the output and discuss why this way of documenting research is reproducible
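The Knit step above can also be run from the console or a script, which is useful for automation. A minimal sketch, assuming the working directory is your project folder and the rmarkdown package is available; the checks are there so the code does nothing if either assumption fails.

```r
# Path taken from the exercise; relative to the project folder.
rmd_file <- file.path("documents", "intro-demo.Rmd")

# Render only if the file is present and rmarkdown is installed.
if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Uses the output format declared in the document's header.
  rmarkdown::render(rmd_file)
}
```

Because the whole compile step is a single command, anyone with the project folder can regenerate the output without clicking through the interface.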
Checklist available in documents/checklist.md