October 10, 2022

Rule 1

Every project lives in its own folder.

Rule 2

Not all subfolders are equal.

  • Distinguish folder types, name them accordingly:
    • Read-only: data, metadata
    • Human-generated: code, paper, documentation
    • Project-generated: clean data, figures, models…
  • This ensures you do not accidentally touch what you shouldn’t touch…
  • …and that your project does not clutter up your system as it progresses.
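One way to enforce the read-only folder type is to remove write permission from the raw data files. The sketch below is a minimal illustration, not a prescribed workflow: the helper name `make_read_only` is hypothetical, and the demonstration runs in a temporary directory so no real files are touched. Note that file permissions behave differently across operating systems (on Windows, only a read-only flag is set).

```python
import stat
import tempfile
from pathlib import Path


def make_read_only(folder: Path) -> list[Path]:
    """Remove write permission from every file below `folder` (hypothetical helper)."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    locked = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            # Keep all existing permission bits except the write bits.
            path.chmod(path.stat().st_mode & ~write_bits)
            locked.append(path)
    return locked


# Demonstration in a throwaway directory that mimics PROJECT/data/raw_data/.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "PROJECT" / "data" / "raw_data"
    raw.mkdir(parents=True)
    (raw / "raw_data.csv").write_text("id,value\n1,10\n")

    locked = make_read_only(raw)
    print([p.name for p in locked])  # prints ['raw_data.csv']

    # Restore write permission so the temporary directory can be cleaned up
    # on every platform.
    for p in locked:
        p.chmod(p.stat().st_mode | stat.S_IWUSR)
```

Locking the files rather than relying on discipline means an accidental save from a spreadsheet program or a script fails loudly instead of silently altering the raw data.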

For example

PROJECT/
├── data/
│   ├── raw_data/
│   │   ├── metadata.json
│   │   └── raw_data.csv
│   └── clean_data/
│       └── data_cleaned.csv
├── results/
│   ├── figures/
│   │   └── chart.png
│   └── output/
│       └── statistics.csv
├── docs/
│   ├── manuscript.Rmd
│   └── journal.md
├── code/
│   ├── analyse_data.R
│   └── clean_data.R
└── README.md

OK, stop.

To hell with rules, let’s explain some things:

So that means…

PROJECT/
└── data/
    └── raw_data/
        ├── metadata.json
        └── raw_data.csv

The ROOT is PROJECT/

The PATH is, e.g., PROJECT/data/raw_data/metadata.json.

Rule 3

Everything is relative

PROJECT/
└── data/
    └── raw_data/
        ├── metadata.json
        └── raw_data.csv

A relative path (here, starting from the folder that contains PROJECT/): PROJECT/data/raw_data/metadata.json.

An absolute path: /Users/bvreede/work/courses/2021/PROJECT/data/raw_data/metadata.json
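The difference can be made concrete with Python's `pathlib`. This is a small sketch using the example paths from this slide; the variable names are illustrative, and `PurePosixPath` is used so the example behaves the same on any operating system without touching the disk.

```python
from pathlib import PurePosixPath

# The absolute path from the example: it only exists on one particular machine.
absolute = PurePosixPath(
    "/Users/bvreede/work/courses/2021/PROJECT/data/raw_data/metadata.json"
)

# The folder that contains PROJECT/ on that machine.
parent_of_root = PurePosixPath("/Users/bvreede/work/courses/2021")

# Stripping the machine-specific prefix leaves a relative path that works
# wherever the project folder is copied.
relative = absolute.relative_to(parent_of_root)

print(absolute.is_absolute())  # True
print(relative.is_absolute())  # False
print(relative)                # PROJECT/data/raw_data/metadata.json

# Inside scripts, paths are usually written relative to the project ROOT.
root = parent_of_root / "PROJECT"
print(absolute.relative_to(root))  # data/raw_data/metadata.json
```

A script that only uses relative paths keeps working when the project moves to another folder or another computer; a script with absolute paths breaks as soon as it leaves the original machine.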

Rule 4

What’s in a name? Everything!

  • What information would you need if this file were ever separated from its folder?

  • Use a schema, and be consistent. E.g.:

    maps_london_2018_openstreetmaps.png
    maps_paris_2020_google-earth.png
    maps_paris_2020_openstreetmaps.png
    maps_paris_2022_openstreetmaps.png

  • Sort the elements in your schema logically. E.g. 2020-05-30 instead of 30-05-2020.

  • No spaces, and limit special characters… but do use CamelCase or connecting_underscores.

  • Document your naming practice, ESPECIALLY when using abbreviations.
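A consistent schema pays off because both people and programs can read it. The sketch below assumes the `topic_city_year_source` schema suggested by the map file names above; the `MapFile` type and `parse_name` helper are hypothetical names introduced for illustration. It also shows why year-first dates sort correctly as plain text while day-first dates do not.

```python
from typing import NamedTuple


class MapFile(NamedTuple):
    """Fields of the assumed schema: topic_city_year_source.extension"""
    topic: str
    city: str
    year: int
    source: str


def parse_name(filename: str) -> MapFile:
    """Split a name like 'maps_paris_2020_openstreetmaps.png' into its fields."""
    stem = filename.rsplit(".", 1)[0]              # drop the extension
    topic, city, year, source = stem.split("_")    # fails loudly on odd names
    return MapFile(topic, city, int(year), source)


names = [
    "maps_paris_2022_openstreetmaps.png",
    "maps_london_2018_openstreetmaps.png",
    "maps_paris_2020_google-earth.png",
    "maps_paris_2020_openstreetmaps.png",
]

# Plain alphabetical sorting already groups the files by city, then by year.
for name in sorted(names):
    print(parse_name(name))

# Year-first dates sort chronologically as text; day-first dates do not.
iso = sorted(["2020-12-01", "2021-01-15", "2020-05-30"])
day_first = sorted(["01-12-2020", "15-01-2021", "30-05-2020"])
print(iso)        # ['2020-05-30', '2020-12-01', '2021-01-15']
print(day_first)  # ['01-12-2020', '15-01-2021', '30-05-2020']
```

Note that the hyphen in `google-earth` matters: because underscores separate the fields, a multi-word element has to use a different connector to stay in one piece.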

Rule 5

There is no such thing as TMI

Make a README file… or more

Where?

  • In the root/
  • Whenever necessary, in subfolders

What information do you want to see in a README?

  • What is this project about?
  • What information do you need to understand this data?
  • Any abbreviations or coding, inside the files or in their names
  • Sources of data, links to other projects -…

Let’s do this together

To start: basic organization

  • open a single folder for today’s work
  • make subfolders:
    • data/ (this will be read-only)
    • code/ (we will work in this!)
    • documents/ (and in this one)
    • results/ (our project will write its output here!)
  • download this zip file: tinyurl.com/urbanism-zipped
  • place the data and documents in the right location in your folder structure
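The folder setup can of course be done by hand in a file browser. For those who prefer to script it, here is a minimal sketch; the folder name `workshop` and the `scaffold` helper are assumptions made for illustration, and the demonstration runs in a temporary directory rather than in a real working folder.

```python
import tempfile
from pathlib import Path

# The four subfolders from the list above.
SUBFOLDERS = ["data", "code", "documents", "results"]


def scaffold(root: Path) -> list[Path]:
    """Create the project root and its subfolders; existing folders are left alone."""
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstration in a throwaway location; replace `tmp` with a real folder to use it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "workshop"  # hypothetical name for today's folder
    for folder in scaffold(root):
        print(folder.relative_to(root.parent).as_posix())
```

Because `exist_ok=True` is set, running the script a second time does no harm, which makes it safe to keep in `code/` as a record of how the structure was created.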

Moving on to the next phase…

  • Let’s meet R and RStudio!