Sharing, publishing and archiving research products

What is the difference between sharing, publishing & archiving?

Non-synonymous terms

  • share: any way of making information available to others; could mean I emailed it to you
  • publish: create a citable, discoverable artifact
  • archive: preserve for the long term

We’ll be focusing on publishing and archiving

Why, with whom, what, when, where, and how to publish & archive?

  • why publish?
  • who are we sharing with?
  • what materials do we need to publish?
  • when do we make them available?
  • where do we publish various outputs?
  • how do we prepare materials for publication?

For the remaining slides, we assume that we are at the point of submitting our manuscript.

Why?

Better Research

Why?

Reproducibility: what’s in it for me?

more efficient, less redundant science: others can build upon our work

“Five selfish reasons to work reproducibly” by Florian Markowetz

  • Reason number 1: reproducibility helps to avoid disaster
  • Reason number 2: reproducibility makes it easier to write papers
  • Reason number 3: reproducibility helps reviewers see it your way
  • Reason number 4: reproducibility enables continuity of your work
  • Reason number 5: reproducibility helps to build your reputation

Who?

Who do we need to share with?

  • collaborators
  • peer reviewers & journal editors
  • broad scientific community
  • the general public

For research to be reproducible, the research products (data, code) need to be publicly available in a form in which people can find and understand them. Ideally, both data and code are FAIR (Findable, Accessible, Interoperable, Reusable).

What?

Activity (in pairs)

Catalog the artifacts you produced during this workshop (3 minutes)

  • What needs to be published?
  • What does not need to be published?
  • Is there anything that cannot be published?

Activity outcomes

share? yes!

  • starting data set
  • metadata
  • data cleaning steps
  • analysis scripts
  • source code
  • readme

share? maybe?

  • raw data
  • processed / cleaned data
  • intermediate results

share? no!

  • confidential data
  • material already published
  • material under a pre-existing restrictive license
  • passwords, private keys
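Before publishing a directory, it can help to check it for files that should never be shared. The sketch below is a deliberately crude heuristic, assuming that secrets show up either under a few well-known file names (e.g. `.env`, `id_rsa`) or as a PEM private-key header inside a file. The helper name `find_suspect_files` and the lists it uses are hypothetical, chosen for illustration; this is no substitute for a dedicated secret scanner or a careful manual review.

```python
from pathlib import Path

# Hypothetical heuristics: file names that commonly hold credentials,
# and the text that ends PEM private-key headers
# (e.g. "-----BEGIN RSA PRIVATE KEY-----").
SUSPECT_NAMES = {".env", "id_rsa", "id_ed25519", "credentials.json"}
KEY_MARKER = "PRIVATE KEY-----"


def find_suspect_files(root):
    """Return sorted relative paths under `root` that look like secrets."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        if path.name in SUSPECT_NAMES:
            hits.append(relative)
            continue
        try:
            # Ignore decoding errors so binary files do not raise.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than crash
        if KEY_MARKER in text:
            hits.append(relative)
    return sorted(hits)
```

For example, `find_suspect_files("my-paper-archive")` (a hypothetical directory name) returns a list of paths to inspect by hand before anything is uploaded.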

Activity outcomes

Advice: One way to determine what you need to publish is to redo the analyses in your paper. Make a note of the data, code and notes you needed for each analysis, and make sure all of them are available. This might seem time-consuming, but it ensures that what you think you did is what you actually did.

When?

You can make your code and data public at any point of the research process.

However, by the time you submit a paper, its results should be reproducible, so the data and code used in the paper should be published.

  • Journals now often require it
  • Lets the editor and the reviewers accurately review the paper
  • You might want to share data and code privately with the reviewers until the paper is published

Where?

How to choose?

  • is there a domain-specific repository?
  • is there a plan for long-term preservation?
  • can people find your materials?
  • is it citable? (does it provide DOIs?)
  • is your purpose archival, sharing or publication?

What goes where when?

You will likely have different artifacts:

  • R Markdown
  • source code
  • other documentation
  • raw data
  • derived data

Possible workflow:

  • develop and share data & code on GitHub
  • upon publication
    • share the rendered R Markdown on RPubs
    • archive a snapshot of the data in 4TU.ResearchData
    • archive a snapshot of the code on Zenodo

How to share, publish: file formats

Do’s

  • non-proprietary file formats
  • text file formats (.csv, .txt, .md)

Don’ts

  • proprietary file formats (.xlsx)
  • data as PDFs or images
  • data in Word documents

Using standard data formats is sometimes required, but even when it’s not, conforming to standards greatly increases opportunities for re-use and understanding.
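As a small illustration of the “Do’s”, the sketch below writes a table to a plain-text .csv file with Python's standard `csv` module, so that anyone can open it without proprietary software, and reads it back. The helper names `write_csv` and `read_csv`, the column names and the values are all invented for the example; the `_c` suffix for degrees Celsius is likewise just an assumed naming convention.

```python
import csv


def write_csv(path, rows, fieldnames):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    # newline="" lets the csv module control line endings itself,
    # as the csv module documentation recommends.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path):
    """Read the file back as a list of dicts (all values come back as strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Illustrative data only: these measurements are made up.
rows = [
    {"site": "A", "date": "2021-03-01", "temperature_c": 12.5},
    {"site": "B", "date": "2021-03-01", "temperature_c": 9.0},
]
```

Calling `write_csv("measurements.csv", rows, ["site", "date", "temperature_c"])` produces a file any text editor can open. Note that `read_csv` returns `"12.5"` as a string, not a number: CSV stores no types or units, so they still need to be described in accompanying documentation such as a README.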

How to share, publish: checklist

  • top-level README that describes the data or software package
    • list files and naming conventions
    • describe abbreviations, column names, etc.
  • installation and usage instructions for software
  • citation instructions
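Parts of this checklist can be checked automatically. The sketch below reports which items have no matching file at the top level of a directory. The file-name patterns are assumptions: `README*`, `LICENSE*` (or the British `LICENCE*`), and `CITATION` or `CITATION.cff`, the latter being the Citation File Format convention that GitHub and Zenodo recognise. The helper name `missing_items` is hypothetical. It only checks that files exist, not that the README actually contains installation, usage or column descriptions.

```python
from pathlib import Path

# Assumed file-name patterns for each checklist item; adjust to your project.
# Matching is case-sensitive on most Linux file systems.
CHECKLIST = {
    "README": ["README", "README.*"],
    "license": ["LICENSE", "LICENSE.*", "LICENCE", "LICENCE.*"],
    "citation instructions": ["CITATION", "CITATION.*"],
}


def missing_items(root):
    """Return the checklist items with no matching file at the top of `root`."""
    root = Path(root)
    missing = []
    for item, patterns in CHECKLIST.items():
        # any(root.glob(p)) is True as soon as one matching path exists.
        found = any(any(root.glob(pattern)) for pattern in patterns)
        if not found:
            missing.append(item)
    return missing
```

An empty list means every item has at least a file; the content of each file still needs a human read.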

Activity (in pairs)

Documenting your research:

  • collect all of the to-be-archived artifacts from the preceding lessons into a directory

  • write a README file that describes the contents of the directory

  • put a license or waiver on it
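For the README step, one option is to start from a generated list of the files in the directory and then fill in the descriptions by hand. The sketch below assumes a Markdown README named `README.md`; the helper name `readme_skeleton`, the section headings and the placeholders in angle brackets are our own suggestion, loosely following the checklist items, not a required template.

```python
from pathlib import Path


def readme_skeleton(root, title):
    """Return a Markdown README draft listing every file under `root`."""
    root = Path(root)
    files = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name != "README.md"  # do not list the README itself
    )
    lines = [f"# {title}", "", "## Contents", ""]
    # One bullet per file, with a placeholder for its description.
    lines += [f"- `{name}`: <describe this file>" for name in files]
    # Placeholder sections mirroring the checklist; fill these in by hand.
    lines += [
        "",
        "## Abbreviations and column names",
        "",
        "<explain abbreviations and column names>",
        "",
        "## License",
        "",
        "<name the license or waiver>",
        "",
        "## How to cite",
        "",
        "<citation instructions>",
        "",
    ]
    return "\n".join(lines)
```

For example, `Path("archive/README.md").write_text(readme_skeleton("archive", "My project"))` would write a draft for a hypothetical `archive` directory; review it before publishing, since an unedited placeholder is worse than a short hand-written description.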

Does copyright apply?

Choose a license for your code

Choose a Creative Commons license

Licenses are legal instruments

Waiving copyright

Licenses versus community norms

From the Panton Principles:

[…] in the scholarly research community the act of citation is a commonly held community norm when reusing another community member’s work.

Community norms can be a much more effective way of encouraging positive behaviour, such as citation, than applying licenses. A well functioning community supports its members in their application of norms, whereas licences can only be enforced through court action and thus invite people to ignore them when they are confident that this is unlikely.

Challenges and concerns about publishing data and code

Discussion: What are some of the challenges of publishing research products? What are some of the concerns that people have?

University libraries can help

More and more specialized staff and services are available at university libraries. They provide great resources for data and software management, as well as information about and access to repositories. They are particularly strong on data archiving and increasingly provide support for code as well.

  • Faculty data stewards can advise on data management and archiving
  • Library staff specialized in copyright and licenses
  • Relatively new university Digital Competence Centres (e.g., the TU Delft DCC) have staff who can help with, or advise on, problems related to research data and software engineering
  • They are all very helpful and super awesome!