Dissemination: sharing, publishing, archiving

Sharing, publishing and archiving research products

What is the difference between sharing, publishing & archiving?

Why, with whom, what, when, where, and how to publish & archive?

why publish?
who are we sharing with?
what materials do we need to publish?
when do we make them available?
where do we publish various outputs?
how do we prepare materials for publication?

For the remaining slides we are going to assume that we are at the point of submitting our manuscript.

Why?

increased visibility / citation
funding agency (see the NWO Open Science Program)
journal requirements (see, e.g., PLOS Publishing policies, Nature Publishing policies)
community expects it

better research

Better Research

Figure 1. Distribution of reporting errors per paper for papers from which data were shared and from which no data were shared.

Wicherts et al. (2011). Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results.

Why?

Reproducibility: what’s in it for me?

more efficient, less redundant science - by allowing others to build upon our work

Five selfish reasons to work reproducibly by Florian Markowetz

Reason number 1: reproducibility helps to avoid disaster
Reason number 2: reproducibility makes it easier to write papers
Reason number 3: reproducibility helps reviewers see it your way
Reason number 4: reproducibility enables continuity of your work
Reason number 5: reproducibility helps to build your reputation

Who?

Who do we need to share with?

collaborators
peer reviewers & journal editors
broad scientific community
the general public

For research to be reproducible, the research products (data, code) need to be publicly available in a form that people can find and understand. Ideally, both data and code are FAIR.

What?

Activity (in pairs)

Catalog the artifacts you produced during this workshop (3 minutes)

What needs to be published?
What does not need to be published?
Anything that cannot be published?

Activity outcomes

share? yes!

starting data set
metadata
data cleaning steps
analysis scripts
source code
readme

share? maybe?

raw data
processed / cleaned data
intermediate results

share? no!

confidential data
material already published
pre-existing restrictive license
passwords, private keys
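
Passwords and private keys are easy to leave behind in scripts and config files. A minimal sketch of a pre-publication check is given here, assuming the material to be published sits in a single directory. The function name `find_possible_secrets` and the two patterns are hypothetical illustrations, not a complete scanner, and a clean result does not prove the directory is free of secrets.

```python
import re
from pathlib import Path

# Illustrative patterns only (an assumption of this sketch); many kinds of
# secrets will not match either of them.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)(password|passwd|secret|api_key|token)\s*[=:]\s*\S+"),
]


def find_possible_secrets(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, line number) pairs for lines that look like secrets."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in SECRET_PATTERNS):
                hits.append((path.relative_to(root).as_posix(), number))
    return hits
```

For example, `find_possible_secrets(Path("to_publish"))` (a hypothetical directory name) returns a list of files and line numbers to inspect by hand before anything is uploaded.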

Activity outcomes

Advice: One way to determine what you need to publish is to go through and redo the analyses in your paper. Make a note of the data, code, and notes you needed to do each analysis. Make sure all of that is available. This might seem time-consuming, but it ensures that what you think you did is what you actually did.

When?

You can make your code and data public at any point of the research process.

However, at the point of paper submission, the results in your paper should be reproducible, and therefore the data and code used in the paper should be published.

Journals now often require it
Lets the editor and the reviewers accurately review the paper
You might want to keep data and code private, accessible only to reviewers, until the paper is published

Where?

Domain-specific data repository (4TU.ResearchData, EASY | DANS)
Source code hosting service (GitHub, GitLab, Bitbucket)
Generic repository (Figshare, Zenodo)
Institutional repository (e.g., TU Delft Repository)
Sharing services (RPubs, Dropbox, Google Drive)

Discuss: Publishing research products (code, data) separately is different from including them as journal supplementary materials. Why?

How to choose?

is there a domain-specific repository?
is there a plan for long-term preservation?
can people find your materials?
is it citable? (does it provide DOIs?)
is your purpose archival, sharing or publication?

What goes where when?

You will likely have different artifacts:

R Markdown
source code
other documentation
raw data
derived data

Possible workflow:

develop and share data & code on GitHub
upon publication
- share markdown on RPubs
- archive a snapshot of data in 4TU.ResearchData
- archive a snapshot of code in Zenodo
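
Archiving a snapshot usually means freezing the files as they were at publication time. One possible way to prepare such a snapshot locally is sketched here, assuming the project lives in one directory and the archive is written outside it. The names `sha256_of`, `make_snapshot`, and `MANIFEST.sha256` are hypothetical choices for this sketch; uploading to a repository such as 4TU.ResearchData or Zenodo is a separate step that is not shown.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_snapshot(project_dir: Path, zip_path: Path) -> dict[str, str]:
    """Zip every file under project_dir and add a checksum manifest.

    zip_path is assumed to lie outside project_dir, so the archive
    does not end up inside itself on a later run.
    """
    files = sorted(p for p in project_dir.rglob("*") if p.is_file())
    manifest = {p.relative_to(project_dir).as_posix(): sha256_of(p) for p in files}
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in manifest:
            archive.write(project_dir / name, arcname=name)
        # One "checksum  filename" line per file, so readers can verify their copy.
        archive.writestr(
            "MANIFEST.sha256",
            "".join(f"{checksum}  {name}\n" for name, checksum in manifest.items()),
        )
    return manifest
```

The checksum manifest lets anyone who downloads the snapshot confirm that the files match what was archived.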

How to share, publish: file formats

How to share, publish: checklist

Activity (in pairs)

Documenting your research:

collect all of the to-be-archived artifacts from the preceding lessons into a directory
write a README file that describes the contents of the directory
put a license or waiver on it
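
A README can be started from a script and then extended by hand. The sketch here assumes all to-be-archived artifacts are already in one directory; the function name `write_readme`, the file name `README.txt`, and the layout of the file are hypothetical choices. A generated file list is only a starting point and does not replace a written description of what each file contains.

```python
from pathlib import Path


def write_readme(archive_dir: Path, title: str, license_name: str) -> Path:
    """Write a plain-text README that lists every file in archive_dir."""
    readme = archive_dir / "README.txt"
    files = sorted(
        p.relative_to(archive_dir).as_posix()
        for p in archive_dir.rglob("*")
        if p.is_file() and p != readme  # do not list the README itself
    )
    lines = [title, "", f"License: {license_name}", "", "Contents:"]
    lines += [f"  {name}" for name in files]
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

For example, `write_readme(Path("archive"), "Workshop artifacts", "CC0-1.0")` (hypothetical directory, title, and license choice) produces a file that names the license and lists the contents, ready to be annotated with a sentence per file.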

Does copyright apply?

Copyright applies to creative works

source code
text (manuscripts etc.)
images

Typically not copyrightable:

data, results
individual records in a database of facts

Note: This is not an exhaustive list. Ask your data steward or library if you need help!

Choose a license for your code

Choose a Creative Commons license

Licenses are legal instruments

Licenses, copyright, and terms of use are complicated issues.
There are legal implications to your choices.
Citation is a professional norm in science.
- We have good systems for ensuring proper citation.
- Would you try to sue someone in court who fails to cite you properly?
Keep it simple by choosing the least restrictive license possible

Waiving copyright

CC0 enables scientists, educators, artists and other creators and owners of copyright- or database-protected content to waive those interests in their works and thereby place them as completely as possible in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law.

Licenses versus community norms

From the Panton Principles:

[…] in the scholarly research community the act of citation is a commonly held community norm when reusing another community member’s work.

Community norms can be a much more effective way of encouraging positive behaviour, such as citation, than applying licenses. A well functioning community supports its members in their application of norms, whereas licences can only be enforced through court action and thus invite people to ignore them when they are confident that this is unlikely.

Challenges and concerns about publishing data and code

Discussion: What are some of the challenges of publishing research products? What are some of the concerns that people have?

University libraries can help

More and more specialized staff and services are available at university libraries. They provide great resources for data and software management, as well as information about and access to repositories. They are particularly good at thinking about data archives, and they increasingly provide support with code as well.

Faculty data stewards can advise on data management and archival
Library staff specialized in copyright and licenses
(Relatively new) University Digital Competence Centres (e.g., the TU Delft DCC) have staff who can help with or provide consultancy on problems related to research data and software engineering
They are all very helpful and super awesome!