2019-11-19

On reproducibility

What is reproducibility in science?


  • ability of a peer to reproduce one's results
  • requires access to data, methods, and procedures
  • increasingly, science is expected to be reproducible

Why does it not happen in practice?

Some opinions on whether reproducibility is needed:

  • Ideally, yes, but we don't have time for this.
  • If it gets published, yes.
  • If it gets published, yes; unless it is in PLoS One…
  • No need: I work on my own.
  • For others to copy us? You crazy?!
  • No way! We rigged the data, the method does not work, and we ran the analyses in Excel.

Main obstacles to reproducibility

  • lack of time: in the long run, reproducible work saves time
  • fear of plagiarism: low risk in practice
  • internal work, no need to share: almost never true
  • one good reason: lack of tools to facilitate reproducibility

You never work alone


Be nice to your future selves!

Two aspects of reproducibility using R


  • implementing methods as packages
  • making transparent and reproducible analyses
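As a minimal sketch of the first aspect, a method shipped in a package is typically a documented function. The example assumes roxygen2-style comments (`#'`), which the slides do not prescribe, and `standardise()` is a hypothetical function used only for illustration.

```r
#' Standardise a numeric vector (hypothetical example function)
#'
#' @param x A numeric vector with at least two values.
#' @return The vector centred on 0 and scaled to unit standard deviation.
#' @export
standardise <- function(x) {
  # Fail early on inputs for which the standard deviation is undefined
  stopifnot(is.numeric(x), length(x) > 1)
  (x - mean(x)) / sd(x)
}

standardise(c(1, 2, 3))  # -1 0 1
```

Placed in the R/ directory of a package, such comments can be turned into help pages by roxygen2, so the documentation travels with the code.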

Reproducibility in practice

Literate programming

Let us change our traditional attitude to the construction of programs: instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.

(Donald E. Knuth, Literate Programming, 1984)

A data-centred approach to programming

Literate programming in R

Current workflows use the following equation:

markdown (.md) + R = Rmarkdown (.Rmd)



Example:
knitr::knit2html("foo.Rmd")                                    # -> foo.html
rmarkdown::render("foo.Rmd", output_format = "pdf_document")   # -> foo.pdf
rmarkdown::render("foo.Rmd", output_format = "word_document")  # -> foo.docx
...
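Since one source can be rendered to different formats by changing the output_format argument of rmarkdown::render(), a small helper can loop over formats. This is a sketch assuming pandoc is installed (and LaTeX for PDF output); render_all() is a hypothetical name, not part of rmarkdown.

```r
# Hypothetical helper: render one .Rmd source into several output formats.
# Assumes pandoc is available; pdf_document also needs a LaTeX installation.
render_all <- function(input,
                       formats = c("html_document", "pdf_document", "word_document")) {
  # Return nothing if the source file or pandoc is missing
  if (!file.exists(input) || !rmarkdown::pandoc_available()) {
    return(character(0))
  }
  # rmarkdown::render() returns the path of the file it produced
  vapply(formats,
         function(fmt) rmarkdown::render(input, output_format = fmt, quiet = TRUE),
         character(1))
}

render_all("foo.Rmd")
```

Keeping the list of formats in one place means every output is regenerated from the same source in a single call.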

Rmarkdown: chunks in markdown

```{r chunk-title, ...}
a <- rnorm(1000)
hist(a, col = terrain.colors(15), border = "white", main = "Normal distribution")
```

results in:

a <- rnorm(1000)
hist(a, col = terrain.colors(15), border = "white", main = "Normal distribution")

[Figure: histogram of a, titled "Normal distribution"]
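Because rnorm() draws new random numbers on every run, the histogram changes each time the document is knitted. A common safeguard is to fix the random seed before drawing numbers; the seed value 1 below is arbitrary.

```r
# Same seed, same random draws: the analysis gives identical results on rerun
set.seed(1)
a1 <- rnorm(1000)

set.seed(1)
a2 <- rnorm(1000)

identical(a1, a2)  # TRUE
```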

Formatting outputs

```{r another-chunk-title, ...}
[some R code here]
```

where ... stands for options controlling how the chunk is processed and formatted, e.g.:

  • eval (TRUE/FALSE): evaluate code?
  • echo (TRUE/FALSE): show code input?
  • results ("markup"/"hide"/"asis"): show/format code output
  • message/warning/error: show messages, warnings, errors?
  • cache (TRUE/FALSE): cache analyses?

See http://yihui.name/knitr/options for details on all options.
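To see what echo and eval change, the sketch below knits two small chunks in memory with knitr::knit(text = ...), which returns the resulting markdown instead of writing a file. The chunk labels are arbitrary, and the exact markdown layout may vary between knitr versions.

```r
library(knitr)

# echo = FALSE: the code runs and its output is shown, but the code is hidden
hidden <- knit(text = c("```{r hidden-code, echo = FALSE}",
                        "1 + 1",
                        "```"),
               quiet = TRUE)

# eval = FALSE: the code is shown but never run, so there is no output
not_run <- knit(text = c("```{r shown-not-run, eval = FALSE}",
                         "1 + 1",
                         "```"),
                quiet = TRUE)

cat(hidden, sep = "\n")   # has the output line "## [1] 2", not the code
cat(not_run, sep = "\n")  # has the code "1 + 1", not its output
```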

One format, several outputs

rmarkdown can generate different types of documents:

  • standardised reports (html, pdf)
  • journal articles, using the rticles package (.pdf)
  • Tufte handouts (.pdf)
  • Word documents (.docx)
  • slides for presentations (html, pdf)

See: http://rmarkdown.rstudio.com/gallery.html.

rmarkdown: toy example 1/2

Let us consider the file foo.Rmd:

---
title: "A toy example of rmarkdown"
author: "John Snow"
date: "2019-11-19"
output: html_document
---

This is some nice R code:

```{r rnorm-example}
x <- rnorm(100)
x[1:6]
hist(x, col = "grey", border = "white")
```
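Behind the scenes, rmarkdown::render() first runs the chunks with knitr to produce plain markdown, then converts it with pandoc. The sketch below reproduces only the first step, so it needs knitr but not pandoc. It writes the toy document above to foo.Rmd inside a temporary directory, an arbitrary choice made to keep the project folder clean.

```r
library(knitr)

# Work in a temporary directory so foo.Rmd, foo.md and figure/ stay out of the way
tmp <- tempfile("toy-rmd-")
dir.create(tmp)
old_wd <- setwd(tmp)

# Write the toy document to foo.Rmd
writeLines(c(
  "---",
  "title: \"A toy example of rmarkdown\"",
  "author: \"John Snow\"",
  "date: \"2019-11-19\"",
  "output: html_document",
  "---",
  "",
  "This is some nice R code:",
  "",
  "```{r rnorm-example}",
  "x <- rnorm(100)",
  "x[1:6]",
  "hist(x, col = \"grey\", border = \"white\")",
  "```"
), "foo.Rmd")

# knit() evaluates the chunk and writes foo.md (the histogram goes under figure/)
md_file <- knit("foo.Rmd", quiet = TRUE)
md <- readLines(md_file)

setwd(old_wd)

# Echoed code and its "##"-prefixed output now sit in plain markdown
head(md, 20)
```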

rmarkdown: toy example 2/2

rmarkdown::render("foo.Rmd")

Good practices

rmarkdown is just the beginning