29  R markdown document

Package versions used in this chapter: knitr 1.45, rmarkdown 2.25

29.1 Introduction

An R markdown document is a text file usually ending with an .Rmd extension. It allows one to embed R code chunks and their output into a comprehensive report thus eliminating the possibility of loading the wrong figure into the document, or forgetting to update a statistical summary in the text when the original data file was revised.

Creating output from an Rmd file requires knitting the file, as opposed to running the code as you would an R script. RStudio provides a Knit button at the top of its interface, which also lets you choose the output format (HTML, Word or PDF). You can also knit an Rmd file from R using the render function from the rmarkdown package. For example,

rmarkdown::render("HW16.Rmd")
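
As a hedged sketch of what the render call does outside of RStudio, the snippet below writes a small throwaway Rmd file and knits it to HTML. The file name demo.Rmd and its contents are made up for illustration. It assumes the rmarkdown package is installed, and it skips the knit step when pandoc cannot be found.

```r
# Write a tiny, hypothetical Rmd file to a temporary folder
rmd_file <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  'title: "Demo"',
  "output: html_document",
  "---",
  "",
  "Ten squared is `r 10^2`."
), rmd_file)

# render() needs pandoc; RStudio bundles it, other setups may not
out_file <- NULL
if (rmarkdown::pandoc_available()) {
  # output_format overrides the format listed in the YAML header
  out_file <- rmarkdown::render(rmd_file, output_format = "html_document",
                                quiet = TRUE)
}
out_file  # path to the knitted file, or NULL if pandoc was not found
```

Passing output_format = "word_document" instead would produce a Word file from the same source.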

29.2 The YAML header

The YAML header controls the look and feel of your document. At the very least, your R markdown document should contain the following YAML header sandwiched between two sets of ---:

 ---
 title: "Your document title"
 author: "ES 218"
 output:
   html_document: default
 ---

Make sure that the html_document: default line is indented at least two spaces. If you intend on creating a Word document, substitute html_document with word_document.

 ---
 title: "Your document title"
 author: "ES 218"
 output:
   word_document: default
 ---

The YAML header can take on several parameters. For example, to add the current date, add:

date: '`r format(Sys.time(), "%d %B, %Y")`'

The above line makes use of an inline code chunk, which is discussed later in this tutorial. Note the mix of single quotes and backticks that wrap the inline code. The %d, %B and %Y parameters specify the date format. You can read more on date formats here.
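
To see what the format codes do, you can try them in the R console. The sketch below uses a fixed, made-up date instead of Sys.time() so that the result does not change from day to day. The month name produced by %B depends on your locale.

```r
# A fixed date keeps the example reproducible (Sys.time() changes on every run)
d <- as.Date("2024-03-05")

long_form  <- format(d, "%d %B, %Y")  # day, full month name, four-digit year
short_year <- format(d, "%d %B, %y")  # same, but with a two-digit year
numeric    <- format(d, "%m/%d/%Y")   # numeric month/day/year

long_form
short_year
numeric
```

Note that %Y and %y differ only in case but give a four-digit and a two-digit year respectively.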

To have the document automatically generate a table of contents add toc: true to the html_document or word_document header. Make sure that the toc parameter is indented at least two spaces from the xxx_document header:

...
output:
  html_document: 
    toc: true

The above generates a static TOC. If you want to generate a floating TOC, add toc_float: true.

...
output:
  html_document: 
    toc: true
    toc_float: true

29.3 Code folding

R Markdown offers the option to interactively collapse the code chunks in a knitted HTML document. You may not want this in a final report, but it can prove useful in a technical document where both code and output are to be shared. The code folding option is set with code_folding: .... The options are hide, which collapses the code chunks by default, and show, which reveals them by default.

...
output:
  html_document: 
    toc: true
    toc_float: true
    code_folding: hide
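
The same options can also be set from R rather than in the YAML header. The following is a minimal sketch, assuming the rmarkdown package is installed: it builds an HTML output format object with the table of contents and code folding options shown above. The commented-out render call reuses the HW16.Rmd file name from the earlier example and only works if that file exists.

```r
# Build an HTML output format with the same options as the YAML above
fmt <- rmarkdown::html_document(toc = TRUE, toc_float = TRUE,
                                code_folding = "hide")

# The format object can then be handed to render(); "HW16.Rmd" is the
# example file name used earlier and must exist for this call to work
# rmarkdown::render("HW16.Rmd", output_format = fmt)

class(fmt)
```

Options passed this way take the place of those listed under output: in the YAML header.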

29.4 Section headers

You can add section headers to your document by preceding the header with one or more hashtags. Each hashtag represents one heading level. For example, the top heading level is # and the third heading level is ###.

The top header hashtag is usually avoided because its default font size tends to be too big. It’s not uncommon to see R markdown files assign the top level to ##.

## Use this as a top section level 

Some text

### Use this as the second section level 

Some text

#### Use this as the third section level 

etc...

29.5 Text formats

The markdown language has several built-in text formatting options. A brief summary of some of their syntax follows:

  • Italic: To italicize text, wrap it in asterisks as in *this is italicized*. Note that you do not want spaces between the asterisks and the text.
  • Bold: To bold text, wrap it in double asterisks as in **this is bold**.
  • Web links: To create web links wrap the text with [ ] followed by the web link wrapped with ( ) as in [ES 218 website](https://mgimond.github.io/ES218). Make sure that there are no spaces between [] and ().
  • Lists: To create lists in your document, precede each list item with an asterisk followed by a space. For example:
* First list element
* Second list element
* Third list element

which generates,

  • First list element
  • Second list element
  • Third list element
  • Block equations: You can embed LaTeX block equations using double dollar signs,
$$
y_{edu} = \frac{1 + x}{x} + \varepsilon
$$

which generates,

\[ y_{edu} = \frac{1 + x}{x} + \varepsilon \]

  • Inline equations: You can also add inline LaTeX equations using single dollar signs,
The equation $x(1 + x)$ can be re-written as $x + x^2$. 

which generates,

The equation \(x(1 + x)\) can be re-written as \(x + x^2\).

29.6 Code chunks

To embed a code chunk, simply wrap the code between ```{r} and ```.

```{r} 
plot(hp ~ mpg, mtcars)
```

Code chunks can take on many options. Examples of a few common options follow:

  • echo: If you don’t want the code chunks to appear in the output, set echo=FALSE.
  • include: If you want neither the code chunk nor its output displayed in the knitted document, set include=FALSE. The code is still run.
  • fig.width and fig.height: These parameters control a figure’s width and height (in inches).
  • warning and message: Some functions will output warnings or messages, most of which you probably do not want in your output document. To hide these, set warning and message to FALSE.

An example of a code chunk with a few of the aforementioned parameters follows:

```{r message=FALSE, warning=FALSE, echo=TRUE, fig.width=3, fig.height=2} 
plot(hp ~ mpg, mtcars)
```

Here’s the output (note that echo was set to TRUE in this example):

plot(hp ~ mpg, mtcars)
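
The effect of the echo option can be checked without building a full document. The sketch below, which assumes the knitr package is installed, passes a miniature Rmd body (invented for this example) to knitr::knit and prints the resulting markdown. The first chunk's code appears in the result, while the second chunk contributes only its output.

```r
fence <- strrep("`", 3)  # three backticks, built here to keep the example readable

# A miniature Rmd body: two similar chunks, the second with echo=FALSE
rmd <- c(
  paste0(fence, "{r echo=TRUE}"),
  "x <- 1 + 1",
  "x",
  fence,
  "",
  paste0(fence, "{r echo=FALSE}"),
  "y <- 2 + 2",
  "y",
  fence
)

# knit() turns the Rmd text into markdown; pandoc is not needed for this step
md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md)
```

Swapping echo=FALSE for include=FALSE in the second chunk should drop its output from the result as well.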

29.7 Document-wide code chunk options

You can apply code chunk options to the whole document. For example, to avoid adding message=FALSE and warning=FALSE to each chunk of code, you can add this single chunk of code to the beginning of your Rmd file.

```{r include=FALSE}
knitr::opts_chunk$set(message=FALSE, warning=FALSE)
```
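
If you are curious how these settings behave, the following sketch (assuming knitr is installed) reads the options before and after the call to opts_chunk$set in a plain R session, then restores the defaults.

```r
# Chunk option values before any change
defaults <- knitr::opts_chunk$get(c("message", "warning"))

# Set the options for all chunks, as in the setup chunk above
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
current <- knitr::opts_chunk$get(c("message", "warning"))

str(defaults)  # both TRUE unless changed earlier in the session
str(current)   # both FALSE after the call to set()

# Restore the defaults so later work in this session is unaffected
knitr::opts_chunk$restore()
```

Options set on an individual chunk still take precedence over the values set here.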

29.8 Inline code chunks

If statistical summaries or data derived vectors are to populate text in your document, it’s best to do so as inline code chunks. For example, you might want to embed the mean mpg value from the mtcars dataset in your text. The traditional approach is to compute the mean in an R console as follows,

mean(mtcars$mpg)
[1] 20.09062

then transcribe this value into your document as follows:

The mean miles per gallon is 20.1 mpg.

An automated way to do this is to embed the code directly into your text by wrapping it between `r (a backtick followed by the letter r) and a closing backtick.

The mean miles per gallon is `r mean(mtcars$mpg)` mpg.

which generates:

The mean miles per gallon is 20.090625 mpg.

To control the precision, you can wrap the output with round,

The mean miles per gallon is `r round(mean(mtcars$mpg), 1)` mpg.

The mean miles per gallon is 20.1 mpg.

If the code chunk becomes too long and unwieldy to embed in your text, you can create an object from that code in a separate chunk of code, then reference that object inline. For example:

```{r include=FALSE}
M <- lm(mpg ~ hp, mtcars)
r.sq <- round(summary(M)$r.squared, 3)
```

The modeled r-square between miles-per-gallon and engine horsepower is `r r.sq`.

The code chunk is hidden from the output, but the object r.sq is created nonetheless and converted to its value in the inline code chunk. The output thus looks like:

The modeled r-square between miles-per-gallon and engine horsepower is 0.602.
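
Inline code can be tested from the console before knitting a whole document. As a small sketch, assuming knitr is installed, the call below hands two sentences containing inline code to knitr::knit, which evaluates the code and returns the finished text. The second sentence is an extra example not used elsewhere in this chapter.

```r
# Text with inline R code, as it would appear in the body of an Rmd file
txt <- c(
  "The mean miles per gallon is `r round(mean(mtcars$mpg), 1)` mpg.",
  "The data set has `r nrow(mtcars)` rows."
)

# knit() evaluates the inline code and splices the results into the text
out <- knitr::knit(text = txt, quiet = TRUE)
cat(out)
```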

29.9 Tables

You can create two types of tables: static tables where you manually populate the cell values, and dynamic tables which are populated with R data tables.

29.9.1 Static tables

Here’s an example of a static table syntax:

column 1        Column 2  column3
-----------  -----------  ------------
val1                 2.3  apple
val2                   5  orange
val3                0.34  kiwi

which generates:

column 1        Column 2  column3
val1                 2.3  apple
val2                 5.0  orange
val3                0.34  kiwi

Note how the left and right adjusted columns in the output reflect the left and right adjusted columns in the above syntax. It’s important that the column elements not extend beyond the dashed line extents.

29.9.2 Dynamic tables

There are many R packages that specialize in table output formats such as xtable and stargazer. However, decent tables can be created with knitr’s kable function in conjunction with kableExtra. Note that the pipe %>% used here requires the magrittr package. If dplyr is loaded elsewhere in the Rmd document, loading magrittr can be omitted since dplyr also makes the pipe available. Here’s an example:

library(magrittr)
knitr::kable( head(mtcars), format="html" ) %>% 
  kableExtra::kable_styling(bootstrap_options = "striped", 
                            full_width = FALSE, position = "left")

                     mpg  cyl  disp   hp
Mazda RX4           21.0    6   160  110
Mazda RX4 Wag       21.0    6   160  110
Datsun 710          22.8    4   108   93
Hornet 4 Drive      21.4    6   258  110
Hornet Sportabout   18.7    8   360  175
Valiant             18.1    6   225  105

If the output file format is a Word document, substitute format = "html" with format = "pandoc".
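
Rather than editing the format argument by hand each time the output type changes, one option is to let the code choose it. The sketch below is one possible approach, not part of the original example: it relies on knitr::is_html_output, which returns FALSE when called outside of knitting, so running it in the console gives the pandoc version of the table. The kableExtra styling step is left out here because its bootstrap options are aimed at HTML output.

```r
# Pick the table format from the output type; outside of knitting,
# is_html_output() returns FALSE, so "pandoc" is used
tbl_format <- if (knitr::is_html_output()) "html" else "pandoc"

# First four columns only, to keep the table narrow
tab <- knitr::kable(head(mtcars[, 1:4]), format = tbl_format)
tab
```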

For more kableExtra options, visit its website.

29.10 A complete example

Here’s what a complete Rmd file might look like:

 ---
 title: "A simple example"
 author: "ES 218"
 output:
   html_document: 
     toc: true
 editor_options: 
   chunk_output_type: console
 ---


```{r include=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE, echo = TRUE)
```

## A basic plot


```{r fig.width = 3, fig.height = 2.5} 
library(ggplot2)

ggplot(mtcars, aes(mpg, hp)) + geom_point() + 
  geom_smooth(method = lm, se = FALSE)
```


## Here's a glimpse of the data table


```{r echo = FALSE}
knitr::kable(head(mtcars), format = "html")
```


## A basic analysis


```{r include = FALSE}
M <- lm(mpg ~ hp, mtcars)
r.sq <- round(summary(M)$r.squared, 2)
```


The modeled r-square between miles-per-gallon and engine horsepower is `r r.sq`.

29.11 Additional resources