Introduction to tocr

Add a fresh TOC to an (R) Markdown document

Adding a table of contents (TOC) to the pdftools README.md, for example, is as simple as:

tocr::add_toc(md = "https://raw.githubusercontent.com/ropensci/pdftools/e7248d9956c7e73968628fa3a8ed37f0a8c23b37/README.md") |> head(n = 15L)
#>  [1] "# pdftools"                                                              ""                                                                       
#>  [3] "<!-- TOC BEGIN -- leave this comment untouched to allow auto update -->" ""                                                                       
#>  [5] "## Table of contents"                                                    ""                                                                       
#>  [7] "- [Introduction](#-introduction)"                                        "- [Installation](#-installation)"                                       
#>  [9] "    - [Building from source](#-building-from-source)"                    "- [Getting started](#-getting-started)"                                 
#> [11] "- [Bonus feature: rendering pdf](#-bonus-feature-rendering-pdf)"         "- [Limitations](#-limitations)"                                         
#> [13] ""                                                                        "<!-- TOC END -- leave this comment untouched to allow auto update -->"  
#> [15] ""

The parameter md can specify a single file path, a single URL or a character vector (one string per line).
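
As a minimal sketch of the third option, the following passes a small in-memory document as a character vector. The sample lines are made up for illustration, and the call is guarded because it assumes that the tocr package is installed.

```r
# A made-up Markdown document, one string per line
md_lines <- c(
  "# My package",
  "",
  "## Installation",
  "",
  "## Usage",
  "",
  "### Advanced usage"
)

# Only call add_toc() if tocr is available (assumption: it is installed locally)
if (requireNamespace("tocr", quietly = TRUE)) {
  with_toc <- tocr::add_toc(md = md_lines)
  # the result is again a character vector, one string per line
  print(with_toc)
}
```

Working on a character vector is handy when the document is already in memory, for example after reading it with readLines().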

Update an existing TOC of an (R) Markdown document

To demonstrate updating an existing TOC, I will use the doctoc README file, which already includes a TOC generated by doctoc. add_toc() recognizes this TOC properly. If you find software-generated Markdown TOCs that aren’t recognized by add_toc(), please file an issue.

Now let’s use the doctoc README as it appeared live at the end of May 2018¹, replace/update its TOC with a different one generated by add_toc() and display the first 41 lines:

# TODO: Fix me!

"https://raw.githubusercontent.com/thlorenz/doctoc/1d386261972d35c6bcd187d0a00e666f9d893d8d/README.md" |>
  tocr::add_toc(position = 33L,
                backlink_strings = c("\U1F446", "\U1F447"),
                listing_style = "*") |>
  magrittr::extract(1:41) |>
  pal::cat_lines()

This removes the existing TOC and writes a new one which differs in the following characteristics:

The new TOC is placed at line 33 (line numbers counted as in the original document).
The new TOC uses *, not -, as the symbol to list the TOC entries (the two are semantically the same).
There are backlinks to the new TOC before all headers included in the TOC for easier navigation. The backlink text consists of a \U1F446 or \U1F447 symbol depending on the header’s position. If it is below the new TOC, the first symbol gets used, otherwise the second one.
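
The choice between the two backlink symbols can be sketched in base R as follows. This is an illustrative re-implementation of the rule described above, not tocr's actual code; the helper name pick_backlink() and the example line numbers are hypothetical.

```r
# The two symbols passed as backlink_strings in the example above
backlink_strings <- c("\U1F446", "\U1F447")

# Hypothetical helper: pick the first symbol for headers below the TOC,
# the second one for headers above it
pick_backlink <- function(header_line, toc_line, backlink_strings) {
  ifelse(header_line > toc_line, backlink_strings[1L], backlink_strings[2L])
}

# A header on line 1 sits above a TOC on line 33, a header on line 50 below it
pick_backlink(header_line = c(1L, 50L), toc_line = 33L, backlink_strings)
```

The header above the TOC gets the second (downward-pointing) symbol and the header below it gets the first (upward-pointing) one, so each backlink points towards the TOC.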

Remove a TOC from an (R) Markdown document

Let’s use the doctoc README file from before and remove its TOC:

# TODO: Fix me!
tocr::remove_toc("https://raw.githubusercontent.com/thlorenz/doctoc/1d386261972d35c6bcd187d0a00e666f9d893d8d/README.md") |>
  pal::cat_lines()

Optionally, you can specify an old_toc_id to remove leftover backlinks.

Note that remove_toc() is just a convenience wrapper function around add_toc(..., position = "none").
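
This equivalence can be checked with a small sketch. The sample document is made up, the calls are guarded on tocr being installed, and passing a character vector to remove_toc() is an assumption based on it wrapping add_toc().

```r
# A made-up Markdown document, one string per line
md_lines <- c("# Title", "", "## First section", "", "## Second section")

if (requireNamespace("tocr", quietly = TRUE)) {
  with_toc <- tocr::add_toc(md = md_lines)

  # Both calls are expected to strip the TOC again
  removed_a <- tocr::remove_toc(with_toc)
  removed_b <- tocr::add_toc(md = with_toc, position = "none")

  # Compare the two results
  print(identical(removed_a, removed_b))
}
```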

Remove backlinks only

To just remove the backlinks from a Markdown document, set the parameter add_backlinks = FALSE and leave all the other parameters untouched (except for a manually set position line number which might have to be adapted to factor in the TOC lines). For example:

"https://raw.githubusercontent.com/tidyverse/purrr/d0a808186e820fb637affe4d92cef2c7bf3cf6bd/README.md" |>
  # add TOC _and_ backlinks 
  tocr::add_toc(position = "below") |>
  # only add TOC, remove possibly existing backlinks
  tocr::add_toc(position = "below",
                add_backlinks = FALSE) |>
  head(n = 34L)
#>  [1] ""                                                                                                                                     
#>  [2] "<!-- README.md is generated from README.Rmd. Please edit that file -->"                                                               
#>  [3] ""                                                                                                                                     
#>  [4] "# purrr <img src=\"man/figures/logo.png\" align=\"right\" />"                                                                         
#>  [5] ""                                                                                                                                     
#>  [6] "[![CRAN\\_Status\\_Badge](http://www.r-pkg.org/badges/version/purrr)](http://cran.r-project.org/package=purrr)"                       
#>  [7] "[![Build"                                                                                                                             
#>  [8] "Status](https://travis-ci.org/tidyverse/purrr.svg?branch=master)](https://travis-ci.org/tidyverse/purrr)"                             
#>  [9] "[![Coverage"                                                                                                                          
#> [10] "Status](https://img.shields.io/codecov/c/github/tidyverse/purrr/master.svg)](https://codecov.io/github/tidyverse/purrr?branch=master)"
#> [11] ""                                                                                                                                     
#> [12] "<!-- TOC BEGIN -- leave this comment untouched to allow auto update -->"                                                              
#> [13] ""                                                                                                                                     
#> [14] "## Table of contents"                                                                                                                 
#> [15] ""                                                                                                                                     
#> [16] "- [Overview](#overview)"                                                                                                              
#> [17] "- [Installation](#installation)"                                                                                                      
#> [18] "- [Usage](#usage)"                                                                                                                    
#> [19] "- [Code of conduct](#code-of-conduct)"                                                                                                
#> [20] ""                                                                                                                                     
#> [21] "<!-- TOC END -- leave this comment untouched to allow auto update -->"                                                                
#> [22] ""                                                                                                                                     
#> [23] "## Overview"                                                                                                                          
#> [24] ""                                                                                                                                     
#> [25] "purrr enhances R’s functional programming (FP) toolkit by providing a"                                                                
#> [26] "complete and consistent set of tools for working with functions and"                                                                  
#> [27] "vectors. If you’ve never heard of FP before, the best place to start is"                                                              
#> [28] "the family of `map()` functions which allow you to replace many for"                                                                  
#> [29] "loops with code that is both more succinct and easier to read. The best"                                                              
#> [30] "place to learn about the `map()` functions is the [iteration"                                                                         
#> [31] "chapter](http://r4ds.had.co.nz/iteration.html) in R for data science."                                                                
#> [32] ""                                                                                                                                     
#> [33] "## Installation"                                                                                                                      
#> [34] ""

Remove leftover backlinks to custom HTML anchor/`<id>` attribute

Consider you have added a TOC to a Markdown document using a non-header style title_tier (like "bold") and a strange toc_id like "navigation_centre". Now just feeding that document to add_toc() or remove_toc() will … not remove? remove?

Particular TOC features

In the following, the effects of setting specific parameters of the add_toc() function are explained in more detail.

TOC minimum and maximum tiers to include

The parameters min_tier and max_tier define which of the possible header tiers <h1> – <h6> are considered in the TOC. By default, <h1> is not considered because it usually serves as the document title, which doesn’t belong in a TOC. If your Markdown document deviates from this convention, you may want to set min_tier by hand. The Fira Code README.md, for example, doesn’t include any <h1> headers²; instead, the title is formatted as <h2>. Therefore we set min_tier = 3 to obtain a reasonable TOC:

tocr::add_toc(md = "https://raw.githubusercontent.com/tonsky/FiraCode/30862e05b00f41c9179a9424e382755a5ef954f0/README.md",
              min_tier = 3L) |>
  magrittr::extract(1:25) |>
  pal::cat_lines()
#> ## Fira Code: monospaced font with programming ligatures
#> 
#> <!-- TOC BEGIN -- leave this comment untouched to allow auto update -->
#> 
#> ### Table of contents
#> 
#> - [Problem](#-problem)
#> - [Solution](#-solution)
#>     - [Download v1.205 · How to install · Troubleshooting · News & updates](#-download-v1205--how-to-install--troubleshooting--news--updates)
#> - [Code examples](#-code-examples)
#> - [Terminal support](#-terminal-support)
#> - [Editor support](#-editor-support)
#> - [Browser support](#-browser-support)
#> - [Projects using Fira Code](#-projects-using-fira-code)
#> - [Alternatives](#-alternatives)
#> - [Credits](#-credits)
#> 
#> <!-- TOC END -- leave this comment untouched to allow auto update -->
#> 
#> <img src="http://s.tonsky.me/imgs/fira_code_logo.svg">
#> 
#> ### [↑](#table-of-contents) Problem
#> 
#> Programmers use a lot of symbols, often encoded with several characters. For the human brain, sequences like `->`, `<=` or `:=` are single logical tokens, even if they take two or three characters on the screen. Your eye spends a non-zero amount of energy to scan, parse and join multiple characters into a single logical one. Ideally, all programming languages should be designed with full-fledged Unicode symbols for operators, but that’s not the case yet.

In the same way, you can restrict the maximum tier to be included in the TOC. If we don’t want to list anything below tier <h3>, for example, we set max_tier = 3:

tocr::add_toc(md = "https://raw.githubusercontent.com/tonsky/FiraCode/30862e05b00f41c9179a9424e382755a5ef954f0/README.md",
              max_tier = 3L) |>
  magrittr::extract(1:25) |>
  pal::cat_lines()
#> ## [↓](#table-of-contents) Fira Code: monospaced font with programming ligatures
#> 
#> <!-- TOC BEGIN -- leave this comment untouched to allow auto update -->
#> 
#> ## Table of contents
#> 
#> - [Fira Code: monospaced font with programming ligatures](#-fira-code-monospaced-font-with-programming-ligatures)
#>     - [Problem](#-problem)
#>     - [Solution](#-solution)
#>     - [Code examples](#-code-examples)
#>     - [Terminal support](#-terminal-support)
#>     - [Editor support](#-editor-support)
#>     - [Browser support](#-browser-support)
#>     - [Projects using Fira Code](#-projects-using-fira-code)
#>     - [Alternatives](#-alternatives)
#>     - [Credits](#-credits)
#> 
#> <!-- TOC END -- leave this comment untouched to allow auto update -->
#> 
#> <img src="http://s.tonsky.me/imgs/fira_code_logo.svg">
#> 
#> ### [↑](#table-of-contents) Problem
#> 
#> Programmers use a lot of symbols, often encoded with several characters. For the human brain, sequences like `->`, `<=` or `:=` are single logical tokens, even if they take two or three characters on the screen. Your eye spends a non-zero amount of energy to scan, parse and join multiple characters into a single logical one. Ideally, all programming languages should be designed with full-fledged Unicode symbols for operators, but that’s not the case yet.
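
The effect of the two tier limits can also be sketched in base R, independently of tocr. This is a simplified illustration, not the package's implementation: it only looks at ATX-style headers (leading # characters), ignores code blocks, and the helper names as well as the default max_tier = 6 are assumptions.

```r
# A made-up Markdown document, one string per line
md_lines <- c("# Title", "", "## Setup", "### Details", "#### Fine print", "Some text")

# Hypothetical helper: tier of each line (number of leading '#'), NA for non-headers
header_tier <- function(lines) {
  is_header <- grepl("^#{1,6} ", lines)
  ifelse(is_header, nchar(sub("^(#{1,6}) .*$", "\\1", lines)), NA_integer_)
}

# Hypothetical helper: keep only the headers whose tier lies within the limits
select_headers <- function(lines, min_tier = 2L, max_tier = 6L) {
  tier <- header_tier(lines)
  lines[!is.na(tier) & tier >= min_tier & tier <= max_tier]
}

select_headers(md_lines, min_tier = 3L)  # "### Details" "#### Fine print"
select_headers(md_lines, max_tier = 3L)  # "## Setup" "### Details"
```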

TOC positioning

Where to place the TOC, including its surrounding TOC BEGIN and TOC END comments, is defined by the position argument. It can be one of:

"top" to place the TOC at the very beginning of the document, i.e. line 1.
"bottom" to place the TOC at the very end of the document.
"above" to place the TOC above the lines between the uppermost header of tier <= min_tier and the next header above (if any).
"below" to place the TOC below the lines between the uppermost header of tier <= min_tier and the next header above (if any), i.e. right above the uppermost header of tier <= min_tier.
"none" to only remove an existing TOC.
A line number as a positive integer to insert the TOC right above this line.
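
For the three simplest cases ("top", "bottom" and an explicit line number), the insertion logic can be sketched in base R as below. This is an illustration under assumptions, not tocr's code: the helper insert_block() is hypothetical, and the header-dependent options "above" and "below" as well as "none" are deliberately left out.

```r
# Hypothetical helper: insert `block` into `lines` according to `position`
insert_block <- function(lines, block, position) {
  after <- if (identical(position, "top")) {
    0L                          # before line 1
  } else if (identical(position, "bottom")) {
    length(lines)               # after the last line
  } else {
    as.integer(position) - 1L   # right above the given line number
  }
  append(lines, block, after = after)
}

doc <- c("a", "b", "c")
insert_block(doc, "TOC", "top")     # "TOC" "a" "b" "c"
insert_block(doc, "TOC", "bottom")  # "a" "b" "c" "TOC"
insert_block(doc, "TOC", 2L)        # "a" "TOC" "b" "c"
```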

Difference between `above` and `below`

…

"https://raw.githubusercontent.com/mkearney/rtweet/ff5ac6cf48f63f0685beda6d5fed03388c51b7f2/README.md"
#> [1] "https://raw.githubusercontent.com/mkearney/rtweet/ff5ac6cf48f63f0685beda6d5fed03388c51b7f2/README.md"

TOC `listing_style`

There are three possible ways to list the TOC entries, and the parameter listing_style defines which one is used. The options are:

An unordered Markdown list. Set listing_style to "-" or "*" for the respective listing symbol.
An ordered Markdown list. Set listing_style = "ordered" for this.
Just plain text indented by non-breaking spaces (&nbsp;) according to the hierarchy of the headers. Set listing_style = "indented" for this.
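
To make the three styles concrete, here is a rough base R sketch that renders the same entries in each of them. It is not tocr's implementation: the helper render_toc() is hypothetical, links are omitted, and the four spaces (or four non-breaking spaces) per level and the use of "1." for every ordered item are assumptions.

```r
# Hypothetical helper: render TOC entries in one of the three listing styles
render_toc <- function(titles, tiers, listing_style = "-") {
  depth <- tiers - min(tiers)  # nesting depth relative to the top-most tier
  if (listing_style %in% c("-", "*")) {
    paste0(strrep("    ", depth), listing_style, " ", titles)
  } else if (listing_style == "ordered") {
    # Markdown renderers renumber ordered items, so "1." is enough for a sketch
    paste0(strrep("    ", depth), "1. ", titles)
  } else if (listing_style == "indented") {
    paste0(strrep("&nbsp;", 4L * depth), titles)
  } else {
    stop("Unknown listing_style: ", listing_style)
  }
}

titles <- c("Installation", "Building from source", "Usage")
tiers <- c(2L, 3L, 2L)

writeLines(render_toc(titles, tiers, "*"))
writeLines(render_toc(titles, tiers, "ordered"))
writeLines(render_toc(titles, tiers, "indented"))
```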

While the first two options seem aesthetically superior, the last one is particularly useful in two situations:

Consider a Markdown document with numbered headers like # 1. Introduction.
Consider a Markdown document with a strange header hierarchy…

Notice: add_toc() checks whether the first TOC entry is of the lowest tier included in the TOC and, if not, automatically sets listing_style = "indented" to avoid a broken Markdown list (when this happens, a warning message is printed).
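
The check can be pictured roughly like this in base R. The sketch reads "lowest tier" as the top-most header level in the TOC, i.e. the smallest tier number; that reading, the helper name resolve_listing_style() and the warning text are all assumptions rather than tocr's actual behavior.

```r
# Hypothetical helper: fall back to "indented" if the first entry is nested
# deeper than some later entry, since a Markdown list cannot start indented
resolve_listing_style <- function(tiers, listing_style = "-") {
  if (tiers[1L] > min(tiers)) {
    warning("First TOC entry is not of the top-most tier; using listing_style = \"indented\".")
    return("indented")
  }
  listing_style
}

# Headers in document order: <h2>, <h3>, <h2> keeps the requested style
resolve_listing_style(c(2L, 3L, 2L), "-")

# Headers in document order: <h3>, <h2>, <h3> triggers the fallback and a warning
resolve_listing_style(c(3L, 2L, 3L), "-")
```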

TOC `markdown_flavor`

…