Add a fresh TOC to an (R) Markdown document
Adding a table of contents (TOC) to the pdftools README.md, for example, is as simple as:
tocr::add_toc(md = "https://raw.githubusercontent.com/ropensci/pdftools/e7248d9956c7e73968628fa3a8ed37f0a8c23b37/README.md") |> head(n = 15L)
#> [1] "# pdftools" ""
#> [3] "<!-- TOC BEGIN -- leave this comment untouched to allow auto update -->" ""
#> [5] "## Table of contents" ""
#> [7] "- [Introduction](#-introduction)" "- [Installation](#-installation)"
#> [9] " - [Building from source](#-building-from-source)" "- [Getting started](#-getting-started)"
#> [11] "- [Bonus feature: rendering pdf](#-bonus-feature-rendering-pdf)" "- [Limitations](#-limitations)"
#> [13] "" "<!-- TOC END -- leave this comment untouched to allow auto update -->"
#> [15] ""
The md parameter accepts a single file path, a single URL, or a character vector (one string per line).
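As a minimal sketch of the character-vector input, the following passes a small in-memory document to add_toc(). The document content is made up for illustration, and the call is guarded so the snippet also runs when tocr is not installed.

```r
# A made-up Markdown document, one string per line
md_lines <- c("# Demo document",
              "",
              "## Introduction",
              "",
              "Some introductory text.",
              "",
              "## Usage",
              "",
              "Some usage notes.")

# Only call add_toc() if tocr is available
if (requireNamespace("tocr", quietly = TRUE)) {
  md_lines |>
    tocr::add_toc() |>
    writeLines()
}
```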
Update an existing TOC of an (R) Markdown document
To demonstrate updating an existing TOC, I will use the doctoc README file, which already includes a TOC generated by doctoc. add_toc() recognizes this TOC properly. If you find software-generated Markdown TOCs that aren’t recognized by add_toc(), please file an issue.
Now let’s take the doctoc README as it appeared at the end of May 2018, replace its TOC with a different one generated by add_toc(), and display the first 41 lines:
# TODO: Fix me!
"https://raw.githubusercontent.com/thlorenz/doctoc/1d386261972d35c6bcd187d0a00e666f9d893d8d/README.md" |>
  tocr::add_toc(position = 33L,
                backlink_strings = c("\U1F446", "\U1F447"),
                listing_style = "*") |>
  magrittr::extract(1:41) |>
  pal::cat_lines()
This removes the existing TOC and writes a new one which differs in the following characteristics:
- The new TOC is placed at line 33 (line numbers counted as in the original document).
- The new TOC uses * instead of - as the symbol to list the TOC entries (the two are semantically the same).
- There are backlinks to the new TOC before all headers included in the TOC for easier navigation. The backlink text consists of a \U1F446 or \U1F447 symbol, depending on the header’s position: if the header is below the new TOC, the first symbol is used, otherwise the second one.
Remove a TOC from an (R) Markdown document
Let’s use the doctoc README file from before and remove its TOC:
# TODO: Fix me!
tocr::remove_toc("https://raw.githubusercontent.com/thlorenz/doctoc/1d386261972d35c6bcd187d0a00e666f9d893d8d/README.md") |>
  pal::cat_lines()
Optionally, you can specify an old_toc_id to remove leftover backlinks.
Note that remove_toc() is just a convenience wrapper function around add_toc(..., position = "none").
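The wrapper relationship can be checked with a short sketch like the one below. The sample document is made up, the call is guarded in case tocr is not installed, and the comparison result is deliberately not stated here since it depends on the installed version.

```r
# A made-up Markdown document, one string per line
md_lines <- c("# Demo document",
              "",
              "## Introduction",
              "",
              "Some text.",
              "",
              "## Usage",
              "",
              "More text.")

if (requireNamespace("tocr", quietly = TRUE)) {
  with_toc <- tocr::add_toc(md = md_lines)

  # Both calls are expected to strip the TOC again
  via_wrapper <- tocr::remove_toc(with_toc)
  via_add_toc <- tocr::add_toc(with_toc, position = "none")

  print(identical(via_wrapper, via_add_toc))
}
```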
Remove backlinks only
To remove only the backlinks from a Markdown document, set the parameter add_backlinks = FALSE and leave all other parameters untouched (except for a manually set position line number, which might have to be adapted to account for the TOC lines).
For example:
"https://raw.githubusercontent.com/tidyverse/purrr/d0a808186e820fb637affe4d92cef2c7bf3cf6bd/README.md" |>
  # add TOC _and_ backlinks
  tocr::add_toc(position = "below") |>
  # only add TOC, remove possibly existing backlinks
  tocr::add_toc(position = "below",
                add_backlinks = FALSE) |>
  head(n = 34L)
#> [1] ""
#> [2] "<!-- README.md is generated from README.Rmd. Please edit that file -->"
#> [3] ""
#> [4] "# purrr <img src=\"man/figures/logo.png\" align=\"right\" />"
#> [5] ""
#> [6] "[![CRAN\\_Status\\_Badge](http://www.r-pkg.org/badges/version/purrr)](http://cran.r-project.org/package=purrr)"
#> [7] "[![Build"
#> [8] "Status](https://travis-ci.org/tidyverse/purrr.svg?branch=master)](https://travis-ci.org/tidyverse/purrr)"
#> [9] "[![Coverage"
#> [10] "Status](https://img.shields.io/codecov/c/github/tidyverse/purrr/master.svg)](https://codecov.io/github/tidyverse/purrr?branch=master)"
#> [11] ""
#> [12] "<!-- TOC BEGIN -- leave this comment untouched to allow auto update -->"
#> [13] ""
#> [14] "## Table of contents"
#> [15] ""
#> [16] "- [Overview](#overview)"
#> [17] "- [Installation](#installation)"
#> [18] "- [Usage](#usage)"
#> [19] "- [Code of conduct](#code-of-conduct)"
#> [20] ""
#> [21] "<!-- TOC END -- leave this comment untouched to allow auto update -->"
#> [22] ""
#> [23] "## Overview"
#> [24] ""
#> [25] "purrr enhances R’s functional programming (FP) toolkit by providing a"
#> [26] "complete and consistent set of tools for working with functions and"
#> [27] "vectors. If you’ve never heard of FP before, the best place to start is"
#> [28] "the family of `map()` functions which allow you to replace many for"
#> [29] "loops with code that is both more succinct and easier to read. The best"
#> [30] "place to learn about the `map()` functions is the [iteration"
#> [31] "chapter](http://r4ds.had.co.nz/iteration.html) in R for data science."
#> [32] ""
#> [33] "## Installation"
#> [34] ""
Remove leftover backlinks to custom HTML anchor/<id> attribute
Suppose you have added a TOC to a Markdown document using a non-header style title_tier (like "bold") and a strange toc_id like "navigation_centre". Now just feeding that document to add_toc() or remove_toc() will … not remove? remove?
Particular TOC features
In the following, the effects of setting specific parameters of the add_toc() function are explained in more detail.
TOC minimum and maximum tiers to include
The min_tier and max_tier parameters define which of the possible header tiers <h1>–<h6> are considered in the TOC. By default, <h1> is not considered because it usually serves as the title of the document, and it wouldn’t make sense to include that in a TOC. But if your Markdown document deviates from this premise, you might want to set min_tier by hand. The Fira Code README.md, for example, doesn’t include any <h1> headers; instead, the title is formatted as <h2>. Therefore we want to set min_tier = 3 to obtain a reasonable TOC:
tocr::add_toc(md = "https://raw.githubusercontent.com/tonsky/FiraCode/30862e05b00f41c9179a9424e382755a5ef954f0/README.md",
              min_tier = 3L) |>
  magrittr::extract(1:25) |>
  pal::cat_lines()
#> ## Fira Code: monospaced font with programming ligatures
#>
#> <!-- TOC BEGIN -- leave this comment untouched to allow auto update -->
#>
#> ### Table of contents
#>
#> - [Problem](#-problem)
#> - [Solution](#-solution)
#> - [Download v1.205 · How to install · Troubleshooting · News & updates](#-download-v1205--how-to-install--troubleshooting--news--updates)
#> - [Code examples](#-code-examples)
#> - [Terminal support](#-terminal-support)
#> - [Editor support](#-editor-support)
#> - [Browser support](#-browser-support)
#> - [Projects using Fira Code](#-projects-using-fira-code)
#> - [Alternatives](#-alternatives)
#> - [Credits](#-credits)
#>
#> <!-- TOC END -- leave this comment untouched to allow auto update -->
#>
#> <img src="http://s.tonsky.me/imgs/fira_code_logo.svg">
#>
#> ### [↑](#table-of-contents) Problem
#>
#> Programmers use a lot of symbols, often encoded with several characters. For the human brain, sequences like `->`, `<=` or `:=` are single logical tokens, even if they take two or three characters on the screen. Your eye spends a non-zero amount of energy to scan, parse and join multiple characters into a single logical one. Ideally, all programming languages should be designed with full-fledged Unicode symbols for operators, but that’s not the case yet.
In the same way, you can restrict the maximum tier to be included in the TOC. If we don’t want to mention anything below tier <h3>, for example, we set max_tier = 3:
tocr::add_toc(md = "https://raw.githubusercontent.com/tonsky/FiraCode/30862e05b00f41c9179a9424e382755a5ef954f0/README.md",
              max_tier = 3L) |>
  magrittr::extract(1:25) |>
  pal::cat_lines()
#> ## [↓](#table-of-contents) Fira Code: monospaced font with programming ligatures
#>
#> <!-- TOC BEGIN -- leave this comment untouched to allow auto update -->
#>
#> ## Table of contents
#>
#> - [Fira Code: monospaced font with programming ligatures](#-fira-code-monospaced-font-with-programming-ligatures)
#> - [Problem](#-problem)
#> - [Solution](#-solution)
#> - [Code examples](#-code-examples)
#> - [Terminal support](#-terminal-support)
#> - [Editor support](#-editor-support)
#> - [Browser support](#-browser-support)
#> - [Projects using Fira Code](#-projects-using-fira-code)
#> - [Alternatives](#-alternatives)
#> - [Credits](#-credits)
#>
#> <!-- TOC END -- leave this comment untouched to allow auto update -->
#>
#> <img src="http://s.tonsky.me/imgs/fira_code_logo.svg">
#>
#> ### [↑](#table-of-contents) Problem
#>
#> Programmers use a lot of symbols, often encoded with several characters. For the human brain, sequences like `->`, `<=` or `:=` are single logical tokens, even if they take two or three characters on the screen. Your eye spends a non-zero amount of energy to scan, parse and join multiple characters into a single logical one. Ideally, all programming languages should be designed with full-fledged Unicode symbols for operators, but that’s not the case yet.
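To make the effect of the two tier bounds more concrete, here is a base R sketch of the kind of filtering they describe. It is not tocr's actual implementation: it only looks at ATX-style headers (lines starting with one to six # characters followed by a space) and ignores things like fenced code blocks. The sample lines and the variable names is_header, tier and in_toc are made up for illustration.

```r
# A made-up document outline, one string per line
md_lines <- c("## Title", "", "### Problem", "", "#### Details", "", "### Solution")

# Header tier = number of leading "#" characters of an ATX header line
is_header <- grepl("^#{1,6} ", md_lines)
tier <- ifelse(is_header, nchar(sub("^(#+).*$", "\\1", md_lines)), NA_integer_)

# Keep only headers whose tier lies within [min_tier, max_tier]
min_tier <- 3L
max_tier <- 3L
in_toc <- is_header & tier >= min_tier & tier <= max_tier

md_lines[in_toc]
#> [1] "### Problem"  "### Solution"
```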
TOC positioning
Where to place the TOC, including its surrounding TOC BEGIN and TOC END comments, is defined by the position argument. It can be either:
- "top" to place the TOC at the very beginning of the document, i.e. line 1.
- "bottom" to place the TOC at the very end of the document.
- "above" to place the TOC above the lines between the uppermost header of tier <= min_tier and the next header above (if any).
- "below" to place the TOC below the lines between the uppermost header of tier <= min_tier and the next header above (if any), i.e. right above the uppermost header of tier <= min_tier.
- "none" to only remove an existing TOC.
- A line number as a positive integer to insert the TOC right above this line.
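The numeric variant can be pictured with base R's append(). The sketch below is only an illustration of "insert right above this line", not tocr's code. The document, the TOC lines and the shortened TOC BEGIN/TOC END comments are placeholders.

```r
# Placeholder document and TOC, one string per line
doc <- c("# Title", "", "Intro text.", "", "## First section")
toc <- c("<!-- TOC BEGIN -->",
         "- [First section](#first-section)",
         "<!-- TOC END -->")

# Insert the TOC right above line `position` of the original document
position <- 5L
result <- append(doc, toc, after = position - 1L)

writeLines(result)
```

After the insertion, the TOC starts at line 5 of the result and the former line 5 ("## First section") has moved down by the length of the TOC.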
TOC listing_style
There are three possible ways to list the TOC entries, and the listing_style parameter defines which one is used. The options are:
- An unordered Markdown list. Set listing_style to - or * for the respective listing symbols.
- An ordered Markdown list. Set listing_style = "ordered" for this.
- Just plain text indented by non-breaking spaces (&nbsp;) according to the hierarchy of the headers. Set listing_style = "indented" for this.
While the first two options seem aesthetically superior, the last one is particularly useful in two situations:
- Consider a Markdown document with numbered headers like # 1. Introduction.
- Consider a Markdown document with a strange header hierarchy…
Notice: add_toc() checks if the first TOC entry is of the lowest tier included in the TOC, and if not, automatically sets listing_style = "indented" to avoid a broken Markdown list (when this happens, a warning message will be printed about it).
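As a rough illustration of the "indented" idea, the base R sketch below prefixes each entry with non-breaking spaces according to its tier. The titles and tiers are made up, and the choice of two non-breaking spaces per level is an assumption for this example, not necessarily what add_toc() emits.

```r
# Made-up TOC entries and their header tiers
titles <- c("Introduction", "Installation", "Building from source", "Usage")
tiers  <- c(2L, 2L, 3L, 2L)

# Assumption: two non-breaking spaces (U+00A0) per level below the top tier
indent  <- strrep("\u00a0", 2L * (tiers - min(tiers)))
entries <- paste0(indent, titles)

writeLines(entries)
```

Because the indentation is plain text rather than list syntax, it does not depend on the entries forming a well-nested Markdown list.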