Future-proof LaTeX

LaTeX Forever

Specifications for future-proof LaTeX

LaTeX is a markup language, meaning that in addition to the actual content, the source contains 'tags' which provide information about the document contents. If the markup is clear enough, then even a computer can understand the structure of the document, and the source can be converted to many different formats, including formats that do not yet exist. In other words, the document is 'future proof'.

The requirements for future-proof LaTeX will seem less overwhelming if you understand this principle:

A computer program is used to interpret your LaTeX document, but the real audience is a person. Don't assume that the program is latex and don't assume the person has perfect eyesight and is reading a PDF.

A useful thought experiment is to imagine that the programs tex and latex have been lost. How hard would it be for someone to write a program to turn your source file into an attractively formatted web page?

Or, imagine someone wants to write a program to "read" your paper to a blind person. Could that program be fairly simple and mostly just give appropriate sounds and pronunciation based on the markup?

The goal of Future-proof LaTeX/LaTeX Forever is to answer 'yes' to those questions.

We use "LaTeX" to refer to the markup language, and "latex" for the computer program currently used to convert LaTeX into a form suitable for printing on paper. It is LaTeX that can serve as the basis for a document that continues to exist into the far future and remains readable using the latest technology. LaTeX can be useful long after latex is obsolete.

The requirements fall into the following categories:

  1. The structure of a LaTeX document
  2. Packages and settings
  3. Macros
  4. Title, author, and other front matter
  5. Sections and subsections
  6. Numbering
  7. References and citations
  8. Fonts, symbols, and characters
  9. Paragraphs, lines, and comments
  10. Theorems, definitions, proofs, etc
  11. Math mode
  12. Images, figures, and graphics
  13. Lists: enumerate and itemize
  14. Bibliography and file management
  15. Boxes, skips, and spaces
  16. To do

The structure of a LaTeX document

The audience for these specifications is people who already write books and papers in LaTeX, and now want to do it better. We summarize some of the main ideas behind LaTeX, but we assume the reader is already a regular user.

LaTeX documents have the following structure:

     \documentclass{type_of_document}

     [packages to use]
     [settings]
     [macros]

     \begin{document}
     [title and author information]
     [abstract]

     [the body of the document]

     [instructions for creating the bibliography]

     \end{document}

We address each component separately.

Packages and settings

By package we mean any file that contains macros. This could be the style file for a journal, a commonly used package that extends the functionality of LaTeX, or your own set of macros.

  1. Use whatever packages you wish, but don't assume that those packages are used by the program that compiles your LaTeX into a human-readable form. In particular, packages which only affect the layout are likely to be discarded, because those layout options may not even make sense in the final format. For example, web pages do not have page numbers, and a format designed for someone with a visual disability will not make use of vertical or horizontal spacing, multiple columns, or font size.

  2. Global settings, such as \graphicspath and \usepackage, should occur before the start of the document content.

  3. Do not use if/then or other conditionals.

    If you really need multiple versions of your document, then define a macro in two different ways, and comment out one of those definitions. You then have the option of commenting out the other definition.

    If the purpose of the if/then was to support both PDF and some other output format, then that need will be addressed by the method used to compile the document.

    If the purpose was to make both a student and an instructor version, then that is easily handled by the alternate definitions method described above.
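
The alternate-definitions method in item 3 can be sketched as follows. This is only an illustration: the macro name \solution and its formatting are hypothetical, not part of any standard package.

     % Instructor version: the solution is shown.
     \newcommand{\solution}[1]{\emph{Solution.} #1}

     % Student version: the solution is hidden.
     % To switch versions, comment out the definition above
     % and uncomment the one below.
     % \newcommand{\solution}[1]{}

With a definition like this, the text inside \solution{...} appears only in the instructor version, and the source contains no conditionals.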

Macros

The ability to define macros is one of the great features of LaTeX, but there are good macros and bad macros. Consider this:

      \transpose{A}

It is clear, to anyone who knows the basics of linear algebra, that this macro is meant to describe the transpose of the matrix A.

Is the transpose of A denoted by a 't' on the left, or on the right of A? Is it a lower or upper case T? The beauty of LaTeX is that we don't need to know the answer: different authors can agree on the source markup without having to agree on the definition of the macro. A good macro indicates meaning, with no need to actually see the definition of the macro.

Here is a bad macro:

      \be

What does that mean? Could it mean \begin{equation}? Or maybe it means \beta? Or maybe \bold{e}? There are many things it could mean, and that is bad, because the macro has been used to hide useful information. A slight convenience to the author has obscured the meaning of the document source.

Here is a good macro:

      \adjoint{A}

You know exactly what is meant by \adjoint{A}. But if you only saw A^*, you would not be sure of its meaning.

A bad macro is one where you gain information when you expand to its definition. A good macro is one where expanding to its definition loses information. Try to use only good macros, where the macro name makes its meaning obvious.
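
As a sketch of why the definition can stay hidden, here are two possible definitions of \transpose and one of \adjoint. The particular notations are assumptions chosen for illustration; the source markup is the same whichever one an author or publisher picks.

      % One convention: superscript T on the right.
      \newcommand{\transpose}[1]{{#1}^{T}}

      % Another convention: superscript t on the left.
      % \newcommand{\transpose}[1]{{}^{t}{#1}}

      \newcommand{\adjoint}[1]{{#1}^{*}}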

  1. Define good macros that help explain the structure and the mathematical content of the document.

    Don't define macros merely to save typing.

    Explanation: If you are in the habit of using a bad macro like \be for \begin{equation}, then just replace the macro by its definition after you complete each draft of the paper. If you are using a good text editor, that will take a total of 5 seconds.

  2. Macros must be defined before the start of the document content.

    Never change catcodes or mathcodes in the body of the document.

    Do not use \makeatletter in your macro definitions. (It is okay if that is in an external package you use.)

    Explanation: If the definition of a macro can change in the middle of the document, then it is much more difficult to write a program that converts the LaTeX source to a more human-readable form. Remember not to assume that it is tex/latex which is doing the conversion.

    If you know what catcodes are, then you should understand why changing them makes it difficult to write a simple interpreter for a LaTeX document. If you don't already know what they are, then you don't need to know.

  3. Macro names must only involve letters. In particular, do not define the following macros: \<, \>, \~, \[, \]

  4. Do not create macros which hide the fundamental structure of the document. In particular, do not create macros which expand to \begin{equation}, \begin{itemize}, \section, \subsection, or similar.

  5. Material in math mode should be visibly in math mode. In particular, do not use \ensuremath in your macros.

  6. Use \newcommand or \DeclareMathOperator to define macros, and \newtheorem to define environments for theorems/definitions/remarks/etc.

  7. Do not use \def, \let, \newenvironment, \providecommand, or \renewcommand.

    Explanation: LaTeX-style macro definitions, using \newcommand, allow an author to define everything they need.

    Redefining a standard command, or a command in a package you are using, is asking for trouble. Redefining your own command is sloppy: comment out the previous version.

  8. Do not redefine the standard macros used to indicate diacritical marks or special symbols, namely a backslash followed by any of:

          ', ", `, -, =, ^, ~, H, c, u, v, l, L, i, o, O, S, SS, P

    Do not redefine standard LaTeX macros, such as backslash followed by: (, ), [, ].

  9. Use a good editing program so that none of these requirements are burdensome.

    Explanation: There is nothing wrong with using \be for \begin{equation} in the drafts of your papers, if that is your habit. You can replace \be, and your other bad macros, when you are ready to make the paper public. If you use a good editing program, it is only a few seconds work to make that substitution throughout your document.
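
A preamble that follows items 1 through 7 might look like the sketch below. The macro names \rank and \innerproduct are hypothetical examples, and \DeclareMathOperator assumes the amsmath package is loaded.

      \documentclass{article}
      \usepackage{amsmath}

      % Semantic macros, defined once, before the document content.
      \DeclareMathOperator{\rank}{rank}
      \newcommand{\innerproduct}[2]{\langle #1, #2 \rangle}

      \begin{document}

      The matrix $A$ satisfies $\rank A = 2$, and
      $\innerproduct{u}{v} = 0$ for the vectors $u$ and $v$.

      \end{document}

Both macros are used inside explicit math delimiters, so nothing in the body depends on \ensuremath.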

Title, author, and other front matter

Use the \title, \author, and similar fields as shown below. Be sure to include the \orcid information for all authors, so that each author can be unambiguously identified. Only put the appropriate information in each field: do not put footnotes on the title or author name.

  1. The title and author information go immediately after the \begin{document}.

    Insert the title as

           \title{The title goes here}

    or as

           \title[Short title here]{Full title here}

    Do not put footnotes, font or size commands, or anything else inside the \title: only the exact literal title.

  2. After the title, put each author like this:

           \author{Full name of an author}
           \orcid{The Orcid ID of that author}
           \affiliation{Affiliation of that author}
           \thanks{Thanks from that author}

    The \affiliation, \orcid, and \thanks are optional and can appear in any order (but before the next \author{}). The \orcid is strongly encouraged because it is (at present) the only reliable way to unambiguously identify an author. Go to orcid.org to sign up for your free Orcid ID. You may need to define the \orcid macro.

    If you want a \thanks{} that applies to all the authors, put it after the title and before the first author.

    Do not put footnotes or anything else but the exact literal information inside those fields.

    There is no need to supply an \email if there is an \orcid. (Your \orcid also knows your current \affiliation, but it is traditional to record the affiliation when the paper was written.)

  3. Having an abstract is strongly encouraged. Wrap the abstract in appropriate tags:

           \begin{abstract}
            [Abstract goes here.]
           \end{abstract}

    Do not use \abstract{...} or \beginabstract...\endabstract.
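
Put together, the front matter might look like the sketch below. Several things here are assumptions rather than requirements: the amsart class is chosen only because its \title accepts an optional short title, \orcid and \affiliation are given hypothetical placeholder definitions that print nothing, the ORCID value is a dummy, and \maketitle is added because amsart needs it to typeset the title.

     \documentclass{amsart}

     % Hypothetical placeholders: amsart does not provide these macros.
     % They print nothing, but a converter can still read the fields.
     \newcommand{\orcid}[1]{}
     \newcommand{\affiliation}[1]{}

     \begin{document}

     \title[Short title here]{Full title here}

     \author{Full name of first author}
     \orcid{0000-0000-0000-0000}
     \affiliation{Affiliation of first author}

     \author{Full name of second author}
     \thanks{Thanks from second author}

     \begin{abstract}
       Abstract goes here.
     \end{abstract}

     \maketitle

     \end{document}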

Sections and subsections

  1. Divide the document into sections and subsections, like so:

             \section{Title of a section}
                [words go here]
             \subsection{Title of a subsection}
                [etc, etc]

    If a (sub)section title is particularly long, you can indicate a short version:

              \subsection[Short title]{The long complete title}

  2. Do not use the "starred" version of the (sub)section command.

    If you do, assume the "star" will be ignored and that the (sub)section will be numbered.

  3. If your document is a book, use \chapter{...} to break it into chapters. Every chapter should be divided into \sections.

    You can choose to use \part{...} as a coarser division than \chapter, but assume that the \parts will have little or no effect on the output format.

  4. If your document has appendix(es), just use

          \appendix

    followed by the sections (or chapters) of the appendixes, using the same \section or \chapter commands as the main content.
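
Put together, the body of a paper with one appendix could be organized as in this sketch. The section titles are placeholders.

     \section{Introduction}
        [words go here]
     \subsection[Background]{Background and a summary of earlier work}
        [etc, etc]
     \section{Main results}
        [etc, etc]

     \appendix
     \section{Tables of computed values}
        [etc, etc]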

Numbering

[[explain why most numbering options are illogical, and that the author should not assume that the published version of the document has the same numbering as the original.

Do not use \setcounter or [list other things to not use] ]]

References and citations

[[explain why it is better to use \label{} coupled with \ref{}, \eqref{}, and \cite{}.

Put \href{} in this section?

A shortcoming of TeX/LaTeX is that one cannot use \ref{} in all cases. The system should have been designed so that \ref{} acts differently depending on whether its target is an equation, a bibliographic entry, or something else.

Use only letters, numbers, dash (-) and underscore (_) in labels. In particular, do not use any other punctuation symbols, or spaces. In a purely LaTeX world, it is natural to use a colon (:) in a label, as in \label{eqn:ec_3} or \label{thm:whatever}. But in other contexts, the colon indicates a "name space". Just use a - or _ instead of the :. ]]
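
A minimal sketch of the labeling convention described above, assuming the amsmath package for \eqref. The label names and the citation key are hypothetical.

     \begin{equation}
       \label{eqn-pythagoras}
       a^2 + b^2 = c^2
     \end{equation}

     Equation \eqref{eqn-pythagoras} is used in the proof of
     Theorem \ref{thm-main_result}; see also \cite{euclid-elements}.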

Fonts, symbols, and characters

  1. Do not use TeX-style font switches: {\it ...}, {\bf ...}, {\sl ...}, etc.

    Don't use \pmb.

    Use \emph{emphasize these words} when emphasizing. Do not use \textit for emphasis. Use \term{new terminology} when defining a word. Except for \emph{} for emphasis and \term{} for words being defined, you should not need any other font-like markup in the body of the document. (Note that you need to define the \term macro, or use the XXXX package.)

    Do you want emphasized words to be bold? Then redefine the \emph{...} macro. Don't use \textbf{...}, except in the definition of a macro.

    In the unlikely event that you need to switch fonts, use LaTeX-style font directives: \textit{...}, \textbf{...}, etc. Note: do not use these in the bibliography -- see separate biblio entry.

    Explanation: In traditional typesetting, font changes are used to indicate emphasis, foreign words, terminology, published works, names of ships, and various other elements. Of these, emphasis and terminology are the most common in math papers. The source markup should indicate the reason for the font change. Thus, \emph{} for emphasis, \term{} (or \terminology{}) for terminology. Note that \term{} is not a standard LaTeX macro, so you can define it, or use the XXXX package. If you need the name of a ship in your document, then define a \shipname{} macro.

  2. Do not invent new symbols by overlapping characters or using \joinrel: only use actual characters that have a standard encoding.

    Explanation: Someone who can't actually see the symbol will not know how to interpret it. A screen reader may not be able to provide a sensible pronunciation. More than 100,000 Unicode characters have been defined, so probably one of them is close to what you want.
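
Since \term is not a standard macro, here is one way it could be defined and used. The bold formatting is an assumption; any definition works as long as the body uses \term for terminology and \emph for emphasis.

     % In the preamble:
     \newcommand{\term}[1]{\textbf{#1}}

     % In the body:
     A group is \term{abelian} if its operation is commutative.
     This is \emph{not} the same as being cyclic.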

Paragraphs, lines, and comments

  1. Leave a blank line to indicate a paragraph break. Never use \par.

    Never use \\ to create a newline in text: only use \\ to indicate the end of a line in a table or multiline equation.

  2. Use

            % this is a comment

    to leave a comment, but assume that private comments will be visible to the reader, and important comments will be ignored.

  3. Don't make a comment that contains no content. In particular, do not put a % by itself on a line.

    Explanation: A % at the end of a line, or on a line by itself, is occasionally helpful in macro definitions. But in the body of the document there are better ways to improve the readability of the source markup.

Theorems, definitions, proofs, etc

This is simple: mark up a theorem, definition, or proof like this:

     \begin{xxxx}
      Content of the theorem/definition/proof goes here.
     \end{xxxx}

Here 'xxxx' is whichever of theorem, definition, proof, etc you are writing. If you need to give a title, then put the title in square brackets like so:

     \begin{theorem}[The fundamental theorem of algebra]
      If $f \in \R[x]$ is nonconstant, then $f$ has a root in $\C$.
     \end{theorem}

Don't use abbreviations: 'thm' for 'theorem', 'ex' for 'example', etc. The small amount of time saved is not worth the chance of being misunderstood.
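
The theorem example above relies on declarations in the preamble. A possible set is sketched here, assuming the amsthm and amssymb packages. The proof environment comes from amsthm, and the definitions of \R and \C are assumptions about what those macros are meant to produce.

     \usepackage{amsthm}
     \usepackage{amssymb}

     % Environment names are spelled out in full, not abbreviated.
     \newtheorem{theorem}{Theorem}
     \newtheorem{definition}{Definition}

     % The real and complex numbers.
     \newcommand{\R}{\mathbb{R}}
     \newcommand{\C}{\mathbb{C}}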

Math mode

  1. For inline math, you can use the official LaTeX delimiters \(, \), or the old-fashioned dollar signs.

    For display math, neither the original LaTeX \[, \] nor the archaic TeX double-dollar signs behave appropriately in all cases. It is best to be explicit and use

         \begin{equation}
           [formula goes here]
         \end{equation}

    for simple one-line display math. Use gather or align for multi-line displays. [[list more options \alignat]]

    Don't use the "starred" version of these environments. It is perfectly fine to number every equation, and all those numbers are not annoying once you get used to them. It is extremely annoying when someone wants to refer to a formula in your paper and it does not have a number. (If you do use \begin{equation*}, assume that the publisher will choose to ignore the "*".)

  2. Put the \begin{equation}, \end{equation}, and similar tags each on a line by itself, as illustrated above. This makes the source easier to read and edit.

  3. Text in math mode should not contain math.

    Correct:

          \begin{equation}
              f(x) = 3 \text{ if } x > 5
          \end{equation}

    Incorrect:

          \begin{equation}
              f(x) = 3 \text{ if $x > 5$}
          \end{equation}

  4. Use semantic macros: the meaning of the macro should be clear from the name of the macro.

    Explanation: If you define \ba,...\bz to mean \mathbf{a},...\mathbf{z}, respectively, then you are needlessly obscuring the meaning of the symbols.

    Are those vectors? Then use \vec{a},..., where you can define \vec to be \mathbf. If those are ideals or fields or something else, then define a short macro that indicates the type of object it is.
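
For the multi-line displays mentioned in item 1, here is a sketch using align from the amsmath package. Each tag is on its own line, \\ marks the end of a line within the display, and & marks the alignment point.

          \begin{align}
            (a + b)^2 &= (a + b)(a + b) \\
                      &= a^2 + 2ab + b^2
          \end{align}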

Images, figures, and graphics

[[need to write this section. how many different good ways are there to include graphics and captions?]]

Lists: enumerate and itemize

[[need to write this section. enumerate, itemize, \li

What can go in a list? ]]

Bibliography and file management

[[need to write this section.

bibtex, right?

There is nothing wrong with having one big file containing the LaTeX source of your paper. But, especially if the paper is long, it can be helpful to break it into smaller files. (Book authors, see separate advice about books.)

Use one main file that contains the header, title, author, and abstract. Create a separate file for each \section of the paper, and \input each section file in the main file. Note: \input, not \include.

Give a meaningful name to each section file; ideally it should be the same as the \label of that section.

File extensions must be lower case, and file names should only use letters, numbers, dash (-), and underscore (_). In particular, never put spaces in file names. ]]
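
A sketch of the main-file layout described above. The file names are hypothetical, and each \input line assumes a file such as introduction.tex in the same directory as the main file.

     \documentclass{article}

     \begin{document}

     \title{The title goes here}
     \author{Full name of an author}

     \begin{abstract}
       Abstract goes here.
     \end{abstract}

     % One file per \section, named after the section.
     \input{introduction}
     \input{main-results}
     \input{proof-of-main-theorem}

     \end{document}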

Boxes, skips, and spaces

Commands such as \minipage, \vskip, \mbox, etc are good for micromanaging the appearance of a PDF designed for printing. Those directives do not translate to other media and should not be in the author's version of the document. You should assume that these types of commands will be ignored (or deleted) when the document is processed.

  1. Do not use any of these in the document body: \minipage, \vbox, \hbox, \noindent,[[list several more]] Those may have a worthwhile place inside the definition of a macro, but they have no place in the document body [[say more/differently]].

  2. [[ put some more examples ]]

To do

A supplement for book authors: license, ISBN, edition, etc. [[ If you are writing a book, particularly if you are writing a textbook, you should consider using the PreTeXt authoring system. ]]

side-by-side

subtitles (only needed for a book?)

verbatim (inline and block)