Monday, February 17, 2014

Numbering in Papers and Books

I am preparing to release a Python program that converts math papers and books into HTML. A key design decision is that I am not attempting to replicate everything in the PDF version. In particular, I think it is time to stop repeating common errors. One error I want to avoid is unhelpful numbering. I'd be interested in hearing comments on my numbering proposal.

There is a long list of possible numbering schemes, and a short list of reasonable numbering schemes. For example:
  1. Number every equation with a single integer, sequentially throughout the entire multi-section document.
    *Bad* It is hard to find the equation you are looking for.
  2. Number everything in the format section.subsection.number, where every environment has its own separate numbering.
    *Bad* If Theorem 3.2.4 is followed by a Lemma, that should be Lemma 3.2.5, not Lemma 3.2.whatever-the-next-lemma-number-is.
  3. Number everything in the format section.subsection.number, where you group environments in a sensible way and have only a handful of sequential numberings.
    *Good*  For example Lemma 3.2.4, Theorem 3.2.5, Corollary 3.2.6.
At present, my program uses the 3-level numbering of Item 3 above, where books have chapter.section.environment_number and papers have section.subsection.environment_number.
I only keep track of the following global numbers:
  • chapter
  • section
  • subsection
  • subsubsection
  • equation
  • theorem-like (i.e., lemma, proposition, remark, conjecture, example,...)
  • figure (a "figure" is anything with a caption.)
So, except for the "sections", there are only 3 other sequential numbers throughout the document.
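
As an illustration of the scheme above, here is a minimal Python sketch of the bookkeeping for a paper. It is not the code of the program described in this post: the class name `Numbering`, the mapping `COUNTER_FOR`, and the choice to restart the three shared counters in each subsection are all assumptions made for the example.

```python
# Hypothetical mapping from environment name to the counter it shares.
COUNTER_FOR = {
    "equation": "equation",
    "theorem": "theorem-like",
    "lemma": "theorem-like",
    "corollary": "theorem-like",
    "remark": "theorem-like",
    "figure": "figure",
    "table": "figure",  # anything with a caption counts as a "figure"
}


class Numbering:
    """Builds labels of the form section.subsection.environment_number."""

    def __init__(self):
        self.section = 0
        self.subsection = 0
        self._reset_counters()

    def _reset_counters(self):
        # Assumption: the shared counters restart in each subsection.
        self.counters = {"equation": 0, "theorem-like": 0, "figure": 0}

    def start_section(self):
        self.section += 1
        self.subsection = 0
        self._reset_counters()

    def start_subsection(self):
        self.subsection += 1
        self._reset_counters()

    def label(self, env):
        # Advance the counter shared by this environment's group.
        key = COUNTER_FOR[env]
        self.counters[key] += 1
        return f"{self.section}.{self.subsection}.{self.counters[key]}"


n = Numbering()
n.start_section()
n.start_subsection()
print("Lemma", n.label("lemma"))        # Lemma 1.1.1
print("Theorem", n.label("theorem"))    # Theorem 1.1.2
print("Equation", n.label("equation"))  # Equation 1.1.1
```

Because lemmas and theorems share one counter, the theorem that follows Lemma 1.1.1 is Theorem 1.1.2, while equations are counted separately.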

In addition, there are "local" numbers, as in a list or the exercises at the end of a section. Exercises sprinkled throughout a section would use the theorem-like number.

Question 1. Is anything missing from this numbering scheme, or does it have unwanted side effects?

Question 2. If a paper has sections but no subsections, then it makes sense to use a 2-level instead of a 3-level numbering. But what if some sections have subsections, and others don't? Do you mix 2-level and 3-level numbering?
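
To make the two options in Question 2 concrete, here is a small Python sketch. The function `format_label` and its `uniform` flag are hypothetical, and padding a missing subsection with 0 is only one possible way to keep every label at three levels; it is shown for comparison, not as a recommendation.

```python
def format_label(section, subsection, number, uniform=True):
    """Build a label; subsection is None when the section has no subsections."""
    if subsection is None:
        if uniform:
            # Keep three levels everywhere by padding with 0 (an assumption).
            return f"{section}.0.{number}"
        # Mixed scheme: drop the missing level.
        return f"{section}.{number}"
    return f"{section}.{subsection}.{number}"


# Section 2 has subsections, Section 3 does not.
print(format_label(2, 1, 4))                  # 2.1.4
print(format_label(3, None, 4))               # 3.0.4
print(format_label(3, None, 4, uniform=False))  # 3.4
```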

Question 3. What choices, if any, should be left to the author?

Comments welcome.
David Farmer
American Institute of Mathematics

[Post your responses in the comment section below.]


  1. I use the same system as David, for the same reasons, except that I number by section and not by subsection, as either some sections do not have subsections, or the subsections are not so long, or both.

    I still have not decided if I prefer to use the same counter for equations as for theorem/lemma/remark environments, which is slightly different from what David suggests.

  2. I number exercises in sequence with theorems and lemmas. But a number of people have told me that they don't like that, that they are used to separately-numbered exercises.

    --Jim Hefferon

  3. It seems to me that numbering is really irrelevant. The only purpose is the ease of finding the referred-to item within the textbook, and with hyperlinks we really don't need that. Given the situation, the name of the link, if given at all, might have some relevance to the referred-to item. I've been working on some material presented as a wiki, and every theorem is given a tag describing its content.

    Michael Doob

    1. I agree with you. Publishing in html lets hyperlinks replace reference numbers.

      I think what David is describing is a situation where an author wants to maintain parallel implementations of a text--one in html, one in print. In that case, the numbering is probably something to preserve (but add hyperlinks in the on-line edition).

      I'd like to see authors commit more to html-only texts, but the printed text is not dying a quick death. To really take advantage of what html has to offer, you have to leave the printed text behind.

  4. I would prefer 2-level numbering (either chapter.environment_number or section.environment_number), especially in an application such as yours. The ability to find the item is less important when you can have hyperlinks (or popup boxes). The simpler the numbering, the easier it is to recall what the number refers to. "As we proved in theorem 2, we see..." vs "As we proved in theorem, we completely forgot what that was."

    If a paper has three main results, I think it makes perfect sense to call those results 1, 2, and 3. Of course this would be inappropriate a lot of the time. I don't necessarily argue for leaving these choices up to the author, but providing flexibility is nice.

  5. In my linear algebra textbook, nothing was numbered (as Michael has suggested). Instead, items have short titles, with acronyms (initialisms) of at most 5 letters. People don't like it. But my students and I find the popular ones easy to remember. I consider it one of my experiments that was not a success. Maybe there is a middle ground for HTML - numbers and titles, with tooltips that have the titles.

    Like David, I would really like to know what folks prefer. Some random thoughts follow. I have not really decided much myself on this.

    1. LaTeX "floats" figures and tables, which sometimes is very annoying. They could float far enough to be "out of order" in a scheme that numbers everything together.

    2. We scan the left of the page for theorem numbers (etc), and if equation numbers are on the right side, we would scan there.

    3. If you revise a book by adding/removing/consolidating an equation, do you want all your theorems renumbered?

    4. Suppose you break up a book into one subsection per web page, which could make sense. Which subsection of Chapter 2 do you open up to find Theorem 2.26 or Theorem 2.5.17 (versus Theorem I'd suggest a principle for HTML: the granularity of the numbering should meet or exceed the granularity of the web page decomposition. I agree that four-deep starts to look ugly.

    5. The one CRC Handbook Series book I have at home right now goes part.chapter.section, but then goes part.chapter.number for theorems, examples, remarks, tables all together with no attention to where sections break. These handbooks are an extreme situation, but the one I have goes no more than three-deep on numbering.

    6. The ability to revise an electronic edition frequently makes exercise numbers problematic. Add (not always appropriate to add at the end), or delete, and your master list of assigned homework problems breaks.
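
    The principle in point 4 above (numbering at least as fine as the web page decomposition) can be sketched in Python. The helper `page_for` is hypothetical, and it assumes that a label's last component is the item number and that the components before it name the chapter, section, and so on.

```python
def page_for(label, page_depth):
    """Return the page that holds the item, or None if the label is too coarse.

    page_depth is the number of levels used to split the text into web pages
    (1 = one page per chapter, 2 = one page per section or subsection, ...).
    """
    parts = label.split(".")
    location = parts[:-1]  # everything except the trailing item number
    if len(location) < page_depth:
        return None  # the label does not say which page to open
    return ".".join(location[:page_depth])


print(page_for("2.5.17", 2))  # 2.5
print(page_for("2.26", 2))    # None
print(page_for("2.26", 1))    # 2
```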

    1. I think Rob's point 4 is important. If paper1 refers to "Theorem XX" in paper2, it should be easy to find the appropriate page in the HTML version of paper2. If paper2 is a long survey article where each subsection is its own web page, I don't see how to avoid 3 levels in the numbering. Same goes for the print version.

      Even in the unlikely case that the authors of paper2 provided a memorable name for Theorem XX, it is still necessary to automatically generate a label which can be used to refer to the theorem in a way that helps people find it.

      In the future all authors will be sensible and we will have a seamless way to visualize and navigate the vast web of mathematical knowledge. Until then, I don't see how to avoid using a 2- or 3-level automatic numbering scheme that prevents authors from imposing quirky solutions that cause difficulties in certain use cases.

  6. These decisions should be made with the reader in mind. What does the reader get out of a numbering scheme? Just two things come to my mind.

    0. Help in discovering where else to look in the document for certain details. This is better handled by hyperlinks.

    1. A naming device that helps the reader understand the author. As in "Now it follows from Lemma 2.45 that ...". The use of numbers to name things is not very helpful: Lemma 2.45 probably has nothing to do with the number 2.45, so the name conveys nothing except a location. It is better to say "By the Prime Number Theorem...". It is better to give suggestive names rather than numbers. One could include hyperlinks at the same time. One could also make an index of named assertions, which would be useful for readers who come to the work to refer to a particular assertion.

    It also seems to me that numbering schemes tend to clutter up the exposition. Some of this is a consequence of Leslie Lamport's decision that makes assigning numbers the default in LaTeX. Some things are only used locally (various lemmas and claims) and need only a local scheme to keep them straight. Isn't it better to only number those things that will be referred to elsewhere in the text?

    1. With an electronic-only text, hyperlinks are best, along with knowls. But it does not feel to me like everybody is ready to abandon paper in favor of a tablet or e-reader, and paper (or PDF even) imposes a linear ordering. That includes Promotion and Tenure committees (unfortunately). Thus the appeal of a linear (or hierarchical) numbering scheme.

      I strongly agree with George that "Lemma 2.45" is no help to anybody. But an automatic way of inserting a title of a theorem into running text is cumbersome, while an entirely manual way runs the risk of references not staying in sync with the source. My use of acronyms in FCLA was an attempt to find a middle ground.

      If an author never references a Lemma, it does not preclude somebody else from referencing it. So it seems to me every item needs some sort of handle to grab onto, with a stable, unambiguous and informative URL being the minimum.

      As I write, I am finding it useful to give a title to everything: theorems, lemmas, definitions, examples, perhaps even exercises. Rather than having an index, this makes it possible to provide useful and informative lists at the "back of the book". See for an example, there are several more at the "bottom of the book" in the sidebar.

      I plan to experiment some more with adding titles into tooltips for the HTML version of references as another type of compromise. You can see these in action in FCLA if you hover over most (but not all) of the links or knowls.