Monday, February 24, 2014

A write once, read anywhere document preparation system.

I've been intrigued for some time by the possibility of converting source documents into a variety of output formats.  This began in 2006 when I used tex4ht to convert the $\LaTeX$ source for my linear algebra textbook (FCLA) into HTML.  Around 2009 I began to experiment with using this conversion to create worksheets viewable in the Sage notebook.  The idea was to have Sage code in the source, and have the code be "live" in the notebook - editable and executable.  It was a real challenge to begin with $\LaTeX$ and convert to the Sage notebook format (mildly perturbed HTML, rendered with jsMath) so that the text, $\LaTeX$ and Sage all survived intact.  With support from the UTMOST grant I was able to get this working for FCLA and Judson's abstract algebra text.

Fast-forward to May 2013.  Now we have MathJax as the very worthy successor to jsMath.  Jason Grout and his students have created the Sage Cell server, allowing an easy embedding of live Sage code in a web page.  Harald Schilly and David Farmer have created knowls, making cross-references and detail-hiding much more efficient than hyperlinks.  The previous year I had gained a lot of hard-won experience converting the source for FCLA to a one-off version of XML.  This made many more things possible, with the powerful text transformations available through XSL processing.  The result is the current web version of FCLA, in addition to a print-on-demand hardcover version.

With support from a Shuttleworth Flash Grant, I have started to build a general XML application for creating textbooks and other scholarly documents (research papers, monographs, etc.).  The main goals are to deliver on the promise of separating presentation from content, and to be as simple as possible for authors to quickly get effective results.  Most mathematicians will tell you that $\LaTeX$ is all about separating content from presentation, but if you spend enough time trying to parse it programmatically you discover it is full of inconsistencies and hidden/implied structure.  And that is before authors start using the abundance of add-on packages.  Quick: what does "\chapter" do?  Answer below.

Unlike DocBook, there is extensive support for mathematics: displayed and numbered equations, plus the additional structure of definitions, theorems, claims, remarks, and all the cross-referencing we expect.  Authors enter shortcut $\LaTeX$ macros for commonly used mathematical constructs just once, and they are used for both MathJax and $\LaTeX$ output.  HTML output can embed the Sage Cell server and GeoGebra applets, with CodeMirror, JSXGraph, and video all planned or partially implemented.
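To make the "enter macros once" idea concrete, here is a minimal Python sketch.  It is not how MathBook XML does it; the macro table, the function names, and the choice of a MathJax v2-style `TeX.Macros` configuration are all assumptions made for illustration.  It takes one table of macro definitions and emits both a $\LaTeX$ preamble and a MathJax configuration fragment.

```python
import json

# Hypothetical macro table: name -> (number of arguments, LaTeX body).
# In a real system these would be read from the author's source document.
MACROS = {
    "RR": (0, r"\mathbb{R}"),
    "norm": (1, r"\left\lVert #1 \right\rVert"),
}


def latex_preamble(macros):
    """Return one \\newcommand line per macro, for the LaTeX output."""
    lines = []
    for name, (nargs, body) in macros.items():
        args = f"[{nargs}]" if nargs else ""
        lines.append(f"\\newcommand{{\\{name}}}{args}{{{body}}}")
    return "\n".join(lines)


def mathjax_macros(macros):
    """Return a JSON fragment in the shape of a MathJax v2 TeX.Macros table.

    Macros without arguments map to a string; macros with arguments map
    to a [body, argument-count] pair.
    """
    table = {}
    for name, (nargs, body) in macros.items():
        table[name] = [body, nargs] if nargs else body
    return json.dumps({"TeX": {"Macros": table}}, indent=2)


if __name__ == "__main__":
    print(latex_preamble(MACROS))
    print(mathjax_macros(MACROS))
```

The point of the design is that the author edits only the one table, and each output format gets its own generated rendering of it.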

My intent is to very carefully design the XML elements.  They will be limited to expressing document structure and semantics, while preserving $\LaTeX$ markup for mathematics proper (only).  This will not be the ugly verbose XML that is created by programs.  I am writing XSLT converters to $\LaTeX$ and HTML as demonstrations; others may want to write other converters.  I have written, but not yet released, converters to Sage notebooks, SageMathCloud worksheets, and IPython notebooks.  I am experimenting with an extension for letters that includes images for letterhead and scanned signatures for PDF versions (since the $\LaTeX$ letter class drives me crazy).  I have used these tools for an article that will appear in the Monthly, for a monograph I am writing on combinatorial designs, and for lecture notes (a book, really) for my advanced linear algebra class this term.  It is very liberating to forget about $\LaTeX$, HTML, CSS, JavaScript, and just concentrate on writing content, knowing your tools will produce something useful and appealing on the other end, with a single command-line invocation.
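The following Python sketch shows the general idea of one structured source feeding two output formats.  It is only an illustration: the real converters are written in XSLT, and the element names used here (`section`, `title`, `p`, `m` for inline mathematics) are assumptions chosen for the example, not a statement of the actual vocabulary.  It also skips escaping of special characters in the $\LaTeX$ output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source fragment: structure in XML, mathematics in LaTeX.
SOURCE = """<section>
  <title>Vector Spaces</title>
  <p>Every basis of <m>V</m> has the same size.</p>
</section>"""


def inline(elem, math_open, math_close, text_filter):
    """Flatten mixed content, wrapping each <m> child in math delimiters."""
    parts = [text_filter(elem.text or "")]
    for child in elem:
        if child.tag == "m":
            parts.append(math_open + text_filter(child.text or "") + math_close)
        # Text that follows the child element lives in its .tail attribute.
        parts.append(text_filter(child.tail or ""))
    return "".join(parts)


def to_latex(section):
    """Render the section as LaTeX (no escaping of special characters)."""
    out = [f"\\section{{{section.findtext('title')}}}"]
    for p in section.findall("p"):
        out.append(inline(p, "$", "$", lambda s: s))
    return "\n\n".join(out)


def to_html(section):
    """Render the section as HTML, with \\( \\) delimiters for MathJax."""
    out = [f"<h2>{escape(section.findtext('title'))}</h2>"]
    for p in section.findall("p"):
        out.append("<p>" + inline(p, r"\(", r"\)", escape) + "</p>")
    return "\n".join(out)


root = ET.fromstring(SOURCE)

if __name__ == "__main__":
    print(to_latex(root))
    print(to_html(root))
```

Because the source records only structure and leaves the mathematics as $\LaTeX$, each converter is free to decide how a title or a piece of inline mathematics should look in its own format.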

The link below is an example of the latest iteration, with plenty of rough edges in the content and presentation.  David Farmer gets credit for driving the creation of useful and general HTML markup, which will be useful beyond just this project.  A UPS student, Michael DuBois, designed the CSS, both visually and functionally, which accompanies the HTML.

So this is an invitation to become involved, as a user or by making suggestions, or just by keeping an eye on developments.  I do not yet have everything consolidated on a single website, but plan to soon and will announce it on the Google Group, and perhaps here.  In the meantime...

If you are interested, please join the discussions at the Google Group:!forum/mathbook-xml-support

Code is on GitHub (everything is on the dev branch):

Rob Beezer
University of Puget Sound

(Answer: yes, \chapter usually gets you a chapter heading.  Unless it comes sometime after \backmatter, in which case it gives you an appendix heading.)


  1. How does this compare with GAPdoc?

  2. It looks identical in philosophy. But also tightly bound to GAP it seems, both in purpose and in the processing tools.

    All you need to use MathBook XML is a text editor and the command-line tool, xsltproc, which seems ubiquitous on Macs and Linux. I've found some xsltproc binaries for Windows, but don't have any report yet of anybody using them. So you don't need much at all to get started.

    I'll have to spend more time looking at GAPDoc. It seems to have good support for BibTeX, which I have not yet dealt with. Thanks for pointing it out; I was not aware of it.

  3. Rob, Interesting article. The final two links do not work for me; I just get a blank page with the url in the field. (Firefox under Ubuntu)

    Jim Hefferon

  4. Weird. Not sure what's going on with those links, but until I figure it out, you can right-click on the link and select "Open in a new tab" (or similar) and it seems to work.