In this essay I want to argue that the open source math book
community should move en masse to GitHub. As a proof of
concept, I’ve taken the liberty of creating a GitHub “organization”
at https://open-math-book.github.io/
I think that GitHub can help us achieve many of the goals that have
been expressed by other authors in recent blog posts here. Jim
Hefferon has suggested that we have something akin to CTAN -- GitHub
would let us achieve that without incurring any expenses (actual
dollars for paying for servers, or time spent developing hosting
software and maintaining the site). David Farmer has suggested that
the community should adopt a standard for “meaningful math markup”
and in a separate post he suggested developing a standard for the
numbering of theorems, lemmas, et cetera -- such standards, along with
software tools for helping authors make use of them, could become
community-endorsed projects on GitHub.
I’d like to issue two disclaimers. One is that, even though I keep
beating the GitHub drum, I have no financial interest in the company.
The other is that I am still a Git neophyte. I decided to host my
book, A Gentle Introduction to the Art of Mathematics, at
GitHub last autumn. I had made the decision to offer multiple
versions of the book and keeping everything organized seemed like a
daunting task. I had been introduced to Git and GitHub at an MAA
Prep workshop on WeBWorK back in July, and it seemed to fit the bill
for my needs. I’ve found that using a Git repository for the
source code of my project simplifies my life tremendously.
A little background information:
There are two similarly named entities under discussion, Git and
GitHub. Git is a revision control system written by Linus Torvalds.
If you’ve used RCS or CVS or SVN, you’ll have the general idea.
The system allows a group of people to work simultaneously on the
code for some software project and it automatically integrates the
changes. Very rarely, there can be a so-called “collision” (a merge
conflict, in Git’s terminology) in which incompatible changes are
made to the same lines within a source file, but for the most part
the merging of the various contributors’ work gets done automagically.
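To make the idea of automatic merging concrete, here is a toy sketch in Python. It is not how Git works internally (Git's real merge also copes with inserted and deleted lines); the function name `merge_lines` and the sample text are invented for illustration, and the sketch assumes that contributors only edit existing lines in place.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge of equal-length lists of lines.

    Returns (merged, conflicts), where conflicts holds the indices of
    lines that both contributors changed in different ways.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy merge only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (unchanged, or the same edit)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # incompatible edits to the same line
            conflicts.append(i)
            merged.append(b)  # keep the base text until a human decides
    return merged, conflicts


# Hypothetical book source: a common starting point and two contributors.
base = [
    "Theorem 1. The sum of two even integers is even.",
    "Proof. Left to the reader.",
    "Exercise 1. Prove Theorem 1 in detail.",
]
alice = list(base)
alice[1] = "Proof. Write the integers as 2a and 2b; their sum is 2(a+b)."
bob = list(base)
bob[2] = "Exercise 1. Show that the sum of two odd integers is even."

merged, conflicts = merge_lines(base, alice, bob)
print("\n".join(merged))
print("conflicting lines:", conflicts)
```

Here the two contributors touched different lines, so both edits survive and the list of conflicts is empty. Had both rewritten the proof line in different ways, its index would be reported for a human to resolve.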
GitHub is a hosting service. It is operated by a company whose
business model includes offering their service for free to
open-source projects. This seems appropriate as the backbone of
their system is the (open-source) Git software. But GitHub does much
more than merely host source code repositories: a user or
organization can have blogs and wikis, and there is a nice system for
tracking bugs and issues.
Many well-regarded projects are currently hosted at GitHub:
- The community-driven taco repository https://github.com/sinker/tacofancy (okay, so this one may not be that well-regarded, but it is delicious!)
Pros:
It is free for open-source projects.
No need to “reinvent the wheel.”
Each Git repository that someone downloads is a full backup of the
project. Thus the data is very reliably backed up in a distributed
sense -- a very different scenario from what would have happened a
short time ago had the Linux box in the back of my office gone
pffftt!
Having lots of projects available from a central source will give
prospective authors a boost by providing examples of best practice.
Cons:
What if the company goes belly up? (I deem this rather unlikely.)
Only viable for open source projects -- some notable math books are
free but not open.
A full revision control system may be overkill during the early
stages of a project.
Issues:
“Authorship” may become a somewhat nebulous term. The normal
model in open source programming may not be quite right when it comes
to books, and academics need to be cautious about getting appropriate
credit for their work when it comes to promotion and tenure (P&T).
A balance would need to be maintained between the open, inclusive,
“big tent” approach and the desire to be a bit more restrictive.
Personally, I’d lean towards the experimentalist side of things and
rely on other organizations (e.g. AIM) to provide an imprimatur
indicating the projects that are more fully baked.
To date, I don’t think any open source math book has taken full
advantage of the social coding paradigm. This may be largely due to
the “Authorship” issue above. I’d like to see some truly
collaborative project get going, and I think that GitHub is currently
the best place to do that.
A final word.
Last Spring I dealt with a nightmare scenario when my university
transferred control of its web servers from the IT department to
Public Affairs. My personal site was deleted and it took several
months to get it reinstated. During the interim, I created a Google
site and informed the adopters I knew of about the
situation. I would have been a much happier person if I had been
working in a manner that wasn’t dependent on infrastructure that
wasn’t under my control! GitHub may not be entirely under my
control either, but hosting projects is their core business.
Possibly I’m becoming too cynical, but it seems to me that the core
business of most university websites is appearing attractive to high
school seniors...
Joe Fields
Tacky I know to be first post since I'm the one that posted it, but...
I would love to try this and calculus seems to be the natural choice for a first attempt. I would also like to use Beezer's markup language as the foundation for typesetting the text. We could start by collectively converting Guichard's text to Beezer's markup language and hosting it on GitHub.
Another big requirement for getting something like this going would be a high quality guide for contributors that describes precisely how to contribute a section, an exercise, a figure, an interactive, etc. to the text.
The auditing features of GitHub would help keep track of who contributed what and help people get credit for their work.
I agree with everything said. I think Git has a learning curve that requires practice, at least for me. It would be great to have a high quality guide on how to make a contribution (or even to start your own, and merge other contributions).
Thanks for the post, Joe.
Nice post, Joe. Like David, I agree. I've been learning git for a while now, having used Mercurial for some time. It is a steep learning curve, but no fatal mistakes. Worst thing that happened was I got two copies of everything in one section of a book. And once you get the hang of it, it is rather nice to have several different sections in-progress on different branches before you are ready to make them public. Two experiments that might illustrate some of the social aspects.
Last semester I had three students reporting a lot of typos in FCLA, some induced by my switchover from LaTeX to XML. I trained them to create the edits on GitHub, which they did easily. I did not have to go hunting all over to find where to make an edit; I just applied the changes they'd created from GitHub. One student I trusted so much on little items that I sometimes did not even review her edits before incorporating them.
Another student did an independent study with me, working a pile of exercises from a graduate-level monograph. He kept his work in a shared Git repository, and then I would just pull from it the day before our weekly meeting.
I have three books going on GitHub: FCLA, and then a "Second Course in Linear Algebra" that I have been working hard on this semester, plus a shorter "Explorations in Algebraic Graph Theory with Sage" that is a project with Chris Godsil. (I'll add them to the wiki you've started.) Judson's Abstract Algebra has been in a Mercurial repository for almost 4 years now, hosted on BitBucket (it is easy to move from Mercurial to git, and to restart on GitHub, I did that with FCLA). Judson has an ODE book started which may be public soon.
I welcome corrections, suggestions and contributions to FCLA (see the text "changelog" file with roughly 400 changes credited to others). But ultimately, I am the author, so authorship has not been a "big tent" sort of thing. Though I would say git (or something similar) would be mandatory once you have two or more authors.
Backups: I have copies of my git repositories on a server at the university, so it is comforting to push to both places, in the event my disk goes "pffftt."
There is also Bitbucket (which limits collaborators rather than the number of free projects), and one can also install one's own Git server if anything happens to GitHub. Bitbucket (and perhaps GitHub) also has unlimited academic accounts.
Only because it is topical, today I had my first "social" correction to FCLA from a "stranger" via a pull request on GitHub. In the very first paragraph of the book, no less.
https://github.com/rbeezer/fcla/pull/62
I just came across this and wanted to make a connection to another community: people writing research-level mathematics on github. I and a couple dozen other mathematicians wrote such a book using github last year: here is Andrej's blog post about the experience. It was a great success, and we've received many corrections and suggestions from readers since the initial release. I have no doubt that github will work well for textbooks of all levels as well.