Tuesday, March 11, 2014

Git and social coding for open math books

In this essay I want to argue that the open source math book community should move en masse to GitHub. For proof of concept, I’ve taken the liberty of creating a GitHub “organization” at https://open-math-book.github.io/

I think that GitHub can help us achieve many of the goals that have been expressed by other authors in recent blog posts here. Jim Hefferon has suggested that we have something akin to CTAN -- GitHub would let us achieve that without incurring any expenses (actual dollars for paying for servers, or time spent developing hosting software and maintaining the site). David Farmer has suggested that the community should adopt a standard for “meaningful math markup,” and in a separate post he suggested developing a standard for the numbering of theorems, lemmas, et cetera -- such standards, along with software tools for helping authors make use of them, could become community-endorsed projects on GitHub.

I’d like to issue two disclaimers. One is that, even though I keep beating the GitHub drum, I have no financial interest in the company. The other is that I am still a Git neophyte. I decided to host my book, A Gentle Introduction to the Art of Mathematics, at GitHub last autumn. I had made the decision to offer multiple versions of the book and keeping everything organized seemed like a daunting task. I had been introduced to Git and GitHub at an MAA Prep workshop on WeBWorK back in July, and it seemed to fit the bill for my needs. I’ve found that using a Git repository for the source code of my project simplifies my life tremendously.

A little background information:

There are two similarly named entities under discussion, Git and GitHub. Git is a revision control system written by Linus Torvalds. If you’ve used RCS or CVS or SVN, you’ll have the general idea. The system allows a group of people to work simultaneously on the code for some software project, and it automatically integrates their changes. Very rarely, there can be a so-called “collision” (a merge conflict, in Git’s terminology) in which incompatible changes are made to the same lines within a source file, but for the most part the merging of the various contributors’ work gets done automagically.
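
To make the idea of automatic merging a little more concrete, here is a toy sketch in Python. It is not how Git itself is implemented; it only illustrates the principle of comparing two edited copies against their common ancestor. The function name `three_way_merge` and the sample text are made up for illustration, and the sketch assumes every version has the same number of lines, so it handles only in-place edits.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, line by line.

    Toy model (an assumption of this sketch, not Git's real algorithm):
    all three versions must have the same number of lines, so only
    in-place edits are handled, not insertions or deletions.
    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (or neither touched the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed the same line in different ways
            merged.append(f"CONFLICT: {o!r} vs {t!r}")
            conflicts.append(number)
    return merged, conflicts


# Hypothetical example: three contributors start from the same text.
base = ["Theorem 1.", "Every even number is divisible by 2.", "Proof: exercise."]
alice = ["Theorem 1.1.", "Every even number is divisible by 2.", "Proof: exercise."]
bob = ["Theorem 1.", "Every even integer is divisible by 2.", "Proof: exercise."]
carol = ["Theorem A.", "Every even number is divisible by 2.", "Proof: exercise."]

# Alice and Bob edited different lines, so their work combines cleanly.
merged, conflicts = three_way_merge(base, alice, bob)

# Alice and Carol both rewrote line 1, so a human has to decide.
_, clash = three_way_merge(base, alice, carol)
```

In the first merge, both changes survive and nothing needs human attention. In the second, line 1 is flagged, which is the rare “collision” case described above.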

GitHub is a hosting service. It is operated by a company whose business model includes offering their service for free to open-source projects. This seems appropriate, as the backbone of their system is the (open-source) Git software. But GitHub does much more than merely host source code repositories: a user or organization can have blogs and wikis, and there is a nice system for tracking bugs and issues.

Many well-regarded projects are currently hosted at GitHub:
  1. The community-driven taco repository https://github.com/sinker/tacofancy (okay, so this one may not be that well-regarded, but it is delicious!)

Pros:

It is free for open-source projects.

No need to “reinvent the wheel.”

Each Git repository that someone downloads is a full backup of the project. Thus the data is very reliably backed up in a distributed sense -- a very different scenario from what would have happened a short time ago had the Linux box in the back of my office gone pffftt!

Having lots of projects available from a central source will give prospective authors a boost by providing examples of best practice.

Cons:

What if the company goes belly up? (I deem this rather unlikely.)

Only viable for open source projects -- some notable math books are free but not open.

A full revision control system may be overkill during the early stages of a project.

Issues:

“Authorship” may become a somewhat nebulous term. The normal model in open source programming may not be quite right when it comes to books, and academics need to be cautious about getting appropriate credit for their work when it comes to promotion and tenure (P&T).

A balance would need to be maintained between the open, inclusive, “big tent” approach and the desire to be a bit more restrictive. Personally, I’d lean towards the experimentalist side of things and rely on other organizations (e.g. AIM) to provide an imprimatur indicating the projects that are more fully baked.

To date, I don’t think any open source math book has taken full advantage of the social coding paradigm. This may be largely due to the “Authorship” issue above. I’d like to see some truly collaborative project get going, and I think that GitHub is currently the best place to do that.

A final word.

Last spring I dealt with a nightmare scenario when my university transferred control of its web servers from the IT department to Public Affairs. My personal site was deleted and it took several months to get it reinstated. During the interim, I created a Google site and informed the adopters I knew of about the situation. I would have been a much happier person if I had been working in a manner that wasn’t dependent on infrastructure that wasn’t under my control! GitHub may not be entirely under my control either, but hosting projects is their core business. Possibly I’m becoming too cynical, but it seems to me that the core business of most university websites is appearing attractive to high school seniors...

Joe Fields

8 comments:

  1. Tacky I know to be first post since I'm the one that posted it, but...

    I would love to try this and calculus seems to be the natural choice for a first attempt. I would also like to use Beezer's markup language as the foundation for typesetting the text. We could start by collectively converting Guichard's text to Beezer's markup language and hosting it on github.

    Another big requirement to getting something like this going would be a high quality guide for contributors that describes precisely how to contribute a section, an exercise, a figure, an interactive etc to the text.

    The auditing features of github would help keep track of who contributed what and help people get credit for their work.

  2. I agree with everything said. I think Git has a learning curve that requires practice, at least for me. It would be great to have a high quality guide on how to make a contribution (or even to start your own, and merge other contributions).

    Thanks for the post, Joe.

  3. Nice post, Joe. Like David, I agree. I've been learning git for a while now, having used Mercurial for some time. It is a steep learning curve, but I have made no fatal mistakes. The worst thing that happened was that I got two copies of everything in one section of a book. And once you get the hang of it, it is rather nice to have several different sections in progress on different branches before you are ready to make them public. Here are two experiments that might illustrate some of the social aspects.

    Last semester I had three students reporting a lot of typos in FCLA, some induced by my switchover from LaTeX to XML. I trained them to create the edits on GitHub, which they did easily. I did not have to go hunting all over to find where to make an edit, I just applied the changes they'd created from GitHub. One student I trusted so much on little items that I sometimes did not even review her edits before incorporating them.

    Another student did an independent study with me, working a pile of exercises from a graduate-level monograph. He kept his work in a shared Git repository, and then I would just pull from it the day before our weekly meeting.

    I have three books going on GitHub: FCLA, and then a "Second Course in Linear Algebra" that I have been working hard on this semester, plus a shorter "Explorations in Algebraic Graph Theory with Sage" that is a project with Chris Godsil. (I'll add them to the wiki you've started.) Judson's Abstract Algebra has been in a Mercurial repository for almost 4 years now, hosted on BitBucket (it is easy to move from Mercurial to git, and to restart on GitHub, I did that with FCLA). Judson has an ODE book started which may be public soon.

    I welcome corrections, suggestions and contributions to FCLA (see the text "changelog" file with roughly 400 changes credited to others). But ultimately, I am the author, so authorship has not been a "big tent" sort of thing. Though I would say git (or something similar) would be mandatory once you have two or more authors.

    Backups: I have copies of my git repositories on a server at the university, so it is comforting to push to both places, in the event my disk goes "pffftt."

  4. There is also Bitbucket (which limits the number of collaborators rather than the number of free projects), and one can also install one's own git server if anything happens to GitHub. Bitbucket (and perhaps GitHub) also has unlimited academic accounts.

  5. Only because it is topical, today I had my first "social" correction to FCLA from a "stranger" via a pull request on GitHub. In the very first paragraph of the book, no less.

    https://github.com/rbeezer/fcla/pull/62

  6. I just came across this and wanted to make a connection to another community: people writing research-level mathematics on github. I and a couple dozen other mathematicians wrote such a book using github last year: here is Andrej's blog post about the experience. It was a great success, and we've received many corrections and suggestions from readers since the initial release. I have no doubt that github will work well for textbooks of all levels as well.

  7. This comment has been removed by a blog administrator.

  8. This comment has been removed by the author.
