Announcing htmlbib, a tool for rendering BibTeX files as interactive HTML

For some time now, I’ve been working on an annotated bibliography of articles on various topics in transportation (particularly the history of automatic fare collection from 1960 to the present, as well as the SelTrac train control system and its origins in Germany). I’ve been compiling the information using BibDesk, and I’d like to be able to share it with a wider audience, in the hope that it might be useful to someone.

At a bare minimum, posting the BibTeX file online somewhere would fulfill my desire to get the information out there. But not everyone out there who might benefit from the bibliography uses BibTeX. For many people, I fear a .bib file would be nothing more than unintelligible gibberish; outside of academic circles (and even then, outside of the hard sciences), TeX is not particularly well-known.

The next alternative would be to post the bibliography online as a PDF or HTML file. This alternative is considerably more accessible to non-BibTeX users, but actually makes life harder for people who would like to be able to copy references (as BibTeX source) to use in their own BibTeX files (common practice in communities of TeX users). Merely rendering the entire contents of the file also loses some of the metadata—the comments associated with entries, the groups and keywords, etc.

There are also specialized tools (like bibtex2html) for converting a BibTeX file to HTML. But there, still, the results fall short; the output is mostly static text. I wanted a tool that would make good use of the keywords entered in BibDesk, and which would provide links between publications and authors. I also wanted a tool which would be equally useful for BibTeX users, who would be helped by having access to the BibTeX source for each entry, and non-BibTeX users, who would be helped by having formatted bibliography entries. I therefore set out to build a tool that would meet my needs; the result is htmlbib.

One of the items of concern for me was that the bibliography entries be formatted properly; after having taken care to make sure that the information was added to BibDesk so that it would be rendered well, I did not want to have some generic template used to create HTML for each entry. So, I ended up cobbling together an arrangement that actually uses BibTeX and tex4ht to produce HTML for each entry using the desired BibTeX style (in my case, IEEEtran), so that the entries look the same in the preview as they would in an actual publication. This is slow, but the preview results are cached, so subsequent runs are faster.
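To illustrate the caching idea, here is a minimal sketch in Python. It is not htmlbib's actual code: the function names, the on-disk layout, and the choice to key the cache on a hash of the entry's BibTeX source plus the style name are all assumptions made for the example. The slow BibTeX and tex4ht step is represented by a `render` callable passed in by the caller.

```python
import hashlib
import os


def cache_key(bibtex_source, style):
    """Derive a stable key from an entry's BibTeX source and the style name."""
    digest = hashlib.sha256()
    digest.update(style.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(bibtex_source.encode("utf-8"))
    return digest.hexdigest()


def cached_preview(bibtex_source, style, render, cache_dir):
    """Return HTML for one entry, calling `render` only on a cache miss.

    `render` is a hypothetical stand-in for the slow TeX-based step; it
    takes (bibtex_source, style) and returns an HTML string.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(bibtex_source, style) + ".html")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = render(bibtex_source, style)  # the slow step
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

Keying on the entry's source means an edited entry gets a fresh preview, while untouched entries are served from disk on later runs.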

As for parsing the BibTeX file, since I’m already familiar with scripting BibDesk, I decided to use appscript to call BibDesk from Python. The result is therefore not portable beyond OS X, but it suits my needs. There are BibTeX parsing libraries for Python, so porting to another platform would only require substituting one of those libraries for the calls to BibDesk; the rest is pure Python, with the exception of lxml and the aforementioned preview code, which expects a functioning TeX installation on the system.

The HTML is produced using Jinja2 templates, which for now are stored in the application egg. The default, built-in template is built on Blueprint CSS and jQuery along with jQuery Tools. It wouldn’t be too hard to provide an option for using user-specified templates instead of the built-in template.

I’ve uploaded some sample output to demonstrate what htmlbib does.

Google Apps Script: like AppleScript for the Web

I’ve always been a huge fan of AppleScript for automating tasks in scriptable applications and (more importantly) gluing scriptable applications together. Particularly when working with applications which are designed to take full advantage of AppleScript, like BibDesk, Delicious Library, and XTension, AppleScript makes even complex tasks easy. Unlike macros which are confined to a single application, AppleScript is built on top of Apple Events, making it easy to target any scriptable application, even on a remote Mac over the network. More importantly, AppleScripts aren’t macros; they don’t just play back keyboard and mouse events; you get a real object-oriented view of the data being manipulated. But really good scriptable applications are hard to come by, and of course AppleScript does you no good if you’re using cloud-based applications like Google Docs.

Browser automation tools, like Selenium, and libraries like mechanize help fill the gap somewhat, but they’re far from providing the same rich environment that AppleScript does. To give a concrete example, I was recently working on a spreadsheet listing Twitter accounts for the top 50 transit agencies in the US (more on that project here). In the spreadsheet, I’d listed agencies’ accounts by username (that is, @username). But what I really wanted was a link to each account on Twitter (that is, http://twitter.com/username). I could have entered the links manually, but that would have required needless manual work. If I were using a conventional spreadsheet application on the desktop, I could have used whatever macro or scripting facility it provided, or I could have exported the file to CSV and used sed and awk to get the job done. But I was working in the cloud; I knew there had to be a better way.

Enter Google Apps Script. Google Apps Script provides for Google’s cloud-based applications the same scriptability that AppleScript provides for desktop applications on the Mac. In only a few minutes, after studying the documentation, I was able to produce a script which achieved the desired effect.
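The transformation itself is tiny. The sketch below shows it in Python rather than Apps Script, so it is not the script I ran against the spreadsheet; the function name and the assumption that a handle consists only of ASCII letters, digits, and underscores are mine, for illustration.

```python
import re

# Assumption: a handle is "@" followed by ASCII letters, digits, or underscores.
HANDLE = re.compile(r"@(\w+)", re.ASCII)


def twitter_url(cell):
    """Turn a cell like "@username" into a profile link.

    Cells that do not look like a handle are returned unchanged, so the
    function can be mapped over a whole column safely.
    """
    match = HANDLE.fullmatch(cell.strip())
    if match is None:
        return cell
    return "http://twitter.com/" + match.group(1)
```

Applied to a column of cells, this leaves blanks and notes alone and rewrites only the entries that are plain handles.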

Buoyed by my quick success, I decided to try going a step further: what if I could use the Twitter API to automatically set each cell’s comment to the most recent Tweet? Doing so would give viewers a quick preview of the Twitter account’s content, without leaving the spreadsheet. Working off of some sample code from Google, I quickly wrote another script to do the job. I ran into trouble for a while until I found that the “Callback URL” in the Twitter application settings must be set to https://spreadsheets.google.com/macros; once that was done, everything worked perfectly. (Incidentally, the error message given in that case, “unexpected error”, is completely useless, and gives no clue as to the actual problem.) From there, all I had to do was set up a time-based trigger to run the script automatically so the Tweets would update periodically, and I was done.

For me, the real point—and the power of Google Apps Script—is how quickly and easily I was able to not only automate otherwise-tedious processes, but draw in data from disparate sources and display it automatically. I’ve only scratched the surface of what can be done with Google Apps Script; the technology can be made to do a lot more.

Using shortDOIs automatically in BibDesk

I’m quite a fan of using DOIs to refer to online resources when possible. However, some DOIs are a bit ungainly, and particularly for readers working from a printed bibliography, they’re outright inconvenient. Who wants to type in something like 10.1002/(SICI)1097-0258(19980815/30)17:15/16<1661::AID-SIM968>3.0.CO;2-2? Even when working from a digital copy, a string that long is bound to get mangled somewhere if it gets copied and pasted around, sent in emails, etc. You could use a conventional URL-shortening service, but that’s probably not appropriate in the context of a published paper. So, how can you continue to get the benefits of the DOI system without exposing your readers to long, ugly URLs?

The answer is the shortDOI service, which transforms DOIs into shortcuts that are a lot easier for your readers to use. Every shortDOI generated is itself a DOI, so the conventional risk of a URL-shortener shutting down and taking the shortcuts along with it isn’t a problem. As long as the DOI system is functioning, shortDOIs will be resolvable.

For example, the DOI 10.1109/JRPROC.1929.221679 can be dereferenced by using the URL http://dx.doi.org/10.1109/JRPROC.1929.221679. When this is shortened with shortDOI, the result is the DOI 10/bpc. This can be dereferenced with the URL http://dx.doi.org/10/bpc (note that that’s no different than any other DOI), but, more importantly, it can also be dereferenced with the URL http://doi.org/bpc. It’s this last URL that is important for our purposes, as it’s the shortest.
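The three URL forms above can be sketched as a pair of small Python helpers. The function names are made up for this example, and the sketch does no percent-encoding, so a DOI containing characters like < or > would need escaping before being used in a real link.

```python
def doi_url(doi):
    """Resolver URL for any DOI, including a shortDOI such as "10/bpc"."""
    return "http://dx.doi.org/" + doi


def short_doi_url(short_doi):
    """Shortest resolver URL for a shortDOI of the form "10/xxxx"."""
    prefix = "10/"
    if not short_doi.startswith(prefix):
        raise ValueError("not a shortDOI: %r" % short_doi)
    # Drop the "10/" prefix; doi.org accepts the bare shortcut.
    return "http://doi.org/" + short_doi[len(prefix):]
```

With these, `doi_url("10.1109/JRPROC.1929.221679")` and `doi_url("10/bpc")` give the two longer forms, and `short_doi_url("10/bpc")` gives the short one.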

Now, shortDOIs can be manually generated, but why bother, if the process can be automated? I use BibDesk for managing references, and BibDesk is a scriptable application, so an AppleScript was the easiest solution to the problem. I’ve posted the script on GitHub; you can find it here. For every publication in a BibDesk document which has a DOI entered and which does not have a shortDOI shortcut, it will retrieve the shortDOI shortcut for the publication’s DOI, and store it in the URL field.
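The selection logic can be sketched in Python as follows. This is not the AppleScript itself: publications are modeled as plain dictionaries, the field names "Doi" and "Url" are assumptions, and the call to the shortDOI service is left to a `lookup` callable supplied by the caller, so nothing here depends on the details of that service.

```python
SHORT_PREFIX = "http://doi.org/"


def add_short_dois(publications, lookup):
    """Fill in the URL field for publications that have a DOI but no shortcut.

    `lookup` is a hypothetical callable that takes a DOI and returns its
    shortDOI in the form "10/xxxx". Returns the number of entries updated.
    """
    updated = 0
    for pub in publications:
        doi = pub.get("Doi", "")
        url = pub.get("Url", "")
        if not doi or url.startswith(SHORT_PREFIX):
            continue  # nothing to shorten, or already done
        short = lookup(doi)
        if not short.startswith("10/"):
            continue  # unexpected reply; leave the entry untouched
        pub["Url"] = SHORT_PREFIX + short[len("10/"):]
        updated += 1
    return updated
```

Skipping entries that already carry a shortcut keeps repeated runs cheap, since only newly added publications trigger a lookup.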

MultiStreamer, a work in progress

Last Wednesday, in the midst of the poor weather, I discovered that the Montgomery County Fire & Rescue Service streams four of their many trunking talkgroups online. But VLC will only open one stream at a time, and while you can open multiple streams at once in QuickTime Player, it’s still sort of ungainly. It seemed to me that there had to be a better way; it would be nice to be able to save and restore lists of streams, and set the volume and balance per-stream, but also have a master control for all of the streams.
[Screenshot: MultiStreamer]

The result is depicted in the screenshot above; the code is available here on GitHub. All of the basic functionality actually works at this point, but there are still a number of items in the TODO file. The software uses Apple’s poorly-documented AppleScriptObjC framework, although there is also a bit of Objective-C in the codebase, mainly for interfacing with QuickTime (which is a C API) to do things that QTKit can’t. (As a result, it only runs as a 32-bit application, but this isn’t a serious drawback.)

Incidentally, this is the first application I’ve developed that compiles to native code in quite a long time. While I got my start developing applications using FutureBasic on the Mac (which produced native code), after that I learned Perl and PHP, followed by Java and then Python. All of those are languages which are either interpreted or which compile to bytecode. I have also dabbled in Erlang and even Visual Basic, and they, too, do not produce native code. While the AppleScript parts of this project are not compiled to native code, the Objective-C parts most certainly are.