A Python script for visualizing GTFS data

See here for recent updates to tph.py; the software no longer uses Google's transitfeed module, in favor of the SQLite-backed gtfs library, and some of the restrictions described below have been removed.

Over the weekend, I put together a little Python project which I am now releasing: tph.py, a tool for visualizing transit service levels using data from GTFS feeds. Over at Raschke on Transport, I've posted some examples, but here I'd like to discuss the technical underpinnings.

The script uses Google's transitfeed library to parse the GTFS data. Loading a large GTFS schedule with transitfeed can take several minutes, but I wouldn't consider that unexpected when a single CSV file in the feed might be close to 200 MB. Still, I'd like to see a uniform way of getting GTFS data into (for example) a SQLite database, so that SQL queries can be issued against the dataset. In addition, the script depends on certain optional fields being present in the GTFS dataset: in particular, trips must use direction_id, ~~and either trips or stop_times must have the headsign defined~~ (a headsign will be synthesized from the last stop in a trip if none is otherwise present). These issues could be worked around, but for now it's easier to assume that these fields will be present.
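As an illustration of the SQLite approach, here is a minimal sketch that uses only Python's standard csv and sqlite3 modules to copy GTFS text files into SQLite tables, then synthesizes a headsign in SQL by falling back to the name of a trip's last stop. This is not how tph.py or transitfeed work; the names load_gtfs_table, HEADSIGN_SQL and trip_headsigns are hypothetical, and the query assumes trips.txt has a trip_headsign column (optional in GTFS) and that stop_sequence values are integers.

```python
import csv
import sqlite3


def load_gtfs_table(conn, table, csv_file):
    """Copy one GTFS CSV file (e.g. trips.txt) into a new SQLite table.

    Every column is stored as TEXT. Open real files with encoding="utf-8-sig"
    and newline="" so a byte-order mark doesn't leak into the first column name.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Skip blank lines, which csv.reader returns as empty lists.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        (row for row in reader if row),
    )
    conn.commit()


# Use trip_headsign when it is non-empty; otherwise fall back to the name of
# the trip's last stop (highest stop_sequence, compared as integers).
HEADSIGN_SQL = """
SELECT t.trip_id,
       COALESCE(NULLIF(t.trip_headsign, ''), s.stop_name) AS headsign
FROM trips AS t
JOIN stop_times AS st ON st.trip_id = t.trip_id
JOIN stops AS s ON s.stop_id = st.stop_id
WHERE CAST(st.stop_sequence AS INTEGER) = (
    SELECT MAX(CAST(stop_sequence AS INTEGER))
    FROM stop_times
    WHERE trip_id = t.trip_id
)
ORDER BY t.trip_id
"""


def trip_headsigns(conn):
    """Return (trip_id, headsign) pairs for every trip that has stop times."""
    return conn.execute(HEADSIGN_SQL).fetchall()
```

With a real feed, each file would be loaded with something like `with open(path, encoding="utf-8-sig", newline="") as f: load_gtfs_table(conn, "stop_times", f)`. For a stop_times table in the hundreds of megabytes, an index on stop_times(trip_id) would likely be needed before the correlated subquery performs acceptably.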

After extracting the hourly service values for each of the target routes, the script uses matplotlib to generate the service graph. The matplotlib API is complex, but it is also capable, and I was able to generate the plot I wanted without too much effort.
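To make the counting and plotting step concrete, here is a sketch (not tph.py's actual code) that assumes the first departure time of each trip for one route and direction has already been extracted as a GTFS HH:MM:SS string. It buckets those trips by hour and draws a bar chart with matplotlib's object-oriented Figure API. The helper names are hypothetical, and matplotlib is imported inside plot_hourly_service so the counting code runs without it.

```python
from collections import Counter


def hour_of(gtfs_time):
    """Return the hour field of a GTFS time such as "07:45:00" or "25:10:00".

    GTFS measures times from the start of the service day, so trips running
    after midnight can have hours of 24 or more.
    """
    return int(gtfs_time.split(":")[0])


def hourly_service(first_departures):
    """Count trips per clock hour (0-23) from each trip's first departure time.

    Hours of 24 or more are wrapped with % 24, so a 25:10:00 departure is
    counted at 01:00 alongside the early-morning trips.
    """
    counts = Counter(hour_of(t) % 24 for t in first_departures)
    return [counts[h] for h in range(24)]


def plot_hourly_service(counts, title, path):
    """Save a bar chart of trips per hour to path (e.g. "route_1.png")."""
    from matplotlib.figure import Figure  # lazy import: counting works without matplotlib

    fig = Figure(figsize=(8, 3))
    ax = fig.subplots()
    hours = list(range(len(counts)))
    ax.bar(hours, counts, width=0.8)
    ax.set_xticks(hours)
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Trips per hour")
    ax.set_title(title)
    fig.savefig(path, bbox_inches="tight")
```

Wrapping post-midnight hours is one possible choice; an alternative would be to extend the x-axis past 23 so late-night service stays attached to the service day it belongs to.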

Because the most time-intensive part of the script is the initial step of loading the GTFS schedule into memory, I designed the script to generate multiple plots in each run. However, I could also see the script being used interactively: matplotlib can generate clickable, interactive plots, and I can envision a future version that would let a user click on a particular hour and drill down to examine the service for that hour in more detail (breaking the runs down by destination, for example).
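The interactive version is only an idea at this point, but a rough sketch of how it might work follows, assuming each trip has already been reduced to an (hour, headsign) pair. It relies on matplotlib's pick events: bars created with picker=True fire a "pick_event" when clicked, and the handler prints that hour's runs grouped by destination. The names runs_by_destination and show_interactive are hypothetical, and clicks only register with an interactive (GUI) backend.

```python
from collections import Counter, defaultdict


def runs_by_destination(trips):
    """Turn (hour, headsign) pairs into {hour: Counter({headsign: runs})}."""
    breakdown = defaultdict(Counter)
    for hour, headsign in trips:
        breakdown[hour][headsign] += 1
    return breakdown


def show_interactive(breakdown):
    """Plot runs per hour; clicking a bar prints that hour's runs by destination."""
    import matplotlib.pyplot as plt  # needs an interactive (GUI) backend for clicks

    hours = list(range(24))
    totals = [sum(breakdown.get(h, Counter()).values()) for h in hours]

    fig, ax = plt.subplots()
    bars = ax.bar(hours, totals, picker=True)  # picker=True makes each bar clickable
    hour_for_bar = dict(zip(bars, hours))

    def on_pick(event):
        hour = hour_for_bar.get(event.artist)
        if hour is None:
            return
        print(f"{hour:02d}:00 - {totals[hour]} runs")
        for headsign, runs in breakdown.get(hour, Counter()).most_common():
            print(f"  {headsign}: {runs}")

    fig.canvas.mpl_connect("pick_event", on_pick)
    plt.show()
```

For example, `show_interactive(runs_by_destination(pairs))` would open the chart, and clicking the 07:00 bar would list that hour's runs, most frequent destination first.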