Synoptic first!

So, you're a transit agency (or vendor, consultant, system integrator, etc.), and you've decided to develop an API to expose your real-time data. Perhaps you've gotten queries from developers like "I want to be able to query an API and get next bus arrivals at a stop...".

It's hard to say "no", but fulfilling that request as stated may not be the best way to go. If you have limited resources for exposing your data, there are better approaches, ones that will enable the development of more advanced applications in the long term.

There are two key types of APIs for real-time transit data: synoptic and transactional. Synoptic APIs, like GTFS-realtime (and SIRI, in some use cases), communicate the entire state of a transit network in one call. Transactional APIs, by contrast, communicate stop-by-stop information. It's easy to process data from a synoptic API to provide a transactional API, but very difficult, if not impossible, to go in the other direction.
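To make the "easy direction" concrete, here is a minimal sketch of turning one synoptic snapshot into a stop-by-stop lookup. The snapshot layout (a list of trips, each with a `trip_id` and a list of `(stop_id, eta_seconds)` predictions) and the function name `index_by_stop` are assumptions made up for illustration; they are not the GTFS-realtime or SIRI schema.

```python
from collections import defaultdict

def index_by_stop(snapshot):
    """Build a per-stop arrivals table from one network-wide snapshot.

    `snapshot` is a hypothetical structure: a list of dicts, each with a
    "trip_id" and a "predictions" list of (stop_id, eta_seconds) pairs.
    """
    arrivals = defaultdict(list)
    for trip in snapshot:
        for stop_id, eta_seconds in trip["predictions"]:
            arrivals[stop_id].append((eta_seconds, trip["trip_id"]))
    # Sort each stop's arrivals so the soonest one comes first.
    for stop_id in arrivals:
        arrivals[stop_id].sort()
    return dict(arrivals)

# One snapshot covering every active trip (two trips here, for brevity).
snapshot = [
    {"trip_id": "T1", "predictions": [("S1", 600), ("S2", 900)]},
    {"trip_id": "T2", "predictions": [("S2", 720), ("S3", 1100)]},
]
by_stop = index_by_stop(snapshot)
next_at_s2 = by_stop["S2"]  # the answer to "next arrivals at stop S2"
```

Answering a transactional "next arrivals at this stop" query then becomes a dictionary lookup. Going the other way would mean polling every stop and stitching the answers back into trips.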

Why are synoptic APIs important? Imagine a medium-sized transit network with roughly 10,000 bus stops and around 1,000 vehicles on the road during peak service. If you wanted to know the state of the entire transit network, you could query an API for arrivals at each of the 10,000 bus stops, or query an API for the status of each of the 1,000 active trips. Getting the status of the active trips is far more efficient than querying for arrivals at every stop: about ten times fewer requests if you fetch trips one at a time, and a single request if a synoptic feed delivers them all at once. Once you have the status of every trip, you can apply that information to the schedule to determine when each trip will arrive at its upcoming stops.
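As a rough sketch of that last step, suppose the trip status boils down to a single delay figure and the stop sequence the vehicle has most recently served. The function name `predict_arrivals`, the tuple layout, and the assumption that the current delay simply carries forward to every remaining stop are all illustrative simplifications; real prediction engines are more sophisticated.

```python
def predict_arrivals(scheduled_stop_times, delay_seconds, last_stop_sequence):
    """Apply a trip's current delay to its remaining scheduled stops.

    `scheduled_stop_times` is a list of
    (stop_sequence, stop_id, scheduled_arrival_seconds_after_midnight).
    Naive assumption: the delay stays constant for the rest of the trip.
    """
    return [
        (stop_id, scheduled + delay_seconds)
        for sequence, stop_id, scheduled in scheduled_stop_times
        if sequence > last_stop_sequence  # skip stops already served
    ]

# A three-stop trip scheduled at 8:00, 8:05, and 8:10.
schedule = [(1, "A", 28800), (2, "B", 29100), (3, "C", 29400)]

# The vehicle has served stop 1 and is running two minutes late.
upcoming = predict_arrivals(schedule, delay_seconds=120, last_stop_sequence=1)
```

Here `upcoming` holds predicted arrivals for stops B and C only, each shifted two minutes later than scheduled.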

It's true, it can be more challenging to use a synoptic API if all you want is simple stop-by-stop arrival information. But that's where software like OneBusAway comes into play—with OneBusAway, you can consume a synoptic GTFS-realtime or SIRI feed, and access it through the transactional OneBusAway RESTful API or RESTful SIRI 2.0.

So why should a transit agency implement a synoptic API first? There are several key reasons.

First, synoptic APIs are easier for feed producers to implement and deliver. With a transactional API, feed producers have to implement some sort of script to respond to each request, whereas a synoptic API can be as simple as a static text file (or XML, Protocol Buffers, etc.) written to a Web server every 30 seconds, with no server-side scripting necessary. For example, this file provides a synoptic feed from Hampton Roads Transit. Yes, it's in a non-standard format, but it's also immensely simple: just a static CSV file on a Web server, updated a few times every minute with fresh data.
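The producer side of a static-file feed can be sketched in a few lines. The column names in `FIELDS`, the function name `write_snapshot`, and the sample row are hypothetical and are not the Hampton Roads Transit format; the one design choice worth noting is writing to a temporary file and renaming it into place, so a client never downloads a half-written file.

```python
import csv
import os
import tempfile

# Hypothetical column layout, chosen only for this example.
FIELDS = ["vehicle_id", "trip_id", "lat", "lon", "delay_seconds"]

def write_snapshot(rows, path):
    """Write the whole-network snapshot to `path` as CSV, atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    # mkstemp creates a private file; make it readable by the Web server.
    os.chmod(tmp_path, 0o644)
    # Swap the new snapshot into place in one step.
    os.replace(tmp_path, path)

# Example: publish one snapshot into a scratch directory.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "vehicles.csv")
write_snapshot(
    [{"vehicle_id": "1001", "trip_id": "T1", "lat": 36.85,
      "lon": -76.29, "delay_seconds": 120}],
    out_path,
)
```

In practice something like a cron job or a simple loop would call `write_snapshot` every 30 seconds with fresh data, pointing `path` at a directory the Web server already serves.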

More importantly, synoptic APIs enable advanced applications, like real-time trip planning, real-time performance monitoring, and data visualizations that would be difficult or impossible to implement on top of a transactional API.

If you're a transit data developer or open data advocate, keep these ideas in mind when you talk to transit agencies about opening their data. I was immensely disappointed, for example, to find this message in which a Code for America 'fellow' recommended to a group of transit developers that they push for a transactional API from their local transit agency rather than a synoptic API like GTFS-realtime. Yes, a transactional API may make it easier to get a 'where's my bus' app out the door, but in the long run it harms everyone by stunting the development of the most advanced real-time transit applications.

(For another perspective, see this mailing list post by Brian Ferris, which provided some of the motivation for this post.)