Legacy AVL system? It’s okay, join the club.

If you work with real-time transit data, you’ve probably heard the steadily increasing call for data producers to release their data in open, standardized formats like GTFS-realtime and SIRI. But how do you actually make your data available in those formats? Some AVL vendors are beginning to include standards-compliant APIs in their products, and that’s great for agencies considering a new system or major upgrade. But what about the massive installed base of legacy AVL systems which have few open interfaces, if any?

Fortunately, there are ways to get data out of almost any AVL system, whether it was explicitly designed with open interfaces or not. Some of these techniques are more technologically sound than others, and some may require relatively tricky programming, but if you can find the right software developer, almost any problem is solvable.

Here are five key strategies for extracting information from an AVL system. The first three are strongly recommended, while the last two should only be undertaken if no better interface is available, and if you have adequate technical support to implement a more complex solution.

  • Transform a proprietary API to GTFS-realtime or SIRI: Many AVL systems (both COTS and agency-homegrown) include non-standard APIs which can, with a bit of programming, be transformed into a modern, standards-compliant API. This is the approach I took with wmata-gtfsrealtime, to produce a GTFS-realtime feed from WMATA’s real-time bus data, septa-gtfsrealtime to produce a GTFS-realtime feed from SEPTA’s real-time bus and rail data, and ctatt-gtfsrealtime to produce a GTFS-realtime feed from CTA’s Train Tracker data. This is also the approach taken by onebusaway-gtfs-realtime-from-nextbus-cli, which converts from the NextBus API, and bullrunner-gtfs-realtime-generator, which converts from the Syncromatics API.
  • Query a reporting database: Some AVL systems can be configured to log vehicle positions, predicted arrival times, and other information to a database. Ostensibly these databases are meant to be used for after-the-fact incident analysis, performance reporting, etc., but there’s nothing stopping an application from polling the database every 15-30 seconds to get the latest vehicle positions and predicted arrival times. Many GTFS-realtime feed producers take this approach, including ddot-avl, built by Code for America to extract real-time information from DDOT’s TransitMaster installation, HART-GTFS-realtimeGenerator, built by CUTR to extract real-time information from HART’s OrbCAD installation, and live_transit_event_trigger, built by Greenhorne & O’Mara (now part of Stantec) to produce a GTFS-realtime feed from Ride On’s OrbCAD installation.
  • Parse a published text file: Similar to the database approach, some AVL systems can be configured to dump the current state of the transit network to a simple text file (like this file from Hampton Roads Transit). This text file can be read and parsed by a translator which then generates a standards-compliant feed, which is the approach taken by hrt-bus-api, built by Code for Hampton Roads, and onebusaway-sound-transit-realtime.
  • Screen-scrape a passenger-facing Web interface: This is where we get into the less technologically sound options. While the first three options focused on acquiring data from machine-readable sources, screen scraping involves consuming data from a human-readable source and transforming it back into machine-readable data. In this case, that might mean accessing a passenger-facing Web site with predicted arrival times, extracting the arrival times, and using them to produce a standards-compliant feed. This is the approach taken by this project, which screen-scrapes KCATA’s TransitMaster WebWatch installation to produce a GTFS-realtime feed. Compared to options which involve machine-readable data sources, screen scraping is more brittle and may make it more challenging to produce a robust feed, but it can be made to work.
  • Intercept internal AVL system communications: This is the last resort, but if an AVL system has no open interfaces, it may be possible to intercept communications between the components of the AVL system (such as a central server and a dispatch console, or a system driving signage at transit stops), decode those communications, and use them as the basis for a standards-compliant feed. This is a last resort because it will often require reverse-engineering undocumented protocols, and it produces solutions which are brittle and tend to break in unpredictable ways. But it can be done, and if it’s the only way to get data out of an AVL system, then go for it. This is the approach taken by onebusaway-king-county-metro-legacy-avl-to-siri.
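The reporting-database approach boils down to a polling loop. The Python sketch below is a minimal illustration, not a port of any of the projects named above: it assumes a hypothetical `avl_positions` table with `vehicle_id`, `trip_id`, `lat`, `lon`, and `reported_at` (Unix seconds) columns, keeps only the newest row per vehicle, and returns a dictionary shaped like a GTFS-realtime `FeedMessage` of `VehiclePosition` entities. A real TransitMaster or OrbCAD reporting database has its own vendor-specific schema, and a production feed would be serialized with a Protocol Buffers library rather than handed around as plain dicts.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical schema; real AVL reporting databases use vendor-specific tables.
LATEST_POSITIONS_SQL = """
SELECT p.vehicle_id, p.trip_id, p.lat, p.lon, p.reported_at
FROM avl_positions AS p
JOIN (SELECT vehicle_id, MAX(reported_at) AS latest
      FROM avl_positions
      GROUP BY vehicle_id) AS newest
  ON p.vehicle_id = newest.vehicle_id AND p.reported_at = newest.latest
ORDER BY p.vehicle_id
"""


def fetch_vehicle_positions(conn: sqlite3.Connection) -> dict:
    """Build a GTFS-realtime-shaped feed (as plain dicts) from the newest row per vehicle."""
    entities = []
    for vehicle_id, trip_id, lat, lon, reported_at in conn.execute(LATEST_POSITIONS_SQL):
        entities.append({
            "id": str(vehicle_id),
            "vehicle": {
                "trip": {"trip_id": trip_id},
                "vehicle": {"id": str(vehicle_id)},
                "position": {"latitude": lat, "longitude": lon},
                "timestamp": int(reported_at),
            },
        })
    return {
        "header": {"gtfs_realtime_version": "2.0", "timestamp": int(time.time())},
        "entity": entities,
    }


def poll_forever(conn: sqlite3.Connection, publish: Callable[[dict], None],
                 interval_s: float = 20) -> None:
    """Re-query the reporting database on a fixed interval and hand each feed to `publish`."""
    while True:
        publish(fetch_vehicle_positions(conn))
        time.sleep(interval_s)
```

The 20-second default sits inside the 15-30 second polling range mentioned above; `publish` is a hypothetical callback standing in for whatever writes out or serves the feed.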

As evidenced by the example links, every one of the strategies mentioned above has been implemented in at least one real-world application. No matter how old your AVL system is, no matter how far out of warranty or how unsupported it is, no matter how obsolete the technology is, some enterprising civic hacker has probably already figured out a way to get data out of the system (or is eager and ready to do so!). Every one of the tools linked in this post is open-source, and if it closely approximates your needs, you can download it today and start hacking (or find a local civic hacker and have them adapt it to meet your needs). And if none of the tools look close? Don’t head for your procurement department and have them issue an RFP—instead, post on the Transit Developers Google Group; chances are your post will make its way to someone who can help, whether a local Code for America brigade, or an independent civic hacker, or another transit agency that has already solved the same problem.

Finally, I’d like to thank the participants in the Disrupting Legacy Transit Ops Software (Moving Beyond Trapeze) session at Transportation Camp DC 2015, who inspired me to write this post.

Why “they’re not on NextBus” isn’t the problem it sounds like

Because I’m active in open data for transit and real-time passenger information, I sometimes hear a particular complaint leveled at transit agencies: “They’re not on NextBus!”

This bothers me. A lot.

Why? There are two reasons. The first is pretty simple. Sometimes, when people say “NextBus”, what they really mean is real-time passenger information, without any concern for the specific provider. But “NextBus” is a trademarked name for a specific proprietary real-time passenger information provider; if what you really mean is “real-time passenger information”, then say so.

The second reason is more pernicious. A lot of people use mobile apps for transit which are designed around the NextBus API. So, they work everywhere that the local transit agency has elected to contract with NextBus for real-time passenger information. On its face, this seems like a huge success for transit riders—one app for dozens of cities! But, it’s not. Vendor lock-in isn’t the way to achieve real transit data integration.

I understand that transit riders love the idea of having a single app for transit information in every city they visit. I’m a transit rider; I get it. But the solution isn’t to get every agency to pay the same vendor to provide the same proprietary service.

There are many AVL vendors out there: INIT, Xerox, Avail, Clever, Connexionz, and more. Some very forward-thinking agencies, like New York’s MTA, have even decided to act as their own system integrator and build their own real-time passenger information system, so that they’ll never be beholden to any vendor’s proprietary system. Built on top of the open-source transit data platform OneBusAway, MTA Bus Time provides real-time passenger information for New York’s buses using an open technology stack that saved the MTA 70 percent compared to proprietary alternatives.

So with every agency using a different vendor’s system (and some having rolled their own), how do we provide that integrated experience that riders crave? The answer is simple: by using open data standards. With standards like GTFS-realtime and SIRI, app developers can build apps that work with data from any transit agency and any vendor’s systems. With OneBusAway, for example, I can easily (trivially) make use of feeds from any of several DC-area agencies, York Region Transit, MBTA, BART, TriMet, or any of the other agencies that are releasing GTFS-realtime data. Because these agencies are all using standardized formats for their open data, I don’t have to build anything new in OneBusAway to consume their data: the same code that works for one agency works for all of them.

But NextBus doesn’t provide an API using any recognized standards for real-time transit data. It’s a walled garden of sorts; the NextBus API is great if all you want to do is present data from agencies using NextBus, and terrible if you want to use it as a springboard for building revolutionary real-time passenger information tools.

The real question isn’t “Why aren’t you on NextBus?”; the real question is “Why doesn’t NextBus provide a standards-compliant API?”

What’s wrong with the NextBus API?

When it comes to real-time transit data, one of the common refrains is “just use NextBus!” But while NextBus may be a common name, that doesn’t make it the best choice for providing real-time transit data with a robust open data API for developers. It’s true that NextBus provides an API for developers, but there are problems that hamper or even entirely prevent its use in certain applications.

What are these problems? Some are organizational, and some are technical:

  • API not enabled for all agencies: While NextBus provides service for more than a hundred agencies, only a fraction of those agencies make their data available through the NextBus API.
  • API not standards-compliant: NextBus provides data to developers in their own custom format, rather than using the industry-standard SIRI or GTFS-realtime formats. While NextBus’s API has its advantages for certain types of apps (principally simple mobile apps), for developers working on large-scale passenger information systems, and developers seeking to solve complex problems like real-time routing, there are deficiencies in the NextBus API which could be remedied by using a standardized format. In particular, NextBus makes it exceedingly difficult to get the status of an entire transit system at once. Retrieving data stop-by-stop makes sense for mobile apps, but not for transit data integration platforms like OneBusAway, which benefit from being able to update from a feed containing status updates for all of an agency’s vehicles and trips.
  • Commonality of identifiers: When NextBus agencies also publish a GTFS feed containing their static route and schedule data (which they should), route, stop, and trip identifiers should match those in the NextBus data. When this is not done, it becomes onerous to use the real-time data—developers must expend additional engineering effort to map identifiers between the static and real-time data.
  • Data quality and completeness: Though the NextBus API documentation defines the data elements which developers can expect to find in the API responses, the actual availability of these data varies considerably between agencies. For example, many agencies do not include the tripTag element, which is essential for linking predictions between stops and then to the static schedule. Similarly, some agencies don’t actually provide useful values for the block element. NextBus must impress upon its customers (that is, the transit agencies) the value of supplying high-quality configuration data so that the NextBus API works as intended.
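To make the stop-by-stop problem concrete: a translator that builds GTFS-realtime `TripUpdate`s from NextBus has to regroup per-stop predictions into per-trip lists, and `tripTag` is the key that makes that possible. The Python sketch below assumes each prediction has already been pulled out of the NextBus XML into a dict carrying the prediction’s `tripTag`, `epochTime` (milliseconds since the epoch), and `vehicle` attributes, plus the `stopTag` of its enclosing `predictions` element. It also assumes `tripTag` and `stopTag` values equal the GTFS `trip_id` and `stop_id`, which, as noted above, is not guaranteed. Predictions without a `tripTag` are set aside because they cannot be linked to a trip.

```python
from collections import defaultdict
from typing import Iterable


def group_predictions_by_trip(predictions: Iterable[dict]) -> tuple[dict, list]:
    """Regroup per-stop NextBus-style predictions into per-trip updates.

    Returns (trip_updates keyed by tripTag, predictions lacking a tripTag).
    """
    by_trip = defaultdict(list)
    unlinked = []
    for pred in predictions:
        trip_tag = pred.get("tripTag")
        if not trip_tag:
            # No tripTag: nothing ties this prediction to a trip or the schedule.
            unlinked.append(pred)
            continue
        by_trip[trip_tag].append(pred)

    trip_updates = {}
    for trip_tag, preds in by_trip.items():
        preds.sort(key=lambda p: int(p["epochTime"]))  # earliest stop first
        trip_updates[trip_tag] = {
            # Assumes tripTag equals the GTFS trip_id; otherwise a mapping is needed.
            "trip": {"trip_id": trip_tag},
            "vehicle": {"id": preds[0].get("vehicle", "")},
            "stop_time_update": [
                # Assumes stopTag equals the GTFS stop_id. GTFS-realtime times are
                # POSIX seconds, while NextBus epochTime is in milliseconds.
                {"stop_id": p["stopTag"], "arrival": {"time": int(p["epochTime"]) // 1000}}
                for p in preds
            ],
        }
    return trip_updates, unlinked
```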

Though the present NextBus API is far from ideal, its data can be transformed into standards-compliant GTFS-realtime and fed into any app that consumes GTFS-realtime, but only if the agency’s NextBus feed has been configured correctly: with meaningful trip IDs, identifiers that match those in the agency’s GTFS feed, and so on. Of all the agencies that use NextBus, the fraction that have both enabled the NextBus API and given NextBus the configuration data needed to make the API useful to a GTFS-realtime translator is frustratingly small.
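Whether a feed clears the identifier bar can be checked mechanically before any translator is written. A minimal Python sketch, assuming you already have the trip identifiers seen in the real-time data (for NextBus, the `tripTag` values) as a list: it reads the agency’s GTFS `trips.txt` and reports which real-time IDs have no static counterpart. The function name is illustrative, not part of any existing tool.

```python
import csv
import io
from typing import Iterable, TextIO


def unmatched_trip_ids(realtime_trip_ids: Iterable[str], trips_txt: TextIO) -> list[str]:
    """Return the realtime trip IDs that have no matching trip_id in GTFS trips.txt."""
    static_ids = {row["trip_id"] for row in csv.DictReader(trips_txt)}
    return sorted(set(realtime_trip_ids) - static_ids)


if __name__ == "__main__":
    # Stand-in for open("trips.txt", newline="", encoding="utf-8-sig")
    sample = io.StringIO("route_id,service_id,trip_id\n1,WKDY,T1\n1,WKDY,T2\n")
    print(unmatched_trip_ids(["T1", "T2", "X9"], sample))  # ['X9']
```

Opening the real `trips.txt` with `encoding="utf-8-sig"` matters because some GTFS exports begin with a byte-order mark, which would otherwise be glued onto the first column name.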

NextBus can—and should—do better. Their customers, more than 100 transit agencies in North America, would all benefit from standards-compliant APIs that would allow developers to build apps that work with data produced by AVL systems from all vendors, not just one. This is the essence of open data, and it’s time for NextBus to get on board.

Montgomery County Ride On finally has an API—sort of

After close to a year of unanswered questions and rancor, Montgomery County Ride On finally has an API. Earlier this year, Montgomery County placed their real-time passenger information system into production, under the brand Ride On Real Time. But there was still no API.

In late April, while clicking around on Ride On’s site, I stumbled across a poorly-advertised developer resources page. Lo and behold, Ride On finally had an API.

This is, of course, what I’ve been calling for since October 2011. But does that make Ride On an open data success story? Does that make Montgomery County a good steward of taxpayer money in planning IT expenditures? Unfortunately, no.

First, about the software behind the API. It’s not built into OrbCAD or SmartTraveler Plus, nor is it part of some other COTS product. Instead, it’s generated by a custom product developed by Greenhorne & O’Mara.

I don’t have any objection to transit authorities getting involved in the design and implementation of custom software—but unlike MTA NYCT’s involvement in the development of OneBusAway, there’s not much reusable value here. The software developed by G & O hasn’t been released as open source, and even if it were, it would only help those transit agencies that run OrbCAD and have no other interface to it (and they’d be better off just running OneBusAway directly).

So, strike one—building more single-use software. At least it’s built on top of a fairly modern software stack, including CouchDB and Ruby on Rails.

Now, on to the API itself. There are two major standards for real-time transit data: GTFS-realtime and SIRI. Unfortunately, even some of the newer real-time transit APIs, like the Metro Transit NexTrip API and the OC Transpo Live Next Bus Arrival Data Feed, use neither of these standards. These APIs are harder for developers to use, because they require custom code which will only be useful in a particular city. There are hundreds of transit systems in the United States, and thousands around the world, and eventually, hopefully, they will all have real-time data. Writing custom code for each one of them is simply unsustainable, and that’s why standards matter.

Does Ride On’s API support industry standards? Yes, sort of. In addition to a handful of custom methods, the API also has a method which returns GTFS-realtime output, although it’s not what you’d expect. The API’s GTFS-realtime output is in the text-based Protocol Buffers format (which is intended only for debugging, not for production use), rather than the binary format expected by GTFS-realtime tools. It didn’t take much effort for me to put together a little Python script to ingest the text-formatted feed and output a valid, binary GTFS-realtime feed which worked well with OneBusAway.

The API requires authentication, a nuisance that some agencies, like BART, have done away with:

We’ve never asked you to register for BART open data and our License Agreement is one of the least restrictive in the business.

Worse, the Ride On API gives you a token, and then doesn’t tell you what to do with it. Instead, you have to log in with your username and password on every call to the API—not an insurmountable challenge, but still rather peculiar.

In summary, I’m glad Ride On finally has an API, and I’m even more excited that I was able to get the feed working with OneBusAway. But on the whole it’s a disappointment. Xerox hasn’t built native SIRI or GTFS-realtime support into OrbCAD or SmartTraveler Plus. If they had, it would be a lot easier for every agency using those products to offer a standards-compliant real-time feed. Instead, the (idiosyncratic) implementation depends on yet more custom software.

It is a success for open data, but not an unqualified success—and certainly not a model to follow.

An oddity with NextBus short stop titles

The NextBus public API supports “short titles” for routes and stops, which are designed to be formatted for display on mobile devices, and other environments where space is at a premium. Stop and route definitions in the XML returned by the API both contain a title attribute with the full title, and a shortTitle attribute with the short title, if one exists. Unfortunately, there seems to be a bug in the output returned by NextBus which limits the usefulness of this feature.
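For reference, the intended use of the two attributes is simple: show `shortTitle` where space is tight and fall back to `title` otherwise. The Python sketch below does that for the stops in a NextBus `routeConfig` response, using only the `tag`, `title`, and `shortTitle` attributes; it does not attempt to work around the bug mentioned above, and the function name is hypothetical. It also assumes, as in typical `routeConfig` responses, that stop references nested under `<direction>` elements carry only a `tag`, so entries without a `title` are skipped.

```python
import xml.etree.ElementTree as ET


def display_titles(route_config_xml: str) -> dict[str, str]:
    """Map each stop tag to a display title, preferring shortTitle over title."""
    root = ET.fromstring(route_config_xml)
    titles = {}
    for stop in root.iter("stop"):
        title = stop.get("title")
        if title is None:
            # Assumed: stop references nested under <direction> carry only a tag.
            continue
        titles[stop.get("tag")] = stop.get("shortTitle") or title
    return titles
```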