Legacy AVL system? It’s okay, join the club.

If you work with real-time transit data, you’ve probably heard the steadily increasing call for data producers to release their data in open, standardized formats like GTFS-realtime and SIRI. But how do you actually make your data available in those formats? Some AVL vendors are beginning to include standards-compliant APIs in their products, and that’s great for agencies considering a new system or major upgrade. But what about the massive installed base of legacy AVL systems that have few open interfaces, if any?

Fortunately, there are ways to get data out of almost any AVL system, whether or not it was explicitly designed with open interfaces. Some of these techniques are more technologically sound than others, and some require relatively tricky programming, but with the right software developer, almost any problem is solvable.

Here are five key strategies for extracting information from an AVL system. The first three are strongly recommended, while the last two should only be undertaken if no better interface is available, and if you have adequate technical support to implement a more complex solution.

  • Transform a proprietary API to GTFS-realtime or SIRI: Many AVL systems (both COTS and agency-homegrown) include non-standard APIs which can, with a bit of programming, be transformed into a modern, standards-compliant API. This is the approach I took with wmata-gtfsrealtime, which produces a GTFS-realtime feed from WMATA’s real-time bus data; septa-gtfsrealtime, which produces a GTFS-realtime feed from SEPTA’s real-time bus and rail data; and ctatt-gtfsrealtime, which produces a GTFS-realtime feed from CTA’s Train Tracker data. It is also the approach taken by onebusaway-gtfs-realtime-from-nextbus-cli, which converts from the NextBus API, and bullrunner-gtfs-realtime-generator, which converts from the Syncromatics API.
  • Query a reporting database: Some AVL systems can be configured to log vehicle positions, predicted arrival times, and other information to a database. Ostensibly these databases are meant to be used for after-the-fact incident analysis, performance reporting, etc., but there’s nothing stopping an application from polling the database every 15-30 seconds to get the latest vehicle positions and predicted arrival times. Many GTFS-realtime feed producers take this approach, including ddot-avl, built by Code for America to extract real-time information from DDOT’s TransitMaster installation, HART-GTFS-realtimeGenerator, built by CUTR to extract real-time information from HART’s OrbCAD installation, and live_transit_event_trigger, built by Greenhorne & O’Mara (now part of Stantec) to produce a GTFS-realtime feed from Ride On’s OrbCAD installation.
  • Parse a published text file: Similar to the database approach, some AVL systems can be configured to dump the current state of the transit network to a simple text file (like this file from Hampton Roads Transit). This text file can be read and parsed by a translator which then generates a standards-compliant feed, which is the approach taken by hrt-bus-api, built by Code for Hampton Roads, and onebusaway-sound-transit-realtime.
  • Screen-scrape a passenger-facing Web interface: This is where we get into the less technologically sound options. While the first three options focus on acquiring data from machine-readable sources, screen scraping involves consuming data from a human-readable source and transforming it back into machine-readable data. In this case, that might mean accessing a passenger-facing Web site with predicted arrival times, extracting the arrival times, and using them to produce a standards-compliant feed. This is the approach taken by this project, which screen-scrapes KCATA’s TransitMaster WebWatch installation to produce a GTFS-realtime feed. Compared to options which involve machine-readable data sources, screen scraping is more brittle, and may make it more challenging to produce a robust feed, but it can be made to work.
  • Intercept internal AVL system communications: If an AVL system has no open interfaces, it may be possible to intercept communications between the components of the AVL system (such as a central server and a dispatch console, or a system driving signage at transit stops), decode those communications, and use them as the basis for a standards-compliant feed. This is a last resort because it often requires reverse-engineering undocumented protocols, and it results in solutions which are brittle and tend to break in unpredictable ways. But it can be done, and if it’s the only way to get data out of an AVL system, then go for it. This is the approach taken by onebusaway-king-county-metro-legacy-avl-to-siri.
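
To make the reporting-database strategy concrete, here is a minimal Python sketch of the polling step. It is not taken from any of the projects named above. The table name (vehicle_log), its columns, and the in-memory SQLite database are all assumptions made for illustration; a real TransitMaster or OrbCAD reporting database will have its own schema. The output is a plain dictionary shaped like a GTFS-realtime FeedMessage, and a real producer would serialize the same fields with the GTFS-realtime protocol buffer bindings.

```python
import sqlite3
import time


def build_feed(conn, now=None):
    """Read the latest logged position per vehicle and shape the result
    like a GTFS-realtime FeedMessage (as a plain dict, not protobuf)."""
    now = int(now if now is not None else time.time())
    # Hypothetical schema: one row per position report in vehicle_log.
    rows = conn.execute(
        """
        SELECT p.vehicle_id, p.trip_id, p.lat, p.lon, p.logged_at
        FROM vehicle_log AS p
        JOIN (SELECT vehicle_id, MAX(logged_at) AS latest
              FROM vehicle_log GROUP BY vehicle_id) AS m
          ON p.vehicle_id = m.vehicle_id AND p.logged_at = m.latest
        ORDER BY p.vehicle_id
        """
    ).fetchall()
    entities = []
    for vehicle_id, trip_id, lat, lon, logged_at in rows:
        entities.append({
            "id": f"vehicle-{vehicle_id}",
            "vehicle": {
                "trip": {"trip_id": trip_id},
                "vehicle": {"id": str(vehicle_id)},
                "position": {"latitude": lat, "longitude": lon},
                "timestamp": logged_at,
            },
        })
    return {
        "header": {
            "gtfs_realtime_version": "2.0",
            "incrementality": "FULL_DATASET",
            "timestamp": now,
        },
        "entity": entities,
    }


def demo_database():
    """Stand-in for the AVL reporting database, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vehicle_log "
        "(vehicle_id TEXT, trip_id TEXT, lat REAL, lon REAL, logged_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO vehicle_log VALUES (?, ?, ?, ?, ?)",
        [
            ("1001", "trip-A", 38.90, -77.03, 1000),
            ("1001", "trip-A", 38.91, -77.04, 1030),  # newer report wins
            ("1002", "trip-B", 38.95, -77.00, 1020),
        ],
    )
    return conn


feed = build_feed(demo_database(), now=1040)
print(len(feed["entity"]), "vehicles in feed")
```

In a deployment, build_feed would run in a loop that sleeps 15 to 30 seconds between polls and republishes the feed each time. The trip_id values only help downstream apps if they match the identifiers in the agency's GTFS feed.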

As evidenced by the example links, every one of the strategies mentioned above has been implemented in at least one real-world application. No matter how old your AVL system is, no matter how far out of warranty or how unsupported it is, no matter how obsolete the technology is, some enterprising civic hacker has probably already figured out a way to get data out of the system (or is eager and ready to do so!). Every one of the tools linked in this post is open-source, and if it closely approximates your needs, you can download it today and start hacking (or find a local civic hacker and have them adapt it to meet your needs). And if none of the tools look close? Don’t head for your procurement department and have them issue an RFP—instead, post on the Transit Developers Google Group; chances are your post will make its way to someone who can help, whether a local Code for America brigade, or an independent civic hacker, or another transit agency that has already solved the same problem.

Finally, I’d like to thank the participants in the Disrupting Legacy Transit Ops Software (Moving Beyond Trapeze) session at Transportation Camp DC 2015, who inspired me to write this post.

Why “they’re not on NextBus” isn’t the problem it sounds like

As someone active in open data for transit and real-time passenger information, I sometimes hear a complaint leveled at transit agencies: “They’re not on NextBus!”

This bothers me. A lot.

Why? There are two reasons. The first is pretty simple. Sometimes, when people say “NextBus”, what they really mean is real-time passenger information, without any concern for the specific provider. But “NextBus” is a trademarked name for a specific proprietary real-time passenger information provider; if what you really mean is “real-time passenger information”, then say so.

The second reason is more pernicious. A lot of people use mobile apps for transit which are designed around the NextBus API. So, they work everywhere that the local transit agency has elected to contract with NextBus for real-time passenger information. On its face, this seems like a huge success for transit riders: one app for dozens of cities! But it’s not. Vendor lock-in isn’t the way to achieve real transit data integration.

I understand that transit riders love the idea of having a single app for transit information in every city they visit. I’m a transit rider; I get it. But the solution isn’t to get every agency to pay the same vendor to provide the same proprietary service.

There are many AVL vendors out there: INIT, Xerox, Avail, Clever, Connexionz, and more. Some very forward-thinking agencies, like New York’s MTA, have even decided to act as their own system integrator and build their own real-time passenger information system, so that they’ll never be beholden to any vendor’s proprietary system. Built on top of the open-source transit data platform OneBusAway, MTA Bus Time provides real-time passenger information for New York’s buses using an open technology stack that saved the MTA 70 percent compared to proprietary alternatives.

So with every agency using a different vendor’s system (and some having rolled their own), how do we provide that integrated experience that riders crave? The answer is simple: by using open data standards. With standards like GTFS-realtime and SIRI, app developers can build apps that work with data from any transit agency and any vendor’s systems. With OneBusAway, for example, I can easily (trivially) make use of feeds from any of several DC-area agencies, York Region Transit, MBTA, BART, TriMet, or any of the other agencies who are releasing GTFS-realtime data. Because these agencies are all using standardized formats for their open data, I don’t have to build anything new in OneBusAway to consume their data—the same code that works for one agency works for all of them.
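
As a rough illustration of why a shared format matters, the Python sketch below runs one function over feeds from two agencies. The feeds here are hand-written dictionaries that mirror the GTFS-realtime FeedMessage structure, and the agency names and vehicle data are invented. Real feeds arrive as protocol buffers and would be decoded first. This is not OneBusAway code, only a sketch of the principle.

```python
def vehicle_positions(feed):
    """Yield (vehicle_id, latitude, longitude) for each vehicle entity
    in a FeedMessage-shaped dict, whichever agency produced it."""
    for entity in feed.get("entity", []):
        vehicle = entity.get("vehicle")
        if vehicle is None:
            continue  # this entity carries a trip update or alert instead
        position = vehicle["position"]
        yield vehicle["vehicle"]["id"], position["latitude"], position["longitude"]


# Invented sample feeds standing in for two different agencies.
AGENCY_A_FEED = {
    "header": {"gtfs_realtime_version": "2.0", "timestamp": 1420000000},
    "entity": [
        {"id": "1", "vehicle": {"vehicle": {"id": "bus-7"},
                                "position": {"latitude": 38.9, "longitude": -77.0}}},
        {"id": "2", "trip_update": {"trip": {"trip_id": "t1"},
                                    "stop_time_update": []}},
    ],
}
AGENCY_B_FEED = {
    "header": {"gtfs_realtime_version": "2.0", "timestamp": 1420000005},
    "entity": [
        {"id": "a", "vehicle": {"vehicle": {"id": "train-12"},
                                "position": {"latitude": 42.36, "longitude": -71.06}}},
    ],
}

# The same function handles both feeds; no per-agency code is needed.
positions_by_agency = {
    name: list(vehicle_positions(feed))
    for name, feed in {"agency_a": AGENCY_A_FEED, "agency_b": AGENCY_B_FEED}.items()
}
print(positions_by_agency)
```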

But NextBus doesn’t provide an API using any recognized standards for real-time transit data. It’s a walled garden of sorts; the NextBus API is great if all you want to do is present data from agencies using NextBus, and terrible if you want to use it as a springboard for building revolutionary real-time passenger information tools.

The real question isn’t “why aren’t you on NextBus?”; the real question is “why doesn’t NextBus provide a standards-compliant API?”

Synoptic first!

So, you’re a transit agency (or vendor, consultant, system integrator, etc.), and you’ve decided to develop an API to expose your real-time data. Perhaps you’ve gotten queries from developers like “I want to be able to query an API and get next bus arrivals at a stop…”.

It’s hard to say “no”, but fulfilling that developer’s request may not be the best way to go. If you have limited resources available to expose your data, there are better approaches available, which will in the long term enable the development of more advanced applications.

What’s wrong with the NextBus API?

When it comes to real-time transit data, one of the common refrains is “just use NextBus!” But while NextBus may be a common name, that doesn’t make them the best choice for providing real-time transit data with a robust open data API for developers. It’s true that NextBus provides an API for developers, but there are problems that hamper or even entirely prevent its use in certain applications.

What are these problems? Some are organizational, and some are technical:

  • API not enabled for all agencies: While NextBus provides service for more than a hundred agencies, only a fraction of those agencies make their data available through the NextBus API.
  • API not standards-compliant: NextBus provides data to developers in their own custom format, rather than using the industry-standard SIRI or GTFS-realtime formats. While NextBus’s API has its advantages for certain types of apps (principally simple mobile apps), for developers working on large-scale passenger information systems, and developers seeking to solve complex problems like real-time routing, there are deficiencies in the NextBus API which could be remedied by using a standardized format. In particular, NextBus makes it exceedingly difficult to get the status of an entire transit system at once. Retrieving data stop-by-stop makes sense for mobile apps, but not for transit data integration platforms like OneBusAway, which benefit from being able to update from a feed containing status updates for all of an agency’s vehicles and trips.
  • Commonality of identifiers: When NextBus agencies also publish a GTFS feed containing their static route and schedule data (which they should), route, stop, and trip identifiers should match those in the NextBus data. When this is not done, it becomes onerous to use the real-time data—developers must expend additional engineering effort to map identifiers between the static and real-time data.
  • Data quality and completeness: Though the NextBus API documentation defines the data elements which developers can expect to find in the API responses, the actual availability of these data varies considerably between agencies. For example, many agencies do not include the tripTag element, which is essential for linking predictions between stops and then to the static schedule. Similarly, some agencies don’t actually provide useful values for the block element. NextBus must impress upon its customers (that is, the transit agencies) the value of supplying high-quality configuration data so that the NextBus API works as intended.

Though the present NextBus API is far from ideal, it is possible to transform the data into standards-compliant GTFS-realtime, which can be fed into any app which uses GTFS-realtime data, but only if the feed has been configured correctly—that is, with meaningful trip IDs, identifiers which match those in the agency’s GTFS feed, etc. Out of all of the agencies which use NextBus, the fraction of those agencies who have enabled the NextBus API and provided NextBus with the right configuration data for the API to be useful to the GTFS-realtime translator is frustratingly small.
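
The Python sketch below shows roughly what that transformation involves and why the tripTag matters. It is not the code of any translator mentioned on this site. The sample XML is a simplified, made-up response modeled on the per-stop predictions the NextBus API returns, and it assumes epochTime is expressed in milliseconds. Predictions are regrouped from per-stop lists into per-trip lists, shaped like GTFS-realtime trip updates, and any prediction without a tripTag has to be dropped because it cannot be tied to a trip.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified, invented sample modeled on NextBus per-stop predictions.
SAMPLE_XML = """
<body>
  <predictions routeTag="10" stopTag="5001">
    <direction title="Outbound">
      <prediction epochTime="1420000300000" vehicle="1401" block="1007" tripTag="6501"/>
      <prediction epochTime="1420000900000" vehicle="1402" block="1008"/>
    </direction>
  </predictions>
  <predictions routeTag="10" stopTag="5002">
    <direction title="Outbound">
      <prediction epochTime="1420000480000" vehicle="1401" block="1007" tripTag="6501"/>
    </direction>
  </predictions>
</body>
"""


def predictions_to_trip_updates(xml_text):
    """Regroup per-stop predictions into per-trip updates.

    Returns (updates, skipped), where skipped counts predictions that
    had no tripTag and so could not be linked to a trip."""
    root = ET.fromstring(xml_text.strip())
    by_trip = defaultdict(list)
    route_for_trip = {}
    skipped = 0
    for stop in root.iter("predictions"):
        for pred in stop.iter("prediction"):
            trip_tag = pred.get("tripTag")
            if not trip_tag:
                skipped += 1
                continue
            route_for_trip[trip_tag] = stop.get("routeTag")
            by_trip[trip_tag].append({
                "stop_id": stop.get("stopTag"),
                # Assumed milliseconds; GTFS-realtime uses whole seconds.
                "arrival": {"time": int(pred.get("epochTime")) // 1000},
            })
    updates = []
    for trip_tag in sorted(by_trip):
        stop_times = sorted(by_trip[trip_tag], key=lambda s: s["arrival"]["time"])
        updates.append({
            "trip": {"trip_id": trip_tag, "route_id": route_for_trip[trip_tag]},
            "stop_time_update": stop_times,
        })
    return updates, skipped


updates, skipped = predictions_to_trip_updates(SAMPLE_XML)
print(len(updates), "trip update(s);", skipped, "prediction(s) skipped")
```

Even when the regrouping succeeds, the resulting trip_id and stop_id values are only useful to a GTFS-realtime consumer if they match the identifiers in the agency's GTFS feed.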

NextBus can—and should—do better. Their customers, more than 100 transit agencies in North America, would all benefit from standards-compliant APIs that would allow developers to build apps that work with data produced by AVL systems from all vendors, not just one. This is the essence of open data, and it’s time for NextBus to get on board.

WMATA’s open data efforts are good, but could be better

Last week, ReadWriteWeb profiled WMATA’s open data efforts—from the agency’s initial (and ultimately unsuccessful) efforts to monetize the data, through the release of a real-time API and GTFS feed, and the eventual inclusion of WMATA’s data in Google Transit.

The ReadWriteWeb article paints this as a complete success story, in which, as David Alpert puts it, WMATA “got religion on open data”.

The reality is somewhat different. A GTFS feed and real-time data API may have been a substantial step forward in 2008, when this process started, but today there are many other categories of data WMATA could expose, and open, interoperable formats they could use to do so (particularly for real-time data). In addition, WMATA’s communications with developers could be better. While some agencies have active discussion groups where agency staff communicate freely with developers, at WMATA developers still get a somewhat chilly reception.

How could WMATA’s open data efforts be improved? Here are four suggestions: