Legacy AVL system? It’s okay, join the club.

If you work with real-time transit data, you’ve probably heard the steadily-increasing call for data producers to release their data in open, standardized formats like GTFS-realtime and SIRI. But how do you actually make your data available in those formats? Some AVL vendors are beginning to include standards-compliant APIs in their products, and that’s great for agencies considering a new system or major upgrade. But what about the massive installed base of legacy AVL systems which have few open interfaces, if any?

Fortunately, there are ways to get data out of almost any AVL system, whether it was explicitly designed with open interfaces or not. Some of these techniques are more technologically sound than others, and some may require some relatively tricky programming, but if you can find the right software developer, almost any problem is soluble.

Here are five key strategies for extracting information from an AVL system. The first three are strongly recommended, while the last two should only be undertaken if no better interface is available, and if you have adequate technical support to implement a more complex solution.

  • Transform a proprietary API to GTFS-realtime or SIRI: Many AVL systems (both COTS and agency-homegrown) include non-standard APIs which can, with a bit of programming, be transformed into a modern, standards-compliant API. This is the approach I took with wmata-gtfsrealtime, to produce a GTFS-realtime feed from WMATA’s real-time bus data, septa-gtfsrealtime to produce a GTFS-realtime feed from SEPTA’s real-time bus and rail data, and ctatt-gtfsrealtime to produce a GTFS-realtime feed from CTA’s Train Tracker data. This is also the approach taken by onebusaway-gtfs-realtime-from-nextbus-cli, which converts from the NextBus API, and bullrunner-gtfs-realtime-generator, which converts from the Syncromatics API.
  • Query a reporting database: Some AVL systems can be configured to log vehicle positions, predicted arrival times, and other information to a database. Ostensibly these databases are meant to be used for after-the-fact incident analysis, performance reporting, etc., but there’s nothing stopping an application from polling the database every 15-30 seconds to get the latest vehicle positions and predicted arrival times. Many GTFS-realtime feed producers take this approach, including ddot-avl, built by Code for America to extract real-time information from DDOT’s TransitMaster installation, HART-GTFS-realtimeGenerator, built by CUTR to extract real-time information from HART’s OrbCAD installation, and live_transit_event_trigger, built by Greenhorne & O’Mara (now part of Stantec) to produce a GTFS-realtime feed from Ride On’s OrbCAD installation.
  • Parse a published text file: Similar to the database approach, some AVL systems can be configured to dump the current state of the transit network to a simple text file (like this file from Hampton Roads Transit). This text file can be read and parsed by a translator which then generates a standards-compliant feed, which is the approach taken by hrt-bus-api, built by Code for Hampton Roads, and onebusaway-sound-transit-realtime.
  • Screen-scrape a passenger-facing Web interface: This is where we get into the less technologically-sound options. While the first three options focused on acquiring data from machine-readable sources, screen scraping involves consuming data from a human-readable source and transforming it back into machine-readable data. In this case, that might mean accessing a passenger-facing Web site with predicted arrival times, extracting the arrival times, and using that to produce a standards-compliant feed. This is the approach taken by this project, which screen-scrapes KCATA’s TransitMaster WebWatch installation to produce a GTFS-realtime feed. Compared to options which involve machine-readable data sources, screen-scraping is more brittle, and may make it more challenging to produce a robust feed, but it can be made to work.
  • Intercept internal AVL system communications: This is the last resort, but if an AVL system has no open interfaces, it may be possible to intercept communications between the components of the AVL system (such as a central server and a dispatch console or system driving signage at transit stops), decode those communications, and use them as the basis for a standards-compliant feed. This is a last resort because it will often require reverse-engineering undocumented protocols, and results in solutions which are brittle and will tend to break in unpredictable ways. But, it can be done, and if it’s the only way to get data out of an AVL system, then go for it. This is the approach taken by onebusaway-king-county-metro-legacy-avl-to-siri.
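
Of the recommended approaches, the reporting-database option is often the quickest to prototype. The sketch below assumes an invented `avl_log` table and column names (real vendor schemas such as OrbCAD's or TransitMaster's differ); it polls for the freshest position per vehicle and reshapes the rows into GTFS-realtime-style VehiclePosition entities as plain dicts. A production translator would serialize actual protobuf messages using the GTFS-realtime bindings.

```python
import sqlite3
import time

# Hypothetical schema for illustration only; real AVL reporting
# databases use vendor-specific tables and columns.
SCHEMA = """
CREATE TABLE avl_log (
    vehicle_id TEXT,
    route_id   TEXT,
    lat        REAL,
    lon        REAL,
    logged_at  INTEGER
)
"""

def poll_vehicle_positions(conn, max_age_s=60, now=None):
    """Return the latest position per vehicle, shaped like
    GTFS-realtime VehiclePosition entities (as dicts; a real feed
    would emit protobuf via the GTFS-realtime bindings)."""
    now = now if now is not None else int(time.time())
    # SQLite's bare-column behavior guarantees the non-aggregated
    # columns come from the row holding MAX(logged_at).
    rows = conn.execute(
        """
        SELECT vehicle_id, route_id, lat, lon, MAX(logged_at)
        FROM avl_log
        WHERE logged_at >= ?
        GROUP BY vehicle_id
        """,
        (now - max_age_s,),
    ).fetchall()
    return [
        {
            "id": vid,
            "vehicle": {
                "trip": {"route_id": route},
                "position": {"latitude": lat, "longitude": lon},
                "timestamp": ts,
            },
        }
        for vid, route, lat, lon, ts in rows
    ]
```

Run on a 15-30 second timer, this is essentially all a minimal translator needs, plus serialization and an HTTP endpoint to serve the resulting feed.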

As evidenced by the example links, every one of the strategies mentioned above has been implemented in at least one real-world application. No matter how old your AVL system is, no matter how far out of warranty or how unsupported it is, no matter how obsolete the technology is, some enterprising civic hacker has probably already figured out a way to get data out of the system (or is eager and ready to do so!). Every one of the tools linked in this post is open-source, and if it closely approximates your needs, you can download it today and start hacking (or find a local civic hacker and have them adapt it to meet your needs). And if none of the tools look close? Don’t head for your procurement department and have them issue an RFP—instead, post on the Transit Developers Google Group; chances are your post will make its way to someone who can help, whether a local Code for America brigade, or an independent civic hacker, or another transit agency that has already solved the same problem.

Finally, I’d like to thank the participants in the Disrupting Legacy Transit Ops Software (Moving Beyond Trapeze) session at Transportation Camp DC 2015, who inspired me to write this post.

Why “they’re not on NextBus” isn’t the problem it sounds like

Being active in open data for transit and real-time passenger information, I sometimes hear a particular complaint leveled at transit agencies: "They're not on NextBus!"

This bothers me. A lot.

Why? There are two reasons. The first is pretty simple. Sometimes, when people say “NextBus”, what they really mean is real-time passenger information, without any concern for the specific provider. But “NextBus” is a trademarked name for a specific proprietary real-time passenger information provider; if what you really mean is “real-time passenger information”, then say so.

The second reason is more pernicious. A lot of people use mobile apps for transit which are designed around the NextBus API. So, they work everywhere that the local transit agency has elected to contract with NextBus for real-time passenger information. On its face, this seems like a huge success for transit riders—one app for dozens of cities! But, it’s not. Vendor lock-in isn’t the way to achieve real transit data integration.

I understand that transit riders love the idea of having a single app for transit information in every city they visit. I’m a transit rider; I get it. But the solution isn’t to get every agency to pay the same vendor to provide the same proprietary service.

There are many AVL vendors out there: INIT, Xerox, Avail, Clever, Connexionz, and more. Some very forward-thinking agencies, like New York's MTA, have even decided to act as their own system integrator and build their own real-time passenger information system, so that they'll never be beholden to any vendor's proprietary system. Built on top of the open-source transit data platform OneBusAway, MTA Bus Time provides real-time passenger information for New York's buses using an open technology stack that saved the MTA 70 percent compared to proprietary alternatives.

So with every agency using a different vendor’s system (and some having rolled their own), how do we provide that integrated experience that riders crave? The answer is simple: by using open data standards. With standards like GTFS-realtime and SIRI, app developers can build apps that work with data from any transit agency and any vendor’s systems. With OneBusAway, for example, I can easily (trivially) make use of feeds from any of several DC-area agencies, York Region Transit, MBTA, BART, TriMet, or any of the other agencies who are releasing GTFS-realtime data. Because these agencies are all using standardized formats for their open data, I don’t have to build anything new in OneBusAway to consume their data—the same code that works for one agency works for all of them.

But NextBus doesn’t provide an API using any recognized standards for real-time transit data. It’s a walled garden of sorts; the NextBus API is great if all you want to do is present data from agencies using NextBus, and terrible if you want to use it as a springboard for building revolutionary real-time passenger information tools.

The real question isn’t “why aren’t you on NextBus?”; it’s “why doesn’t NextBus provide a standards-compliant API?”

Traction motor, HVAC unit, AVL system?

If a transit agency runs its own motor shop for rebuilding traction motors, runs its own electronics shop for performing component-level repair of circuit boards, runs its own axle shop for rebuilding axles, why shouldn’t it be able to do the same for the software which is just as vital to everyday operation as axles and traction motors?

I recently came across a very interesting paper describing the successes of SEPTA’s Woodland Electronic Repair Shop. At SEPTA, the justification for in-house electronics repair is twofold. First, many components which come into the shop are not actually defective; had they been sent to an outside shop for repair, time and money would have been wasted only for the outside shop to return the same “no trouble found” verdict. Second, sending equipment to an outside shop is expensive: by SEPTA’s analysis, more than double the cost of operating an in-house electronics shop.

Transit agencies may not think of themselves as being technology-oriented, but the reality is that software systems are at the heart of so many things transit agencies do—from scheduling to passenger information to signals and communications. Agencies almost universally rely on vendor support for large software packages that perform a wide range of functions: scheduling, trip planning, real-time passenger information, and even safety-critical tasks in signalling and communications.

Yet in comparison to the nuts and bolts which keep a transit system moving, most transit agencies have shockingly little control over their vital software. Because the software is closed-source and proprietary, the agency is unable to develop its own bug fixes, patches, and new features, and may not even be able to export its data in an open format. By controlling the availability of support and new features, the vendor dictates when the agency upgrades; by using proprietary data formats and interfaces, the vendor all but guarantees that the agency will return to them instead of shopping around. This is the very same risk that SEPTA’s electronics shop seeks to mitigate:

At some point the vendor will no longer support their particular system and since you have always relied upon them for their parts you will have no choice but to go out for bid to get a new system or an alternately designed part to perform the same function.

When procuring new equipment, SEPTA demands access to schematics and test equipment, so that their repair shop can do its work. Without this access, the results are predictably poor. SEPTA found that costs for one class of parts had increased 94% over two years—an “astronomical” price increase at an agency used to inexpensive in-house repair. The explanation, from SEPTA’s engineering department, is depressing:

These are so expensive because SEPTA has no alternative but to purchase these parts from the OEM.

This is why our equipment specifications have a requirement that the Vendor provide SEPTA with all test equipment, documentation and training to allow us to repair the circuit boards in our electronic repair shop at Woodland. The CBTC project did not have a specification from Engineering, but rather was supplied for liquidated damages from the M4 program. It was understood from the beginning that SEPTA would not have the capability to repair the circuit boards.

The complexity and safety aspect of these boards prevents SEPTA from creating drawings and specifications that would allow an alternate supplier to produce these boards.

So, what is the parallel for a software project? Where an electronics shop has schematics, where a mechanical shop has blueprints, a software shop has source code and supporting tools. When a transit agency has access to the source code for a software system, they can perform their own in-house work on the system, on their own schedule, and with their own staff. New features are developed to meet the agency’s needs, not according to a vendor’s whims. Even if the agency elects to bring in contracted support to develop bug fixes or new features, they retain complete control over the process—and, more importantly, they own the end product.

Transit agencies may feel ill-at-ease at the prospect of getting into the business of software development, but the reality is that by bringing software skills in-house, they can realize the same gains as when they bring mechanical and electronic repair and overhaul in-house. In fact, the potential gains are even greater for software, when agencies use open-source software and actively participate in the surrounding community. Many of the fundamental problems of mass transit are the same from agency to agency, and software developed to solve a problem at one agency is very likely to be usable (at least in part) at other agencies.

GTFS-realtime for WMATA buses

I’ve posted many times about the considerable value of open standards for real-time transit data. While it’s always best if a transit authority offers its own feeds using open standards like GTFS-realtime or SIRI, converting available real-time data from a proprietary API into an open format still gets the job done. After a few months of kicking the problem around, I’ve finally written a tool to produce GTFS-realtime StopTimeUpdate, VehiclePosition, and Alert messages for Metrobus, as well as GTFS-realtime Alert messages for Metrorail.

The tool, wmata-gtfsrealtime, isn’t nearly as straightforward as it might be, because while the WMATA API appears to provide all of the information you’d need to create a GTFS-realtime feed, you’ll quickly discover that the route, stop, and trip identifiers returned by the API bear no relation to those used in WMATA’s GTFS feed.

One of the basic tenets of GTFS-realtime is that it is designed to directly integrate with GTFS, and for that reason identifiers must be shared across GTFS and GTFS-realtime feeds.
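
As a concrete illustration (with a made-up two-row trips.txt), a consumer can only join the two feeds if every trip_id in the realtime feed resolves against the static GTFS, so a feed producer can sanity-check its output with something like:

```python
import csv
import io

def unmatched_trip_ids(trips_txt, feed_trip_ids):
    """Return trip_ids referenced by a GTFS-realtime feed that are
    absent from the static GTFS trips.txt. Any result here means the
    two feeds do not share identifiers and cannot be joined."""
    static_ids = {row["trip_id"]
                  for row in csv.DictReader(io.StringIO(trips_txt))}
    return sorted(set(feed_trip_ids) - static_ids)
```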

In WMATA’s case, this means that it is necessary to first map routes in the API to their counterparts in the GTFS feed, and then, for each vehicle, map its trip to the corresponding trip in the GTFS feed. This is done by querying a OneBusAway TransitDataService (via Hessian remoting) for active trips for the mapped route, then finding the active trip which most closely matches the vehicle’s trip.

Matching is done by constructing a metric space in which the distance between a stoptime in the API data and its counterpart in the GTFS feed is computed over (x, y, t) tuples; that is, our notion of “distance” covers both space and time. The spatial distances fed into the metric are halved, in order to bias the scores towards matching based on time, while allowing some leeway for stops which are wrongly located in either the GTFS or the real-time data.
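
A minimal sketch of that matching idea, with invented field names, units, and a one-metre-per-second time conversion, rather than the actual wmata-gtfsrealtime internals:

```python
import math

def stoptime_distance(api_st, gtfs_st, metres_per_second=1.0):
    """Distance between an API stoptime and a GTFS stoptime in
    combined space-time. The spatial terms are halved so that
    agreement in time dominates, while tolerating mislocated stops.
    Field names and units here are illustrative assumptions."""
    dx = (api_st["x"] - gtfs_st["x"]) / 2.0
    dy = (api_st["y"] - gtfs_st["y"]) / 2.0
    dt = (api_st["t"] - gtfs_st["t"]) * metres_per_second
    return math.sqrt(dx * dx + dy * dy + dt * dt)

def best_matching_trip(vehicle_stoptimes, candidate_trips):
    """Pick the candidate GTFS trip whose stoptimes are closest,
    in total, to the stoptimes observed for the vehicle."""
    def total(trip):
        return sum(stoptime_distance(a, g)
                   for a, g in zip(vehicle_stoptimes, trip["stoptimes"]))
    return min(candidate_trips, key=total)
```

A badly delayed or off-route vehicle scores poorly against every candidate, which is one way the handful of unmatched vehicles mentioned below can arise.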

The resulting algorithm will map all but one or two of the 900-odd vehicles on the road during peak hours. Spot-checking arrivals for stops in OneBusAway against arrivals for the same stop in NextBus shows relatively good agreement; of course, considering that NextBus is a “black box”, unexplained variances in NextBus arrival times are to be expected.

You may wonder why we can’t provide better data for Metrorail; the answer is simple: the API is deficient. As I’ve previously discussed, the rail API only provides the same data you get from looking at the PIDS in stations. Unfortunately, that’s not what we need to produce a GTFS-realtime feed. At a minimum, we would need to be able to get a list of all revenue trains in the system, including their current schedule deviation, and a trip ID which would either match a trip ID in the GTFS feed, or be something we could easily map to a trip ID in the GTFS feed.

This isn’t how it’s supposed to be. Look at this diagram, then, for a reality check, look at this one (both are from a presentation by Jamey Harvey, WMATA’s former Enterprise Architect). WMATA’s data management practices are, to say the least, sorely lacking. For most data, there’s no single source of truth. The problem is particularly acute for bus stops; one database might have the stop in one location and identified with one ID, while another database might have the same physical stop identified with a different number, and coordinates that place it in an entirely different location.

Better data management practices would make it easier for developers to build innovative applications which increase the usability of transit services and, ultimately, improve mobility for the entire region. Isn’t that what it’s supposed to be about, at the end of the day?

OneBusAway might be coming to Ride On, maybe?

Today on GitHub I came across this commit. I don’t quite know what’s going on, but it sure looks to me like someone at Greenhorne & O’Mara or Ride On has been experimenting with OneBusAway and Ride On’s data.

This is something in which I am keenly interested. But unlike in other cities, here there seems to be almost no interest in connecting transit agencies with each other and with local developers. There’s great value in doing both—connecting transit agencies together helps reduce duplicated effort and provide riders with harmonized, federated services. But even more importantly, connecting transit agencies with interested developers can provide transit riders with services that might have been cost-prohibitive or otherwise infeasible for those agencies to develop in-house or through conventional procurement methods.

There are a lot of innovative developers out there, with lots of great ideas. It’s unreasonable to expect transit authorities to shoulder the risk of incubating all of those ideas, some of which might fail spectacularly; but it’s another matter entirely to ask transit authorities to provide, on a best-effort basis, the data those developers need to bring their ideas to fruition.

I’d have thought that this would fall within the remit of the Mobility Lab, but more than a year after its launch, that still hasn’t happened.

So, while there’s plenty going on, there’s not a whole lot of coordination, whether between agencies or between agencies and the community. Thus we have nearly a half-dozen real-time sign projects going on in the region—and who knows how much more duplicated work is being done, with everyone toiling behind closed doors!

Contrast that to New York City, where the MTA has been working—transparently—to develop MTA Bus Time, based on OneBusAway. When the agency began work on a GTFS-realtime feed for real-time subway arrivals from the IRT ATS system, once again, they turned to the community to get comments on the proposed specification. MTA developers are active on the agency’s mailing list to respond to questions and bug reports from developers.

In Portland, TriMet worked with OpenPlans to develop OpenTripPlanner, transparently, in full view of the community. OpenTripPlanner has proven to be a huge success, powering a first-of-its-kind regional, intermodal trip planner.

Transit in the Washington, D.C. area isn’t all that different than in Portland or New York. Sure, the modes vary from city to city, and the Lexington Avenue Line by itself carries more passengers in one day than the Metrorail and Metrobus systems combined, but at its core, transit is transit. If it worked in New York, if it worked in Portland, it can work here.

This doesn’t have to be hard; in all seriousness, it takes about a minute to create a new Google Group.

When everyone works together, we can all help make transit better.