Better rail data for better apps

A number of transit agencies, including WMATA, BART, CTA, and the MBTA, offer real-time rail system data as part of their open data initiatives. While each agency’s implementation is different, most revolve around distributing train predictions: the same data used by countdown clocks in stations. This data is immediately useful for many applications, such as mobile and desktop widgets that mimic the look and feel of in-station countdown clocks. But for other applications, such as How’s Metro, How’s the T (by the same developer), and a real-time train map for Metrorail, train predictions are not useful on their own. These applications depend on knowing where trains are, rather than how far a train is from a given station.

In short, if the only rail data a transit agency provides is train predictions, then developers who want to know the actual locations of trains must reverse-engineer that data from the predictions. Generating predictions based on track circuit occupancy and schedule information is far from a flawless process; the end result often contains glitches. When developers try to work backward from train predictions to infer the actual positions of trains, the result is noisy data. And besides, why should developers have to make the effort, when the transit agencies themselves already have the data available?

I expect that some will object on the grounds that providing more detailed rail data would be ‘too invasive’, or might constitute some kind of security threat. It may seem esoteric for anyone to want to know every time a switch is thrown somewhere on the Metrorail system, but that data is in no way proprietary. Train movements can already be inferred from direct observation of the system, examination of schedules, and existing open data APIs. Providing the information described in this post would only confirm those inferences; it is not as though the information is otherwise unavailable.

In addition, there’s good precedent for this level of information being available for other modes of transportation. WMATA’s own API for bus positions, for example, provides latitude and longitude, along with route information, for every Metrobus. In freight rail, the ATCS Monitor software, combined with a radio scanner, makes it possible to intercept data on block occupancy and switch status from the radio signals dispatchers use to communicate with field equipment. The National Airspace System is similarly transparent: aircraft positions are widely available through public flight-tracking services.

So, what real-time rail data can transit agencies provide to make developers’ work easier? The first, and simpler, approach is an API method that returns data on every train tracked by the agency’s ATS (Automatic Train Supervision) system: train number, destination, train length (where available), and the train’s location (most likely a track circuit identifier). Additional data, such as each train’s consist, can be added to the response where available.
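To make the first approach more concrete, here is a minimal Python sketch of how a client might consume such a response. The JSON layout, the field names (`train_number`, `destination`, `length_cars`, `track_circuit`, `consist`), the track circuit identifiers, and the car numbers are all hypothetical assumptions for illustration; this is not any agency’s actual API.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical response from a "train positions" API method.
# Field names and identifiers are illustrative assumptions only.
SAMPLE_RESPONSE = """
{
  "trains": [
    {"train_number": "402", "destination": "Glenmont",
     "length_cars": 6, "track_circuit": "B1-123",
     "consist": ["1001", "1002", "1003", "1004", "1005", "1006"]},
    {"train_number": "115", "destination": "Shady Grove",
     "length_cars": null, "track_circuit": "A2-045"}
  ]
}
"""


@dataclass
class TrainPosition:
    train_number: str
    destination: str
    track_circuit: str                  # location, as a track circuit identifier
    length_cars: Optional[int] = None   # not every system reports train length
    consist: List[str] = field(default_factory=list)  # car numbers, if provided


def parse_train_positions(payload: str) -> List[TrainPosition]:
    """Convert the hypothetical JSON response into TrainPosition records."""
    data = json.loads(payload)
    trains = []
    for t in data["trains"]:
        trains.append(TrainPosition(
            train_number=t["train_number"],
            destination=t["destination"],
            track_circuit=t["track_circuit"],
            length_cars=t.get("length_cars"),   # JSON null becomes None
            consist=t.get("consist") or [],     # missing or null consist -> []
        ))
    return trains


if __name__ == "__main__":
    for train in parse_train_positions(SAMPLE_RESPONSE):
        print(train.train_number, train.destination, train.track_circuit)
```

Treating length and consist as optional reflects the point above that not every system can report them.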

The second approach is more ambitious, but offers a greater level of detail for developers who want even more insight. Here, developers get access to a ‘firehose’ API: clients open a persistent connection and receive a message every time a track circuit block becomes occupied or unoccupied, and every time a switch is thrown normal or reverse. Depending on the system, the feed could also include the TWC (train-to-wayside communication) bits every time a train berths at a platform (on systems with TWC, like Metrorail, BART, and MARTA), and AEI (automatic equipment identification) data every time a car passes an AEI tag reader. With all of that data, a developer could replicate, with reasonable accuracy, what transit agencies see in their control centers.

For some railfans, I’m sure the idea of having an accurate real-time view of a rail system on their phone or tablet is fascinating. But the real value is in making apps like How’s Metro easier to develop, without the reverse-engineering that is currently required. The end result is better and more reliable data for riders who want something more than “x minutes until the next train”.
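A client of the ‘firehose’ approach would mostly be maintaining a mirror of track circuit and switch state. The following is a minimal Python sketch under stated assumptions: the feed is taken to be newline-delimited JSON, and the message types and field names (`type`, `id`, `occupied`, `position`) and identifiers are hypothetical, not a real agency protocol.

```python
import json
from typing import Dict, Iterable, Set

# Hypothetical newline-delimited JSON messages from a "firehose" feed.
# Message types, field names, and identifiers are assumptions for illustration.
SAMPLE_STREAM = [
    '{"type": "circuit", "id": "B1-123", "occupied": true}',
    '{"type": "switch", "id": "SW-7", "position": "reverse"}',
    '{"type": "circuit", "id": "B1-124", "occupied": true}',
    '{"type": "circuit", "id": "B1-123", "occupied": false}',
    '{"type": "switch", "id": "SW-7", "position": "normal"}',
]


class RailState:
    """Mirror of track circuit occupancy and switch positions."""

    def __init__(self) -> None:
        self.occupied: Set[str] = set()     # currently occupied track circuits
        self.switches: Dict[str, str] = {}  # switch id -> "normal" or "reverse"

    def apply(self, message: dict) -> None:
        """Update the mirror from a single firehose message."""
        if message["type"] == "circuit":
            if message["occupied"]:
                self.occupied.add(message["id"])
            else:
                self.occupied.discard(message["id"])
        elif message["type"] == "switch":
            if message["position"] not in ("normal", "reverse"):
                raise ValueError(f"unknown switch position: {message['position']}")
            self.switches[message["id"]] = message["position"]
        # Other message types (e.g. TWC or AEI reads) are ignored in this sketch.


def replay(lines: Iterable[str]) -> RailState:
    """Build a RailState by applying each JSON line in order."""
    state = RailState()
    for line in lines:
        state.apply(json.loads(line))
    return state


if __name__ == "__main__":
    state = replay(SAMPLE_STREAM)
    print(sorted(state.occupied), state.switches)
```

Keeping the state mirror separate from the transport means `replay` accepts any iterable of lines, such as a file object wrapping a socket. A real client would also need an initial snapshot of state, since events alone say nothing about circuits and switches that have not changed since the connection opened.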