Reconstructing train positions from prediction data

Recently, I’ve been investigating techniques for independently gathering data so that I can analyze the performance of the Metrorail system. As I’ve previously lamented, the agency releases only summary performance statistics, which makes it impossible to conduct more detailed analyses. Therefore, we must begin with data collection. If WMATA made all of the data captured by AIM available to developers, this would be a much easier task. But, as I’ve noted, only train predictions are released, obscuring the actual number of trains in the system and their positions.

So, we must first sample the prediction data. We know that the predictions are updated by AIM roughly every 20 seconds. It is not known how much delay Mashery introduces, so for simplicity we will just assume that new predictions are made available every 20 seconds. Application of the Shannon-Nyquist sampling theorem therefore tells us that we must sample the data every 10 seconds.

Don’t trust Claude Shannon? Here’s an example to illustrate why we have to sample so frequently:

Suppose that we’re polling the PIDS at Metro Center once per minute. In the peaks, sometimes the interval between trains is less than 60 seconds. So, at $latex T=0$, we might sample the PIDS and find an 8-car train to Glenmont boarding. If we sample again at $latex T=60$, and once again we see that an 8-car train to Glenmont is boarding, has one train serviced the platform, or two?

We might be able to say with some certainty that two distinct trains had serviced the platform if the observed trains were on different lines, or travelling to different destinations, or if they were different lengths. But if all of the observed characteristics are identical, then we have no way to tell if we saw one train or two, unless we were to have observed, in between the two trains, that the platform was empty (that is, that no train was boarding).
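The disambiguation rule above can be sketched in a few lines of Python. This is a minimal illustration, not WMATA's data model; the `Observation` fields are hypothetical stand-ins for what a PIDS sample might contain.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """One sample of a boarding train on the PIDS (field names are hypothetical)."""
    line: str          # e.g. "RD"
    destination: str   # e.g. "Glenmont"
    cars: int          # consist length

def definitely_two_trains(first, second, saw_empty_platform_between):
    """Return True only if two consecutive boarding observations must be
    two distinct trains; otherwise we cannot tell."""
    if saw_empty_platform_between:
        return True
    # Any differing observable characteristic implies two distinct trains.
    return (first.line != second.line
            or first.destination != second.destination
            or first.cars != second.cars)

# Two identical 8-car Glenmont trains a minute apart are indistinguishable:
a = Observation("RD", "Glenmont", 8)
b = Observation("RD", "Glenmont", 8)
print(definitely_two_trains(a, b, saw_empty_platform_between=False))  # False
print(definitely_two_trains(a, b, saw_empty_platform_between=True))   # True
```

The point of the sketch is the `False` case: once every observable characteristic matches and we never saw an empty platform, the question "one train or two?" is genuinely undecidable from the samples alone.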

Once we accept the need to sample at a particular rate in order to avoid missing a train, how often do we sample the predictions? This is where Claude Shannon comes in. As previously introduced, the sampling theorem states that:

If a function $latex f(t)$ contains no frequencies higher than $latex W$ cps, it is completely determined by giving its ordinates at a series of points spaced $latex 1/(2W)$ seconds apart.

The PIDS update every 20 seconds, or at a rate of 0.05 Hz. Accordingly, we must sample the predictions every 10 seconds. But then what? We’ll have a database of predictions; the sampling rate ensures that we will not miss any. But how do we go from predictions to trains? This remains an open question for me.
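As a sanity check, the arithmetic from the sampling theorem is easy to write down:

```python
def nyquist_sampling_interval(update_period_s):
    """Given how often the source refreshes, return the longest sampling
    interval that still satisfies the sampling theorem: 1 / (2W)."""
    w = 1.0 / update_period_s   # highest frequency present, in Hz
    return 1.0 / (2.0 * w)      # equivalently, update_period_s / 2

print(nyquist_sampling_interval(20))  # 10.0 seconds
```

A 20-second refresh is a 0.05 Hz signal, so samples spaced $latex 1/(2 \times 0.05) = 10$ seconds apart are sufficient.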

Obviously, any time we have a prediction indicating that a train is boarding, we know that there is a train physically at the platform. That’s the only time we don’t have to guess. In all other cases, we have to start guessing. One of the more substantial problems is that it’s hard to figure out where a train is physically, given its arrival time at a station. The WMATA GTFS feed can be used to find the average travel time between two adjacent stations, and the WMATA API can be used to get the distance between those stations. Using that data, you can estimate how many feet away from the station a train is, given the arrival time. But it’s only an estimate, and almost certainly a bad one.
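Here is what that (admittedly crude) estimate might look like, assuming a constant average speed over the segment. The inputs are placeholders: the average travel time would come from the GTFS schedule, the segment length from the WMATA API.

```python
def estimate_feet_from_station(eta_s, avg_travel_time_s, segment_length_ft):
    """Crude position estimate: assume the train covers the segment at a
    constant average speed, so the distance remaining scales linearly
    with the predicted arrival time."""
    # Clamp: a train can't be farther out than the previous station.
    eta_s = min(eta_s, avg_travel_time_s)
    fraction_remaining = eta_s / avg_travel_time_s
    return segment_length_ft * fraction_remaining

# A train 60 s out on a 120 s, 5280 ft segment is roughly halfway:
print(estimate_feet_from_station(60, 120, 5280))  # 2640.0
```

The linear-speed assumption is exactly why the estimate is bad: trains accelerate out of one station and brake into the next, dwell times vary, and the predictions themselves are quantized and noisy.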

Have I mentioned how much easier this would be if there were an API call that would return every train being tracked by AIM and the track circuit being occupied by the head of the train? And have I mentioned the inconsistency inherent in the fact that the API will readily return the position of every Metrobus on the road, straight from OrbCAD, but all we can get from AIM is predictions?

Anyway, supposing we can get an accurate picture of where the trains are, what can we do with that data? When you can see all of the trains at once, you can detect bunching and gaps. In addition, the PIDS only show predictions for trains arriving in the next 20 minutes, and tend to fail miserably when trains are single-tracking. A real feed of train positions might make it possible to offer better information to passengers during track work and disruptions, when the PIDS are often blank or give bad information.
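Bunching and gap detection falls out almost immediately once you have arrival times at a single station. A minimal sketch, with arbitrary illustrative thresholds (half and one-and-a-half times the scheduled headway):

```python
def classify_headways(arrival_times_s, scheduled_headway_s):
    """Label each gap between consecutive arrivals at one station as
    'bunched', 'gap', or 'ok' relative to the scheduled headway.
    The 50%/150% thresholds are arbitrary illustration values."""
    labels = []
    for earlier, later in zip(arrival_times_s, arrival_times_s[1:]):
        headway = later - earlier
        if headway < 0.5 * scheduled_headway_s:
            labels.append("bunched")
        elif headway > 1.5 * scheduled_headway_s:
            labels.append("gap")
        else:
            labels.append("ok")
    return labels

# Scheduled every 360 s; the third train is right behind the second,
# leaving a long gap before the fourth:
print(classify_headways([0, 360, 420, 1200], 360))  # ['ok', 'bunched', 'gap']
```

The same comparison run across every station at once is what lets you see a bunch forming before the PIDS give any hint of it.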

Finally, with the right data, it should be possible to correlate real-time data with the GTFS schedule, and compute on-time performance—not just as the summary metric that WMATA provides, but along a variety of dimensions: by line, by time of day, by day of week, etc. Many questions have been asked about the performance of Metrorail, and ultimately, more data is the only way to answer those questions.
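To make the multi-dimensional idea concrete, here is one way the aggregation might be structured. The record shape and the five-minute tolerance are invented for illustration; real GTFS matching is considerably messier.

```python
from collections import defaultdict

def on_time_rates(observations, tolerance_s=300):
    """Compute on-time percentage along several dimensions at once.
    Each observation is (line, hour_of_day, scheduled_s, actual_s);
    a trip counts as 'on time' if it arrives within tolerance_s of
    its scheduled time. The record shape is hypothetical."""
    totals = defaultdict(lambda: [0, 0])  # key -> [on_time, total]
    for line, hour, scheduled, actual in observations:
        for key in (("line", line), ("hour", hour)):
            totals[key][1] += 1
            if abs(actual - scheduled) <= tolerance_s:
                totals[key][0] += 1
    return {key: on / total for key, (on, total) in totals.items()}

obs = [("RD", 8, 0, 120), ("RD", 8, 600, 1200), ("OR", 17, 0, 60)]
rates = on_time_rates(obs)
print(rates[("line", "RD")])  # 0.5
print(rates[("line", "OR")])  # 1.0
```

Adding a dimension (day of week, direction, station) is just another key in the inner loop, which is exactly why position data, rather than a single agency-supplied summary number, is so valuable.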

Human factors in rail signalling accidents, and the role of backup systems

The October 2011 report of the WMATA Riders’ Advisory Council Chair contained a note indicating that a RAC member had requested (using the RAC’s investigative powers) documentation from the agency concerning, among other items, the “development of a real-time collision-avoidance system for Metrorail trains”, further defined as a system “designed to serve as a continuous backup system that would provide alerts to potential safety issues, and which would supplement Metrorail’s primary electronic system to prevent crashes”.

This piqued my interest, because it doesn’t sound like the sort of thing which is usually expected of a rail signalling system. Rail signalling systems are (as I’ll reiterate later) meant to be fail-safe, meaning that any failure must lead to the most restrictive signal indication being displayed. So long as the system is properly maintained (and that’ll turn out to be an often erroneous assumption), the system should not permit unsafe conditions to exist.

So, what sort of backup system could be implemented, and do we really need one? More importantly, are there more serious, systemic problems, which will hobble any system implemented?

Heritage trains and CBTC

Today (June 30, 2011) was the last day of service for the 1967 Tube Stock on the London Underground’s Victoria Line. The 1967 Stock had served the Victoria Line since it opened in 1968, but with the commissioning of new signalling (Invensys Rail’s DTG-R) along with the delivery of the 2009 Tube Stock, the 1967 Stock are no longer welcome on the Victoria Line. The new signalling means that the 2009 Tube Stock are the only trains which can run on the line in passenger service; the old system (used by the ’67 Tube Stock) will soon be decommissioned. Unfortunately, it does not seem likely that TfL will preserve an entire train, so today’s last run was probably the very last we will see of the 1967 Stock in passenger service, on any line.

At the same time, there’s a very good question to be asked: with the Victoria Line having been resignalled, where would a preserved 1967 Tube Stock run? Of course, the train could theoretically run elsewhere on the Tube, but soon there won’t be many other places to go. On the Jubilee Line, SelTrac is now in use across the entire length of the line; the conventional signal heads are all bagged up, and the trainstops pinned down. Resignalling of the Northern Line with SelTrac is in progress, and the Piccadilly Line will follow it. The sub-surface lines are not far behind; according to news from TfL, they are to be resignalled with Bombardier CITYFLO 650. Eventually the whole of the Tube will be resignalled, and then where will preserved trains go?

Better rail data for better apps

A number of transit agencies, including WMATA, BART, CTA, and the MBTA, offer real-time rail system data as part of their open data initiatives. While each agency’s implementation is different, most revolve around distributing train predictions—the same data used by countdown clocks in stations. This data is immediately useful for many applications, like mobile and desktop widgets which mimic the look and feel of in-station countdown clocks. But for some applications, like How’s Metro, How’s the T (by the same developer), and this real-time train map for Metrorail, train predictions are not useful. These applications depend on knowing where trains are, rather than knowing how far a train is from a given station.

In short, if the only rail data a transit agency provides is train predictions, then developers who want to know the actual locations of trains must reverse-engineer that data from the predictions. Generating predictions based on track circuit occupancy and schedule information is far from a flawless process; the end result often contains glitches. When developers try to work backward from train predictions to infer the actual positions of trains, the result is noisy data. And besides, why should developers have to make the effort, when the transit agencies themselves already have the data available?

SPAD mitigation at Brentwood Yard

While riding north on B1 track today, I noticed two new signs marked “SIGNAL B99 06 ON LEFT”, one at the north end of the platform at New York Ave., and one in the right-of-way. I assume this is a SPAD mitigation measure linked to recent incidents. As far as I know, this is the first time signs of this nature have been installed on the Metrorail system, although they are common elsewhere.