Synoptic first!

So, you’re a transit agency (or vendor, consultant, system integrator, etc.), and you’ve decided to develop an API to expose your real-time data. Perhaps you’ve gotten queries from developers like “I want to be able to query an API and get next bus arrivals at a stop…”.

It’s hard to say “no”, but fulfilling that developer’s request may not be the best way to go. If you have limited resources available to expose your data, there are better approaches available, which will in the long term enable the development of more advanced applications.

Traction motor, HVAC unit, AVL system?

If a transit agency runs its own motor shop for rebuilding traction motors, runs its own electronics shop for performing component-level repair of circuit boards, runs its own axle shop for rebuilding axles, why shouldn’t it be able to do the same for the software which is just as vital to everyday operation as axles and traction motors?

I recently came across a very interesting paper describing the successes of SEPTA’s Woodland Electronic Repair Shop. At SEPTA, the justification for in-house electronics repair is twofold. First, many components which come into the shop are not actually defective; had they been sent to an outside shop for repair, time and money would have been wasted only for the outside shop to return the same “no trouble found” verdict. Second, sending equipment to an outside shop is expensive: by SEPTA’s analysis, more than double the cost of operating an in-house electronics shop.

Transit agencies may not think of themselves as being technology-oriented, but the reality is that software systems are at the heart of so many things transit agencies do—from scheduling to passenger information to signals and communications. Agencies almost universally rely on vendor support for large software packages that perform a wide range of functions: scheduling, trip planning, real-time passenger information, and even safety-critical tasks in signalling and communications.

Yet in comparison to the nuts and bolts which keep a transit system moving, most transit agencies have shockingly little control over their vital software. Because that software is closed-source and proprietary, the agency is unable to develop its own bug fixes, patches, and new features, and may not even be able to export data in an open format. By controlling the availability of support and new features, the vendor dictates when the agency upgrades; by using proprietary data formats and interfaces, the vendor all but guarantees that the agency will return to them instead of shopping around. This is the very same risk that SEPTA’s electronics shop seeks to mitigate:

At some point the vendor will no longer support their particular system and since you have always relied upon them for their parts you will have no choice but to go out for bid to get a new system or an alternately designed part to perform the same function.

When procuring new equipment, SEPTA demands access to schematics and test equipment, so that their repair shop can do its work. Without this access, the results are predictably poor. SEPTA found that costs for one class of parts had increased 94% over two years—an “astronomical” price increase at an agency used to inexpensive in-house repair. The explanation, from SEPTA’s engineering department, is depressing:

These are so expensive because SEPTA has no alternative but to purchase these parts from the OEM.

This is why our equipment specifications have a requirement that the Vendor provide SEPTA with all test equipment, documentation and training to allow us to repair the circuit boards in our electronic repair shop at Woodland. The CBTC project did not have a specification from Engineering, but rather was supplied for liquidated damages from the M4 program. It was understood from the beginning that SEPTA would not have the capability to repair the circuit boards.

The complexity and safety aspect of these boards prevents SEPTA from creating drawings and specifications that would allow an alternate supplier to produce these boards.

So, what is the parallel for a software project? Where an electronics shop has schematics, where a mechanical shop has blueprints, a software shop has source code and supporting tools. When a transit agency has access to the source code for a software system, they can perform their own in-house work on the system, on their own schedule, and with their own staff. New features are developed to meet the agency’s needs, not according to a vendor’s whims. Even if the agency elects to bring in contracted support to develop bug fixes or new features, they retain complete control over the process—and, more importantly, they own the end product.

Transit agencies may feel ill-at-ease at the prospect of getting into the business of software development, but the reality is that by bringing software skills in-house, they can realize the same gains as when they bring mechanical and electronic repair and overhaul in-house. In fact, the potential gains are even greater for software, when agencies use open-source software and actively participate in the surrounding community. Many of the fundamental problems of mass transit are the same from agency to agency, and software developed to solve a problem at one agency is very likely to be usable (at least in part) at other agencies.

Open standards are a force multiplier for civic software

In software engineering, software modularity and reusability are considered best practices. Unfortunately, in the civic software world, these principles are often ignored, because governments and public bodies fail to use open standards and interfaces for their data.

When governments and other public bodies adopt open standards, everyone wins. Consider, for example, the Open311 standard. When a city implements an Open311 endpoint, its citizens suddenly have the option of using any software which has been developed to support the Open311 standard. There’s no need for civic hackers in Chicago to develop one iPhone app, only for another group of civic hackers in New York to implement a substantially similar app, just because the two cities use different APIs.

For those developers, the use of open standards acts as a force multiplier: they don’t have to know anything about the cities where their apps are being used, because they all adhere to the same standards. Civic software is, for the most part, the domain of non-profits and individuals working on their own time. Development resources are far from unlimited, and, simply put, we have neither the time nor money to spend on what might be characterized as niche applications which only have use in a limited geographic area, or with one government’s proprietary API.

Closer to home, I’ve watched over the past few years as developers have expended needless effort building transit apps for the Washington, D.C. metropolitan area, simply to accommodate local transit authorities’ refusal to publish clean, high-quality data in standard formats.

Arlington County’s Mobility Lab Transit Tech initiative has developed two applications which rely on data from transit authorities. One is a package for driving real-time transit signs, and the other is Transit Near Me, a mobile webapp for mapping transit options.

I want to emphasize that I don’t mean to minimize the work of the Mobility Lab developers—in the end, they did what they needed to in order to be able to ship a working product, given the data they had access to.

Having said that, though, these applications are not all that different from similar transit apps which have already been built. The Mobility Lab’s real-time sign, for example, is (in terms of basic design concepts) not all that different from the OneBusAway sign mode.

Granted, the Mobility Lab’s real-time sign looks more polished, and includes support for transit modes like bike sharing. But imagine if, instead of building a new piece of software from the ground up, the Mobility Lab developers had worked to polish the OneBusAway sign mode and add support for other transit modes.

Had they done so, every city which uses OneBusAway would have been able to benefit immediately from the improvements.

But, there’s a problem. OneBusAway consumes real-time transit information in the GTFS-realtime and SIRI VM formats. Out of the agencies in the region, only Montgomery County Ride On and VRE provide GTFS-realtime data. The other agencies which provide real-time data use proprietary formats which are incompatible with GTFS-realtime. Without detouring too deeply into technical territory, WMATA’s proprietary API actually provides all of the information that would be necessary to construct a GTFS-realtime feed for Metrobus, were it not for the fact that the API uses route, stop, and trip identifiers which are completely different from those in the static GTFS schedule.
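To make the identifier problem concrete, here’s a minimal sketch of the remapping that would be needed before WMATA’s predictions could populate a GTFS-realtime feed. All route and stop IDs below are invented, and the crosswalk tables are assumed to exist:

```python
# Hypothetical sketch: rewriting an agency API's proprietary identifiers
# into the route_id/stop_id values used by the static GTFS feed. The
# crosswalk tables would have to be built and maintained separately (say,
# by matching route names and stop coordinates); the IDs here are invented.

ROUTE_ID_MAP = {"70_NB": "70"}      # proprietary route ID -> GTFS route_id
STOP_ID_MAP = {"1001762": "37558"}  # proprietary stop ID  -> GTFS stop_id

def to_gtfs_ids(prediction: dict) -> dict:
    """Rewrite one arrival prediction so it references GTFS identifiers."""
    return {
        "route_id": ROUTE_ID_MAP[prediction["RouteID"]],
        "stop_id": STOP_ID_MAP[prediction["StopID"]],
        "minutes_away": prediction["Minutes"],
    }
```

Maintaining a crosswalk like this by hand is exactly the kind of brittle, duplicated effort that publishing consistent identifiers in the first place would eliminate.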

The same goes for Transit Near Me; it is, in essence, a mobile version of the OpenTripPlanner system map. Cities around the world have adopted OpenTripPlanner; wouldn’t they also benefit from an interactive system map optimized for mobile devices?

OpenTripPlanner is designed to consume clean, well-constructed GTFS feeds; Transit Near Me must instead include various work-arounds for idiosyncrasies in WMATA’s data: bad shapes which must be replaced with data from shapefiles, stop IDs which are available only in the API and not the GTFS feed, and so on.

I should emphasize again that this isn’t just about the Mobility Lab; their work happens to highlight the problem particularly well, but they’re not the only developers to get caught up in this maelstrom.

What’s the solution? Civic hackers need to stand together with each other, and stand up for good software engineering principles. I doubt that any one developer alone will be able to convince WMATA to get their data in order (goodness knows I’ve tried). But if we stand together and recognize that reinventing the wheel over and over again is not a productive use of our time, we may be able to convince data providers to embrace open standards. When we do, it will have benefits not just locally, but for people around the world who benefit from the work of civic hackers.

Montgomery County Ride On finally has an API—sort of

After close to a year of unanswered questions and rancor, Montgomery County Ride On finally has an API. Earlier this year, Montgomery County placed their real-time passenger information system into production, under the brand Ride On Real Time. But, there was still no API.

In late April, while clicking around on Ride On’s site, I stumbled across a poorly-advertised developer resources page. Lo and behold, Ride On finally had an API.

This is, of course, what I’ve been calling for since October 2011. But does that make Ride On an open data success story? Does that make Montgomery County a good steward of taxpayer money in planning IT expenditures? Unfortunately, no.

First, about the software behind the API. It’s not something built into OrbCAD, nor SmartTraveler Plus, nor is it part of some other COTS product. Instead, it’s generated by a custom product developed by Greenhorne & O’Mara.

I don’t have any objection to transit authorities getting involved in the design and implementation of custom software—but unlike MTA NYCT’s involvement in the development of OneBusAway, there’s not much reusable value here. The software developed by G & O hasn’t been released as open source, and even if it were, it would only help those transit agencies that run OrbCAD and have no other interface to it (and they’d be better off just running OneBusAway directly).

So, strike one—building more single-use software. At least it’s built on top of a fairly modern software stack, including CouchDB and Ruby on Rails.

Now, on to the API itself. There are two major standards for real-time transit data: GTFS-realtime and SIRI. Unfortunately, even some of the newer real-time transit APIs, like the Metro Transit NexTrip API and the OC Transpo Live Next Bus Arrival Data Feed, use neither of these standards. These APIs are harder for developers to use, because they require custom code which will only be useful in a particular city. There are hundreds of transit systems in the United States, and thousands around the world, and eventually, hopefully, they will all have real-time data. Writing custom code for each one of them is simply unsustainable, and that’s why standards matter.

Does Ride On’s API support industry standards? Yes, sort of. In addition to a handful of custom methods, the API also has a method which returns GTFS-realtime output, although it’s not what you’d expect. The API’s GTFS-realtime output is in the text-based Protocol Buffers format (which is intended only for debugging, not for production use), rather than the binary format expected by GTFS-realtime tools. It didn’t take much effort for me to put together a little Python script to ingest the text-formatted feed and output a valid, binary GTFS-realtime feed which worked well with OneBusAway.

The API requires authentication, a nuisance that some agencies, like BART, have done away with:

We’ve never asked you to register for BART open data and our License Agreement is one of the least restrictive in the business.

Worse, the Ride On API gives you a token, and then doesn’t tell you what to do with it. Instead, you have to log in with your username and password on every call to the API—not an insurmountable challenge, but still rather peculiar.

In summary, I’m glad Ride On finally has an API, and I’m even more excited that I was able to get the feed working with OneBusAway. But on the whole it’s a disappointment. Xerox hasn’t built native SIRI or GTFS-realtime support into OrbCAD or SmartTraveler Plus. If they had, it would be a lot easier for every agency using those products to offer a standards-compliant real-time feed. Instead, the (idiosyncratic) implementation depends on yet more custom software.

It is a success for open data, but not an unqualified success—and certainly not a model to follow.

WMATA’s half-hearted open data hurts everyone

I’ve written before about WMATA’s API for train positions and API for bus route information. This time, it’s WMATA’s API for elevator and escalator status that is cause for concern. It’s good that WMATA provides this data in a machine-readable format—in fact, they’re one of only a handful of agencies to do so—but as with WMATA’s other APIs, the implementation is half-hearted at best.

Inconsistent data, the absence of a formal developer relations mechanism, and unexplained, unannounced outages are bad for everyone. They make WMATA look bad, obviously. But more importantly, they make developers look bad, and reduce the incentive for local developers to build applications using WMATA’s data. When someone finds that an app doesn’t work, or that they’re getting stale, incomplete, or inconsistent data, their first instinct is usually to blame the app or the app’s developer, not WMATA.

What’s specifically wrong with the ELES API?

  • 11-day outage, made worse by nonexistent developer relations:
    From March 28 to April 9, 2012, the ELES feed returned static data. This outage was never acknowledged publicly by WMATA, in any medium.

    Because WMATA does not provide any public point of contact for developer relations, there was no way for developers to formally report the problem, nor any way for developers to get useful information like an estimated time to resolution.

    An API outage such as this may seem like the sort of thing that would only impact a handful of transit data nerds, but rest assured, there were absolutely real-world impacts: Elevator-dependent Metrorail users who relied on mobile applications which used data from the API found themselves trapped at stations where the stale data led them to erroneously believe that an elevator was in service.

    While this may have been a one-time problem, the underlying issue remains: how could a critical service have gone down for 11 days with no public notice?

  • Feed missing information from the Web site:
    Like much of the information in WMATA’s open data initiative, the ELES API presents the same data as is presented on WMATA’s Web site…or at least that’s how it’s supposed to be.

    In reality, while the Web site lists “estimated return to service” dates for each elevator/escalator, that information is omitted from the API. In addition, others have observed that the API feed and Web site don’t always seem to be in sync. This could create considerable confusion for riders who sometimes check the Web site directly and sometimes use an app which gets data from the API.

  • Feed missing information necessary for maximum usefulness:
    Before presenting this point, it’s important to explain how the elevator outage information is used by elevator-dependent riders. When an elevator-dependent rider sees that there’s an elevator outage at a transfer station that will affect them, they generally avoid the outage by transferring at another station (for example, at Fort Totten rather than Gallery Place).

    But if it’s at their origin or destination station, then they can either use another nearby station (like Judiciary Square rather than Gallery Place), or they can call for a shuttle.

    Calling for a shuttle is a difficult, time-consuming process, but in many cases, especially for outlying stations, it’s a necessity.

    Neither WMATA’s Web site nor the API contains a key piece of information needed by elevator-dependent riders: where to go to get a shuttle (which station, which exit at that station, and so on). This information is displayed on the PIDS, but is simply not available on the Web in any format.

  • No master list of units:
    As I explained when I wrote about WMATA’s performance monitoring program, including the agency’s Vital Signs Report, only summary statistics are available for WMATA’s elevators and escalators. Want to know which specific units have the best or worst track records? Want to know if a major overhaul has improved a unit’s availability? Want to know how the units at transfer stations hold up, compared to their peers at less-trafficked stations? You can’t, at least not with the data in the Vital Signs Report.

    But that doesn’t mean it’s absolutely impossible to compute those statistics; it just takes more work. You can forget about getting historical data from WMATA, but if you’re willing to archive data from the ELES API yourself, you can create your own statistics. Store each snapshot in a database, and over time you’ll build up a record of which units were out of service, and when. Transfer the result into an OLAP cube, and you can slice and dice to your heart’s content. Want a report on units at transfer stations? Done. Want stats on outages specifically at peak hours? Done. Want a report just on your home station? Done.

    There’s only one piece missing: a list of all elevators and escalators in the Metrorail system. Why is this necessary? In order to compute statistics with the outage data, we have to know how many units there are: in statistical terms, the universe. We can find out from WMATA’s Web site that there are a total of 588 escalators and 239 elevators, but that’s only good enough for computing the same system-wide metric that the Vital Signs Report provides. Any more detailed analysis (at a per-station level, a per-line level, or any of the examples given above) requires knowing not just how many units there are, but the IDs of those units and their locations, so statistics can be computed on a per-station, or even per-unit, level.

    If WMATA had made a real commitment to transparency and open data, and if there were a developer liaison appointed, I’d imagine it might take a day or two to get such a master list of units made available as a CSV or XML file. Surely, somewhere in the 100 TB of data managed by WMATA, there must be a list of these 827 units.

    But there isn’t even anyone to ask for the data. And, to make matters worse, every such request is treated with suspicion and mistrust. There’s no sense of developers working cooperatively with WMATA; it is, from the outset, combative. Yes, some of these data will make WMATA look bad, but some will make the agency look good—especially when it can be shown that a major overhaul, such as is taking place now at Dupont Circle and will soon take place at Bethesda, improves the reliability of the overhauled units. Besides, transparency isn’t about releasing the data that make you look good, it’s about releasing data, period.
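To make the archiving idea concrete, here is a minimal sketch of what such an archive could look like. The schema, unit IDs, and sample rows are invented for illustration; a real implementation would poll the ELES API on a schedule, and would still need the master list of units to know the universe:

```python
# Minimal sketch of archiving elevator/escalator outage snapshots and
# computing per-station statistics. Schema, unit IDs, and station data
# are hypothetical; a real archive would be populated by polling the
# ELES API on a schedule.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE outage_snapshot (
    observed_at TEXT,   -- timestamp of the API poll
    unit_id     TEXT,   -- elevator/escalator unit ID
    station     TEXT    -- station where the unit is located
)""")

# Two polls' worth of (invented) outage data
rows = [
    ("2012-05-01T08:00", "A03X01", "Dupont Circle"),
    ("2012-05-01T08:00", "B35X02", "Gallery Place"),
    ("2012-05-01T20:00", "A03X01", "Dupont Circle"),
]
db.executemany("INSERT INTO outage_snapshot VALUES (?, ?, ?)", rows)

# In how many polls was each station found with a unit out of service?
for station, n in db.execute(
        "SELECT station, COUNT(DISTINCT observed_at) FROM outage_snapshot "
        "GROUP BY station ORDER BY station"):
    print(station, n)
```

From a table like this, per-station, per-line, or per-unit availability figures fall out of simple aggregate queries, which is precisely why the missing master list of units is the only real obstacle.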

What’s the point of all this, then? When General Manager Sarles says that he “[doesn’t] want to hide problems”, or that the Metro Forward campaign is making tangible improvements for riders, I expect to see data to back up those assertions.

When elevator-dependent riders have to cope with yet another outage, I don’t want for them to find out for the first time when they get to their destination and the only notice they have is a cone in front of the elevator door. I want for there to be timely (and, more importantly, meaningful) information available, in a wide variety of formats, including a high-quality API that encourages app developers to build tools that further increase the accessibility and further widen the dissemination of that information.

Why do I expect these things? I expect these things because Metrorail is supposed to be “America’s Subway”, a world-class system at the forefront of technological innovation and operational excellence. Right now, it is neither of those things. Instead, it is a system where riders climb up and down stopped escalators in dimly-lit stations and hope that their train does not pass over another poorly-maintained track circuit which fails to detect that it has become occupied, engendering yet another fatal collision. It is a system where secrecy and the maintenance of fiefdoms are the norm, not transparency and cooperation for the good of the riding public.

I don’t claim that open data (and better still, open data that is timely and meaningful) will solve all of those problems, but it is a small step forward, and a step that WMATA could easily take using its existing infrastructure.