Why “they’re not on NextBus” isn’t the problem it sounds like

As someone active in open data for transit and real-time passenger information, I sometimes hear a complaint leveled at transit agencies: “They’re not on NextBus!”

This bothers me. A lot.

Why? There are two reasons. The first is pretty simple. Sometimes, when people say “NextBus”, what they really mean is real-time passenger information, without any concern for the specific provider. But “NextBus” is a trademarked name for a specific proprietary real-time passenger information provider; if what you really mean is “real-time passenger information”, then say so.

The second reason is more pernicious. A lot of people use mobile apps for transit which are designed around the NextBus API. So, they work everywhere that the local transit agency has elected to contract with NextBus for real-time passenger information. On its face, this seems like a huge success for transit riders—one app for dozens of cities! But, it’s not. Vendor lock-in isn’t the way to achieve real transit data integration.

I understand that transit riders love the idea of having a single app for transit information in every city they visit. I’m a transit rider; I get it. But the solution isn’t to get every agency to pay the same vendor to provide the same proprietary service.

There are many AVL vendors out there: INIT, Xerox, Avail, Clever, Connexionz, and more. Some very forward-thinking agencies, like New York’s MTA, have even decided to act as their own system integrator and build their own real-time passenger information system, so that they’ll never be beholden to any vendor’s proprietary system. Built on top of the open-source transit data platform OneBusAway, MTA Bus Time provides real-time passenger information for New York’s buses using an open technology stack that saved the MTA 70 percent compared to proprietary alternatives.

So with every agency using a different vendor’s system (and some having rolled their own), how do we provide that integrated experience that riders crave? The answer is simple: by using open data standards. With standards like GTFS-realtime and SIRI, app developers can build apps that work with data from any transit agency and any vendor’s systems. With OneBusAway, for example, I can easily (trivially) make use of feeds from any of several DC-area agencies, York Region Transit, MBTA, BART, TriMet, or any of the other agencies who are releasing GTFS-realtime data. Because these agencies are all using standardized formats for their open data, I don’t have to build anything new in OneBusAway to consume their data—the same code that works for one agency works for all of them.

But NextBus doesn’t provide an API using any recognized standards for real-time transit data. It’s a walled garden of sorts; the NextBus API is great if all you want to do is present data from agencies using NextBus, and terrible if you want to use it as a springboard for building revolutionary real-time passenger information tools.

The real question isn’t “why aren’t you on NextBus”; the real question is “why doesn’t NextBus provide a standards-compliant API”?

What’s wrong with the NextBus API?

When it comes to real-time transit data, one of the common refrains is “just use NextBus!” But while NextBus may be a common name, that doesn’t make it the best choice for providing real-time transit data with a robust open data API for developers. It’s true that NextBus provides an API for developers, but there are problems that hamper or even entirely prevent its use in certain applications.

What are these problems? Some are organizational, and some are technical:

  • API not enabled for all agencies: While NextBus provides service for more than a hundred agencies, only a fraction of those agencies make their data available through the NextBus API.
  • API not standards-compliant: NextBus provides data to developers in their own custom format, rather than using the industry-standard SIRI or GTFS-realtime formats. While NextBus’s API has its advantages for certain types of apps (principally simple mobile apps), for developers working on large-scale passenger information systems, and developers seeking to solve complex problems like real-time routing, there are deficiencies in the NextBus API which could be remedied by using a standardized format. In particular, NextBus makes it exceedingly difficult to get the status of an entire transit system at once. Retrieving data stop-by-stop makes sense for mobile apps, but not for transit data integration platforms like OneBusAway, which benefit from being able to update from a feed containing status updates for all of an agency’s vehicles and trips.
  • Commonality of identifiers: When NextBus agencies also publish a GTFS feed containing their static route and schedule data (which they should), route, stop, and trip identifiers should match those in the NextBus data. When this is not done, it becomes onerous to use the real-time data—developers must expend additional engineering effort to map identifiers between the static and real-time data.
  • Data quality and completeness: Though the NextBus API documentation defines the data elements which developers can expect to find in the API responses, the actual availability of these data varies considerably between agencies. For example, many agencies do not include the tripTag element, which is essential for linking predictions between stops and then to the static schedule. Similarly, some agencies don’t actually provide useful values for the block element. NextBus must impress upon its customers (that is, the transit agencies) the value of supplying high-quality configuration data so that the NextBus API works as intended.
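
The identifier problem in particular is easy to test for. Here is a minimal sketch in Python, assuming you already have the stop tags from the real-time data as a list and the agency's GTFS stops.txt as text. The function names and the sample data are hypothetical, made up for illustration.

```python
import csv
import io


def load_gtfs_ids(csv_text, column):
    """Return the set of values found in one column of a GTFS text file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader}


def unmatched_ids(realtime_ids, gtfs_ids):
    """Return real-time identifiers with no counterpart in the GTFS feed."""
    return sorted(set(realtime_ids) - set(gtfs_ids))


# Hypothetical sample data: a two-stop stops.txt and three real-time stop tags.
STOPS_TXT = "stop_id,stop_name\n1001,Main St & 1st Ave\n1002,Main St & 2nd Ave\n"
realtime_stop_tags = ["1001", "1002", "main_1st"]

gtfs_stop_ids = load_gtfs_ids(STOPS_TXT, "stop_id")
missing = unmatched_ids(realtime_stop_tags, gtfs_stop_ids)
print(missing)  # any tag listed here needs a hand-built mapping
```

The same check applies to route and trip identifiers by reading routes.txt or trips.txt and the corresponding column instead.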

Though the present NextBus API is far from ideal, it is possible to transform the data into standards-compliant GTFS-realtime, which can be fed into any app that uses GTFS-realtime data. This only works if the feed has been configured correctly: that is, with meaningful trip IDs, identifiers which match those in the agency’s GTFS feed, etc. Of all the agencies that use NextBus, the fraction that have enabled the NextBus API and provided NextBus with the right configuration data for the API to be useful to the GTFS-realtime translator is frustratingly small.
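
To give a feel for what such a translation involves, here is a rough Python sketch. It is not the actual translator; it only groups predictions by tripTag into plain dictionaries shaped like GTFS-realtime TripUpdate messages. The sample XML is hand-written for a made-up agency, and I'm assuming the usual predictions/direction/prediction layout with epochTime in milliseconds.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of a NextBus predictions response (hypothetical data).
SAMPLE = """<body>
  <predictions agencyTitle="Example Transit" routeTag="10" stopTag="1001">
    <direction title="Northbound">
      <prediction epochTime="1357000000000" vehicle="4201" block="1003" tripTag="5551"/>
      <prediction epochTime="1357000600000" vehicle="4207" block="1004"/>
    </direction>
  </predictions>
</body>"""


def to_trip_updates(xml_text):
    """Convert NextBus-style predictions into TripUpdate-like dicts.

    Predictions without a tripTag are reported separately (by vehicle),
    since they cannot be tied to a trip in the static GTFS feed.
    """
    updates, skipped = {}, []
    root = ET.fromstring(xml_text)
    for preds in root.iter("predictions"):
        for pred in preds.iter("prediction"):
            trip_id = pred.get("tripTag")
            if not trip_id:
                skipped.append(pred.get("vehicle"))
                continue
            update = updates.setdefault(trip_id, {
                "trip": {"trip_id": trip_id, "route_id": preds.get("routeTag")},
                "vehicle": {"id": pred.get("vehicle")},
                "stop_time_update": [],
            })
            update["stop_time_update"].append({
                "stop_id": preds.get("stopTag"),
                # Assumed: epochTime is in milliseconds; GTFS-realtime uses seconds.
                "arrival": {"time": int(pred.get("epochTime")) // 1000},
            })
    return list(updates.values()), skipped


updates, skipped = to_trip_updates(SAMPLE)
```

The second prediction in the sample has no tripTag, so it ends up in the skipped list. That is exactly the situation described above: without a trip identifier, there is nothing to hang the prediction on.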

NextBus can—and should—do better. Their customers, more than 100 transit agencies in North America, would all benefit from standards-compliant APIs that would allow developers to build apps that work with data produced by AVL systems from all vendors, not just one. This is the essence of open data, and it’s time for NextBus to get on board.

GTFS-realtime for WMATA buses

I’ve posted many times about the considerable value of open standards for real-time transit data. While it’s always best if a transit authority offers its own feeds using open standards like GTFS-realtime or SIRI, converting available real-time data from a proprietary API into an open format still gets the job done. After a few months of kicking the problem around, I’ve finally written a tool to produce GTFS-realtime StopTimeUpdate, VehiclePosition, and Alert messages for Metrobus, as well as GTFS-realtime Alert messages for Metrorail.

The tool, wmata-gtfsrealtime, isn’t nearly as straightforward as it might be, because while the WMATA API appears to provide all of the information you’d need to create a GTFS-realtime feed, you’ll quickly discover that the route, stop, and trip identifiers returned by the API bear no relation to those used in WMATA’s GTFS feed.

One of the basic tenets of GTFS-realtime is that it is designed to directly integrate with GTFS, and for that reason identifiers must be shared across GTFS and GTFS-realtime feeds.

In WMATA’s case, this means that it is necessary to first map routes in the API to their counterparts in the GTFS feed, and then, for each vehicle, map its trip to the corresponding trip in the GTFS feed. This is done by querying a OneBusAway TransitDataService (via Hessian remoting) for active trips for the mapped route, then finding the active trip which most closely matches the vehicle’s trip.

Matching is done by constructing a metric space in which each stoptime is a point (x, y, t), so that our notion of “distance” between a stoptime in the API data and its counterpart in the GTFS feed becomes distance in both space and time. The distances fed into the metric are actually halved, in order to bias the scores towards matching based on time, while allowing some leeway for stops which are wrongly located in either the GTFS or real-time data.
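
As an illustration of the idea (not the code in wmata-gtfsrealtime), here is a small Python sketch. It assumes that stop times are (x, y, t) tuples with x and y in meters and t in seconds, that it is the spatial terms which get halved, and that stop times are paired by position in the trip. All names and sample values are made up.

```python
import math


def stoptime_distance(a, b, spatial_weight=0.5):
    """Distance between two stop times given as (x, y, t) tuples.

    The spatial terms are scaled down so that time dominates the score.
    """
    dx = (a[0] - b[0]) * spatial_weight
    dy = (a[1] - b[1]) * spatial_weight
    dt = a[2] - b[2]
    return math.sqrt(dx * dx + dy * dy + dt * dt)


def trip_score(api_stoptimes, gtfs_stoptimes):
    """Mean distance between paired stop times; lower is a better match."""
    pairs = list(zip(api_stoptimes, gtfs_stoptimes))
    if not pairs:
        return math.inf
    return sum(stoptime_distance(a, b) for a, b in pairs) / len(pairs)


def best_match(api_stoptimes, candidates):
    """Return the ID of the candidate trip with the lowest score."""
    return min(candidates, key=lambda tid: trip_score(api_stoptimes, candidates[tid]))


# Hypothetical trip from the API, and two active GTFS trips on the mapped route.
api_trip = [(0.0, 0.0, 28800), (500.0, 0.0, 28920)]
candidates = {
    "trip_A": [(10.0, 0.0, 28800), (510.0, 0.0, 28920)],  # stops 10 m off, same times
    "trip_B": [(0.0, 0.0, 29700), (500.0, 0.0, 29820)],   # same stops, 15 minutes later
}
matched = best_match(api_trip, candidates)
```

With these numbers trip_A scores 5 and trip_B scores 900, so the slightly misplaced stops do not prevent the match, while a trip running at a different time is rejected.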

The resulting algorithm will map all but one or two of the 900-odd vehicles on the road during peak hours. Spot-checking arrivals for stops in OneBusAway against arrivals for the same stop in NextBus shows relatively good agreement; of course, considering that NextBus is a “black box”, unexplained variances in NextBus arrival times are to be expected.

You may wonder why we can’t provide better data for Metrorail; the answer is simple: the API is deficient. As I’ve previously discussed, the rail API only provides the same data you get from looking at the PIDS in stations. Unfortunately, that’s not what we need to produce a GTFS-realtime feed. At a minimum, we would need to be able to get a list of all revenue trains in the system, including their current schedule deviation, and a trip ID which would either match a trip ID in the GTFS feed, or be something we could easily map to a trip ID in the GTFS feed.

This isn’t how it’s supposed to be. Look at this diagram, then, for a reality check, look at this one (both are from a presentation by Jamey Harvey, WMATA’s former Enterprise Architect). WMATA’s data management practices are, to say the least, sorely lacking. For most data, there’s no single source of truth. The problem is particularly acute for bus stops; one database might have the stop in one location and identified with one ID, while another database might have the same physical stop identified with a different number, and coordinates that place it in an entirely different location.

Better data management practices would make it easier for developers to build innovative applications which increase the usability of transit services and, ultimately, improve mobility for the entire region. Isn’t that what it’s supposed to be about, at the end of the day?

An oddity with NextBus short stop titles

The NextBus public API supports “short titles” for routes and stops, which are designed to be formatted for display on mobile devices, and other environments where space is at a premium. Stop and route definitions in the XML returned by the API both contain a title attribute with the full title, and a shortTitle attribute with the short title, if one exists. Unfortunately, there seems to be a bug in the output returned by NextBus which limits the usefulness of this feature.
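
For reference, a client would typically read these attributes with a fallback. The Python sketch below uses a hand-written sample in the style of a route configuration response (the agency, tags, and titles are invented) and falls back to the full title whenever shortTitle is missing or empty.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; real responses carry more attributes (coordinates, etc.).
SAMPLE = """<body>
  <route tag="10" title="10-Main Street">
    <stop tag="1001" title="Main Street &amp; 1st Avenue" shortTitle="Main &amp; 1st"/>
    <stop tag="1002" title="City Hall"/>
  </route>
</body>"""


def display_title(element):
    """Prefer the short title, falling back to the full title."""
    return element.get("shortTitle") or element.get("title")


def stop_titles(xml_text):
    """Map each stop tag to the title a space-constrained client would show."""
    root = ET.fromstring(xml_text)
    return {stop.get("tag"): display_title(stop) for stop in root.iter("stop")}


titles = stop_titles(SAMPLE)
```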

WMATA to adopt Clever Devices for bus AVL

Last night, I was looking at a WMATA RFI for “GIS-based Transit Network Management Software” when I came across the following bit (emphasis mine):

Integrating data and providing support across several other enterprise applications, including…automated vehicle location tracking (Orbital, to be migrated to Clever)

That WMATA uses OrbCAD is a well-known fact, but what is new here is that WMATA intends to migrate to Clever Devices’ bus AVL offering. There was an RFP out earlier in the year for electronics upgrades for the Metrobus fleet, and I assume that this is the result of that RFP.

Aside from the fact that this move will result in substantially better tracking for the Metrobus fleet, it will be interesting to see whether WMATA continues to use NextBus to generate and disseminate arrival time predictions, or if they instead choose to use Clever’s own BusTime for that purpose.

I’m not sure if there’s any good reason for WMATA to continue using NextBus after they migrate to Clever for AVL. The split-system approach, with WMATA using one vendor’s technology for AVL and another’s (in this case NextBus) for predictions, is somewhat unusual. WMATA must maintain extra infrastructure to send the raw AVL data to NextBus and then use the NextBus API to fetch predictions for internal use and dissemination through the WMATA developer API. With a BusTime server hosted internally, WMATA would have more control over the environment and internal use of the data.

In addition, unlike some competing AVL products, BusTime offers a fairly rich API (the link is to the CTA’s documentation, but the API is the same across BusTime instances). SIRI would be even better, but the Clever API is fine for now.

Plus, BusTime comes out-of-the-box with a fresh, modern-looking interface which agencies can customize with their colors and logos; see the CTA’s installation.

There’s a mobile site, too, and in case you were wondering, no, it does not expect you to know your stop code. In fact, there isn’t even an option to input a stop code. For riders who already know where they are, being able to input the stop code directly would save the work of first finding the route in the list, then selecting the direction, then selecting the stop from the list (and for systems with many routes and many stops along each route, those lists can get unwieldy). A good compromise might be to follow the lead taken by OCTO and DC Circulator, and post QR codes at stops which link directly to the predictions for that stop; that lets riders bypass the hierarchy of selections while still avoiding the manual input.

I’m glad to see that WMATA will be adopting Clever’s AVL system; there’s nothing intrinsically wrong with OrbCAD, but its performance here in DC has not been stellar, and for modern applications, frequent and accurate data updates matter. With more frequent data updates, predictions will be more accurate, and applications built on real-time data analysis (like those to detect bus bunching) will be able to generate better results.

WMATA also has a great opportunity to improve the usability of its bus predictions by adopting BusTime; while there’s really nothing wrong with NextBus, BusTime provides a more modern-looking interface, and would allow WMATA to bring prediction generation in-house, thus saving on NextBus service fees.