Legacy AVL system? It’s okay, join the club.

If you work with real-time transit data, you’ve probably heard the steadily-increasing call for data producers to release their data in open, standardized formats like GTFS-realtime and SIRI. But how do you actually make your data available in those formats? Some AVL vendors are beginning to include standards-compliant APIs in their products, and that’s great for agencies considering a new system or major upgrade. But what about the massive installed base of legacy AVL systems which have few open interfaces, if any?

Fortunately, there are ways to get data out of almost any AVL system, whether it was explicitly designed with open interfaces or not. Some of these techniques are more technologically sound than others, and a few require relatively tricky programming, but if you can find the right software developer, almost any problem is solvable.

Here are five key strategies for extracting information from an AVL system. The first three are strongly recommended, while the last two should only be undertaken if no better interface is available, and if you have adequate technical support to implement a more complex solution.

  • Transform a proprietary API to GTFS-realtime or SIRI: Many AVL systems (both COTS and agency-homegrown) include non-standard APIs which can, with a bit of programming, be transformed into a modern, standards-compliant API. This is the approach I took with wmata-gtfsrealtime, to produce a GTFS-realtime feed from WMATA’s real-time bus data, septa-gtfsrealtime to produce a GTFS-realtime feed from SEPTA’s real-time bus and rail data, and ctatt-gtfsrealtime to produce a GTFS-realtime feed from CTA’s Train Tracker data. This is also the approach taken by onebusaway-gtfs-realtime-from-nextbus-cli, which converts from the NextBus API, and bullrunner-gtfs-realtime-generator, which converts from the Syncromatics API.
  • Query a reporting database: Some AVL systems can be configured to log vehicle positions, predicted arrival times, and other information to a database. Ostensibly these databases are meant to be used for after-the-fact incident analysis, performance reporting, etc., but there’s nothing stopping an application from polling the database every 15-30 seconds to get the latest vehicle positions and predicted arrival times. Many GTFS-realtime feed producers take this approach, including ddot-avl, built by Code for America to extract real-time information from DDOT’s TransitMaster installation, HART-GTFS-realtimeGenerator, built by CUTR to extract real-time information from HART’s OrbCAD installation, and live_transit_event_trigger, built by Greenhorne & O’Mara (now part of Stantec) to produce a GTFS-realtime feed from Ride On’s OrbCAD installation.
  • Parse a published text file: Similar to the database approach, some AVL systems can be configured to dump the current state of the transit network to a simple text file (like this file from Hampton Roads Transit). This text file can be read and parsed by a translator which then generates a standards-compliant feed, which is the approach taken by hrt-bus-api, built by Code for Hampton Roads, and onebusaway-sound-transit-realtime.
  • Screen-scrape a passenger-facing Web interface: This is where we get into the less technologically-sound options. While the first three options focused on acquiring data from machine-readable sources, screen scraping involves consuming data from a human-readable source and transforming it back into machine-readable data. In this case, that might mean accessing a passenger-facing Web site with predicted arrival times, extracting the arrival times, and using that to produce a standards-compliant feed. This is the approach taken by this project, which screen-scrapes KCATA’s TransitMaster WebWatch installation to produce a GTFS-realtime feed. Compared to options which involve machine-readable data sources, screen-scraping is more brittle, and may make it more challenging to produce a robust feed, but it can be made to work.
  • Intercept internal AVL system communications: This is the last resort, but if an AVL system has no open interfaces, it may be possible to intercept communications between the components of the AVL system (such as a central server and a dispatch console or system driving signage at transit stops), decode those communications, and use them as the basis for a standards-compliant feed. This is a last resort because it will often require reverse-engineering undocumented protocols, and results in solutions which are brittle and will tend to break in unpredictable ways. But, it can be done, and if it’s the only way to get data out of an AVL system, then go for it. This is the approach taken by onebusaway-king-county-metro-legacy-avl-to-siri.
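As a rough illustration of the reporting-database strategy, the Python sketch below polls a table of logged vehicle positions and reshapes the newest row per vehicle into a structure that mirrors a GTFS-realtime FeedMessage (a header plus one vehicle entity per bus). The table name vehicle_log, its columns, and the in-memory SQLite database are hypothetical stand-ins; a real AVL reporting schema will look different, and a production feed would be serialized as protocol buffers rather than returned as a plain dictionary.

```python
import sqlite3
import time

# Hypothetical schema: one row per position report logged by the AVL system.
SCHEMA = """
CREATE TABLE vehicle_log (
    vehicle_id TEXT, trip_id TEXT, lat REAL, lon REAL, reported_at INTEGER
)
"""

# Newest report per vehicle; table and column names are assumptions.
LATEST_POSITIONS = """
SELECT v.vehicle_id, v.trip_id, v.lat, v.lon, v.reported_at
FROM vehicle_log AS v
JOIN (SELECT vehicle_id, MAX(reported_at) AS newest
      FROM vehicle_log GROUP BY vehicle_id) AS m
  ON v.vehicle_id = m.vehicle_id AND v.reported_at = m.newest
ORDER BY v.vehicle_id
"""

def build_feed(conn, now=None):
    """Return a dict shaped like a GTFS-realtime FeedMessage."""
    now = int(time.time()) if now is None else now
    entities = []
    for vehicle_id, trip_id, lat, lon, reported_at in conn.execute(LATEST_POSITIONS):
        entities.append({
            "id": vehicle_id,
            "vehicle": {
                "trip": {"trip_id": trip_id},
                "vehicle": {"id": vehicle_id},
                "position": {"latitude": lat, "longitude": lon},
                "timestamp": reported_at,
            },
        })
    return {
        "header": {
            "gtfs_realtime_version": "2.0",
            "incrementality": "FULL_DATASET",
            "timestamp": now,
        },
        "entity": entities,
    }

def demo_connection():
    """In-memory stand-in for the AVL reporting database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO vehicle_log VALUES (?, ?, ?, ?, ?)", [
        ("1001", "trip-A", 38.90, -77.03, 100),
        ("1001", "trip-A", 38.91, -77.03, 130),
        ("1002", "trip-B", 38.95, -77.00, 120),
    ])
    return conn
```

A translator built this way would call build_feed on a timer (every 15-30 seconds, say) against a read-only connection to the real database and publish the result at a fixed URL.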

As evidenced by the example links, every one of the strategies mentioned above has been implemented in at least one real-world application. No matter how old your AVL system is, no matter how far out of warranty or how unsupported it is, no matter how obsolete the technology is, some enterprising civic hacker has probably already figured out a way to get data out of the system (or is eager and ready to do so!). Every one of the tools linked in this post is open-source, and if it closely approximates your needs, you can download it today and start hacking (or find a local civic hacker and have them adapt it to meet your needs). And if none of the tools look close? Don’t head for your procurement department and have them issue an RFP—instead, post on the Transit Developers Google Group; chances are your post will make its way to someone who can help, whether a local Code for America brigade, or an independent civic hacker, or another transit agency that has already solved the same problem.

Finally, I’d like to thank the participants in the Disrupting Legacy Transit Ops Software (Moving Beyond Trapeze) session at Transportation Camp DC 2015, who inspired me to write this post.

Reprogramming a u-blox MAX-7Q in-situ on a Raspberry Pi

Suppose you have a u-blox MAX-7Q GPS module connected to a Raspberry Pi, and you need to reprogram the module (for example, to enable/disable certain NMEA strings, change the baud rate, etc.). You could manually construct the various binary UBX strings and send them through gpsd or straight out the serial port, but that’s needlessly complex.
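To give a sense of what constructing a UBX message by hand involves, here is a minimal Python sketch that frames a payload with the two sync bytes, the class and ID, a little-endian length, and the two-byte Fletcher checksum that the UBX protocol uses. The example message (a three-byte CFG-MSG payload that sets the output rate of the NMEA GLL sentence to zero on the current port) reflects my reading of the u-blox 7 protocol specification; check the class and ID values against the documentation for your module before sending anything.

```python
import struct

UBX_SYNC = b"\xb5\x62"  # every UBX frame starts with these two bytes

def ubx_checksum(body: bytes) -> bytes:
    """8-bit Fletcher checksum over class, ID, length, and payload."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes([ck_a, ck_b])

def ubx_frame(msg_class: int, msg_id: int, payload: bytes = b"") -> bytes:
    """Wrap a payload in a complete UBX frame."""
    body = struct.pack("<BBH", msg_class, msg_id, len(payload)) + payload
    return UBX_SYNC + body + ubx_checksum(body)

# CFG-MSG (class 0x06, ID 0x01): set the NMEA GLL (0xF0, 0x01) rate to 0.
disable_gll = ubx_frame(0x06, 0x01, bytes([0xF0, 0x01, 0x00]))
print(disable_gll.hex(" "))
```

Multiply this by every sentence and setting you want to change, plus reading back the acknowledgements, and the appeal of a graphical tool becomes clear.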

You could also connect the module to a Windows PC and use u-center to reprogram it, but that’s a bit of a nuisance too. If the module is conveniently packaged for connection to a Raspberry Pi, then you aren’t going to have a readily accessible USB port, nor an RS-232 serial port that you could connect directly to a PC. Sure, you could cobble together a USB-serial interface (or a real hardware serial port, rare as they are nowadays) and an RS-232–3.3 volt level shifter like the MAX3232CPE, or just use this convenient cable from Adafruit, which provides a +5 volt supply and 3.3 volt serial interface from USB, but it’s still not quite plug-and-play.

But, there’s an easier way that avoids all of those hassles (although it still requires you to have a Windows PC)—enter socat!

$ sudo socat tcp-l:2000,reuseaddr,fork file:/dev/ttyAMA0,nonblock,waitlock=/var/run/ttyAMA0.lock,b9600,iexten=0,raw

This exposes the Raspberry Pi's serial port at TCP port 2000, so you can connect to it over the network from a Windows PC running u-center. You might think you'd then have to use extra software on the Windows side to get a TCP socket to appear as a virtual COM port, but u-center has support for network interfaces built in. Just enter the Raspberry Pi's IP address and the port number, and it will happily connect to the module via socat.
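Before opening u-center, it can be handy to confirm that the bridge is actually passing data. The Python sketch below connects to the socat listener and prints sentences whose NMEA checksum (the XOR of the characters between "$" and "*") is valid. The function names are my own, the host address is whatever your Raspberry Pi uses, and port 2000 simply matches the socat command above; the sketch also assumes the module is still emitting NMEA text at the configured baud rate.

```python
import socket

def nmea_checksum_ok(sentence: str) -> bool:
    """True if the two hex digits after '*' match the XOR of the body."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, given = sentence[1:].partition("*")
    computed = 0
    for ch in body:
        computed ^= ord(ch)
    try:
        return computed == int(given[:2], 16)
    except ValueError:
        return False

def print_valid_sentences(host: str, port: int = 2000, count: int = 10) -> None:
    """Read lines from the socat bridge and print `count` valid sentences."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("r", encoding="ascii", errors="replace",
                           newline="\n") as stream:
            seen = 0
            for line in stream:
                if nmea_checksum_ok(line):
                    print(line.strip())
                    seen += 1
                    if seen >= count:
                        break
```

Calling print_valid_sentences with the Pi's address should print a handful of sentences within a few seconds; if nothing appears, the baud rate in the socat command is the first thing to check.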

How should transit agencies make their GTFS available?

To many techies, the question of how transit agencies should make their GTFS available might seem like a silly one. They’d reply that obviously the agency should simply post their GTFS to their Web site at a reasonable URL, and make that URL readily available from the agency’s developer resources page.

Unfortunately, it isn’t nearly so simple in the real world. Instead, many agencies hide their GTFS behind a “clickwrap” license, or even require a login to download the feed. In a few particularly bad cases, developers even have to sign an agreement and return it (on paper) to get access to a feed. Some agencies don’t host their own feeds at all, instead depending on sites like the GTFS Data Exchange.

So, what are some best practices for hosting GTFS feeds?

  • Don’t rely on third parties: Think of this in terms of paper maps and schedules. How would riders feel if a transit agency told them to pick up transit maps and timetables not at the agency’s offices or stations, but rather some unrelated third party? If a transit agency has a Web site (as almost all do), then it should be capable of hosting its own GTFS feed. Sure, some agencies will complain about what their content management system “won’t let them do”, or complain that they must go through some arduous process to upload new content, but in 2014 running a Web site is a basic competency for almost any organization. Depending on a third-party site introduces additional risk and additional points of failure.
  • Help developers discover feeds: Developers shouldn’t have to hunt for GTFS feeds; there should be a prominent link on every agency’s homepage. Bonus points for participating in any applicable data catalogs, like those operated by ODOT and MassDOT for agencies in their respective states.
  • No login, no clickwrap: GTFS feeds should be downloadable by any Internet user, without having to log in or accept a license agreement. This is a must-have for being able to automate downloads of updated GTFS feeds, an essential part of any large-scale passenger information system. Don’t make it needlessly hard for developers to use your GTFS feed; if you can’t download it with wget, then you’re just making work for feed users. The only piece of information a developer should need to know to use an agency’s GTFS feed is the URL: a clean, simple URL like http://www.bart.gov/dev/schedules/google_transit.zip.
  • Support conditional HTTP GET: GTFS feeds rarely change every day, but it’s still important to get updates as soon as they’re available. Downloading a large feed (some can be 20 MB or more) every day just to check for changes is wasteful. So how can feed consumers stay up-to-date without wasting a lot of bandwidth? Feed producers should support conditional HTTP GET, using either the ETag or Last-Modified headers, so that a consumer can ask for the feed only if it has changed and receive a short “304 Not Modified” response otherwise.
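From the consumer’s side, a conditional GET might look like the following Python sketch, which uses only the standard library. It remembers the ETag and Last-Modified values from the previous download, sends them back as If-None-Match and If-Modified-Since, and treats a 304 Not Modified response as “nothing new”. The function names and the shape of the cached dictionary are my own; the URL would be whatever stable address the agency publishes.

```python
import urllib.error
import urllib.request

def conditional_request(url, cached):
    """Build a GET request carrying validators from the last download."""
    request = urllib.request.Request(url)
    if cached.get("etag"):
        request.add_header("If-None-Match", cached["etag"])
    if cached.get("last_modified"):
        request.add_header("If-Modified-Since", cached["last_modified"])
    return request

def fetch_if_changed(url, cached):
    """Return (feed_bytes, validators), or (None, cached) if unchanged."""
    try:
        with urllib.request.urlopen(conditional_request(url, cached),
                                    timeout=30) as resp:
            validators = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
            return resp.read(), validators
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, cached
        raise
```

A nightly job would start with an empty dictionary, store the validators returned by fetch_if_changed alongside the downloaded file, and pass them back in on the next run. When the feed hasn’t changed, the exchange costs a few hundred bytes instead of 20 MB.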

Agencies may balk at some of these recommendations (“But we have to track usage of the feed! But we have to have a signed license agreement!”), but the simple fact is that there are plenty of agencies that get it right. There are plenty of agencies that use a simple, reasonable license, and plenty of agencies that host their GTFS at a stable URL that supports automated downloads. If you demand a signed license agreement, or make developers log in to access the feed, you make it harder for developers to use your data. When you make it hard for developers to use your data in their apps, you make it harder for transit riders to get service information, because many riders’ first stop when they need transit information is a third-party smartphone app.

Apps ≇ frequency

Mobile apps for real-time passenger information are neither approximately nor actually equal to frequency of service. (And yes, “neither approximately nor actually equal to” is the name of the character in the title of this post.)

But that doesn’t mean real-time passenger information isn’t valuable. On the contrary, it’s immensely valuable, in the right circumstances. For discretionary riders, who can vary their arrival and departure times, real-time passenger information is valuable. For passengers who have somewhere to wait before the bus or train comes (the proverbial “have another drink before you go”), it’s valuable. But for passengers, particularly transit-dependent passengers, who are trying to mesh the geometry of transit with their complex lives, nothing beats frequency of service.

Consider an example: you will depart Event A at 1:00 PM, and must be at Event B by 2:00 PM. A bus route connects the two, and the trip takes 45 minutes, which leaves 15 minutes of slack for waiting. If the route runs every 15 minutes (or more frequently), you have a good chance of making your second appointment, possibly even if you miss one arrival (or if the trip loses some time en route). But if the bus runs every 20 minutes? Every 30 minutes? It becomes a game of chance. You might make your appointment, or you might not. Knowing when the bus will come does nothing to change the inexorable geometry of a low-frequency transit network: you may know when the bus is coming, but it’s still not going to get you to your appointment on time.
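The arithmetic behind this example can be made concrete. With a 45-minute ride and a 60-minute window, you make the appointment only if you wait 15 minutes or less. The short Python sketch below assumes, as a simplification of my own, that buses run exactly on schedule and that the moment you reach the stop is unrelated to the timetable, so the wait is equally likely to fall anywhere between zero and one full headway.

```python
from fractions import Fraction

def chance_of_arriving_on_time(headway_min, window_min=60, ride_min=45):
    """Probability of making it, with the wait uniform on [0, headway)."""
    slack = window_min - ride_min  # minutes you can afford to spend waiting
    if slack <= 0:
        return Fraction(0)
    return min(Fraction(1), Fraction(slack, headway_min))

for headway in (10, 15, 20, 30, 60):
    p = chance_of_arriving_on_time(headway)
    print(f"every {headway:>2} min: {float(p):.0%} chance of arriving by 2:00 PM")
```

Under those assumptions, a 15-minute headway always works, a 20-minute headway works three times out of four, and a 30-minute headway is a coin flip. Real buses run late, of course, which only makes the low-frequency cases worse.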

Some people might say “but you can call or text and push your appointment back!”. Sure, some people can. If you’re fortunate to be in a privileged position where you can dictate other people’s schedules, then you’re all set. But most of us simply have to be where we’re supposed to be, when we’re supposed to be there. So while it may help to know just how late you’re going to be, that neither excuses nor mitigates the impacts.

This is why transit planner and consultant Jarrett Walker says “frequency is freedom”. Sure, apps may help reduce wait time, but if a transit service is simply too infrequent to be useful, discretionary riders won’t ride, and captive riders will suffer.

Additionally, the benefits of real-time passenger information really only become apparent when the information provided to passengers is accurate and reliable. This isn’t a nuts-and-bolts post, so I will refrain from naming particular vendors or transit agencies, but not all real-time information is created equal.

It doesn’t do passengers any good, for example, when they arrive at a bus stop just as their app tells them the bus should be arriving, only to find that the bus departed several minutes prior. Nor does it do them any good to stand at a bus stop (in the cold, in several feet of snow, in the blazing summer heat, etc.), watching an app count down from ten minutes, to five minutes, to one minute, and then back up again, with no bus in sight. When these things happen, passengers become disillusioned. They lose faith in the system. In the short-term, they give up on the bus or the train and call a cab or book an Uber or walk. In the long-term, they begin making plans that allow them to avoid transit—perhaps they even buy a car.

As a software developer, and one who works on real-time passenger information systems, I’m not going to say that apps aren’t good. But I am also a transit rider, and I know there’s a balance. Two Sundays ago, for example, after leaving the Conveyal TRB Welcome Party in Columbia Heights, I walked over to 16th Street to catch an S bus home. OneBusAway told me that the next bus was 18 minutes away (I’d just missed the previous one), and there wasn’t a thing I could do about it. App or no app, I was going to sit and wait in the cold for another 18 minutes until the bus arrived. Arguably I should have checked OneBusAway before I left, but that’s precisely what “frequency is freedom” means: it was late, I was tired, and I was ready to go home. Not in another 18 minutes, but now (or maybe in another 5 or 10 minutes, but certainly not 18).

Telling people to plan their lives around a transit app just isn’t a good way to lure them out of their cars or endear them to transit. It’s much more compelling (and leads to a much more usable transit system) when we can simply tell people “show up at a stop, and there’ll be a bus in 10 minutes or less”. It’s also not a problem app developers can solve alone; as the cliché goes, when all you have is a hammer, everything looks like a nail. Providing reliable real-time passenger information is a good first step towards improving the usability of a transit network, and one that is often far less expensive than actually increasing frequency of service. But that doesn’t mean our work is done once the app goes live; on the contrary, we’ve only just begun.

Why “they’re not on NextBus” isn’t the problem it sounds like

Being active in open data for transit and real-time passenger information, one of the complaints I sometimes hear leveled at transit agencies is “They’re not on NextBus!”.

This bothers me. A lot.

Why? There are two reasons. The first is pretty simple. Sometimes, when people say “NextBus”, what they really mean is real-time passenger information, without any concern for the specific provider. But “NextBus” is a trademarked name for a specific proprietary real-time passenger information provider; if what you really mean is “real-time passenger information”, then say so.

The second reason is more pernicious. A lot of people use mobile apps for transit which are designed around the NextBus API. So, they work everywhere that the local transit agency has elected to contract with NextBus for real-time passenger information. On its face, this seems like a huge success for transit riders—one app for dozens of cities! But, it’s not. Vendor lock-in isn’t the way to achieve real transit data integration.

I understand that transit riders love the idea of having a single app for transit information in every city they visit. I’m a transit rider; I get it. But the solution isn’t to get every agency to pay the same vendor to provide the same proprietary service.

There are many AVL vendors out there: INIT, Xerox, Avail, Clever, Connexionz, and more. Some very forward-thinking agencies, like New York’s MTA, have even decided to act as their own system integrator, and build their own real-time passenger information system, so that they’ll never be beholden to any vendor’s proprietary system. Built on top of the open-source transit data platform OneBusAway, MTA Bus Time provides real-time passenger information for New York’s buses using an open technology stack that saved the MTA 70 percent compared to proprietary alternatives.

So with every agency using a different vendor’s system (and some having rolled their own), how do we provide that integrated experience that riders crave? The answer is simple: by using open data standards. With standards like GTFS-realtime and SIRI, app developers can build apps that work with data from any transit agency and any vendor’s systems. With OneBusAway, for example, I can easily (trivially) make use of feeds from any of several DC-area agencies, York Region Transit, MBTA, BART, TriMet, or any of the other agencies who are releasing GTFS-realtime data. Because these agencies are all using standardized formats for their open data, I don’t have to build anything new in OneBusAway to consume their data—the same code that works for one agency works for all of them.

But NextBus doesn’t provide an API using any recognized standards for real-time transit data. It’s a walled garden of sorts; the NextBus API is great if all you want to do is present data from agencies using NextBus, and terrible if you want to use it as a springboard for building revolutionary real-time passenger information tools.

The real question isn’t “why aren’t you on NextBus”; the real question is “why doesn’t NextBus provide a standards-compliant API”?

Synoptic first!

So, you’re a transit agency (or vendor, consultant, system integrator, etc.), and you’ve decided to develop an API to expose your real-time data. Perhaps you’ve gotten queries from developers like “I want to be able to query an API and get next bus arrivals at a stop…”.

It’s hard to say “no”, but fulfilling that developer’s request may not be the best way to go. If you have limited resources available to expose your data, there are better approaches available, which will in the long term enable the development of more advanced applications.

Regional mobility is no pipe dream

Robert Smith, former WMATA Board chair, calls WMATA’s proposed loop line a “distracting pipe dream”. For context, Smith was appointed by former Maryland Governor Bob Ehrlich in 2003, then fired in 2006 after making anti-gay remarks.

Smith assails WMATA’s proposal as being in “the realm of fantasy”. In reality, it’s anything but. Every recent analysis of the Metrorail network has highlighted the immense congestion and overcrowding in the core. Far from serving only the core, the proposed loop line connects key transit hubs, enhancing mobility in and out of the core and relieving some of the pressure on the system’s most heavily-congested stations (like Gallery Place).

Yet Smith continues his assault on sensible planning:

What possible benefit of this project would inure to the people of Maryland, particularly those who dwell beyond Montgomery and Prince George’s counties? While they would be spared the capital construction cost, the state would still be zapped with an increase in operating costs into eternity for the privilege of watching more of its residents spend entertainment dollars in the District.

This is reflective of the sort of small-minded thinking that advocates like Richard Layman rail against. When politicians think only of their county or their state, they ignore the fact that we are one region made up of cities and counties from two states, plus the federal city. We succeed together, or we fail together.

Practically speaking, though, what do Marylanders get out of the loop line? For the many Marylanders who commute through the core, whether they enter by MARC, commuter bus, or the Red or Green Lines, the loop line will ease congestion and provide connectivity to destinations in DC which presently have no rail service. Sound transit planning isn’t about politics; it’s about hard data. It may be hard for the suburbs to stomach, but solving Metrorail’s problems, including the problems suburban commuters experience, means increasing core capacity.

Smith’s true colors come out with his next complaint:

Even now, many Montgomery County riders suffer the indignity of being tossed from every other homebound train at Grosvenor-Strathmore station during rush hour, thanks to Metro’s lack of enough dollars — and a supportive vote from the District — to fund the full ride out to Shady Grove.

It seems as though in Smith’s world, Metro exists for the sole purpose of shuttling suburban commuters to and from far-flung park-and-rides. Looking at WMATA’s ridership statistics, though, we can see that that’s just not true. If Smith believes so strongly that the stations between Grosvenor-Strathmore and Shady Grove need the added service, then he should call on the State of Maryland and Montgomery County to fund the service. As a reimbursable project, the other jurisdictions wouldn’t have to contribute—though their Board members would have to vote to approve the service.

Smith describes Metrorail as “[lacking] the engineering simplicity to do the basic job of getting people where they want to go”. It’s true that today’s Metrorail is clearly collapsing under the strain. But that doesn’t signify any underlying engineering failure; rather, it’s the result of years of deferred maintenance, and a failure to plan and build new capacity to accommodate shifts in the region’s population. What WMATA intends to build by 2040 should have been built years ago. Had this been done, there’d be less pressure on the oldest parts of the network, reducing the pain of the sometimes-lengthy maintenance outages necessary to keep this aging rail system running.

So, what should we build? In his article, Smith says we should “build a line that effectively paralleled the Beltway and circled the city”. The Beltway Line isn’t a new concept; it’s something people have been pushing for years. But it’s an idea born out of auto-centric thinking. The fact that people spend hours every day in the parking lot that is the Beltway doesn’t mean that’s where they’re actually trying to go. We need to focus on moving people, not cars, and that means connecting activity centers, not just following existing paths of congestion and sprawl. This is more than just a gut instinct; WMATA has tested plans for a Beltway Line, and found that “[only] the segments that crossed the American Legion Bridge (between White Flint and Dunn Loring) and the Woodrow Wilson Bridge (between Branch Avenue and Eisenhower Avenue) had some promise”.

Smith closes by calling our region a “transportation basket case that needs to focus on reality”. That’s true, but without an increase in capacity it’s only going to get worse. Arguing that we shouldn’t build anything new because it’s too expensive, too unpalatable for the suburban parts of our region, or because WMATA already has operational problems will only prolong the pain.