Python context managers for CFFI resource management

I have a project at work that involves calling into a third-party C library from Python, using CFFI. Using the library involves working with two different types of “contexts”—things that in an object-oriented language would be classes, but in C are nothing more than structs or void pointers to opaque memory regions.

In pseudo-Python, working with one of these contexts looks like this:

context ="ApplicationSpecificStruct *")
lib.do_something_with_context(context, some_data)

I’ve omitted error handling, but in reality each one of those library calls returns an error code which must be checked and handled after each call (no exceptions in C, remember!).
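
To keep that boilerplate manageable, one option is a small helper that turns nonzero return codes into Python exceptions. This is only a sketch under the assumption that the library follows the common zero-means-success convention; LibraryError and check are names I’ve made up for illustration.

class LibraryError(Exception):
    """Raised when a C library call returns a nonzero error code."""

def check(return_code, what="library call"):
    # Assumes 0 means success and anything else is an error code.
    if return_code != 0:
        raise LibraryError("%s failed with error code %d" % (what, return_code))
    return return_code

# Usage:
check(lib.do_something_with_context(context, some_data), "do_something_with_context")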

In order to simplify the process of working with the context, and especially of ensuring that it is still freed if an exception is thrown while it is being used, I have found Python’s context management protocol helpful.

Using contextlib.contextmanager, we can create a simple context manager that encapsulates the process of creating and freeing the C library’s context.

import contextlib

@contextlib.contextmanager
def application_context():
    context = ffi.new("ApplicationSpecificStruct *")
    try:
        yield context
    finally:
        lib.free_context(context)  # illustrative cleanup call; runs even if the block raises

with application_context() as the_context:
    lib.do_something_with_context(the_context, some_data)

If an exception is thrown inside the with-block, the C library’s context will still get freed.

Now, I suspect some will argue that this is a cop-out—that the more Pythonic thing to do would be to create proper object-oriented wrappers for the library’s context types. There are certain advantages to this approach; principally, it enables cleaner code, in which lib.do_something_with_context(the_context, some_data) becomes simply context.do_something(some_data).

But building fully object-oriented wrappers is both more tedious and more time-consuming, and for what I’m doing the context manager approach is perfectly suitable. Besides, even if I’d implemented fully object-oriented wrappers, I’d still want them to implement the context manager protocol—it’s the Pythonic way to ensure that a resource is closed after it’s been used, as in this example from the Python documentation:

with open("hello.txt") as f:
  for line in f:
    print line,
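
For comparison, here is roughly what a class-based wrapper might look like if I did go that route. It is only a sketch: free_context stands in for whatever cleanup call the real library exposes, and a real wrapper would still need the error-code checking discussed above.

class ApplicationContext(object):
    """Object-oriented wrapper that also implements the context manager protocol."""

    def __init__(self):
        self._context = ffi.new("ApplicationSpecificStruct *")

    def do_something(self, some_data):
        return lib.do_something_with_context(self._context, some_data)

    def close(self):
        lib.free_context(self._context)  # stand-in for the real cleanup call

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

# with ApplicationContext() as context:
#     context.do_something(some_data)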

Legacy AVL system? It’s okay, join the club.

If you work with real-time transit data, you’ve probably heard the steadily-increasing call for data producers to release their data in open, standardized formats like GTFS-realtime and SIRI. But how do you actually make your data available in those formats? Some AVL vendors are beginning to include standards-compliant APIs in their products, and that’s great for agencies considering a new system or major upgrade. But what about the massive installed base of legacy AVL systems which have few open interfaces, if any?

Fortunately, there are ways to get data out of almost any AVL system, whether it was explicitly designed with open interfaces or not. Some of these techniques are more technologically sound than others, and some may require some relatively tricky programming, but if you can find the right software developer, almost any problem is soluble.

Here are five key strategies for extracting information from an AVL system. The first three are strongly recommended, while the last two should only be undertaken if no better interface is available, and if you have adequate technical support to implement a more complex solution.

  • Transform a proprietary API to GTFS-realtime or SIRI: Many AVL systems (both COTS and agency-homegrown) include non-standard APIs which can, with a bit of programming, be transformed into a modern, standards-compliant API. This is the approach I took with wmata-gtfsrealtime, to produce a GTFS-realtime feed from WMATA’s real-time bus data, septa-gtfsrealtime to produce a GTFS-realtime feed from SEPTA’s real-time bus and rail data, and ctatt-gtfsrealtime to produce a GTFS-realtime feed from CTA’s Train Tracker data. This is also the approach taken by onebusaway-gtfs-realtime-from-nextbus-cli, which converts from the NextBus API, and bullrunner-gtfs-realtime-generator, which converts from the Syncromatics API.
  • Query a reporting database: Some AVL systems can be configured to log vehicle positions, predicted arrival times, and other information to a database. Ostensibly these databases are meant to be used for after-the-fact incident analysis, performance reporting, etc., but there’s nothing stopping an application from polling the database every 15-30 seconds to get the latest vehicle positions and predicted arrival times (a minimal sketch of this approach appears after this list). Many GTFS-realtime feed producers take this approach, including ddot-avl, built by Code for America to extract real-time information from DDOT’s TransitMaster installation, HART-GTFS-realtimeGenerator, built by CUTR to extract real-time information from HART’s OrbCAD installation, and live_transit_event_trigger, built by Greenhorne & O’Mara (now part of Stantec) to produce a GTFS-realtime feed from Ride On’s OrbCAD installation.
  • Parse a published text file: Similar to the database approach, some AVL systems can be configured to dump the current state of the transit network to a simple text file (like this file from Hampton Roads Transit). This text file can be read and parsed by a translator which then generates a standards-compliant feed, which is the approach taken by hrt-bus-api, built by Code for Hampton Roads, and onebusaway-sound-transit-realtime.
  • Screen-scrape a passenger-facing Web interface: This is where we get into the less technologically-sound options. While the first three options focused on acquiring data from machine-readable sources, screen scraping involves consuming data from a human-readable source and transforming it back into machine-readable data. In this case, that might mean accessing a passenger-facing Web site with predicted arrival times, extracting the arrival times, and using that to produce a standards-compliant feed. This is the approach taken by this project, which screen-scrapes KCATA’s TransitMaster WebWatch installation to produce a GTFS-realtime feed. Compared to options which involve machine-readable data sources, screen-scraping is more brittle, and may make it more challenging to produce a robust feed, but it can be made to work.
  • Intercept internal AVL system communications: This is the last resort, but if an AVL system has no open interfaces, it may be possible to intercept communications between the components of the AVL system (such as a central server and a dispatch console or system driving signage at transit stops), decode those communications, and use them as the basis for a standards-compliant feed. This is a last resort because it will often require reverse-engineering undocumented protocols, and results in solutions which are brittle and will tend to break in unpredictable ways. But, it can be done, and if it’s the only way to get data out of an AVL system, then go for it. This is the approach taken by onebusaway-king-county-metro-legacy-avl-to-siri.
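
To make the reporting-database strategy a little more concrete, here is a minimal sketch. It assumes a hypothetical vehicle_positions table with trip_id, vehicle_id, lat, lon, and updated_at columns (real schemas will differ), and uses the gtfs-realtime-bindings package for Python to build the feed.

import sqlite3
import time

from google.transit import gtfs_realtime_pb2

def build_feed(db_path="avl.db"):
    """Poll the AVL reporting database and build a GTFS-realtime FeedMessage."""
    feed = gtfs_realtime_pb2.FeedMessage()
    feed.header.gtfs_realtime_version = "2.0"
    feed.header.incrementality = gtfs_realtime_pb2.FeedHeader.FULL_DATASET
    feed.header.timestamp = int(time.time())

    connection = sqlite3.connect(db_path)
    rows = connection.execute(
        "SELECT trip_id, vehicle_id, lat, lon, updated_at FROM vehicle_positions")
    for trip_id, vehicle_id, lat, lon, updated_at in rows:
        entity = feed.entity.add()
        entity.id = str(vehicle_id)
        entity.vehicle.trip.trip_id = trip_id
        entity.vehicle.vehicle.id = str(vehicle_id)
        entity.vehicle.position.latitude = lat
        entity.vehicle.position.longitude = lon
        entity.vehicle.timestamp = int(updated_at)
    connection.close()
    return feed.SerializeToString()

Run something like that every 15-30 seconds and serve the serialized bytes over HTTP, and you have the skeleton of a GTFS-realtime vehicle positions feed.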

As evidenced by the example links, every one of the strategies mentioned above has been implemented in at least one real-world application. No matter how old your AVL system is, no matter how far out of warranty or how unsupported it is, no matter how obsolete the technology is, some enterprising civic hacker has probably already figured out a way to get data out of the system (or is eager and ready to do so!). Every one of the tools linked in this post is open-source, and if it closely approximates your needs, you can download it today and start hacking (or find a local civic hacker and have them adapt it to meet your needs). And if none of the tools look close? Don’t head for your procurement department and have them issue an RFP—instead, post on the Transit Developers Google Group; chances are your post will make its way to someone who can help, whether a local Code for America brigade, or an independent civic hacker, or another transit agency that has already solved the same problem.

Finally, I’d like to thank the participants in the Disrupting Legacy Transit Ops Software (Moving Beyond Trapeze) session at Transportation Camp DC 2015, who inspired me to write this post.

Reprogramming a u-blox MAX-7Q in-situ on a Raspberry Pi

Suppose you have a u-blox MAX-7Q GPS module connected to a Raspberry Pi, and you need to reprogram the module (for example, to enable/disable certain NMEA strings, change the baud rate, etc.). You could manually construct the various binary UBX strings and send them through gpsd or straight out the serial port, but that’s needlessly complex.
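
For the curious, “manually constructing UBX strings” means building binary frames by hand: two sync bytes (0xB5 0x62), a message class and ID, a little-endian payload length, the payload itself, and a two-byte Fletcher checksum. A rough sketch follows; the CFG-MSG example is an assumption from memory of the u-blox protocol description, so check it against the receiver manual for your firmware before sending anything.

def ubx_frame(msg_class, msg_id, payload=b""):
    """Build a UBX frame: sync bytes, class, ID, length, payload, checksum."""
    body = bytes([msg_class, msg_id]) + len(payload).to_bytes(2, "little") + payload
    ck_a = ck_b = 0
    for byte in body:  # 8-bit Fletcher checksum over class, ID, length, and payload
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return b"\xb5\x62" + body + bytes([ck_a, ck_b])

# Assumed example: CFG-MSG (class 0x06, ID 0x01) with a three-byte payload sets the
# output rate of one NMEA sentence on the current port; rate 0 would disable NMEA GSV.
disable_gsv = ubx_frame(0x06, 0x01, bytes([0xF0, 0x03, 0x00]))

Doable, but not something you want to be piecing together by hand very often.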

You could also connect the module to a Windows PC and use u-center to reprogram it, but that’s a bit of a nuisance too. If the module is conveniently packaged for connection to a Raspberry Pi, then you aren’t going to have a readily accessible USB port, nor an RS-232 serial port that you could connect directly to a PC. Sure, you could cobble together a USB-serial interface (or a real hardware serial port, rare as they are nowadays) and an RS-232–3.3 volt level shifter like the MAX3232CPE, or just use this convenient cable from Adafruit, which provides a +5 volt supply and 3.3 volt serial interface from USB, but it’s still not quite plug-and-play.

But, there’s an easier way that avoids all of those hassles (although it still requires you to have a Windows PC)—enter socat!

$ sudo socat tcp-l:2000,reuseaddr,fork file:/dev/ttyAMA0,nonblock,waitlock=/var/run/ttyAMA0.lock,b9600,iexten=0,raw

This exposes the Raspberry Pi's serial port at TCP port 2000, so you can connect to it over the network from a Windows PC running u-center. You might think you'd then have to use extra software on the Windows side to get a TCP socket to appear as a virtual COM port, but u-center has support for network interfaces built in. Just enter the Raspberry Pi's IP address and the port number, and it will happily connect to the module via socat.
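
Before firing up u-center, you can sanity-check the bridge with a few lines of Python from any machine on the network. The hostname below is a placeholder; since the module emits NMEA sentences by default, you should see lines starting with “$” scroll past.

import socket

PI_HOST = "raspberrypi.local"  # placeholder; use your Pi's hostname or IP
PI_PORT = 2000

with socket.create_connection((PI_HOST, PI_PORT), timeout=10) as bridge:
    data = b""
    while data.count(b"\n") < 5:  # collect a handful of NMEA sentences
        chunk = bridge.recv(4096)
        if not chunk:
            break
        data += chunk

for line in data.splitlines()[:5]:
    print(line.decode("ascii", errors="replace"))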

How should transit agencies make their GTFS available?

To many techies, the question of how transit agencies should make their GTFS available might seem like a silly one. They’d reply that obviously the agency should simply post their GTFS to their Web site at a reasonable URL, and make that URL readily available from the agency’s developer resources page.

Unfortunately, it isn’t nearly so simple in the real world. Instead, many agencies hide their GTFS behind a “clickwrap” license, or even require a login to download the feed. In a few particularly bad cases, developers even have to sign an agreement and return it (on paper) to get access to a feed. Some agencies don’t host their own feeds at all, instead depending on sites like the GTFS Data Exchange.

So, what are some best practices for hosting GTFS feeds?

  • Don’t rely on third parties: Think of this in terms of paper maps and schedules. How would riders feel if a transit agency told them to pick up transit maps and timetables not at the agency’s offices or stations, but rather from some unrelated third party? If a transit agency has a Web site (as almost all do), then it should be capable of hosting its own GTFS feed. Sure, some agencies will complain about what their content management system “won’t let them do”, or complain that they must go through some arduous process to upload new content, but in 2014 running a Web site is a basic competency for almost any organization. Depending on a third-party site introduces additional risk and additional points of failure.
  • Help developers discover feeds: Developers shouldn’t have to hunt for GTFS feeds–there should be a prominent link on every agency’s homepage. Bonus points for participating in any applicable data catalogs, like those operated by ODOT and MassDOT for agencies in their respective states.
  • No login, no clickwrap: GTFS feeds should be downloadable by any Internet user, without having to log in or accept a license agreement. This is a must-have for being able to automate downloads of updated GTFS feeds, an essential part of any large-scale passenger information system. Don’t make it needlessly hard for developers to use your GTFS feed – if you can’t download it with wget, then you’re just making work for feed users. The only piece of information a developer should need to use an agency’s GTFS feed is the URL—a clean, simple, stable URL.
  • Support conditional HTTP GET: GTFS feeds rarely change from one day to the next, but it’s still important to pick up updates as soon as they’re available. Downloading a large feed (some can be 20 MB or more) every day just to check for changes is wasteful. So how can feed consumers stay up to date without wasting bandwidth? Feed producers should support conditional HTTP GET, using the ETag or Last-Modified headers, as sketched below.
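
From the feed consumer’s side, conditional GET looks roughly like this (a sketch using the requests library; the feed URL is a placeholder). The consumer stores the ETag and Last-Modified values from the previous download and sends them back; a 304 Not Modified response means there is nothing new to fetch.

import requests

FEED_URL = "https://example.com/gtfs.zip"  # placeholder URL

def download_if_changed(etag=None, last_modified=None):
    """Fetch the GTFS feed only if it has changed since the last download."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified

    response = requests.get(FEED_URL, headers=headers, timeout=60)
    if response.status_code == 304:
        return None, etag, last_modified  # unchanged; nothing to download
    response.raise_for_status()
    return (response.content,
            response.headers.get("ETag"),
            response.headers.get("Last-Modified"))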

Agencies may balk at some of these recommendations—”But we have to track usage of the feed! But we have to have a signed license agreement!”—but the simple fact is that plenty of agencies already get it right: they use a simple, reasonable license and host their GTFS at a stable URL that supports automated downloads. If you demand a signed license agreement or make developers log in to access the feed, you make it harder for developers to use your data in their apps; that in turn makes it harder for transit riders to get service information, because many riders’ first stop when they need transit information is a third-party smartphone app.

Apps ≇ frequency

Mobile apps for real-time passenger information are neither approximately nor actually equal to frequency of service. (And yes, “neither approximately nor actually equal to” is the name of the ≇ character in the title of this post.)

But that doesn’t mean real-time passenger information isn’t valuable. On the contrary, it’s immensely valuable, in the right circumstances. For discretionary riders, who can vary their arrival and departure times, real-time passenger information is valuable. For passengers who have somewhere to wait before the bus or train comes (the proverbial “have another drink before you go”), it’s valuable. But for passengers, particularly transit-dependent passengers, who are trying to mesh the geometry of transit with their complex lives, nothing beats frequency of service.

Consider an example: you will depart Event A at 1:00 PM, and must be at Event B by 2:00 PM. A bus route connects the two, and the trip takes 45 minutes. If the route runs every 15 minutes (or more frequently), you have a good chance of making your second appointment, possibly even if you miss one arrival (or if the trip loses some time en-route). But if the bus runs every 20 minutes? Every 30 minutes? It becomes a game of chance. You might make your appointment, or you might not. Knowing when the bus will come does nothing to change the inexorable geometry of a low-frequency transit network—you may know when the bus is coming, but it’s still not going to get you to your appointment on time.
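
To put rough numbers on that game of chance, here is a back-of-the-envelope sketch. It assumes you reach the stop at a uniformly random moment and that buses run exactly on schedule; with 60 minutes available and a 45-minute ride, you can absorb at most a 15-minute wait.

def chance_of_making_it(headway_minutes, slack_minutes=15):
    """Probability that the wait for the next bus fits within your slack."""
    return min(1.0, float(slack_minutes) / headway_minutes)

for headway in (10, 15, 20, 30):
    print("every %2d minutes: %3.0f%% chance" % (headway, 100 * chance_of_making_it(headway)))

Under those idealized assumptions, at 15-minute headways or better you always make it; at 20 minutes your odds drop to 75 percent, and at 30 minutes it is a coin flip.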

Some people might say “but you can call or text and push your appointment back!” Sure, some people can. If you’re fortunate enough to be in a privileged position where you can dictate other people’s schedules, then you’re all set. But most of us simply have to be where we’re supposed to be, when we’re supposed to be there. So while it may help to know just how late you’re going to be, that neither excuses nor mitigates the impact of being late.

This is why transit planner and consultant Jarrett Walker says “frequency is freedom”. Sure, apps may help reduce wait time, but if a transit service is simply too infrequent to be useful, discretionary riders won’t ride, and captive riders will suffer.

Additionally, the benefits of real-time passenger information only become apparent when the information provided to passengers is accurate and reliable. This isn’t a nuts-and-bolts post, so I will refrain from naming particular vendors or transit agencies, but not all real-time information is created equal.

It doesn’t do passengers any good, for example, when they arrive at a bus stop just as their app tells them the bus should be arriving, only to find that the bus departed several minutes prior. Nor does it do them any good to stand at a bus stop (in the cold, in several feet of snow, in the blazing summer heat, etc.), watching an app count down from ten minutes, to five minutes, to one minute, and then back up again, with no bus in sight. When these things happen, passengers become disillusioned. They lose faith in the system. In the short-term, they give up on the bus or the train and call a cab or book an Uber or walk. In the long-term, they begin making plans that allow them to avoid transit—perhaps they even buy a car.

As a software developer, and one who works on real-time passenger information systems, I’m not going to say that apps aren’t good. But I am also a transit rider, and I know there’s a balance. Two Sundays ago, for example, after leaving the Conveyal TRB Welcome Party in Columbia Heights, I walked over to 16th Street to catch an S bus home. OneBusAway told me that the next bus was 18 minutes away—that I’d just missed the previous bus—and there wasn’t a thing I could do about it. App or no app, I was going to sit and wait in the cold for another 18 minutes until the bus arrived. Arguably I should have checked OneBusAway before I left, but that’s precisely what “frequency is freedom” means: it was late, I was tired, and I was ready to go home. Not in another 18 minutes, but now (or maybe in another 5 or 10 minutes, but certainly not 18).

Telling people to plan their lives around a transit app just isn’t a good way to lure them out of their cars or endear them to transit. It’s much more compelling (and leads to a much more usable transit system) when we can simply tell people “show up at a stop, and there’ll be a bus in 10 minutes or less”. It’s also not a problem app developers can solve alone; as the cliché goes, when all you have is a hammer, everything looks like a nail. Providing reliable real-time passenger information is a good first step towards improving the usability of a transit network, and one that is often far less expensive than actually increasing frequency of service. But that doesn’t mean our work is done once the app goes live; on the contrary, we’ve only just begun.

Why “they’re not on NextBus” isn’t the problem it sounds like

Being active in open data for transit and real-time passenger information, one of the complaints I sometimes hear leveled at transit agencies is “They’re not on NextBus!”.

This bothers me. A lot.

Why? There are two reasons. The first is pretty simple. Sometimes, when people say “NextBus”, what they really mean is real-time passenger information, without any concern for the specific provider. But “NextBus” is a trademarked name for a specific proprietary real-time passenger information provider; if what you really mean is “real-time passenger information”, then say so.

The second reason is more pernicious. A lot of people use mobile apps for transit which are designed around the NextBus API. So, they work everywhere that the local transit agency has elected to contract with NextBus for real-time passenger information. On its face, this seems like a huge success for transit riders—one app for dozens of cities! But, it’s not. Vendor lock-in isn’t the way to achieve real transit data integration.

I understand that transit riders love the idea of having a single app for transit information in every city they visit. I’m a transit rider; I get it. But the solution isn’t to get every agency to pay the same vendor to provide the same proprietary service.

There are many AVL vendors out there: INIT, Xerox, Avail, Clever, Connexionz, and more. Some very forward-thinking agencies, like New York’s MTA, have even decided to act as their own system integrator, and build their own real-time passenger information system, so that they’ll never be beholden to any vendor’s proprietary system. Built on top of the open-source transit data platform OneBusAway, MTA Bus Time provides real-time passenger information for New York’s buses using an open technology stack that saved the MTA 70 percent compared to proprietary alternatives.

So with every agency using a different vendor’s system (and some having rolled their own), how do we provide that integrated experience that riders crave? The answer is simple: by using open data standards. With standards like GTFS-realtime and SIRI, app developers can build apps that work with data from any transit agency and any vendor’s systems. With OneBusAway, for example, I can easily (trivially) make use of feeds from any of several DC-area agencies, York Region Transit, MBTA, BART, TriMet, or any of the other agencies who are releasing GTFS-realtime data. Because these agencies are all using standardized formats for their open data, I don’t have to build anything new in OneBusAway to consume their data—the same code that works for one agency works for all of them.
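
That interchangeability is easy to see in code. The sketch below (the URL is a placeholder; it uses the requests library and the gtfs-realtime-bindings package) works unmodified against any agency’s GTFS-realtime vehicle positions feed; only the URL changes.

import requests
from google.transit import gtfs_realtime_pb2

def fetch_vehicle_positions(feed_url):
    """Return (trip_id, latitude, longitude) tuples from any GTFS-realtime feed."""
    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(requests.get(feed_url, timeout=30).content)
    return [(entity.vehicle.trip.trip_id,
             entity.vehicle.position.latitude,
             entity.vehicle.position.longitude)
            for entity in feed.entity if entity.HasField("vehicle")]

positions = fetch_vehicle_positions("https://example.com/gtfs-realtime/vehicle-positions.pb")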

But NextBus doesn’t provide an API using any recognized standards for real-time transit data. It’s a walled garden of sorts; the NextBus API is great if all you want to do is present data from agencies using NextBus, and terrible if you want to use it as a springboard for building revolutionary real-time passenger information tools.

The real question isn’t “why aren’t you on NextBus?”; it’s “why doesn’t NextBus provide a standards-compliant API?”

Synoptic first!

So, you’re a transit agency (or vendor, consultant, system integrator, etc.), and you’ve decided to develop an API to expose your real-time data. Perhaps you’ve gotten queries from developers like “I want to be able to query an API and get next bus arrivals at a stop…”.

It’s hard to say “no”, but fulfilling that developer’s request may not be the best way to go. If you have limited resources to devote to exposing your data, there are better approaches, ones that will in the long term enable the development of more advanced applications.