How should transit agencies make their GTFS available?

To many techies, the question of how transit agencies should make their GTFS available might seem like a silly one. They’d reply that obviously the agency should simply post their GTFS to their Web site at a reasonable URL, and make that URL readily available from the agency’s developer resources page.

Unfortunately, it isn’t nearly so simple in the real world. Instead, many agencies hide their GTFS behind a “clickwrap” license, or even require a login to download the feed. In a few particularly bad cases, developers even have to sign an agreement and return it (on paper) to get access to a feed. Some agencies don’t host their own feeds at all, instead depending on sites like the GTFS Data Exchange.

So, what are some best practices for hosting GTFS feeds?

  • Don’t rely on third parties: Think of this in terms of paper maps and schedules. How would riders feel if a transit agency told them to pick up transit maps and timetables not at the agency’s offices or stations, but rather some unrelated third party? If a transit agency has a Web site (as almost all do), then it should be capable of hosting its own GTFS feed. Sure, some agencies will complain about what their content management system “won’t let them do”, or complain that they must go through some arduous process to upload new content, but in 2014 running a Web site is a basic competency for almost any organization. Depending on a third-party site introduces additional risk and additional points of failure.
  • Help developers discover feeds: Developers shouldn’t have to hunt for GTFS feeds–there should be a prominent link on every agency’s homepage. Bonus points for participating in any applicable data catalogs, like these operated by ODOT and MassDOT for agencies in their respective states.
  • No login, no clickwrap: GTFS feeds should be downloadable by any Internet user, without having to log in or accept a license agreement. This is a must-have for being able to automate downloads of updated GTFS feeds, an essential part of any large-scale passenger information system. Don’t make it needlessly hard for developers to use your GTFS feed – if you can’t download it with wget, then you’re just making work for feed users. The only piece of information a developer should need to know to use an agency’s GTFS feed is the URL—a clean, simple URL like http://www.bart.gov/dev/schedules/google_transit.zip.
  • Support conditional HTTP GET: GTFS feeds rarely change every day, but it’s still important to get updates as soon as they’re available. But downloading a large feed (some can be 20 MB or more) every day is wasteful. So how can feed consumers stay up-to-date without wasting a lot of bandwidth? Feed producers should support conditional HTTP GET, using either the ETag or Last-Modified headers.

Agencies may balk at some of these recommendations—”But we have to track usage of the feed! But we have to have a signed license agreement!”—but the simple fact is that there are plenty of agencies that get it right. There are plenty of agencies that use a simple, reasonable license, and plenty of agencies that host their GTFS at a stable URL that supports automated downloads. If you demand a signed license agreement, or make developers log in to access the feed, you make it harder for developers to use your data. When you make it hard for developers to use your data in their apps, you make it harder for transit riders to get service information, because many riders’ first stop when they need transit information is a third-party smartphone app.