Passively open, actively closed

What do you think of when you hear “open data”? Do you think of hackathons, APIs, and data catalogs, or perhaps partnerships with Socrata or Mashery? Do you think of clean data in well-defined formats with ample developer documentation?

Not all open data looks like that. Take Amtrak’s new “interactive train locator map”, for example. You might not know it, but that map is powered by a public dataset stored in Google Maps Engine. As Google’s documentation explains:

There’s an ever-growing number of public datasets available in Google Maps Engine for use by developers in their map or data visualization applications. You may retrieve this data with a simple HTTP request; no authorization is required, and authentication is accomplished through the use of an APIs Console key.

These data, then, are passively open. On a technical level, they are available for creative reuse, innovation, and incorporation into new, transformative projects. But there’s no fancy developer portal, no hackathon, no documentation. The dataset’s openness is more a side effect of the decision to host it in Google Maps Engine than a conscious choice. Once you have the map data, it’s up to you to figure out how to use it, and as for a developer community, well, you’re on your own. That’s not the end of the world, though: this particular dataset is mostly self-documenting, and it’s not too hard to build transformative applications with it.
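To make the “simple HTTP request” concrete, here is a minimal Python sketch of what fetching a passively open dataset like this might look like. It rests on assumptions rather than on anything the post specifies: `BASE_URL` is a hypothetical placeholder (the real Maps Engine endpoint and table ID aren’t given here), the API key is assumed to travel as a `key` query parameter, and the response is assumed to be a GeoJSON-style object with a `features` list. The `describe_properties` helper illustrates the “figure it out yourself” step: with no documentation, listing the property names that actually appear in the data is a reasonable first look.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: the real endpoint and table ID are not given in the post.
BASE_URL = "https://example.com/mapsengine/tables/TABLE_ID/features"


def build_url(base_url: str, api_key: str) -> str:
    """Append the API key as a query parameter (assumes base_url has no query string yet).

    No OAuth flow or login step is involved: the key alone identifies the caller.
    """
    query = urllib.parse.urlencode({"key": api_key})
    return f"{base_url}?{query}"


def fetch_features(url: str) -> dict:
    """Perform a plain HTTP GET and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def describe_properties(payload: dict) -> set:
    """Collect every property name seen across features (assumes a GeoJSON-style shape).

    For an undocumented but "self-documenting" dataset, this is a quick way to
    discover which fields are available.
    """
    names = set()
    for feature in payload.get("features", []):
        names.update(feature.get("properties", {}).keys())
    return names
```

Usage would be along the lines of `describe_properties(fetch_features(build_url(BASE_URL, "YOUR_API_KEY")))`, with a real endpoint and key substituted for the placeholders.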

Unfortunately, datasets that could easily be passively open are sometimes made actively closed instead. Take, for example, GO Transit’s GO Tracker application. The Web application is powered by an XML data feed of real-time train data, and that feed would make a great example of a passively open dataset. Instead, it is actively closed to innovation, development, and creative reuse. Try accessing the underlying XML feed outside the GO Tracker application, and you’ll see that GO Transit employs technical measures to control access to it. You could spoof the necessary HTTP headers to get in, but having to do so hardly comports with the spirit of open data.

Open data doesn’t necessarily require any special effort. Where APIs and data feeds already power Web applications, all that is required is to let outside developers access those same resources. In fact, as the GO Transit case shows, it often takes more effort to shut developers out, by building access controls around what would otherwise be easily reusable open data.