Tracking Metrobuses live in Google Maps, archiving historical data

Note: the map URLs in this post are no longer live. I’m working on an alternative, and will update this post when it becomes available.

A while ago, I posted about a script to produce a KML feed containing live Metrobus position data, based on WMATA’s bus positions API. At the time, I didn’t have a good demo to show it off; if you wanted to see the output you had to run the script yourself.

Now, though, I have a good demo ready, along with what will eventually become a long-term data repository. I already had an Amazon EC2 micro instance running for another project, which I’ve repurposed for this new initiative. What I’ve put together is an evolution of the previous script: it now fetches the bus positions from WMATA every two minutes and produces KML and GeoJSON feeds from them. It also archives each XML response from WMATA to an Amazon S3 bucket, enabling future research on the historical data (like independently recalculating on-time performance, or OTP, by correlating GPS tracks with expected positions derived from trips in the GTFS feed).
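
A minimal sketch of the XML-to-GeoJSON step might look like the following. It is an illustration only, not the script running on the server. The sample response is hypothetical and simplified: the element names (`BusPosition`, `Lat`, `Lon`, and so on) are assumptions about the shape of WMATA's XML, and the sample omits any XML namespace the real response may carry. The function name `positions_to_geojson` is also made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; the real WMATA response may use
# different element names and an XML namespace.
SAMPLE_XML = """<BusPositionsResp>
  <BusPositions>
    <BusPosition>
      <VehicleID>2145</VehicleID>
      <RouteID>S2</RouteID>
      <TripHeadsign>SILVER SPRING</TripHeadsign>
      <Lat>38.9072</Lat>
      <Lon>-77.0369</Lon>
      <Deviation>-3.5</Deviation>
    </BusPosition>
  </BusPositions>
</BusPositionsResp>"""


def positions_to_geojson(xml_text):
    """Turn a bus positions XML document into a GeoJSON FeatureCollection."""
    root = ET.fromstring(xml_text)
    features = []
    for bus in root.iter("BusPosition"):
        # Collect every child element as a string property.
        fields = {child.tag: (child.text or "").strip() for child in bus}
        try:
            lat = float(fields.pop("Lat"))
            lon = float(fields.pop("Lon"))
        except (KeyError, ValueError):
            continue  # skip reports with a missing or malformed position
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": fields,
        })
    return {"type": "FeatureCollection", "features": features}


geojson = positions_to_geojson(SAMPLE_XML)
print(json.dumps(geojson, indent=2))
```

Everything other than the position is passed through as a string property, so a map client can style or label the points by route or headsign without the converter needing to know about each field.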

The KML feed can be accessed at http://kgr-buspos.s3-website-us-east-1.amazonaws.com/latest.kml, and the GeoJSON file can be accessed at http://kgr-buspos.s3-website-us-east-1.amazonaws.com/latest.json. However, for KML viewers that support network links with interval-based refresh (including Google Earth), you may prefer to use http://kgr-buspos.s3-website-us-east-1.amazonaws.com/index.kml, which will automatically refresh the feed. You can also click here to view the feed in Google Maps.
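
For readers unfamiliar with KML network links, here is a rough sketch of how such a wrapper file could be generated. The `NetworkLink`, `Link`, `refreshMode`, and `refreshInterval` elements are standard KML 2.2; the function name, the link name, and the 120-second interval (chosen to match the two-minute fetch cycle) are assumptions for illustration, and the actual contents of index.kml may differ.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_network_link(feed_url, interval_seconds=120, name="Metrobus positions"):
    """Return a KML document that reloads feed_url at a fixed interval."""
    # Serialize the KML namespace as the default namespace (no prefix).
    ET.register_namespace("", KML_NS)

    def tag(local_name):
        return f"{{{KML_NS}}}{local_name}"

    kml = ET.Element(tag("kml"))
    network_link = ET.SubElement(kml, tag("NetworkLink"))
    ET.SubElement(network_link, tag("name")).text = name
    link = ET.SubElement(network_link, tag("Link"))
    ET.SubElement(link, tag("href")).text = feed_url
    # onInterval tells the viewer to re-fetch every refreshInterval seconds.
    ET.SubElement(link, tag("refreshMode")).text = "onInterval"
    ET.SubElement(link, tag("refreshInterval")).text = str(interval_seconds)
    return ET.tostring(kml, encoding="unicode")


index_kml = build_network_link(
    "http://kgr-buspos.s3-website-us-east-1.amazonaws.com/latest.kml"
)
print(index_kml)
```

The wrapper itself never changes; only the file it points to is regenerated, which is why a viewer such as Google Earth can keep a single saved link and still show fresh positions.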


You could do considerably more thematic mapping here; the API reports a value for “deviation” which is supposed to be the number of minutes the bus is ahead of or behind schedule. However, I don’t know how reliable this value is (unless some buses really do routinely run an hour behind or ahead), so for now I’m not doing anything with it. If the deviation value turns out to be reliable, then I think it would be interesting to correlate it with location and time of day, to see where and when buses lose the most time.
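
As a sketch of what the time-of-day analysis could look like, the snippet below averages deviation by hour while discarding implausibly large values. The input format (pairs of hour and deviation in minutes), the function name, and the 30-minute cutoff are all assumptions made for this example; the sample numbers are invented, and the sign convention of the real deviation field is not something this sketch relies on.

```python
from collections import defaultdict
from statistics import mean


def mean_deviation_by_hour(reports, max_abs_minutes=30.0):
    """Average deviation per hour, ignoring values beyond the cutoff.

    reports: iterable of (hour_of_day, deviation_minutes) pairs.
    max_abs_minutes: hypothetical plausibility threshold, not a WMATA value.
    """
    by_hour = defaultdict(list)
    for hour, deviation in reports:
        if abs(deviation) <= max_abs_minutes:
            by_hour[hour].append(deviation)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Invented sample data: the 61-minute report is dropped as implausible.
sample = [(8, 4.0), (8, 6.0), (8, 61.0), (17, -2.0), (17, 8.0)]
hourly = mean_deviation_by_hour(sample)
print(hourly)
```

The same grouping could be keyed on a route or a map grid cell instead of the hour to look at where, rather than when, buses lose time.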

The usual disclaimers about the data apply: if the bus is operating with the wrong headsign, then more than likely it’ll have the wrong headsign in the data, too. The data may be two minutes old, or older. If the data tell you that there should be a bus right outside, and you look out the window and there’s no bus there, don’t be surprised. WMATA’s AVL system is old (according to some reports, there are still buses in the fleet with no AVL hardware onboard), and even for those buses with AVL hardware onboard, it can still be many minutes between position reports. Finally, I should also point out that the feed is unsupported and the data are supplied with no warranty.

If you’d like information on the historical data I’m now archiving, please get in touch. I’d be happy to work with other developers to do more with the data.