Last night, while working on another project, I found that MTA New York City Transit had posted archived A Division ATS data for May 2011. Each file in the archive contains a single service day's worth of train movements, with an event logged each time a train arrives at or departs from a station in the area tracked by ATS.

As soon as I saw the data, I knew I wanted to develop some kind of visualization with it; the result is the video above (best viewed large and in high definition). In the video, left-hand half-circles represent southbound trains, right-hand half-circles represent northbound trains, and the points are colored according to route colors. Each frame represents 30 seconds of real-time data, so at 30 frames per second, one second of video covers 900 seconds (15 minutes) of data.

The data covers one 'service day', in this case May 31, 2011. Trains that entered service on May 31 and were still in service after midnight are counted as part of the May 31 service day, so there are 27 hours of data for May 31. In reality, trains scheduled for the June 1 service day would have begun to enter service after midnight, so for times after midnight this is not a complete picture of every train in the system.
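The timing arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not the code from the posted scripts: the function names are made up, the 30 frames per second figure is inferred from the 900-seconds-per-second ratio, and the convention of writing post-midnight times with hours of 24 or more (as GTFS does) is an assumption about how the timestamps might be represented.

```python
SECONDS_PER_FRAME = 30   # each frame covers 30 seconds of data (from the text)
FRAMES_PER_SECOND = 30   # inferred: 900 seconds of data per second of video


def service_day_seconds(hhmmss):
    """Convert 'HH:MM:SS' to seconds since the start of the service day.

    Assumption: times after midnight are written with hours of 24 or more
    (for example '25:10:00'), so they stay on the same service day.
    """
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def frame_number(hhmmss):
    """Return the index of the 30-second frame that contains this time."""
    return service_day_seconds(hhmmss) // SECONDS_PER_FRAME


total_frames = 27 * 3600 // SECONDS_PER_FRAME       # 27 hours of data
video_seconds = total_frames / FRAMES_PER_SECOND

print(frame_number("00:00:29"))     # 0
print(frame_number("25:10:00"))     # 3020
print(total_frames, video_seconds)  # 3240 108.0
```

Under these assumptions, 27 hours of data comes to 3,240 frames, or a little under two minutes of video.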

The video was produced with two Python scripts and a Processing sketch, which I've posted on GitHub.

The first Python script pre-processes the raw ATS data into a list of frame numbers, coordinates, directions, and colors. Stop locations and route colors are taken from the GTFS feed, using the gtfs library. Doing this work in Python minimizes the amount of work left for the Processing sketch.
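As a rough sketch of what this kind of pre-processing might look like (not the actual script), the snippet below turns a handful of events into rows of frame number, latitude, longitude, direction, and color. The event tuple layout, the function names, and the two lookup tables are hypothetical; the real script reads the ATS files and pulls stop locations and colors from the GTFS feed, and the sample values here are for illustration only.

```python
import csv
import io

SECONDS_PER_FRAME = 30

# Hypothetical lookup tables with illustrative sample values; the real
# script would build these from the GTFS feed.
STOP_COORDS = {"127": (40.75529, -73.987495)}   # stop id -> (lat, lon)
ROUTE_COLORS = {"1": "EE352E"}                  # route id -> hex color


def preprocess(events):
    """Turn (seconds, stop_id, route_id, direction) events into frame rows."""
    rows = []
    for seconds, stop_id, route_id, direction in events:
        if stop_id not in STOP_COORDS or route_id not in ROUTE_COLORS:
            continue  # skip events we cannot place or color
        lat, lon = STOP_COORDS[stop_id]
        frame = seconds // SECONDS_PER_FRAME
        rows.append((frame, lat, lon, direction, ROUTE_COLORS[route_id]))
    rows.sort(key=lambda row: row[0])  # order rows by frame number
    return rows


def write_rows(rows, out):
    """Write the rows as CSV so the drawing code only has to read and plot."""
    csv.writer(out).writerows(rows)


# Assumed event layout: seconds since start of service day, stop, route, direction.
events = [(90, "127", "1", "N"), (30, "127", "1", "S"), (45, "999", "1", "N")]
rows = preprocess(events)
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

The event for the unknown stop "999" is dropped, and the remaining two rows come out sorted by frame, so the drawing side can read the file front to back.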

The second Python script takes a shapefile of borough boundaries (from the New York City Department of City Planning's BYTES of the BIG APPLE), reprojects the data into WGS84, and converts it from the ESRI shapefile format to a text file for Processing. This is done with GDAL and pyshp.
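One detail worth illustrating is how a shape's vertices become separate outlines. pyshp exposes each shape as a flat `points` list plus a `parts` list of starting indices, one per ring. The sketch below splits such a list into rings and writes one ring per line. It is a simplified stand-in, not the posted script: the function names and the text layout are invented for illustration, and it assumes the coordinates have already been reprojected to WGS84.

```python
def split_rings(points, parts):
    """Split a flat point list into rings using pyshp-style part offsets."""
    bounds = list(parts) + [len(points)]
    return [points[start:end] for start, end in zip(bounds, bounds[1:])]


def rings_to_text(rings):
    """Format one ring per line as 'x,y x,y ...' (a made-up layout)."""
    return "\n".join(
        " ".join(f"{x},{y}" for x, y in ring) for ring in rings
    )


# Toy data standing in for shape.points and shape.parts.
points = [(0, 0), (1, 0), (1, 1), (0, 0), (5, 5), (6, 5), (6, 6), (5, 5)]
parts = [0, 4]

rings = split_rings(points, parts)
print(rings_to_text(rings))
# 0,0 1,0 1,1 0,0
# 5,5 6,5 6,6 5,5
```

Keeping one ring per line means the reader on the Processing side can draw each line as a closed outline without knowing anything about the shapefile format.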

The Processing sketch reads in the pre-processed data files, plots the points (using OSMMercator for projection), and saves the result as a movie.
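For readers curious about the projection step, here is a small Python sketch of the standard spherical Web Mercator formula used by OpenStreetMap-style maps. It assumes that OSMMercator implements this same projection, and it is written in Python rather than Processing for consistency with the other examples. The function names, the bounding box, and the canvas size are hypothetical.

```python
import math


def mercator_unit(lat_deg, lon_deg):
    """Spherical Web Mercator, scaled so the world maps to [0, 1] x [0, 1].

    x grows eastward; y grows southward, matching screen coordinates.
    """
    x = (lon_deg + 180.0) / 360.0
    lat = math.radians(lat_deg)
    y = (1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0
    return x, y


def to_screen(lat_deg, lon_deg, bounds, width, height):
    """Map a point onto a width x height canvas covering `bounds`.

    bounds is (south, west, north, east) in degrees.
    """
    south, west, north, east = bounds
    x, y = mercator_unit(lat_deg, lon_deg)
    left, bottom = mercator_unit(south, west)
    right, top = mercator_unit(north, east)
    sx = (x - left) / (right - left) * width
    sy = (y - top) / (bottom - top) * height
    return sx, sy


# Rough, hypothetical bounding box around New York City.
NYC_BOUNDS = (40.49, -74.26, 40.92, -73.69)

print(mercator_unit(0.0, 0.0))  # (0.5, 0.5)
print(to_screen(40.75529, -73.987495, NYC_BOUNDS, 800, 600))
```

The canvas is stretched to fit the bounding box here, so for an undistorted map the width-to-height ratio should match the ratio of the projected extents.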

(Note: Before developing this visualization, I tried transforming the data to view it in Google Earth. That approach didn't work because there was too much data for Google Earth to handle, but you can see the code anyway.)