Looking at PATH’s service with GTFS data

About an hour ago, @PATHTweet announced the availability of a GTFS feed for PATH service. Naturally, I was interested, having once been a regular rider of PATH, so I downloaded the feed and set out to apply tph.py, my analysis tool from the last post, to the data. The first thing I found was that the feed doesn't include headsign values, which is odd, because PATH trains do have headsigns, and tph.py requires valid headsign values. Not to be deterred, I added a workaround that infers a headsign from the last stop of each trip, which should be good enough for most applications. Other than that, the feed worked out of the box. So, without further ado, here's the output:

Click on the image for a larger version, or here for a full-resolution PDF.

This first plot is from Hoboken, my former home station. It shows that, with the exception of the overnight hours, service is almost evenly split between the two terminals. The next plot is from the World Trade Center. I have to admit that I was surprised to find that service levels weren't higher, but I suspect that with so many tracks out of service at World Trade Center (it's supposed to be a five-track station), it would be hard to send more trains through there. On top of that, I've heard that Newark can't really handle any more trains in the peak, either.

Click on the image for a larger version, or here for a full-resolution PDF.

Finally, rounding out the PATH system, here are two plots for 33rd Street and Newark:

Click on the image for a larger version, or here for a full-resolution PDF.

You might expect to see more service at Newark, but the problem, as I understand it, is that the relay operation at Newark is inefficient: trains unload on the upper-level platform, run out onto the tail tracks behind the station to relay, and then return on the lower platform. This is nowhere near as efficient as the terminal operation at 33rd Street, Hoboken, or World Trade Center (which, by virtue of being on a balloon loop, makes a great terminal). In addition, while it's not unheard of to run 30 trains per hour on a two-track line, remember that the Newark-World Trade Center route shares its tracks with other routes everywhere but at Newark and Harrison.

In any event, this is about more than just pretty pictures; the real point here is that this level of flexibility would not have been possible without transit agencies releasing their data using open standards. I've now been able to apply the tool to data from four transit agencies, with only a few minor tweaks to accommodate variations in how agencies generate their GTFS data. The end result is a flexible tool that can be applied to a GTFS feed to get a quick overview of service levels and span of service, as well as to see how multiple routes combine to provide increased service at certain points.
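For anyone curious what this kind of analysis involves, here is a minimal sketch in Python. To be clear, this is not the actual tph.py code; it only illustrates the two ideas mentioned in this post, falling back to the last stop's name when a trip has no headsign, and counting departures per hour at a station. The function names, stop IDs, and sample rows are all made up for illustration. The field names (`trip_id`, `trip_headsign`, `stop_id`, `stop_name`, `stop_sequence`, `departure_time`) are the standard GTFS columns from trips.txt, stops.txt, and stop_times.txt, which you could load with `csv.DictReader`.

```python
from collections import Counter


def last_stops(stop_times):
    """Map trip_id -> (highest stop_sequence, stop_id at that sequence)."""
    last = {}
    for st in stop_times:
        seq = int(st["stop_sequence"])
        if st["trip_id"] not in last or seq > last[st["trip_id"]][0]:
            last[st["trip_id"]] = (seq, st["stop_id"])
    return last


def infer_headsigns(trips, stop_times, stops):
    """Map trip_id -> headsign, using the last stop's name when none is given."""
    names = {s["stop_id"]: s["stop_name"] for s in stops}
    last = last_stops(stop_times)
    headsigns = {}
    for trip in trips:
        trip_id = trip["trip_id"]
        explicit = (trip.get("trip_headsign") or "").strip()
        if explicit:
            headsigns[trip_id] = explicit
        elif trip_id in last:
            stop_id = last[trip_id][1]
            headsigns[trip_id] = names.get(stop_id, stop_id)
    return headsigns


def trains_per_hour(stop_id, stop_times, headsigns):
    """Count departures from stop_id, keyed by (hour, headsign)."""
    last = last_stops(stop_times)
    counts = Counter()
    for st in stop_times:
        trip_id = st["trip_id"]
        if st["stop_id"] != stop_id or trip_id not in headsigns:
            continue
        # A trip's final stop is an arrival, not a departure, so skip it.
        if int(st["stop_sequence"]) == last[trip_id][0]:
            continue
        # GTFS times are HH:MM:SS, and the hour may exceed 23 after midnight.
        hour = int(st["departure_time"].split(":")[0])
        counts[(hour, headsigns[trip_id])] += 1
    return counts


# Hypothetical sample data; the IDs and times are invented, not from the PATH feed.
stops = [
    {"stop_id": "HOB", "stop_name": "Hoboken"},
    {"stop_id": "WTC", "stop_name": "World Trade Center"},
    {"stop_id": "33S", "stop_name": "33rd Street"},
]
trips = [
    {"trip_id": "t1", "trip_headsign": ""},
    {"trip_id": "t2", "trip_headsign": "33rd Street"},
    {"trip_id": "t3"},
    {"trip_id": "t4", "trip_headsign": ""},
]
stop_times = [
    {"trip_id": "t1", "stop_id": "HOB", "stop_sequence": "1", "departure_time": "08:00:00"},
    {"trip_id": "t1", "stop_id": "WTC", "stop_sequence": "2", "departure_time": "08:11:00"},
    {"trip_id": "t2", "stop_id": "HOB", "stop_sequence": "1", "departure_time": "08:10:00"},
    {"trip_id": "t2", "stop_id": "33S", "stop_sequence": "2", "departure_time": "08:24:00"},
    {"trip_id": "t3", "stop_id": "HOB", "stop_sequence": "1", "departure_time": "08:30:00"},
    {"trip_id": "t3", "stop_id": "WTC", "stop_sequence": "2", "departure_time": "08:41:00"},
    {"trip_id": "t4", "stop_id": "WTC", "stop_sequence": "1", "departure_time": "08:49:00"},
    {"trip_id": "t4", "stop_id": "HOB", "stop_sequence": "2", "departure_time": "09:00:00"},
]

headsigns = infer_headsigns(trips, stop_times, stops)
counts = trains_per_hour("HOB", stop_times, headsigns)
for (hour, headsign), n in sorted(counts.items()):
    print(f"{hour:02d}:00  {headsign}: {n}")
```

A real version would also need to filter by service day using calendar.txt, since a feed usually mixes weekday and weekend trips; this sketch skips that step to keep the headsign fallback and the hourly count easy to follow.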