Cheap airport data

Let’s say you wanted to have a database of airports, including each airport’s ICAO and IATA codes, plus the airport’s name and location. You could pay ICAO $185.00 for a paper copy of Document 7910, Location Indicators. That gets you a 252-page tome, which is probably not that useful for programmatic lookup. Alternatively, you could pay $945.00 for an annual subscription to the online version of Document 7910.

Or, you could execute a short SPARQL query against the DBpedia SPARQL endpoint:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX grs: <http://www.georss.org/georss/>
SELECT ?name ?icao ?iata ?coordinates ?airport
WHERE {
    ?airport rdf:type dbo:Airport .
    # Keep only well-formed four-character ICAO codes.
    ?airport dbo:icaoLocationIdentifier ?icao .
    FILTER regex(?icao, "^[A-Z0-9]{4}$")
    # Keep only well-formed three-character IATA codes.
    ?airport dbo:iataLocationIdentifier ?iata .
    FILTER regex(?iata, "^[A-Z0-9]{3}$")
    # The English label is optional, so airports without one are still returned.
    OPTIONAL {
        ?airport rdfs:label ?name
        FILTER ( lang(?name) = "en" )
    }
    ?airport grs:point ?coordinates .
}

That gets you around 2,000 airports, their ICAO and IATA codes, and their locations. The usual disclaimers concerning Wikipedia data apply, but it’s not bad for a free dataset. It would be great if ICAO would release their data for free, and even better if they’d subscribe to the W3C’s Linked Open Data principles, but until that happens, this is a good substitute.

Not only that, but because the dataset is based on Wikipedia, end users can play a direct role in maintaining data quality: see bad data, fix it in Wikipedia, and see the updates in DBpedia and your own applications. For example, some airports have invalid values for their ICAO and/or IATA codes, which is why the query above needs its two FILTER clauses. By inverting those two filters, you can generate a list of airports with invalid ICAO and/or IATA codes in their Wikipedia entries. You can then take that list back to Wikipedia and make the necessary updates (or verify that updates have been made since March 2010, when the current DBpedia dataset was extracted from Wikipedia).

It’s my understanding that DBpedia will at some point in the future move to a live-update model, where updates to Wikipedia will be funneled through the extraction process and into DBpedia on a real-time basis.
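As a rough sketch of the filter inversion mentioned above, the query below returns airports where at least one of the two codes fails its pattern. It reuses the property names and regular expressions from the original query; the combined FILTER with `||` is my own guess at what "inverting" should look like, and I haven't checked the result against the current dataset.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?airport ?icao ?iata
WHERE {
    ?airport rdf:type dbo:Airport .
    ?airport dbo:icaoLocationIdentifier ?icao .
    ?airport dbo:iataLocationIdentifier ?iata .
    # Inverted filters: keep a row when either code is malformed.
    FILTER ( !regex(?icao, "^[A-Z0-9]{4}$") || !regex(?iata, "^[A-Z0-9]{3}$") )
}
```

Note that this only catches airports that have both properties present; an airport that is missing one of the codes altogether won’t show up, so the list is a starting point for cleanup rather than a complete audit.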

3 thoughts on “Cheap airport data”

  1. Thanks for this post! We very much encourage this “correct and improve Wikipedia content” approach to concerns with the data found in DBpedia.

    DBpedia will indeed be moving to take (roughly — there is some latency even when working against the firehose Wikipedia update stream) real-time updates from Wikipedia. You can see this in action on the staging version, at http://dbpedia-live.openlinksw.com/sparql. (Note that the staging version is running on a lower-powered configuration than the main DBpedia, so please be considerate in your queries, and realize that performance will improve when this goes fully live.)
