Announcing aerodb

I have, on several occasions, written about using data from the DBpedia project to produce a freely available database of aerodromes, aerodrome identifiers, and locations. I previously presented a SPARQL query which can be used to perform the necessary extraction from the DBpedia SPARQL endpoint. Now I am releasing aerodb, a Python project which wraps that SPARQL query in a command-line tool and provides other utilities for working with the data. The raw data extracted from Wikipedia is noisy: some location identifiers are used in more than one article, resulting in duplicates. aerodb includes a file, produced by manually inspecting the linked Wikipedia articles, which is used to de-duplicate these entries. The README file for the project contains more information (including how Wikipedians can help), so I won’t repeat all that here.
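To give a rough idea of what this kind of de-duplication involves, here is a minimal sketch. It is not aerodb's actual code or file format: the record keys ("ident", "article"), the shape of the `preferred` mapping, the `dedupe` function, and the sample values are all made up for illustration. The idea is simply to group records by identifier and, where an identifier appears more than once, keep the article that a hand-maintained mapping says is the right one.

```python
from collections import defaultdict


def dedupe(records, preferred):
    """Keep one record per identifier.

    records   -- iterable of dicts with hypothetical keys "ident" and "article"
    preferred -- hypothetical dict mapping a duplicated identifier to the
                 article that should be kept for it
    Returns (kept, unresolved): kept is a list of records; unresolved maps
    each still-ambiguous identifier to its candidate records.
    """
    by_ident = defaultdict(list)
    for rec in records:
        by_ident[rec["ident"]].append(rec)

    kept, unresolved = [], {}
    for ident, group in by_ident.items():
        if len(group) == 1:
            # Identifier used by a single article: nothing to resolve.
            kept.append(group[0])
            continue
        # Identifier used by several articles: consult the manual mapping.
        matches = [r for r in group if r["article"] == preferred.get(ident)]
        if len(matches) == 1:
            kept.append(matches[0])
        else:
            # No manual decision recorded yet; flag it for inspection.
            unresolved[ident] = group
    return kept, unresolved


# Made-up sample data, purely illustrative.
sample = [
    {"ident": "AAAA", "article": "Example_Aerodrome_One"},
    {"ident": "BBBB", "article": "Example_Aerodrome_Two"},
    {"ident": "BBBB", "article": "Example_Heliport_Two"},
    {"ident": "CCCC", "article": "Example_Aerodrome_Three"},
    {"ident": "CCCC", "article": "Example_Airfield_Three"},
]
preferred = {"BBBB": "Example_Aerodrome_Two"}

kept, unresolved = dedupe(sample, preferred)
print([r["article"] for r in kept])  # ['Example_Aerodrome_One', 'Example_Aerodrome_Two']
print(sorted(unresolved))            # ['CCCC']
```

Anything left in `unresolved` is an identifier that still needs a human to look at the linked articles, which is the manual step described above.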

The final result contains 8,522 aerodromes (based on the data presently available from DBpedia, which will change over time). You can get the results as a JSON, CSV, or KML file from the GitHub downloads page. (The KML file is rather fun to look at in Google Earth; it gets a bit sluggish and has some rendering issues, but works well otherwise.)

(Note: I use the term ‘aerodrome’ where others might use the more common ‘airport’ because some countries, including Canada, draw a legal distinction between the two, and ‘aerodrome’ is the more widely applicable term. That is to say, aerodb produces a database which contains aerodromes which are, by the Canadian definition, not airports.)