As a fun data visualisation experiment, and as a way to practise my newfound interest in Node.js, I decided to plot everyone who contributed to the recent release of WordPress 3.6 on a map of the world. The map can be seen further down, but first, a brief description of how I went about it.
I decided to generate a GeoJSON file of the contributors’ locations so that it can be displayed wherever and however the open GeoJSON format is supported, not least on GitHub, which recently added support for automatic rendering of GeoJSON files.
To get the location data for each contributor I needed to scrape it from wordpress.org. In my simple Node script I started by using Node’s HTTP interface to fetch the 3.6 announcement blog post. I then loaded the content of the post into Cheerio, which provides a wonderful implementation of jQuery’s core for working with the DOM, and used it to find the list of links in the blog post to wordpress.org profile pages.
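The link-finding step can be sketched roughly as follows. This is not the original script: to stay dependency-free it uses a regular expression where the script used Cheerio, and both the `extractProfileUrls` name and the `profiles.wordpress.org` URL pattern are assumptions made for illustration.

```javascript
// Hypothetical helper: collect the unique profile links found in the post's HTML.
// Assumes profile links look like http(s)://profiles.wordpress.org/<username>.
function extractProfileUrls(html) {
  const pattern = /href="(https?:\/\/profiles\.wordpress\.org\/[^"\/]+)\/?"/g;
  const urls = new Set();
  let match;
  while ((match = pattern.exec(html)) !== null) {
    urls.add(match[1]); // the Set drops contributors who are linked twice
  }
  return Array.from(urls);
}

// Made-up sample markup standing in for the announcement post.
const sampleHtml =
  '<p><a href="http://profiles.wordpress.org/alice">Alice</a>, ' +
  '<a href="http://profiles.wordpress.org/bob/">Bob</a>, ' +
  '<a href="http://example.com/">elsewhere</a></p>';

console.log(extractProfileUrls(sampleHtml));
```

With a DOM library such as Cheerio the same idea becomes a selector query over the anchors instead of a regex, which is generally more robust against variations in the markup.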
The next step was to loop over each profile URL and fetch the page in order to pluck out the location data from the map shown for each user. With 225 contributors, it was clear that fetching the pages one after another was going to be slow. Luckily the async module provides a super simple method for asynchronous iteration, so I used that to fetch the profile pages.
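To illustrate the idea of fetching many pages without waiting for each one in turn, here is a minimal sketch using plain Promises rather than the async module. The `mapWithLimit` and `fakeFetchProfile` names are hypothetical, the concurrency cap is an addition of this sketch, and the fetch is simulated with a timer instead of a real HTTP request.

```javascript
// Hypothetical helper: run `worker` over `items`, at most `limit` at a time,
// and resolve with the results in the original order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Stand-in for fetching a profile page; a real script would make an HTTP request here.
function fakeFetchProfile(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`<html>profile for ${url}</html>`), 10);
  });
}

mapWithLimit(['alice', 'bob', 'carol'], 2, fakeFetchProfile).then((pages) => {
  console.log(`fetched ${pages.length} pages`);
});
```

Capping the number of requests in flight keeps the run fast while avoiding hundreds of simultaneous connections to the same server.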
I then plucked out the user’s location from the markup using a straightforward regex, and sent it off to Google’s Geocoding API to get the geographic coordinates I’d need. Out of the 225 contributors, 61 didn’t have their location displayed on their profile page.
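A rough sketch of this step might look like the following. The profile markup shown here (an element with `id="user-location"`) is purely hypothetical, as are the function names and the `YOUR_KEY` placeholder; the request URL and the `results[0].geometry.location` shape follow the Geocoding API's JSON format.

```javascript
// Hypothetical markup: assumes the location sits in an element with
// id="user-location"; the real profile markup may well differ.
function extractLocation(html) {
  const match = /id="user-location"[^>]*>([^<]+)</.exec(html);
  return match ? match[1].trim() : null; // null when no location is shown
}

// Build a request URL for the Google Geocoding API's JSON endpoint.
function geocodeUrl(address, apiKey) {
  const params = new URLSearchParams({ address, key: apiKey });
  return `https://maps.googleapis.com/maps/api/geocode/json?${params}`;
}

// Pull coordinates out of a Geocoding API response that has already been
// parsed from JSON. Returns null when the lookup found nothing.
function coordinatesFrom(response) {
  if (response.status !== 'OK' || response.results.length === 0) return null;
  const { lat, lng } = response.results[0].geometry.location;
  return { lat, lng };
}

const location = extractLocation('<li id="user-location">Oslo, Norway</li>');
console.log(location, geocodeUrl(location, 'YOUR_KEY'));
```

Returning `null` from both helpers makes it easy to skip contributors whose profile shows no location or whose location cannot be geocoded.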
The final step was to load all the data into the GeoJSON.js module to produce a valid GeoJSON file, and then pretty-print it using the native JSON.stringify method.
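To show what the resulting file looks like, here is a minimal sketch that builds the GeoJSON structure by hand instead of going through the GeoJSON.js module. The `toFeatureCollection` name and the `name`, `lat` and `lng` fields are assumptions for illustration, and the sample contributor is made up.

```javascript
// Hypothetical helper: turn a list of geocoded contributors into a
// GeoJSON FeatureCollection of Point features.
function toFeatureCollection(contributors) {
  return {
    type: 'FeatureCollection',
    features: contributors.map(({ name, lat, lng }) => ({
      type: 'Feature',
      // GeoJSON positions are [longitude, latitude], in that order.
      geometry: { type: 'Point', coordinates: [lng, lat] },
      properties: { name },
    })),
  };
}

// Made-up sample data standing in for the scraped contributors.
const sample = [{ name: 'alice', lat: 59.91, lng: 10.75 }];

// The third argument to JSON.stringify sets the indentation, which is what
// makes the output pretty-printed.
console.log(JSON.stringify(toFeatureCollection(sample), null, 2));
```

The coordinate order is the detail most worth double-checking: swapping latitude and longitude still produces valid GeoJSON, but the points end up in the wrong place on the map.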
Here’s the end result: