Tuesday, July 2, 2013

A Hack for Scaling Your Hex Map

The gist: do your aggregating in the regular old hex polygons, then do yourself a favor and convert them to point symbols (that look like hexagons).

Some time back I made a map showing the geographic frequency of traffic fatalities and their proportion of alcohol involvement over a ten-year period. It was mapped within a hexagonal mesh where the sizes of the hexagon shapes corresponded to the overall count of fatal crashes at that location and the colors corresponded to their rate of alcohol involvement. Anybody who has made a scaled hex map knows it's not entirely straightforward to make polygons grow or shrink based on data. So I cheated and didn't scale areas at all: the trick is to use point symbols in the end stage of the visualization. Skip to the Here's How section for a nerdy walk-through of the process.

The Problem
FARS traffic fatality data is invaluable, both as an example of the benefits of open data and as a tool for preventing the phenomena it describes through greater understanding. But for cartographers, visualizing this data effectively can be a challenge because of the heavily overlapping nature of crashes. So many fatalities get covered up by other nearby fatalities that places with a truly high frequency were not standing out as they should, a shame especially considering the gravity of each incident. On the other hand, counting up the incidents into political zones like counties wasn't ideal because of the wildly inconsistent sizes of counties across the country, which plays havoc with the visual weight of the visualization. Counties are also an awfully arbitrary unit to roll this data into (most things don't care what state or county they happened in), and because their sizes vary, the precision lost in aggregation varies from place to place too. Click here to read more about why I increasingly hate county choropleths.

Bi-variate or Bust
But my greatest challenge was showing two key bits of information at once: the overall frequency of traffic fatality events AND the local rate of incidents that involved alcohol.  Either one on its own would be a really flimsy map (mapping only frequency would subject me to the now-overused XKCD critique and mapping only drunk rates would unfairly inflate the importance of places having only a few incidents).
If, in your map, you don't need the size of the hexagons to vary at any one time, then you can stop here and read Kenneth Field's excellent ArcMap tutorial.  He's also teaching a session on how to make multivariate hex bin maps which you may want to check out.  Also, find lots of theory and examples at Indiemaps' blog including a programmatic (putting it out of my reach) approach to scaling hex shapes.

Endlessly fascinated by the innovative work of Kirk Goldsberry, whose scaled hex-meshes of basketball performance are blowing minds in the sport analytics world (and mainstream news outlets), I wondered how a hexagonal mesh, like the one he uses to segment the court, could be used to help me sort out the visualization problem that I was having.
A fantastic aspect of a hexagonal mesh is that you can benefit from the strengths of mapping with areas (aggregation for comparison in this case) without the burden of the inconsistent and almost-arbitrary shapes and sizes of political boundaries. Hexagonal meshes are politically agnostic and spatially regular, and I think cool looking (in 2012, anyway).

Here's How
  • Download the source data. I downloaded the "accident.dbf" data table from each year of the FARS database, going back ten years, and lumped them all together into a single ginormous CSV file. I did that by pasting each year's rows into an Excel table then saving it as a CSV (alternatively, you could jump to the next step, creating a geographic point file for each year then merge those layers together).
  • Convert accidents to geographic points. In QGIS (Quantum GIS, an open-source GIS tool that integrates with GRASS) I added the accidents CSV file via the "Add Delimited Text Layer" dialog (the icon looks like a blue page with commas at the bottom), which works because the data has a column for latitude and a column for longitude. Also, before you move on to the next step, remember to use a projection that preserves relative area so your zones are consistent in size (this is pretty much the whole point). I used Albers Equal Area.
  • Create a hexagonal mesh. Also in QGIS there is a feature that lets you generate a grid layer (Plugins > mmqgis > Create > Create Grid Layer). I chose a hexagonal grid but there are other options like rectangles or diamonds (maybe diamonds will be the new hexagons in 2013?). I made my hexagon zones about 10 kilometers wide because it seemed like a nice aggregation size for this data: big enough to contain several incidents, generally, and small enough to show a little nuance around metropolitan areas.

    I thought a hexagon with a 10 kilometer diameter was a nice fit between problematic precision and over-generalization.

  • Count up incidents within your new hexagonal areas. There is a function in QGIS where you can count up the points from one layer that fall within the polygons of another layer (Vector > Analysis Tools > Points in Polygon). I counted up all incidents and then only the incidents that involved alcohol. In the attributes table I created a new column and calculated the ratio of alcohol incidents to total incidents. This ratio is the number that I will use to color code the hexes. The total incidents count is what I'll use to determine size, a little later on.

    Count up incidents within the hex zones.
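For readers who would rather script the counting step than click through it, here is a minimal Python sketch of the same idea: drop each projected crash point into a hexagonal cell, count totals and alcohol-involved crashes per cell, and compute the ratio. It is not what I did in QGIS; it swaps the grid layer and Points in Polygon tool for plain hex-grid arithmetic. The function names, the `(x, y, alcohol_involved)` input shape, and the reading of "10 kilometers wide" as 10,000 meters across the flat sides of a pointy-top hexagon are all assumptions made for illustration. Coordinates are assumed to be in meters in an equal-area projection already.

```python
import math
from collections import defaultdict


def hex_cell(x, y, width):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y).

    `width` is the flat-to-flat width of a hexagon, in the same units as x and y.
    """
    size = width / math.sqrt(3)  # center-to-corner distance
    # Fractional axial coordinates of the point.
    qf = (math.sqrt(3) / 3 * x - y / 3) / size
    rf = (2 / 3 * y) / size
    # Round in cube coordinates, then repair the component with the largest
    # rounding error so that the three components still sum to zero.
    xf, zf = qf, rf
    yf = -xf - zf
    rx, ry, rz = round(xf), round(yf), round(zf)
    dx, dy, dz = abs(rx - xf), abs(ry - yf), abs(rz - zf)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hex_center(q, r, width):
    """Return the (x, y) center of hexagon (q, r): the point its symbol sits on."""
    size = width / math.sqrt(3)
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def bin_crashes(crashes, width=10_000):
    """Aggregate (x, y, alcohol_involved) tuples into one row per occupied hexagon."""
    totals = defaultdict(int)
    alcohol = defaultdict(int)
    for x, y, alcohol_involved in crashes:
        cell = hex_cell(x, y, width)
        totals[cell] += 1
        if alcohol_involved:
            alcohol[cell] += 1
    rows = []
    for cell, total in totals.items():
        cx, cy = hex_center(cell[0], cell[1], width)
        rows.append({
            "x": cx,
            "y": cy,
            "total": total,                  # drives symbol size later
            "alcohol": alcohol[cell],
            "rate": alcohol[cell] / total,   # drives symbol color later
        })
    return rows
```

Each returned row is already a point (the hexagon's center) carrying a total count and an alcohol rate, so a script like this skips the polygon stage entirely and produces the same kind of table the centroid step yields.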
Now the sneaky, counterintuitive bit...
  • Convert your hexagon areas into points. Because scaling polygons is a real hassle in mapping, and because your areas are perfectly regularly spaced, you can convert them to a points layer and have a way easier time manipulating their appearance. I had pretty much given up on this project until this idea came to me late one night.  'That's how Goldsberry does it!' I shouted to no one. Anyway, convert your polygon file to a point file, by centroid.  In QGIS you can use this tool: Vector > Geometry Tools > Polygon Centroids.

Those hexagonal polygons have served their aggregation purpose.  Now turn them into points (with hexagon shaped symbols)!

  • Color-code by alcohol rate. Applying colorized range breaks to symbols is a pretty straightforward job in any GIS package. At this point I moved over to ArcMap because I have to export the results as an image at the end and I've found no better tool for that. I used a hexagon-shaped point symbol and scaled the symbols so that they appear contiguous. The coloration corresponds to the proportion of alcohol-related incidents that I'd calculated. But the problem with leaving the map like this is that areas with overall low numbers of incidents look just as important as areas with lots of incidents. I really should scale them by overall count so that they are given a meaningful visual weight.

    Points that look like polygons (colored by alcohol rate).  The lengths I go to because I can't code.  But the equal size (and visual weight) of all symbols is problematic and misleading.

  • Separate scale ranges into individual layers. The easiest, most flexible, way I've found to control both symbol size AND symbol color (two simultaneous visual dimensions) is to separate the layer into several different tiered layers corresponding to one dimension (total count) where I control size, and each of those layers has the same color-break rules corresponding to the other dimension (alcohol rate).  This post describes programmatic ways of doing multivariate hexbins or using an advanced "size" feature in ArcMap but I wanted more flexibility.

    Separate your point layers into individual tiers for flexible scaling.

  • Add really basic reference layers for context. The whole point of this is to be able to spot neighborhood-sized areas to see how they fare from the perspective of the frequency of traffic fatalities and their proportion of alcohol involvement. So strike a balance with your context: don't strand your hexes in the middle of nowhere but don't lob them onto a busy basemap with irrelevant detail. I found that a macroscopic reference layer of states gives a good at-a-glance indication of what is where, and a spartan major-road network takes readers the rest of the way. It can be tempting to add in more, but maps very quickly approach the domain of diminishing (or damaging) marginal value with additional layers, where the meaning of the message is diluted or distracted by crap tons of unnecessary stuff, as thoroughly illustrated by Brian Timoney. I exported the hexes, states, and roads as individual images. ArcMap's map export options allow for seriously high resolution and transparency, a must for stacking layers externally. I merged them (and added a legend) in GIMP.
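The tiering step above can also be expressed as a short Python sketch: split the binned points into layers by total count, give every layer one symbol width, and apply the same color breaks by alcohol rate inside each layer. The break values, the four-tier setup, and the choice to make symbol area (rather than width) step up evenly between tiers are hypothetical choices for illustration, not the breaks used on the actual map. Rows are assumed to be dictionaries with `total` and `rate` keys.

```python
import bisect
import math

# Hypothetical class breaks; a value equal to a break falls in the higher class.
COUNT_BREAKS = [5, 20, 50]      # total crashes: <5, 5-19, 20-49, 50+
RATE_BREAKS = [0.2, 0.4, 0.6]   # alcohol rate: <0.2, 0.2-<0.4, 0.4-<0.6, 0.6+
FULL_WIDTH = 10_000             # symbol width at which hexagons appear contiguous


def class_index(value, breaks):
    """Return the index of the class that `value` falls into."""
    return bisect.bisect_right(breaks, value)


def symbol_width(tier, n_tiers, full_width=FULL_WIDTH):
    """Width for a size tier, chosen so symbol AREA grows evenly per tier.

    Area scales with width squared, so width scales with the square root
    of the tier's share of the full-size hexagon.
    """
    return full_width * math.sqrt((tier + 1) / n_tiers)


def style_points(rows):
    """Group rows into size tiers; tag each row with a width and a color class."""
    n_tiers = len(COUNT_BREAKS) + 1
    layers = {tier: [] for tier in range(n_tiers)}
    for row in rows:
        tier = class_index(row["total"], COUNT_BREAKS)
        styled = dict(row)
        styled["width"] = symbol_width(tier, n_tiers)
        styled["color_class"] = class_index(row["rate"], RATE_BREAKS)
        layers[tier].append(styled)
    return layers
```

Keeping each tier as its own list mirrors the separate map layers: size is set once per layer, while the color rules are identical across all of them.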
Anyway, try the poly to points trick if scaling your grid zones is a hassle.  Or, if you are really adventurous, try binning your lat/long data right in Excel in about 30 seconds. Happy lumping!

1 comment:

  1. Hi All,
    I got this great email from Stuart Anderson at the Nerdery. He describes a method using inverse buffers to get multi-variate (color AND size) hex binning entirely in QGIS. Here is his email in its entirety:

    I found a way to get multi-variate hex-binning entirely in QGIS. The trick was using a combination of Negative Buffers and then Clipping the results.

    tl;dr here's the image set I made on imgur:

    For the sake of example, let's start with making a bunch of hexagons in mmqgis like you did:

    Great, now we have a bunch of hexagons. For the sake of example, I needed to create some fake data for each of the polygons. I decided to make two new columns: 1 is full of arbitrary data, and the other is a percentage (also fake). You can see how I calculated it here:

    Good. Now we can move along with our example. We want to track both columns as size and color because we want to be cool. We're going to use a qgis plugin called "Buffer By Percentage" (oh and I'm using QGIS 2.0 for this). Run this plugin and save the shape.

    But OH NOES! Our new hex bin has no data! Where is all our fake data? What-ever shall we do?

    Use Vector/Geoprocessing Tools/Clip! The data-layer is your input.

    Great! Now you have a new layer that is the size you want and STILL has all your datas!

    Now style away!

    Again, whole image set here:

    Hope this helps all ya'll out!
    Stuart Anderson
    Software Engineer and a Co-President
    The Nerdery