Monday, July 29, 2013

A Breathing Earth

Here's a view looking at one year of seasonal transformations on Earth.  Made possible by the tremendous folks of the NASA Visible Earth team, I downloaded the twelve cloud-free satellite imagery mosaics of Earth ("Blue Marble Next Generation") at each month of the year.  I wrapped them into some fun projections then stitched them together into a couple animated gifs...

Click here to see the large version (1.4 MB).
Click here to see the bonkers version (3.7 MB).
Don't click this small version or this tiny version.

Click here to see the large version (3 MB).
Click here to see the irrationally large version (8.9 MB).
Don't waste your time with this boring small version or this lame tiny version.

I of course had some expectation of what I would see as a result of animating these frames.  But I didn't expect to be so mesmerized by them.  I can't look away.

Having spent much of my life living near the center of that mitten-shaped peninsula in North America, I have had a consistent seasonal metronome through which I track the years of my life.  When I stitch together what can be an impersonal snapshot of an entire planet, all of a sudden I see a thing with a heartbeat. I can track one location throughout a year to compare the annual push and pull of snow and plant life there, while in my periphery I see the oscillating wave of life advancing and retreating, advancing and retreating.  And I'm reassured by it.

Of course there are the global characteristics of climate and the tendency of land to heat and cool more rapidly than water.  The effects of warm currents feeding a surprisingly mild climate in the British Isles.  The snowy head start of winter in high elevations like the Himalayas, Rockies, and Caucasus, which spreads downward to join the later snowiness of lower elevations.  The continental wave of growing grasses in African plains.

But, overall, to me it looks like breathing. And my pixel is right at an interesting intersection of life and ice, where the longest night of the year feels like forever, and the longest day of the year is like a battery strapped to my back. My winter was especially dark. And my summer has been full of blessings -but I don't think either extreme would have been as memorable without the helpful (or painful) contrast of its opposite -all made possible by a 23.5° tilt.

Thursday, July 11, 2013

Language and Color

In a previous project I wondered if maybe a whipped up mix of the images that Google searches show me could be used as a cheap first-pass for testing color theory. Google search results are sort of a zeitgeist for any given term -and an image search is a portrait of that.

Because Google has language/culture variants for all over the world, I thought it would be a fun project to see how the zeitgeist-y image portraits might compare to each other across different cultures. This notion was in no small way inspired by the terrific and more rigorous work of David McCandless.

The result is an array of hues tracking various concepts (design, art, music, math, science, and philosophy) through five different languages.  You can pick a language and track across the six terms, or you can pick a term and compare its coloration down the five languages.  Or whatever.

It's worth pointing out that almost all of these quilts ("quilts" are the snapshotted images of a Google Image search results page, coined by ET) have a multi-modal distribution of color.  That is, they may have a big bump around orange (due to the frequency of humans as the subject of so many images) and a big bump around blue -and the aggregation of those colors could result in green, which is sort-of nonsense.  So, temper your interpretation of the fully aggregated color palettes with the little color histograms to their right.
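Here's a minimal sketch of why that happens, assuming the aggregation is a plain average of hue angles (that is my assumption for illustration; the exact averaging used for the graphic isn't spelled out here). The pixel values are made up: half orange-ish, half blue-ish.

```python
import colorsys

def naive_mean_hue(hues_deg):
    """Plain arithmetic mean of hue angles in degrees (the risky aggregation)."""
    return sum(hues_deg) / len(hues_deg)

def hue_to_rgb(hue_deg):
    """Fully saturated, full-brightness RGB (0-255) for a hue angle."""
    r, g, b = colorsys.hsv_to_rgb(hue_deg / 360.0, 1.0, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))

def hue_histogram(hues_deg, bin_width=30):
    """Count hues per bin so the separate bumps stay visible."""
    counts = {}
    for h in hues_deg:
        start = int(h % 360 // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return counts

# Hypothetical quilt: half the pixels orange-ish (30 deg), half blue-ish (240 deg).
pixel_hues = [30.0] * 50 + [240.0] * 50
mean_hue = naive_mean_hue(pixel_hues)   # 135.0, squarely in the greens
print(mean_hue, hue_to_rgb(mean_hue))
print(hue_histogram(pixel_hues))        # two bumps, nothing near 135
```

The average lands on a hue that no pixel actually has, while the histogram keeps both bumps in view.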

Maybe a more meaningful illustration of cultural associations of colors to terms would be to aggregate the average of each term's colors across all countries then calculate where along the spectrum specific cultures tend to deviate from that "global" average. Yeah, that would rule and somebody else should totally do that.
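For whoever picks that up, a rough sketch of the idea, assuming each language/term pair has already been boiled down to a single hue angle. The function names and the sample hues are hypothetical, and I've used a circular mean so that hues on either side of red (350° and 10°) average to red rather than cyan.

```python
import math

def circular_mean(hues_deg):
    """Mean direction of hue angles, in degrees within [0, 360)."""
    s = sum(math.sin(math.radians(h)) for h in hues_deg)
    c = sum(math.cos(math.radians(h)) for h in hues_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def signed_hue_diff(hue_deg, ref_deg):
    """Smallest signed angle from ref to hue, in [-180, 180)."""
    return (hue_deg - ref_deg + 180.0) % 360.0 - 180.0

def deviations(hue_by_language):
    """How far each language sits from the 'global' hue for one term."""
    global_hue = circular_mean(list(hue_by_language.values()))
    return {lang: signed_hue_diff(h, global_hue)
            for lang, h in hue_by_language.items()}

# Made-up hues for a single term, NOT measured from the graphic.
sample = {"Chinese": 150.0, "Hindi": 30.0, "Arabic": 35.0,
          "Russian": 40.0, "English": 45.0}
for lang, delta in deviations(sample).items():
    print(f"{lang}: {delta:+.1f} degrees from the global hue")
```

Positive values drift toward the greens and blues, negative values toward the reds. The multi-modal caveat still applies to whatever single hue you feed in.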

Anyway, have fun looking at this and thanks for stopping by.  Below are various elements of this graphic if you'd rather look at it that way, with some wildly anecdotal commentary...

Warm red-shift tones abound. Fleshy portraits were common in the Chinese quilt; Hindi favored statuary or groups of people; Arabic images favored sculptures -frequently scenes of carved sand; Russian and English images were commonly bright abstract paintings.

Only the Chinese quilt shows a big departure from the crowd. Orange tones so conspicuous in just about all other tiles are way reduced for the Chinese term for design -frequently comprised of clean digital layouts (this could be an artifact of the translated term). Hindi is predominantly portraits.  Arabic is predominantly warm landscapes, often with poetry or word art. The Russian quilt is almost entirely interior design images (certainly a less-than-perfect translation choice on my part).  English is predominantly light and colorful desktop background image stock.

Another departure for the Chinese language quilt. The much greater proportion of blue in the images has pushed the overall average over to the cooler green.

All languages were fond of tagging chalkboard pictures but English image-taggers especially so.

Science scored the biggest counts for cool blue colors because of the common presence of dark, blue illustrations of microscopic or cosmic scenes.  Except for the Hindi and Arabic languages.  Hindi images tended to show teams of scientists at work, explaining the regression to flesh tones. The Arabic images tended to be more frequently clipart illustrations. Maybe because of the variability in the translated terms?

Overwhelmingly earthen in tones.  Chinese and Arabic images tended to be snapshots of texts on aged paper, Hindi images were almost entirely portraits of people or deities, Russian images tended to be diagrams and illustrations (in earth tones), while English images were full of statues.

Tuesday, July 2, 2013

A Hack for Scaling Your Hex Map

The gist: do your aggregating in the regular old hex polygons, then do yourself a favor and convert them to point symbols (that look like hexagons).

Some time back I made a map showing the geographic frequency of traffic fatalities and their proportion of alcohol involvement, over a ten-year period. It was mapped within a hexagonal mesh where the sizes of the hexagon shapes corresponded to the overall count of fatal crashes at that location and the colors corresponded to their rate of alcohol involvement. Anybody who has made a scaled hex map knows it's not entirely straightforward to make polygons grow or shrink based on data.  So I cheated and didn't scale areas at all -the trick is to use point symbols in the end-stage of the visualization.  Skip to the Here's How section for a nerdy walk-through of the process.

The Problem
FARS traffic fatality data is invaluable, both as an example of the benefits of open data and as a tool for preventing the phenomena it describes through greater understanding.  But for cartographers, visualizing this data effectively can be a challenge because of the heavily-overlapping nature of crashes.  So many fatalities get covered up by other nearby fatalities that places with a truly high frequency were not standing out as they should -a shame especially considering the gravity of each incident.  On the other hand, counting up the incidents into political zones like counties wasn't ideal because of the wildly inconsistent sizes of counties across the country -playing havoc with the visual weight of the visualization.  Also counties are an awfully arbitrary unit to roll this data into (most things don't care what state or county they happened in), coupled with an inconsistent loss of precision.  Click here to read more about why I increasingly hate county choropleths.

Bi-variate or Bust
But my greatest challenge was showing two key bits of information at once: the overall frequency of traffic fatality events AND the local rate of incidents that involved alcohol.  Either one on its own would be a really flimsy map (mapping only frequency would subject me to the now-overused XKCD critique and mapping only drunk rates would unfairly inflate the importance of places having only a few incidents).
If, in your map, you don't need the size of the hexagons to vary at any one time, then you can stop here and read Kenneth Field's excellent ArcMap tutorial.  He's also teaching a session on how to make multivariate hex bin maps which you may want to check out.  Also, find lots of theory and examples at Indiemaps' blog including a programmatic (putting it out of my reach) approach to scaling hex shapes.

Endlessly fascinated by the innovative work of Kirk Goldsberry, whose scaled hex-meshes of basketball performance are blowing minds in the sport analytics world (and mainstream news outlets), I wondered how a hexagonal mesh, like the one he uses to segment the court, could be used to help me sort out the visualization problem that I was having.
A fantastic aspect of a hexagonal mesh is that you can benefit from the strengths of mapping with areas (aggregation for comparison in this case) without the burden of the inconsistent and almost-arbitrary shapes and sizes of political boundaries.  Hexagonal meshes are politically agnostic and spatially regular -and I think cool looking (in 2012, anyway).

Here's How
  • Download the source data. I downloaded the "accident.dbf" data table from each year of the FARS database, going back ten years, and lumped them all together into a single ginormous CSV file. I did that by pasting each year's rows into an Excel table then saving it as a CSV (alternatively, you could jump to the next step, creating a geographic point file for each year then merge those layers together).
  • Convert accidents to geographic points.  In QGIS (Quantum GIS, an open-source GIS tool that integrates with GRASS) I added the accidents CSV file via the "Add Delimited Text Layer" dialog (the icon looks like a blue page with commas at the bottom) because the data has a column for latitude and a column for longitude.  Also, remember to use a projection that preserves relative area so your zones are consistent in size (remember, this is pretty much the whole point) before you move on to the next step.  I used Albers Equal Area.
  • Create a hexagonal mesh.  Also in QGIS there is a feature that lets you generate a grid layer (Plugins > mmqgis > Create > Create Grid Layer). I chose a hexagonal grid but there are other options like rectangles or diamonds (maybe diamonds will be the new hexagons in 2013?). I made my hexagon zones about 10 kilometers wide because it seemed like a nice aggregation size for this data -big enough to contain several incidents, generally, and small enough to show a little nuance around metropolitan areas.

    I thought a hexagon with a 10 kilometer diameter was a nice fit between problematic precision and over-generalization.

  • Count up incidents within your new hexagonal areas. There is a function in QGIS where you can count up the points from one layer that fall within the polygons of another layer (Vector > Analysis Tools > Points in Polygon). I counted up all incidents and then only the incidents that involved alcohol. In the attributes table I created a new column and calculated the ratio of alcohol incidents to total incidents. This ratio is the number that I will use to color code the hexes. The total incidents count is what I'll use to determine size, a little later on.

    Count up incidents within the hex zones.
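If you'd rather skip the clicking, the mesh-and-count steps above can be roughed out in plain Python. This is only a sketch under assumptions: it swaps the QGIS grid and Points in Polygon tools for direct arithmetic binning, it assumes the crash coordinates are already projected to meters in an equal-area projection, and it assumes pointy-top hexagons 10 km wide (the orientation of the mmqgis grid may differ). The function names and sample crashes are hypothetical.

```python
import math
from collections import defaultdict

def hex_cell(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon containing (x, y).
    `size` is the center-to-corner distance, in the same units as x and y."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so they still sum to zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

def hex_center(q, r, size):
    """Center point of hexagon (q, r)."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r

def bin_crashes(crashes, width_m=10_000):
    """crashes: iterable of (x_m, y_m, alcohol_involved) tuples."""
    size = width_m / math.sqrt(3)   # flat-to-flat width -> center-to-corner
    totals = defaultdict(int)
    alcohol = defaultdict(int)
    for x, y, involved in crashes:
        cell = hex_cell(x, y, size)
        totals[cell] += 1
        if involved:
            alcohol[cell] += 1
    return {cell: {"center": hex_center(cell[0], cell[1], size),
                   "total": n,
                   "alcohol_rate": alcohol[cell] / n}
            for cell, n in totals.items()}

# Made-up crashes: two near the origin, one about 10 km to the east.
sample = [(0, 0, True), (100, 100, False), (10_000, 0, True)]
for cell, stats in bin_crashes(sample).items():
    print(cell, stats)
```

Each cell comes out with its center point, total count (for size), and alcohol rate (for color), which are the same two numbers the attribute table holds after the steps above.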
Now the sneaky, counterintuitive, bit...
  • Convert your hexagon areas into points. Because scaling polygons is a real hassle in mapping, and because your areas are perfectly regularly spaced, you can convert them to a points layer and have a way easier time manipulating their appearance. I had pretty much given up on this project until this idea came to me late one night.  'That's how Goldsberry does it!' I shouted to no one. Anyway, convert your polygon file to a point file, by centroid.  In QGIS you can use this tool: Vector > Geometry Tools > Polygon Centroids.

Those hexagonal polygons have served their aggregation purpose.  Now turn them into points (with hexagon shaped symbols)!

  • Color-code by alcohol rate. Applying colorized range breaks to symbols is a pretty straightforward job in any GIS package.   At this point I moved over to ArcMap because I have to export the results as an image at the end and I've found no better tool for that.  I used a hexagon-shaped point symbol and scaled them so that they appear contiguous.  The coloration corresponds to the proportion of alcohol-related incidents that I'd calculated.  But the problem with leaving the map like this is that areas with overall low numbers of incidents look just as important as areas with lots of incidents -I really should scale them by overall count so that they are given a meaningful visual weight.

    Points that look like polygons (colored by alcohol rate).  The lengths I go to because I can't code.  But the equal size (and visual weight) of all symbols is problematic and misleading.

  • Separate scale ranges into individual layers. The easiest, most flexible, way I've found to control both symbol size AND symbol color (two simultaneous visual dimensions) is to separate the layer into several different tiered layers corresponding to one dimension (total count) where I control size, and each of those layers has the same color-break rules corresponding to the other dimension (alcohol rate).  This post describes programmatic ways of doing multivariate hexbins or using an advanced "size" feature in ArcMap but I wanted more flexibility.

    Separate your point layers into individual tiers for flexible scaling.

  • Add really basic reference layers for context.  The whole point of this is to be able to spot neighborhood-sized areas to see how they fare from the perspective of the frequency of traffic fatalities and their proportion of alcohol involvement.  So strike a balance with your context: don't strand your hexes in the middle of nowhere but don't lob them onto a busy basemap with irrelevant detail.  I found that a macroscopic reference layer of states gives a good at-a-glance indication of what is where, and a spartan major-road network takes readers the rest of the way.  It can be tempting to add in more but maps very quickly approach the domain of diminishing (or damaging) marginal value with additional layers, where the meaning of the message is diluted or distracted by crap tons of unnecessary stuff, thoroughly illustrated by Brian Timoney.  I exported the hexes, states, and roads as individual images.  ArcMap's map export options allow for seriously high resolution and transparency -a must for stacking layers externally. I merged them (and added a legend) in the Gimp.
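For the code-inclined, the centroid, color, and tier steps above boil down to something like the sketch below. Every break value, symbol size, and color here is a hypothetical placeholder rather than what I used on the map, and the centroid shortcut (averaging the vertices) is only valid because the hexagons are regular.

```python
from bisect import bisect_right

# Hypothetical class breaks and symbols, for illustration only.
COUNT_BREAKS = [5, 20, 50]                  # total crashes per hexagon
SYMBOL_SIZES = [4, 8, 12, 16]               # one point size per count tier
RATE_BREAKS = [0.2, 0.4, 0.6]               # share of crashes involving alcohol
COLORS = ["#ffffb2", "#fecc5c", "#fd8d3c", "#e31a1c"]

def centroid(vertices):
    """Mean of the vertices; equals the true centroid for a regular hexagon."""
    xs, ys = zip(*vertices)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def to_symbol(vertices, total, alcohol_rate):
    """Turn one aggregated hexagon polygon into a sized, colored point."""
    return {
        "xy": centroid(vertices),
        "size": SYMBOL_SIZES[bisect_right(COUNT_BREAKS, total)],
        "color": COLORS[bisect_right(RATE_BREAKS, alcohol_rate)],
    }

def split_into_tiers(symbols):
    """Group points by size so each tier can be styled as its own layer."""
    tiers = {}
    for symbol in symbols:
        tiers.setdefault(symbol["size"], []).append(symbol)
    return tiers

# A square stands in for a hexagon here just to keep the numbers readable.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
points = [to_symbol(square, 3, 0.5), to_symbol(square, 60, 0.1)]
print(split_into_tiers(points))
```

Keeping size and color as two independent lookups is the same idea as the tiered layers: one dimension picks the layer, the other picks the color within it.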
Anyway, try the poly to points trick if scaling your grid zones is a hassle.  Or, if you are really adventurous, try binning your lat/long data right in Excel in about 30 seconds. Happy lumping!