Wednesday, October 5, 2011

Hot Spot Mapping in R: Illustrating Relative Seasonal Risk

In recent months, IDV has taken steps to incorporate the powerful statistical engine, R, as a viable connection to Visual Fusion.  R has a robust and growing set of libraries and a community that is constantly thumping away on improvements.  Notably, the spatial statistics enabled by many of these libraries are legion.

Below is a set of maps that use the GAM method to identify geographic clusters and map out seasonal hot spots of traffic fatalities in the Great Lakes region.  Each season's incidents were treated as the test population and compared against the full year-round set of alcohol-related traffic accidents.  In this way, time is the variable of interest.

These R plots are not heatmaps of those traffic fatalities within various seasons, but rather heatmaps of areas where traffic fatalities are disproportionately high compared to the rest of the year...

A chunky animation of four seasonal hot-spot maps, each isolating alcohol-related incident spikes within a specific seasonal slice in relation to the whole year.  Images like this don't necessarily tell us why events unfold as they do, but they allow us to home in on more specific questions.

Western Wisconsin and Metro Chicago are disproportionately prone to alcohol-related traffic fatalities in springtime.

Areas with disproportionately high summertime risk include Northeastern Minnesota; Eau Claire, Wisconsin; and Highway 24 east of Fort Wayne.

Illinois's Interstate 57; Akron, OH; Harrisburg, PA; and Minnesota's Chippewa National Forest show autumn hot spots.

Contrary to my expectations, more northern areas are conspicuously light on a specifically-wintertime threat.  The Catskills of Eastern New York, the St. Louis area of Western Illinois, and, particularly, Columbus, OH, have an elevated wintertime risk.
Thanks to Ashton Shortridge and Brian Eustice for R support!


Why is this flavor of hot spot mapping more meaningful than a simple heatmap?  As is the case with most social science data, the subtle trends of the phenomena of interest are easily steamrolled by the inherent overall trends of the population.  For example, if the maps above were simple heatmaps of traffic incidents that occurred at various times of the year, they would all look nearly identical and would essentially be heatmaps of population.  When the data is normalized by the underlying trend (in this case, seasonal incidents were normalized by year-round incidents), the actual anomalies reveal themselves.
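To make the normalization idea concrete, here is a minimal sketch in Python.  It is not the R code or the GAM method behind the maps above; it swaps in a much cruder grid-binning approach, and the function name, the cell size, the minimum-count cutoff, and the sample points are all made up for illustration.  For each grid cell it divides the cell's share of one season's incidents by its share of year-round incidents, so a value above 1 marks a cell that is busier than usual in that season.

```python
from collections import Counter


def relative_risk_by_cell(seasonal_points, all_points, cell_size=1.0, min_count=5):
    """Ratio of each cell's share of seasonal incidents to its share of all incidents.

    Hypothetical helper: points are (x, y) pairs, and cells with fewer than
    min_count year-round incidents are skipped because their ratios are noisy.
    """
    if not seasonal_points or not all_points:
        raise ValueError("both point sets must be non-empty")

    def cell(point):
        x, y = point
        return (int(x // cell_size), int(y // cell_size))

    seasonal = Counter(cell(p) for p in seasonal_points)
    overall = Counter(cell(p) for p in all_points)

    risk = {}
    for c, total in overall.items():
        if total < min_count:
            continue
        seasonal_share = seasonal.get(c, 0) / len(seasonal_points)
        overall_share = total / len(all_points)
        risk[c] = seasonal_share / overall_share
    return risk


# Made-up data: two cells, 100 year-round incidents, 35 of them in winter.
all_points = [(0.5, 0.5)] * 60 + [(1.5, 0.5)] * 40
winter_points = [(0.5, 0.5)] * 15 + [(1.5, 0.5)] * 20

risk = relative_risk_by_cell(winter_points, all_points)
for c in sorted(risk):
    print(c, round(risk[c], 2))
# (0, 0) 0.71
# (1, 0) 1.43
```

In this toy data the first cell has more incidents overall, but the second cell is the winter hot spot, because it holds a larger share of winter incidents than of year-round incidents.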

This sort of normalized hot-spot mapping (higher-than-usual-edness) is embraced by epidemiologists and oncologists, and it could greatly benefit the retail and security industries as well.

Epidemiology
Consider a team tracking troublesome areas for a spreading flu.  If they generate a simple heatmap or frequency map, they are really only looking at an approximate map of population.  But if they statistically remove the underlying bias by using the overall population at risk as a denominator, then the resulting ratio will show them where incidents of the flu are proportionally high.

Oncology
Likewise, a cancer researcher may look at a map of prostate cancer cases and essentially be looking at a map of where old men live.  If one were to use an actual map of old men as a normalizer, then the result might be an actual map of where rates of prostate cancer are proportionately high.  When actual cases are normalized by the specific population at risk (or a reasonable proxy), then you can home in on a very specific trend.  Hot-spot analysis illuminates anomalies, which illustrate where potential causes might be lurking (remember, correlation doesn't indicate cause!), or at least where resources may be most effectively and efficiently targeted.

Crime & Security, Retail, Logistics, Geology...
These uses can easily be extrapolated to specific incidents of crime, or consumer behavior, or just-in-time logistics, or any subtle phenomenon whose distinct patterns might be clouded by the overall population mass.
Consider a map of crime incidents.  A raw map may look almost identical to a population map, because crime happens where people are.  But if you were to normalize by population, the result may be an informative indication of where crime rates are disproportionate to the population.
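As a rough illustration of the crime example, the Python sketch below ranks three areas first by raw incident count and then by incidents per 100,000 residents.  The area names, counts, and populations are invented, and the helper name is hypothetical; the only point is that the two rankings can disagree.

```python
def rates_per_100k(incidents, population):
    """Incidents per 100,000 residents for every area with a known, non-zero population."""
    return {
        area: 100_000 * count / population[area]
        for area, count in incidents.items()
        if population.get(area, 0) > 0
    }


# Invented numbers, for illustration only.
incidents = {"Downtown": 900, "Suburb": 300, "Small town": 60}
population = {"Downtown": 600_000, "Suburb": 300_000, "Small town": 20_000}

rates = rates_per_100k(incidents, population)

by_count = sorted(incidents, key=incidents.get, reverse=True)
by_rate = sorted(rates, key=rates.get, reverse=True)

print(by_count)  # ['Downtown', 'Suburb', 'Small town']
print(by_rate)   # ['Small town', 'Downtown', 'Suburb']
```

The raw counts simply follow population, while the rates push the small town, which would barely register on a plain heatmap, to the top of the list.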
Hot-spot mapping drives clarity.

1 comment:

  1. Hi. These are great. Where can we find the R code, please?

    Thanks!
    Erin
