Tuesday, December 11, 2012

Five Years of Traffic Fatalities

Every year the National Highway Traffic Safety Administration releases a massive data set covering every recorded vehicle crash that involved a fatality. The agency publishes it so that groups can perform their own analysis, with the goal of increasing awareness and ultimately decreasing fatalities.
Nathan Yau did a fascinating segmentation of an entire year of this data that illustrates some insightful trends. Inspired by Nathan's work, I wondered whether more or different patterns of risk might repeat themselves over a longer period. I downloaded five years of data to get a set big enough to show which times of the day or night are especially prone to fatal traffic incidents, from both a day-of-the-week and a time-of-year perspective.
Finally, I did some basic segmentation mapping to get a rough geographic sense of where different sorts of accidents were more or less likely to occur (take these with a grain of salt).

Structure abounds in traffic fatality data.

Incidentally, every one of these charts is a simple Excel pivot table, including the so-called maps. More on fake mapping here. Click on the image to get a readable version.
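For readers who would rather script it than build the pivots in Excel, here is a minimal sketch of the same two ideas: summing fatalities by day of week and hour, and rounding latitude and longitude into coarse cells to fake a map grid. The field names (`day_week`, `hour`, `fatals`, `lat`, `lon`), the sample rows, and the one-degree cell size are all assumptions for illustration; they are not taken from the actual workbook.

```python
from collections import Counter

# Hypothetical sample rows. Field names are assumptions loosely modeled on
# the FARS "Accident" files, not copied from the original workbook.
crashes = [
    {"day_week": 1, "hour": 2, "fatals": 1, "lat": 40.7128, "lon": -74.0060},
    {"day_week": 1, "hour": 2, "fatals": 2, "lat": 40.7306, "lon": -73.9352},
    {"day_week": 6, "hour": 17, "fatals": 1, "lat": 34.0522, "lon": -118.2437},
    {"day_week": 7, "hour": 23, "fatals": 1, "lat": 34.4, "lon": -118.1},
]


def pivot_by_day_and_hour(rows):
    """Sum fatalities per (day of week, hour) cell, like a pivot table."""
    table = Counter()
    for row in rows:
        table[(row["day_week"], row["hour"])] += row["fatals"]
    return table


def pivot_by_grid_cell(rows, step=1.0):
    """Sum fatalities per rounded (lat, lon) cell to form a rough map grid.

    `step` is the cell size in degrees (an assumed value, not the author's).
    """
    table = Counter()
    for row in rows:
        cell = (
            round(row["lat"] / step) * step,
            round(row["lon"] / step) * step,
        )
        table[cell] += row["fatals"]
    return table


by_time = pivot_by_day_and_hour(crashes)
by_cell = pivot_by_grid_cell(crashes)
```

Laying out `by_time` with days as rows and hours as columns gives a time-of-day heat table, and laying out `by_cell` with latitude as rows and longitude as columns gives the crude map.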

5 comments:

  1. Could you share the Excel file with us? I can't seem to download the data, so it is impossible for me to recreate your output.
    Hope you can help me out.

    Replies
    1. Yes, here is the file I used to create the pivots. You will see some added fields that give simple yes/no statuses for various factors. You'll also see the lat/long rounding columns I created for the map hack. All the pivots are there in various sheets.
      file:
      http://dl.dropbox.com/u/17180596/TimeOfDayPivots.xlsx

      sourced from aggregated "Accident" files per year here:
      ftp://ftp.nhtsa.dot.gov/fars/

      Caveat on the alcohol element:
      For the "Drunk_Dr" flag, the 2006 and 2007 data consider anybody involved in the fatal crash, not just the driver. The 2008, 2009, and 2010 data consider only the driver. A BAC of 0.01 or more counts as alcohol involvement in the data. All conditions of alcohol involvement were considered for these graphics.

  2. I also couldn't download the data from NHTSA.
