A geometric pass at delineating the areas within the United States potentially covered by each craigslist site: the United States of Craigslist. Check out poster print options here.
Source Data Download
Here is a link to a skydrive folder with this data as a shapefile (.shp, .dbf, yada yada...) and a (spartanly formatted) kml file: https://skydrive.live.com/#!/?cid=2eb6aaf6c3ac1ebe&sc=documents&uc=2&id=2EB6AAF6C3AC1EBE%212242
With great power comes great responsibility.
Poster Print Available
If you want, you can go here and order a print of this sucker. Or you can call your friend in the geology department with a huge plotter and sneak off a print after hours.
Locality is inherent to the value of craigslist; I go to craigslist.org but I get kicked over to the local instance of craigslist (my IP address sources me to somewhere in the illustrious Lansing, MI). But how does craigslist know where to send me? Some mysterious system of assigning a geocoded IP address to just the right site must be in place...I wonder what that map looks like.
When Ian Clemens proposed the idea, I looked around for an existing map of craigslist sites to areas, or maybe even the lookup that they themselves use. I couldn't find anything like it.
Whether it matches their system well or not, here is a map that approximates the geographic coverage of individual sites using a Voronoi process as a base (more info on the process below). It is at least a start at visualizing the geographic coverage and distribution of the community-driven instances of craigslist. Shapes like this might provide some useful context for other data, such as demographic or market information. Also, when pulled into VFX, they can serve as an input to spatial querying on those other metrics.
We'll soon be releasing an interactive version of this data in VFX where you can play with it in the context of other data and within alternate, though coupled, charting and timeline dimensions.
With access to web traffic data, one could compile a pragmatic view of coverage driven by the locations of actual website visitors (but this would just be the incestuous result of the current method craigslist uses to allocate visitors to sites). That's ok, but it's more detective work than interesting data creation.
The use of openstreetmap data to weight the polygon drawing by travel time would improve the realism of the hypothetical zones considerably. In that case, maybe it could be used to drive a more efficient assignment of craigslist.org visitors to their actual-nearest craigslist community. That would rule.
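To make the travel-time idea concrete, here is a minimal sketch of a "network Voronoi": every node of a road graph is assigned to whichever seed it can reach in the least time, using a multi-source Dijkstra search. Nothing here comes from the actual map; the `network_voronoi` function, the toy `roads` graph, its minute values, and the site names are all made up for illustration, and a real version would need a road network built from openstreetmap data.

```python
import heapq

def network_voronoi(graph, seeds):
    """Assign each node to the seed with the shortest travel time.

    graph: {node: [(neighbor, minutes), ...]}  (hypothetical road network)
    seeds: {node: site_name}                   (nodes where a site sits)
    Returns {node: (site_name, minutes_from_that_site)}.
    """
    best = {}
    # Start the search from every seed at once, each at zero minutes.
    heap = [(0.0, node, site) for node, site in seeds.items()]
    heapq.heapify(heap)
    while heap:
        minutes, node, site = heapq.heappop(heap)
        if node in best:
            continue  # already claimed by a seed that got here sooner
        best[node] = (site, minutes)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in best:
                heapq.heappush(heap, (minutes + cost, neighbor, site))
    return best

# Toy road network with made-up travel times in minutes.
roads = {
    "A": [("B", 10), ("C", 50)],
    "B": [("A", 10), ("C", 15)],
    "C": [("B", 15), ("A", 50), ("D", 20)],
    "D": [("C", 20)],
}
zones = network_voronoi(roads, {"A": "site-a", "D": "site-d"})
for node in sorted(zones):
    print(node, zones[node])
```

In this toy case node C ends up with site-d, since D is 20 minutes away while the quickest route from A (through B) takes 25.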
Creating these areas was, in part, a helpful testbed for a generic region-building functionality that is in the skunkworks here at IDV. The algorithmically inclined Abhinav Dayal has been crafting our drive-time service, which is already doing a lot of the heavy lifting when it comes to Voronoi diagramming. So, down the road we might see a new specialized tool that generates best-fit areas around an existing set of points, useful for some what-if scenarios around territory creation, and available to the business user, not just the research scientist.
• Scrape the list of Craig’s cities at http://www.craigslist.org/about/sites.
• Split joint-locations into individual locations (like "Odessa / Midland")
• Geocode place-specific locations.
• Manually position the more regional locations (like "Southeast Iowa").
• Divide locations into three geographically distinct regions (split by the Continental Divide along the spine of the Rockies, and by the Mississippi); duplicate any locations that meaningfully straddle a border, like St. Louis. I do this to introduce some of the true cost of crossing either of those features, in the face of an algorithm that would otherwise treat the whole country as a smooth unfettered plain.
• Run Voronoi (Thiessen) algorithm to generate best-fit zones for the points, for all three regions.
• Clip Voronoi zones by a “land” shape to cut out the oceans and provide a common border between the three regions (my "land" was constructed from the Census Bureau's tracts file).
• Merge the 3 regional Voronoi sets into a unified nation-wide set.
• Dissolve boundaries between same-website Voronoi zones (to re-combine the joint-locations up there in step 2) into merged chunky polygons.
• Manually re-assign oddly-orphaned or split areas (common along complicated shorelines).
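The core of the steps above can be sketched in a few lines, since a Voronoi zone is just the set of places closer to one seed point than to any other. This is not the GIS workflow used for the map (no polygons, clipping, or regional split), only a rough illustration of splitting joint locations, assigning a point to its nearest seed, and letting both halves of a joint location resolve to the same site. The function names are hypothetical and the coordinates are approximate values typed in by hand rather than geocoded.

```python
import math

def split_joint(name):
    """Split a joint listing such as 'Odessa / Midland' into its places."""
    return [part.strip() for part in name.split("/")]

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_site(point, seeds):
    """Return the site whose seed point is closest to `point`.

    seeds: list of (site_name, (lat, lon)). Asking this question for every
    point on the map is what carves it into Voronoi zones.
    """
    return min(seeds, key=lambda seed: haversine_km(point, seed[1]))[0]

# Approximate, hand-entered coordinates (assumptions, not geocoder output).
coords = {
    "Odessa": (31.85, -102.37),
    "Midland": (32.00, -102.08),
    "Lubbock": (33.58, -101.86),
}
listings = ["Odessa / Midland", "Lubbock"]

# One seed per place, each tagged with the site it belongs to, so the two
# halves of a joint location "dissolve" back into a single site.
seeds = [(listing, coords[place])
         for listing in listings
         for place in split_joint(listing)]

print(nearest_site((31.90, -102.20), seeds))
print(nearest_site((33.50, -101.90), seeds))
```

Tagging each seed with its parent listing stands in for the dissolve step: Odessa and Midland get separate seeds, but any point nearest either one comes back as the same site.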
That's just about it. Thoughts? Ideas? Outrage? Incredulity? Been done? Guffaws? Know a good dataset to improve this method?
Download the United States of Craigslist as a table of member Zip-Codes
Follow @JohnNelsonIDV