Wednesday, July 09, 2008

Public Data - Mining for Gold

I saw an interesting story about how the Houston Chronicle has put the entire database of the names and salaries of 81,000 City of Houston employees online. Though all of this is public data, they have put it in one place for everyone to play with. I particularly love this estimate from Houston Community College of how much labor it would take to compile the data. Seventy hours of programming and validation seems a little steep for joining a table or two. Perhaps there were vacuum tubes and punch cards involved. I imagine it created quite a bit of angst for some on the payroll. Among the fascinating tidbits: someone made almost $100,000 in overtime in one year, and the superintendent of schools made twice what the mayor did. Where do I apply?

This is some serious territory for the data visualization folks. So many things can be done with this. Salary mapping by agency and district just scratches the surface. Of course, the Chronicle is hoping to get some more story tips from the public miners of public data.
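To give a sense of what "joining a table or two" and a salary summary by agency might look like, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the table layouts, the field names (`agency_id`, `base`, `overtime`), the agency names, and the sample rows are all made up and are not taken from the Chronicle's database.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample rows; the real data set is not reproduced here.
employees = [
    {"id": 1, "name": "A. Example", "agency_id": 10},
    {"id": 2, "name": "B. Example", "agency_id": 10},
    {"id": 3, "name": "C. Example", "agency_id": 20},
]
pay = [
    {"employee_id": 1, "base": 52000, "overtime": 3000},
    {"employee_id": 2, "base": 61000, "overtime": 0},
    {"employee_id": 3, "base": 48000, "overtime": 12000},
]
agencies = {10: "Public Works", 20: "Police"}  # hypothetical lookup table


def total_pay_by_agency(employees, pay, agencies):
    """Join employees to pay records, then summarize total pay per agency."""
    # Index the pay table by employee id so the join is a dictionary lookup.
    pay_by_employee = {row["employee_id"]: row for row in pay}

    totals = defaultdict(list)
    for emp in employees:
        record = pay_by_employee.get(emp["id"])
        if record is None:
            continue  # skip employees with no pay record (inner join)
        agency_name = agencies[emp["agency_id"]]
        totals[agency_name].append(record["base"] + record["overtime"])

    # Headcount and median total pay for each agency.
    return {
        agency: {"headcount": len(values), "median_total": median(values)}
        for agency, values in totals.items()
    }


summary = total_pay_by_agency(employees, pay, agencies)
for agency, stats in sorted(summary.items()):
    print(agency, stats)
```

A real compilation would presumably spend most of its effort on validation (duplicate names, missing records, inconsistent agency codes) rather than on the join itself, which may be where some of those hours go.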

Journalists are becoming much more sophisticated in analyzing this type of data. The Institute for Analytical Journalism appears to be an organization that promotes this. Network analysis, spatial statistics, GIS and various data mining algorithms all have huge potential to unlock patterns and actionable knowledge, not only in journalism but in many other domains.
