One of the most common questions I've gotten about our journalist/programmer scholarships comes from news organizations: "When can we hire them?" And recent developments suggest that the need for people with both journalism and programming skills is only going to increase.
For Northwestern's Readership Institute blog, I wrote last week about the growing number of data-driven applications being published on news Web sites. I used the Indianapolis Star's Data Central as a case study. It's worth pointing out, though, that the paper was able to publish most of these databases without involving professional programmers. This reflects one of the driving trends in technology: the development of tools that enable data-driven publishing with modest levels of technical skill.
If the tools are getting easier for non-programmers to use, what would a person with both journalism and programming skills do at a news organization?
Some of the answers might be found in the experience of the Star and other news organizations (such as the Asbury Park Press) that are leading the way in deploying databases on their Web sites. Successful as these initiatives have been, I would argue that they are just a start. Some projects are clearly both more complex and potentially more rewarding -- for news organizations and their online audience -- than others.
For the Readership Institute post, I began to put together a hierarchy of data-driven journalism. At the low end are the simplest kinds of projects in which the news organization doesn't do much beyond making the data available. At the high end are the most ambitious applications, in which the news organization adds value through smart interface development, journalistic analysis, creativity in presentation or connections to storytelling.
Level 1: Data delivery. Here a news organization obtains data and makes it available in a browsable form. There's no additional reporting and little functionality for the online user. The Star's CEO salaries database is an example.
Level 2: Data search. This is by far the most common way data is made available. Users are expected to find relevant information by entering text into a search box. An example: The Cincinnati Enquirer's database of home sales prices.
Level 3: Data exploration. Compare the search results page for a typical searchable database like Cincinnati home sales prices to the browse options on Adrian Holovaty's chicagocrime.org. There's a search box on the page, but the site allows easy exploration of the data in a way most online databases do not. Click on any of the browse options and you are presented with additional links that let you explore the information more thoroughly. I recently heard Adrian talk about his approach to developing database applications. He talks about applying "The Treatment" to online data, by which he means, "Present it in ways that make it fun and serendipitous." His motto is: "Everything that can be linked should be linked." His work shows that searchability is just the beginning.
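To make the difference between Level 2 and Level 3 concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the records, field names and function names are hypothetical, and it is not how chicagocrime.org or any newspaper database is actually built. The `search` function stands in for a search box; `build_browse_index` and `links_for` stand in for the "everything that can be linked should be linked" idea, where each value on a record becomes a link to the other records that share it.

```python
from collections import defaultdict

# Hypothetical records; the field names and values are made up for illustration.
RECORDS = [
    {"id": 1, "type": "theft", "ward": "1", "date": "2007-09-01"},
    {"id": 2, "type": "theft", "ward": "2", "date": "2007-09-01"},
    {"id": 3, "type": "assault", "ward": "1", "date": "2007-09-02"},
]
FIELDS = ["type", "ward", "date"]


def search(records, text):
    """Level 2: return records in which any field value contains the typed text."""
    needle = text.lower()
    return [r for r in records if any(needle in str(v).lower() for v in r.values())]


def build_browse_index(records, fields):
    """Level 3: map every (field, value) pair to the ids of the records that share it."""
    index = defaultdict(list)
    for record in records:
        for field in fields:
            index[(field, record[field])].append(record["id"])
    return index


def links_for(record, index, fields):
    """For one record, list each linkable value and the other record ids it leads to."""
    return {
        (field, record[field]): [
            i for i in index[(field, record[field])] if i != record["id"]
        ]
        for field in fields
    }


index = build_browse_index(RECORDS, FIELDS)
related = links_for(RECORDS[0], index, FIELDS)
```

With search, the user has to know what to type. With the browse index, every record already points to its neighbors by type, ward and date, so a reader can wander from one record to the next without typing anything.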
Level 4: Data visualization. Rows and columns are often not the best way to present data. For many databases, the most valuable thing a news organization can do is provide a way for people to visualize what the data show. The most obvious approaches involve mapping, at least for databases that have a geographic element. Thanks to Google and Yahoo!, it is relatively easy to add maps to any database that includes addresses. But the possibilities for data visualization go way beyond mapping. A site that is doing some very interesting things with data visualization is Digg.com, a tech-oriented site where content is prioritized based on user voting. Check out Digg Labs for some creative ways the Digg team is finding to prioritize news using visual interfaces.
Level 5: Data experiences and storytelling. When a news organization can effectively marry traditional reporting and storytelling with database development capabilities, truly new forms of journalism can emerge. Here are two examples of what I'm talking about:
- The Los Angeles Times' homicide map. What makes this project interesting is that behind the map is a page (actually a blog post) about every individual murder in Los Angeles this year. And for each murder, the Times allows comments (after staff review), which often take the form of tributes to the homicide victim. These comments are often poignant and compelling -- transforming dry statistics into human stories.
- Politifact, a joint project of the St. Petersburg Times and Congressional Quarterly. This is a data-driven application designed "to help you find the truth in the presidential campaign."
I'd also list some examples of data-driven journalism created by the News21 reporting project, an initiative (funded by the Knight Foundation and the Carnegie Corporation) involving graduate students from Northwestern University, the University of Southern California, Columbia University and the University of California-Berkeley. (Disclosure: I have served as an adviser to the students and helped them work with Flash developers to build these projects.) These three examples marry original reporting and data-driven presentation:
- Digital Trails, a story about how information about people is captured and stored as they go about their daily activities. The News21 reporters followed a young woman around the Washington area and identified every instance in which she left a "digital trail," then found out where that information was stored and how it might be shared with companies or the government. Underlying the multimedia reporting and Flash interface is a database in which every trail is a data element. Flash design and programming were done by From Scratch Design Studio of Washington, DC.
- Government Data Mining, a project that started with the most complete list ever compiled of government data-mining programs. The student team then had the interesting idea of using an interface similar to the ones used by data-mining software to allow people to explore these programs, government agency by government agency. From Scratch worked on this project as well.
- One Vote Under God, a comprehensive look at this year's presidential candidates and issues related to religion. The Flash interface for this project, developed by Michael Nix Design in Chicago, enables users to explore the candidates' religious backgrounds and positions on religion-related issues, as well as to compare any pair of candidates.
I'm hoping that the journalist-programmers who graduate from our new program will be both more likely to come up with ideas like these and able to help make them happen.

