A National Data Agency

President Obama promises a more responsible and accountable government that openly shares information with its people. This includes publishing executive orders and laws before they are signed, so everybody can comment. But it also needs to include the data decisions are based on. An information society needs its data to be available and accessible to make informed decisions – just like its leaders.

What impressed me most about President Obama's inauguration speech was this passage (emphasis mine):

And those of us who manage the public's dollars will be held to account, to spend wisely, reform bad habits, and do our business in the light of day, because only then can we restore the vital trust between a people and their government.

There is no reason to doubt his words: already, the new White House web site has a blog and new sections for all kinds of reports. The administration will also post executive orders and legislation for review by anybody who cares to look. On his first day, Obama also issued a very interesting executive order that reverses Bush's sweeping use of executive privilege to withhold information and allows much more access to presidential records (including those of past presidents).

All these great developments notwithstanding, there is still a piece of the puzzle missing: the numbers. This president, more than many (or all) before him, makes his decisions based on numbers. These numbers need to be published and made accessible so we can understand the decisions and make our own.

The challenge is not only data availability. A lot of data is, in fact, available. The US is the most transparent nation in the world – to an extent that can be frightening to an outsider (think pay data for state employees, property tax data, etc.).

The challenge is that a lot of data is published in a format that is human-readable, not machine-readable. This might sound like a good thing, but it's not. Machine-readable data can be processed and transformed into any number of human-readable forms; that direction is trivial. Going the other way, making human-readable data accessible to a machine, is much more difficult, error-prone, and expensive.
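To make the asymmetry concrete, here is a minimal Python sketch. The agency names, figures, column names, and function names are all made up for illustration; they do not come from any real data source. Turning the CSV into a readable table takes a few lines, while recovering the numbers from that table means guessing at the layout and undoing the formatting.

```python
import csv
import io
import re

# Hypothetical machine-readable data; names and figures are invented.
RAW_CSV = """agency,year,spending_usd
Example Agency A,2008,1250000
Example Agency B,2008,980000
"""


def load_records(text):
    """Parse CSV text into a list of dicts with typed fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "agency": row["agency"],
            "year": int(row["year"]),
            "spending_usd": int(row["spending_usd"]),
        }
        for row in reader
    ]


def to_report(records):
    """Machine-readable to human-readable: a fixed-width text table."""
    lines = [f"{'Agency':<20}{'Year':>6}{'Spending':>16}"]
    for r in records:
        amount = "$" + format(r["spending_usd"], ",")
        lines.append(f"{r['agency']:<20}{r['year']:>6}{amount:>16}")
    return "\n".join(lines)


def parse_report(report):
    """Human-readable back to machine-readable.

    This only works because we know exactly how to_report lays out a row.
    Any change in spacing, currency symbol, or column order breaks it.
    """
    pattern = re.compile(r"^(.*?)\s+(\d{4})\s+\$([\d,]+)$")
    records = []
    for line in report.splitlines()[1:]:  # skip the header row
        match = pattern.match(line)
        if match:
            records.append(
                {
                    "agency": match.group(1),
                    "year": int(match.group(2)),
                    "spending_usd": int(match.group(3).replace(",", "")),
                }
            )
    return records


records = load_records(RAW_CSV)
print(to_report(records))
```

The parser here has the luxury of knowing the exact layout it has to undo. With real reports (PDFs, formatted web pages, scanned tables) that layout has to be reverse-engineered for every source, which is where the cost and the errors come from.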

What we need is a National Data Agency (NDA). This agency would be tasked with collecting data that all other agencies collect and produce, and making it available in a central place and in electronic, machine-readable form. There could and should be a reasonable data presentation on its website, perhaps even a National Data Dashboard (showing data of interest like debt, spending, jobless rate, etc.). But the bulk of data analysis would be left to third parties: analysts, journalists, citizens (and also aliens like me). Easily available data would make for more insightful reporting, more informed decisions, and endless business opportunities.

There are a number of great initiatives that make it possible to work with data collected from official sources: the Death and Taxes poster, EveryBlock's collection of local data, They Rule's network of board members, The New York Times' Senate API, etc. These are partly based on data feeds, but mostly on tedious, manual analysis of records and reports. The way data is presented and communicated creates a lot of unnecessary work just to undo the formatting and get back to the original data. Publishing the data as simple data files would be so much more efficient.

The obvious thing to do with the data would of course be to visualize it. Feed it directly to Swivel and Many Eyes and any number of new sites that could build custom visualizations for different kinds of data. We have the tools to build a more informed, more engaged, and more educated society – we just need access to the raw material: data.

Posted by Robert Kosara on January 23, 2009.
