Data

How to extract, massage, filter, reshape, and reformat data for visualization.

Temperature Baseline Differences

Climate Differences

Tableau started the beta of its Tableau Public program today, and what better way to kick the tires than to run some more climate data through it? Below, you can look at temperature data from 343 weather stations over twenty years (77,172 observations) and compare their differences from the baseline in the 1970s and in the 2000s.
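The post does not spell out how the baseline is defined. As a minimal sketch, assume each reading is a hypothetical `(station, year, temperature)` tuple and that a station's baseline is its mean temperature over a chosen range of years. The difference from baseline is then each reading minus that station's mean.

```python
from collections import defaultdict
from statistics import mean


def baseline_differences(readings, base_start, base_end):
    """Return {(station, year): temperature minus that station's baseline mean}.

    readings: iterable of (station, year, temperature) tuples (hypothetical layout).
    The baseline is each station's mean over base_start..base_end, inclusive.
    """
    readings = list(readings)  # iterated twice below
    base = defaultdict(list)
    for station, year, temp in readings:
        if base_start <= year <= base_end:
            base[station].append(temp)
    baseline = {station: mean(temps) for station, temps in base.items()}

    diffs = {}
    for station, year, temp in readings:
        # Stations with no readings inside the baseline period are skipped.
        if station in baseline:
            diffs[(station, year)] = temp - baseline[station]
    return diffs


readings = [("A", 1970, 10.0), ("A", 1971, 12.0), ("A", 2005, 12.5), ("B", 2005, 8.0)]
print(baseline_differences(readings, 1970, 1979))
```

Station "B" drops out because it has no readings in the 1970s; whether to skip such stations or give them a different baseline is a judgment call the real data would force.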

---

A Look At Climate Data

Colorful lines

Whether or not you believe that global warming is real, validating the source data is still interesting. This is my second look at the global temperature data recently released by the UK's Met Office, this time using Tableau. The data has some interesting issues, and the result is a rather analytical visualization.
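A first pass at validating station data usually looks for gaps in coverage and for values that cannot be real temperatures (a missing-value marker such as -99, for instance, would show up as an outlier). The sketch below assumes hypothetical `(station, year, temperature)` tuples; the `low`/`high` plausibility bounds are illustrative defaults, not values taken from the Met Office data.

```python
from collections import defaultdict


def find_gaps_and_outliers(readings, low=-90.0, high=60.0):
    """Report missing years and implausible values per station.

    readings: iterable of (station, year, temperature) tuples (hypothetical layout).
    low/high: hypothetical plausibility bounds in degrees Celsius.
    Returns (gaps, outliers): gaps maps station -> sorted list of missing years
    between its first and last year; outliers lists out-of-range readings.
    """
    years = defaultdict(set)
    outliers = []
    for station, year, temp in readings:
        years[station].add(year)
        if not (low <= temp <= high):
            outliers.append((station, year, temp))

    gaps = {}
    for station, ys in years.items():
        missing = sorted(set(range(min(ys), max(ys) + 1)) - ys)
        if missing:
            gaps[station] = missing
    return gaps, outliers


print(find_gaps_and_outliers([("A", 2000, 10.0), ("A", 2003, 11.0), ("B", 2000, -99.0)]))
```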

---

Interactively Explore Climate Data

Climate data 1740-2008

The United Kingdom's Met Office recently released temperature data for about 1,700 weather stations across the globe, covering 1701 to 2009. Here is an interactive visualization of that data, built with Protovis, for you to explore.

---

CSV to JSON Converter

This converter turns CSV into JSON in a "column-major" way, with values grouped by column rather than by row. It is intended for creating visualizations with Protovis.
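To make "column-major" concrete, here is a minimal Python sketch of the idea, not the converter's actual code. It assumes the first CSV row is a header and produces one JSON array per column, keyed by column name; the exact output shape of the real converter may differ.

```python
import csv
import io
import json
import math


def _convert(value):
    """Turn numeric strings into numbers; leave everything else as text."""
    try:
        return int(value)
    except ValueError:
        pass
    try:
        number = float(value)
    except ValueError:
        return value
    # "nan" and "inf" parse as floats but are not valid JSON, so keep them as text.
    return number if math.isfinite(number) else value


def csv_to_column_json(text):
    """Convert CSV text (first row = header) into column-major JSON."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        # zip() stops at the shorter sequence, so ragged rows yield shorter columns.
        for name, value in zip(header, row):
            columns[name].append(_convert(value))
    return json.dumps(columns)


print(csv_to_column_json("year,temp\n1970,10.5\n1971,11\n"))
```

Column-major output suits plotting code that wants a whole series at once (all years, all temperatures) instead of looping over row objects.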

---

The Simple Way to Scrape an HTML Table: Google Docs

Google Docs used for scraping web tables

Raw data is the best data, but a lot of public data can still only be found in HTML tables on web pages rather than in machine-readable files. One example is the FDIC's List of Failed Banks. Here is a simple trick for scraping such data from a website: use Google Docs.
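The Google Docs trick needs no code at all. If you would rather do the same thing in a script, here is a minimal sketch using Python's standard `html.parser` module (a different technique from the one in the post). It assumes reasonably well-formed markup with explicit `</td>` and `</tr>` tags and no nested tables.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped into rows, from all tables."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_tables(html):
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


page = ("<table><tr><th>Bank</th><th>City</th></tr>"
        "<tr><td>First  Bank</td><td>Springfield</td></tr></table>")
print(scrape_tables(page))
```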

---

A Browser for Data.gov

An Explorer for Data.gov

Data.gov's selection of data is slowly growing, but even with fewer than 300 datasets, it is difficult to get an overview of what is there. Below is a little Java applet that lets you drill down into data.gov's catalog by a variety of categories: reporting agency, geographic coverage, frequency, data type, etc. Besides giving a better idea of what is there, it also reveals a number of inconsistencies that make finding data more difficult.
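The applet itself is written in Java. The underlying drill-down idea (count the values of one category, then filter by a chosen value and repeat) can be sketched in a few lines of Python. The catalog records and field names such as `agency` and `frequency` below are hypothetical, and the `near_duplicates` helper shows one simple kind of inconsistency check: category values that differ only by case or spacing.

```python
from collections import Counter, defaultdict


def facet_counts(datasets, facet):
    """Count how many datasets carry each value of one category."""
    return Counter(d.get(facet, "(missing)") for d in datasets)


def drill_down(datasets, **selected):
    """Keep only the datasets whose category values match every selection."""
    return [d for d in datasets if all(d.get(k) == v for k, v in selected.items())]


def near_duplicates(datasets, facet):
    """Group category values that differ only by case or spacing."""
    groups = defaultdict(set)
    for d in datasets:
        value = d.get(facet)
        if value is not None:
            groups[" ".join(value.lower().split())].add(value)
    return [sorted(values) for values in groups.values() if len(values) > 1]


# Hypothetical catalog entries, not real data.gov records.
catalog = [
    {"agency": "EPA", "frequency": "Annual"},
    {"agency": "epa", "frequency": "Monthly"},
    {"agency": "NOAA", "frequency": "Annual"},
]
print(facet_counts(drill_down(catalog, frequency="Annual"), "agency"))
print(near_duplicates(catalog, "agency"))
```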

---

Data Is A Dish Best Served Raw

Data Zoom

The recent opening of Data.gov has led to a number of discussions about data formats, feeds, what kinds of data to publish, which agencies are or are not participating, etc. One key aspect that is easily overlooked, but that is really essential, is that what gets published is actual data: original, raw, unprocessed, undigested, naked data. Everything else is secondary.

---