Last week, I attended the seminar on Turning Statistics into Knowledge, organized by the OECD, the World Bank, and the US Census Bureau. That was an interesting way of spending two days, and I saw some interesting ideas and talked to many great people. But it was also a reminder of how little understanding of visualization there really is, and how far we have to go to make good visualizations available and work for a variety of users.
The seminar was hosted at the Census Bureau’s headquarters outside Washington, D.C. As a federal government building, it has very strict security: you have to be registered ahead of time, go through a metal detector and have your bag x-rayed. If you’re bringing a laptop, you have to register it when you come in (or risk having it confiscated) and sign it out when you leave. Foreigners (who made up about 50% of the seminar) had to submit passport information ahead of time and had to be escorted at all times by Census people while in the building. Sounds arduous, but it was managed extremely well, and was really a non-issue through most of the event. I also liked how sympathetic Nancy Gordon, the host (and Associate Director for Strategic Planning and Innovation), was about it without being apologetic.
Twenty-four talks and one panel were crammed into these two days. There was a live webcast, and they promised to put up videos of all the talks. Presentation materials and links to the presented websites should be available shortly.
I think my favorite talk was Di Cook’s presentation. She walked us through three small data sets, showed some variations and models, and basically said that the model doesn’t explain everything: you can’t just boil a complex dataset down to a trend and think that you’ve captured what’s really going on. That gave me quite a bit to chew on, and I also liked her understated way of presenting. In a way, this was perhaps the talk that captured the idea of turning statistics into knowledge the best, though it also undermined it at the same time.
Amanda Cox of the New York Times showed some of her work, which included the Turning a Corner? graphic that is based on a chart by the OECD. Not only is her version a lot nicer to look at (no gaudy background gradients), it actually explains what is going on. Using layers, she compares different decades and recessions in a way that makes a lot of visual sense. I think I’m going to make this a case study for one of my classes to show how important design and storytelling can be to not just make a picture from data, but add value to it.
David Spiegelhalter showed work on their website, Understanding Uncertainty, which aims to show people what reducing your risk of stroke by 20%, etc. actually means. Their tools look neat, and the presentation was excellent, but they clearly need to evaluate what they are doing. In the end, it’s mostly breaking down percentages into actual numbers, and I wonder if that really adds much to the actual understanding (other than the initial surprise at the numbers).
Irene Ros talked about Many Eyes and showed some of its features. I found it to be somewhat off-topic, because the most impressive and innovative features on Many Eyes have to do with text visualization, which isn’t exactly what you use for a lot of statistical data. Of course, there are also tools for numerical data, but the most oohs and aahs came from Wordle and the word tree. I also have to say that treemaps on Many Eyes are almost completely useless.
Ben Fry gave a great presentation about Processing and data-based art (or maybe artistic statistics?), but also didn’t quite address the seminar topic. The discussion in that session also turned in a somewhat absurd direction when another tool that had been presented there, called VisART, was described as “artistic.” VisART actually stands for Visual Automated Review Tool, and is about as unartistic as it gets (it’s used inside the Census Bureau to fact-check data, and I don’t doubt that it’s useful; but visually, it’s simply awful).
Eric Wassink of Statistics Netherlands gave an unusual presentation about the problems they had run into. He advocated starting work with the organization, not the technique. Defining targets and metrics helps keep the momentum going, and keeps your work on the minds of fickle managers (who are initially drawn to shiny and colorful things, but also tend to forget quickly). I liked his metaphor of “inviting the managers into the kitchen” where you cook up your techniques, i.e., involve them in design decisions. Perhaps having a good evaluation strategy would also help, but I don’t disagree about the importance of the other points.
I should also mention OECD eXplorer, which is a good tool, but it wasn’t presented very well. It combines an animated scatterplot (with trails à la Gapminder), a map, and parallel coordinates, and lets you access a lot of data from the OECD. It has a decent design and apparently quite a bit of backing from the organization. I wish a lot of the other organizations would just drop their own attempts and use eXplorer instead.
This was the first event where I didn’t expect to know anybody, and then it turned out that about a dozen or so people I know on Twitter were there and I got to meet them in real life. I also added a number of new people to my list in those two days, most of whom were there (and a few more who were following my remarks about the event).
The only problem was that we couldn’t quite agree on the hashtag to use. There was the obvious but rather unspecific #OECD, then there was #statknowledge, #stat2know, and maybe one or two more. It turned out to be easier to just follow the people there directly, rather than search through the hashtags. But next time (for any event, not just this), we should get together and decide on a single, short, specific hashtag before people go off doing their own things.
Anyway, here’s a shout-out to a few people I met there: Paolo Ciucarelli (density design), Peter Couvares (Verifiable.com), Zach Gemignani (Juice Analytics), Alex Lundry (Target Point Consulting), Steve Myers (Poynter), Naomi Robbins (NBR Graphs), Irene Ros (IBM/Many Eyes), Beck Tench (Museum of Life and Science), and a few more (sorry if I forgot to mention you, feel free to add your Twitter handle in the comments).
While I had expected a lot of discussion about actual statistics, once the program was published it became clear that it would be mostly about visualization. Most of the presentations were essentially demos of projects by various organizations to put visualization tools for their data on the web. This got quite repetitive, and many completely trivial features were demoed in much more depth than necessary.
Some of the projects were also terrible in terms of visualization techniques, colors, user interface, etc. Lots of primary colors, gratuitous gradients, and long lists of checkboxes. Giving people lots of options and lots and lots of data to choose from in a decent user interface is not a trivial undertaking. The OECD eXplorer does that quite well, but most of the others were stuck in endless lists of data dimensions and options.
There was also a complete lack of visualization people. Irene Ros and I would count, but that was it. All the projects used the standard bar, line, and pie charts. Not a single treemap that I remember, and the only parallel coordinates plot was in the OECD eXplorer. These things have to be accessible to a lot of people, but that doesn’t mean you can’t add more complex tools (like the eXplorer does by initially hiding the parallel coordinates). Rather than add animations and interaction to pie charts (as one project did), a bit of research would lead to a much more sensible and useful design.
As for evaluation: none. The more I think about it, the more this bugs me. Dozens of visualization tools are thrown on the web, and money is spent on designing and implementing them. And yet, none of these organizations bothered to test how effective these things would be. The question about evaluation came up a few times, but nobody was able to say more than “it would be a good idea to do it.”
Maybe the reason this bugs me so much is that it shows the mindset behind these projects: Let’s make something pretty! And colorful! On the web! It’ll show how cutting-edge we are! Nobody could have possibly done the exact same thing before, so don’t bother doing any research! Just get it out there!
Nobody really talked about the seminar topic of Turning Statistics into Knowledge. Some of the talks I mention above came close, but there was no explicit discussion of how it might be done. No overarching approach that said: this is our idea of how it might work, what we’re going to demo is the first step, motivated by our overall design. Perhaps that also explains the lack of evaluation: to evaluate, you have to know what you’re evaluating.
These visualization tools do not magically create knowledge; they only produce colored pixels. In several presentations, I got the distinct feeling that the presenters mostly wanted to make something pretty and colorful, and didn’t really care about how useful it would be.
Case in point was a presentation where the presenter started by talking about how people’s attention spans are getting shorter, and they expect to get results very quickly, and then spent ten minutes showing every single radio button and drop-down menu before we ever got to see a picture. To make changes, you had to leave the visualization, pull up the check box and radio button graveyard to make your selections, and then go back to the visualization. It was one of the worst user interface designs I’ve seen in some time, and hardly a tool that helps you gain knowledge.
Where is Visualization When You Need It?
Why do all these statistical offices and other organizations have to do things from scratch? Why don’t they just use well-designed, off-the-shelf tools instead? The answer is: they don’t exist. At least not in the open source world. As an academic field, visualization has completely failed to provide free, usable tools to the public. The things that are available are terrible, outdated, and unmaintained. It’s a sad state of affairs.
There are some good commercial tools and SDKs out there, and I don’t know why they’re not used. Cost may be a factor, but then developing your own isn’t exactly free, either. But either way, it’s a bad situation to be in to say: Look at all the wonderful academic work we’ve done in visualization! You like it? Well, you’ll have to implement it yourself from scratch, or pay lots of money for it!
Despite all my criticism, I had a great time there. There are lots of people who care about their data and about communicating it to the world. It’s wonderful to see that. But I also see that they need to dig deeper than to just make things pretty. While visualization does not currently provide decent tools, we do have a lot of knowledge to offer that could help these projects become much more effective and meaningful, and a better use of taxpayer money.