OECD Seminar on Turning Statistics into Knowledge

Last week, I attended the seminar on Turning Statistics into Knowledge, organized by the OECD, the World Bank, and the US Census Bureau. It was a worthwhile way of spending two days, and I saw some interesting ideas and talked to many great people. But it was also a reminder of how little understanding of visualization there really is, and how far we have to go to make good visualizations available and make them work for a variety of users.

The seminar was hosted at the Census Bureau's headquarters outside Washington, D.C. As a federal government building, it has very strict security: you have to be registered ahead of time, go through a metal detector and have your bag x-rayed. If you're bringing a laptop, you have to register it when you come in (or risk having it confiscated) and sign it out when you leave. Foreigners (who made up about 50% of the seminar) had to submit passport information ahead of time and had to be escorted at all times by Census people while in the building. Sounds arduous, but it was managed extremely well, and was really a non-issue through most of the event. I also liked how sympathetic Nancy Gordon, the host (and Associate Director for Strategic Planning and Innovation), was about it without being apologetic.

Twenty-four talks and one panel were crammed into these two days. There was a live webcast, and they promised to put up videos of all the talks. Presentation materials and links to the presented websites should be available shortly.

Highlights

I think my favorite talk was Di Cook's presentation. She walked us through three small data sets, showed some variations and models, and basically said that the model doesn't explain everything: you can't just boil a complex dataset down to a trend and think that you've captured what's really going on. That gave me quite a bit to chew on, and I also liked her understated way of presenting. In a way, this was perhaps the talk that captured the idea of turning statistics into knowledge the best, though it also undermined it at the same time.

Amanda Cox of the New York Times showed some of her work, which included the Turning a Corner? graphic that is based on a chart by the OECD. Not only is her version a lot nicer to look at (no gaudy background gradients), it actually explains what is going on. Using layers, she compares different decades and recessions in a way that makes a lot of visual sense. I think I'm going to make this a case study for one of my classes to show how important design and storytelling can be to not just make a picture from data, but add value to it.

David Spiegelhalter showed work on their website, Understanding Uncertainty, which aims to show people what reducing your risk of stroke by 20%, etc. actually means. Their tools look neat, and the presentation was excellent, but they clearly need to evaluate what they are doing. In the end, it's mostly breaking down percentages into actual numbers, and I wonder if that really adds much to the actual understanding (other than the initial surprise at the numbers).
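
To make the "percentages into actual numbers" step concrete, here is a minimal sketch of that kind of translation. It is not the Understanding Uncertainty site's code: the function name, the 10% baseline risk, and the group size of 100 are all assumptions chosen purely for illustration.

```python
def natural_frequency(baseline_risk, relative_reduction, group_size=100):
    """Turn a relative risk reduction into counts of people in a group.

    baseline_risk:      hypothetical chance of the event without treatment (0..1)
    relative_reduction: the advertised reduction, e.g. 0.20 for "20% lower risk"
    group_size:         how many people to imagine (an illustrative choice)
    """
    affected_before = baseline_risk * group_size
    affected_after = affected_before * (1 - relative_reduction)
    return round(affected_before), round(affected_after)


# Hypothetical numbers: a 10% baseline risk and a 20% relative reduction.
before, after = natural_frequency(0.10, 0.20, group_size=100)
print(f"Without treatment: about {before} in 100 have a stroke.")
print(f"With treatment: about {after} in 100 do, so {before - after} are spared.")
```

Under these made-up inputs, "20% lower risk" becomes a difference of a couple of people per hundred. Whether seeing it that way improves understanding is the open question a user study would have to answer.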

Irene Ros talked about Many Eyes and showed some of its features. I found it to be somewhat off-topic, because the most impressive and innovative features on Many Eyes have to do with text visualization, which isn't exactly what you use for a lot of statistical data. Of course, there are also tools for numerical data, but the most oohs and aahs came from Wordle and the word tree. I also have to say that treemaps on Many Eyes are almost completely useless.

Ben Fry gave a great presentation about Processing and data-based art (or maybe artistic statistics?), but also didn't quite address the seminar topic. The discussion in that session also turned in a somewhat absurd direction when another tool that had been presented there, called VisART, was described as "artistic." VisART actually stands for Visual Automated Review Tool, and is about as unartistic as it gets (it's used inside the Census Bureau to fact-check data, and I don't doubt that it's useful; but visually, it's simply awful).

Eric Wassink of Statistics Netherlands gave an unusual presentation about the problems they had run into. He advocated starting work with the organization, not the technique. Defining targets and metrics helps keep the momentum going, and keeps your work on the minds of fickle managers (who are initially drawn to shiny and colorful things, but also tend to forget quickly). I liked his metaphor of "inviting the managers into the kitchen" where you cook up your techniques, i.e., involve them in design decisions. Perhaps having a good evaluation strategy would also help, but I don't disagree about the importance of the other points.

I should also mention OECD eXplorer, which is a good tool, but it wasn't presented very well. It combines an animated scatterplot (with trails à la Gapminder), a map, and parallel coordinates, and lets you access a lot of data from the OECD. It has decent design and apparently quite a bit of backing from the organization. I wish a lot of the other organizations would just drop their own attempts and use eXplorer instead.

Twitter

This was the first event where I didn't expect to know anybody, and then it turned out that about a dozen or so people I know on Twitter were there and I got to meet them in real life. I also added a number of new people to my list in those two days, most of whom were there (and a few more who were following my remarks about the event).

The only problem was that we couldn't quite agree on the hashtag to use. There was the obvious but rather unspecific #OECD, then there was #statknowledge, #stat2know, and maybe one or two more. It turned out to be easier to just follow the people there directly, rather than search through the hashtags. But next time (for any event, not just this), we should get together and decide on a single, short, specific hashtag before people go off doing their own things.

Anyway, here's a shout-out to a few people I met there: Paolo Ciucarelli (density design), Peter Couvares (Verifiable.com), Zach Gemignani (Juice Analytics), Alex Lundry (Target Point Consulting), Steve Myers (Poynter), Naomi Robbins (NBR Graphs), Irene Ros (IBM/Many Eyes), Beck Tench (Museum of Life and Science), and a few more (sorry if I forgot to mention you, feel free to add your Twitter handle in the comments).

Visualization

While I had expected a lot of discussion about actual statistics, once the program was published it became clear that it would be mostly about visualization. Most of the presentations were essentially demos of projects by various organizations to put visualization tools for their data on the web. This got quite repetitive, and many completely trivial features were demoed in much more depth than necessary.

Some of the projects were also terrible in terms of visualization techniques, colors, user interface, etc. Lots of primary colors, gratuitous gradients, and long lists of checkboxes. Giving people lots of options and lots and lots of data to choose from in a decent user interface is not a trivial undertaking. The OECD eXplorer does that quite well, but most of the others were stuck in endless lists of data dimensions and options.

There was also a complete lack of visualization people. Irene Ros and I would count, but that was it. All the projects used the standard bar, line, and pie charts. Not a single treemap that I remember, and the only parallel coordinates plot was in the OECD eXplorer. These things have to be accessible to a lot of people, but that doesn't mean you can't add more complex tools (like the eXplorer does by initially hiding the parallel coordinates). Rather than add animations and interaction to pie charts (as one project did), a bit of research would lead to a much more sensible and useful design.

Evaluation

None. The more I think about it, the more this bugs me. Dozens of visualization tools are thrown on the web, money is spent on designing and implementing them. And yet, none of these organizations bothered to test how effective these things would be. The question about evaluation came up a few times, but nobody was able to say more than "it would be a good idea to do it."

Maybe the reason this bugs me so much is that it shows the mindset behind these projects: Let's make something pretty! And colorful! On the web! It'll show how cutting-edge we are! Nobody could have possibly done the exact same thing before, so don't bother doing any research! Just get it out there!

Knowledge

Nobody really talked about the seminar topic of Turning Statistics into Knowledge. Some of the talks I mention above came close, but there was no explicit discussion of how it might be done. No overarching approach that said: this is our idea of how it might work, what we're going to demo is the first step, motivated by our overall design. Perhaps that also explains the lack of evaluation: to evaluate, you have to know what you're evaluating.

These visualization tools do not magically create knowledge; they only produce colored pixels. In several presentations, I got the distinct feeling that the presenters mostly wanted to make something pretty and colorful, and didn't really care about how useful it would be.

Case in point was a presentation where the presenter started by talking about how people's attention spans are getting shorter, and they expect to get results very quickly, and then spent ten minutes showing every single radio button and drop-down menu before we ever got to see a picture. To make changes, you had to leave the visualization, pull up the check box and radio button graveyard to make your selections, and then go back to the visualization. One of the worst user interface designs I've seen in some time, and hardly a tool that helps you gain knowledge.

Where is Visualization When You Need It?

Why do all these statistical offices and other organizations have to do things from scratch? Why don't they just use well-designed, off-the-shelf tools instead? The answer is: they don't exist. At least not in the open source world. As an academic field, visualization has completely failed to provide free, usable tools to the public. The things that are available are terrible, outdated, and unmaintained. It's a sad state of affairs.

There are some good commercial tools and SDKs out there, and I don't know why they're not used. Cost may be a factor, but then developing your own isn't exactly free, either. But either way, it's a bad situation to be in to say: Look at all the wonderful academic work we've done in visualization! You like it? Well, you'll have to implement it yourself from scratch, or pay lots of money for it!

Summary

Despite all my criticism, I had a great time there. There are lots of people who care about their data and about communicating it to the world. It's wonderful to see that. But I also see that they need to dig deeper than to just make things pretty. While visualization does not currently provide decent tools, we do have a lot of knowledge to offer that could help these projects become much more effective and meaningful, and a better use of taxpayer money.

Posted by Robert Kosara on July 24, 2009.

Comments

Alberto says…

Thanks for the review. I was there as well and I think I agree with your overall assessment. I am an astronomer and an avid consumer of visualization and I had hoped for a little better at the conference. I think, as you mentioned, many talks were off-topic in that they really did not address what I believe is the REAL issue: the "how" you turn stats into knowledge. Nevertheless, I think it was a rather interesting conference that underscored the role that massive amounts of data are playing in our society by forcing us to reconsider many of our approaches to data collection and analysis.

Hadley Wickham says…

I think you underestimate the importance of David Spiegelhalter's work. There is a large body of work (particularly by Tversky and Kahneman) showing that our intuition is terribly bad when it comes to understanding probabilities, and translating to natural frequencies can make a big difference.

Robert Kosara says…

I have no doubt that our intuition is bad with regard to probabilities, but I would like to see some proof that this particular visualization actually helps. And I'm not even necessarily doubting it, I just think it needs to be evaluated. Watching the talk, my conclusions were also different from David Spiegelhalter's: when he showed how taking a particular medication would save two people out of 100, I thought: if one of them is me, that's totally worth it! He didn't seem to think so. I wonder how that kind of discrepancy would factor into a user study.

balinjdl says…

Thanks for posting your comments. I, too, enjoyed the seminar, but took a slightly different tack in my comments, which focus on the technical side of things like XML, Flash, databases, etc. Good seminar. I hope to attend it again in the future, if it's ever around here again (I attended on my own, not through my work).

Hadley Wickham says…

Fair enough. I think this is a fairly active field in psychology, so perhaps people are already doing the research.

Di Cook says…

Your post misses the point about a lot of important contributions from the workshop. My interpretation of turning information into knowledge, in the context of this meeting was how to inform masses of people. In this regard, Ben Fry's example of projecting a shape onto the smoke stack of a power plant is spot on. I'm very curious about the public's reaction, and whether the installation did indeed result in reduced energy consumption by residents. I would like to think that it made the connection between energy use and power plant emission concrete for local residents.
A few comments about how backward the USA is were raised but I was glad to see these parried. What is so excellent about many government agencies in the US is that they put their actual data out for public consumption. Several European countries and even my home country Australia are simply posting visual and tabular summaries. This is so frustrating, because they typically do a lousy job with the graphics and tables. Who are they to say what I want to know - give me the data and I can find out for myself. It's a trade-off like "give a person a fish and they eat for the day, give a person a fishing rod and they eat for a lifetime". I was also cringing at some of the graphics shown by developers. One thing I try to remind myself, though, is at least people are trying to do data visualization.
I had a good discussion with several people from BLS and there is a general problem that I've noticed in industry and government. A number of years ago Dan Carr worked with BLS on converting tables to charts and linked micromaps - very elegant work. I was sorry to hear during the meeting that this work is not used any more. He and Dan Rope provided software for the methods, but there was no-one at BLS who was in charge of keeping the software working or supporting users. It was incorporated into commercial software, nVizN, but that company dissolved. If there is going to be a fruitful, sustainable exchange of ideas, collaborations with academia need to have a full partner from the other side: the small test software produced by academia needs to be developed and maintained by the interested party.
In response to your and Hadley's exchange above: Spiegelhalter's work is in many ways similar to Ben Fry's, trying to boil down complex data into something simple and understandable by the masses, only it's done through the use of probability rather than art. One of his images does remain imprinted in my mind: the chart of the distance a cyclist, motorcyclist or car goes before a fatal crash. The risk is that the symbolism is too simplistic.
Thanks for the nice things you said about my talk. It was very difficult to decide on what to say in a 15 minute talk to such a hugely varied audience. I too felt isolated - Naomi Robbins was the only other statistical graphics person. I thought everyone was from info vis! And you say otherwise. I do think that the organizers of events like this are to be congratulated. It really was a forum for exchange of ideas in a very diverse environment.

James Lytle says…

This is exactly the trend I've seen online and in most conferences to date. I think it's wonderful there is such a diversity of techniques and visualizations out there; however, when visualization prowess becomes the chart junk of understanding, I think we're headed in the wrong direction. The degree to which people understand and correct behavior is the degree of usefulness. Graphic treatment of statistical information becomes knowledge when meaningful action is taken, and it seems there are gravely more people getting excited about putting data into pictures (because they can) than working to understand the response of their audience. Though, I have to admit, it is tantalizing seeing the visual boundaries being pushed in the info viz community.

Gerad Suyderhoud says…

Thanks for the great summary of the seminar. We decided not to send anybody from Swivel (as we're working very hard on getting our new product out). Thus this summary was quite useful. To contribute some findings from our own qualitative research, we've found that there are three things that are important to turning statistics into knowledge.
Visualization - you need to be able to see the numbers in order to understand them (this includes being able to quickly scan through tables, as well as fancier graphical representations).
Interaction - you need to be able to engage with the data, shape it, and see it from different angles. Playing with the data really lets you focus on what the numbers are saying.
Communication - especially in businesses, real understanding of data happens when results are shared and presented. Collaboration that happens before a big presentation, as well as feedback that is taken away from that presentation are all integral to understanding.
Finally, a personal data point. Experimentation is essential. Having access to all the historical data in the world is not as useful as having a lab to test hypotheses in. Having access to historical data may let you make educated guesses about causes and results, but running experiments can give you proof.

Paul Gestwicki says…

I was not at the workshop, but I agree and sympathize with your feelings regarding lack of evaluation and tools. I feel that academics have a responsibility to bring free, accessible tools to the public. I think that one hindrance we have is the lack of coherent recommendations. Reading the literature, I find a lot of great ideas, but I have yet to see a set of recommendations that could become guidelines for a distributed open-source effort, for example. I have been working with the intermedia arts community, and they are developing recommendations for university media labs. This community then will be able to take these recommendations to their administrators to justify the effort and costs. Similarly, if the visualization community (diverse as it is) could articulate the features and use cases for end-user visualization tools, I know it would help researchers like me in getting support from my institution, as well as funding agencies, for the efforts.

tom J Halsør says…

I can relate to a lot of the comments you made about the seminar. I have attended quite a few statistical seminars lately, all about communication, visualization and dissemination. But I am sort of left with much the same feelings as you were this time around. I have tried to sort of put a finger on the real problem in my latest post, and quoted your post quite a bit (hope this is okay). DD4D, a conference by IIID and OECD in Paris this July, was great and more spot on. I'm optimistic, we are getting there! Thanx again :) tom

Mmorpg says…

Statistics can be manipulated into anything. All you need is a creative researcher. What is so excellent about many government agencies in the US is that they put their actual data out for public consumption. Several European countries and even my home country Australia are simply posting visual and tabular summaries. This is so frustrating, because they typically do a lousy job with the graphics and tables. Who are they to say what I want to know - give me the data and I can find out for myself. It's a trade-off like "give a person a fish and they eat for the day, give a person a fishing rod and they eat for a lifetime". I was also cringing at some of the graphics shown by developers. One thing I try to remind myself, though, is at least people are trying to do data visualization.
