Social websites are all the rage right now: they are not just hyped by the media (MySpace and YouTube in particular), but also involve large amounts of money (again, MySpace and YouTube). But does the social model make sense for data analysis and visualization? And will users play and interact with data the way they do with other media? Two websites were launched recently to find out: Swivel.com (defunct as of late 2010) and Many Eyes. Here is a first review, looking at the two sites in terms of their founders, approach, social aspects, technology, capabilities, broad appeal, and ethics.
While the tools for creating charts and basic visualizations from data are available to everybody (and for free too, thanks to projects like OpenOffice), there is still a lack of knowledge about how to use them properly. Most people have created a basic bar or pie chart before, but few realize how much you can do with even the few simple tools in Excel.
What is more, very little data is freely available. If I want to put my sales data or the price of oil into context, where do I get the data from? The makers of Swivel make some good points about data being locked up, as does Hans Rosling (in a totally different context). How much more could we know if all that data was made available and easy to access?
So social data sharing and visualization websites could have an enormous impact on the world, by giving many people access to a lot of data that would otherwise be hard or impossible to obtain. And they can expose a lot of people to visualization who would not otherwise see good uses of visualization techniques, and let them learn from others how to make the most of them. Visualization also needs to break out of the academic world and do something for real people with real data and real questions. The existence of these sites, and the fact that money is being spent on them, means that visualization is growing up, and in a few years will hopefully be part of mainstream technology the way other electronic media are today.
Both websites are small ventures, and the differences in approach can be traced back easily to the people who run them. Swivel is clearly a business, with people working full-time on the site, venture capital, and a plan on how to eventually make money. It was started by Dmitry Dimov and Brian Mulloy, two physics graduates, and is backed by venture capital from Cnet founder Halsey Minor. They have a total of seven employees, and three advisors: Minor and a colleague of his, plus a university professor at Berkeley who is working in databases and networks. Swivel was founded in 2005 (around the same time as YouTube), had a private beta for several months, and launched its public “preview” version of the site on December 6, 2006.
When I first heard about Swivel, I immediately looked at their About page to see if I recognized any names. I didn’t, and I was quite disappointed. Visualization people are not exactly known for their business abilities, but I would have expected to see at least one familiar person to do some consulting for them. This is especially important in the case of Swivel, whose makers seem to be completely unaware of the work that is being done in information visualization.
Many Eyes can be seen as the answer from the visualization side. The site is run by IBM’s new Visual Communication Lab, which includes three young but prolific visualization researchers: Martin Wattenberg, Fernanda Viégas, and Frank van Ham. All three of them have made an impact in the visualization world, and are actively publishing at the relevant conferences. The lab has a total of five people, though they are not working full-time on the site. The group’s history is not easy to track, but Wattenberg started at IBM in 2002, and Viégas and van Ham apparently joined him last year. Many Eyes was launched January 23, 2007.
Many Eyes exists in a much more academic setting than Swivel, with fewer resources and no discernible business plan. IBM can probably run the site for years without the need for profit, and they may even consider it more of a marketing effort and keep it like that. One important question for Many Eyes is therefore its longer-term sustainability, especially if it gets more popular and requires more resources to run.
Uploading a data set is quite similar in Swivel and Many Eyes, with Swivel offering more options in addition to pasting a tab-separated file (CSV upload and scraping off a website). Many Eyes is more interactive here, extracting the column headers and seemingly understanding what is going on. Swivel requires the user to input the data without too much help.
Once the data is uploaded the two sites start to differ. While Many Eyes offers a simple “visualize” button for the new dataset, Swivel creates a number of visualizations automatically. This mechanism is apparently central to Swivel’s strategy: impress with numbers. Swivel’s start page now boasts over one million graphs, and the number is growing quickly (around 15,000 per day). These charts are based on only around 2,400 data sets, which means that there is an average of 450 graphs per data set. Most of these are obviously useless, and the number of graphs is thus artificially inflated to an extent that is ridiculous. It does explain the description of Swivel as “YouTube for data” though, a slogan that was used for a short time when it was launched. But it also means that Swivel needs to be very careful if it wants to be taken seriously, since outrageous claims do not exactly build trust.
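The arithmetic behind that average is easy to check. The following is only a sketch using the rounded figures quoted above (the function name is made up for illustration). With exactly one million graphs and 2,400 data sets the average comes out nearer 417, so the 450 figure implies a graph count of roughly 1.08 million, which is consistent with "over one million".

```python
def graphs_per_data_set(graphs, data_sets):
    """Average number of graphs per uploaded data set."""
    return graphs / data_sets


# Rounded figures from the text; the exact counts are not known here.
average = graphs_per_data_set(1_000_000, 2_400)

# Graph count that an average of 450 per data set would imply.
implied_graphs = 450 * 2_400

print(round(average), implied_graphs)
```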
Many Eyes, in contrast, has fewer graphs than data sets, which seems much more plausible. There seem to be a lot of data sets that were uploaded as tests or needed to be redone, but which were never deleted. The signal-to-noise ratio is therefore much higher on Many Eyes.
Swivel’s business model is to eventually sell subscriptions to its professional edition to users who want to use its capabilities on their own data and compare that data to all the public data on the site – but who do not want to share their data with the entire Internet. The question of course is whether these users will trust Swivel with their valuable proprietary data, and see enough value in Swivel’s own tools to take that risk (cost probably won’t be an issue, at least not in comparison). After all, they can always download the public data from Swivel and use it with their own tools.
Both sites have interesting approaches to making data analysis more social. Both allow users not only to upload and visualize data, but also to download the data that exists on the sites. This is very useful, as it provides badly needed access to a huge variety of data.
Both sites also allow users to comment on graphs, though this is handled better in Many Eyes. Since you can interact with the visualization, you can include a snapshot of what you were looking at when you submitted the comment. That snapshot then becomes a link that will take another user to exactly the same configuration of the visualization, even including search terms (e.g., in a treemap).
There are also forums on Many Eyes, though they require a separate login name, and as of writing this they mostly contain spam. Well-integrated forums would do wonders for both of these sites, though they obviously require a lot of work. More structured/threaded comments would perhaps be a start, like the ones in Drupal.
Existing visualizations can be changed easily in Swivel. There is an Edit This button that allows you to change the style (colors, etc.) and graph types, and a Compare function that allows you to add other data. Both allow you to save the result as a new graph (or as the same one, if you made it yourself). I don’t really understand why these are separate functions, since adding new data might require changing some of the visualization styles to make sense of it. But these are the functions that make it easy to create new things from existing graphs and data, and thus really take advantage of what has been done before. Many Eyes is missing such a function, and only partially makes up for it through interaction and snapshots. Adding data to an existing visualization requires the user to completely regenerate it from scratch first. To throw out another buzzword, that is not very enabling.
Swivel also has tags and graphs can be rated by users. Both are kind of obvious features of Web 2.0 sites, and their lack (at least of the tagging) is a definite loss for Many Eyes. Tags provide easy and democratic means of navigation, and are basically a must for a highly interactive website. Ratings are not nearly as useful or necessary, but they provide the mechanisms for the ever popular “highest rated” lists.
The interactivity and live snapshot ideas of Many Eyes combined with the editing and comparison features of Swivel plus a well-integrated forum would make for one killer of a data analysis website.
Many Eyes’ use of Java is understandable from an academic perspective, and also from the fact that this is done by IBM (which has done a lot more for Java than Sun). But it has also been called a “brave decision” by a user on the forums, and for a good reason. As great as Java is for programming, its use in browsers is becoming a nightmare. And while having Java installed could be taken for granted a few years ago, this is no longer the case. Microsoft has done a lot of work towards that end, and users have gotten used to seeing interactive stuff in Flash, or using AJAX, rather than Java. The makers of Many Eyes also mean it when they say that on the Mac, the site is “best viewed in Safari”: the mouse coordinates are off by quite a bit in Firefox and Camino, which is odd (I haven’t seen that before).
But of course Java has one great advantage: interaction. The interactive features of Many Eyes are wonderful, and would be a major nightmare to code in AJAX or Flash. There is also a lot of existing code in Java that its makers are undoubtedly using, and so they were able to launch the site much faster. Another aspect that should not be ignored is server load. Having the client draw the images means a lot less server power is necessary to keep the site going.
Still, I think that Many Eyes will need to consider using either AJAX or Flash or be limited to a fairly small (and shrinking) potential audience because of its use of Java. And that would be a real shame.
Swivel uses an open-source graphing library called Ploticus, which produces static graphics and also has the ability to generate image maps for interaction. They seem to be very keen on not coding their own graphics, which strikes me as a strange limitation. 2D InfoVis is not exactly rocket science, and there is a lot of good literature out there on visualization techniques that are described well enough for a competent programmer to implement in a very short time. Swivel acknowledges that Many Eyes has a larger variety of graphical capabilities (they call it “Tufte on Steroids”), which seems to indicate that they will want to stick with line and bar charts.
Many Eyes’ makers have a much broader horizon on visualization, and quite obviously do all the programming themselves. This leads not only to a much greater variety of visualization techniques and interaction (see below), but also to the prospect of an expanding set of visualizations, and the implementation of cutting-edge work.
Swivel offers a total of four visualization techniques: vertical and horizontal bar charts, scatterplots, and line charts. The data can be shown as absolute value, difference from the average (in the data set), and as a percentage of the largest value. The user can change the colors of the background and lines, which is a necessary feature to keep all the graphs from looking the same.
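To make the three display modes concrete, here is a minimal sketch of the two derived ones (absolute values need no transformation). The function names and the sample numbers are made up for illustration, this is not Swivel's code, and the percentage version assumes the largest value is positive.

```python
from statistics import mean


def as_difference_from_average(values):
    """Each value minus the mean of the data set."""
    avg = mean(values)
    return [v - avg for v in values]


def as_percentage_of_largest(values):
    """Each value as a percentage of the largest value (assumed > 0)."""
    largest = max(values)
    return [100.0 * v / largest for v in values]


# Made-up sample data, not taken from either site.
sales = [20.0, 50.0, 80.0]
print(as_difference_from_average(sales))  # [-30.0, 0.0, 30.0]
print(as_percentage_of_largest(sales))    # [25.0, 62.5, 100.0]
```

Note that both derived views discard the original units, which matters when several data dimensions are combined in one chart.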
In contrast, Many Eyes offers more than a dozen visualization types, and that’s not even counting bar charts twice. There are zoomable maps of the world and the US (for showing states); standard line graphs, stacked line graphs, and line graphs for categories (which can be hierarchical); bar charts, block histograms, and bubble charts; scatterplots and network diagrams; pie charts, treemaps and change treemaps (which are really just a different mapping of data onto treemaps). Many Eyes’ classification of visualization options is simple but very useful, and a great guide to show users what to try out to answer specific questions.
All of Many Eyes’ visualizations are interactive, allowing the user to query exact numbers, zoom, etc. This is especially useful (and necessary) in the case of maps and treemaps. The ability to create live snapshots for comments is also an excellent feature that is worth mentioning again.
The resulting graphs look much nicer on Many Eyes, and are also easier to read. The legends in Swivel’s graphs are badly laid out (why not align them?), and the color spots next to the descriptions are too small to be clearly seen and matched with the colors of the lines.
In addition to the visualization, Swivel does some data analysis. It calculates a measure of correlation for all possible pairs of data dimensions present in the graph, and shows them as bar graphs. As with the automatic creation of graphs from data, this feature creates a lot of unnecessary and often nonsensical information. If the correlations were displayed in a matrix rather than as individual bars (i.e., using some fairly straightforward information visualization), the presentation would make more sense and use up a lot less space (for n data dimensions, they calculate correlations for all n*(n-1)/2 pairs, i.e., for seven curves this makes 21 correlations). Swivel warns the users that correlation does not equal causation, which is a good thing.
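The pair count and the matrix layout can be sketched as follows. This assumes the Pearson coefficient as the correlation measure, which is my choice for illustration (the measure Swivel actually uses is not stated here), and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_count(n):
    """Number of distinct pairs among n data dimensions: n*(n-1)/2."""
    return n * (n - 1) // 2


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(series):
    """Symmetric matrix (dict of dicts) with one cell per pair of series."""
    names = list(series)
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        r = pearson(series[a], series[b])
        matrix[a][b] = r
        matrix[b][a] = r
    return matrix


print(pair_count(7))  # 21, as for the seven curves mentioned above
```

A matrix like this shows every pair in one compact grid of n by n cells, rather than as n*(n-1)/2 separate bars.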
What Many Eyes is missing is the ability to mix and match (or “mash up”) parts of different data sets in one visualization, rather than drawing on just one. The navigation for this is a bit clumsy in Swivel, but the ability to put the data into a different context and compare it to other data makes the site a lot more interesting. This feature is a real requirement if users want to make more of the data they have uploaded than getting it visualized.
Both Swivel and Many Eyes handle the comparison of numbers surprisingly badly. When combining more than one data dimension, Swivel converts all numbers to percentages, and puts them on the same scale. This does not make sense in many cases, and makes it impossible to read the real numbers when an interesting development is found. Many Eyes does not convert the numbers, but can only handle one scale, which is also rather limiting. This is not some weird and unusual use case, but what real people do with real data all the time – and can easily do in Excel (and there are even clever stacking techniques). Both Swivel and Many Eyes should be able to handle two axes, and Many Eyes’ interactive features should make it easy to use more than two.
What is also missing from both sites is the ability to mix different graph types. This is especially useful when showing not just absolute values, but also rates of change. There are good reasons to show the values as bar charts and the change as lines, for example. And who knows, perhaps I want to add some more data as points? This should be fairly easy to do for Many Eyes, but may be a limitation of the Ploticus library Swivel is using.
While both sites can handle whole numbers and floats, Swivel also understands dates and handles them well. Many Eyes does not seem to recognize dates, but can deal with hierarchical categories and graphs.
In terms of capabilities, Many Eyes wins hands down. Of course, Swivel’s focus is a slightly different one, but they still need to provide some more useful techniques and cannot afford to leave out even pie charts. Treemaps are also making inroads into the world of business intelligence, so users who are familiar with data analysis tools will expect to see more than just the bare minimum in visualization. And besides, there are no data analysis capabilities other than the charts and the simple correlation calculation on Swivel yet.
The only big thing Many Eyes is missing is a way to combine data from different data sets into one visualization. This is a big one, though, that they will need to address soon.
Both sites seamlessly fit into the whole social bookmarking/blogging world. They provide HTML snippets to include graphs in blogs and on other websites, and Swivel also has a Digg It! link on every graph page.
Swivel’s embedding function allows for a lot of customization, including a “Sparkline” version (which is just a scaled down version of the graph, and not very useful – true sparklines would be great, though). Interestingly (and slightly unnervingly), Swivel embedded images are “alive”, i.e., the settings are not included in the code you paste into your page, but the images react to changes you make on the Swivel page – even after they have been embedded.
Many Eyes has a similar function that creates a very nice and compact thumbnail, but does not allow you to embed a larger snapshot of the visualization (at least not directly). The link simply points to the visualization, and does not take parameters or interactions into account (like the comment thumbnail directly on the page does). This seems to be the only category where Many Eyes wins in terms of simplicity, and where Swivel has a lot more options than are really needed.
Swivel also has topical charts, like one for Valentine’s Day and one on taxes right now. These make for easy link targets for blogs, and undoubtedly create a lot of hits on the site. Swivel’s style is also more like that of established social activity sites like Digg and Flickr than Many Eyes. In a word, Swivel is simply prettier, and looks more modern.
I had not expected to be writing about ethics in this review, but Swivel is doing a few things that deserve to be discussed in a bit more depth.
Among the featured graphs on Swivel’s page right now is the one below. It seems to show the number of people speaking a specific language at home, and their ability to speak English (in American households), and there is a strong correlation between the two. But while the blue bars show the percentage of people who think they speak English “very well”, the red bars simply show the rest. The strong correlation between these two numbers is therefore not very surprising, since there is a simple mathematical dependency between them. This is simply chart junk, and while Swivel can’t keep their users from producing meaningless charts, they are very unwise to promote this kind of nonsense on their front page.
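The dependency is easy to demonstrate. In the sketch below the percentages are invented, not taken from the chart, and the Pearson coefficient is my choice of correlation measure. Whenever one series is simply 100 minus the other, the coefficient comes out at (approximately, given floating-point rounding) -1, no matter what the data says.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented percentages of speakers who say they speak English "very well".
very_well = [30.0, 55.0, 72.0, 90.0]

# The second series is just the remainder, so it carries no new information.
the_rest = [100.0 - p for p in very_well]

r = pearson(very_well, the_rest)
print(r)  # a perfect negative correlation, up to rounding
```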
The second issue I have is their inflated numbers. Their claim of over a million graphs may be technically correct, but only a tiny fraction of these were generated by users, and most of them make no sense at all. I understand that they are trying to create a buzz, and that they are fighting for media attention among such established hype targets as MySpace and YouTube. But doing this with such questionable means cannot be in their best interest, and will hurt them in the long run.
To be fair, Swivel also scores some positive points in the ethics department. They provide the means to include a data citation and the URL the data came from directly in the graph, thus adding both weight (if the data comes from a reputable source, and not just somebody’s imagination) and credit to the original source(s). Many Eyes only shows that information on the graph’s web page, so it is missing when somebody posts a screenshot. Swivel also warns its users that correlation and causation are not the same thing.
Swivel needs to tone down the hype and grow up. They are not YouTube, and data visualization isn’t nearly as sexy and mass marketable as video sharing. They can only live on the Digg crowd for so long; at some point they will need to appeal to more sophisticated customers who are actually willing to spend money. Those will see through their marketing buzz and will demand more capabilities. Playing with data is great, but it will only take them so far.
Many Eyes needs to rethink Java. They undoubtedly have good reasons to use it, and in a perfect world they would be right. In this world, however, Java is more a stumbling block than an enabling technology on the browser end today, and this will severely limit their reach. One of their challenges will be to provide cutting-edge visualization tools in a way that people can actually use. Easier editing and better interoperability between their already large number of visualizations should make the service much more attractive. They should also present an idea of where they are trying to go with this: whether the site will exist for a longer time, or just disappear once IBM stops believing that anything can come of it.
So yes, there is a lot to criticize, and I am not shy about doing that. Both websites are a great start, but have a long way to go. It is great to see that social websites are not just used for sharing videos of sleeping kittens any more, but for collaboratively making sense of our world.