Venn Diagrams

Venn diagrams are a great way to visualize the structure of set relationships. They’re also an example of a technique that works very well for a particular purpose, but that entirely fails outside its well-defined scope or when the number of sets gets too large.

The idea of the Venn diagram is simple: sets are shown as regions, typically circles. The inside of the circle represents elements of a particular set, the outside anything that is not in that set. A set might contain all dogs: anything inside the circle is a dog, anything outside is not a dog.

It gets more interesting when more sets are involved. The typical schoolbook example is of two sets and their potential interactions. Let’s say the left set in these images contains dogs, the right one black animals.

The left image shows set intersection: all A that are also B, i.e., all dogs that are also black. The right image shows set union: all things that are in at least one of the sets, i.e., all dogs and all black animals (including black dogs). Even without being familiar with set theory, it’s still easy to understand where the criteria overlap and where they don’t.
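The same two operations can be tried out with Python's built-in set type. This is only a minimal sketch, and the animal names are made up purely for illustration:

```python
# Hypothetical example data: the animal names are invented for illustration.
dogs = {"rex", "fido", "shadow"}                # set A: dogs
black_animals = {"shadow", "raven", "panther"}  # set B: black animals

# Intersection: elements that are in both A and B (black dogs).
black_dogs = dogs & black_animals

# Union: elements that are in at least one of the two sets.
dogs_or_black = dogs | black_animals

print(sorted(black_dogs))
print(sorted(dogs_or_black))
```

The overlap region of the diagram corresponds to `black_dogs`, and the whole shaded area of both circles corresponds to `dogs_or_black`.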

Slightly more complex operations are set difference and symmetric difference. The left image shows A subtracted from B, i.e., black animals that are not dogs. The right image includes all elements that are in either A or B (but not both), i.e., dogs or black animals, but not black dogs.
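The two operations in this paragraph also have direct counterparts in Python's set type. Again, this is a small sketch with invented animal names, not anything taken from the images:

```python
# Hypothetical example data: the animal names are invented for illustration.
dogs = {"rex", "fido", "shadow"}                # set A: dogs
black_animals = {"shadow", "raven", "panther"}  # set B: black animals

# Set difference (A subtracted from B): black animals that are not dogs.
black_non_dogs = black_animals - dogs

# Elements in exactly one of the two sets: dogs or black animals,
# but not black dogs.
either_not_both = dogs ^ black_animals

print(sorted(black_non_dogs))
print(sorted(either_not_both))
```

Note that the difference is not symmetric: `dogs - black_animals` would give the dogs that are not black instead.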

There are more set operations, and they are all easily explained using Venn diagrams. I imagine that many people think of Venn diagrams when they think of sets. That is not a bad thing as long as the limitations of the technique are understood. Many typical set problems are simple enough to be solved using Venn diagrams.

Limitation: Number of Sets

While Venn diagrams are great for two or even three sets, they very quickly break down when the number of sets goes beyond three. It’s not like people haven’t tried, though, with results ranging from pointless to downright silly.

Four sets are doable, though they already show the challenge that grows as more sets are added. The shapes of the intersections are very different, and it becomes easier to miss configurations. The simplicity and regular layout that made the two- and three-set diagrams useful are nowhere to be found.

The image below shows a version of the Venn diagram for six sets. Not only are most people unable to think in terms of all 64 possible combinations of six sets, but the diagram also does not provide much help.
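The number 64 comes from the fact that each element is either inside or outside each of the six sets, giving 2^6 combinations (including the one that lies outside all sets). A short sketch that enumerates them, using placeholder set names that are not part of the original diagram:

```python
from itertools import product

def venn_regions(names):
    """Return every in/out combination for the given set names.

    Each region is a tuple of booleans, one per set: True means
    "inside that set", False means "outside that set".
    """
    return list(product([False, True], repeat=len(names)))

for n in range(2, 7):
    names = [f"S{i}" for i in range(1, n + 1)]  # placeholder set names
    print(n, "sets ->", len(venn_regions(names)), "regions")
```

The count doubles with every added set, which is one way to see why the regular layout of the two- and three-set diagrams does not carry over.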

If it’s not possible in 2D, then maybe in three dimensions? This image is supposed to show some of the possible intersections of four sets. While it’s nice to look at, it should be obvious that it is futile to figure out which sets are included and which ones are not.

All visualization techniques break down at some point. In most cases, it is fairly obvious when it happens, but there is no hard number that clearly defines that point. Many factors, such as screen resolution, also have an impact. But in the case of Venn diagrams, that point is very clearly defined: two or three sets work perfectly well, anything above three sets is pointless.

Limitation: Sizes of Sets

Another piece of information Venn diagrams do not convey is the size of a set. While it is possible to imagine doing that, it typically does not work without serious distortions of the diagram. If the shapes have to be altered significantly to represent size correctly, different parts of the diagram are likely to end up with very different shapes, which makes them tough to compare. The Venn diagram simply isn’t able to perform this function in a reasonable way.

In the medical and bioinformatics literature, Venn diagrams are a popular way of showing different study conditions, sometimes with the intention of directly reflecting set sizes, sometimes with annotations. Rather than insist on Venn diagrams, it would be better to use alternatives, like I have shown in the past.
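One very simple option, which is not necessarily the alternative referred to above, is to skip the shapes entirely and list the size of each membership combination as a plain table. The sketch below assumes made-up study conditions and record IDs, and the `region_sizes` helper is a hypothetical name introduced here:

```python
from collections import Counter

def region_sizes(sets):
    """Count elements per exact membership combination.

    `sets` maps a set name to a Python set. The result maps a tuple of
    set names (the sets an element belongs to) to the number of
    elements with exactly that membership. Elements that belong to
    none of the sets are not known here and are therefore not counted.
    """
    names = sorted(sets)
    universe = set().union(*sets.values())
    counts = Counter()
    for element in universe:
        membership = tuple(n for n in names if element in sets[n])
        counts[membership] += 1
    return counts

# Hypothetical study conditions with made-up record IDs.
conditions = {
    "A": {1, 2, 3, 4, 5},
    "B": {4, 5, 6},
    "C": {5, 6, 7, 8},
}

for membership, size in sorted(region_sizes(conditions).items()):
    print(" & ".join(membership).ljust(10), size)
```

Each row corresponds to one region of the Venn diagram, but the sizes are read as numbers rather than judged from the areas of irregular shapes.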

Conclusions: Venn to Use, Venn Not to Use

Venn diagrams have their uses. They’re great for teaching basic set theory and they can help illustrate combinations of criteria, as long as there are no more than three. But it is equally important to be aware of the limitations, and to know when to look for alternatives.


All images from the Wikipedia page on Venn diagrams.

Comments

  1. Jim Vallandingham says

    Thanks for creating this nice review of an important topic.

    I’ve seen the 4-set version more than a few times, and I’ve always been struck by how unhelpful it is. Glad to see my opinion matches yours.

    An important lesson in the fact that just because something can be done, doesn’t mean it should be done.

  2. derek says

    Sometimes you can extend it to a fourth set, provided that the data you have completely exclude some relationships. When that happens, the four set diagram shows that fact stunningly well, but you have to be on the lookout for the data that give you the opportunity to use the diagram.

    Like the calculus I was taught at school, information visualisation is often a matter of being able to recognise problems as looking like other problems you’ve encountered before and know the solution to.

  3. Jon Peltier says

    The Venn diagram for six sets “does not provide much help.”

    Sure is pretty, though.

    Like so many other techniques, Venn diagrams work well within a narrow realm, and poorly outside, where they are used most often.

  4. derek says

    I see that with my naive talk of four or more sets being feasible, provided some of the combinations are empty sets, I’m describing Euler diagrams, which I hadn’t heard of before :-)

  5. T J Bate says

    Visokio Omniscope has an interactive Venn View that will go up to 5 subsets plus ‘outside’ records. Several innovative interactive business solutions have been implemented with this Venn View at the heart of the user filtering/query interaction. Omniscope is free to try:
    http://www.visokio.com/gala

  6. Matt says

    I created a Venn Diagram at work to show how we’re cleaning data. It was a single large circle, representing the population of data, then proportionately smaller circles representing the % of records needing cleaning, which then had a classic Venn Diagram inside it which showed how we were cleaning those records.

    From my perspective it was something very simple, but people love it because it’s familiar and easy-to-read. I love using Venn Diagrams because of those reasons, even when you’re dealing with a single set of data.

  7. Raphael says

    One thing I wonder about Venn (and related) diagrams:

    How effective are we at judging areas of complex shapes? Do they really help us evaluate the relative size of each of their components?

    I’m not entirely convinced they are better than a table of numbers, or a simple proportional area plot.
