When dealing with large amounts of data, we often use summary statistics like average, median, standard deviation, sum, etc. They’re useful precisely because they hide data: they reduce the amount of information we need to look at to get a sense of the data. But the same summary statistics can describe datasets that look vastly different.
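To make that concrete, here is a small sketch in Python. The two lists are made-up numbers chosen for illustration (they are not from the video or from any of the datasets linked here), and `summarize` is just a hypothetical helper name. It uses only the standard library's `statistics` module.

```python
from statistics import mean, median, pstdev

# Made-up illustrative values, not data from the video.
spread_out = [2, 4, 4, 4, 5, 5, 7, 9]    # values scattered around the middle
two_clusters = [3, 3, 3, 3, 7, 7, 7, 7]  # two tight groups, nothing in between


def summarize(values):
    """Return a few common summary statistics for a list of numbers."""
    return {
        "sum": sum(values),
        "mean": mean(values),
        "stdev": pstdev(values),  # population standard deviation
        "median": median(values),
    }


for name, values in [("spread_out", spread_out), ("two_clusters", two_clusters)]:
    print(name, summarize(values))
```

Both lists have a sum of 40, a mean of 5, and a population standard deviation of 2, yet one is a single spread-out group and the other is two separate clusters. Only the median differs slightly (4.5 versus 5), and none of these numbers tells you about the gap in the second list. A plot would.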
Things I cover in the video:
- Anscombe’s Quartet: https://eagereyes.org/criticism/anscombes-quartet
- Alberto Cairo’s DataSaurus: http://www.thefunctionalart.com/2016/08/download-datasaurus-never-trust-summary.html
- Justin Matejka and George Fitzmaurice’s awesome website about their CHI 2017 paper: https://www.autodeskresearch.com/publications/samestats
- The Jobless Rate for People Like You (requires Flash): https://archive.nytimes.com/www.nytimes.com/interactive/2009/11/06/business/economy/unemployment-lines.html
Let me know what you think! This is an experiment, and I want to know what people think works and what doesn’t. Please leave a comment below or on YouTube. And if you found it interesting, please consider subscribing on YouTube and giving it a 👍.