“Statistics are like bikinis – what they reveal is interesting. What they hide is vital” – Aaron Levenstein
This is one of my favorite quotes, and whether through malice, poor training, or simple ignorance, “bad statistics” are running rife.
Not a day goes by when my social media feed is not flooded with data-driven memes, news broadcasters quoting mesmerizing infographics, and scientists with grand philosophical propositions, all of them using math to offer surprising conclusions. Yet when I delve in and look at the statistics a little more closely, I find that they rarely stand up to scrutiny.
Andrew Lang famously said: “Politicians use statistics in the same way that a drunk uses lamp-posts — for support rather than illumination.” Much like pictures and film clips, statistics are widely used and accepted stimuli to elicit emotions. (Yes. Again, it is all about feelings.) The misuse of statistics by governments is an ancient art, but this practice is not restricted to statesmen. I’ve learned that very clever propagandists and revered scientists – right or left – can almost always find a way to present the data that seems to support their case.
Even the best scientists and statisticians struggle to embrace the Growth Mindset and are bound to make mistakes – over and over again. Here are the greatest hits of research and statistical mistakes:
Even the almighty scientists are human and primarily driven by their emotions. A dash of Descartes’ Error, mix in some subconscious biases, and we have a very hard time sorting truth from viral nonsense. The quicker we accept this, the better. In research, bias occurs when “systematic errors are introduced into sampling or testing by selecting or encouraging one outcome or answer over others”. (Here is a chart of brain-busting cognitive biases.)
It is impossible to survey every single person on the planet to get their opinion on a subject, so researchers opt to select a sample of folks that ‘represent’ the overall population. Imagine a market survey about breakfast cereal consumption. Who should be surveyed? Mom, Dad, the kids? Dad might believe that he makes all the decisions, and Mom actually buys the food, but in most houses – the kids rule supreme.
A classic frame error occurred in the 1936 presidential election between Roosevelt and Landon. The sample frame was from car registrations and telephone directories. In 1936, many Americans did not own cars or telephones and those who did were largely Republicans. The results wrongly predicted a Republican victory. (Strange how history repeats itself).
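To see how a biased frame skews a result, here is a minimal Python sketch of the mechanism. The party split and the directory-ownership rates are invented for illustration — they are not the actual 1936 figures — but the logic mirrors the story: sample only from car and phone owners, and you poll a very different electorate than the one that votes.

```python
import random

random.seed(42)

# Hypothetical electorate: 60% Democratic, 40% Republican (illustrative numbers).
population = (["D"] * 60 + ["R"] * 40) * 1000  # 100,000 voters

# Frame error: suppose Republicans are far more likely to appear in car
# registrations and telephone directories (assumed rates, for illustration).
def in_frame(vote):
    return random.random() < (0.7 if vote == "R" else 0.2)

frame = [v for v in population if in_frame(v)]
sample = random.sample(frame, 2000)

true_r_share = population.count("R") / len(population)
sampled_r_share = sample.count("R") / len(sample)

print(f"True Republican share:     {true_r_share:.0%}")
print(f"Surveyed Republican share: {sampled_r_share:.0%}")
```

The survey comes back heavily Republican even though the underlying population is not — the frame, not the voters, produced the “landslide”.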
Let me ask you – have you stopped beating your wife? This seems like a straightforward question with only two answers – yes or no. But it presupposes that you have a wife and that you have beaten her prior to its asking. If you are unmarried or have never beaten your wife, the question traps you. A “loaded question”, like a loaded gun, is a dangerous thing. Questions should never be worded in a way that sways the respondent to one side of the argument or fails to accurately reflect their opinion or situation.
Extrapolation and Overgeneralisation
Statisticians get all hot and bothered when they spot a trend. For example: if 100% of the apples observed in summer are red, the assertion “all apples are red” would be an instance of overgeneralization. Statistically speaking, all the observed apples are indeed red; however, Granny Smith would probably disagree that her apples are red as well. A real-world example of overgeneralization is the fallacy of modern polling techniques – my US brethren should be all too familiar with this one.
Cherry Picking (Discarding unfavorable data or worse “fudging the data”)
Whether committed intentionally or unintentionally, this fallacy is a major problem in public debate. It is the result of confirmation bias – a tendency very deeply ingrained in the human species – to seek out confirmation of one’s beliefs and values. A telltale sign that you’re on the receiving end of cherry picking is the phrase “this proves that…”. Cherry picking also fits nicely with the next point.
Correlation does not equal Causation
If the number of people buying ice cream at the beach is statistically related to the number of people who drown at the beach, then proclaiming that the love for Häagen-Dazs causes drowning is ludicrous. The first thing that you learn in Statistics class is that correlation does not equal causation. Unfortunately, not all correlation/causation mistakes are as obvious as my love for ice cream. How Stuff Works (article link listed below) lists 10 famous correlations over the last century and no correlation vs causation conversation will ever be complete without discussing the Vaccination Debate.
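A toy model makes the ice cream example concrete. In the sketch below — all numbers invented for illustration — both series are generated from temperature alone; ice cream sales and drownings never influence each other, yet they still correlate strongly because they share a common cause.

```python
import random
import statistics

random.seed(0)

# Hypothetical daily data: temperature drives BOTH series; neither causes the other.
days = 365
temp = [random.gauss(20, 8) for _ in range(days)]
ice_cream = [50 + 10 * t + random.gauss(0, 30) for t in temp]          # sales
drownings = [max(0.0, 0.2 * t + random.gauss(0, 1)) for t in temp]    # incidents

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(ice_cream, drownings)
print(f"Ice cream vs drownings: r = {r:.2f}")
# Strong correlation, but the only causal driver in this model is temperature.
```

The confounder (temperature) does all the work; the correlation between the two downstream variables is real, and still tells you nothing about causation between them.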
However, not all “bad statistics” are drenched in malicious intent. Here are a few ways that they are spawned to life:
- The source is a subject matter expert but not a statistics expert – Results are misinterpreted.
- The source is a statistician, not a subject matter expert – Numbers change as reality does, and CONTEXT MATTERS!
- The subject being studied is not well defined – Shitty samples that do not represent the population.
- Data quality is poor – Garbage in, garbage out.
- Mixed or contradicting motives – If the facts are not “newsworthy” (which may require exaggeration) they may not be published. The motives of advertisers are even more mixed.
All of this being said, statistics is a useful tool for understanding the patterns in the world around us. But our intuition often lets us down when it comes to interpreting those patterns. I honestly do not believe this is intentional and Hanlon’s razor reminds us that we should not “assume bad intentions over neglect and misunderstanding.”
In the ultimate war to save humanity and combat misunderstanding, no mental Kung Fu arsenal should be without a skeptical statistician’s reference guide.
A FIELD GUIDE TO HINKY STATISTICS
GUIDELINE 1 – Do a little bit of Math:
Some mistakes are glaringly obvious, and thanks to Fox News, here is a fantastic example.
GUIDELINE 2 – 83% of Statistics are made up on the spot.
I am not even joking.
GUIDELINE 3 – Averages are quite… well average.
This is so stupidly obvious when you think about it, but a trap we all fall for. For instance, one study shows that for every 100 Americans there are 88 guns, which could lead someone to reasonably assume that it’s hard to find an American who isn’t packing heat. In reality, gun ownership is highly concentrated – a minority of households own most of the guns – so the average says almost nothing about the typical person. When you hear “on average”, walk away.
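A quick sketch shows why “88 guns per 100 Americans” proves nothing about the typical American. The distribution below is hypothetical, chosen only so the arithmetic matches the headline figure: most people own zero, a few collectors own many.

```python
import statistics

# Hypothetical guns owned by each of 100 Americans (illustrative, not survey data):
# 70 own none, 12 own one, 8 own two, 5 own four, 5 own eight.
guns = [0] * 70 + [1] * 12 + [2] * 8 + [4] * 5 + [8] * 5

mean_guns = statistics.fmean(guns)      # the headline "per capita" number
median_guns = statistics.median(guns)   # what the typical person owns
owners = sum(1 for g in guns if g > 0)

print(f"Guns per 100 people (mean): {mean_guns * 100:.0f}")
print(f"Median guns per person:     {median_guns}")
print(f"People who own any gun:     {owners} of {len(guns)}")
```

Same “88 guns per 100 people” average, yet the median owner has zero guns and most people are unarmed — the mean is dragged up by a small heavily-armed minority.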
GUIDELINE 4 – Be wary of those who come bearing pretty pictures
A picture is worth a thousand words and graphs can be especially evil so here is what to look for:
- There’s no label on the Y axis
- The scales are all wrong
- It lacks context.
These data points (such as they are) only suggest WHAT happens, not WHY it happens.
GUIDELINE 5 – Look for the words “This Proves”… ask yourself – who is doing your thinking for you?
If a researcher, the media, or, I dare say, a neuroscientist makes grand statements and backs them up with statistics – ask them to SHOW YOU THE DATA! How was the data gathered? When was it collected? Who collected it? Who was asked? Who paid for the research? These are great follow-up questions to test your hinky meter.
GUIDELINE 6 – Be a skeptic.
And most importantly: Follow the money.
Samuel Clemens (aka Mark Twain) famously said: “Figures don’t lie, but liars figure.” Always remember – as humans, our behavior is influenced by the way we are incentivised.
Learn, Unlearn and Relearn? Here are some additional resources: