By Brittany Davis, Head of Data at Narrator.ai
“Correlation does not equal causation”
I don’t like this phrase. Not because it’s inaccurate (of course it’s accurate), but because of how it’s used to disarm analysts. This one simple phrase can bring any analysis to a screeching halt as soon as the words are uttered.
When a stakeholder says, “Yes… but correlation does not equal causation,” that’s code for “Your insights aren’t good enough for me to act on.” In other words, it’s a showstopper.
This popular phrase has led decision makers to believe that they need a causal insight in order to make a decision with data. Yes, in a perfect world, we’d only act on causal insights. But in practice, this requirement isn’t reasonable. More often than not, when stakeholders require “causality” to make a decision, getting there takes so long that they lose patience and end up making the decision without any data at all.
Consider A/B testing, for example. This is the most common way teams tackle the requirement of causality today. But an A/B test is surprisingly difficult to execute correctly, as shown by the countless statisticians waving their hands trying to get us to acknowledge this fact (like this and this). The sad reality is that doing an A/B test right requires a lot of data, a flawless engineering implementation, and a high level of statistical rigor… so we end up releasing new features without valid results.
This happens all the time! But data teams are not doing themselves any favors by going through the motions of a half-baked attempt at proving causality, just to make a gut-based decision at the end of the day. We need to change the approach.
The downside to causality
The reality is that causality is very difficult to prove. Not only does it require a higher level of statistical rigor, it also requires A LOT of carefully collected data. This means you will have to wait a long time before you can make any causal claim. This is true for other causal inference methods too, not just A/B testing.
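To give a rough sense of how much data “A LOT” can mean, here is a minimal sketch of the textbook sample-size approximation for a two-sided, two-proportion A/B test. It is an illustration, not the author’s method. The function name, the 5% baseline conversion rate, the 10% relative lift, and the conventional alpha = 0.05 / power = 0.8 defaults are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in EACH arm of a two-sided, two-proportion test.

    Hypothetical helper using the standard normal-approximation formula;
    the inputs are illustrative assumptions, not figures from the article.
    """
    p1 = baseline_rate                       # control conversion rate
    p2 = baseline_rate * (1 + relative_lift)  # treatment rate we hope to detect
    p_bar = (p1 + p2) / 2                     # pooled rate under the null

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    # Round up: you can't enroll a fraction of a user.
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Detecting a 10% relative lift on a 5% baseline conversion rate.
    print(sample_size_per_arm(0.05, 0.10))
```

Under these assumptions the answer lands in the tens of thousands of users per arm, and because the required sample grows with the inverse square of the effect size, halving the lift you want to detect roughly quadruples it.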
Ultimately, causality is an impractical requirement when making decisions with data. So let’s stop trying and find another way. Let’s go back to using correlations.
I’m not suggesting a free-for-all. We don’t want to end up with ridiculous insights that are “technically correlated” but have no reasonable explanation, like these. I’m talking about using correlations in a business context to maximize our chances of making the “best”…
Continue reading: https://www.kdnuggets.com/2021/08/correlation-better-causation.html