The gold standard of randomized experiments

Image by the author.

Being able to establish causality is powerful. It gives you the right to use the word “because” in a conversation. Our sales increased because we changed the website layout. The crime rate dropped because of the newly introduced preventive policy. Pinpointing causal relations correctly is crucial for data-driven decision making, both in business, to optimize a company’s operations, and in government, to make sure our tax money is spent efficiently and policies work as intended. In this series of articles, I discuss four statistical tools that provide scientific grounds to say “because”.

Only establishing causality in a rigorous fashion gives you the right to use the word “because”.

The four methods for causality estimation we will look at are:

  • Randomized experiments
  • Instrumental variables [coming soon]
  • Regression discontinuity [coming soon]
  • Difference-in-differences [coming soon]

This first part of the series focuses on the gold standard in science: randomized experiments.

Correlation does not imply causation

You might have heard this before. The fact that two things co-occur does not mean that one of them causes the other. Just look at this infamous, striking correlation between the number of people who drowned by falling into a pool and the number of films Nicolas Cage appeared in each year.

Source: https://www.tylervigen.com/spurious-correlations (CC BY 4.0)

We feel in our gut that this correlation is spurious. Any causal relation between these two variables seems absurd. But that consideration aside, how would you tell what is causing what? Are Nicolas Cage movies so bad that people drown themselves after watching them? Could the drownings somehow make Cage release more films? Or maybe an external factor causes both: a poor economy could drive some people to drown themselves out of despair and force Cage to star in more movies to boost his income.
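To make the "external factor" scenario concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the variable names (`economy`, `movies`, `drownings`), the coefficients, and the noise levels are invented, not estimated from any real data. The two outcomes depend only on the shared factor, never on each other, yet they end up strongly correlated. Once the linear effect of the shared factor is removed from both, the correlation should largely vanish.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Hypothetical world: a weak economy raises both outcomes independently."""
    rng = random.Random(seed)
    economy = [rng.gauss(0, 1) for _ in range(n)]
    # Neither outcome appears in the other's equation: no causal link between them.
    movies = [-1.0 * e + rng.gauss(0, 0.5) for e in economy]
    drownings = [-1.0 * e + rng.gauss(0, 0.5) for e in economy]
    return economy, movies, drownings


economy, movies, drownings = simulate()

# Raw correlation: driven entirely by the shared factor.
raw_corr = pearson(movies, drownings)

# Correlation after removing the linear effect of the economy from both series.
adj_corr = pearson(residuals(movies, economy), residuals(drownings, economy))

print(f"raw correlation:      {raw_corr:.2f}")
print(f"adjusted for economy: {adj_corr:.2f}")
```

In real data the confounder is rarely known or measured this cleanly, which is why adjusting after the fact is not enough and more rigorous tools are needed.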

To answer causality-related questions, we need rigorous statistical tools. The simplest and most desirable of them is the randomized experiment.

Potential Outcome Model

Before jumping to randomized experiments, let me first introduce the framework used to analyze causality: the Potential Outcome Model. Most of its vocabulary comes from medical research. In fact, causality estimation is also known as treatment evaluation. This is because the early applications of the…

Continue reading: https://towardsdatascience.com/establishing-causality-part-1-49cb9230884c?source=rss----7f60cf5620c9---4

Source: towardsdatascience.com