Causality, the field focused on understanding the relationships between cause and effect, seeks to answer critical questions such as ‘Why?’ and ‘What if?’. Understanding causality is crucial for everything from fighting climate change to our quest for happiness, including strategic decision making.
Examples of major questions requiring causal inference:
- What impact might banning fuel cars have on pollution?
- What are the causes behind the spread of certain health issues?
- Could reducing screen time lead to increased happiness?
- What is the Return On Investment of our ad campaign?
In what follows, I will mainly refer to two free e-books, both available with Python code and data to play with. The first offers quick overviews, while the second allows for a more in-depth exploration of the content.
1. Causal Inference for the Brave and True by Matheus Facure
2. Causal Inference: The Mixtape by Scott Cunningham
1.1 The fundamental problem of causal inference
Let’s dive into the most fundamental concept necessary to understand causal inference through a situation we might all be familiar with.
Imagine that you have been working on your computer all day long, a deadline is approaching, and you start to feel a headache coming on. You still have a few hours of work ahead, so you decide to take a pill. After a while, your headache is gone.
But then, you start questioning: Was it really the pill that made the difference? Or was it because you drank tea or took a break? The fascinating, but ultimately also frustrating, part is that it is impossible to answer this question, as all these effects are confounded.
The only way to know for certain if it was the pill that cured your headache would be to have two parallel worlds.
In one of the two worlds you take the pill, and in the other you don’t (or, ideally, you take a placebo). If you feel better in the world where you took the pill, you can attribute the improvement to the pill, as it is the only difference between the two worlds.
Unfortunately, we do not have access to parallel worlds to experiment with and assess causality. In the real world, many factors occur simultaneously, and their effects are confounded (e.g., taking a pill for a headache, drinking tea, and taking a break; increasing ad spending during peak sales seasons; assigning more police officers to areas with higher crime rates).
To quickly grasp this fundamental concept in more depth without requiring any additional technical knowledge, you can dive into the following article on Towards Data Science:
📚 Resource:
The Science and Art of Causality (part 1)
1.2 A little bit of formalization: Potential Outcomes
Now that you understand the basic idea, it is time to go further and theoretically formalize these concepts. The most common approach is the potential outcomes framework, which allows for the clear articulation of model assumptions. These are essential for specifying the problems and identifying the solutions.
The central notations used in this framework are:
- Yᵢ(0) represents the potential outcome of individual i without the treatment.
- Yᵢ(1) represents the potential outcome of individual i with the treatment.
Note that various notations are used. The reference to the treatment (1 or 0) may appear in parentheses (as used above), in superscript, or subscript. The letter “Y” refers to the outcome of interest, such as a binary variable that takes the value one if a headache is present and zero otherwise. The subscript “i” refers to the observed entity (e.g., a person, a lab rat, a city, etc.). Finally, the term ‘treatment’ refers to the ‘cause’ you are interested in (e.g., a pill, an advertisement, a policy, etc.).
Using this notation, we can refer to the fundamental problem of causal inference by stating that it is impossible to observe both Yᵢ(0) and Yᵢ(1) simultaneously. In other words, you never observe the outcome for the same individual with and without the treatment at the same time.
While we cannot identify the individual effect Yᵢ(1) − Yᵢ(0), we can measure the Average Treatment Effect (ATE): E[Yᵢ(1) − Yᵢ(0)]. However, a naive estimate of the ATE (comparing the average outcomes of the treated and untreated groups) is biased if there are systematic differences between the two groups other than the treatment.
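To make this concrete, here is a minimal simulation of the headache example. All the numbers (severity rates, cure probabilities, who takes the pill) are made up for illustration. Because it is a simulation, we can generate both potential outcomes for every individual, which is precisely what is impossible with real data, and compare the true ATE with the naive treated-versus-untreated difference under confounded treatment assignment:

```python
import random

random.seed(0)
n = 100_000

true_effects, treated_y, control_y = [], [], []
for _ in range(n):
    # Hypothetical confounder: severity of the headache.
    severe = random.random() < 0.5
    # Potential outcomes (1 = headache persists, 0 = headache gone).
    y0 = int(random.random() < (0.9 if severe else 0.3))  # Y(0): without the pill
    y1 = int(random.random() < (0.4 if severe else 0.1))  # Y(1): with the pill
    true_effects.append(y1 - y0)
    # Confounded assignment: severe headaches make taking the pill more likely.
    takes_pill = random.random() < (0.9 if severe else 0.2)
    if takes_pill:
        treated_y.append(y1)   # only Y(1) is observed for the treated
    else:
        control_y.append(y0)   # only Y(0) is observed for the untreated

# True ATE: requires both potential outcomes, so it is unobservable in practice.
ate = sum(true_effects) / n
# Naive estimate: difference in observed average outcomes between the groups.
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(f"true ATE ≈ {ate:.3f}, naive difference ≈ {naive:.3f}")
```

With these made-up numbers the pill strongly reduces headaches (true ATE ≈ −0.35), yet the naive comparison is close to zero: people with severe headaches both take the pill more often and recover less often, masking the pill’s benefit.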
To go beyond this short introduction, you can refer to the two following chapters:
📚 Resources:
1.3 Visual representation of causal links: Directed (Acyclic) Graphs
Visual representations are powerful tools for reducing mental load, clarifying assumptions, and facilitating communication. In causal inference, we use directed graphs. As the name suggests, these graphs depict various elements (e.g., headache, pill, tea) as nodes, connected by unidirectional arrows that illustrate the direction of causal relationships. (Note: I deliberately omit mentioning the common assumption of ‘Acyclicity’ associated with these graphs, as it goes beyond the scope of this overview but is discussed in the second reference available at the end of this subsection.)
Causal inference primarily differs from predictive inference in its reliance on assumed underlying causal relationships. These relationships are explicitly represented using this special kind of graph, called a Directed (Acyclic) Graph. Together with the potential outcomes framework, this tool is at the core of causal inference and allows us to think clearly about potential problems and, consequently, solutions for assessing causality.
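A directed graph is simple to represent in plain Python, and checking the ‘no cycles’ condition that makes it a Directed Acyclic Graph is a classic depth-first search. The graph below is a hypothetical sketch of the headache story; the edges are my own illustrative assumptions, not claims from the references:

```python
# Illustrative causal graph for the headache story, as an adjacency mapping.
# Edges are assumptions made for this example.
graph = {
    "deadline": ["headache", "break"],  # the deadline stresses you and prompts a break
    "pill": ["headache"],
    "tea": ["headache"],
    "break": ["headache"],
    "headache": [],  # the outcome node: no outgoing causal arrows here
}

def is_acyclic(g):
    """Return True if the directed graph g has no cycle (DFS back-edge test)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / done
    color = {node: WHITE for node in g}

    def visit(node):
        color[node] = GRAY
        for child in g.get(node, []):
            if color[child] == GRAY:  # edge back into the current path: a cycle
                return False
            if color[child] == WHITE and not visit(child):
                return False
        color[node] = BLACK
        return True

    return all(color[n] != WHITE or visit(n) for n in g)

print(is_acyclic(graph))  # this headache graph is acyclic, hence a DAG
```

Adding a reverse edge such as headache → deadline would create a cycle and make `is_acyclic` return False, which is exactly the situation the acyclicity assumption rules out.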