Understanding Junctions (Chains, Forks, and Colliders) and the Role they Play in Causal Inference | by Graham Harrison | Jan, 2024



Explaining junctions using correlation, independence and regression to understand their critical importance in causal inference


Causal inference applies probability, visualisation, and machine learning to answer the question “why?”

It is a relatively new field of data science. It offers the potential to move beyond predictive algorithms, which address the symptoms of an underlying business problem, towards permanently curing the problem by establishing cause and effect.

Typically, a causal inference project starts with a dataset (like any other branch of data science) and then augments it with a visual representation of the cause-and-effect relationships between the data items. A common form of this visualisation is the Directed Acyclic Graph, or DAG.
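As a minimal sketch of what such a visual representation looks like in code, the snippet below builds a tiny DAG with `networkx`. The variable names are hypothetical placeholders, not taken from the article's example.

```python
import networkx as nx

# Hypothetical cause-and-effect relationships; each edge points
# from a cause to its effect.
dag = nx.DiGraph()
dag.add_edges_from([
    ("seasonality", "marketing_spend"),  # seasonality drives spend
    ("seasonality", "sales"),            # ...and also drives sales
    ("marketing_spend", "sales"),        # spend drives sales directly
])

# A valid causal diagram must contain no directed cycles.
print(nx.is_directed_acyclic_graph(dag))  # True
```

Any acyclic set of directed edges would serve equally well; the point is that the graph encodes assumed causal direction, not just association.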

DAGs look deceptively simple, but they hide a lot of complexity that must be fully understood before causal inference techniques can be applied effectively.

Even the most complex DAGs can be broken down into a collection of junctions, each of which follows one of just three patterns: a chain, a fork, or a collider. Once those patterns are understood, the more complex techniques can be built up and applied.
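The behaviour of the three junction patterns can be demonstrated with a short simulation, using correlation as the measure of association. The variable names and coefficients below are illustrative assumptions, not taken from the article's example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Chain: X -> Y -> Z. Influence flows along the chain,
# so X and Z are strongly correlated.
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
z = 2 * y + rng.normal(size=n)
print(np.corrcoef(x, z)[0, 1])   # strong positive correlation

# Fork: X <- W -> Z. X and Z share the common cause W,
# so they are correlated despite having no direct link.
w = rng.normal(size=n)
x2 = 2 * w + rng.normal(size=n)
z2 = 2 * w + rng.normal(size=n)
print(np.corrcoef(x2, z2)[0, 1])  # strong positive correlation

# Collider: X -> C <- Z. X and Z are independent...
x3 = rng.normal(size=n)
z3 = rng.normal(size=n)
c = x3 + z3 + rng.normal(size=n)
print(np.corrcoef(x3, z3)[0, 1])  # approximately zero

# ...until we condition on the collider C, e.g. by regressing
# each on C and correlating the residuals, which induces a
# spurious (here negative) association.
rx = x3 - np.polyfit(c, x3, 1)[0] * c
rz = z3 - np.polyfit(c, z3, 1)[0] * c
print(np.corrcoef(rx, rz)[0, 1])  # clearly negative
```

This is the practical importance of the patterns: conditioning on the middle of a chain or on a fork's common cause blocks association, while conditioning on a collider creates association where none existed.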

This article explains the three junction patterns in full, laying the foundations for the reader to understand more complex causal inference techniques in detail.

We are going to need an example DAG to explore. I have constructed the fictitious DAG below because it is sufficiently simple to explore the concepts effectively, yet sufficiently complex to contain all three types of junction …
