OPINION
Much of contemporary data science answers the question “What’s going on?” At my firm, for example, we often try to gauge how well a company is performing and how its performance indicators correlate with one another.
A more powerful question to answer is “Why is this happening?” For example, if we detect a significant correlation between the presence of women in management and a company’s revenues, which is the cause and which is the effect? Or, if people undergo a training program, does the program cause their performance to improve? Or are better performers simply more likely to sign up for training, so that the effect we see is only an artifact of selection bias?
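To make the selection-bias worry concrete, here is a minimal simulation sketch in Python. Everything in it is an illustrative assumption rather than data from a real program: the function name `simulate_naive_gap`, the enrollment probabilities, and the noise levels are all made up. The training is given a true effect of zero, yet a naive comparison of trained and untrained people still shows a gap whenever stronger performers are more likely to enroll.

```python
import random
import statistics


def simulate_naive_gap(n=20_000, true_effect=0.0, self_selection=True, seed=0):
    """Return mean(performance | trained) - mean(performance | untrained).

    Hypothetical setup: each person has a latent 'ability'. With
    self_selection=True, above-average people enroll far more often;
    with self_selection=False, enrollment is a coin flip.
    """
    rng = random.Random(seed)
    trained, untrained = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        if self_selection:
            # Assumed enrollment rates: 80% for above-average ability, 20% otherwise.
            p_enroll = 0.8 if ability > 0 else 0.2
        else:
            # Random assignment: enrollment is unrelated to ability.
            p_enroll = 0.5
        enrolls = rng.random() < p_enroll
        # Performance = ability + (training effect if enrolled) + noise.
        performance = ability + (true_effect if enrolls else 0.0) + rng.gauss(0.0, 1.0)
        (trained if enrolls else untrained).append(performance)
    return statistics.mean(trained) - statistics.mean(untrained)


biased_gap = simulate_naive_gap(self_selection=True)
random_gap = simulate_naive_gap(self_selection=False)
print(f"Naive gap with self-selection:    {biased_gap:.2f}")
print(f"Naive gap with random assignment: {random_gap:.2f}")
```

Under these assumptions the self-selected comparison shows a sizeable positive gap even though the training does nothing, while the randomly assigned comparison stays close to zero. The difference between the two is exactly the part of the "effect" that selection bias manufactures.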
Several approaches exist for identifying causal relationships in data. Propensity-Score Matching (PSM) is one of the older ones, having emerged around 40 years ago. Structural Equation Modeling took its modern form in roughly the same period, while Instrumental Variables date back several decades earlier. Causal inference remains a very active field, with new methods still being developed.
A key advantage of PSM is that it allows researchers to work with real-world data. In particular, it…