Robust Statistics for Data Scientists Part 1: Resilient Measures of Central Tendency and Dispersions | by Alessandro Tomassini | Jan, 2024

Building a foundation: understanding and applying robust measures in data analysis

Image generated with DALL-E

Statistics plays a central role in Data Science, bridging raw data and actionable insights. However, not all statistical methods are created equal, especially when faced with the harsh realities of messy real-world data. This is where robust statistics comes in: a subfield designed to withstand the data anomalies that often throw traditional statistical methods off course.

While classical statistical methods have served us well, their susceptibility to outliers and extreme values can lead to misleading conclusions. Enter robust statistics, which aims to provide more reliable results under a wider variety of conditions. The goal is not to discard outliers without consideration, but to develop methods that are less sensitive to them.
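To make this sensitivity concrete, here is a minimal sketch using only Python's standard `statistics` module. The five values are made up, and `mad` is a hypothetical helper (not part of `statistics`) that computes the median absolute deviation, scaled by the conventional factor 1.4826 so that it is comparable to the standard deviation when the data are normally distributed.

```python
from statistics import mean, median, stdev

def mad(values, scale=1.4826):
    """Median absolute deviation, scaled so it estimates the standard
    deviation for normally distributed data (hypothetical helper)."""
    m = median(values)
    return scale * median(abs(v - m) for v in values)

clean = [10, 11, 12, 13, 14]
dirty = clean[:-1] + [140]  # made-up data-entry error: 14 typed as 140

for name, data in [("clean", clean), ("dirty", dirty)]:
    print(f"{name}: mean={mean(data):.1f} median={median(data):.1f} "
          f"stdev={stdev(data):.1f} MAD={mad(data):.1f}")
# prints:
# clean: mean=12.0 median=12.0 stdev=1.6 MAD=1.5
# dirty: mean=37.2 median=12.0 stdev=57.5 MAD=1.5
```

A single corrupted value triples the mean and inflates the standard deviation more than thirtyfold, while the median and MAD do not move at all.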

Robust statistics is grounded in the principle of resilience: constructing statistical methods that remain unaffected, or only minimally affected, by small deviations from the assumptions that traditional methods rely on. This resilience is crucial in real-world data analysis, where datasets that perfectly satisfy those assumptions are the exception, not the norm.

Key concepts in robust statistics include outliers, leverage points, and the breakdown point.
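The breakdown point is usually defined as the smallest fraction of observations that can be replaced by arbitrary values and drive an estimator arbitrarily far from its original value. For the sample mean a single bad point is enough (a fraction of 1/n), while the median tolerates contamination of just under half the sample. The sketch below illustrates this empirically, assuming 1e9 as a stand-in for "arbitrarily large" and using a hypothetical `contaminate` helper.

```python
from statistics import mean, median

def contaminate(values, k, bad_value=1e9):
    """Return a copy of `values` with its first k entries replaced by
    bad_value (hypothetical helper; 1e9 stands in for 'arbitrarily large')."""
    return [bad_value] * k + list(values)[k:]

data = list(range(1, 12))  # 11 well-behaved observations: 1, 2, ..., 11

for k in range(0, 7):
    corrupted = contaminate(data, k)
    print(f"k={k}: mean={mean(corrupted):.3g} median={median(corrupted):.3g}")
```

With 11 observations, the median stays within the original data range up to k = 5 and only jumps to 1e9 at k = 6, whereas the mean is already around 9e7 at k = 1.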

Outliers and Leverage Points

Outliers are data points that significantly deviate from the other observations in the dataset. Leverage points, particularly in the context of regression analysis, are outliers in the independent variable space that can excessively influence the fit of the model. In both cases, their presence can distort the results of classical statistical analyses.
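Leverage is commonly quantified by the diagonal of the hat matrix, h_i. In simple linear regression with an intercept it has the closed form h_i = 1/n + (x_i - x_bar)^2 / S_xx, where S_xx is the sum of squared deviations of x from its mean, so it depends only on the predictor values. The sketch below assumes that one-predictor setting and uses the common 2p/n rule of thumb (p = 2 parameters here) as a flagging threshold; the data and the `leverages` helper are illustrative, not from any library.

```python
from statistics import mean

def leverages(x):
    """Diagonal of the hat matrix for a simple linear regression y ~ a + b*x.

    With one predictor and an intercept, h_i = 1/n + (x_i - x_bar)^2 / S_xx,
    so leverage depends only on the predictor values, never on y.
    """
    n = len(x)
    x_bar = mean(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

# Made-up predictor values; the last one sits far from the rest.
x = [2, 3, 4, 5, 6, 7, 30]
h = leverages(x)

# Rule-of-thumb cutoff 2p/n, with p = 2 parameters (intercept and slope).
threshold = 2 * 2 / len(x)
for xi, hi in zip(x, h):
    flag = "  <-- high leverage" if hi > threshold else ""
    print(f"x={xi:>2}  h={hi:.3f}{flag}")
```

Only the point at x = 30 exceeds the threshold (h of about 0.97 against 4/7, about 0.57). The leverages always sum to p, here 2, which makes a handy sanity check.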

For instance, let’s consider a dataset where we measure the effect of hours spent studying on exam scores. An outlier might be a student who studied very little but scored exceptionally high, while a leverage point could be a student who studied an unusually high number of hours compared to their peers.
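A minimal sketch of how much a single such student can move an ordinary least squares (OLS) fit, using a hand-written, hypothetical `ols_fit` helper and only the standard library. The baseline data are invented and lie exactly on a line with a slope of 5 points per hour; the outlier (1 hour, 95 points) and the leverage point (40 hours, 60 points) are also invented, and the leverage point's score was deliberately chosen off the trend to show the pull of an extreme predictor value.

```python
def ols_fit(x, y):
    """Ordinary least squares for y ~ a + b*x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical toy data: hours studied vs. exam score (out of 100).
hours = [2, 3, 4, 5, 6, 7, 8]
scores = [50, 55, 60, 65, 70, 75, 80]  # exactly 5 points per extra hour

scenarios = {
    "baseline": (hours, scores),
    "with outlier (1 h, 95 pts)": (hours + [1], scores + [95]),
    "with leverage point (40 h, 60 pts)": (hours + [40], scores + [60]),
}
for label, (x, y) in scenarios.items():
    intercept, slope = ols_fit(x, y)
    print(f"{label:<36} slope={slope:6.2f} points/hour")
```

One extra student drags the slope from 5.00 to about 0.83 in the outlier case and to roughly zero (about -0.01) in the leverage case. In general, a high-leverage point whose response lies exactly on the existing OLS line leaves the fit unchanged: leverage measures potential influence, which becomes real distortion when the response disagrees with the trend.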
