Deal with Missingness Like a Pro: Multivariate and Iterative Imputation Algorithms | by Gizem Kaya | Dec, 2024



Using LightGBM, kNN and autoencoders for imputation, and improving them further with the iterative method MICE (Multivariate Imputation by Chained Equations)

Real-world data is usually messy and needs careful preprocessing before it can be fed to any machine learning (ML) model. We almost always encounter null values in our datasets: entries that could have been highly valuable for our analysis or modelling had they been observed. We refer to this as missingness in the data.
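A natural first step is simply to measure how much is missing in each column. The sketch below does this with pandas on a small toy frame. The column names are borrowed from the wine data for illustration only, and the values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; column names only mimic the wine dataset
df = pd.DataFrame({
    "alcohol": [9.4, np.nan, 9.8, 10.5, np.nan],
    "pH": [3.51, 3.20, np.nan, 3.16, 3.30],
    "quality": [5, 5, 6, 6, 5],
})

# Count and percentage of missing entries per column
summary = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
})
print(summary)
```

Here `alcohol` is 40% missing, `pH` 20% and `quality` is fully observed.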

There can be various reasons behind missingness, such as a malfunctioning device, a non-mandatory field in an ERP system, or a survey question that does not apply to some participants. Depending on the reason, the nature of the missingness also varies. How to diagnose this nature is explained in detail in my previous article. This article focuses on how to handle missingness properly, so that neither deletion nor imputation introduces bias or throws away critical insights.
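To make the risk of naive handling concrete, here is a minimal sketch on made-up toy data. It contrasts listwise deletion, which discards every row with a gap, with univariate mean imputation, which keeps all rows but shrinks the variance of the filled column. These two baselines are shown only for contrast. They are not the multivariate methods this article is about.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values (illustration only)
df = pd.DataFrame({
    "alcohol": [9.4, np.nan, 9.8, 10.5, np.nan, 11.0],
    "pH": [3.51, 3.20, np.nan, 3.16, 3.30, 3.40],
})

# Option 1: listwise deletion keeps only fully observed rows
deleted = df.dropna()

# Option 2: mean imputation keeps every row but fills each gap with its column mean
imputed = df.fillna(df.mean())

print(f"rows kept after deletion: {len(deleted)} of {len(df)}")
print(f"alcohol variance: observed={df['alcohol'].var():.3f}, "
      f"mean-imputed={imputed['alcohol'].var():.3f}")
```

Deletion drops half of the rows here. Mean imputation keeps them all, but the imputed points add no spread around the mean, so the column's variance is understated. This is one reason to prefer methods that use the other columns to predict the missing values.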

This article uses the Red Wine Quality dataset from the UCI Machine Learning Repository [1]. It is an open-source dataset that can be downloaded through this link.

Understanding the nature of the missingness (MCAR, MAR or MNAR) is essential for choosing the correct handling method. If you need more background on this, I suggest reading my previous article first.
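The three mechanisms differ only in what the probability of a value being missing depends on. The following sketch simulates each one on two correlated synthetic variables. `x` is always observed and `y` may go missing. The logistic link for MAR and the top-20% cut-off for MNAR are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two correlated synthetic variables: x is always observed, y may go missing
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)

# MCAR: every y has the same 20% chance of being missing, regardless of the data
mcar_mask = rng.random(n) < 0.2

# MAR: the chance that y is missing depends only on the observed x (logistic link)
p_missing = 1 / (1 + np.exp(-2 * x))
mar_mask = rng.random(n) < p_missing

# MNAR (an extreme, deterministic example): y is missing whenever y itself is large
mnar_mask = y > np.quantile(y, 0.8)

for name, mask in [("MCAR", mcar_mask), ("MAR", mar_mask), ("MNAR", mnar_mask)]:
    print(f"{name}: {mask.mean():.1%} missing, "
          f"mean of observed y = {y[~mask].mean():.3f}, true mean = {y.mean():.3f}")
```

Under MCAR the observed mean of `y` stays close to the true mean. Under MAR and MNAR it is pulled down, because large values are missing more often. In the MAR case the bias can be corrected using `x`. In the MNAR case it cannot, because the cause of the missingness is itself unobserved.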
