Time is the most well-defined continuum in physics and, hence, in nature. It should come as no surprise, then, that continuity matters so much in time series datasets, which are chronological sequences of observations.
This concept alone drives the motivation behind this article. Real-world datasets are susceptible to missing values for various reasons, such as faulty sensors, failures in data ingestion, or simply the absence of information during a given time. That, however, doesn’t change the underlying nature of the data-generating process of your features.
Understanding what caused those interruptions, and then analyzing and handling them, is therefore paramount to any subsequent task on a time series dataset.
The Goal of this Article
After a comprehensive exploratory analysis of your time series, you might find that missing values are present to a considerable extent. By understanding the nature of your data, you should be able to tell apart two kinds of gap: one that represents missingness, where the underlying process kept running but its values were not recorded, and one that reflects an actual interruption of the process, which characterizes an intermittent series.
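As a starting point for that exploratory step, here is a minimal sketch of how gaps might be located with pandas. It assumes a series with a `DatetimeIndex` and a known, regular sampling interval. The helper name `summarize_gaps` and the toy hourly data are hypothetical and used only for illustration. Note that the code only finds where values are absent; deciding whether a gap is missingness or a genuine interruption still depends on what you know about the data.

```python
import numpy as np
import pandas as pd


def summarize_gaps(series: pd.Series, freq: pd.Timedelta) -> pd.DataFrame:
    """Return one row per run of consecutive missing timestamps.

    Hypothetical helper: assumes `series` has a DatetimeIndex and that
    observations are expected every `freq`.
    """
    # Reindex onto a regular grid so absent timestamps become explicit NaNs.
    full_index = pd.date_range(series.index.min(), series.index.max(), freq=freq)
    regular = series.reindex(full_index)
    is_missing = regular.isna()

    # A new run starts wherever the missing flag changes value.
    run_id = is_missing.astype(int).diff().ne(0).cumsum()

    # Group the missing timestamps by run and record each run's extent.
    timestamps = regular.index.to_series()
    gaps = (
        timestamps[is_missing]
        .groupby(run_id[is_missing])
        .agg(start="min", end="max", n_missing="count")
        .reset_index(drop=True)
    )
    return gaps


# Toy hourly series (illustrative data only): two rows are dropped
# entirely and one value is recorded as NaN.
hour = pd.Timedelta(hours=1)
index = pd.date_range("2024-01-01", periods=10, freq=hour)
observed = pd.Series(np.arange(10, dtype=float), index=index)
observed = observed.drop(index[[3, 4]])
observed.loc[index[7]] = np.nan

gaps = summarize_gaps(observed, freq=hour)
print(gaps)
```

Reindexing first matters because rows that are absent altogether would otherwise go unnoticed by `isna()`. The resulting table of gap starts, ends, and lengths can then be compared against known outages or periods of inactivity.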
This article will focus on the first scenario — analysis of missing values and methods to evaluate imputation results. While the actual techniques to perform imputation are many [1][2], I will elaborate on the…