Over the last several years, data quality and observability have become hot topics. There is a huge array of solutions in the space (in no particular order, and certainly not exhaustive):
Regardless of their specific features, all of these tools have a similar goal: improve visibility of data quality issues, reduce the number of data incidents, and increase trust. Despite a lower barrier to entry, however, data quality programs remain difficult to implement successfully. I believe there are three pieces of low-hanging fruit that can improve your outcomes. Let’s dive in!
Hint 1: Focus on process failures, not bad records (when you can)
For engineering-minded folks, it can be a hard pill to swallow that some number of “bad” records will not only flow into your system but through your system, and that may be OK! Consider the following:
- Will the bad records flush out when corrected in the source…