If you desire a correct and reliable machine learning model with good performance, then testing is one of the essential practices to conduct, and if you are determined to learn about testing, you are at the right place. In this article, we explain the importance of testing through a practical example where we apply testing across the different steps of machine learning workflow. The entire codebase for this article is accessible in the associated repository.
Table of contents:
· 1. Introduction
· 2. Project setting
· 3. Code Testing
∘ 3.1. Unit testing
∘ 3.2. Integration testing
· 4. Data Testing
∘ 4.1. Data validation
∘ 4.2. Policy compliant
∘ 4.3. Features importance
· 5. Model Testing
· 6. Conclusion
Testing is defined as the process of evaluating an application system, code, or machine learning model to ensure its correctness, reliability, and performance. In MLOps, testing is one of the main principles that I consider it the second one to consider after version controlling when starting your machine learning projects. As version controlling and all the MLops principles, to ensure that we harness all the benefits, testing should be applied across the different steps of machine learning workflow, including data, Machine Learning model (ML model), and code.
Why testing? Testing your code, data, and models improves versioning by ensuring that the code changes is functioning correctly, automation by adding it to the automation pipeline, monitoring by detecting potential issues, reproducibility by ensuring that models can be reproduced consistently over time.
When to perform testing? Testing is a continuous process performed at various stages of the project life cycle: unit testing is performed during…