Setting Up Automated Model Training Workflows with AWS S3 | by Khuyen Tran | Mar, 2024


The Open-Source Approach for Workflow Automation

Suppose you run an e-commerce platform and want to improve the personalization of your recommendations. Your data resides in S3.

To refine recommendations, you plan to retrain recommendation models using fresh customer interaction data whenever a new file is added to S3. But how exactly do you approach this task?
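To make "whenever a new file is added" concrete, here is a minimal sketch of one way to detect new files: compare the current list of object keys against the set already processed. The function names and the `retrain` callback are hypothetical and not taken from the article's code. In practice the listing could come from an S3 client call such as boto3's `list_objects_v2`.

```python
from typing import Callable, Iterable, List, Set


def find_new_keys(listed_keys: Iterable[str], seen_keys: Set[str]) -> List[str]:
    """Return keys that appear in the listing but have not been processed yet."""
    return sorted(set(listed_keys) - seen_keys)


def process_listing(
    listed_keys: Iterable[str],
    seen_keys: Set[str],
    retrain: Callable[[str], None],
) -> List[str]:
    """Call `retrain` once per new key and remember it as processed.

    `retrain` is a hypothetical callback standing in for the real
    model-training step.
    """
    new_keys = find_new_keys(listed_keys, seen_keys)
    for key in new_keys:
        retrain(key)
        seen_keys.add(key)  # mark as processed so it is not retrained again
    return new_keys
```

Calling `process_listing` repeatedly with the same `seen_keys` set triggers retraining only for files that arrived since the previous call.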

Unless otherwise noted, all images are by the author

Two common solutions to this problem are:

  1. AWS Lambda: A serverless compute service from AWS that runs code in response to events without requiring you to manage servers.
  2. Open-source orchestrators: Tools that automate, schedule, and monitor workflows and tasks, usually self-hosted.
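For comparison, the Lambda option might look roughly like the sketch below: a handler that reads the bucket and object key from an S3 event notification and hands them to a training step. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event notification format. `start_retraining` is a hypothetical placeholder, not part of any AWS API or of the article's code.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # skip records that do not describe an S3 object
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def start_retraining(bucket: str, key: str) -> str:
    """Hypothetical placeholder for the real retraining step."""
    return f"retraining requested for s3://{bucket}/{key}"


def lambda_handler(event: Dict[str, Any], context: Any) -> List[str]:
    """Entry point Lambda would invoke for each S3 notification."""
    return [start_retraining(b, k) for b, k in extract_s3_objects(event)]
```

Keeping the event parsing in its own function makes it possible to test the handler locally with a hand-written event dictionary.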

Using an open-source orchestrator offers advantages over AWS Lambda:

  • Cost-Effectiveness: Lambda bills by execution time, so long-running tasks such as model training can get expensive. Open-source orchestrators let you run on your own infrastructure, which can reduce costs.
  • Faster Iteration: Developing and testing workflows locally shortens the feedback loop, making it easier to debug and refine them.
  • Environment Control: Full control over the execution environment lets you work with the development tools and IDEs you prefer.
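As a rough illustration of the cost point, the sketch below checks whether a job fits within Lambda's maximum timeout of 900 seconds (15 minutes) and estimates the duration-based charge in GB-seconds. The price per GB-second is a parameter you must supply from the current AWS pricing page for your region. The function names are hypothetical, and per-request charges are not included.

```python
LAMBDA_MAX_TIMEOUT_SECONDS = 900  # Lambda's maximum timeout per invocation (15 minutes)


def fits_in_lambda(expected_seconds: float) -> bool:
    """Return True if a job of this length can finish within one invocation."""
    return expected_seconds <= LAMBDA_MAX_TIMEOUT_SECONDS


def estimate_duration_cost(
    duration_seconds: float,
    memory_mb: int,
    price_per_gb_second: float,
) -> float:
    """Estimate the duration-based charge for one invocation.

    Cost = seconds * configured memory in GB * price per GB-second.
    `price_per_gb_second` is an assumption supplied by the caller.
    """
    memory_gb = memory_mb / 1024
    return duration_seconds * memory_gb * price_per_gb_second
```

A one-hour training run, for example, would fail the `fits_in_lambda` check and would need to be split up or moved to other compute.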

While you could solve this problem with Apache Airflow, it requires a more involved infrastructure and deployment setup. We’ll therefore use Kestra, which offers an intuitive UI and can be launched with a single Docker command.

Feel free to play with and fork the source code of this article here:

This workflow consists of two main components: Python scripts and orchestration.
