Plan–Code–Execute: Designing Agents That Create Their Own Tools



Most agentic architectures today focus on how multiple agents coordinate while selecting tools from a predefined toolbox. While effective, this design quietly assumes that the tools required for a task are known in advance.

Let’s challenge that assumption in this article.

Modern LLMs can already decompose complex goals into multi-step plans, reason about execution order, and generate correct, executable code. In other words, they don’t just use tools — they can create them.

This article explores using a plan–code–execute agentic architecture to build an end-to-end Explainability Agent, where tools are generated on demand, illustrated through a real-world scenario.

The architecture proposed here is easily adaptable to other scenarios as well.

The Scenario

We will be explaining the results of a Graph Neural Network (GNN) model for demand forecasting, which I explored in my previous article. We have the following data signals to explain:

  • The spatial features of the nodes (SKUs).
  • The temporal features (Sales Order, Production, etc.) added to the graph nodes.
  • Results from a graph neural network model trained to forecast demand for a SKU.
  • Results from a GNN trained to detect anomalies in production.

All of these generated a series of analyses, results and charts that need to be explained holistically, from a planner’s perspective, grounded in facts and domain understanding.

Explainability Agent Architecture

We will create a plan–code–execute agentic workflow. To have granular control, interpretability and flexibility in tool generation, we will build fully custom agents rather than working through the abstractions of one of the well-known orchestration frameworks. The following agents are created:

1. The Analyst Agent

We need an analyst agent to serve as the eyes of the application. Its job is to discover the environment (Raw Data, Model Results, Previous Outputs) that the rest of the application will use. It prevents hallucinations such as the planner agent assuming the sales file is named “sales.csv” when the correct file name is “Sales Order.csv” in the dataset.

Beyond just scanning what is there, it also opens the files and builds a dynamic schema. For instance, Sales Order.csv has columns [Date, Amount] and demand_vs_production.csv has [Predicted, Actual]. This schema forms the context for the planning agent.
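A minimal sketch of such a discovery step might look like the following (the helper name and return shape are illustrative assumptions, not the exact Analyst implementation):

```python
import csv
import os

def discover_schema(root_dir):
    """Walk the data folders and record each CSV's exact file name and columns.

    Returns a dict mapping relative file path -> list of column headers.
    This mapping is what grounds the Planner in real file names, so it
    cannot invent a 'sales.csv' when the file is 'Sales Order.csv'.
    """
    schema = {}
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.endswith(".csv"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, newline="", encoding="utf-8") as f:
                header = next(csv.reader(f), [])
            schema[os.path.relpath(path, root_dir)] = header
    return schema
```

The resulting dict is serialized into the Planner's prompt as the dataset context.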

2. The Planner

The Planner receives the user’s high-level goal and the Analyst’s schema. It functions as the Brain of the application. Its job is to decompose the request into a dependency graph of steps. To do that, it first infers intent from the user request. If the user asks for “Accuracy,” it knows to target the Forecast Results folder. If they ask for “Sales history,” it targets Raw Data. The output is a JSON plan whose steps clearly describe the task, each carrying a flag: CODE or TEXT. Tools are generated by the Coder agent only for the CODE steps.

Sample output: A JSON plan.

  • Step 1: Load Graph data and find neighbors of SOS001L12P. (CODE)
  • Step 2: Aggregate temporal sales vs production. (CODE)
  • Step 3: Calculate WAPE. (CODE)
  • Step 4: Prepare summary (TEXT)

3. The Coder

The Coder is the hands of the application, which does the bulk of the work. The Coder receives each step from the plan and the schema. For every plan CODE step, it writes a standalone, executable Python script.

It handles library imports (pandas, networkx), absolute paths, and error catching. It creates self-sufficient software, not just code snippets.
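A condensed sketch of this step is below; the prompt text and method names are illustrative assumptions, not the exact ones used in the project:

```python
import re

class Coder:
    """Turns a single CODE plan step into a standalone Python script."""

    def __init__(self, llm, schema_context):
        self.llm = llm                      # any client with generate_content()
        self.schema_context = schema_context

    def write_script(self, step, out_path):
        prompt = (
            "You are a senior Python engineer.\n"
            f"Available data (EXACT paths and columns):\n{self.schema_context}\n"
            f"Task: {step['description']}\n"
            "Write ONE complete, runnable script: all imports, absolute paths, "
            "try/except around file I/O, results saved to the output folder."
        )
        code = self.llm.generate_content(prompt)
        # Strip markdown fences if the model wrapped its answer in them
        code = re.sub(r"^```(?:python)?\n|```$", "", code.strip())
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(code)
        return out_path
```

Each CODE step therefore produces one self-contained script on disk, which the Executor can run in isolation.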

4. The Executor

The Executor runs the generated script in a controlled environment, capturing stdout and stderr. If a script fails (e.g., FileNotFoundError), the logs provide the feedback loop for potential self-correction (though, thanks to the grounding provided by the Analyst’s schema, this did not occur in testing).
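The Executor can be as small as a subprocess wrapper; a sketch (the timeout value is an assumption):

```python
import subprocess
import sys

def run_script(script_path, timeout=120):
    """Run a generated script in a separate process, capturing its output.

    Returns (ok, stdout, stderr). The stderr text is what would feed a
    self-correction loop when a script fails.
    """
    result = subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stdout, result.stderr
```

Running generated code out-of-process also isolates the orchestrator from crashes in any single tool.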

5. The Reporter

Finally, the Reporter reads the logs and artifacts (CSVs, PNGs) produced by the scripts and answers the user’s original question, in the tone, depth and persona required by the user. It functions as the Voice of the application.

The agents are called in sequence using a simple orchestration script (main.py). A config.py contains the path of the dataset folder containing the nodes and edges information, and to the results folders of the demand forecasting and production anomaly detection models. Each of the agents and the orchestration script is less than 100 lines of code. The LLM used is gemini-2.5-flash.
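A hypothetical shape of that orchestration loop (method names are illustrative, not the exact ones in main.py):

```python
def run_pipeline(user_goal, analyst, planner, coder, executor, reporter):
    """End-to-end plan-code-execute loop; TEXT steps skip the Coder."""
    schema = analyst.discover()                 # ground the planner in real files
    plan = planner.create_plan(user_goal, schema)
    logs = []
    for step in plan:
        if step["type"] == "CODE":
            script = coder.write_script(step, schema)
            logs.append(executor.run(script))   # captured stdout/stderr
        else:
            logs.append(step["description"])    # TEXT steps go straight to the report
    return reporter.summarize(user_goal, logs)
```

Because each agent only consumes the previous agent's output, any of them can be swapped or tested in isolation.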

The main error encountered during testing was the Coder agent referring to a data source that was not present (hallucination), for which I strengthened the Coder prompt to strictly follow the plan step context. The Analyst agent was also not part of the initial design; it was added as a grounding step before the Planner to prevent hallucination.

Here is the planner agent code for reference
import json
from ..llm.client import GeminiClient

class Planner:
    def __init__(self):
        self.llm = GeminiClient()

    def create_plan(self, user_prompt, dataset_context=""):
        """Decomposes user prompt into execution steps."""
        system_prompt = f"""
        You are a Senior Project Manager for a Data Science team.
        Your goal is to break down a high-level user request into a list of specific, executable steps.

        --- DATASET AVAILABLE (Use EXACT Filenames) ---
        {dataset_context}
        -----------------------------------------------

        --- DATASET GUIDELINES ---
        - The dataset contains both 'Raw Data' (Historical) and 'Model Results' (Predictions/Anomalies).
        - CAREFULLY infer which source is needed based on the user's goal.
        - Example: "Sales Analysis" -> Raw Data. "Forecast Accuracy" -> Model Results.

        Output format: JSON list of objects with fields:
        - step_id: int
        - name: str (short title)
        - description: str (detailed instruction for the coder)
        - type: "CODE" (if it requires writing a script) or "TEXT" (if just a summary)

        --- SCOPE RESTRICTION ---
        - DO NOT USE Machine Learning (LinearRegression, Training, Random Forest).
        - USE ONLY Statistical Analysis (Mean, Median, Std Dev, Z-Score, Rolling Averages).
        - FOCUS on Data Visualization (Line plots, Histograms).
        - If the user asks for "forecast" or "accuracy", PREFER WAPE (Weighted MAPE) over standard MAPE.

        Example:
        User: "Analyze SKU001"
        Output:
        [
            {{
                "step_id": 5,
                "name": "Detect Outliers using Z-Score",
                "description": "Calculate the Z-score for each data point in the production volume. Identify potential outliers by flagging data points where the absolute Z-score exceeds a predefined threshold (e.g., 2 or 3).",
                "type": "CODE"
            }}
        ]
        """

        full_prompt = f"{system_prompt}\n\nUser Request: {user_prompt}\nJSON Plan:"
        response = self.llm.generate_content(full_prompt)

        # basic cleanup for JSON markdown blocks
        clean_response = response.replace("```json", "").replace("```", "").strip()
        print(f"\n[DEBUG] Raw Planner Response:\n{clean_response}\n")  # Added for debugging
        try:
            return json.loads(clean_response)
        except json.JSONDecodeError:
            print(f"Failed to parse plan: {response}")
            return []

Experiment Results

We will focus on the end-to-end explainability for SKU SOS001L12P. This is the node with the maximum sales volume. Planners and management would be interested in understanding issues early in order to fulfill demand and maintain service quality.

We will build the explainability in 4 steps:

  • Analyse the shared relations of this SKU with its neighbors in the supply network.
  • Analyse the trend in sales volume.
  • Correlate with the production signal, note anomalies.
  • Create a comprehensive report combining with demand forecast quality to explain the health of this SKU.

Spatial Analysis

The goal set for the agent was:

“perform a thorough analysis of the spatial structure around node SOS001L12P and give me a visualization of the graph
edges with labels connecting this SKU with its neighbors”

The planner agent created a 5-step plan for loading node and edge data, extracting neighbors and edges attributes, preparing a graph visualization with labels and creating a report for the spatial structure.

Here is the plan
[
    {
        "step_id": 1,
        "name": "Load Node and Edge Data",
        "description": "Load 'Nodes.csv' to confirm 'SOS001L12P' as a valid node. Load all edge datasets: 'Edges (Plant).csv', 'Edges (Product Group).csv', 'Edges (Product Sub-Group).csv', and 'Edges (Storage Location).csv'. These files contain the structural information needed for spatial analysis.",
        "type": "CODE"
    },
    {
        "step_id": 2,
        "name": "Extract Direct Neighbors and Edge Attributes",
        "description": "Iterate through each loaded edge dataset. For each dataset, filter rows where 'SOS001L12P' is present in either the 'node1' or 'node2' column. For each matching record, identify the neighboring node and extract the specific edge attribute that defines the relationship (e.g., 'Plant' from 'Edges (Plant).csv', 'GroupCode' from 'Edges (Product Group).csv', 'SubGroupCode' from 'Edges (Product Sub-Group).csv', 'Storage Location' from 'Edges (Storage Location).csv'). Consolidate all unique neighbors and their corresponding labeled relationships into a single structure.",
        "type": "CODE"
    },
    {
        "step_id": 3,
        "name": "Prepare Graph for Visualization",
        "description": "Construct a graph object (e.g., using NetworkX) with 'SOS001L12P' as the central node. Add all identified direct neighbors as nodes and create edges between 'SOS001L12P' and each neighbor. Assign the extracted edge attributes (e.g., 'Plant: Plant A', 'GroupCode: GRP1') as labels to these edges for clarity in the visualization.",
        "type": "CODE"
    },
    {
        "step_id": 4,
        "name": "Visualize Graph Edges with Labels",
        "description": "Generate a clear graph visualization of the spatial structure around 'SOS001L12P'. The plot should display 'SOS001L12P' prominently in the center, with its direct neighbors connected by edges. Each edge must be explicitly labeled with the type of relationship and its specific attribute value (e.g., 'Plant: P001', 'Product Group: PG002'). Ensure appropriate layout and styling for readability.",
        "type": "CODE"
    },
    {
        "step_id": 5,
        "name": "Summarize Spatial Connectivity",
        "description": "Provide a textual summary of the spatial structure around 'SOS001L12P'. Describe the types of entities it is connected to (Plants, Product Groups, Product Sub-Groups, Storage Locations), list the specific neighbors identified, and briefly explain the nature of these connections based on the edge labels.",
        "type": "TEXT"
    }
]
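The core of step 2, filtering each edge file for rows touching the target SKU, reduces to something like the sketch below (a simplification of the generated tool, not its actual code; the `node1`/`node2` column names follow the plan):

```python
import pandas as pd

def extract_neighbors(edge_frames, target):
    """Collect (neighbor, relationship_type, value) triples for one node.

    edge_frames: dict mapping a relationship type (e.g. 'Plant') to a
    DataFrame with columns ['node1', 'node2', <attribute>].
    """
    records = []
    for rel_type, df in edge_frames.items():
        # The attribute column is whatever is not a node endpoint
        attr_col = [c for c in df.columns if c not in ("node1", "node2")][0]
        hits = df[(df["node1"] == target) | (df["node2"] == target)]
        for _, row in hits.iterrows():
            neighbor = row["node2"] if row["node1"] == target else row["node1"]
            records.append((neighbor, rel_type, row[attr_col]))
    return records
```

The consolidated triples are then fed directly into the NetworkX graph built in step 3.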

The coder agent created 4 tools, one for each of the CODE steps.

The primary output is the graph of this SKU’s relations with its neighbors, with all the relations labeled, such as shared plants, storage locations, product groups etc.

SKU and its relations

And a brief report is generated of the findings which notes:

“Neighbor Relationships: The analysis identified 39 distinct relationships connecting SOS001L12P to 39 unique neighboring nodes.”

Full report here
A thorough analysis of the spatial structure around node `SOS001L12P` has been completed.

**Analysis Findings:**
1.  **Node Validation:** The node `SOS001L12P` was confirmed as a valid node within the master data.
2.  **Neighbor Relationships:** The analysis identified 39 distinct relationships connecting `SOS001L12P` to 39 unique neighboring nodes. These relationships are categorized by `RelationshipType` and `RelationshipValue`:
    *   **Plant:** `SOS001L12P` is connected to multiple neighbors via various 'Plant' codes (e.g., 2103, 2111, 2112, 2114, 2115, 2116, 2117, 2118, 2119, 2120, 2121, 2122).
    *   **GroupCode:** `SOS001L12P` shares 'GroupCode' 'S' with neighbors like `SOS008L02P`, `SOS005L04P`, `SOS003L04P`, `SOS002L09P`, `SOS500M24P`, `SOS250M48P`, `SOP001L12P`, `SO0005L04P`, `SO0002L09P`, `SO0001L12P`, `SO0500M24P`, `SE500G24P`, and `SE200G24P`.
    *   **SubGroupCode:** `SOS001L12P` is linked by 'SubGroupCode' 'SOS' to nodes such as `SOS008L02P`, `SOS005L04P`, `SOS003L04P`, `SOS002L09P`, `SOS500M24P`, and `SOS250M48P`.
    *   **Storage Location:** `SOS001L12P` is associated with its neighbors through various 'Storage Location' codes (e.g., 330.0, 1130.0, 1230.0, 1430.0, 1530.0, 1630.0, 1730.0, 1830.0, 1930.0, 2030.0, 2130.0, 2230.0).

The full list of neighbors and their specific relationships can be found in `C:/Auto-agent/workspace/output\SOS001L12P_neighbors_relationships.csv`.

**Visualization:**
A graph visualization has been generated, centering on `SOS001L12P` and displaying all its direct neighbors and their connecting edges with labels indicating the `RelationshipType` and `RelationshipValue`. The resulting graph contains 40 nodes (including `SOS001L12P`) and 39 edges.

The visualization is available at: `C:/Auto-agent/workspace/output\SOS001L12P_neighbor_graph.png`

Analysis of Sales Order (Demand) signal

Since we are interested in the accuracy of the demand forecasting model, it is important to analyse the sales trends. The goal was set to:

“perform analysis of sales order temporal signal of SOS001L12P noting the trend, outliers and any abnormalities in volume trends”

The planner generated a 6-step plan to load and process sales data, create 7-day and 30-day rolling averages, calculate outliers, visualize trends and prepare a report.

The plan is here
[
    {
        "step_id": 1,
        "name": "Load Sales Order Data for SOS001L12P",
        "description": "Load the 'Sales Order.csv' file from C:\\AnomalyGNN\\dataset\\Raw Dataset\\Temporal Data\\Unit\\Sales Order.csv. Focus on the 'Date' and 'SOS001L12P' columns, treating 'SOS001L12P' as the sales volume.",
        "type": "CODE"
    },
    {
        "step_id": 2,
        "name": "Preprocess Temporal Data",
        "description": "Convert the 'Date' column to datetime objects and set it as the DataFrame index. Ensure the sales volume column (SOS001L12P) is numerical. Handle any missing values, potentially by filling with 0 or forward/backward fill if appropriate for sales data (specify chosen method).",
        "type": "CODE"
    },
    {
        "step_id": 3,
        "name": "Analyze Sales Volume Trends",
        "description": "Calculate 7-day and 30-day rolling averages for the 'SOS001L12P' sales volume. These will help identify short-term and long-term trends in the sales signal. Store these as new columns in the DataFrame.",
        "type": "CODE"
    },
    {
        "step_id": 4,
        "name": "Detect Outliers using Z-Score",
        "description": "Calculate the Z-score for the 'SOS001L12P' sales volume. Identify potential outliers by marking data points where the absolute Z-score exceeds a predefined threshold (e.g., 2 or 3 standard deviations). Store the Z-score and an 'Is_Outlier' boolean column.",
        "type": "CODE"
    },
    {
        "step_id": 5,
        "name": "Visualize Sales Order Temporal Signal",
        "description": "Create a time-series line plot showing the 'SOS001L12P' sales volume over time. Overlay the 7-day and 30-day rolling averages. Highlight the identified outliers on the plot to visually represent abnormalities in volume trends.",
        "type": "CODE"
    },
    {
        "step_id": 6,
        "name": "Summarize Trends and Abnormalities",
        "description": "Provide a textual summary of the observed trends (e.g., seasonality, growth/decline), the frequency and magnitude of identified outliers, and any other notable abnormalities or shifts in the sales volume patterns for SOS001L12P. This summary should interpret the visualizations and statistical measures.",
        "type": "TEXT"
    }
]
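Steps 3 and 4 of this plan amount to a few lines of statistics; a pure-Python sketch of the z-score flagging (the 2.0 threshold matches the generated report; the helper itself is illustrative):

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return (z_scores, is_outlier) for a series of sales volumes."""
    mu = mean(values)
    sigma = pstdev(values)
    # Guard against a constant series, where the z-score is undefined
    z = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return z, [abs(s) > threshold for s in z]
```

In the generated tool the same computation runs over the pandas column alongside the 7-day and 30-day rolling means.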

The coder agent created 5 tools, and the primary output of the pipeline is the following trend. The notable observation is a downward trend in the second half of the duration, which could be a cause for concern and needs to be investigated.

Sales volume trend

The analysis report also notes this declining trend.

Persistent Decline: The consistent decrease in both daily sales and the rolling averages (7-day and 30-day) from January to August clearly indicates a persistent decline in sales volume for SOS001L12P.

Here is the full report
**Analysis of SOS001L12P Sales Order Temporal Signal**

This analysis covers sales data for product SOS001L12P from January 1, 2023, to August 9, 2023.

**1. Volume Trend During the Duration:**
The sales volume for SOS001L12P shows a clear and persistent declining trend throughout the analyzed period.
*   In early January 2023, daily sales volumes were high, with individual days reaching up to 21,394 units. The 30-day rolling average was approximately 12,629 by January 10th.
*   By early August 2023, the sales volume had significantly decreased. The 30-day rolling average plummeted to approximately 5,165 by August 9th, indicating a substantial drop in consistent sales activity over the eight-month period.

**2. Persistent Declines and Other Issues:**
*   **Persistent Decline:** The consistent decrease in both daily sales and the rolling averages (7-day and 30-day) from January to August clearly indicates a persistent decline in sales volume for SOS001L12P.
*   **Anomalies:** The analysis identified 14 specific days classified as anomalies. These were characterized by unusually low sales volumes, either due to a Z-score below -2.0 or three consecutive days of negative percentage change. Notable severe drops include:
    *   January 4, 2023: Sales dropped to 10,920.08.
    *   March 8, 2023: Sales recorded an exceptionally low value of 200.00.
*   **Zero Sales:** Towards the end of the analyzed period, there were instances of zero sales recorded (e.g., August 8th and 9th), further exacerbating the declining trend and highlighting critical issues in sales performance.

Analysis of Production Signal

It is important to correlate the production anomalies report to see if there is a causal indication as to why the sales might be declining. To do this, we set the following goal for the agent:

“perform analysis of the production temporal signal of SOS001L12P, pointing out the trend, outliers and any abnormalities in volume trends”
A 7-step plan was generated, similar to the sales analysis, to calculate trends and outliers, but also to visualize the anomalies flagged by the anomaly detection model.

Here is the plan
[
    {
        "step_id": 1,
        "name": "Load Production Data for SOS001L12P",
        "description": "Load the 'Production .csv' file from 'C:\\AnomalyGNN\\dataset\\Raw Dataset\\Temporal Data\\Unit\\' into a pandas DataFrame. Select only the 'Date' and 'SOS001L12P' columns for analysis.",
        "type": "CODE"
    },
    {
        "step_id": 2,
        "name": "Clean and Prepare Production Data",
        "description": "Convert the 'Date' column to datetime objects and set it as the DataFrame index. Ensure the 'SOS001L12P' column is numeric. Handle any missing values in 'SOS001L12P' by filling with 0, assuming missing production data means zero units produced.",
        "type": "CODE"
    },
    {
        "step_id": 3,
        "name": "Calculate Rolling Averages for Trend",
        "description": "Compute a 7-day and a 30-day rolling average for the 'SOS001L12P' production volume. These rolling averages will help to visualize and identify short-term and long-term trends in the data.",
        "type": "CODE"
    },
    {
        "step_id": 4,
        "name": "Visualize Production Volume and Trends",
        "description": "Create a time series line plot showing the daily 'SOS001L12P' production volume. Overlay the 7-day and 30-day rolling averages on the same plot to visually represent the trend. Ensure proper labeling for axes and a clear title.",
        "type": "CODE"
    },
    {
        "step_id": 5,
        "name": "Detect Outliers using Z-Score",
        "description": "Calculate the Z-score for each data point in the 'SOS001L12P' production volume. Identify potential outliers by flagging data points where the absolute Z-score exceeds a predefined threshold (e.g., |Z-score| > 2.5 or 3). Store the dates and values of these outliers.",
        "type": "CODE"
    },
    {
        "step_id": 6,
        "name": "Visualize Outliers and Abnormalities",
        "description": "Generate a time series line plot of the 'SOS001L12P' production volume. On this plot, distinctly mark or color the data points that were identified as outliers in the previous step to highlight abnormalities in volume trends.",
        "type": "CODE"
    },
    {
        "step_id": 7,
        "name": "Summarize Trend, Outliers, and Abnormalities",
        "description": "Provide a textual summary of the observed overall trend (e.g., increasing, decreasing, stable, seasonal), describe the characteristics of the identified outliers (e.g., magnitude, frequency, specific dates), and comment on any notable abnormalities in the production volume patterns based on the visualizations and statistical analysis.",
        "type": "TEXT"
    }
]

Appropriate tools were created and executed, and outputs were produced. Following are the production trend chart and, more importantly, the chart showing a significant number of anomalies in the second half of the period, which might be impacting sales.

Production trend of SKU SOS001L12P
Anomaly chart showing significant anomalies in the second half

This is noted in the report generated:

“Despite the lack of Z-score outliers, a more sophisticated anomaly detection method **identified 17 anomalous data points** in the SOS001L12P production.”

Full report here
Here's an analysis of the production temporal signal for SOS001L12P:

**1. Data Overview:**
The production data for SOS001L12P was successfully loaded and preprocessed, covering 221 days from January 1, 2023, to August 9, 2023. The 'Date' column was correctly parsed, and the 'SOS001L12P' production volume column was confirmed to be numeric with no missing values.

**2. Trend Analysis:**
*   **Rolling Averages:** 7-day and 30-day rolling averages were computed to smooth out daily fluctuations and highlight underlying trends.
*   The plot `SOS001L12P_production_trends.png` (saved to `C://Auto-agent/workspace/output`) visually represents the daily production volume alongside these rolling averages, allowing for observation of the overall temporal trend.
*   Looking at the tail of the data, the 7-day and 30-day rolling averages show a general decline in production volume towards the end of the observed period (early August), with the 30-day average dropping from 5582.97 on August 7th to 5075.73 on August 9th, and the 7-day average dropping from 3724.71 to 2911.00 in the same period.

**3. Outliers (Z-score Method):**
*   Using a Z-score threshold of |Z-score| > 2.5, **no potential outliers were identified** in the SOS001L12P production data. This suggests that while there might be variations, no individual data points significantly deviated from the mean (7480.90) by more than 2.5 standard deviations (4729.55).

**4. Abnormalities in Volume Trends:**
*   Despite the lack of Z-score outliers, a more sophisticated anomaly detection method **identified 17 anomalous data points** in the SOS001L12P production.
*   These anomalies are visualized in the plot `SOS001L12P_production_with_anomalies_plot.png` (saved to `C://Auto-agent/workspace/output`), which would highlight specific dates where production volume exhibited unusual behavior compared to its typical pattern. A significant drop to 0 on August 9th, for instance, is likely to be one such anomaly, suggesting a halt or severe reduction in production.

End-to-End Supply Planner’s report

Now that we have performed spatio-temporal analysis on SKU SOS001L12P, the next logical step is to correlate the findings with the demand forecast model to see how it is adapting to the trends, and to prepare a full report on this SKU’s health that the planner can take to management.

To do this, we set the following goal:

“I am a supply chain planner. I would like you to consider the past analysis you did recently along with other data sources and create a end to end report on SOS001L12P. I noticed that the production is trending down in the second half of the duration as well as there are many anomalies during this period. Correlate the production with the demand forecast and comment on how our forecasting model is performing during the duration.”

The planner generated a plan to correlate production, sales and the demand forecast, evaluate the forecasting model’s performance using the primary metric WAPE (Weighted Absolute Percentage Error), and then prepare a comprehensive report.

Here is the plan
[
    {
        "step_id": 1,
        "name": "Load and Consolidate SOS001L12P Data",
        "description": "Load the following datasets for SOS001L12P: \n1. Production rolling averages: C:\\Auto-agent\\workspace\\output\\production analysis - 2\\production_rolling_averages_SOS001L12P.csv (columns: Date, SOS001L12P, SOS001L12P_7_day_avg, SOS001L12P_30_day_avg)\n2. Production Z-Scores: C:\\Auto-agent\\workspace\\output\\production analysis - 2\\SOS001L12P_production_z_scores.csv (columns: Date, Actual_Production, Z_Score)\n3. Anomaly detection results: C:\\AnomalyGNN\\results\\top_sku_SOS001L12P_predictions.csv (columns: Date, Is_Anomaly)\n4. Demand vs. Production Forecast: C:\\GNN\\results\\demand_vs_production_SOS001L12P.csv (columns: Date, Predicted_Demand, Actual_Sales, Actual_Production)\n\nMerge all these dataframes on the 'Date' column, ensuring 'Date' is converted to a datetime format for accurate time-series analysis.",
        "type": "CODE"
    },
    {
        "step_id": 2,
        "name": "Analyze and Visualize Production Trends and Anomalies",
        "description": "Create a time-series line plot showing the 'Actual_Production' for SOS001L12P, overlaid with 'SOS001L12P_7_day_avg' and 'SOS001L12P_30_day_avg'. Highlight points where 'Is_Anomaly' is True (from top_sku_SOS001L12P_predictions.csv) or where 'Z_Score' exceeds a certain threshold (e.g., |Z_Score| > 2 or 3) to visually confirm the user's observation of a downward trend and anomalies in the second half of the duration. Identify the start date of the 'second half' based on the data's time range.",
        "type": "CODE"
    },
    {
        "step_id": 3,
        "name": "Correlate Production, Sales, and Demand Forecast",
        "description": "Generate a single time-series line plot comparing 'Actual_Production', 'Actual_Sales', and 'Predicted_Demand' for SOS001L12P. Analyze the visual correlation and note periods of significant divergence or alignment, especially during the identified 'second half' where production anomalies and downward trends were observed.",
        "type": "CODE"
    },
    {
        "step_id": 4,
        "name": "Evaluate Forecasting Model Performance (WAPE)",
        "description": "Calculate the Weighted Average Percentage Error (WAPE) for the 'Predicted_Demand' against 'Actual_Sales' for SOS001L12P using the consolidated data. Calculate WAPE for the entire duration and separately for the 'second half' of the duration (as identified in Step 2) to assess how the model performed during the period of observed production issues. Present the WAPE values and interpret their meaning regarding forecast accuracy.",
        "type": "CODE"
    },
    {
        "step_id": 5,
        "name": "Generate Comprehensive End-to-End Report for SOS001L12P",
        "description": "Provide a narrative report summarizing all findings. \n1. Confirm the observed downward trend in production for SOS001L12P during the second half of the duration, citing specific dates or periods. \n2. Validate the user's observation of many anomalies, providing a count or highlighting key anomaly dates. \n3. Discuss the correlation between actual production, actual sales, and predicted demand, explaining how they interact, especially during the problematic second half. \n4. Comment on the forecasting model's performance based on the calculated WAPE values, noting any significant differences in accuracy between the overall duration and the 'second half'. \n5. Offer insights into potential reasons for observed trends or anomalies and their impact on supply chain planning.",
        "type": "TEXT"
    }
]
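The WAPE computation in step 4 is simple enough to state exactly; a sketch (the generated tool performs the same calculation on the merged DataFrame columns):

```python
def wape(actual, predicted):
    """Weighted Absolute Percentage Error, in percent.

    WAPE = sum(|actual - predicted|) / sum(|actual|) * 100
    Unlike MAPE, it does not blow up on near-zero actuals, which is why
    the planner prompt prefers it for demand data.
    """
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        raise ValueError("all actuals are zero; WAPE is undefined")
    return 100.0 * sum(abs(a - p) for a, p in zip(actual, predicted)) / denom
```

Computing it once over the full duration and once over the second-half slice gives the two figures compared in the report.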

The combined analysis produces a planner-ready assessment, which notes in its conclusion:

“For SKU SOS001L12P, the second half of the analyzed duration (April 21, 2023 – August 9, 2023) was characterized by increased volatility and a downward trend in Actual Production, accompanied by numerous anomalies. Actual Production during this period was also lower than Actual Sales. While the demand forecasting model effectively captured Actual Sales trends (WAPE of 29.70% in the second half, showing an improvement from the overall 35.44%), production struggled to align with these demands, leading to weak correlations between production and both sales and predicted demand. These divergences, often coinciding with identified production anomalies, suggest potential supply constraints or operational disruptions impacting our ability to meet demand consistently.”

The full report is here
Here is an end-to-end report for SKU SOS001L12P, considering your observations and the recent analyses.

---

### **End-to-End Report: SKU SOS001L12P Analysis (2023-01-01 to 2023-08-09)**

**1. Overview**
This report synthesizes recent production analysis, anomaly detection, and demand forecasting data for SKU SOS001L12P, covering the period from January 1, 2023, to August 9, 2023. The analysis specifically focuses on trends and model performance during the second half of this duration, starting approximately April 21, 2023.

**2. Production Trends and Anomalies**
The analysis confirms your observation regarding production trends in the second half of the duration (from 2023-04-21).

*   **Downward Trend and Volatility**: While the time series generally shows a positive visual correlation between Actual Production, Actual Sales, and Predicted Demand, Actual Production appears to be more volatile and exhibits sharper drops compared to Actual Sales and Predicted Demand in certain segments of the second half.
*   **Production vs. Sales**: In the second half, Actual Production (average 4866 units) was notably lower than Actual Sales (average 5136 units).
*   **Anomalies**: A total of 17 combined anomaly points (based on 'Is_Anomaly=True' or |Z_Score| > 2.5) were identified across the entire dataset. Several of these production anomalies were detected in the second half, specifically on dates such as 2023-04-21, 2023-05-05, 2023-05-21, 2023-05-22, and 2023-05-23. These anomaly periods often coincide with significant divergences where Actual Production deviates sharply from Actual Sales and Predicted Demand.

**3. Correlation: Production, Demand Forecast, and Sales**
The correlation analysis in the second half of the duration (from 2023-04-21) reveals distinct relationships:

*   **Strong Sales-Demand Correlation**: Actual Sales and Predicted Demand show a strong positive correlation of **0.85**. This suggests the forecasting model is generally effective in tracking actual sales patterns.
*   **Weak Production Correlation**: Actual Production's correlation with both Actual Sales and Predicted Demand is significantly weaker:
    *   Actual Production vs. Actual Sales: **0.35**
    *   Actual Production vs. Predicted Demand: **0.38**
*   **Divergence**: Despite a general visual correlation across the entire period, the second half exhibits noticeable periods where Actual Production diverges from Actual Sales and Predicted Demand. This is particularly evident during the observed anomalies, indicating that production output did not consistently align with market demand or sales in these instances.
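The correlation figures above come down to a pairwise Pearson correlation over the second-half window. A sketch of that computation follows; the column names and the synthetic data are assumptions for illustration, not the article's actual dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", "2023-08-09", freq="D")
sales = 5000 + rng.normal(0, 300, len(dates))
df = pd.DataFrame({
    "Date": dates,
    "Actual Sales": sales,
    # Predicted Demand tracks sales closely; Actual Production is noisier and unrelated
    "Predicted Demand": sales + rng.normal(0, 200, len(dates)),
    "Actual Production": 4900 + rng.normal(0, 600, len(dates)),
})

# Restrict to the "second half" of the duration, as the report does
second_half = df[df["Date"] >= "2023-04-21"]
corr = second_half[["Actual Sales", "Predicted Demand", "Actual Production"]].corr()
print(corr.round(2))
```

With data shaped this way, the sales-demand correlation lands high while production correlates weakly with both, qualitatively matching the 0.85 vs. 0.35/0.38 pattern reported above.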

**4. Demand Forecasting Model Performance**
The forecasting model's performance was assessed using the Weighted Average Percentage Error (WAPE):

*   **Overall Performance**: The overall WAPE for SOS001L12P across the entire duration is **35.44%**.
*   **Second Half Performance**: For the 'second half' of the duration (from 2023-04-21), the WAPE is **29.70%**.
*   **Interpretation**: The WAPE in the second half is lower than the overall WAPE. This indicates that, despite the observed production issues and anomalies during this period, the demand forecasting model's accuracy actually **improved** in the second half. The model's Predicted Demand consistently tracks Actual Sales closely, suggesting it is reasonably effective in capturing demand patterns for planning purposes.
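For reference, WAPE as used above is the sum of absolute forecast errors divided by the sum of actuals. A minimal implementation, assuming that standard definition:

```python
import numpy as np

def wape(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Weighted Average Percentage Error: sum(|actual - predicted|) / sum(|actual|), in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.abs(actual - predicted).sum() / np.abs(actual).sum() * 100)
```

Unlike MAPE, WAPE weights errors by volume, so high-volume days dominate the score, which is why it is a common choice for demand forecasting.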

**Conclusion**
For SKU SOS001L12P, the second half of the analyzed duration (April 21, 2023 - August 9, 2023) was characterized by increased volatility and a downward trend in Actual Production, accompanied by numerous anomalies. Actual Production during this period was also lower than Actual Sales. While the demand forecasting model effectively captured Actual Sales trends (WAPE of 29.70% in the second half, showing an improvement from the overall 35.44%), production struggled to align with these demands, leading to weak correlations between production and both sales and predicted demand. These divergences, often coinciding with identified production anomalies, suggest potential supply constraints or operational disruptions impacting our ability to meet demand consistently.

This is a production-grade report: every interpretation it draws is backed by concrete facts, figures, and KPIs.

And all of this was achieved with a completely flexible agentic framework that can adapt to any explainability goal desired.

Developing and testing this application, along with generating the above reports and results, cost less than $1 in token usage.

Conclusion

The future of agentic systems is not larger tool catalogs or more elaborate orchestration frameworks. It is agents that can decide what needs to exist in the first place.

When planning, coding, and execution are treated as reasoning steps, tools stop being static assets and become disposable artifacts: created, used, and discarded as the problem demands. This shifts agent design away from prompt engineering that binds agents and tools to an orchestration framework, and toward software generation as a core capability.

The plan–code–execute architecture demonstrates that production-grade, evidence-based analysis can emerge from small, transparent agents with minimal glue code and minimal cost. As LLMs continue to improve, the most valuable agents will be those that can autonomously turn intent into working software—without being told in advance how to do it.

In that sense, auto-planning and auto-coding agents are not just a convenience. They are a natural evolution of agentic systems.

For more on the GNN model for demand forecasting referred to in this article, read my article here.

Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

Reference

Azmine Toushik Wasi, MD Shafikul Islam, and Adipto Raihan Akib. SupplyGraph: A Benchmark Dataset for Supply Chain Planning Using Graph Neural Networks.

Images used in this article are generated using Google Gemini. Charts and underlying code created by me.
