Building a LangGraph Agent from Scratch



LangGraph is a popular framework used for creating agents. As the name suggests, agents are constructed using graphs with nodes and edges.

Nodes represent the steps of the agent’s workflow: each node is a function that receives the current state and returns updates to it, so the state evolves over time. Edges define the control flow by specifying transition rules and conditions between nodes.
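Before turning to LangGraph itself, the core idea can be sketched in a few lines of plain Python (this is only an illustration of the concept, not LangGraph code): nodes are functions that update a shared state, and edges determine which node runs next.

```python
# Plain-Python sketch of the node/edge idea (not LangGraph itself).
# Nodes are functions that take the state and return updates;
# edges map each node to the next one to run.

def greet(state):
    return {"greeting": f"Hello, {state['name']}!"}

def shout(state):
    return {"greeting": state["greeting"].upper()}

nodes = {"greet": greet, "shout": shout}
edges = {"greet": "shout", "shout": None}  # None marks the end

def run(state, start="greet"):
    current = start
    while current is not None:
        state.update(nodes[current](state))  # merge the node's updates
        current = edges[current]
    return state

print(run({"name": "Ada"}))  # {'name': 'Ada', 'greeting': 'HELLO, ADA!'}
```

LangGraph generalizes exactly this loop: typed state, named nodes, and conditional transitions.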

To better understand LangGraph in practice, we will walk through a detailed example. While LangGraph might seem too verbose for the simple problem below, its benefits become much clearer on complex problems with large graphs.

First, we need to install the necessary libraries.

langgraph==1.0.5
langchain-community==0.4.1
jupyter==1.1.1
notebook==7.5.1
langchain[openai]

Then we import the necessary modules.

import os
from dotenv import load_dotenv
import json
import random
from pydantic import BaseModel
from typing import Optional, List, Dict, Any
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain.chat_models import init_chat_model
from langchain.tools import tool
from IPython.display import Image, display

We also need to create a .env file and add an OPENAI_API_KEY to it:

OPENAI_API_KEY=...

Then, with load_dotenv(), we can load the environment variables into the system.

load_dotenv()

Helper functions

The function below will be useful for us to visually display constructed graphs.

def display_graph(graph):
    return display(Image(graph.get_graph().draw_mermaid_png()))

Agent

Let us initialize an agent based on GPT-5-nano using a simple command:

llm = init_chat_model("openai:gpt-5-nano")

State

In our example, we will construct an agent capable of answering questions about soccer. Its thought process will be based on retrieved statistics about players.

To do that, we need to define a state. In our case, it will be an entity containing all the information an LLM needs about a player. To define a state, we need to write a class that inherits from pydantic.BaseModel:

class PlayerState(BaseModel):
    question: str
    selected_tools: Optional[List[str]] = None
    name: Optional[str] = None
    club: Optional[str] = None
    country: Optional[str] = None
    number: Optional[int] = None
    rating: Optional[int] = None
    goals: Optional[List[int]] = None
    minutes_played: Optional[List[int]] = None
    summary: Optional[str] = None

As execution moves between LangGraph nodes, each node takes the current PlayerState instance as input. Our task will be to define how exactly that state is processed at each node.
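A key convention here is that a node does not return a whole new state: it returns only the fields it updates, and LangGraph merges that partial update into the current state. Below is a minimal sketch of this merge behavior using a trimmed-down state; the explicit merge line only imitates what LangGraph does internally.

```python
from typing import Optional
from pydantic import BaseModel

class DemoState(BaseModel):  # illustrative mini-state, not the full PlayerState
    question: str
    name: Optional[str] = None
    rating: Optional[int] = None

def set_rating(state: DemoState) -> dict:
    # A node returns only the fields it updates, not the whole state.
    return {"rating": 92}

# Sketch of the merge LangGraph performs between nodes:
state = DemoState(question="How good is Haaland?", name="Haaland")
state = state.model_copy(update=set_rating(state))
print(state.rating)  # 92
```

This is why the node functions below return small dictionaries such as `{'rating': ...}` rather than full PlayerState objects.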

Tools

First, we will define some of the tools an agent can use. A tool can be roughly thought of as an additional function that an agent can call to retrieve the information needed to answer a user’s question.

To define a tool, we need to write a function with a @tool decorator. It is important to use clear parameter names and function docstrings, as the agent will consider them when deciding whether to call the tool based on the input context.

To make our examples simpler, we are going to use mock data instead of real data retrieved from external sources, which is usually the case for production applications.

In the first tool, we will return information about a player’s club and country by name.

@tool
def fetch_player_information_tool(name: str):
    """Contains information about the football club of a player and its country"""
    data = {
        'Haaland': {
            'club': 'Manchester City',
            'country': 'Norway'
        },
        'Kane': {
            'club': 'Bayern',
            'country': 'England'
        },
        'Lautaro': {
            'club': 'Inter',
            'country': 'Argentina'
        },
        'Ronaldo': {
            'club': 'Al-Nassr',
            'country': 'Portugal'
        }
    }
    if name in data:
        print(f"Returning player information: {data[name]}")
        return data[name]
    else:
        return {
            'club': 'unknown',
            'country': 'unknown'
        }

def fetch_player_information(state: PlayerState):
    return fetch_player_information_tool.invoke({'name': state.name})

You might be asking why we place a tool inside another function, which seems like over-engineering. In fact, these two functions have different responsibilities.

The function fetch_player_information() takes a state as a parameter and is compatible with the LangGraph framework. It extracts the name field and calls a tool that operates on the parameter level.

This design provides a clear separation of concerns and allows the same tool to be reused easily across multiple graph nodes.

Then we have an analogous function that retrieves a player’s jersey number:

@tool
def fetch_player_jersey_number_tool(name: str):
    "Returns player jersey number"
    data = {
        'Haaland': 9,
        'Kane': 9,
        'Lautaro': 10,
        'Ronaldo': 7
    }
    if name in data:
        print(f"Returning player number: {data[name]}")
        return {'number': data[name]}
    else:
        return {'number': 0}

def fetch_player_jersey_number(state: PlayerState):
    return fetch_player_jersey_number_tool.invoke({'name': state.name})

For the third tool, we will be fetching the player’s FIFA rating:

@tool
def fetch_player_rating_tool(name: str):
    "Returns player rating in the FIFA"
    data = {
        'Haaland': 92,
        'Kane': 89,
        'Lautaro': 88,
        'Ronaldo': 90
    }
    if name in data:
        print(f"Returning rating data: {data[name]}")
        return {'rating': data[name]}
    else:
        return {'rating': 0}

def fetch_player_rating(state: PlayerState):
    return fetch_player_rating_tool.invoke({'name': state.name})

Now, let us write several more graph node functions that retrieve external data. Unlike before, we will not label them as tools, so the agent will not decide whether to call them: they will always run as part of the graph.

def retrieve_goals(state: PlayerState):
    name = state.name
    data = {
        'Haaland': [25, 40, 28, 33, 36],
        'Kane': [33, 37, 41, 38, 29],
        'Lautaro': [19, 25, 27, 24, 25],
        'Ronaldo': [27, 32, 28, 30, 36]
    }
    if name in data:
        return {'goals': data[name]}
    else:
        return {'goals': [0]}

Here is a graph node that retrieves the number of minutes played over the last several seasons.

def retrieve_minutes_played(state: PlayerState):
    name = state.name
    data = {
        'Haaland': [2108, 3102, 3156, 2617, 2758],
        'Kane': [2924, 2850, 3133, 2784, 2680],
        'Lautaro': [2445, 2498, 2519, 2773],
        'Ronaldo': [3001, 2560, 2804, 2487, 2771]
    }
    if name in data:
        return {'minutes_played': data[name]}
    else:
        return {'minutes_played': [0]}

Below is a node that extracts a player’s name from a user question.

def extract_name(state: PlayerState):
    question = state.question
    prompt = f"""
You are a football name extractor assistant.
Your goal is to just extract a surname of a footballer in the following question.
User question: {question}
You have to just output a string containing one word - footballer surname.
    """
    response = llm.invoke([HumanMessage(content=prompt)]).content
    print(f"Player name: ", response)
    return {'name': response}

Now is the time when things get interesting. Do you remember the three tools we defined above? Thanks to them, we can now create a planner that will ask the agent to choose a specific tool to call based on the context of the situation:

def planner(state: PlayerState):
    question = state.question
    prompt = f"""
You are a football player summary assistant.
You have the following tools available: ['fetch_player_jersey_number', 'fetch_player_information', 'fetch_player_rating']
User question: {question}
Decide which tools are required to answer.
Return a JSON list of tool names, e.g. ['fetch_player_jersey_number', 'fetch_rating']
    """
    response = llm.invoke([HumanMessage(content=prompt)]).content
    try:
        selected_tools = json.loads(response)
    except:
        selected_tools = []
    return {'selected_tools': selected_tools}
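One practical caveat: LLMs sometimes return Python-style single-quoted lists or surround the answer with stray backticks, both of which json.loads() rejects. A more forgiving parser (a sketch, using ast.literal_eval as a fallback) could look like this:

```python
import ast
import json

def parse_tool_list(response: str) -> list:
    """Parse a list of tool names from an LLM reply, tolerating common quirks."""
    text = response.strip().strip("`")  # drop stray backticks the model may add
    for parser in (json.loads, ast.literal_eval):
        try:
            parsed = parser(text)
            if isinstance(parsed, list):
                return [str(t) for t in parsed]
        except (ValueError, SyntaxError):
            continue
    return []  # fall back to calling no tools

print(parse_tool_list('["fetch_player_rating"]'))  # ['fetch_player_rating']
print(parse_tool_list("['fetch_player_rating']"))  # ['fetch_player_rating']
print(parse_tool_list("no tools needed"))          # []
```

Swapping this in for the bare json.loads() call makes the planner more robust without changing its interface.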

In our case, we will ask the agent to create a summary of a soccer player. It will decide on its own which tools to call to retrieve additional data. Tool docstrings play an important role here: they give the agent additional context about what each tool does.

Below is our final graph node, which takes the fields retrieved in previous steps and calls the LLM to generate the final summary.

def write_summary(state: PlayerState):
    question = state.question
    data = {
        'name': state.name,
        'country': state.country,
        'number': state.number,
        'rating': state.rating,
        'goals': state.goals,
        'minutes_played': state.minutes_played,
    }
    prompt = f"""
You are a football reporter assistant.
Given the following data and statistics of the football player, you will have to create a markdown summary of that player.
Player data:
{json.dumps(data, indent=4)}
The markdown summary has to include the following information:

- Player full name (if only first name or last name is provided, try to guess the full name)
- Player country (also add flag emoji)
- Player number (also render the number as emoji digits)
- FIFA rating
- Total number of goals in last 3 seasons
- Average number of minutes required to score one goal
- Response to the user question: {question}
    """
    response = llm.invoke([HumanMessage(content=prompt)]).content
    return {"summary": response}

Graph construction

We now have all the elements to build a graph. Firstly, we initialize the graph using the StateGraph constructor. Then, we add nodes to that graph one by one using the add_node() method. It takes two parameters: a string used to assign a name to the node, and a callable function associated with the node that takes a graph state as its only parameter.

graph_builder = StateGraph(PlayerState)
graph_builder.add_node('extract_name', extract_name)
graph_builder.add_node('planner', planner)
graph_builder.add_node('fetch_player_jersey_number', fetch_player_jersey_number)
graph_builder.add_node('fetch_player_information', fetch_player_information)
graph_builder.add_node('fetch_player_rating', fetch_player_rating)
graph_builder.add_node('retrieve_goals', retrieve_goals)
graph_builder.add_node('retrieve_minutes_played', retrieve_minutes_played)
graph_builder.add_node('write_summary', write_summary)

Right now, our graph consists only of nodes. We need to add edges to it. The edges in LangGraph are oriented and added via the add_edge() method, specifying the names of the start and end nodes.

The only subtlety is the planner, which behaves slightly differently from the other nodes. As shown above, it returns the selected_tools field, which can contain between zero and three tool names.

For that, we need to use the add_conditional_edges() method taking three parameters:

  • The planner node name;
  • A callable function that takes the graph state and returns a list of strings indicating which nodes should be called next;
  • A dictionary mapping strings from the second parameter to node names.

In our case, we will define the route_tools() function to simply return the state.selected_tools field produced by the planner.

def route_tools(state: PlayerState):
    return state.selected_tools or []

Then we can add the edges:

graph_builder.add_edge(START, 'extract_name')
graph_builder.add_edge('extract_name', 'planner')
graph_builder.add_conditional_edges(
    'planner',
    route_tools,
    {
        'fetch_player_jersey_number': 'fetch_player_jersey_number',
        'fetch_player_information': 'fetch_player_information',
        'fetch_player_rating': 'fetch_player_rating'
    }
)
graph_builder.add_edge('fetch_player_jersey_number', 'retrieve_goals')
graph_builder.add_edge('fetch_player_information', 'retrieve_goals')
graph_builder.add_edge('fetch_player_rating', 'retrieve_goals')
graph_builder.add_edge('retrieve_goals', 'retrieve_minutes_played')
graph_builder.add_edge('retrieve_minutes_played', 'write_summary')
graph_builder.add_edge('write_summary', END)

START and END are LangGraph constants used to define the graph’s start and end points.

The last step is to compile the graph. We can optionally visualize it using the helper function defined above.

graph = graph_builder.compile()
display_graph(graph)
Graph diagram

Example

We are now finally able to use our graph! To do so, we can use the invoke method and pass a dictionary containing the question field with a custom user question:

result = graph.invoke({
    'question': 'Will Haaland be able to win the FIFA World Cup for Norway in 2026 based on his recent performance and stats?'
})

And here is an example result we can obtain!

{'question': 'Will Haaland be able to win the FIFA World Cup for Norway in 2026 based on his recent performance and stats?',
 'selected_tools': ['fetch_player_information', 'fetch_player_rating'],
 'name': 'Haaland',
 'club': 'Manchester City',
 'country': 'Norway',
 'rating': 92,
 'goals': [25, 40, 28, 33, 36],
 'minutes_played': [2108, 3102, 3156, 2617, 2758],
 'summary': '- Full name: Erling Haaland\n- Country: Norway 🇳🇴\n- Number: N/A\n- FIFA rating: 92\n- Total goals in last 3 seasons: 97 (28 + 33 + 36)\n- Average minutes per goal (last 3 seasons): 87.95 minutes per goal\n- Will Haaland win the FIFA World Cup for Norway in 2026 based on recent performance and stats?\n  - Short answer: Not guaranteed. Haaland remains among the world’s top forwards (92 rating, elite goal output), and he could be a key factor for Norway. However, World Cup success is a team achievement dependent on Norway’s overall squad quality, depth, tactics, injuries, and tournament context. Based on statistics alone, he strengthens Norway’s chances, but a World Cup title in 2026 cannot be predicted with certainty.'}

A cool thing is that we can observe the entire state of the graph and analyze the tools the agent has chosen to generate the final answer. The final summary looks great!
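As a quick sanity check, the minutes-per-goal figure in the summary can be reproduced from the mock data with simple arithmetic over the last three seasons:

```python
# Haaland's last 3 seasons, taken from the mock data defined earlier.
goals = [28, 33, 36]
minutes = [3156, 2617, 2758]

total_goals = sum(goals)                       # 97
minutes_per_goal = sum(minutes) / total_goals  # 8531 / 97

print(total_goals)                 # 97
print(round(minutes_per_goal, 2))  # 87.95
```

Both numbers match the LLM-generated summary, which is a good sign that the model grounded its answer in the retrieved statistics.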

Conclusion

In this article, we have examined AI agents, which have opened a new chapter for LLMs. Equipped with tools and decision-making capabilities, LLMs gain much greater potential to solve complex tasks.

The example in this article introduced LangGraph, one of the most popular frameworks for building agents. Its simplicity and elegance allow us to construct complex decision chains. While LangGraph might seem like overkill for our simple example, it becomes extremely useful for larger projects where state and graph structures are much more complex.

