Building Simple Agentic AI with LangGraph

March 27th, 2025

As someone who has been working with AI systems for a while, building agents and tools, I've seen LLMs evolve beyond chatbots that simply answer questions or generate text. They're now intelligent systems that can think, plan, and take autonomous actions to solve a user's query. I thought it would be interesting to share what I've learned about how these agentic AI systems work and how you can build them yourself.

Introduction to Agentic AI

What is Agentic AI? In simple terms, it is a system designed to work on its own, making decisions and taking actions based on a user's query or task while utilizing various tools and resources with minimal human intervention. Imagine an AI that can not only understand and process information but can also take actions, use tools, and work autonomously to achieve specific objectives. For example, based on the user's input, it can search the web, retrieve information, and perform tasks like booking a flight or ordering food, all without requiring constant human guidance.

Let's walk through a more detailed breakdown of an agent's thought process:

  1. User Prompt & Goal Identification: You give the agent a task, like, "What's the weather in New York, and what is the current time there? Also, what's the square root of 529?"
  2. Reasoning and Planning: The AI analyzes your request and breaks it down into smaller, actionable steps. It recognizes that this isn't one task, but three:
    • Find the weather in New York.
    • Find the current time in New York.
    • Calculate the square root of 529.
  3. Tool Selection: The agent consults its "toolkit" to find the right tool for each step. It reads the description of each tool to understand its purpose.
    • For "weather in New York," it matches this to a weather_tool.
    • For "time in New York," it finds the get_current_time tool.
    • For "square root of 529," it identifies the calculator tool.
  4. Execution: The agent calls these tools one by one with the correct information (the "arguments"). It runs weather_tool(location="New York"), get_current_time(timezone_name="America/New_York"), and calculator(expression="sqrt(529)"). The tools execute their code and return the results.
  5. Synthesis and Response: The agent gathers the results from all the tools - "The weather is partly cloudy," "The time is 2:30 PM," and "The result is 23.0" - and synthesizes them into a single, coherent, human-friendly answer.

This ability to reason about the user's request, then plan and execute the right tools, is what makes agentic AI so powerful. It can handle complex tasks that require multiple steps and different tools, all while maintaining a clear understanding of the user's intent. Tools let the agent interact with APIs, search databases, modify data, and perform computations, which allows it to accomplish a wide range of tasks autonomously.

LangChain Tools and Function Calling

What is LangChain? LangChain is an open-source framework that acts as the "scaffolding" for building applications with LLMs. It abstracts away the complexities of working with LLMs from different providers, enabling developers to focus on implementation and logic rather than the underlying infrastructure. Take provider switching as an example: when you want to switch from OpenAI's GPT-4o to Anthropic's Claude or Google's Gemini, LangChain provides a unified interface so you don't need to rewrite your entire application. Similarly, when working with vector databases like Pinecone, Chroma, or FAISS for storing embeddings, LangChain offers consistent APIs regardless of which vector store you choose. This abstraction layer means you can focus on building your agent's logic rather than dealing with the intricacies of each provider's specific implementation details.
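
To make this concrete, here is a minimal sketch of swapping chat-model providers behind LangChain's common interface (assuming the langchain-openai and langchain-anthropic packages are installed, the corresponding API keys are set, and the model names are just current examples):

python
# Each provider ships its own integration package, but every chat model
# exposes the same interface.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Swap a single line to change providers; the rest of the application stays the same.
llm = ChatOpenAI(model="gpt-4o")
# llm = ChatAnthropic(model="claude-3-5-sonnet-latest")

# .invoke() behaves identically regardless of which provider is behind it.
response = llm.invoke("Summarize what an AI agent is in one sentence.")
print(response.content)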

One of the key features of LangChain is its robust support for tools. A tool is simply a function that the agent can call to perform a specific action - this is often referred to as function calling. Examples of tools include Google Search, GitHub Toolkit, and many others. You can find a complete list of the tools provided by LangChain here.

It's important to understand that the LLM doesn't run the code itself. Instead, it generates a structured message (like a JSON object) that says, "I need to run the calculator tool with the argument expression='2+2'." The agentic application code then receives this message, executes the actual calculator function, and passes the result back to the LLM so it can continue its reasoning process.
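
Here is a minimal sketch of what that exchange looks like with LangChain's tool-calling interface, assuming llm is a tool-calling chat model like the one above and calculator is a @tool-decorated function (we'll see how to define such tools in the next section):

python
from langchain_core.messages import HumanMessage, ToolMessage

# Expose the tool's name, description, and argument schema to the model.
llm_with_tools = llm.bind_tools([calculator])

# The LLM does not execute anything; it returns a structured tool call.
ai_msg = llm_with_tools.invoke([HumanMessage(content="What is 2+2?")])
print(ai_msg.tool_calls)
# e.g. [{'name': 'calculator', 'args': {'expression': '2+2'}, 'id': 'call_abc123', ...}]

# Our application code runs the tool and passes the result back to the LLM.
tool_call = ai_msg.tool_calls[0]
result = calculator.invoke(tool_call["args"])
final_msg = llm_with_tools.invoke([
    HumanMessage(content="What is 2+2?"),
    ai_msg,
    ToolMessage(content=str(result), tool_call_id=tool_call["id"]),
])
print(final_msg.content)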

Creating Custom Tools

While LangChain offers many built-in tools, we can also create custom tools tailored to our specific needs. We can turn any Python function into a tool using the @tool decorator. Let's look at our custom get_current_time tool again and break it down:

python
# tools.py

import json
from datetime import datetime, timezone
import pytz
from langchain_core.tools import tool

@tool
def get_current_time(timezone_name: str = "UTC") -> str:
    """
    Get current date and time information.

    Args:
        timezone_name: Timezone name (e.g., "UTC", "US/Eastern", "Europe/London", "Asia/Tokyo")

    Returns:
        JSON string with time information
    """
    # ... function logic ...

Here's what makes this work:

  • @tool Decorator: This simple line tells LangChain that this function is a tool available for the agent to use.
  • The Docstring: This is the most critical part. The LLM reads the description inside the docstring ("""...""") to understand what the tool does. A clear, descriptive docstring is essential for the agent to know when to use your tool. The Args section tells it what inputs are required.
  • Type Hints (timezone_name: str, -> str): These help the LLM understand the expected data types for the inputs and outputs, leading to more reliable tool calls.

By following this structure, we can create custom tools that the agent can use to perform tasks specific to our application.
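
A quick way to see what the decorator produces is to inspect and invoke the tool directly; the name, description, and argument schema below are exactly what the LLM sees when deciding whether to call it:

python
from tools import get_current_time

# The @tool decorator wraps the plain function in a LangChain tool object.
print(get_current_time.name)         # "get_current_time"
print(get_current_time.description)  # the docstring the LLM reads
print(get_current_time.args)         # the argument schema, e.g. {"timezone_name": {...}}

# Tools can be invoked directly, exactly as the agent would call them.
print(get_current_time.invoke({"timezone_name": "Asia/Tokyo"}))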

Building Your First Agent with LangChain

Now, let's connect agents and tools to build a simple application. We'll use LangGraph, a library built on top of LangChain, to create a stateful, multi-step agent. Think of LangGraph as a way to define the agent's logic as a flowchart or a state machine (let's visualize it).

Let's do a step-by-step walkthrough of the code:

  1. Define the Agent State: The state is the agent's "memory" of what it has done so far. In other words, it keeps track of the conversation history and any other relevant information.
  2. Initialize the Agent: In our SimpleAgent class, we set up the core components:
    • The LLM (Gemini-2.5-Flash-Preview-05-20). Google generously provides a free tier for this model, which is one of the top models based on the LMArena Leaderboard and is great for experimentation.
    • The list of tools we've defined. For our simple agent, we will use the get_current_time, calculator, web_search_tool, weather_tool, ingest_documents, and search_documents tools.
    • A SqliteSaver for memory, which allows the agent to remember conversations across different sessions using a thread_id.
  3. Create the Workflow: Here we use LangGraph to define how the agent will interact with the tools and how it will reason about the user's request. The workflow is a directed graph where each node represents a step in the agent's reasoning process.
    • workflow = StateGraph(AgentState): We start by creating a new graph that will manage our AgentState.
    • workflow.add_node("agent", ...): We add our first node, the "agent" itself. This node's job is to call the LLM to decide what to do next.
    • workflow.add_node("tools", ...): We add a second node for executing tools. ToolNode is a pre-built LangGraph node that handles this for us.
    • workflow.set_entry_point("agent"): We tell the graph to always start at the "agent" node.
    • workflow.add_conditional_edges(...): This is the decision-making step. After the "agent" node runs, this edge checks if the LLM decided to call a tool. If yes, it routes the flow to the "tools" node. If not, it routes to END, and the process finishes.
    • workflow.add_edge("tools", "agent"): After the "tools" node runs, this edge sends the result back to the "agent" node so it can process the new information and decide what to do next.
  4. Run the Agent: Invoke the workflow with a user prompt and a thread_id (to maintain conversation history). The workflow will execute the agent's logic, calling the LLM to reason about the user's request, deciding which tools to call, and looping through the process until it reaches a conclusion. A minimal invocation sketch appears at the end of this section.
python
# graph.py

import operator
from typing import Annotated, List, Optional, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode

# Custom tools defined in tools.py
from tools import (
    get_current_time,
    calculator,
    weather_tool,
    web_search_tool,
    ingest_documents,
    search_documents,
)


# Agent's state representation
class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], operator.add]
    next_action: Optional[str]


# Setup workflow graph and tool node for the agent
class SimpleAgent:
    def __init__(self):
        # ... setup ...
        self.tools = [
            get_current_time,
            calculator,
            weather_tool,
            web_search_tool,
            ingest_documents,
            search_documents,
        ]
        self.agent = self._create_graph()

    def _create_graph(self):
        workflow = StateGraph(AgentState)
        tool_node = ToolNode(self.tools)

        # The agent node calls the LLM to reason
        workflow.add_node("agent", self._call_model)
        # The tool node executes the chosen function
        workflow.add_node("tools", tool_node)

        # The graph starts with the agent
        workflow.set_entry_point("agent")

        # After the agent node, we decide where to go next
        workflow.add_conditional_edges(
            "agent",
            self._should_continue,  # This function makes the decision
            {
                "tools": "tools",  # If a tool is called, go to the tools node
                "end": END,        # Otherwise, end the conversation
            }
        )

        # After the tools node runs, loop back to the agent node
        workflow.add_edge("tools", "agent")

        # Compile the graph into a runnable agent;
        # self.memory is the SqliteSaver checkpointer created in the setup above
        return workflow.compile(checkpointer=self.memory)

    def _should_continue(self, state: AgentState) -> str:
        # This function checks the last message from the LLM
        last_message = state["messages"][-1]
        # If it contains a tool call, we continue to the "tools" node
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
        # Otherwise, we're done
        return "end"

This graph structure creates a robust loop: Reason -> Act -> Observe -> Reason, which is the fundamental cycle of any intelligent agent. Let's put this in Mermaid diagram format to visualize it better:

Simple Agentic AI Workflow
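
Finally, here is a minimal sketch of step 4, invoking the compiled graph with a thread_id so the SqliteSaver checkpointer can persist the conversation (the full repository may expose a friendlier wrapper; this calls the compiled graph directly):

python
# run.py

from langchain_core.messages import HumanMessage
from graph import SimpleAgent

agent = SimpleAgent()

# The thread_id keys the SqliteSaver checkpointer, so reusing the same id
# resumes the same conversation in a later session.
config = {"configurable": {"thread_id": "demo-conversation-1"}}

result = agent.agent.invoke(
    {"messages": [HumanMessage(content="What's the weather in New York, and what time is it there?")]},
    config=config,
)

# The last message in the returned state is the agent's synthesized answer.
print(result["messages"][-1].content)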

Retrieval-Augmented Generation (RAG) in Agentic AI

Now, let's say we want to ask a question that requires retrieving information from a set of documents or a database, but the LLM does not have that information in its training data. If we ask anyway, the LLM will try to answer from its training data, which may not be accurate or up to date, leading to what we call "hallucinations." To avoid this, we can use a technique called Retrieval-Augmented Generation (RAG), which lets the LLM retrieve relevant information from the documents or database before generating a response.

If we connect this to our agentic AI, RAG becomes even more powerful. Instead of just a simple retrieval step, the agent can actively decide when to search for information, what to search for, and how to use the results.

Let's dive deeper into the RAG process:

  1. Ingestion and Indexing (Building the Knowledge Base):
    • Ingesting Documents: This is the first step, where we extract text from various sources like PDFs, websites, or databases. For example, we can use PyPDFLoader to read PDF files and extract their text.
    • Splitting Text: We split the text into smaller chunks because LLMs have a limited "context window" (the amount of text they can consider at once), and smaller, focused chunks are easier to embed and retrieve precisely. For example, we can use RecursiveCharacterTextSplitter to break the text into manageable, overlapping pieces.
    • Embedding: Each chunk of text is passed to an embedding model (like models/text-embedding-004), which converts it into a list of numbers called a vector. This vector represents the semantic meaning of the text. Think of it like a map where similar words live close together - "car" and "automobile" would be neighbors, while "car" and "banana" would be far apart. Instead of just three dimensions like our physical world, these word maps use hundreds of dimensions, allowing the computer to capture subtle relationships like how "king" relates to "queen" in the same way "man" relates to "woman."

      A sample interactive 3D word-embedding visualization showing word clusters for Transport, Food, Royalty, and Gender.
    • Storing in a Vector Database: These vectors are stored in a specialized database, like ChromaDB. A vector database is designed to do one thing very well: find vectors that are "closest" to a given query vector. "Closest" in this context means most semantically similar.
  2. Agentic Retrieval:
    • When a user asks, "What was mentioned in the project summary document?", the agent knows not to use a web search. Instead, it uses its search_documents tool.
    • The agent takes the user's query ("project summary document") and converts it into a vector using the same embedding model.
    • It then searches the vector database to find the document chunks that are most semantically similar to the user's question. This is done by comparing the query vector with the vectors of the stored chunks, most often using cosine similarity, which tells us how "aligned" two vectors are (a small sketch of this computation follows this list).
    • The database returns the most relevant chunks of text. The agent then includes this retrieved text in its prompt to the LLM, allowing it to generate an answer based on the specific content of your document.
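
As a small sketch of that similarity calculation (the embedding model matches the one mentioned above; the phrases are arbitrary examples, and the langchain-google-genai package plus a Google API key are assumed):

python
import numpy as np
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

def cosine_similarity(a, b) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||); values near 1.0 mean "aligned".
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = embeddings.embed_query("project summary document")
chunk_vec = embeddings.embed_query("This document summarizes the project's milestones.")
other_vec = embeddings.embed_query("Bananas are rich in potassium.")

print(cosine_similarity(query_vec, chunk_vec))  # relatively high
print(cosine_similarity(query_vec, other_vec))  # noticeably lower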

Through the addition of new tools like ingest_documents and search_documents, we enhance our agentic AI's capabilities, allowing it to not only reason and plan but also to retrieve and utilize information from a knowledge base. This makes the agent more capable of handling questions about documents or databases that it has processed.
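
Here is a minimal sketch of how tools like these could be built with LangChain's document loaders, text splitter, and the Chroma vector store (assuming the langchain-community, langchain-chroma, and langchain-google-genai packages; the chunk sizes and persist directory are illustrative, not the exact values from the repository):

python
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.tools import tool
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

@tool
def ingest_documents(file_path: str) -> str:
    """Load a PDF, split it into overlapping chunks, and store the embeddings."""
    pages = PyPDFLoader(file_path).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(pages)
    vectorstore.add_documents(chunks)
    return f"Ingested {len(chunks)} chunks from {file_path}."

@tool
def search_documents(query: str) -> str:
    """Return the document chunks most semantically similar to the query."""
    results = vectorstore.similarity_search(query, k=3)
    return "\n\n".join(doc.page_content for doc in results)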

Here is a practical demonstration of the agentic AI:

plaintext
You: What is the weather in Tokyo?
Agent: In Tokyo, the current weather is broken clouds. The wind speed is 3.6 m/s, direction: 150°. Humidity is 46%. The current temperature is 27.19°C, with a high of 27.95°C and a low of 25.75°C. It feels like 27.35°C. Cloud cover is 75%.

You: What is sin of 60?
Agent: The sine of 60 degrees is approximately 0.866.

You: 'Machine_Learning_and_AI_for_Healthcare_Big_Data_for.pdf', what is supervised learning in healthcare?
Agent: Supervised learning in healthcare involves using labeled training data to infer a function, essentially learning a mapping of Y = f(X) to make predictions for new data (X). This means that the algorithm learns from past examples where both the input (e.g., symptoms) and the correct output (e.g., diagnosis) are provided. The learning continues until the model achieves a desired level of accuracy.

Key applications of supervised learning in healthcare include:

*   **Classification:** Predicting an outcome based on a training dataset where the output variable is in distinct categories. For example, classifying a patient as "sick" or "unhealthy" based on symptoms. Techniques like support vector machines, naïve Bayes, k-nearest neighbors (KNN), and logistic regression are used.
*   **Regression:** Similar to classification, but the output variable is a real value rather than a category. Examples include predicting height, body temperature, or weight. Linear regression, polynomial regression, and neural networks are some models used for regression.
*   **Forecasting:** Making predictions based on past and present data, also known as time-series forecasting.

Supervised learning is particularly useful for predictive modeling in healthcare, such as identifying high-risk patients or those with chronic conditions who could benefit from personalized interventions. It can also help providers find best practices and treatments to improve patient outcomes and reduce hospital admissions by analyzing symptom comparisons, treatments, and their effects.

If you want to see the agent in action, you can run the code above in your local environment by following the instructions in the agentic-ai repository.

To better understand how the agent works, or to continue building on it, you can refer to the LangChain Agents Tutorial or the LangGraph Agents Overview documentation. You can also visualize the agent's workflow using the LangGraph Visualizer to see how the agent's logic flows and how it interacts with the tools, and explore other features like response_format.

How Code Completion Systems Like Cursor and Copilot Work

These days everyone is talking about vibe coding and expects LLMs with agents to understand a user's requirements and write an entire codebase from scratch, including setting up the project structure, implementing complex business logic, handling edge cases, and deploying the application. The reality is not that simple; maybe in the future, but not now. Currently, code completion systems like Cursor and GitHub Copilot are designed to assist developers by providing code suggestions, autocompletions, and context-aware code snippets based on the current codebase and the developer's intent. In short, they exist to increase developer productivity. Let me explain how these systems work in a simplified manner:

Code Indexing: The Foundation of Context-Awareness

First and foremost, these systems need to understand the codebase they are working with and represent it in a way that LLMs can easily process. This is where code indexing comes into play. I elaborated on how indexing works in the RAG section above, but for code completion systems the process is slightly different:

  1. Parsing and Abstract Syntax Trees (ASTs): The first step is to parse the codebase. Instead of just reading the text, the system uses a parser (like the highly efficient tree-sitter) to build an Abstract Syntax Tree (AST) for each file. An AST is a tree-like representation of the code's structure, capturing the relationships between different parts of the code. For example, for the line def add(a: int, b: int) -> int: return a + b, the AST would create nodes like:
    • FunctionDef node: Contains the function name (add), parameters, return type, and body.
    • Arguments node: Contains parameter definitions - a and b, both with type annotation int.
    • Return node: Contains the return statement with a BinOp (binary operation) node for a + b.
    • BinOp node: Contains the left operand (a), operator (+), and right operand (b).
    Each node contains metadata like line numbers, column positions, variable names, types, and scope information. This structured representation allows the system to understand not just what the code says, but what it means semantically. (A small sketch of this AST appears after this list.)
  2. Semantic Embedding: Once the code is structured as an AST, it's converted into numerical representations, or embeddings. The system doesn't just embed the code itself; it embeds the code's structure, its relationships, and the natural language in comments and docstrings. This creates a semantic index: a searchable map of the codebase where the meaning of the code is captured in a way that LLMs can work with. For example, the function add would be embedded in a way that captures its purpose (adding two numbers), its parameters, its return type, and even its usage across the codebase.
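
For a concrete look at that structure, here is a quick sketch using Python's built-in ast module (production systems like Cursor typically rely on tree-sitter for fast, multi-language parsing, but the resulting tree is conceptually the same):

python
import ast

# Parse the example line into an Abstract Syntax Tree.
source = "def add(a: int, b: int) -> int: return a + b"
tree = ast.parse(source)

# Dump the tree to see the FunctionDef, arguments, Return, and BinOp nodes.
print(ast.dump(tree, indent=2))

# Walk the tree to collect metadata such as node types and line numbers.
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.arg, ast.Return, ast.BinOp)):
        print(type(node).__name__, "at line", node.lineno)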

Code Generation and Autocompletion

With this level of indexing, the LLMs can now provide context-aware code suggestions and autocompletions. Here's how it works:

  1. Advanced Prompt Engineering: When you start typing, the LLM doesn't just see the characters on the current line. The system constructs a richer prompt that is sent to the LLM. This prompt includes:
    • The code before the cursor.
    • The code after the cursor (this helps with a technique called in-filling, where the model fills in the middle).
    • Snippets of code from other files that are semantically related to what you're writing.
    • The overall structure and framework of the project, which helps the LLM understand the context of your code.
  2. Inferring Developer Intent: By combining the natural language comments (e.g., // function to fetch user data from the API), docstrings (e.g., """function to fetch user data from the API"""), and the code structure, the LLM can infer the next few lines of code based on the context (a simplified prompt-assembly sketch follows this list).
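
To make the idea concrete, here is a highly simplified sketch of how such a prompt might be assembled; the marker tokens and overall structure are illustrative only, since each model and editor uses its own fill-in-the-middle format:

python
def build_completion_prompt(prefix: str, suffix: str, related_snippets: list[str]) -> str:
    """Assemble an illustrative fill-in-the-middle prompt for a code completion model."""
    # Semantically related code from other files, found via the semantic index.
    context = "\n\n".join(f"# Related snippet:\n{s}" for s in related_snippets)
    # The model is asked to generate the code that belongs between prefix and suffix.
    return (
        f"{context}\n\n"
        f"<fim_prefix>{prefix}"
        f"<fim_suffix>{suffix}"
        f"<fim_middle>"
    )

prompt = build_completion_prompt(
    prefix="def fetch_user_data(user_id: int):\n    # function to fetch user data from the API\n    ",
    suffix="\n    return response.json()",
    related_snippets=["API_BASE_URL = 'https://api.example.com'"],
)
print(prompt)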

In summary, code completion systems like Cursor and Copilot parse, index, and semantically understand your entire project, allowing them to act as a true AI-powered pair programmer, helping developers increase their productivity and write better code faster.

I also use GitHub Copilot at times to increase my productivity, especially when I am working on a new project or learning a new technology. It helps me write code faster and understand it better by providing context-aware suggestions and autocompletions.

References