Building a Simple AI Agent Using FastAPI, LangGraph, and MCP

A practical guide to building tool-aware, stateful AI agents with FastAPI, LangGraph workflows, and the Model Context Protocol (MCP)

TermTrix
Dec 26, 2025
5 min read

Modern AI agents are no longer just single LLM calls. Real-world agents need tool access, memory, workflow control, and clean APIs. In this post, we will build a simple but production-ready AI agent using FastAPI, LangGraph, and MCP (Model Context Protocol).

This article is written for backend engineers who want a clear mental model and a practical implementation.

Tech Stack Overview

We will use the following components:

  • FastAPI as the API layer
  • FastMCP as the tool and prompt server (Model Context Protocol)
  • LangGraph for agent workflow orchestration
  • LangChain MCP adapters to connect MCP tools to LangGraph

By the end, the agent will be able to:

  • Call external tools such as Wikipedia and REST Countries
  • Load system prompts dynamically from MCP
  • Execute a LangGraph-based agent workflow via a FastAPI endpoint

High-Level Architecture

The architecture is intentionally modular.

Client
  |
  |  POST /workflow
  v
FastAPI
  |
  |  MCP Client (HTTP)
  v
FastMCP Server
  |
  |  Tools + Prompts
  v
LangGraph Agent

Key Design Idea

  • MCP acts as the tool and prompt server
  • LangGraph acts as the agent brain
  • FastAPI exposes the agent as a clean HTTP API

This separation keeps the system maintainable and scalable.

Step 1: FastAPI as the Agent Gateway

FastAPI is used as the entry point for client requests. It mounts the MCP server and exposes a workflow endpoint.

from fastapi import FastAPI
from mcp_server.server import mcp_app
from workflows.graph import create_graph
from langchain_mcp_adapters.client import MultiServerMCPClient

app = FastAPI(lifespan=mcp_app.lifespan)
app.mount("/agent", mcp_app)

Mounting MCP allows FastAPI to host both the agent API and the MCP server in the same process.
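
With everything in one process, you can start the combined app with a standard ASGI server. A minimal sketch, assuming the module above is saved as main.py:

import uvicorn

# Serve the combined FastAPI + MCP app (module name "main" is an assumption).
if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=8000)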

MCP Client Configuration

FastAPI communicates with MCP through an HTTP-based MCP client.

client = MultiServerMCPClient({
    "agent": {
        "transport": "http",
        "url": "http://localhost:8000/agent/mcp",
    },
})

This client is responsible for discovering tools and prompts at runtime.
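
As a quick sanity check, you can ask the client what the server exposes. A minimal sketch using get_tools() from langchain-mcp-adapters:

# Discover the tools the MCP server currently exposes (e.g., at startup).
tools = await client.get_tools()
print([tool.name for tool in tools])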

Step 2: Workflow Endpoint

The workflow endpoint performs three responsibilities:

  • Opens an MCP session
  • Builds the LangGraph agent
  • Executes the agent with user input

@app.post("/workflow")
async def run_workflow(message: str):
    config = {"configurable": {"thread_id": "001"}}

    async with client.session("agent") as session:
        agent = await create_graph(session=session)
        response = await agent.ainvoke(
            {"messages": message},
            config=config
        )

        return response["messages"][-1].content

Why thread_id Matters

The thread_id enables conversation memory and checkpointing inside LangGraph. Without it, each request would be stateless.
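
A short sketch makes this concrete, assuming an agent compiled with the MemorySaver checkpointer from Step 7:

# Two calls sharing a thread_id share one checkpointed conversation.
config = {"configurable": {"thread_id": "001"}}
await agent.ainvoke({"messages": "My name is Alice."}, config=config)
reply = await agent.ainvoke({"messages": "What is my name?"}, config=config)
# With checkpointing, the second call can answer "Alice";
# with a fresh thread_id, it could not.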

Step 3: Defining MCP Tools with FastMCP

FastMCP allows tools to be defined using decorators. These tools are automatically discoverable by LangGraph.
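
The decorators below attach to a shared FastMCP instance, which the original code does not show. A minimal sketch (the server name "agent" is an assumption):

from fastmcp import FastMCP

# Shared server instance that @mcp.tool and @mcp.prompt register against.
mcp = FastMCP("agent")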

Wikipedia Tool

import wikipedia

@mcp.tool(
    name="global_news",
    description="Get global news from Wikipedia"
)
async def global_news(query: str):
    # wikipedia.summary is a blocking call; acceptable for a simple demo tool.
    return wikipedia.summary(query)

Country Details Tool

import httpx

@mcp.tool(
    name="get_countries_details",
    description="Get details of a country"
)
async def get_countries_details(country_name: str):
    async with httpx.AsyncClient(timeout=15.0) as client:
        response = await client.get(
            f"https://restcountries.com/v3.1/name/{country_name}?fullText=true"
        )
        response.raise_for_status()
        return response.json()

Currency Tool

@mcp.tool(
    name="get_currency",
    description="Get details of a currency"
)
async def get_currency(currency_code: str):
    async with httpx.AsyncClient(timeout=15.0) as client:
        response = await client.get(
            f"https://restcountries.com/v3.1/currency/{currency_code}"
        )
        response.raise_for_status()
        return response.json()

These tools are exposed through MCP and can be invoked by the LLM through LangGraph.
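
Before wiring them into the agent, you can invoke a tool directly over an MCP session to verify it works. A sketch using the client from Step 1:

# Call a single tool through the MCP session (useful for debugging).
async with client.session("agent") as session:
    result = await session.call_tool(
        "get_currency", {"currency_code": "usd"}
    )
    print(result.content)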

Step 4: MCP Prompts for System Instructions

Instead of embedding system prompts inside agent code, MCP manages them centrally.

@mcp.prompt
async def common_prompt() -> str:
    return """
    You are a helpful assistant.
    Answer the question based on the tools provided.
    """

This approach enables:

  • Centralized prompt management
  • Runtime updates without redeploying agents
  • Shared prompts across multiple agents
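
Prompts can also take arguments, which makes the shared-prompts point concrete. A hypothetical parameterized variant:

# Hypothetical: one prompt definition serves many agents via an argument.
@mcp.prompt
async def agent_prompt(agent_name: str) -> str:
    return f"You are {agent_name}. Answer the question based on the tools provided."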

Step 5: MCP Server with Redis Event Store

To persist events and conversation history, we use Redis as the event store.

from fastmcp.server.event_store import EventStore
from key_value.aio.stores.redis import RedisStore

redis_store = RedisStore(url="redis://localhost:6379")

event_store = EventStore(
    storage=redis_store,
    max_events_per_stream=100,
    ttl=3600,
)

Creating the MCP App

def create_app():
    register_tools(mcp)
    register_prompts(mcp)
    return mcp.http_app(
        event_store=event_store,
        path="/mcp"
    )

mcp_app = create_app()

This setup ensures tools, prompts, and memory are all managed by MCP.
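
The register_tools and register_prompts helpers are referenced but not shown. One possible shape, assuming the decorated functions live in their own modules and register on the shared mcp instance at import time:

# Hypothetical helpers: importing the modules runs the @mcp.tool and
# @mcp.prompt decorators, which register on the shared FastMCP instance.
def register_tools(mcp):
    import mcp_server.tools  # noqa: F401

def register_prompts(mcp):
    import mcp_server.prompts  # noqa: F401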

Step 6: LangGraph Agent Construction

LangGraph is responsible for orchestrating the agent logic.

Loading MCP Tools and Prompts

from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_mcp_adapters.prompts import load_mcp_prompt

tools = await load_mcp_tools(session)
system_prompt = await load_mcp_prompt(
    session=session,
    name="common_prompt"
)

Prompt Template

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages([
    ("system", system_prompt[0].content),
    MessagesPlaceholder("messages")
])

Binding Tools to the LLM

# "llm" can be any LangChain chat model that supports tool calling.
llm_with_tool = llm.bind_tools(tools)
chat_llm = prompt_template | llm_with_tool

This setup allows the LLM to decide when to call tools.
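
The workflow in Step 7 references chat_node and EnrichmentState without defining them. A minimal sketch of both, assuming a standard messages-based state:

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

# Minimal state: a message list with LangGraph's append-style reducer.
class EnrichmentState(TypedDict):
    messages: Annotated[list, add_messages]

# chat_node runs the prompt + tool-bound LLM over the current history
# and appends the model's reply (which may contain tool calls).
async def chat_node(state: EnrichmentState):
    response = await chat_llm.ainvoke({"messages": state["messages"]})
    return {"messages": [response]}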

Step 7: LangGraph Workflow Definition

The workflow defines how the agent loops between reasoning and tool execution.

from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.checkpoint.memory import MemorySaver

graph = StateGraph(EnrichmentState)

graph.add_node("chat_node", chat_node)
graph.add_node("tool_node", ToolNode(tools=tools))

graph.add_edge(START, "chat_node")
graph.add_conditional_edges(
    "chat_node",
    tools_condition,
    {"tools": "tool_node", "__end__": END}
)
graph.add_edge("tool_node", "chat_node")

graph = graph.compile(checkpointer=MemorySaver())

How the Agent Loop Works

  • The chat node lets the LLM reason
  • If a tool is required, execution moves to the tool node
  • Tool results are fed back to the LLM
  • The loop ends when no further tools are needed

This is a true agent loop, not a single-shot LLM call.
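
To see the loop end to end, call the FastAPI endpoint. A sketch, assuming the server runs locally on port 8000 (message is sent as a query parameter because the endpoint declares a bare str):

import asyncio
import httpx

async def main():
    async with httpx.AsyncClient(timeout=60.0) as http:
        response = await http.post(
            "http://localhost:8000/workflow",
            params={"message": "What currency does Japan use?"},
        )
        print(response.json())

asyncio.run(main())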

Final Result

At the end of this setup, you have:

  • A FastAPI-powered agent API
  • An MCP-based tool and prompt server
  • LangGraph-driven workflow orchestration
  • Redis-backed memory and event storage
  • A clean separation between API, tools, prompts, and agent logic

This architecture scales well as agents grow more complex and is suitable for real production workloads.

#FastAPI #LangGraph #AIAgents #ModelContextProtocol #Python #BackendEngineering #LLMTools #SystemDesign
