Mem0 vs Letta vs Zep: Agent Memory 2026

Updated: May 26, 2026

Mem0, Letta, and Zep are the three serious choices for agent memory in production. Each gives your AI agents durable long-term memory beyond the context window, but they get there in very different ways. Mem0 is a lightweight layer that extracts facts and shoves them into a vector store. Letta exposes an explicit memory-block API descended from the original MemGPT research. Zep is built on a temporal knowledge graph (Graphiti) that tracks how facts change over time. This guide compares all three on architecture, latency, recall accuracy, and operational cost, so you can pick the right one for your stack.

Mem0 stores memories as extracted facts in a vector database. It's the simplest drop-in option: writes are processed asynchronously, so they stay off the user-facing path, and it offers an OpenAI-compatible API.
Letta (formerly MemGPT) gives agents an editable core-memory block plus archival memory. It's the best choice when you need self-editing, persona-aware agents with full state.
Zep uses Graphiti, a temporal knowledge graph that resolves contradictions and tracks fact validity windows. Ideal for support agents and CRMs where data shifts over time.
For pure RAG-like recall on long chat histories, Mem0 wins on cost. For stateful long-running agents, Letta wins. For temporally accurate fact tracking, Zep wins.
All three integrate with LangGraph and the OpenAI/Anthropic SDKs, and Mem0 and Zep also plug into CrewAI, but only Letta ships its own agent runtime.
Self-hosting works for all three. Managed cloud is available via Mem0 Platform, Letta Cloud, and Zep Cloud.

What is agent memory and why do you need it?

Agent memory is the persistent store an LLM agent reads from and writes to so it can remember facts, preferences, and prior decisions across conversations. Without it, every interaction starts from a blank context window. You then face an ugly choice: re-stuff entire chat histories on every turn (expensive, hits token limits), or accept that the agent forgets the user's name between sessions.

Three problems make this harder than naive RAG over chat logs:

Working vs. long-term memory. A chat agent needs short-term scratchpad memory during a task and durable memory after. Cognitive-science-inspired splits map cleanly onto framework concepts (Letta's core vs. archival, Mem0's short-term vs. long-term).
Contradiction and decay. "The user lives in Berlin" becomes false when they move. Vector search returns both old and new facts unless the framework reconciles them.
Recall precision. Pure semantic search over conversation chunks produces noisy retrievals. Extracted-fact memory (Mem0) and graph memory (Zep) reportedly outperform naive RAG on the LoCoMo benchmark by 20–40 percentage points, though the exact margin depends on the baseline and the metric.

Honestly, this is the part most teams underestimate. I hit this exact bug shipping a coaching app last quarter: the agent kept "forgetting" what plan the user was on, even though we stored the chat log. Storing isn't remembering. If you're starting from scratch and want the broader picture before committing to a vendor, our LLM memory and state management patterns guide walks through the architectural choices first.
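To make the contradiction problem from the list above concrete, here is a minimal sketch of a "newest fact wins" filter applied to retrieved hits. The Fact class and the reconcile function are illustrative names invented for this example, not part of Mem0, Letta, or Zep, and the sketch assumes every fact carries a subject, an attribute, and a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    subject: str      # who or what the fact is about
    attribute: str    # which property it describes
    value: str
    observed_at: datetime


def reconcile(hits: list[Fact]) -> list[Fact]:
    """Keep only the most recently observed value per (subject, attribute)."""
    latest: dict[tuple[str, str], Fact] = {}
    for fact in hits:
        key = (fact.subject, fact.attribute)
        if key not in latest or fact.observed_at > latest[key].observed_at:
            latest[key] = fact
    return list(latest.values())


# A plain similarity search could return both cities for "where does the user live?"
hits = [
    Fact("user", "city", "Berlin", datetime(2025, 3, 1)),
    Fact("user", "city", "Lisbon", datetime(2026, 1, 15)),
    Fact("user", "language", "German", datetime(2025, 3, 1)),
]
current = reconcile(hits)
print(sorted((f.attribute, f.value) for f in current))
# [('city', 'Lisbon'), ('language', 'German')]
```

Real frameworks have to infer the subject and attribute from free text, which is where the extractor LLM or the graph builder earns its keep. The filter itself is the easy part.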

Mem0 vs Letta vs Zep comparison table

This summary covers the dimensions teams actually compare when shortlisting a memory framework. Read the deep dives below for the reasoning behind each row.

Dimension	Mem0	Letta (MemGPT)	Zep (Graphiti)
Core abstraction	Extracted facts in vector store	Editable memory blocks + archival	Temporal knowledge graph
Best for	Personalization, chat history compression	Stateful long-running agents	Temporal facts, CRMs, support
Self-hosting	Yes (Apache 2.0)	Yes (Apache 2.0)	Yes (Apache 2.0, Graphiti)
Managed cloud	Mem0 Platform	Letta Cloud	Zep Cloud
Storage	Qdrant, pgvector, Chroma, Pinecone	PostgreSQL + pgvector	Neo4j or FalkorDB
Write latency (p50)	~80–200 ms (async)	~150–400 ms (sync core update)	~300–800 ms (graph extraction)
Contradiction handling	Update-or-replace via LLM judge	Manual edits via tools	Automatic temporal invalidation
Agent runtime included	No (memory layer only)	Yes (full agent server)	No (memory layer only)
Framework integrations	LangGraph, CrewAI, AutoGen	LangGraph, custom	LangGraph, CrewAI, LlamaIndex

Mem0 deep dive: extracted-fact memory

Mem0 is the simplest framework to slot into an existing agent. You hand it a list of messages, and an extractor LLM pulls out salient facts ("User prefers dark mode", "User's manager is Alice") and writes them to a vector store. On retrieval, Mem0 runs a semantic search plus an optional re-ranker and returns memory strings you concatenate into the system prompt.

What makes Mem0 fast is that the heavy lifting happens asynchronously. memory.add() can return before the extractor finishes, with writes batched to the vector store. That keeps the user-facing turn under typical latency budgets.

from mem0 import Memory

memory = Memory.from_config({
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333, "collection_name": "agent_mem"},
    },
    "llm": {"provider": "anthropic", "config": {"model": "claude-sonnet-4-6"}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
})

# Write: Mem0 extracts atomic facts from the conversation turn
memory.add(
    messages=[
        {"role": "user", "content": "I just moved to Lisbon and started learning Portuguese."},
        {"role": "assistant", "content": "Boa sorte! Want a study plan?"},
    ],
    user_id="user_42",
)

# Read: returns ranked memory strings for prompt injection
results = memory.search(query="Where does the user live?", user_id="user_42", limit=5)
for hit in results["results"]:
    print(hit["memory"], "score:", hit["score"])
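The "return before the extractor finishes" behavior described earlier can be sketched with plain asyncio. This is only an illustration of the pattern, not Mem0's internals: extract_and_store, handle_turn, and the stored list are hypothetical stand-ins, and the sleep simulates extractor latency.

```python
import asyncio

stored: list[str] = []  # stands in for the vector store


async def extract_and_store(turn: str) -> None:
    """Hypothetical slow path: pretend an extractor LLM runs, then persist."""
    await asyncio.sleep(0.05)  # simulated extraction latency
    stored.append(f"fact from: {turn}")


async def handle_turn(turn: str, pending: set) -> str:
    # Schedule the write and return without awaiting it.
    task = asyncio.create_task(extract_and_store(turn))
    pending.add(task)                        # hold a reference so the task is not garbage-collected
    task.add_done_callback(pending.discard)  # drop the reference once the write lands
    return "reply sent"


async def main() -> tuple:
    pending: set = set()
    reply = await handle_turn("I moved to Lisbon", pending)
    stored_at_reply = len(stored)            # the write has not landed yet
    await asyncio.gather(*pending)           # drain outstanding writes before shutdown
    return reply, stored_at_reply, len(stored)


result = asyncio.run(main())
print(result)  # ('reply sent', 0, 1)
```

The cost of this design is a short window in which the next turn cannot see the fact that was just written, so a read that follows a write immediately may miss it.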

Mem0 handles contradictions with an LLM judge. When a new fact arrives ("I moved to Madrid"), the framework retrieves nearby memories and asks the LLM whether the new fact updates, replaces, or adds. This works well for slow-changing personal facts, though it can produce double-writes under bursty updates if you don't serialize per-user.
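One way to serialize per user is a lock keyed by user ID around the whole retrieve, judge, write sequence. The sketch below is a single-process illustration with hypothetical names (judged_write, lock_for, store); the "judge" is reduced to a key comparison, and a multi-process deployment would need a queue or a distributed lock instead.

```python
import threading
from collections import defaultdict

store: dict = defaultdict(list)   # user_id -> [(key, value)]
_locks: dict = {}
_locks_guard = threading.Lock()   # protects the lock registry itself


def lock_for(user_id: str) -> threading.Lock:
    """Return the lock dedicated to one user, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(user_id, threading.Lock())


def judged_write(user_id: str, key: str, value: str) -> None:
    """Stand-in for retrieve -> judge -> write, made atomic per user by the lock."""
    with lock_for(user_id):
        rows = store[user_id]
        for i, (existing_key, _) in enumerate(rows):
            if existing_key == key:      # judge says "update"
                rows[i] = (key, value)
                return
        rows.append((key, value))        # judge says "add"


# Simulate a burst of 20 concurrent updates to the same fact
threads = [
    threading.Thread(target=judged_write, args=("user_42", "city", f"city_{n}"))
    for n in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(store["user_42"]))  # 1
```

Without the lock, two writers can both conclude that no matching memory exists and both append, which is the double-write described above.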

The Mem0 team published LoCoMo results reporting roughly a 26% relative improvement on an LLM-as-a-judge metric over OpenAI's built-in memory, and about 90% lower token cost than passing the full chat history as context. See the Mem0 GitHub repository for the methodology.

Letta deep dive: stateful agents with memory blocks

Letta is the production successor to the MemGPT paper (arXiv 2310.08560). Instead of treating memory as a side-car service, Letta makes memory a first-class part of the agent runtime. Every Letta agent has three parts:

Core memory blocks. Small, always-in-context strings (e.g., persona, user profile) the agent can self-edit with tool calls.
Archival memory. An unbounded vector store the agent searches when it needs older facts.
Recall memory. Full conversational history, queryable by date or content.

The agent itself decides when to call core_memory_replace or archival_memory_insert. Memory management is an emergent behavior of the LLM, not a hard-coded pipeline. That makes Letta the right pick when you need agents that maintain long-running personas or evolving task state across days.

from letta_client import Letta

client = Letta(token="YOUR_LETTA_TOKEN")  # or local server

agent = client.agents.create(
    model="anthropic/claude-sonnet-4-6",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "You are Aria, a calm onboarding coach."},
        {"label": "human", "value": "Name: unknown. Goals: unknown."},
    ],
)

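# The agent may call core_memory_replace on the 'human' block when it learns new facts
reply = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hi! I'm Sam, prepping for the AWS SAA exam."}],
)

# A reply can interleave reasoning, tool-call, and assistant messages; print only the latter
for message in reply.messages:
    if message.message_type == "assistant_message":
        print(message.content)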
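To show what a self-edit of a core memory block amounts to, here is a toy model of the idea. It is a sketch under stated assumptions, not Letta's implementation: the CoreMemory class, its replace and render methods, the character limit, and the tag-style rendering are all invented for illustration.

```python
class CoreMemory:
    """Toy model of always-in-context memory blocks with a character budget."""

    def __init__(self, blocks: dict, limit: int = 200) -> None:
        self.blocks = dict(blocks)
        self.limit = limit  # hypothetical per-block budget

    def replace(self, label: str, old: str, new: str) -> None:
        """Swap one substring inside a block, the way a self-editing tool call would."""
        current = self.blocks[label]
        if old not in current:
            raise ValueError(f"{old!r} not found in block {label!r}")
        updated = current.replace(old, new, 1)
        if len(updated) > self.limit:
            raise ValueError(f"block {label!r} would exceed {self.limit} characters")
        self.blocks[label] = updated

    def render(self) -> str:
        """Serialize every block into text that is prepended to each prompt."""
        return "\n".join(f"<{label}>{value}</{label}>" for label, value in self.blocks.items())


core = CoreMemory({
    "persona": "You are Aria, a calm onboarding coach.",
    "human": "Name: unknown. Goals: unknown.",
})

# What a runtime might do after the model emits two replace tool calls
core.replace("human", "Name: unknown", "Name: Sam")
core.replace("human", "Goals: unknown", "Goals: pass the AWS SAA exam")
print(core.render())
```

Because the blocks ride along in every prompt, the budget check matters: an agent that keeps appending to its own profile would otherwise crowd out the conversation.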

The trade-off is operational weight. A Letta deployment runs a Postgres database, a server process, and per-agent state. That's heavier than just wrapping a memory client. Latency also rises, because every turn may trigger memory-management tool calls. Plan for 1.5–3× the per-turn token cost of a stateless agent, offset by far better long-context behavior.

Zep deep dive: temporal knowledge graphs with Graphiti

Zep takes a third approach. It builds a knowledge graph of entities and relationships as the conversation flows, using its open-source Graphiti library. Each fact is stored as an edge with a validity window. When a contradicting fact appears, the old edge is invalidated rather than overwritten, so the agent can still answer "what did the user prefer last quarter?"

This temporal model matters for support, sales, and CRM agents where the same entity changes attributes over time. A user upgrades from Starter to Pro. A ticket moves from open to closed. A deal stage advances. Vector-only memory would surface both states as equally relevant; Zep ranks the current state higher while preserving history.

from zep_cloud.client import Zep
from zep_cloud.types import Message

# Method names follow the zep-cloud SDK's memory API; check them against your installed version
zep = Zep(api_key="YOUR_ZEP_API_KEY")

# The user must exist before a session can reference it
zep.user.add(user_id="user_42")

# Create a session per user conversation
zep.memory.add_session(session_id="sess_42", user_id="user_42")

# Stream turns; Graphiti extracts entities, relations, and temporal facts in the background
zep.memory.add(
    session_id="sess_42",
    messages=[
        Message(role_type="user", content="I'm on the Starter plan but I want to upgrade to Pro."),
        Message(role_type="assistant", content="Done, you're now on Pro as of today."),
    ],
)

# Retrieve memory for the next turn: returns a context block with current facts + relevant history
memory = zep.memory.get(session_id="sess_42")
print(memory.context)  # ready-to-inject system-prompt string

Zep's memory.get() returns a pre-formatted context block that consolidates facts, recent messages, and a relevance-scored history snippet. That saves you from writing a custom prompt-assembly step. Graph extraction takes a few hundred milliseconds and runs asynchronously, so user-facing latency stays low.
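The validity-window idea behind this is easy to see in miniature. The sketch below is not Graphiti's data model: Edge, TemporalFacts, assert_fact, and as_of are hypothetical names, each relation is assumed to hold one value at a time, and only one time axis is tracked (a bi-temporal store would also record when each fact was ingested).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Edge:
    subject: str
    relation: str
    obj: str
    valid_from: date
    invalid_at: Optional[date] = None  # None means "still true"


class TemporalFacts:
    def __init__(self) -> None:
        self.edges: list = []

    def assert_fact(self, subject: str, relation: str, obj: str, on: date) -> None:
        """Close any open edge for the same subject/relation, then add the new one."""
        for edge in self.edges:
            if edge.subject == subject and edge.relation == relation and edge.invalid_at is None:
                edge.invalid_at = on  # invalidate, do not delete
        self.edges.append(Edge(subject, relation, obj, valid_from=on))

    def as_of(self, subject: str, relation: str, when: date) -> Optional[str]:
        """Return the value whose validity window contains `when`, if any."""
        for edge in self.edges:
            if edge.subject != subject or edge.relation != relation:
                continue
            if edge.valid_from <= when and (edge.invalid_at is None or when < edge.invalid_at):
                return edge.obj
        return None


facts = TemporalFacts()
facts.assert_fact("user_42", "on_plan", "Starter", date(2025, 9, 1))
facts.assert_fact("user_42", "on_plan", "Pro", date(2026, 2, 10))

print(facts.as_of("user_42", "on_plan", date(2025, 12, 31)))  # Starter
print(facts.as_of("user_42", "on_plan", date(2026, 5, 1)))    # Pro
```

The old edge survives with a closed window, which is what lets a "last quarter" question still get the right answer after the upgrade.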

Which agent memory framework should you choose?

The decision usually comes down to three questions:

Do you need temporal accuracy? If your domain has facts that change (subscription tier, ticket status, address), Zep is the only one of the three that handles this natively, through its bi-temporal graph. Mem0's LLM judge overwrites the old fact and Letta leaves updates to the agent's own memory edits, so neither keeps a queryable record of what used to be true.
Do you need the LLM itself to manage memory? If your agent should reason about what to remember (a long-running research assistant, an autonomous coding agent), Letta's self-editing memory blocks are the right model. Memory becomes a tool the agent uses, not a sidecar.
Do you just need persistent personalization? Add Mem0. It adds the least operational overhead to your stack and is the cheapest to operate, with no graph database to run.

For most teams shipping a chat product in 2026, the answer is "start with Mem0 plus a separate task tracker, migrate to Letta or Zep when you outgrow it." If you're building a system of record (customer support, sales enablement) where temporal correctness is part of the product, skip Mem0 and start with Zep.

Either way, pair your memory layer with an evaluation harness. Agents that look great with hand-picked test cases regress quietly on real traffic (I learned this the hard way). Our LLM evaluation pipelines guide covers the recall and faithfulness metrics that matter for memory-augmented agents.

How do you integrate agent memory with LangGraph?

All three frameworks expose simple read/write APIs, so a LangGraph integration is just two nodes wrapped around your model call: a memory read node before generation, and a memory write node after. Here's the pattern with Mem0; the notes after the code cover what changes for Letta and Zep:

from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_anthropic import ChatAnthropic
from mem0 import Memory

memory = Memory()  # default config; use Memory.from_config(...) as above to reuse that setup
llm = ChatAnthropic(model="claude-sonnet-4-6")

class AgentState(TypedDict, total=False):
    input: str
    user_id: str
    memories: str
    output: str

def read_memory(state: AgentState) -> dict:
    # Fetch the memories most relevant to the incoming message
    hits = memory.search(query=state["input"], user_id=state["user_id"], limit=6)
    return {"memories": "\n".join(h["memory"] for h in hits["results"])}

def generate(state: AgentState) -> dict:
    # Inject the retrieved memories into the system prompt
    system_prompt = f"Relevant memories:\n{state['memories']}"
    reply = llm.invoke([("system", system_prompt), ("user", state["input"])])
    return {"output": reply.content}

def write_memory(state: AgentState) -> dict:
    # Persist the finished turn so later sessions can recall it
    memory.add(
        messages=[
            {"role": "user", "content": state["input"]},
            {"role": "assistant", "content": state["output"]},
        ],
        user_id=state["user_id"],
    )
    return {}  # no state changes

graph = StateGraph(AgentState)
graph.add_node("read", read_memory)
graph.add_node("generate", generate)
graph.add_node("write", write_memory)
graph.set_entry_point("read")
graph.add_edge("read", "generate")
graph.add_edge("generate", "write")
graph.add_edge("write", END)
app = graph.compile()

For Letta, replace the three nodes with a single call to client.agents.messages.create(). Letta runs the read/write cycle internally, which is precisely why it's heavier. For Zep, swap memory.search for zep.memory.get(session_id=...), which returns a pre-formatted context block.

Production pitfalls and how to avoid them

A few failure modes show up repeatedly when teams ship agent memory to real users. In my last project I tripped over the first two within the same week, so they're worth a closer look.

Memory pollution from off-topic turns. If a user vents about traffic, do you really want "User dislikes Highway 101" persisted forever? Configure your extractor with a tighter system prompt, or pass infer=False to memory.add() so Mem0 skips extraction and stores the text as given, then call it only for explicit memory writes from the agent.
Embedding drift on model upgrades. Switching from text-embedding-3-small to a newer embedder invalidates all stored vectors. Plan a re-embedding job and version your collections (agent_mem_v2).
Cost blow-up from extractor LLMs. Mem0 and Zep both call an LLM on every add. At high write volume that dominates cost. Route the extractor to a cheap model (Haiku 4.5, GPT-4o-mini) and reserve your main model for generation. The LLM cost optimization patterns guide goes deeper on model routing.
PII leakage across users. Always namespace by user/tenant; never let one agent search the shared memory pool. Add a guardrail that strips personal data from system memories (the agent's persona shouldn't include phone numbers).
Silent recall failures. If a memory retrieval misses, the agent confabulates and the user doesn't know. Log every retrieval with its score distribution, alert on retrievals whose top score falls below a threshold (0.5 is a reasonable starting point, but calibrate it for your embedder and similarity metric), and add an "I don't know" tool the model can call when memory is empty.
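The last pitfall is the easiest to guard against in code. Here is a minimal sketch, assuming hits shaped like the {"memory", "score"} dictionaries in the Mem0 example and assuming that a higher score means a closer match. The guard_retrieval and build_system_prompt helpers are hypothetical, and the fallback prompt stands in for a real "I don't know" tool.

```python
import logging
from typing import Optional

logger = logging.getLogger("memory.recall")

MIN_TOP_SCORE = 0.5  # starting point only; calibrate for your embedder and metric


def guard_retrieval(query: str, hits: list) -> Optional[str]:
    """Log score stats and return prompt context, or None when recall looks unreliable."""
    scores = sorted((h["score"] for h in hits), reverse=True)
    top = scores[0] if scores else None
    lowest = scores[-1] if scores else None
    logger.info("recall query=%r n=%d top=%s lowest=%s", query, len(scores), top, lowest)
    if top is None or top < MIN_TOP_SCORE:
        logger.warning("low-confidence recall for query=%r (top=%s)", query, top)
        return None
    # Drop weak hits so they cannot be mistaken for facts
    return "\n".join(h["memory"] for h in hits if h["score"] >= MIN_TOP_SCORE)


def build_system_prompt(context: Optional[str]) -> str:
    """Tell the model explicitly when memory came back empty or unreliable."""
    if context is None:
        return "No reliable memory was found. Say you don't know rather than guessing."
    return f"Relevant memories:\n{context}"


hits = [
    {"memory": "User lives in Lisbon", "score": 0.82},
    {"memory": "User likes tea", "score": 0.31},
]
print(build_system_prompt(guard_retrieval("Where does the user live?", hits)))
print(build_system_prompt(guard_retrieval("What is the user's job?", [])))
```

Wiring the warning into your alerting is what turns a silent miss into something you can see on a dashboard.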

Frequently Asked Questions

Is Mem0 better than RAG for agent memory?

For chat-history recall, Mem0 outperforms naive RAG because it stores extracted facts instead of raw message chunks. That produces cleaner retrievals and uses 80–90% fewer tokens. Use RAG for document knowledge bases and Mem0 (or Letta/Zep) for user/session memory.

What is the difference between MemGPT and Letta?

Letta is the production framework built by the original MemGPT authors. MemGPT is the research paper; Letta is the supported open-source server and SDK that implements its memory-block architecture, with a managed cloud option.

Can I use Mem0, Letta, or Zep with Claude or open-source models?