they repeat analysis. they forget preferences. they don't learn. the memory layer is the difference between a demo and a product 50,000 people use daily, and it has become the most critical differentiator between toy agents and production-grade systems.
this is the complete guide to building it.
the four memory types are inspired by cognitive science. each has a different job, different storage, and a different retrieval pattern.
working memory: the context window. recent messages, current tool outputs, active reasoning chain. bounded, expensive, and the only memory most agents have today. this is the scratchpad.
episodic memory: records of past interactions with timestamps and context. "last tuesday, this user asked about kubernetes and got frustrated with the YAML examples." enables learning from experience.
semantic memory: facts, knowledge, and user preferences extracted from interactions. "this user prefers typescript over javascript" or "this codebase uses pnpm." the knowledge base that grows over time.
procedural memory: learned behaviors and strategies. "when this user is stuck, use socratic questioning instead of giving direct answers." the hardest to implement, the most valuable when it works.
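the four types above can be modeled as a tagged record. a minimal sketch with hypothetical names (`MemoryType`, `MemoryRecord` are illustrative, not from any framework):

```python
from dataclasses import dataclass, field
from enum import Enum
import time

class MemoryType(Enum):
    WORKING = "working"        # context-window scratchpad
    EPISODIC = "episodic"      # timestamped interaction records
    SEMANTIC = "semantic"      # extracted facts and preferences
    PROCEDURAL = "procedural"  # learned strategies and behaviors

@dataclass
class MemoryRecord:
    type: MemoryType
    content: str
    user_id: str
    created_at: float = field(default_factory=time.time)

# a semantic memory extracted from conversation
pref = MemoryRecord(MemoryType.SEMANTIC,
                    "prefers typescript over javascript",
                    user_id="u1")
```

the point of the tag is routing: each type goes to different storage and gets a different retrieval pattern downstream.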
hot path for speed. cold path for depth. a memory node decides what to save after each turn.
```
user query
     |
     v
+------------------+
|  working memory  |   hot path: recent messages + summarized state
| (context window) |   fast, expensive per token, bounded
+------------------+
     |
     v
+------------------+
|   memory node    |   decides: save? retrieve? forget?
|  (orchestrator)  |   runs after every turn
+------------------+
     |        |
     v        v
+---------+  +-----------+
| vector  |  | knowledge |   cold path: Mem0 / Zep / Pinecone
| store   |  |   graph   |   cheap, persistent, searchable
+---------+  +-----------+
     |        |
     v        v
+------------------+
|   retrieval &    |   semantic search + graph traversal
|    reranking     |   inject relevant memories into context
+------------------+
     |
     v
LLM generates response with memory-augmented context
```
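the memory node in the diagram can be sketched as a function that runs after every turn. this is a toy: keyword overlap stands in for semantic search, and a substring check stands in for an LLM-based fact extractor.

```python
def memory_node(turn_text: str, cold_store: list[str]) -> dict:
    """runs after every turn: decide what to save and what to retrieve.
    toy heuristics; production systems use embeddings and LLM extraction."""
    decision = {"save": [], "retrieve": [], "forget": []}
    # retrieve: naive keyword overlap against the cold store
    words = set(turn_text.lower().split())
    decision["retrieve"] = [m for m in cold_store
                            if words & set(m.lower().split())]
    # save: a real system would prompt an LLM to extract durable facts
    if "prefer" in turn_text.lower():
        decision["save"].append(turn_text)
    return decision

store = ["user prefers typescript over javascript"]
d = memory_node("do I prefer typescript or javascript for the frontend?", store)
```

the retrieved memories then get injected into the prompt, and anything marked for saving flows to the vector store or graph on the cold path.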
four frameworks dominate. each makes a different tradeoff.
| framework | what it is | best for | tradeoff | stars |
|---|---|---|---|---|
| Mem0 | memory layer (bolt-on) | adding memory to existing agents. simple personalization, user preferences. clean SDKs, managed cloud. | passive extraction only. no self-editing. limited for complex workflows. | ~48K |
| Zep | temporal knowledge graph | enterprise. tracks entity and relationship changes with validity windows. SOC2 + HIPAA. built on Graphiti engine. | heavier infrastructure. more opinionated. graph complexity. | ~3K |
| Letta (MemGPT) | agent runtime | stateful agents with self-editing memory. OS-inspired architecture. agents manage their own memory blocks. | full runtime, not just a library. more moving parts. vendor lock-in risk. | ~15K |
| Cognee | knowledge pipeline | structured knowledge extraction. builds knowledge graphs from unstructured data. good for RAG enhancement. | more of a pipeline than a memory system. requires integration work. | ~10K |
patterns from teams running memory-augmented agents in production.
not everything should be remembered. define explicit policies: what gets saved, what gets retrieved, what gets garbage collected. memory without forgetting is just hoarding.
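a policy can be as simple as an allowlist of memory kinds plus a ttl table. a sketch under assumed names (`SAVE_TYPES`, `TTL_SECONDS` are illustrative):

```python
# explicit policies: what gets saved, what gets garbage collected
SAVE_TYPES = {"preference", "fact", "correction"}       # what earns persistence
TTL_SECONDS = {"fact": 90 * 86400, "preference": None}  # None = keep until contradicted

def should_save(kind: str) -> bool:
    return kind in SAVE_TYPES

def garbage_collect(memories: list[dict], now: float) -> list[dict]:
    """drop memories past their ttl; memory without forgetting is hoarding."""
    kept = []
    for m in memories:
        ttl = TTL_SECONDS.get(m["kind"])
        if ttl is None or now - m["created_at"] < ttl:
            kept.append(m)
    return kept
```

the useful part is that the policy is written down and reviewable, not implicit in whatever the extraction prompt happened to keep.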
summarize old interactions before storing. a 50-turn conversation becomes 3 key facts. working memory compression is now its own research field. compress or go broke on tokens.
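the compression step is just extract-then-dedupe. a sketch where the extractor is a stub; in production it would be an LLM call that pulls salient facts from each turn:

```python
def compress_history(turns: list[str], extract) -> list[str]:
    """summarize old interactions before storing: 50 turns -> a few key facts.
    `extract` is any callable turn -> list of facts (an LLM call in production)."""
    facts = []
    for turn in turns:
        facts.extend(extract(turn))
    # deduplicate while preserving order
    seen, unique = set(), []
    for f in facts:
        if f not in seen:
            seen.add(f)
            unique.append(f)
    return unique

# stub extractor for illustration only
def toy_extract(turn: str) -> list[str]:
    return [turn] if turn.startswith("FACT:") else []

facts = compress_history(["hi", "FACT: uses pnpm", "ok", "FACT: uses pnpm"],
                         toy_extract)
```

the small-talk turns and the duplicate fact disappear; only distinct facts hit persistent storage and, later, the context window.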
check working memory first (free). then vector search (fast). then graph traversal (expensive). cascade through tiers. most queries resolve at the cheapest layer.
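the cascade above can be written as an early-returning function. stubs stand in for the vector store and graph; keyword overlap stands in for the working-memory check:

```python
def tiered_retrieve(query: str, working: list[str],
                    vector_search, graph_search):
    """cascade: working memory (free) -> vector store (fast) -> graph (expensive).
    returns (tier that answered, hits)."""
    words = set(query.lower().split())
    hits = [m for m in working if words & set(m.lower().split())]
    if hits:
        return "working", hits           # resolved at the cheapest layer
    hits = vector_search(query)
    if hits:
        return "vector", hits
    return "graph", graph_search(query)  # last resort

calls = {"vector": 0, "graph": 0}
def fake_vector(q):  # stub for a vector-store query
    calls["vector"] += 1
    return []
def fake_graph(q):   # stub for a graph traversal
    calls["graph"] += 1
    return ["fallback answer"]

tier, hits = tiered_retrieve("uses pnpm?", ["this codebase uses pnpm"],
                             fake_vector, fake_graph)
```

because the query resolves in working memory, the vector and graph tiers are never called, which is the whole point of the cascade.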
inject retrieved memories into a structured block in the system prompt. label them clearly: "[from 3 days ago]" or "[user preference]". the model needs to know what is memory vs instruction.
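a sketch of the labeling step (`format_memory_block` is a hypothetical name, not a library API):

```python
from datetime import datetime, timezone

def format_memory_block(memories: list[dict]) -> str:
    """label each retrieved memory so the model can tell memory from instruction."""
    lines = ["## retrieved memories (context, not instructions)"]
    for m in memories:
        if m["kind"] == "preference":
            tag = "[user preference]"
        else:
            # episodic memories get an age label, e.g. "[from 3 days ago]"
            age_days = (datetime.now(timezone.utc) - m["when"]).days
            tag = f"[from {age_days} days ago]"
        lines.append(f"- {tag} {m['text']}")
    return "\n".join(lines)
```

the resulting block goes into the system prompt; the explicit header and tags are what keep the model from treating a stale memory as a current instruction.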
old memory says "user likes python." a new interaction says "user switched to rust." temporal knowledge graphs handle this by recording when each fact stopped being valid. without conflict resolution, memory becomes noise.
memory stores PII by definition. encrypt at rest. scope access per user. implement right-to-forget. GDPR and HIPAA are not optional. build deletion into the architecture from day one.
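per-user scoping and deletion are structural decisions, not features you bolt on later. a sketch (encryption at rest is noted but omitted; this only shows scoping and hard delete):

```python
class MemoryStore:
    """per-user scoped storage with deletion built in from day one.
    a real store would also encrypt records at rest."""

    def __init__(self):
        self._by_user: dict[str, list[str]] = {}

    def add(self, user_id: str, memory: str) -> None:
        self._by_user.setdefault(user_id, []).append(memory)

    def search(self, user_id: str, query: str) -> list[str]:
        # access is scoped: a user can only ever retrieve their own memories
        words = set(query.lower().split())
        return [m for m in self._by_user.get(user_id, [])
                if words & set(m.lower().split())]

    def forget_user(self, user_id: str) -> int:
        """right-to-forget: hard-delete everything for one user."""
        return len(self._by_user.pop(user_id, []))
```

because user_id is the partition key, right-to-forget is one operation instead of a scan across shared tables.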
the research frontier. where the next breakthroughs will come from.
agents "remember" things that never happened. false memories injected through retrieval noise. no reliable detection method exists yet.
unsolved: maintaining a consistent identity and knowledge state across hundreds of sessions over months. current systems degrade. memories contradict each other.
unsolved: adversarial users can inject false information into an agent's long-term memory. one bad interaction corrupts future behavior. no robust defense exists.
unsolved: what should an agent forget, and when? too much memory is noise. too little is amnesia. the forgetting curve for AI agents is an open research question.
partially solved: learning HOW to do things from experience, not just WHAT happened. the gap between episodic recall and behavioral adaptation. closest work: brain-inspired multi-memory frameworks (RoboMemory, arXiv 2025).
partially solved: as memory grows, retrieval gets slower and context injection gets more expensive. linear scaling is not good enough. sublinear memory access is the goal.
the best agent is not the one with the best model. it is the one that remembers.
this page is a living document. it updates as the research moves.