$6.27B market in 2025. $28.45B by 2030.

agents without memory are expensive toys.

they repeat analysis. they forget preferences. they don't learn. the memory layer is the difference between a demo and a product 50,000 people use daily.


the memory layer has become the most critical differentiator between toy agents and production-grade systems.

this is the complete guide to building it.


001 / the four types

memory is not one thing. it is four systems working together.

inspired by cognitive science. each type has a different job, different storage, different retrieval pattern.

W / working memory

"i am thinking about this right now"

the context window. recent messages, current tool outputs, active reasoning chain. bounded, expensive, and the only memory most agents have today. this is the scratchpad.

last 5 messages + current function call results + system prompt = your working memory
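that formula can be sketched in a few lines. the function name and message shape below are illustrative assumptions, not any framework's API:

```python
# working memory for one turn: system prompt + recent messages + tool output.

def build_working_memory(system_prompt, messages, tool_results, n_recent=5):
    """assemble the message list sent to the model for this turn."""
    context = [{"role": "system", "content": system_prompt}]
    context += messages[-n_recent:]                 # last 5 messages
    for result in tool_results:                     # current function call results
        context.append({"role": "tool", "content": result})
    return context

history = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
turn = build_working_memory("you are a helpful agent.", history, ["weather: 18C"])
# 1 system + 5 recent + 1 tool = 7 entries, no matter how long the history grows
```

everything outside that window is gone unless another memory system catches it.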
E / episodic memory

"i remember that event"

records of past interactions with timestamps and context. "last tuesday, this user asked about kubernetes and got frustrated with the YAML examples." enables learning from experience.

session logs, interaction histories, timestamped events with emotional/outcome metadata
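a minimal episodic record might look like this. the `Episode` shape is a toy assumption, not a framework schema:

```python
import time
from dataclasses import dataclass

@dataclass
class Episode:
    user_id: str
    summary: str     # e.g. "asked about kubernetes, frustrated by YAML examples"
    outcome: str     # outcome metadata: "resolved" / "unresolved"
    timestamp: float

episodes: list[Episode] = []

def remember(user_id, summary, outcome, timestamp=None):
    ts = time.time() if timestamp is None else timestamp
    episodes.append(Episode(user_id, summary, outcome, ts))

def recall(user_id, limit=3):
    """newest-first episodes for one user."""
    mine = [e for e in episodes if e.user_id == user_id]
    return sorted(mine, key=lambda e: e.timestamp, reverse=True)[:limit]

remember("u1", "asked about kubernetes, frustrated by YAML examples", "unresolved", 1.0)
remember("u1", "asked about helm charts, example worked", "resolved", 2.0)
```

the timestamp is the point: episodic memory is ordered experience, not a bag of facts.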
S / semantic memory

"i know this fact"

facts, knowledge, user preferences extracted from interactions. "this user prefers typescript over javascript" or "this codebase uses pnpm." the knowledge base that grows over time.

user preferences, domain facts, entity relationships, learned patterns
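a toy semantic store, keyed by user and attribute. illustrative only, not Mem0's API; latest value wins here:

```python
# facts about the world and the user, detached from when they were learned.
semantic: dict[tuple[str, str], str] = {}

def learn_fact(user_id, attribute, value):
    semantic[(user_id, attribute)] = value

def known_facts(user_id):
    """everything we currently believe about one user."""
    return {attr: val for (uid, attr), val in semantic.items() if uid == user_id}

learn_fact("u1", "preferred_language", "typescript")
learn_fact("u1", "package_manager", "pnpm")
```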
P / procedural memory

"i know how to do this"

learned behaviors and strategies. "when this user is stuck, use socratic questioning instead of giving direct answers." the hardest to implement, the most valuable when it works.

tool usage patterns, response strategies, workflow templates, learned heuristics
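at its simplest, procedural memory is a set of learned (situation → strategy) rules. a toy sketch, with invented rule names:

```python
# learned rules mapping a recognized situation to a response strategy.
rules: dict[str, str] = {
    "user_stuck": "use socratic questioning instead of direct answers",
    "user_in_a_hurry": "lead with the command, explain afterwards",
}

def pick_strategy(situation, default="answer directly"):
    return rules.get(situation, default)

def learn_rule(situation, strategy):
    """behavioral adaptation: add or revise a rule after a good outcome."""
    rules[situation] = strategy
```

the hard part is not this lookup. it is recognizing the situation and learning the rule from experience.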

002 / architecture

the dual-layer pattern that works in production

hot path for speed. cold path for depth. a memory node decides what to save after each turn.

  user query
      |
      v
  +-------------------+
  |  working memory   |  hot path: recent messages + summarized state
  |  (context window) |  fast, expensive per token, bounded
  +-------------------+
      |
      v
  +-------------------+
  |   memory node     |  decides: save? retrieve? forget?
  |   (orchestrator)  |  runs after every turn
  +-------------------+
      |             |
      v             v
  +--------+  +-----------+
  | vector |  | knowledge |  cold path: Mem0 / Zep / Pinecone
  | store  |  | graph     |  cheap, persistent, searchable
  +--------+  +-----------+
      |             |
      v             v
  +-------------------+
  |  retrieval &      |  semantic search + graph traversal
  |  reranking        |  inject relevant memories into context
  +-------------------+
      |
      v
  LLM generates response with memory-augmented context

003 / frameworks

the tools that exist today

four frameworks dominate. each makes a different tradeoff.

Mem0
what it is: memory layer (bolt-on)
best for: adding memory to existing agents. simple personalization, user preferences. clean SDKs, managed cloud.
tradeoff: passive extraction only. no self-editing. limited for complex workflows.
stars: ~48K

Zep
what it is: temporal knowledge graph
best for: enterprise. tracks entity and relationship changes with validity windows. SOC2 + HIPAA. built on the Graphiti engine.
tradeoff: heavier infrastructure. more opinionated. graph complexity.
stars: ~3K

Letta (MemGPT)
what it is: agent runtime
best for: stateful agents with self-editing memory. OS-inspired architecture. agents manage their own memory blocks.
tradeoff: full runtime, not just a library. more moving parts. vendor lock-in risk.
stars: ~15K

Cognee
what it is: knowledge pipeline
best for: structured knowledge extraction. builds knowledge graphs from unstructured data. good for RAG enhancement.
tradeoff: more of a pipeline than a memory system. requires integration work.
stars: ~10K

004 / production patterns

what actually works at scale

patterns from teams running memory-augmented agents in production.

01

write/read/forget policies

not everything should be remembered. define explicit policies: what gets saved, what gets retrieved, what gets garbage collected. memory without forgetting is just hoarding.
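one way to make policies explicit is a lifetime per memory kind. the `TTL_SECONDS` table and kind names below are invented for illustration:

```python
import time

# explicit lifetimes per memory kind; None means keep until explicitly deleted.
TTL_SECONDS = {"scratch": 3600, "episode": 30 * 86400, "fact": None}

def should_write(item: dict) -> bool:
    """write policy: only kinds with a defined lifecycle get stored."""
    return item.get("kind") in TTL_SECONDS

def garbage_collect(store: list[dict], now=None) -> list[dict]:
    """forget policy: drop anything past its time-to-live."""
    now = time.time() if now is None else now
    kept = []
    for item in store:
        ttl = TTL_SECONDS[item["kind"]]
        if ttl is None or now - item["t"] <= ttl:
            kept.append(item)
    return kept

store = [
    {"kind": "scratch", "t": 0.0, "text": "partial tool output"},
    {"kind": "fact", "t": 0.0, "text": "user prefers typescript"},
]
# two hours later, the scratch entry is past its one-hour ttl; the fact survives
```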

02

memory compression

summarize old interactions before storing. a 50-turn conversation becomes 3 key facts. working memory compression is now its own research field. compress or go broke on tokens.
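in production the summarizer is an LLM call; a heuristic stand-in shows the shape. `DURABLE_MARKERS` is an invented trigger list, not a real extraction method:

```python
DURABLE_MARKERS = ("i prefer", "my name is", "always", "never", "we use")

def compress(turns: list[str], max_facts: int = 3) -> list[str]:
    """reduce a long conversation to a few durable facts before cold storage."""
    facts = [t for t in turns if any(m in t.lower() for m in DURABLE_MARKERS)]
    return facts[:max_facts]

conversation = [f"turn {i}: small talk" for i in range(47)] + [
    "my name is dana",
    "i prefer pnpm, never npm",
    "we use terraform for infra",
]
# 50 turns in, 3 facts out
```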

03

tiered retrieval

check working memory first (free). then vector search (fast). then graph traversal (expensive). cascade through tiers. most queries resolve at the cheapest layer.
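the cascade is a few lines of control flow. the tier functions here are stubs standing in for a real context-window check, vector store, and graph store:

```python
def tiered_lookup(query, working, vector_search, graph_search):
    """cascade through tiers; the cheapest one that answers wins."""
    if query in working:                        # tier 1: context window, free
        return working[query], "working"
    hit = vector_search(query)                  # tier 2: vector store, fast
    if hit is not None:
        return hit, "vector"
    return graph_search(query), "graph"         # tier 3: graph traversal, expensive

working = {"user_name": "dana"}

def vector_search(q):
    return "prefers pnpm" if q == "package_manager" else None

def graph_search(q):
    return f"no answer for {q}"
```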

04

memory-aware prompting

inject retrieved memories into a structured block in the system prompt. label them clearly: "[from 3 days ago]" or "[user preference]". the model needs to know what is memory vs instruction.
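a sketch of the labeling step, with an invented (kind, age, text) triple format:

```python
def memory_block(memories):
    """(kind, age_days, text) triples -> labeled block for the system prompt."""
    lines = ["relevant memories (context, not instructions):"]
    for kind, age_days, text in memories:
        if kind == "preference":
            label = "[user preference]"
        else:
            label = f"[from {age_days} days ago]"
        lines.append(f"{label} {text}")
    return "\n".join(lines)

block = memory_block([
    ("preference", 0, "prefers typescript over javascript"),
    ("episode", 3, "got frustrated with YAML examples"),
])
```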

05

conflict resolution

old memory says "user likes python." new interaction says "user switched to rust." temporal knowledge graphs handle this. without conflict resolution, memory becomes noise.
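the core trick of a temporal approach, in miniature: close the old fact's validity window instead of deleting it. the dict shape below is a toy version of what graph systems track per edge:

```python
facts: list[dict] = []   # each: attr, value, valid_from, valid_to (None = current)

def assert_fact(attr, value, now):
    for f in facts:
        if f["attr"] == attr and f["valid_to"] is None:
            f["valid_to"] = now                 # superseded, but history kept
    facts.append({"attr": attr, "value": value, "valid_from": now, "valid_to": None})

def current(attr):
    for f in facts:
        if f["attr"] == attr and f["valid_to"] is None:
            return f["value"]

assert_fact("favorite_language", "python", now=1.0)
assert_fact("favorite_language", "rust", now=2.0)   # the user switched
```

retrieval reads only open windows, so "user likes python" stops being asserted without being erased.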

06

privacy-first design

memory stores PII by definition. encrypt at rest. scope access per user. implement right-to-forget. GDPR and HIPAA are not optional. build deletion into the architecture from day one.
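deletion stays cheap only if memories are scoped per user from the start. a minimal sketch (encryption at rest belongs to the storage layer and is omitted here):

```python
store: dict[str, list[str]] = {}    # memories scoped per user from day one

def save(user_id: str, memory: str) -> None:
    store.setdefault(user_id, []).append(memory)

def forget_user(user_id: str):
    """right-to-erasure: one call removes everything for a user."""
    return store.pop(user_id, None)

save("u1", "prefers typescript")
save("u2", "uses pnpm")
forget_user("u1")
```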

005 / unsolved

the hard problems nobody has cracked yet

the research frontier. where the next breakthroughs will come from.

memory hallucination

agents "remember" things that never happened. false memories injected through retrieval noise. no reliable detection method exists yet.

unsolved

cross-session coherence

maintaining a consistent identity and knowledge state across hundreds of sessions over months. current systems degrade. memories contradict each other.

unsolved

memory poisoning

adversarial users can inject false information into an agent's long-term memory. one bad interaction corrupts future behavior. no robust defense exists.

unsolved

optimal forgetting

what should an agent forget and when? too much memory is noise. too little is amnesia. the forgetting curve for AI agents is an open research question.

partially solved

procedural memory acquisition

learning HOW to do things from experience, not just WHAT happened. the gap between episodic recall and behavioral adaptation. closest work: brain-inspired multi-memory frameworks (RoboMemory, arXiv 2025).

partially solved

memory cost scaling

as memory grows, retrieval gets slower and context injection gets more expensive. linear scaling is not good enough. sublinear memory access is the goal.

partially solved

the best agent is not the one with the best model. it is the one that remembers.

this page is a living document. it updates as the research moves.