Memory Systems for AI Agents: Short-Term, Long-Term, and Episodic
A deep dive into how AI agents implement memory, covering short-term buffers, long-term vector storage, and episodic recall, with practical patterns and code examples
When you have a conversation with someone, you rely on multiple types of memory simultaneously. You remember what was just said (short-term), draw on knowledge you’ve accumulated over years (long-term), and recall specific past experiences (episodic). AI agents face the same challenge—but with fundamentally different constraints and mechanisms.
Memory is what separates a stateless language model from a true agent. Without memory, every interaction starts from zero. With well-designed memory systems, agents can learn, adapt, and maintain coherent behavior across extended interactions. This deep dive explores how modern AI agents implement memory, the tradeoffs involved, and practical patterns for building memory-aware systems.
Language models like GPT-4 or Claude have a fundamental limitation: they’re stateless. Each API call is independent. The model doesn’t inherently remember previous conversations or accumulate knowledge over time. Everything it knows must fit in the context window—the limited amount of text it can process in a single call.
This creates several problems:

- Conversations cannot persist across sessions; the agent forgets the user entirely between calls.
- Knowledge never accumulates, so the agent repeats questions and relearns the same facts.
- Long-running tasks overflow the context window, forcing older information to be dropped.
Agent memory systems solve these problems by selectively storing, retrieving, and managing information outside the model’s context window.
Short-term memory in AI agents mirrors human working memory: it holds the immediately relevant information needed for the current task. This typically includes:

- The recent turns of the current conversation
- Intermediate results and tool outputs from the task in progress
- The agent's working notes or scratchpad reasoning
The simplest short-term memory is raw conversation history:
class SimpleShortTermMemory:
    def __init__(self, max_messages: int = 20):
        self.messages = []
        self.max_messages = max_messages

    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        # Sliding window: keep only recent messages
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def get_context(self) -> list:
        return self.messages
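A quick usage sketch showing the sliding window in action:

memory = SimpleShortTermMemory(max_messages=4)
for i in range(6):
    memory.add_message("user", f"Message {i}")

# Only the four most recent messages survive the window
print(len(memory.get_context()))           # 4
print(memory.get_context()[0]["content"])  # "Message 2"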
This approach has an obvious limitation: it treats all messages equally. A more sophisticated approach uses summarization to compress older context:
class SummarizingMemory:
    def __init__(self, llm, summary_threshold: int = 10):
        self.llm = llm
        self.summary = ""
        self.recent_messages = []
        self.threshold = summary_threshold

    def add_message(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        if len(self.recent_messages) > self.threshold:
            # Summarize everything except the five most recent messages
            to_summarize = self.recent_messages[:-5]
            self.summary = self._summarize(self.summary, to_summarize)
            self.recent_messages = self.recent_messages[-5:]

    def get_context(self) -> list:
        """Return the running summary (if any) followed by the recent messages."""
        context = []
        if self.summary:
            context.append({"role": "system", "content": f"Summary of earlier conversation: {self.summary}"})
        return context + self.recent_messages

    def _summarize(self, existing_summary: str, messages: list) -> str:
        prompt = f"""Existing summary: {existing_summary}
New messages to incorporate: {messages}
Provide an updated summary that captures key information, decisions made, and current context."""
        # Assumes llm.invoke takes a prompt string and returns a string;
        # chat clients that return message objects need .content here instead.
        return self.llm.invoke(prompt)
This pattern—often called conversation compaction—preserves semantic content while reducing token usage. The tradeoff is that summaries lose detail and require additional LLM calls.
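Here is a minimal usage sketch, assuming an LLM client whose invoke method takes a prompt string and returns a string (the my_llm and conversation_turns names below are placeholders):

memory = SummarizingMemory(llm=my_llm, summary_threshold=10)
for role, content in conversation_turns:  # hypothetical list of (role, content) pairs
    memory.add_message(role, content)

# Older turns are now folded into memory.summary; the last five stay verbatim
context = memory.get_context()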
Different use cases call for different buffering strategies:

- Full buffer: keep the entire conversation; simple, but token usage grows without bound.
- Sliding window: keep only the last N messages; cheap, but older context disappears abruptly.
- Summary: compress older messages into a running summary; compact, at the cost of detail and extra LLM calls.

LangChain provides built-in implementations through ConversationBufferMemory, ConversationSummaryMemory, and ConversationBufferWindowMemory, as in the sketch below.
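These classes live in the classic langchain.memory module; recent LangChain releases deprecate them in favor of LangGraph persistence, so treat this as illustrative:

from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 5 exchanges, returned as message objects
memory = ConversationBufferWindowMemory(k=5, return_messages=True)
memory.save_context({"input": "Hi, I'm Alex"}, {"output": "Hello Alex! How can I help?"})
print(memory.load_memory_variables({}))  # {'history': [HumanMessage(...), AIMessage(...)]}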
Long-term memory stores information that should persist across sessions and be retrievable when relevant. Unlike short-term memory, which is always in context, long-term memory requires explicit retrieval.
The most common pattern uses vector databases for long-term storage:
class VectorLongTermMemory:
    def __init__(self, embeddings, vectorstore):
        self.embeddings = embeddings
        self.vectorstore = vectorstore

    def store(self, text: str, metadata: dict = None):
        """Store information for later retrieval."""
        self.vectorstore.add_texts(
            texts=[text],
            metadatas=[metadata] if metadata else None
        )

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        """Retrieve relevant memories based on semantic similarity."""
        docs = self.vectorstore.similarity_search(query, k=k)
        return [doc.page_content for doc in docs]
This approach excels at finding semantically related information even when the query uses different terminology. The agent can store facts, user preferences, past interactions, and domain knowledge, then retrieve relevant pieces when needed.
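Wiring this up with concrete components might look like the following sketch; it assumes OpenAI embeddings and a local FAISS index, but any LangChain-compatible vector store slots in the same way:

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
# Seed the index with one memory (FAISS needs at least one text to initialize)
vectorstore = FAISS.from_texts(["User prefers concise answers"], embeddings)

memory = VectorLongTermMemory(embeddings, vectorstore)
memory.store("The user's main project is written in Rust", metadata={"type": "fact"})
print(memory.retrieve("What language does the user code in?", k=2))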
Sometimes you need more than semantic search. Structured storage enables precise queries:
class StructuredMemory:
    def __init__(self):
        self.entities = {}       # entity_name -> attributes
        self.relationships = []  # (entity1, relation, entity2)

    def add_entity(self, name: str, entity_type: str, attributes: dict):
        self.entities[name] = {
            "type": entity_type,
            "attributes": attributes
        }

    def add_relationship(self, entity1: str, relation: str, entity2: str):
        self.relationships.append((entity1, relation, entity2))

    def query_entity(self, name: str) -> dict:
        return self.entities.get(name)

    def query_relationships(self, entity: str) -> list:
        return [r for r in self.relationships if entity in (r[0], r[2])]
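Usage is straightforward; the entities below are hypothetical:

memory = StructuredMemory()
memory.add_entity("Alice", "person", {"role": "data engineer"})
memory.add_entity("Project Phoenix", "project", {"status": "active"})
memory.add_relationship("Alice", "works_on", "Project Phoenix")

print(memory.query_entity("Alice"))         # {'type': 'person', 'attributes': {...}}
print(memory.query_relationships("Alice"))  # [('Alice', 'works_on', 'Project Phoenix')]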
Knowledge graphs combine semantic retrieval with structured queries. Tools like Neo4j and frameworks like LangChain's GraphCypherQAChain enable agents to reason over complex relationship networks.
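A rough sketch of the LangChain-plus-Neo4j route; the connection details are placeholders, and the exact from_llm parameters vary across LangChain versions:

from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="<password>")
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(model="gpt-4o-mini"),
    graph=graph,
    allow_dangerous_requests=True,  # required by recent versions: the chain executes generated Cypher
)
print(chain.invoke({"query": "Which projects is Alice working on?"}))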
Episodic memory stores specific experiences (complete interactions, task executions, or problem-solving sessions) that can be recalled and learned from. This is particularly valuable for:

- Learning from past successes and failures
- Supplying few-shot examples drawn from the agent's own history
- Debugging and auditing what the agent did and why
- Keeping behavior consistent across similar situations
class EpisodicMemory:
    def __init__(self, embeddings, vectorstore):
        self.embeddings = embeddings
        self.vectorstore = vectorstore

    def store_episode(self, episode: dict):
        """Store a complete episode with full context."""
        # Episode structure:
        # - trigger: what initiated the episode
        # - actions: what the agent did
        # - outcome: what happened (success/failure)
        # - lessons: what was learned
        episode_text = f"""
Situation: {episode['trigger']}
Actions taken: {episode['actions']}
Outcome: {episode['outcome']}
Key learnings: {episode.get('lessons', 'None recorded')}
"""
        self.vectorstore.add_texts(
            texts=[episode_text],
            metadatas=[{
                "type": "episode",
                "timestamp": episode.get("timestamp"),
                "success": episode.get("success", True)
            }]
        )

    def recall_similar_episodes(self, situation: str, k: int = 3) -> list:
        """Find past episodes similar to the current situation."""
        # The filter argument requires a vector store with metadata filtering
        return self.vectorstore.similarity_search(
            situation,
            k=k,
            filter={"type": "episode"}
        )
The key distinction from long-term memory is that episodes are complete narratives with context, actions, and outcomes—not just facts. This enables agents to reason by analogy: “Last time I encountered a similar situation, I did X and it worked/failed.”
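A usage sketch, reusing the embeddings and vectorstore objects from the long-term memory example (the episode contents are hypothetical, and the metadata filter assumes a store that supports filtering, such as Chroma):

episodic = EpisodicMemory(embeddings, vectorstore)
episodic.store_episode({
    "trigger": "User asked to deploy the app, but the build failed",
    "actions": "Inspected logs, pinned the Node version, retried the build",
    "outcome": "Deployment succeeded on the second attempt",
    "lessons": "Check toolchain versions before rebuilding",
    "timestamp": "2024-06-01T12:00:00Z",
    "success": True,
})

# Later, facing a similar situation:
for doc in episodic.recall_similar_episodes("Deployment fails after a dependency update"):
    print(doc.page_content)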
Real-world agents typically combine multiple memory types. Here’s a unified architecture:
class AgentMemorySystem:
    def __init__(self, llm, embeddings, vectorstore):
        # Note: long_term and episodic share one vector store here;
        # episodes are distinguished only by their metadata
        self.short_term = SummarizingMemory(llm)
        self.long_term = VectorLongTermMemory(embeddings, vectorstore)
        self.episodic = EpisodicMemory(embeddings, vectorstore)

    def build_context(self, current_input: str) -> str:
        """Assemble context from all memory systems."""
        # Always include recent conversation
        recent = self.short_term.get_context()
        # Retrieve relevant long-term memories
        relevant_facts = self.long_term.retrieve(current_input, k=3)
        # Find similar past episodes
        past_episodes = self.episodic.recall_similar_episodes(current_input, k=2)
        context = f"""
## Conversation History
{recent}
## Relevant Knowledge
{relevant_facts}
## Similar Past Situations
{past_episodes}
"""
        return context
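Dropped into an agent loop, it might be used like this sketch (again assuming an llm whose invoke returns a string):

memory = AgentMemorySystem(llm, embeddings, vectorstore)

def respond(user_input: str) -> str:
    memory.short_term.add_message("user", user_input)
    context = memory.build_context(user_input)
    reply = llm.invoke(f"{context}\n\nUser: {user_input}\nAssistant:")
    memory.short_term.add_message("assistant", reply)
    return reply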
Just as humans consolidate memories during sleep, agents benefit from periodic memory maintenance:

- Deduplicating near-identical memories
- Distilling recurring episodes into durable long-term facts (as sketched below)
- Pruning stale or low-value entries to keep retrieval sharp
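A minimal consolidation sketch, assuming the VectorLongTermMemory class from earlier and an LLM client whose invoke returns a string; run it on a schedule rather than per request:

def consolidate_episodes(episode_texts: list[str], llm, long_term: VectorLongTermMemory):
    """Distill a batch of old episodes into durable facts for long-term storage."""
    prompt = (
        "Extract durable, reusable facts and lessons from these past episodes:\n\n"
        + "\n---\n".join(episode_texts)
    )
    distilled = llm.invoke(prompt)
    long_term.store(distilled, metadata={"type": "consolidated", "source": "episodes"})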
Memory is only useful if the right information is retrieved at the right time. Common issues include:

- Semantic search missing exact-keyword matches (names, IDs, error codes)
- Old memories outranking newer, more relevant ones
- Outdated or contradicted facts resurfacing after they should have been retired

Solutions include hybrid search (combining semantic and keyword matching), recency weighting, and explicit memory invalidation. The sketch below shows a simple recency-weighting approach.
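This sketch assumes each stored document carries a created_at Unix timestamp in its metadata and that the vector store implements LangChain's similarity_search_with_relevance_scores:

import time

def recency_weighted_search(vectorstore, query: str, k: int = 5, half_life_days: float = 30.0):
    """Over-fetch by similarity, then re-rank with exponential recency decay."""
    hits = vectorstore.similarity_search_with_relevance_scores(query, k=k * 3)
    now = time.time()

    def score(pair):
        doc, relevance = pair
        age_days = (now - doc.metadata.get("created_at", now)) / 86400
        return relevance * 0.5 ** (age_days / half_life_days)  # halve weight every half-life

    return [doc for doc, _ in sorted(hits, key=score, reverse=True)[:k]]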
Memory operations add latency and cost:

- Every retrieval adds an embedding call and a vector search to the request path
- Summarization and consolidation consume extra LLM calls
- Storage grows without bound unless something is pruned

Design memory systems with these costs in mind. Not every interaction needs full memory retrieval; use heuristics to decide when memory lookup is worthwhile, as in the sketch below.
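The heuristic can be as simple as this sketch, which skips retrieval for greetings and short acknowledgements:

def needs_memory_lookup(user_input: str) -> bool:
    """Cheap gate: skip vector search for greetings and one-word acknowledgements."""
    trivial = {"ok", "okay", "thanks", "thank you", "yes", "no", "hi", "hello"}
    text = user_input.strip().lower()
    return text not in trivial and len(text.split()) > 2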
Stored memories may contain sensitive information. Consider:

- Obtaining consent before persisting personal details
- Retention policies and automatic expiry for aging memories
- Encryption at rest and access controls on memory stores
- Honoring deletion requests across every memory layer
Current memory systems are relatively primitive compared to human cognition. Emerging research explores:

- Consolidation processes that distill experience into knowledge, loosely inspired by sleep
- Agents that learn what to remember and what to forget rather than storing everything
- Hierarchical memory that pages information between the context window and external storage
- Shared memory across fleets of cooperating agents
Memory is a foundational capability for truly autonomous agents. As models grow more capable, sophisticated memory architectures will enable agents that learn from experience, maintain consistent personalities, and build genuine expertise over time.
Understanding memory systems is essential for building agents that can maintain context, learn from experience, and operate coherently over extended interactions. The patterns described here provide a foundation—adapt them to your specific use case and constraints.
This post concludes our Week 2 deep dive series. For hands-on practice with memory systems, check out our RAG tutorial, explore our Complete Guide to AI Agent Frameworks, or reference our AI Agents Glossary for memory-related terminology.