
Advanced State and Memory Management in AgentScope

#AgentScope #Memory #StateManagement #VectorDB

Advanced State and Memory Management in AgentScope is a critical skill for building production-grade, multi-agent systems that maintain coherent context across long-running sessions. As your agents grow more complex — handling interruptions, multi-turn dialogues, and coordinated pipelines — the default in-memory approach quickly shows its limits. This guide goes deep on AgentScope’s memory architecture, showing you how to implement custom memory modules, share state across agent pipelines, and apply optimization strategies that keep your system responsive at scale.


Core Concepts of State and Memory in AgentScope

AgentScope separates memory into two distinct concerns: short-term session state and long-term persistent memory. Understanding this distinction is fundamental to any advanced implementation.

InMemoryMemory is the default memory backend. It holds the conversation history for the current session in RAM, making it fast and simple. However, it is ephemeral — when the process restarts or the agent is reinstantiated, all state is lost. This is acceptable for stateless chatbot endpoints but is entirely unsuitable for agents expected to retain user context across sessions.

The more powerful abstraction is LongTermMemoryBase, an abstract base class that defines the interface for any persistent memory implementation. AgentScope ships with built-in support for SQLite-backed storage, memory compression strategies, and an advanced technique called ReMe (Retrieval-enhanced Memory), designed for scenarios where the memory corpus grows large enough to require selective retrieval.

Agent-state binding in AgentScope is explicit and constructor-level. Unlike some frameworks where memory is a global or ambient service, AgentScope requires you to pass a memory instance directly into the agent constructor:

from agentscope.agent import ReActAgent
from agentscope.memory import InMemoryMemory
from agentscope.model import DashScopeChatModel
from agentscope.formatter import DashScopeChatFormatter

memory = InMemoryMemory()
agent = ReActAgent(
    name="Friday",
    sys_prompt="You are a helpful assistant.",
    model=DashScopeChatModel(model_name="qwen-turbo"),
    formatter=DashScopeChatFormatter(),
    memory=memory,
)

This design makes memory configuration explicit and auditable — you always know exactly which memory backend an agent is using.

Note on deprecated APIs: DialogAgent, DictDialogAgent, and the older prompt-based ReAct agent are all deprecated in AgentScope v1.0. Use ReActAgent exclusively for modern applications. Additionally, centralized model configuration (previously done via agentscope.init()) is deprecated — all components including models and memory must be instantiated explicitly.

AgentScope v1.0 is also fully asynchronous. Every agent call must be awaited. Code using the old synchronous calling convention will either raise errors or fail silently by producing un-awaited coroutine objects instead of replies.
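To see that failure mode without any framework, here is a minimal sketch in plain Python (`reply` stands in for an agent's async `__call__`; it is not an AgentScope API):

```python
import asyncio

async def reply() -> str:
    # Stand-in for an agent's asynchronous __call__
    return "hello"

result = reply()                    # forgot await: we got a coroutine, not a reply
print(asyncio.iscoroutine(result))  # True — the "silent failure" mode
result.close()

print(asyncio.run(reply()))         # correct: run/await the coroutine — prints "hello"
```

Calling an async function without `await` never executes its body; it just hands back a coroutine object, which is why forgotten awaits can slip past casual testing.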


Implementing Custom Memory Modules for Complex State

For production scenarios, you will need to extend LongTermMemoryBase to create memory backends suited to your infrastructure. Here is a minimal but complete custom SQLite memory module:

import asyncio
import json
import sqlite3
from typing import Any
from agentscope.memory import LongTermMemoryBase


class SQLiteMemory(LongTermMemoryBase):
    """A persistent memory module backed by SQLite.

    Note: sqlite3 calls block the event loop; for write-heavy workloads,
    offload them with asyncio.to_thread.
    """

    def __init__(self, db_path: str = "agent_memory.db") -> None:
        self.db_path = db_path
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._init_db()

    def _init_db(self) -> None:
        self._conn.execute(
            """
            CREATE TABLE IF NOT EXISTS messages (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id TEXT NOT NULL,
                role TEXT NOT NULL,
                content TEXT NOT NULL,
                timestamp REAL NOT NULL
            )
            """
        )
        self._conn.commit()

    async def add(self, message: dict[str, Any], session_id: str = "default") -> None:
        import time
        self._conn.execute(
            "INSERT INTO messages (session_id, role, content, timestamp) VALUES (?, ?, ?, ?)",
            (session_id, message["role"], json.dumps(message["content"]), time.time()),
        )
        self._conn.commit()

    async def get(self, session_id: str = "default", limit: int = 50) -> list[dict]:
        cursor = self._conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        )
        rows = cursor.fetchall()
        return [{"role": r[0], "content": json.loads(r[1])} for r in reversed(rows)]

    async def clear(self, session_id: str = "default") -> None:
        self._conn.execute(
            "DELETE FROM messages WHERE session_id = ?", (session_id,)
        )
        self._conn.commit()


# Usage (models and formatters instantiated explicitly, per the v1.0 style)
from agentscope.agent import ReActAgent
from agentscope.message import Msg
from agentscope.model import DashScopeChatModel
from agentscope.formatter import DashScopeChatFormatter


async def main():
    memory = SQLiteMemory(db_path="friday_memory.db")

    agent = ReActAgent(
        name="Friday",
        sys_prompt="You are a persistent assistant.",
        model=DashScopeChatModel(model_name="qwen-turbo"),
        formatter=DashScopeChatFormatter(),
        memory=memory,
    )

    response = await agent(Msg("user", "What is the capital of France?", "user"))
    print(response.content)

    # Retrieve stored history
    history = await memory.get()
    print(f"Messages stored: {len(history)}")


asyncio.run(main())

This pattern scales well. You can swap the SQLite backend for any other data store — a remote Postgres instance, a Redis stream, or even a vector database for semantic retrieval — by implementing the same add, get, and clear interface.
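One way to make that swap explicit is a small structural type capturing the shared interface. The `MemoryBackend` protocol below is illustrative, not an AgentScope type, and `FakeRedisMemory` is a hypothetical stand-in backend:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MemoryBackend(Protocol):
    """The add/get/clear interface shared by the memory modules in this article."""

    async def add(self, message: dict[str, Any], session_id: str = "default") -> None: ...
    async def get(self, session_id: str = "default", limit: int = 50) -> list[dict]: ...
    async def clear(self, session_id: str = "default") -> None: ...


class FakeRedisMemory:
    """Stand-in backend: any class with the same methods satisfies the protocol."""

    def __init__(self) -> None:
        self._data: dict[str, list[dict]] = {}

    async def add(self, message: dict, session_id: str = "default") -> None:
        self._data.setdefault(session_id, []).append(message)

    async def get(self, session_id: str = "default", limit: int = 50) -> list[dict]:
        return self._data.get(session_id, [])[-limit:]

    async def clear(self, session_id: str = "default") -> None:
        self._data.pop(session_id, None)


print(isinstance(FakeRedisMemory(), MemoryBackend))  # True
```

Note that `runtime_checkable` only verifies method presence, not signatures, so it is a documentation aid rather than a strict contract.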

For deployments where you need agents to scale horizontally, consider making the session ID a user-scoped identifier (e.g., a UUID tied to the user’s login session). This ensures that even if your application spins up a new agent instance per request, the conversation history is recovered from the database.
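One sketch of deriving such an identifier deterministically (`session_id_for` is a hypothetical helper; a real deployment might instead store a random UUID in the login session):

```python
import uuid

def session_id_for(user_id: str) -> str:
    """Derive a stable, user-scoped session id via a name-based UUID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"agent-session:{user_id}"))

# Two separate agent instances (e.g. on different workers) derive the same
# session id for the same user, so both read the same history rows.
print(session_id_for("alice@example.com") == session_id_for("alice@example.com"))  # True
print(session_id_for("alice@example.com") == session_id_for("bob@example.com"))    # False
```

Because `uuid5` is deterministic, no coordination between workers is needed to agree on the session key.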


Strategies for Persistent and Shared Memory in Pipelines

In multi-agent systems, the challenge is not just storing memory but sharing state coherently between agents. AgentScope’s MsgHub is the primary coordination primitive for this purpose.

MsgHub acts as a central broadcast controller. When agents are registered as participants, any message published to the hub is visible to all participants simultaneously. This enables a shared “working memory” pattern where the output of one agent immediately becomes available context for the next.

import asyncio
from agentscope.agent import ReActAgent
from agentscope.memory import InMemoryMemory
from agentscope.message import Msg
from agentscope.model import DashScopeChatModel
from agentscope.formatter import DashScopeChatFormatter
from agentscope.pipeline import MsgHub, sequential_pipeline


async def main():
    shared_memory = InMemoryMemory()

    researcher = ReActAgent(
        name="Researcher",
        sys_prompt="You research and summarize topics concisely.",
        model=DashScopeChatModel(model_name="qwen-turbo"),
        formatter=DashScopeChatFormatter(),
        memory=shared_memory,
    )

    writer = ReActAgent(
        name="Writer",
        sys_prompt="You expand research summaries into detailed explanations.",
        model=DashScopeChatModel(model_name="qwen-turbo"),
        formatter=DashScopeChatFormatter(),
        memory=shared_memory,
    )

    # MsgHub is an async context manager that broadcasts each reply
    # to every registered participant
    async with MsgHub(participants=[researcher, writer]):
        # sequential_pipeline runs agents in order, passing output forward
        result = await sequential_pipeline(
            [researcher, writer],
            Msg("user", "Explain how vector databases enable semantic search.", "user"),
        )

    print(result.content)


asyncio.run(main())

The critical detail here is that both researcher and writer share the same InMemoryMemory instance. When the researcher replies, that message is stored in shared memory, and when the writer is invoked next, it has full access to the researcher’s output as prior context.

For more complex orchestration patterns — including fan-out and fan-in topologies — see Multi-Agent Orchestration Patterns: LangGraph, CrewAI, and AutoGen Compared, which covers how different frameworks handle cross-agent state sharing.

Memory preservation during interrupts deserves special attention. AgentScope supports interrupting an agent mid-reply (for human-in-the-loop workflows), but state integrity depends on your memory module flushing writes before the interrupt is processed. With InMemoryMemory, this is automatic but volatile. With a persistent backend like the SQLite module above, ensure you call commit() before yielding control back to the interrupt handler.


Advanced Techniques for Memory Retrieval and Optimization

As the memory corpus grows, raw retrieval of all messages becomes a bottleneck — both in latency and in token cost when the history is injected into LLM prompts. AgentScope addresses this with the ReMe (Retrieval-enhanced Memory) technique, which enables selective, relevance-scored retrieval instead of full-history injection.

The conceptual pattern for ReMe-style retrieval is to embed memory entries as vectors and retrieve only the top-K most semantically relevant entries for each new query. While AgentScope’s RAG module is currently under rework, you can implement this manually using an embedding model:

import numpy as np


def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.array(a), np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))


class RetrievalMemory:
    """Memory with vector-similarity retrieval over an in-process store."""

    def __init__(self, embed_fn, top_k: int = 5):
        self._store: list[dict] = []
        self._embed_fn = embed_fn
        self._top_k = top_k

    async def add(self, message: dict) -> None:
        embedding = await self._embed_fn(message["content"])
        self._store.append({"message": message, "embedding": embedding})

    async def get_relevant(self, query: str) -> list[dict]:
        if not self._store:
            return []
        query_embedding = await self._embed_fn(query)
        scored = [
            (cosine_similarity(query_embedding, entry["embedding"]), entry["message"])
            for entry in self._store
        ]
        scored.sort(key=lambda x: x[0], reverse=True)
        return [msg for _, msg in scored[: self._top_k]]
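The `embed_fn` passed in can be anything that maps text to a vector. To make the scoring concrete, here is a self-contained toy of the same top-K cosine ranking; `VOCAB` and the bag-of-words `embed` are illustrative stand-ins for a real embedding model or API:

```python
import numpy as np

# Toy bag-of-words embedder — a stand-in for a real embedding model
VOCAB = ["capital", "france", "paris", "python", "programming", "language", "vector"]

def embed(text: str) -> list[float]:
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.array(a), np.array(b)
    denom = float(np.linalg.norm(a_arr) * np.linalg.norm(b_arr))
    return float(np.dot(a_arr, b_arr)) / denom if denom else 0.0

store = [
    "Paris is the capital of France",
    "Python is a programming language",
]
query = "What is the capital of France?"

# Rank stored entries by similarity to the query, most relevant first
scored = sorted(store, key=lambda s: cosine(embed(query), embed(s)), reverse=True)
print(scored[0])  # Paris is the capital of France
```

Swapping the toy `embed` for an embedding-model call gives exactly the retrieval behavior of the `RetrievalMemory` class above.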

This approach connects naturally to Retrieval-Augmented Generation concepts — rather than feeding the entire conversation history as context, you select only the most relevant past exchanges. For a deeper conceptual grounding, see What Is RAG? Retrieval-Augmented Generation Explained.

Memory compression is another essential technique for long-running agents. Instead of retaining raw messages indefinitely, implement a periodic summarization step that replaces older messages with a compressed summary:

async def compress_memory(memory, summarizer_agent: ReActAgent, keep_last: int = 10):
    """Compress older messages into a summary, retaining only the most recent ones.

    `memory` is any backend exposing the add/get/clear interface used by the
    SQLiteMemory module above.
    """
    all_messages = await memory.get()
    if len(all_messages) <= keep_last:
        return  # Nothing to compress

    older_messages = all_messages[:-keep_last]
    recent_messages = all_messages[-keep_last:]

    # Summarize the older portion
    history_text = "\n".join(
        f"{m['role']}: {m['content']}" for m in older_messages
    )
    summary_response = await summarizer_agent(
        Msg("user", f"Summarize this conversation history concisely:\n\n{history_text}", "user")
    )

    # Rebuild memory with summary + recent messages
    await memory.clear()
    await memory.add({"role": "system", "content": f"[Summary of prior context]: {summary_response.content}"})
    for msg in recent_messages:
        await memory.add(msg)
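Deciding *when* to run a compression pass can be a simple budget check. The `should_compress` helper below is hypothetical, with thresholds chosen arbitrarily for illustration:

```python
def should_compress(messages: list[dict], max_messages: int = 50, max_chars: int = 20_000) -> bool:
    """Heuristic trigger: compress when the history is long in count or in size."""
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    return len(messages) > max_messages or total_chars > max_chars

history = [{"role": "user", "content": "x" * 500}] * 10
print(should_compress(history))                  # False: 10 messages, 5000 chars
print(should_compress(history, max_chars=4000))  # True: exceeds the char budget
```

A character budget is a crude proxy for token count; for tighter control, substitute your model's tokenizer.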

For comparison, see how Letta Deployment and Production: Hosting Persistent Agents at Scale approaches similar memory management challenges in a production environment — Letta’s MemGPT-style architecture uses a very similar summarization-based compression strategy.

Environment setup reminder: AgentScope requires Python 3.10 or higher and appropriate API keys for whatever LLM backend you choose:

pip install agentscope
export DASHSCOPE_API_KEY="your-api-key-here"

Frequently Asked Questions

What is the difference between InMemoryMemory and LongTermMemoryBase in AgentScope?

InMemoryMemory stores conversation history in RAM for the duration of the current process. It is fast and zero-configuration but loses all state on restart. LongTermMemoryBase is an abstract interface you extend to create persistent memory backends — backed by SQLite, a remote database, or any other store. Use InMemoryMemory for prototyping or stateless deployments, and a LongTermMemoryBase subclass for any production system that needs memory to survive restarts.

Why does AgentScope require async/await for all agent calls?

AgentScope v1.0 was redesigned around asynchronous execution to support non-blocking I/O — particularly important when agents make multiple LLM API calls or tool invocations concurrently. Synchronous code from older AgentScope versions must be migrated to use async/await patterns. All agent invocations return coroutines and must be awaited within an async def function, typically run via asyncio.run().

How do I share memory between multiple agents in a pipeline?

Pass the same memory instance to multiple agents during construction. When used together with MsgHub and sequential_pipeline, messages from each agent are stored in the shared memory and become available as prior context for subsequent agents. This pattern works for both InMemoryMemory and custom persistent backends, as long as the backend is thread-safe if agents run concurrently.
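As a sketch of that thread-safety caveat, a thin async-lock wrapper (illustrative, not an AgentScope API) can serialize access from concurrently running agents; `ListMemory` is a minimal stand-in backend:

```python
import asyncio


class LockedMemory:
    """Serialize access to an underlying memory backend (illustrative wrapper)."""

    def __init__(self, inner):
        self._inner = inner
        self._lock = asyncio.Lock()

    async def add(self, message: dict) -> None:
        async with self._lock:
            await self._inner.add(message)

    async def get(self) -> list[dict]:
        async with self._lock:
            return await self._inner.get()


class ListMemory:
    """Minimal list-backed backend used only to demonstrate the wrapper."""

    def __init__(self) -> None:
        self.items: list[dict] = []

    async def add(self, message: dict) -> None:
        self.items.append(message)

    async def get(self) -> list[dict]:
        return list(self.items)


async def demo() -> list[dict]:
    mem = LockedMemory(ListMemory())
    # Ten concurrent writers, each guarded by the lock
    await asyncio.gather(*(mem.add({"role": "user", "content": str(i)}) for i in range(10)))
    return await mem.get()

print(len(asyncio.run(demo())))  # 10
```

Within a single event loop an `asyncio.Lock` suffices; backends shared across threads or processes need locking at the store itself (e.g. database transactions).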

How can I prevent memory from growing too large and increasing LLM token costs?

Implement a memory compression strategy: periodically summarize older conversation history into a compact summary message, then replace the original messages with that summary. Retain only the most recent N messages in raw form. Alternatively, use a retrieval-based approach where you embed memory entries and only inject the top-K semantically relevant ones into each prompt, rather than the full history.

What happens to memory if an agent is interrupted mid-reply?

With InMemoryMemory, an interrupted reply may not be committed to memory depending on when the interrupt fires relative to the message write. With a persistent backend, ensure your add() method flushes writes (e.g., calls commit() for SQL backends) before yielding. AgentScope’s human-in-the-loop interrupt support is compatible with both memory types, but correct behavior requires that your memory module handles partial state gracefully — either by committing atomically or by implementing a transactional rollback on interrupt.
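For SQL backends, the atomic-commit option can be sketched with `sqlite3`'s connection context manager, which commits on success and rolls back if an exception (here, a simulated interrupt) fires mid-write; the schema and `atomic_add` helper are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT)")


def atomic_add(conn, session_id: str, role: str, content: str, fail: bool = False) -> None:
    # `with conn` wraps the statements in a transaction:
    # commit on success, rollback on any exception
    with conn:
        conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?)", (session_id, role, content)
        )
        if fail:  # simulate an interrupt firing before the write is committed
            raise KeyboardInterrupt


atomic_add(conn, "s1", "user", "hello")
try:
    atomic_add(conn, "s1", "assistant", "partial reply...", fail=True)
except KeyboardInterrupt:
    pass

count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # 1 — the interrupted write was rolled back, the completed one kept
```

Either all statements in the block land or none do, which is exactly the "committing atomically" behavior the answer above calls for.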

Related Articles