FAQ

Find answers to commonly asked questions about Memory Service.

General

What is Memory Service?

Memory Service is a backend service that provides persistent memory for AI agents. It stores conversation history, enables semantic search across conversations, and supports advanced features like conversation forking.

Is Memory Service production-ready?

Not yet. Memory Service is in active development and has not reached its first stable release. APIs may change without notice, so production use is not recommended at this time.

What databases are supported?

Memory Service supports:

  • PostgreSQL (recommended) - with pgvector for semantic search
  • SQLite - lightweight, single-file option for local development and small deployments
  • MongoDB - with full-text search support

What vector stores are supported?

For semantic search capabilities:

  • pgvector - PostgreSQL extension for in-database vector search
  • sqlite-vec - SQLite extension for local vector search
  • Qdrant - dedicated vector database via native gRPC integration

What caches are supported?

Memory Service supports:

  • Local - process-local cache for single-instance deployments
  • Redis - distributed in-memory cache
  • Infinispan - distributed in-memory data grid

Installation & Setup

How do I install Memory Service?

See the Getting Started guide for detailed instructions.

Do I need to run a separate service?

Yes — Memory Service runs as a standalone process alongside your agent application. For Quarkus users, the Dev Services extension can automatically start it in a Docker container during development. For other frameworks, see the Getting Started guide.

Can I run Memory Service without binding a TCP port?

Yes. The Go server supports Unix domain sockets for local single-machine deployments:

memory-service serve \
  --db-url=postgresql://postgres:postgres@localhost:5432/memoryservice \
  --unix-socket="$HOME/.local/run/memory-service/api.sock"

Then access it with tools that support Unix sockets:

curl --unix-socket "$HOME/.local/run/memory-service/api.sock" \
  http://localhost/ready

This mode is intended for local agent processes. Browser apps cannot connect directly to a Unix socket.

Features

What is conversation forking?

Conversation forking allows you to create a new conversation branch from any point in an existing conversation. This is useful for:

  • Correcting or enriching an earlier message - fork the conversation at that message, add the new context, and continue from that point instead of starting a fresh conversation
  • Exploring alternative conversation paths
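The mechanics can be sketched with a toy in-memory model (illustrative only; the real service exposes this over HTTP, and all names here are hypothetical):

```python
# Toy in-memory model of conversation forking. A fork records its parent and
# the entry index at which it branched; reading a fork merges ancestor entries
# up to each fork point with the fork's own entries.
conversations = {
    "conv-1": {"parent": None, "fork_at": None,
               "entries": ["Hi", "What is Go?", "Go is a language."]},
}

def fork(conv_id, at_index, new_id):
    """Create a new branch starting after entry `at_index` of `conv_id`."""
    conversations[new_id] = {"parent": conv_id, "fork_at": at_index, "entries": []}

def read(conv_id):
    """Return the full history visible from `conv_id`, following ancestry."""
    segments = []
    cur_id, cut = conv_id, None  # cut = last parent entry included by the child
    while cur_id is not None:
        conv = conversations[cur_id]
        entries = conv["entries"] if cut is None else conv["entries"][:cut + 1]
        segments.append(entries)
        cut, cur_id = conv["fork_at"], conv["parent"]
    return [entry for segment in reversed(segments) for entry in segment]
```

For example, forking "conv-1" at entry index 1 and appending a new message yields a branch that shares the first two entries but diverges afterward, while the original conversation is untouched.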

Can old context epochs be safely deleted without breaking forks?

Not currently. Fork safety requires preserving fork ancestry, so per-entry and per-epoch eviction are not supported. Instead, Memory Service keeps full history and only allows deletion at the fork-tree level: deleting any conversation in a tree deletes the whole tree.

What happens if someone deletes a message that was a fork point?

Deleting individual messages/entries is not supported. Deletion happens at the conversation level, and deleting a conversation deletes the entire fork tree (the root plus all forks).

Is there a hard limit on fork depth?

No hard depth limit is enforced today. Fork ancestry is built iteratively rather than recursively, so deep chains do not hit recursion limits, but very large trees still increase the amount of data to load and filter.

Can clients see message provenance across parent and child forks?

Yes. Entries include conversationId, and you can call /v1/conversations/{conversationId}/forks to get fork metadata (forkedAtConversationId, forkedAtEntryId) and reconstruct/visualize the tree.
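Given fork metadata of the shape described above, a client can rebuild and render the tree. A minimal sketch (the forkedAtConversationId/forkedAtEntryId fields come from the API above; the sample records and helper names are hypothetical):

```python
# Rebuild a fork tree from fork metadata records. Each record describes one
# forked conversation and the parent conversation/entry it branched from.
forks = [
    {"conversationId": "conv-2", "forkedAtConversationId": "conv-1", "forkedAtEntryId": "e-5"},
    {"conversationId": "conv-3", "forkedAtConversationId": "conv-1", "forkedAtEntryId": "e-9"},
    {"conversationId": "conv-4", "forkedAtConversationId": "conv-2", "forkedAtEntryId": "e-12"},
]

def build_tree(fork_records):
    """Map each parent conversation id to its (child id, fork entry id) pairs."""
    children = {}
    for record in fork_records:
        children.setdefault(record["forkedAtConversationId"], []).append(
            (record["conversationId"], record["forkedAtEntryId"]))
    return children

def render(children, node, depth=0):
    """Return an indented text rendering of the fork tree, root first."""
    lines = ["  " * depth + node]
    for child, _entry in children.get(node, []):
        lines += render(children, child, depth + 1)
    return lines
```

The same structure can drive a graphical visualization; the fork entry ids tell you exactly where each branch attaches to its parent.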

How does semantic search work?

Memory Service uses vector embeddings to enable semantic search across all stored conversations. When you store a message, it’s automatically embedded using your configured embedding model. You can then search for semantically similar content across all conversations.
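The underlying idea can be illustrated with a tiny, dependency-free sketch. Real deployments use a configured embedding model and a vector store such as pgvector; the hand-made three-dimensional vectors below are purely illustrative:

```python
import math

# Toy illustration of embedding-based semantic search: each stored message has
# a vector; a query is embedded the same way and results are ranked by cosine
# similarity. Real embeddings come from a model and have many more dimensions.
store = {
    "The cat sat on the mat": [0.9, 0.1, 0.0],
    "Kittens love sleeping":  [0.8, 0.2, 0.1],
    "Quarterly revenue grew": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, k=2):
    """Return the k stored messages most similar to the query vector."""
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _vec in ranked[:k]]
```

A query vector close to the "cat" region ranks the two cat-related messages above the finance one, which is the behavior semantic search gives you across stored conversations.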

Can multiple agents share conversations?

Yes. Multiple agents can access the same conversation history, and agent context is isolated per agent client identity (API key/client id).

Can multiple agents overwrite each other’s summaries?

No. Memory channel writes are isolated per agent client identity. In addition, conversation entries are append-only (there is no entry-update API), so agents append new entries rather than overwriting existing ones.

How big can a single entry be?

There is no strict service-level byte-size limit on content today, but practical datastore/request limits still apply (for example, MongoDB's 16 MB document limit). API-level limits currently include:

  • content max items: 1000
  • indexedContent max length: 100000

Large single entries can hurt performance; stricter hard size limits are expected once performance-tuning work is complete.
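A client-side pre-flight check against the documented limits might look like this (sketch; the constants mirror the limits above, the function name is hypothetical):

```python
MAX_CONTENT_ITEMS = 1000       # API limit: content max items
MAX_INDEXED_CONTENT = 100_000  # API limit: indexedContent max length

def validate_entry(content_items, indexed_content):
    """Raise ValueError before sending an entry that would exceed API limits."""
    if len(content_items) > MAX_CONTENT_ITEMS:
        raise ValueError(
            f"content has {len(content_items)} items; max is {MAX_CONTENT_ITEMS}")
    if len(indexed_content) > MAX_INDEXED_CONTENT:
        raise ValueError(
            f"indexedContent is {len(indexed_content)} chars; max is {MAX_INDEXED_CONTENT}")
```

Validating before the request gives a clearer error than waiting for the service to reject the write.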

Can massive entries be partially retrieved?

No. Entries are returned as whole objects. If your payload is large, split it into multiple entries at write time.
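A simple write-time splitter could look like this (illustrative sketch; the chunk size and metadata fields are application choices, not part of the service API):

```python
def split_payload(text, max_chars=10_000):
    """Split a large payload into ordered chunks, each stored as its own entry.

    max_chars is an application-chosen chunk size, not a service limit. The
    seq/total fields let a reader reassemble the chunks in order later.
    """
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    return [{"seq": i, "total": len(chunks), "content": chunk}
            for i, chunk in enumerate(chunks)]
```

Because entries are retrieved whole, splitting at write time is the only way to avoid pulling an oversized payload back in one piece.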

Is custom content schema validated by the service?

Only minimal structural checks are enforced. The agent app owns content schema design and validation; there is no built-in schema registry for arbitrary custom entry payloads.

How is sensitive data leakage handled for indexed content?

indexedContent is intentionally stored in cleartext so it can be searched. The service does not run built-in DLP or redaction for you, so agent apps should redact or sanitize content before indexing.

Can search highlights expose raw indexed text?

Yes. Search results can include highlights, so if sensitive text is indexed, snippets may surface it. Redact before indexing.
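A minimal client-side redaction pass before building indexedContent could look like this (sketch; the patterns are deliberately simple examples, not a substitute for a real DLP tool):

```python
import re

# Illustrative pre-indexing redaction: mask common sensitive patterns before
# the text is handed to the service as indexedContent. These regexes are
# simplistic on purpose; production systems need proper DLP tooling.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN shape
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email address
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),        # card-like digits
]

def redact(text):
    """Return text with sensitive-looking spans replaced by placeholders."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running such a pass before every write ensures neither search results nor highlights can surface the raw values.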

How is long-term storage growth controlled?

There is no automatic old-epoch eviction in the current model, because that breaks fork guarantees. Growth is currently controlled operationally (for example, deleting conversation trees, applying app-level retention/archival policies, and planned usage metering/quotas).
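An app-level retention sweep might be sketched like this (illustrative; the data shape and function name are hypothetical, and deleting a returned root would remove its entire fork tree, matching the service's deletion semantics):

```python
from datetime import datetime, timedelta, timezone

# Illustrative app-level retention policy: find root conversations whose last
# activity is older than the retention window, so the application can delete
# them (which removes the whole fork tree) or archive them elsewhere first.
RETENTION = timedelta(days=90)

def expired_roots(conversations, now=None):
    """Return ids of root conversations untouched within the retention window."""
    now = now or datetime.now(timezone.utc)
    return [c["id"] for c in conversations
            if c["parent"] is None and now - c["lastActivity"] > RETENTION]
```

Only roots are considered because deletion is per fork tree; forks expire together with their root.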

Operations and Integration

If a conversation is deleted in the background, will connected clients get notified?

No. Memory Service currently has no change-notification/WebSocket event API.

Can admins impersonate users for debugging?

No. Built-in user impersonation (“act as user”) is not currently supported.

Does repeated summarization cause compounding errors?

Potentially yes. Summary-of-summary drift is a known LLM behavior; Memory Service stores what the agent writes and does not correct semantic drift automatically.

Are Spring Boot users permanently blocked from rich events?

No. Rich event support depends on the agent framework producing those events (for example, history/lc4j today with LangChain4j). Spring-based agents can support rich events once their framework stack can emit event payloads.

Contributing

How can I contribute?

We welcome contributions! Check out the GitHub repository for:

  • Open issues
  • Pull requests

Where do I report bugs?

Please report bugs on GitHub Issues.

How do I request a feature?

Open an issue on GitHub Issues.