Stateful AI agent memory layer for PostgreSQL with pgvector. TypeScript-first with intelligent context management and semantic search.
❌ Without pg-agent-memory
// Day 1: User shares preference
await openai.chat.completions.create({
messages: [{ role: 'user', content: 'I prefer Python' }],
});
// AI: "Got it! I'll remember that."
// Day 2: User asks for help
await openai.chat.completions.create({
messages: [{ role: 'user', content: 'Help me start a project' }],
});
// AI: "What language would you like to use?"
// 😤 Forgot everything!
✅ With pg-agent-memory
// Day 1: Store preference
await memory.remember({
conversation: userId,
content: 'User prefers Python',
role: 'system',
});
// Day 2: Retrieve context
const context = await memory.getHistory(userId);
await openai.chat.completions.create({
messages: [...context, { role: 'user', content: 'Help me start a project' }],
});
// AI: "I'll create a Python project for you!"
// 🎯 Remembers everything!
- Persistent Memory - Conversations continue across sessions
- Multi-Model Support - OpenAI, Anthropic, DeepSeek, Google, Meta + custom providers
- Local Embeddings - Zero API costs for vector embeddings with Sentence Transformers
- Memory Compression - Automatic summarization with 4 compression strategies
- Semantic Search - Find relevant memories using AI embeddings
- Universal Tokenizer - Accurate token counting based on official provider documentation
- TypeScript First - Full type safety with autocomplete
- PostgreSQL Native - Uses your existing database
- Zero-Cost Embeddings - Local Sentence Transformers (@xenova/transformers)
- High Performance - ~9ms memory operations, ~5ms vector search
- Multi-agent memory sharing
- Memory graph visualization
- Pattern detection
# Install
npm install pg-agent-memory
# Start PostgreSQL with Docker (includes pgvector)
docker compose up -d
# Run example
npm start
npm install pg-agent-memory
Option 1: Docker (Recommended)
# Use included docker compose configuration
docker compose up -d
Option 2: Existing PostgreSQL
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
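If you want to confirm the extension is available before wiring up the library, a quick check with node-postgres works (a minimal sketch; `pg` is not a pg-agent-memory dependency, and the query just reads the PostgreSQL extension catalog):
import { Client } from 'pg';

const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();
const { rows } = await client.query(
  "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
);
console.log(rows[0] ? `pgvector ${rows[0].extversion} is installed` : 'pgvector is missing');
await client.end();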
import { AgentMemory } from 'pg-agent-memory';
// Option 1: Static factory method (recommended)
const memory = await AgentMemory.create({
agent: 'my-assistant',
connectionString: 'postgresql://user:pass@localhost:5432/db',
});
// Option 2: Traditional constructor + initialize
const memory = new AgentMemory({
agent: 'my-assistant',
connectionString: 'postgresql://user:pass@localhost:5432/db',
});
await memory.initialize();
// Store conversation memory
const memoryId = await memory.remember({
conversation: 'user-123',
content: 'User prefers email notifications',
role: 'user',
importance: 0.8,
timestamp: new Date(),
});
// Find related memories using vector similarity
const related = await memory.findRelatedMemories(memoryId, 5);
// Semantic search across memories
const relevant = await memory.searchMemories('notification preferences');
// Get relevant context for a query
const context = await memory.getRelevantContext(
'user-123',
'user communication preferences',
1000 // max tokens
);
console.log(`Found ${context.messages.length} relevant memories`);
console.log(`Relevance score: ${context.relevanceScore}`);
// Health check for monitoring
const health = await memory.healthCheck();
console.log(`Status: ${health.status}, Memories: ${health.details.memoryCount}`);
// Cleanup
await memory.disconnect();
Configure multiple AI providers with accurate token counting and prompt caching:
const memory = new AgentMemory({
agent: 'multi-model-bot',
connectionString,
modelProviders: [
{
name: 'gpt-4o',
provider: 'openai',
model: 'gpt-4o',
tokenLimits: { context: 128000, output: 4000 },
// High context limit: Supports large conversations
},
{
name: 'claude-sonnet',
provider: 'anthropic',
model: 'claude-sonnet-4-20250514',
tokenLimits: { context: 200000, output: 4000 },
// Large context window: Perfect for long conversations
},
{
name: 'deepseek-coder',
provider: 'deepseek',
model: 'deepseek-coder',
tokenLimits: { context: 32000, output: 4000 },
// Cost-effective option for coding tasks
},
{
name: 'gemini-pro',
provider: 'google',
model: 'gemini-1.5-pro',
tokenLimits: { context: 1048576, output: 8192 },
// Massive context window: Handle very long conversations
},
{
name: 'llama-3',
provider: 'meta',
model: 'llama-3.1-70b',
tokenLimits: { context: 128000, output: 4000 },
// More efficient tokenizer: ~44% fewer tokens than OpenAI
},
],
defaultProvider: 'gpt-4o',
tokenCountingStrategy: 'hybrid', // 'precise', 'fast', or 'hybrid'
});
// Token counting uses official provider documentation:
// - OpenAI: ~4 chars/token or 0.75 words/token (official baseline)
// - Anthropic: ~3.5 chars/token (14% more tokens)
// - DeepSeek: ~3.3 chars/token (20% more tokens)
// - Google: ~4 chars/token (same as OpenAI)
// - Meta/Llama: ~0.75 tokens/word (44% fewer tokens - more efficient)
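As a rough illustration of how these ratios turn into estimates (a hypothetical sketch, not the library's internal tokenizer):
// Hypothetical sketch of the character/word heuristics listed above;
// pg-agent-memory's actual tokenizer may differ in detail.
const CHARS_PER_TOKEN: Record<string, number> = {
  openai: 4,
  anthropic: 3.5,
  deepseek: 3.3,
  google: 4,
};

function estimateTokens(text: string, provider: string): number {
  if (provider === 'meta') {
    // Llama's heuristic is word-based: ~0.75 tokens per word
    return Math.ceil(text.trim().split(/\s+/).length * 0.75);
  }
  return Math.ceil(text.length / (CHARS_PER_TOKEN[provider] ?? 4));
}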
await memory.remember({
conversation: userId,
content: longText,
provider: 'gpt-4o', // Use OpenAI for this memory
});
Automatic memory compression with multiple strategies:
// Enable compression for large conversations
const compressionResult = await memory.compressMemories({
strategy: 'hybrid', // 'token_based', 'time_based', 'importance_based', 'hybrid'
maxAge: '7d',
targetCompressionRatio: 0.6,
});
console.log(`Compressed ${compressionResult.memoriesCompressed} memories`);
console.log(`Token savings: ${compressionResult.tokensSaved}`);
// Get context with automatic compression
const context = await memory.getRelevantContextWithCompression(
'coding preferences',
4000 // Automatically compresses if needed
);
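In a long-lived process, one option is to run compression on a schedule (a sketch using a plain interval; the cadence and thresholds here are illustrative, not library defaults):
// Illustrative only: compress memories older than 30 days, once a day.
setInterval(async () => {
  const result = await memory.compressMemories({
    strategy: 'time_based',
    maxAge: '30d',
    targetCompressionRatio: 0.5,
  });
  if (result.memoriesCompressed > 0) {
    console.log(`Nightly compression saved ${result.tokensSaved} tokens`);
  }
}, 24 * 60 * 60 * 1000);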
import OpenAI from 'openai';
import { AgentMemory } from 'pg-agent-memory';
const client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
const memory = await AgentMemory.create({ agent: 'support-bot', connectionString });
// Retrieve conversation history
const history = await memory.getHistory(userId);
// Include memory in AI request
const completion = await client.chat.completions.create({
model: 'gpt-4o',
messages: [
...history.map(m => ({ role: m.role, content: m.content })),
{ role: 'user', content: userMessage },
],
});
// Store the interaction
await memory.remember({
conversation: userId,
content: userMessage,
role: 'user',
importance: 0.5,
timestamp: new Date(),
});
await memory.remember({
conversation: userId,
content: completion.choices[0].message.content,
role: 'assistant',
importance: 0.7,
timestamp: new Date(),
});
import Anthropic from '@anthropic-ai/sdk';
import { AgentMemory } from 'pg-agent-memory';
const client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
const memory = await AgentMemory.create({
agent: 'claude-assistant',
connectionString,
modelProviders: [
{
name: 'claude',
provider: 'anthropic',
model: 'claude-sonnet-4-20250514',
tokenLimits: { context: 200000, output: 4000 },
},
],
});
// Get conversation with compression for large contexts
const context = await memory.getRelevantContextWithCompression(
'user preferences and history',
180000 // Near Claude's context limit
);
const message = await client.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 1024,
messages: [
...context.messages.map(m => ({ role: m.role, content: m.content })),
{ role: 'user', content: userMessage },
],
});
// Store the interaction
await memory.remember({
conversation: userId,
content: message.content[0].text,
role: 'assistant',
importance: 0.7,
timestamp: new Date(),
});
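One caveat: Anthropic's Messages API only accepts 'user' and 'assistant' roles inside messages, so system-role memories should be routed into the top-level system parameter instead. A sketch of that filtering, replacing the messages.create call above:
// Split system-role memories out of the retrieved context, since the
// Anthropic Messages API rejects 'system' entries in `messages`.
const systemNotes = context.messages
  .filter(m => m.role === 'system')
  .map(m => m.content)
  .join('\n');

const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  system: systemNotes || undefined,
  messages: [
    ...context.messages
      .filter(m => m.role !== 'system')
      .map(m => ({ role: m.role as 'user' | 'assistant', content: m.content })),
    { role: 'user', content: userMessage },
  ],
});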
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { AgentMemory } from 'pg-agent-memory';
const memory = await AgentMemory.create({ agent: 'chat-assistant', connectionString });
export async function POST(req: Request) {
const { messages, userId } = await req.json();
// Get user's conversation history
const history = await memory.getHistory(userId);
const result = streamText({
model: openai('gpt-4o'),
system: 'You are a helpful assistant with memory.',
messages: [...history.map(m => ({ role: m.role, content: m.content })), ...messages],
});
// Store the conversation
for (const message of messages) {
await memory.remember({
conversation: userId,
role: message.role,
content: message.content,
importance: 0.5,
timestamp: new Date(),
});
}
return result.toDataStreamResponse();
}
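The route above persists only the incoming user messages. If you also want to store the assistant's reply, the AI SDK's onFinish callback is one place to hook in (a sketch assuming the same route scope as above):
// Sketch: replace the streamText call above to also persist the reply.
const result = streamText({
  model: openai('gpt-4o'),
  system: 'You are a helpful assistant with memory.',
  messages: [...history.map(m => ({ role: m.role, content: m.content })), ...messages],
  onFinish: async ({ text }) => {
    await memory.remember({
      conversation: userId,
      content: text,
      role: 'assistant',
      importance: 0.7,
      timestamp: new Date(),
    });
  },
});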
Recommended: Creates and initializes an AgentMemory instance in one call.
const memory = await AgentMemory.create({
agent: 'my-bot',
connectionString: 'postgresql://...',
});
new AgentMemory(config: MemoryConfig)
MemoryConfig:
- agent: string - Unique agent identifier
- connectionString: string - PostgreSQL connection string
- tablePrefix?: string - Table prefix (default: 'agent')
- maxTokens?: number - Max tokens per memory (default: 4000)
- embeddingDimensions?: number - Vector dimensions (default: 384)
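Putting the optional fields together (values shown are the documented defaults; the table-naming comment is an assumption based on the prefix option):
const memory = await AgentMemory.create({
  agent: 'my-bot',
  connectionString: process.env.DATABASE_URL!,
  tablePrefix: 'agent',      // presumably prefixes the library's table names
  maxTokens: 4000,           // max tokens per memory
  embeddingDimensions: 384,  // must match the embedding model's output size
});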
- initialize() - Initialize the database schema and embedding model (handled automatically by AgentMemory.create()).
- remember(message) - Store a conversation memory. Returns the memory ID.
- findRelatedMemories(memoryId, limit) - Find memories related to a specific memory using vector similarity.
const related = await memory.findRelatedMemories(memoryId, 5);
- healthCheck() - Check system health for monitoring and debugging.
const health = await memory.healthCheck();
// Returns: { status: 'healthy' | 'unhealthy', details: {...} }
Message:
{
conversation: string; // Conversation ID
content: string; // Memory content
role?: 'user' | 'assistant' | 'system'; // Defaults to 'user'
importance?: number; // 0-1 relevance score, defaults to 0.5
timestamp?: Date; // Defaults to new Date()
id?: string; // Optional memory ID
metadata?: Record<string, unknown>;
embedding?: number[]; // Optional vector embedding
expires?: Date | string; // Expiration (e.g., '30d', '1h')
}
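For example, a memory carrying metadata and an automatic expiration (fields taken from the type above):
await memory.remember({
  conversation: 'user-123',
  content: 'User is traveling this week',
  role: 'system',
  importance: 0.4,
  timestamp: new Date(),
  metadata: { category: 'status' },
  expires: '7d', // dropped automatically after a week
});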
- recall(filter) - Retrieve memories with filtering.
- getHistory(conversationId) - Get chronological conversation history.
- getRelevantContext(conversationId, query, maxTokens) - Find semantically relevant memories for a query.
- searchMemories(query, filters?) - Semantic search across all agent memories.
- Deletion methods for removing a specific memory or an entire conversation.
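Usage of the retrieval methods follows the same patterns shown elsewhere in this README:
// Chronological history for one conversation
const history = await memory.getHistory('user-123');

// Filtered recall (see Advanced Search below for richer filters)
const recent = await memory.recall({ conversation: 'user-123', limit: 20 });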
- PostgreSQL 12+
- pgvector extension
- Memory operations: ~9ms average (range: 5-22ms, first operation slower due to model loading)
- Vector search: ~5ms average for semantic similarity search using pgvector
- Token counting: sub-millisecond (<1ms) for all text sizes
- Embedding generation: Local processing, no API calls required
- Model size: ~80-90MB (all-MiniLM-L6-v2, cached after first download)
- Architecture: Built for production scale with proper indexing and connection pooling
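`npm run benchmark` reproduces these numbers; a minimal spot-check looks like this (timings vary with hardware, and the first call includes model loading):
const t0 = performance.now();
await memory.remember({
  conversation: 'bench',
  content: 'hello world',
  role: 'user',
  importance: 0.5,
  timestamp: new Date(),
});
console.log(`remember(): ${(performance.now() - t0).toFixed(1)}ms`);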
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   AgentMemory   │────│ EmbeddingService │────│ @xenova/trans.. │
│                 │    │                  │    │  (Local Model)  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │
         ▼
┌─────────────────┐    ┌──────────────────┐
│   PostgreSQL    │────│     pgvector     │
│   (Memories)    │    │ (Vector Search)  │
└─────────────────┘    └──────────────────┘
Components:
- AgentMemory: Main API for memory operations
- EmbeddingService: Local text-to-vector conversion using @xenova/transformers
- PostgreSQL: Persistent storage with ACID properties
- pgvector: Efficient vector similarity search
- @xenova/transformers: Local Sentence Transformers model (all-MiniLM-L6-v2)
import { AgentMemory } from 'pg-agent-memory';
class ChatBot {
private memory: AgentMemory;
constructor() {
this.memory = new AgentMemory({
agent: 'chatbot',
connectionString: process.env.DATABASE_URL!,
});
}
async init() {
// Required once before use when not constructed via AgentMemory.create()
await this.memory.initialize();
}
async processMessage(userId: string, message: string) {
// Store user message
await this.memory.remember({
conversation: userId,
content: message,
role: 'user',
importance: 0.5,
timestamp: new Date(),
});
// Get relevant context
const context = await this.memory.getRelevantContext(userId, message, 800);
// Generate response using context
const response = await this.generateResponse(message, context);
// Store bot response
await this.memory.remember({
conversation: userId,
content: response,
role: 'assistant',
importance: 0.7,
timestamp: new Date(),
});
return response;
}
}
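Usage (the init() step matches the constructor-plus-initialize pattern shown earlier; the reply depends on your generateResponse implementation):
const bot = new ChatBot();
await bot.init();
const reply = await bot.processMessage('user-123', 'Help me start a project');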
// Search with filters
const memories = await memory.searchMemories('user preferences', {
importance: { min: 0.7 },
dateRange: {
start: new Date('2024-01-01'),
end: new Date('2024-12-31'),
},
metadata: { category: 'user_settings' },
limit: 10,
});
// Get memories by role
const userMessages = await memory.recall({
conversation: 'user-123',
role: 'user',
limit: 50,
});
# Set database URL
export DATABASE_URL="postgresql://user:pass@localhost:5432/dbname"
# Run basic example
npm run example:basic
# Run chatbot example
npm run example:chatbot
# Run all examples
npm run example:all
# Clone repository
git clone <repository>
cd pg-agent-memory
npm install
# Start PostgreSQL with pgvector
npm run dev:up
# Copy environment variables
cp .env.example .env
# Run examples
npm run example:basic
# Run tests
npm run test:docker
# Start development database
npm run dev:up
# Stop database (data persists)
npm run dev:down
# View database logs
npm run dev:logs
# Clean everything (including data)
npm run dev:clean
# Connect to PostgreSQL shell
bash scripts/docker-dev.sh shell
If you prefer using your own PostgreSQL:
-- Install pgvector extension (run in psql)
CREATE EXTENSION IF NOT EXISTS vector;
# Set connection string
export DATABASE_URL="postgresql://user:pass@localhost:5432/dbname"
# Run tests
npm test
npm run test:integration
# Unit tests (no database needed)
npm test
# Integration tests with Docker
npm run test:docker
# Integration tests with custom database
export DATABASE_URL="postgresql://user:pass@localhost:5432/test_db"
npm run test:integration
# All tests
npm run test:all
# Code quality checks
npm run lint # ESLint + Prettier
npm run type-check # TypeScript compilation
npm run validate # Full validation (lint + type-check + tests)
# Performance benchmarks
npm run benchmark # Verify performance claims
MIT © Alex Potapenko
- Fork the repository
- Create a feature branch
- Run tests:
npm test
- Submit a pull request