Cloudflare Backup Sync Setup Guide

Overview

This guide shows how to set up automated daily backups of your local MCP Memory Service database to Cloudflare D1 for global access and disaster recovery.

Architecture: Keep your local SQLite-vec service as primary (fast) and sync to Cloudflare D1 as backup (global access).

Prerequisites

  1. Cloudflare Account with Workers/D1 access
  2. Local MCP Memory Service running with SQLite-vec backend
  3. Python 3 with the requests library installed (run the sync script outside a uv environment to avoid unnecessary dependency downloads)

Step 1: Create Cloudflare Resources

Option A: Using Cloudflare Dashboard (Recommended)

  1. Create D1 Database:

    • Go to Cloudflare Dashboard → D1
    • Click "Create Database"
    • Name: your-project-memory-backup
    • Copy the Database ID from the overview page
  2. Create Vectorize Index (Optional - for semantic search):

    • Vectorize indexes are easiest to create with Wrangler; use the command in Option B below (768 dimensions, cosine metric)
  3. Get Account ID:

    • Your Account ID appears in the dashboard URL and on the account overview page; copy it for the configuration in Step 3

Option B: Using Wrangler CLI

# Install Wrangler
npm install -g wrangler
wrangler login

# Create resources
wrangler d1 create your-project-memory-backup
wrangler vectorize create your-project-memory-backup --dimensions=768 --metric=cosine
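
If you need the Database ID later (for example, after creating the database with Wrangler), you can also list your D1 databases through the same REST API the sync script uses. A minimal sketch, assuming CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are exported in your shell; the uuid field is how the API reports the Database ID at the time of writing:

#!/usr/bin/env python3
"""List D1 databases to find the Database ID (sketch)."""
import os
import requests

token = os.environ['CLOUDFLARE_API_TOKEN']        # export these first
account_id = os.environ['CLOUDFLARE_ACCOUNT_ID']

resp = requests.get(
    f'https://api.cloudflare.com/client/v4/accounts/{account_id}/d1/database',
    headers={'Authorization': f'Bearer {token}'}
)
resp.raise_for_status()

for db in resp.json().get('result', []):
    # Each entry carries the database name and its ID ("uuid")
    print(db.get('name'), db.get('uuid'))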

Step 2: Create API Token

  1. Go to Cloudflare Dashboard → My Profile → API Tokens
  2. Click "Create Token" → "Custom Token"
  3. Configure permissions:
    • Account: Read
    • D1: Edit
    • Vectorize: Edit (if using)
    • Workers AI: Read (if using)
  4. Save the token securely (you can verify it with the snippet below)
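
Before wiring the token into the sync setup, it is worth confirming that it authenticates at all. A minimal check against Cloudflare's token verification endpoint, assuming the token is exported as CLOUDFLARE_API_TOKEN:

#!/usr/bin/env python3
"""Sanity-check that the API token authenticates."""
import os
import requests

token = os.environ['CLOUDFLARE_API_TOKEN']  # export this first

resp = requests.get(
    'https://api.cloudflare.com/client/v4/user/tokens/verify',
    headers={'Authorization': f'Bearer {token}'}
)
# A valid token returns HTTP 200 with "success": true
print(resp.status_code, resp.json().get('success'))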

Step 3: Configure Environment

Create .env.cloudflare in your MCP Memory Service directory:

# Replace with your actual values
CLOUDFLARE_API_TOKEN=your-api-token-here
CLOUDFLARE_ACCOUNT_ID=your-account-id-here  
CLOUDFLARE_D1_DATABASE_ID=your-d1-database-id-here
CLOUDFLARE_VECTORIZE_INDEX=your-vectorize-index-name

Security Note: Keep this file secure and never commit it to version control.

Step 4: Create Sync Script

Create the sync script sync-cloudflare-direct.py:

#!/usr/bin/env python3
"""Direct sync to Cloudflare D1 without heavy dependencies."""

import sqlite3
import json
import requests
import time
import argparse
import os
from datetime import datetime

def export_from_sqlite(db_path):
    """Export memories from local SQLite database."""
    memories = []
    
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        
        cursor.execute("""
            SELECT content, content_hash, tags, memory_type, metadata, 
                   created_at, created_at_iso, updated_at, updated_at_iso
            FROM memories 
            ORDER BY created_at DESC
        """)
        
        rows = cursor.fetchall()
        print(f"Found {len(rows)} memories to export")
        
        for row in rows:
            try:
                tags = json.loads(row[2]) if row[2] else []
            except (json.JSONDecodeError, TypeError):
                tags = [row[2]] if row[2] else []
            
            try:
                metadata = json.loads(row[4]) if row[4] else {}
            except (json.JSONDecodeError, TypeError):
                metadata = {}
            
            memory = {
                'content': row[0],
                'content_hash': row[1],
                'tags': tags,
                'memory_type': row[3],
                'metadata': metadata,
                'created_at': row[5],
                'created_at_iso': row[6],
                'updated_at': row[7],
                'updated_at_iso': row[8]
            }
            memories.append(memory)
        
        conn.close()
        return memories
        
    except Exception as e:
        print(f"Error exporting from SQLite: {e}")
        return []

def store_to_cloudflare(memories, api_token, account_id, d1_database_id):
    """Store memories to Cloudflare D1."""
    headers = {
        'Authorization': f'Bearer {api_token}',
        'Content-Type': 'application/json'
    }
    
    success_count = 0
    failed_count = 0
    
    # Initialize D1 schema
    init_sql = """
    CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        content_hash TEXT UNIQUE NOT NULL,
        content TEXT NOT NULL,
        tags TEXT,
        memory_type TEXT,
        metadata_json TEXT,
        created_at REAL,
        updated_at REAL,
        created_at_iso TEXT,
        updated_at_iso TEXT,
        content_size INTEGER DEFAULT 0,
        vector_id TEXT UNIQUE,
        r2_key TEXT
    );
    
    CREATE INDEX IF NOT EXISTS idx_created_at ON memories(created_at);
    CREATE INDEX IF NOT EXISTS idx_content_hash ON memories(content_hash);
    """
    
    url = f'https://api.cloudflare.com/client/v4/accounts/{account_id}/d1/database/{d1_database_id}/query'
    
    # Initialize schema
    try:
        response = requests.post(url, headers=headers, json={'sql': init_sql})
        if response.status_code == 200:
            print("D1 schema initialized")
        else:
            print(f"Schema init warning: {response.text}")
    except Exception as e:
        print(f"Error initializing schema: {e}")
        return False
    
    # Insert memories
    for i, memory in enumerate(memories):
        try:
            insert_sql = """
            INSERT OR REPLACE INTO memories 
            (content_hash, content, tags, memory_type, metadata_json, 
             created_at, updated_at, created_at_iso, updated_at_iso, content_size)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """
            
            params = [
                memory['content_hash'],
                memory['content'],
                json.dumps(memory.get('tags', [])),
                memory.get('memory_type'),
                json.dumps(memory.get('metadata', {})),
                memory.get('created_at'),
                memory.get('updated_at'),
                memory.get('created_at_iso'),
                memory.get('updated_at_iso'),
                len(memory['content']) if memory['content'] else 0
            ]
            
            response = requests.post(url, headers=headers, json={
                'sql': insert_sql,
                'params': params
            })
            
            if response.status_code == 200:
                success_count += 1
                if success_count % 10 == 0:
                    print(f"Progress: {success_count}/{len(memories)} uploaded")
            else:
                failed_count += 1
                print(f"Failed to store memory {memory['content_hash'][:16]}: {response.text}")
            
            # Rate limiting
            if i % 5 == 0:
                time.sleep(0.5)
                
        except Exception as e:
            failed_count += 1
            print(f"Error storing memory: {e}")
    
    print(f"Upload complete: {success_count} success, {failed_count} failed")
    return failed_count == 0

def main():
    parser = argparse.ArgumentParser(description='Sync to Cloudflare D1')
    parser.add_argument('--db-path', 
                       default='~/.local/share/mcp-memory/sqlite_vec.db',
                       help='Local SQLite database path')
    parser.add_argument('--config', 
                       default='.env.cloudflare',
                       help='Config file path')
    parser.add_argument('--input',
                       help='JSON file of memories to sync instead of the SQLite database')
    
    args = parser.parse_args()
    
    # Load configuration
    config_path = os.path.expanduser(args.config)
    if os.path.exists(config_path):
        with open(config_path) as f:
            for line in f:
                if '=' in line and not line.startswith('#'):
                    key, value = line.strip().split('=', 1)
                    os.environ[key] = value
    
    # Get required config
    api_token = os.getenv('CLOUDFLARE_API_TOKEN')
    account_id = os.getenv('CLOUDFLARE_ACCOUNT_ID')
    d1_database_id = os.getenv('CLOUDFLARE_D1_DATABASE_ID')
    
    if not all([api_token, account_id, d1_database_id]):
        print("Error: Missing required Cloudflare environment variables")
        print("Required: CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID, CLOUDFLARE_D1_DATABASE_ID")
        return False
    
    # Export and sync (from a JSON file when --input is given, otherwise from SQLite)
    if args.input:
        with open(os.path.expanduser(args.input)) as f:
            memories = json.load(f).get('memories', [])
    else:
        db_path = os.path.expanduser(args.db_path)
        memories = export_from_sqlite(db_path)
    
    if memories:
        return store_to_cloudflare(memories, api_token, account_id, d1_database_id)
    else:
        print("No memories to sync")
        return False

if __name__ == '__main__':
    success = main()
    exit(0 if success else 1)

Step 5: Create Automated Sync Script

Create sync-to-cloudflare.sh:

#!/bin/bash
# Daily sync script for Cloudflare backup

cd /path/to/your/mcp-memory-service

# Load configuration
if [ -f ".env.cloudflare" ]; then
    export $(cat .env.cloudflare | grep -v '^#' | xargs)
else
    echo "Error: .env.cloudflare not found"
    exit 1
fi

# Log file
LOG_FILE="/var/log/memory-cloudflare-sync.log"

echo "$(date): Starting Cloudflare sync..." >> "$LOG_FILE"

# Run sync and capture its exit code (testing $? inside the else branch
# would report the status of the [ test, not the sync itself)
python3 sync-cloudflare-direct.py --config .env.cloudflare >> "$LOG_FILE" 2>&1
EXIT_CODE=$?

if [ $EXIT_CODE -eq 0 ]; then
    echo "$(date): Sync completed successfully" >> "$LOG_FILE"
else
    echo "$(date): Sync failed with error code $EXIT_CODE" >> "$LOG_FILE"
fi

echo "$(date): Sync process finished" >> "$LOG_FILE"

Make executable:

chmod +x sync-to-cloudflare.sh

Step 6: Test the Setup

Test Single Memory

Create a test file test-memory.json:

{
  "memories": [
    {
      "content": "Test Cloudflare sync",
      "content_hash": "test123",
      "tags": ["test", "cloudflare"],
      "memory_type": "note",
      "metadata": {},
      "created_at": 1692806400.0,
      "created_at_iso": "2023-08-23T12:00:00Z",
      "updated_at": 1692806400.0,
      "updated_at_iso": "2023-08-23T12:00:00Z"
    }
  ]
}

Test:

python3 sync-cloudflare-direct.py --input test-memory.json

Test Full Sync

python3 sync-cloudflare-direct.py

Test Sync Script

./sync-to-cloudflare.sh

Step 7: Set Up Automated Schedule

Add to crontab for daily backup at 3 AM:

# Edit crontab
crontab -e

# Add this line (adjust path to your script)
0 3 * * * /path/to/your/sync-to-cloudflare.sh

Step 8: Monitor and Verify

Check Sync Logs

tail -f /var/log/memory-cloudflare-sync.log

Verify in Cloudflare Dashboard

  1. Go to your D1 database in Cloudflare Dashboard
  2. Use the "Console" tab to query: SELECT COUNT(*) FROM memories;

Manual Verification

# Check last sync status
grep "$(date '+%Y-%m-%d')" /var/log/memory-cloudflare-sync.log

Troubleshooting

Common Issues

1. Permission denied errors

# Fix log file permissions
sudo touch /var/log/memory-cloudflare-sync.log
sudo chown $USER:$USER /var/log/memory-cloudflare-sync.log

2. Environment variables not found

  • Verify .env.cloudflare exists and has correct format
  • Check file permissions: chmod 600 .env.cloudflare

3. Database connection errors

  • Verify API token permissions in Cloudflare Dashboard
  • Check Account ID and Database ID are correct

4. Schema errors

  • Run the test sync first to initialize the D1 schema
  • Check D1 database console for any existing conflicting tables

Performance Tips

  1. Rate Limiting: The script includes built-in rate limiting (a 0.5s pause every 5 requests)
  2. Batch Size: For very large databases (>1000 memories), consider implementing batching
  3. Incremental Sync: For frequent syncs, implement timestamp-based incremental updates (see the sketch below)
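
A minimal sketch of tip 3, reusing export_from_sqlite and store_to_cloudflare from the sync script above; the state file path and helper names are illustrative assumptions, not part of the script:

#!/usr/bin/env python3
"""Incremental sync sketch: only upload memories changed since the last run."""
import json
import os
import time

# Hypothetical state file remembering when the last successful sync ran
STATE_FILE = os.path.expanduser('~/.local/share/mcp-memory/last_sync.json')

def load_last_sync():
    """Return the timestamp of the previous successful sync (0 on first run)."""
    try:
        with open(STATE_FILE) as f:
            return json.load(f).get('last_sync', 0)
    except (FileNotFoundError, json.JSONDecodeError):
        return 0

def save_last_sync(ts):
    """Persist the sync timestamp for the next run."""
    with open(STATE_FILE, 'w') as f:
        json.dump({'last_sync': ts}, f)

def filter_changed(memories, since):
    """Keep only memories created or updated after the given timestamp."""
    return [m for m in memories
            if (m.get('updated_at') or m.get('created_at') or 0) > since]

# Usage inside the sync script's main(), after export_from_sqlite():
#     since = load_last_sync()
#     changed = filter_changed(memories, since)
#     if store_to_cloudflare(changed, api_token, account_id, d1_database_id):
#         save_last_sync(time.time())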

Security Best Practices

  1. API Token: Use minimum required permissions (D1:Edit only)
  2. Environment File: Never commit .env.cloudflare to version control
  3. Log Files: Ensure log files don't expose sensitive information
  4. File Permissions: Set restrictive permissions on config files
chmod 600 .env.cloudflare
chmod 600 sync-cloudflare-direct.py
chmod 755 sync-to-cloudflare.sh

Benefits

  • Global Access: Access memories from anywhere via Cloudflare's global network
  • Disaster Recovery: Complete backup if the local system fails
  • Zero Downtime: Local service remains unaffected during sync
  • Cost Effective: Pay-per-use pricing, typically <$1/month
  • Automatic: Set-and-forget daily synchronization

Next Steps

  1. Remote Access: Set up a simple web interface to query Cloudflare D1 directly (a query sketch follows this list)
  2. Bi-directional Sync: Implement conflict resolution for two-way synchronization
  3. Monitoring: Add alerting for failed sync attempts
  4. Optimization: Implement incremental sync for large databases
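
As a starting point for the remote-access idea in item 1, the query endpoint used throughout this guide can already serve keyword search. A minimal sketch, assuming the three environment variables are exported; search_memories is a hypothetical helper, not part of the sync script:

#!/usr/bin/env python3
"""Keyword search over the D1 backup (sketch for a remote-access layer)."""
import os
import requests

def search_memories(keyword, limit=10):
    """Run a LIKE search over synced memories via the D1 query endpoint."""
    token = os.environ['CLOUDFLARE_API_TOKEN']
    account_id = os.environ['CLOUDFLARE_ACCOUNT_ID']
    database_id = os.environ['CLOUDFLARE_D1_DATABASE_ID']
    url = (f'https://api.cloudflare.com/client/v4/accounts/{account_id}'
           f'/d1/database/{database_id}/query')
    resp = requests.post(url,
                         headers={'Authorization': f'Bearer {token}'},
                         json={'sql': ('SELECT content, tags, created_at_iso '
                                       'FROM memories WHERE content LIKE ? '
                                       'ORDER BY created_at DESC LIMIT ?'),
                               'params': [f'%{keyword}%', limit]})
    resp.raise_for_status()
    return resp.json()

if __name__ == '__main__':
    print(search_memories('cloudflare'))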

Support

For issues with this setup:

  1. Check the troubleshooting section above
  2. Review logs in /var/log/memory-cloudflare-sync.log
  3. Test individual components (export, API connection, schema)
  4. Consult the MCP Memory Service documentation for backend-specific issues

Real-World Example

This guide was successfully tested with:

  • 941 memories in local SQLite-vec database
  • Cloudflare D1 database with existing schema
  • Daily automated sync at 3 AM via cron
  • Zero downtime - local service unaffected during sync
  • ~1 minute sync time for 941 memories with rate limiting

Architecture Notes

Why This Approach Works:

  • Local Primary: SQLite-vec provides fast local access (<5ms queries)
  • Cloud Backup: Cloudflare D1 provides global availability and disaster recovery
  • Hybrid Benefits: Best of both worlds - speed + reliability
  • Cost Effective: Minimal Cloudflare usage, typically <$1/month
  • Simple Maintenance: Single Python script, standard cron scheduling