feat(rag): FAISS vectors and hybrid retrieval #12

haz3141 · 2025-09-06T20:40:20Z

Step 6: Advanced RAG Enhancements

Features Added

Embeddings Pipeline: Added Embeddings class with sentence-transformers support and stub fallback for offline testing
FAISS Vector Store: Added FaissIndex for efficient vector similarity search with numpy fallback
Hybrid Retriever: Added HybridRetriever, which combines BM25 and vector scores using a configurable alpha weight
MCP Endpoint: Added /tools/retrieve_hybrid endpoint for hybrid retrieval
Comprehensive Tests: Added tests for hybrid retrieval with deterministic stub embeddings
CI Updates: Updated CI to run hybrid retrieval tests with stub embeddings
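
The PR does not show the Embeddings code. Below is a minimal sketch of what a stub fallback could look like, assuming the stub hashes tokens into a fixed-size, unit-normalized vector. That keeps tests deterministic and free of model downloads. The names `stub_embed` and `Embeddings`, the `backend` values, the 64-dimension default, and the `all-MiniLM-L6-v2` default model are illustrative assumptions, not the PR's actual API.

```python
import hashlib
import math


def stub_embed(text: str, dim: int = 64) -> list[float]:
    """Deterministic bag-of-words embedding (hypothetical): each token is hashed into one of `dim` buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # hashlib gives stable buckets across processes; built-in hash() is salted for str.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Unit-normalize so inner product equals cosine similarity; empty text stays all-zero.
    return [v / norm for v in vec] if norm > 0 else vec


class Embeddings:
    """Hypothetical sketch: use sentence-transformers when requested and installed, else the stub."""

    def __init__(self, backend: str = "stub", model_name: str = "all-MiniLM-L6-v2", dim: int = 64):
        self.dim = dim
        self._model = None
        if backend != "stub":
            try:
                from sentence_transformers import SentenceTransformer
                self._model = SentenceTransformer(model_name)
            except ImportError:
                self._model = None  # library missing: fall back to the deterministic stub

    def encode(self, texts: list[str]) -> list[list[float]]:
        if self._model is not None:
            return self._model.encode(texts, normalize_embeddings=True).tolist()
        return [stub_embed(t, self.dim) for t in texts]
```

The sketch uses `hashlib` because Python's built-in `hash()` is randomized per process for strings. A stub built on `hash()` would give different vectors on every CI run.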

Technical Details

Uses sentence-transformers for embeddings (with fallback to stub for tests)
FAISS for vector similarity search (with numpy fallback)
Hybrid scoring combines normalized BM25 and vector scores, weighted by alpha
All tests use stub embeddings for deterministic, offline testing
Environment variables for configuration (EMBED_BACKEND, EMBED_MODEL, HYBRID_ALPHA)
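
One common way to combine normalized BM25 and vector scores with an alpha weight is per-query min-max normalization followed by a convex combination. The sketch below assumes that `alpha` weights the vector score and `1 - alpha` weights the BM25 score. The PR does not say which side alpha applies to or which normalization it uses. `min_max`, `hybrid_scores`, and `top_k` are hypothetical names.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; with no spread the signal can't discriminate, so map all to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(bm25: dict[str, float], vector: dict[str, float], alpha: float) -> dict[str, float]:
    """Assumed convention: alpha * normalized vector score + (1 - alpha) * normalized BM25 score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    b, v = min_max(bm25), min_max(vector)
    # A document retrieved by only one ranker contributes 0 for the other signal.
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in set(b) | set(v)}


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Highest score first; ties broken by doc id for deterministic output."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

For example, take alpha = 0.5 with BM25 scores {a: 10, b: 5, c: 0} and vector scores {a: 0.1, b: 0.9, c: 0.5}. The combined scores are b = 0.75, a = 0.5, and c = 0.25. Setting alpha to 0 or 1 reduces the ranking to pure BM25 or pure vector search.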

Version

Bumps to v0.5.0

Testing

All tests pass with stub embeddings
Hybrid endpoint tested and working
CI configuration updated to include hybrid tests

…triever
- Add Embeddings class with sentence-transformers and stub fallback
- Add FaissIndex for vector similarity search with numpy fallback
- Add HybridRetriever combining BM25 and vector scores
- Add comprehensive tests for hybrid retrieval
- Update requirements.txt with new dependencies
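
When the faiss package is unavailable, the fallback only has to reproduce exact (brute-force) inner-product search, which is what `faiss.IndexFlatIP` computes. The dependency-free sketch below stands in for the numpy fallback. `BruteForceIndex` and its `add`/`search` methods are hypothetical and do not reflect FaissIndex's real interface. With unit-normalized embeddings, the inner product is the cosine similarity.

```python
import heapq


class BruteForceIndex:
    """Exact inner-product search over stored vectors (hypothetical fallback sketch)."""

    def __init__(self, dim: int):
        self.dim = dim
        self._ids: list[str] = []
        self._vectors: list[list[float]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(vector)}")
        self._ids.append(doc_id)
        self._vectors.append(list(vector))

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        """Return up to k (doc_id, score) pairs, highest inner product first."""
        if len(query) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(query)}")
        # Score every stored vector: O(n * dim), the same work a flat (non-approximate) index does.
        scored = (
            (sum(q * x for q, x in zip(query, vec)), doc_id)
            for doc_id, vec in zip(self._ids, self._vectors)
        )
        return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]
```

Exact search is fine for small fixture corpora like the ones the tests use. FAISS becomes worthwhile as the corpus grows or when an approximate index type is swapped in.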

- Add RetrieveHybridRequest model with query, k, and alpha parameters
- Add retrieve_hybrid endpoint that uses HybridRetriever
- Returns hits with both vector and BM25 scores for transparency
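
The request model and response shape can be sketched with stdlib dataclasses. The PR does not name its web framework or show the model's code, so several details here are assumptions: the defaults (`k=5`, `alpha=0.5`), the validation rules, the `Hit` field names, and the `build_response` helper.

```python
from dataclasses import asdict, dataclass


@dataclass
class RetrieveHybridRequest:
    """Hypothetical stand-in for the PR's request model (query, k, alpha)."""
    query: str
    k: int = 5          # assumed default
    alpha: float = 0.5  # assumed default; weight given to the vector score

    def __post_init__(self) -> None:
        # Reject inputs that would make the hybrid ranking meaningless.
        if not self.query.strip():
            raise ValueError("query must be a non-empty string")
        if self.k < 1:
            raise ValueError("k must be at least 1")
        if not 0.0 <= self.alpha <= 1.0:
            raise ValueError("alpha must be in [0, 1]")


@dataclass
class Hit:
    """One result, carrying both component scores for transparency."""
    doc_id: str
    score: float         # combined hybrid score
    vector_score: float
    bm25_score: float


def build_response(req: RetrieveHybridRequest, hits: list[Hit]) -> dict:
    """Sort hits by combined score, keep the top k, and serialize them to plain dicts."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)[: req.k]
    return {"query": req.query, "alpha": req.alpha, "hits": [asdict(h) for h in ranked]}
```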

- Update .env.sample with new environment variables for embeddings and hybrid retrieval
- Update CI configuration to run hybrid retrieval tests with stub embeddings
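
Below is a sketch of reading the three new variables with validation. The contents of .env.sample are not shown, so the accepted `EMBED_BACKEND` values, every default, and the `load_config`/`RetrievalConfig` names are assumptions. Defaulting to the stub backend keeps CI offline and deterministic, consistent with the stub-embedding testing approach described above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed set of backend names; the PR only names the EMBED_BACKEND variable.
KNOWN_BACKENDS = {"stub", "sentence-transformers"}


@dataclass(frozen=True)
class RetrievalConfig:
    embed_backend: str
    embed_model: str
    hybrid_alpha: float


def load_config(env: Mapping[str, str] = os.environ) -> RetrievalConfig:
    """Read EMBED_BACKEND, EMBED_MODEL, and HYBRID_ALPHA, falling back to assumed defaults."""
    backend = env.get("EMBED_BACKEND", "stub").strip().lower()
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown EMBED_BACKEND: {backend!r}")
    alpha = float(env.get("HYBRID_ALPHA", "0.5"))
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"HYBRID_ALPHA must be in [0, 1], got {alpha}")
    model = env.get("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
    return RetrievalConfig(backend, model, alpha)
```

Taking the environment as a parameter lets tests pass a plain dict instead of mutating `os.environ`.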

- Add lab/security/redact.py for PII pattern redaction
- Add lab/security/guardian.py for tool allowlist and response sanitization
- Add comprehensive tests for security modules
- Integrate Guardian into MCP server endpoints
- Add audit logging for all tool calls
- Add evaluation harness with hit@k and mrr@k metrics
- Add observability with JSONL audit logs
- Add promotion flow documentation and security checklist
- Update CI with new tests and evaluation step
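
The evaluation harness reports hit@k and mrr@k. The sketch below uses the standard definitions:
- hit@k is 1 if any relevant document appears in the top k.
- Reciprocal rank is 1/rank of the first relevant document within the top k, or 0 if none appears.
- Both are averaged over queries.

The function names and the `runs` input format are hypothetical, not the harness's actual code.

```python
def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any relevant doc is in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1/rank of the first relevant doc within the top k (ranks start at 1), else 0.0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    """runs holds one (ranked_doc_ids, relevant_doc_ids) pair per query; metrics are query means."""
    if not runs:
        raise ValueError("need at least one query")
    n = len(runs)
    return {
        f"hit@{k}": sum(hit_at_k(r, rel, k) for r, rel in runs) / n,
        f"mrr@{k}": sum(reciprocal_rank_at_k(r, rel, k) for r, rel in runs) / n,
    }
```

Consider three queries whose first relevant document is at rank 2, absent, and at rank 1. With k = 2, that gives hit@2 = 2/3 and mrr@2 = (0.5 + 0 + 1) / 3 = 0.5.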

haz3141 added 18 commits September 6, 2025 16:20

chore(deps): add dspy-ai and pin versions

b0ef6d5

feat(dsp): add Summarize module and demo runner

2ce1db9

feat(mcp): expose /tools/summarize backed by DSPy module

10c3a53

test(dsp): add unit test and HTTP smoke test

6e0a6fe

chore(env): add DSPY_MODEL placeholder to .env.sample

0d6baee

chore(version): bump to 0.3.0 for DSPy integration

538daeb

ci: add minimal lint+unit test workflow

0cc896c

feat(rag): scaffold ingestion, chunking, BM25 retriever with fixture

6c62483

test(rag): add deterministic unit test for chunking and BM25 retrieval

c5a60df

feat(mcp): add /tools/retrieve using minimal BM25 over lab fixtures

048d069

chore(env): add RAG_DATA_DIR and RAG_TOP_K placeholders

1613c8c

ci: run RAG unit test alongside summarize test

08a1556

chore(version): bump to 0.4.0 for minimal RAG endpoint

38b1a2f

feat(mcp): add /tools/retrieve_hybrid endpoint

92c6993

chore(env): add EMBED_BACKEND, EMBED_MODEL, HYBRID_ALPHA placeholders

4017dfd

chore(version): bump to 0.5.0 for advanced RAG (FAISS + hybrid)

043e22c
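
Commit 6c62483 scaffolds a BM25 retriever over lab fixtures, and the hybrid retriever builds on it. A minimal Okapi BM25 sketch follows, for readers unfamiliar with the scoring. It assumes whitespace tokenization, the common defaults k1 = 1.5 and b = 0.75, and the Lucene-style IDF log(1 + (N - df + 0.5) / (df + 0.5)), which never goes negative. It is not the PR's implementation, and the `BM25` class name is hypothetical.

```python
import math
from collections import Counter


class BM25:
    """Okapi BM25 with a Lucene-style non-negative IDF (illustrative sketch)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {d: Counter(self._tokenize(t)) for d, t in docs.items()}
        self.lengths = {d: sum(tf.values()) for d, tf in self.tfs.items()}
        total = sum(self.lengths.values())
        # Guard against an empty corpus or all-empty docs to avoid dividing by zero.
        self.avgdl = (total / len(docs) if docs else 0.0) or 1.0
        df: Counter = Counter()
        for tf in self.tfs.values():
            df.update(tf.keys())  # count each term once per document
        n = len(docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        return text.lower().split()

    def score(self, query: str) -> dict[str, float]:
        """BM25 score of every document for the query; 0.0 when no query term matches."""
        scores = {}
        for doc, tf in self.tfs.items():
            # Length normalization: longer-than-average docs are penalized via b.
            norm = self.k1 * (1 - self.b + self.b * self.lengths[doc] / self.avgdl)
            s = 0.0
            for term in self._tokenize(query):
                f = tf.get(term, 0)
                if f:
                    s += self.idf[term] * f * (self.k1 + 1) / (f + norm)
            scores[doc] = s
        return scores
```

Because of the length normalization, a short document containing the query terms outranks a longer document containing them the same number of times.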
