---
title: Rag with Binary Quantization
emoji: 📜
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance
---

RAG with Binary Quantization

A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval capabilities.

🚀 Features

  • Binary Quantization: Converts high-dimensional embeddings to binary vectors for memory efficiency
  • Milvus Vector Database: Uses Milvus for scalable vector storage and similarity search
  • Gradio Web Interface: User-friendly web UI for document upload and chat
  • BGE Embeddings: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
  • OpenAI Integration: Uses GPT-4.1 for intelligent question answering
  • Batch Processing: Efficient document processing with configurable batch sizes

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Documents     │───▶│  BGE Embeddings  │───▶│ Binary Vectors  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                         │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│  Query Embedding │───▶│  Milvus Search  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                         │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   LLM Answer    │◀───│  Context Fusion  │◀───│ Retrieved Docs  │
└─────────────────┘    └──────────────────┘    └─────────────────┘

🛠️ Installation

  1. Clone the repository:

    git clone <repository-url>
    cd rag-w-binary-quant
  2. Install dependencies:

    uv sync
  3. Set up environment variables: Create a .env file with your OpenAI API key:

    OPENAI_API_KEY=your_openai_api_key_here

🚀 Usage

Starting the Application

Run the Gradio web interface:

uv run app.py

The application will be available at http://localhost:7860

Using the Interface

  1. Upload Documents:

    • Go to the "Upload & Index" tab
    • Upload your documents (supports multiple file formats)
    • Click "Update Index" to process and index the documents
  2. Chat with Documents:

    • Switch to the "Chat" tab
    • Ask questions about your uploaded documents
    • Get intelligent answers based on the document content

🔧 Configuration

Key configuration parameters in src/config.py:

  • EMBEDDING_MODEL_NAME: BAAI/bge-large-en-v1.5
  • COLLECTION_NAME: "fast_rag"
  • MILVUS_DB_PATH: "milvus_binary_quantized.db"
  • MODEL_NAME: "gpt-4.1"
  • TEMPERATURE: 0.2
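The parameters above map onto module-level constants; a minimal sketch of what src/config.py might look like (the exact file is in the repository):

```python
# src/config.py -- illustrative sketch, see the repository for the real file
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"  # BGE embedding model
COLLECTION_NAME = "fast_rag"                     # Milvus collection name
MILVUS_DB_PATH = "milvus_binary_quantized.db"    # Milvus Lite database file
MODEL_NAME = "gpt-4.1"                           # OpenAI chat model
TEMPERATURE = 0.2                                # low temperature for grounded answers
```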

📊 Performance Benefits

  • Memory Efficiency: 1-bit binary vectors use 32x less memory than 32-bit float32 embeddings
  • Fast Search: Hamming distance computation is highly optimized
  • Scalable: Milvus provides enterprise-grade vector database capabilities
  • Accurate: BGE embeddings provide high-quality semantic representations
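The memory saving is easy to verify: bge-large-en-v1.5 produces 1024-dimensional vectors, so a float32 embedding occupies 4096 bytes while the packed binary version occupies 128 bytes:

```python
import numpy as np

DIM = 1024  # output dimension of BAAI/bge-large-en-v1.5

float_vec = np.zeros(DIM, dtype=np.float32)              # 4 bytes per dimension
binary_vec = np.packbits(np.zeros(DIM, dtype=np.uint8))  # 1 bit per dimension

print(float_vec.nbytes)   # 4096 bytes
print(binary_vec.nbytes)  # 128 bytes -> 32x smaller
```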

🏛️ Project Structure

rag-w-binary-quant/
├── app.py                 # Gradio web interface
├── main.py               # Main application entry point
├── src/
│   ├── config.py         # Configuration settings
│   ├── data_loader.py    # Document loading utilities
│   ├── embedding_generator.py  # Binary embedding generation
│   ├── vector_store.py   # Milvus vector database operations
│   └── rag_pipeline.py   # RAG question answering pipeline
├── documents/            # Uploaded document storage
└── README.md

🔍 Technical Details

Binary Quantization Process

  1. Float32 Embeddings: Generate embeddings using BGE model
  2. Binary Conversion: Convert to binary using threshold (positive values → 1, negative → 0)
  3. Packing: Pack binary vectors into bytes for efficient storage
  4. Hamming Distance: Use Hamming distance for similarity search
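Steps 2 and 3 can be sketched in a few lines of NumPy (an illustration of the technique, not the project's actual embedding_generator.py):

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Quantize float32 embeddings to packed binary vectors.

    Positive components map to 1, non-positive to 0, and the bits are
    packed 8-per-byte for compact storage.
    """
    bits = (embeddings > 0).astype(np.uint8)  # threshold at zero
    return np.packbits(bits, axis=-1)         # pack bits into bytes

vecs = np.array([[0.3, -1.2, 0.7, 0.0, 2.1, -0.4, 0.9, -0.1]], dtype=np.float32)
print(binarize(vecs))  # bits 10101010 -> one byte, 0b10101010 = 170
```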

Vector Search

  • Index Type: BIN_FLAT (exact search for binary vectors)
  • Metric: Hamming distance
  • Retrieval: Top-k most similar documents
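Milvus performs the distance computation internally; what BIN_FLAT with the Hamming metric computes over packed binary vectors can be illustrated with NumPy (an illustration only, not the project's vector_store.py):

```python
import numpy as np

def hamming(query: np.ndarray, db: np.ndarray) -> np.ndarray:
    """Hamming distance between a packed query and packed database rows."""
    xor = np.bitwise_xor(query, db)                  # differing bits, per byte
    return np.unpackbits(xor, axis=-1).sum(axis=-1)  # popcount per row

query = np.array([0b10101010], dtype=np.uint8)
db = np.array([[0b10101010],    # identical       -> distance 0
               [0b10101011],    # one bit differs -> distance 1
               [0b01010101]],   # all bits differ -> distance 8
              dtype=np.uint8)

dists = hamming(query, db)
top_k = np.argsort(dists)[:2]  # indices of the 2 nearest rows
print(dists, top_k)            # [0 1 8] [0 1]
```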

🙏 Acknowledgments

  • BAAI for the BGE embedding model
  • Milvus for the vector database
  • Gradio for the web interface
  • OpenAI for the language model
