---
title: Rag with Binary Quantization
emoji: 📜
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance
---
A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval.

Key features:
- Binary Quantization: Converts high-dimensional embeddings to binary vectors for memory efficiency
- Milvus Vector Database: Uses Milvus for scalable vector storage and similarity search
- Gradio Web Interface: User-friendly web UI for document upload and chat
- BGE Embeddings: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
- OpenAI Integration: Uses GPT-4.1 for question answering (see the sketch after this list)
- Batch Processing: Efficient document processing with configurable batch sizes
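For instance, the answering step might look like the following minimal sketch. The prompt format and function shape are assumptions; only the model name and temperature come from this README's configuration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, context_chunks: list[str]) -> str:
    """Hypothetical answering step: fuse retrieved chunks into a prompt."""
    context = "\n\n".join(context_chunks)
    resp = client.chat.completions.create(
        model="gpt-4.1",
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```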
The overall architecture:

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Documents    │───▶│  BGE Embeddings  │───▶│ Binary Vectors  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│ Query Embedding  │───▶│  Milvus Search  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   LLM Answer    │◀───│  Context Fusion  │◀───│ Retrieved Docs  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
To set up the project:

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd rag-w-binary-quant
   ```

2. Install dependencies:

   ```bash
   uv sync
   ```

3. Set up environment variables: create a `.env` file with your OpenAI API key:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ```

Run the Gradio web interface:

```bash
uv run app.py
```

The application will be available at http://localhost:7860.
1. Upload Documents:
   - Go to the "Upload & Index" tab
   - Upload your documents (supports multiple file formats)
   - Click "Update Index" to process and index the documents

2. Chat with Documents:
   - Switch to the "Chat" tab
   - Ask questions about your uploaded documents
   - Get intelligent answers based on the document content
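The two-tab interface described above could be wired up in Gradio roughly like this. This is a hypothetical sketch with stubbed handlers, not the actual contents of app.py:

```python
import gradio as gr

def update_index(files):
    # Stub: the real handler would load, embed, and index the uploads.
    return f"Indexed {len(files or [])} file(s)"

def answer(message, history):
    # Stub: the real handler would run the RAG pipeline over the index.
    return f"(answer to: {message})"

with gr.Blocks() as demo:
    with gr.Tab("Upload & Index"):
        files = gr.File(file_count="multiple", label="Documents")
        status = gr.Textbox(label="Status")
        gr.Button("Update Index").click(update_index, inputs=files, outputs=status)
    with gr.Tab("Chat"):
        gr.ChatInterface(answer)

demo.launch()  # serves on http://localhost:7860 by default
```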
Key configuration parameters in `src/config.py` (a sketch of the file follows the list):

- `EMBEDDING_MODEL_NAME`: BAAI/bge-large-en-v1.5
- `COLLECTION_NAME`: "fast_rag"
- `MILVUS_DB_PATH`: "milvus_binary_quantized.db"
- `MODEL_NAME`: "gpt-4.1"
- `TEMPERATURE`: 0.2
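A minimal sketch of how `src/config.py` might lay these out; the values come from the list above, but the actual file may organize them differently:

```python
# src/config.py -- illustrative sketch; the real file may differ
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"   # BGE embedding model
COLLECTION_NAME = "fast_rag"                      # Milvus collection name
MILVUS_DB_PATH = "milvus_binary_quantized.db"     # Milvus Lite database file
MODEL_NAME = "gpt-4.1"                            # OpenAI chat model
TEMPERATURE = 0.2                                 # low temperature for grounded answers
```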
Why this design performs well:

- Memory Efficiency: Binary vectors use up to 32x less memory than float32 embeddings, 1 bit vs. 32 bits per dimension (see the arithmetic below)
- Fast Search: Hamming distance computation is highly optimized
- Scalable: Milvus provides enterprise-grade vector database capabilities
- Accurate: BGE embeddings provide high-quality semantic representations
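As a quick check of the memory figure for 1024-dimensional bge-large-en-v1.5 vectors:

```python
dim = 1024                            # bge-large-en-v1.5 output dimension
float32_bytes = dim * 4               # 4096 bytes per float32 vector
binary_bytes = dim // 8               # 128 bytes once the bits are packed
print(float32_bytes // binary_bytes)  # 32 -> a 32x reduction
```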
```
rag-w-binary-quant/
├── app.py                     # Gradio web interface
├── main.py                    # Main application entry point
├── src/
│   ├── config.py              # Configuration settings
│   ├── data_loader.py         # Document loading utilities
│   ├── embedding_generator.py # Binary embedding generation
│   ├── vector_store.py        # Milvus vector database operations
│   └── rag_pipeline.py        # RAG question answering pipeline
├── documents/                 # Uploaded document storage
└── README.md
```
How binary quantization works (a sketch follows this list):

- Float32 Embeddings: Generate embeddings using the BGE model
- Binary Conversion: Convert to binary using a sign threshold (positive values → 1, non-positive → 0)
- Packing: Pack binary vectors into bytes for efficient storage
- Hamming Distance: Use Hamming distance for similarity search
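A minimal sketch of this pipeline using numpy and sentence-transformers. The model name comes from the configuration above; the exact thresholding and packing details are assumptions based on the steps just listed:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

def to_binary(texts: list[str]) -> np.ndarray:
    """Embed texts, then quantize: positive components -> 1, else 0."""
    emb = model.encode(texts)          # float32, shape (n, 1024)
    bits = (emb > 0).astype(np.uint8)  # keep 1 bit of sign info per dimension
    return np.packbits(bits, axis=1)   # pack into bytes: shape (n, 128)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between packed vectors = number of differing bits."""
    return int(np.unpackbits(a ^ b).sum())

docs = to_binary(["Binary quantization shrinks embeddings.",
                  "Milvus stores the packed vectors."])
query = to_binary(["How are embeddings compressed?"])[0]
print([hamming(query, d) for d in docs])  # smaller distance = more similar
```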
Search configuration (a pymilvus sketch follows the list):

- Index Type: BIN_FLAT (exact search over binary vectors)
- Metric: Hamming distance
- Retrieval: Top-k most similar documents
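A hedged sketch of the corresponding Milvus setup with the pymilvus `MilvusClient` API. The field names and schema are assumptions; the index type, metric, database path, and collection name come from this README:

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient("milvus_binary_quantized.db")  # Milvus Lite, local file

# Assumed schema: auto-id key, packed binary vector, and the source text.
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.BINARY_VECTOR, dim=1024)
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=65535)

index_params = client.prepare_index_params()
index_params.add_index(field_name="embedding",
                       index_type="BIN_FLAT",   # exact search on binary vectors
                       metric_type="HAMMING")   # count of differing bits

client.create_collection("fast_rag", schema=schema, index_params=index_params)

# Top-k retrieval; the query is a packed 1024-bit vector passed as raw bytes.
query_bytes = bytes(128)  # placeholder: use to_binary(...)[0].tobytes() in practice
hits = client.search("fast_rag", data=[query_bytes], limit=3, output_fields=["text"])
```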