

KnowFlow is a hybrid Retrieval-Augmented Generation (RAG) system that combines semantic vector search with a knowledge graph to process and query documents.
Advanced Document Processing
- Multi-format support (PDF, DOCX, CSV, TXT)
- Intelligent chunking with configurable size and overlap
- Parallel batch processing with S3 storage
- Document status tracking (PENDING, PROCESSING, INDEXED, FAILED)
- Secure per-user document isolation
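The configurable size and overlap mentioned above can be illustrated with a minimal character-based sketch. The function name `chunk_text` is hypothetical, and splitting on character counts is an assumption; the project's chunker may well split on tokens or sentence boundaries instead. The defaults mirror the `CHUNK_SIZE` and `CHUNK_OVERLAP` values from the environment configuration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

A larger overlap keeps more shared context between neighbouring chunks, at the cost of producing more chunks to embed.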
Hybrid RAG + Knowledge Graph Architecture
- Dense semantic embeddings via Google Gemini + pgvector
- Structured knowledge extraction to Neo4j
- Multi-hop reasoning through graph relationships
- Automatic entity and relationship mapping
- Query decomposition for complex questions
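Multi-hop reasoning over graph relationships can be pictured as a bounded breadth-first traversal. The sketch below uses a plain dictionary as a hypothetical in-memory stand-in for the Neo4j graph; the `multi_hop` function, the `Graph` type, and the sample relation names are illustrative assumptions, not the project's actual queries.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical stand-in for the graph store: entity -> [(relation, target entity)]
Graph = Dict[str, List[Tuple[str, str]]]


def multi_hop(graph: Graph, start: str, max_hops: int = 2) -> List[Tuple[str, str, str]]:
    """Collect (source, relation, target) triples reachable within max_hops of start."""
    seen = {start}
    triples: List[Tuple[str, str, str]] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in graph.get(node, []):
            triples.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return triples
```

With `max_hops=2`, a question about an entity can draw on facts about its direct neighbours and their neighbours, which is what allows answers that no single chunk contains.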
Smart Query Processing
- Automatic query decomposition for complex questions
- Hybrid vector + graph-based retrieval
- Retrieval quality evaluation and improvement
- Context-aware response synthesis
- Conversation memory with graph context
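One common way to combine a vector ranking with a graph ranking is reciprocal rank fusion (RRF). The source does not say how KnowFlow merges the two result sets, so the sketch below is an assumption chosen for illustration; `reciprocal_rank_fusion` and the chunk IDs are hypothetical, and `top_k` defaults to the `TOP_K_RESULTS` value from the configuration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60, top_k: int = 3) -> List[str]:
    """Merge several ranked ID lists; each list contributes 1 / (k + rank) per item."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_k]
```

Items that appear in both rankings accumulate two contributions, so results confirmed by both retrieval paths tend to rise to the top.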
Chat & Session Management
- Persistent chat sessions with history
- Context-aware follow-up questions
- Session renaming and management
- Message tracking with context preservation
- Multi-user support with isolation
Security & Authentication
- JWT-based authentication
- Secure password hashing with bcrypt
- Role-based access control
- Per-user data isolation
- Document access verification
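Per-user isolation and document access verification usually come down to an ownership check before any document is returned. The following is a minimal sketch under assumed names: the `Document` fields, the `AccessDenied` exception, and the dictionary-backed `store` are hypothetical and stand in for the real database models.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Document:
    doc_id: int
    owner_id: int
    status: str  # e.g. PENDING, PROCESSING, INDEXED, FAILED


class AccessDenied(Exception):
    """Raised when a document is missing or belongs to another user."""


def get_owned_document(store: Dict[int, Document], doc_id: int, user_id: int) -> Document:
    """Return the document only if it exists and is owned by user_id."""
    doc = store.get(doc_id)
    # The same error covers "missing" and "not yours" so callers cannot probe for IDs
    if doc is None or doc.owner_id != user_id:
        raise AccessDenied(f"document {doc_id} is not available")
    return doc
```

Raising one error for both cases is a design choice: it avoids revealing whether another user's document exists.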
Storage & Infrastructure
- S3-compatible object storage
- PostgreSQL for structured data
- Neo4j for graph relationships
- Concurrent file operations
- Efficient batch processing
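Concurrent file operations over a batch can be sketched with a thread pool from the standard library. The `upload_batch` helper and the `upload_one` callback are hypothetical; in a real deployment `upload_one` might wrap an S3 client call, but the source does not show how uploads are implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def upload_batch(
    files: Dict[str, bytes],
    upload_one: Callable[[str, bytes], None],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run upload_one for every file concurrently; map each key to 'ok' or an error."""

    def _safe(item: Tuple[str, bytes]) -> Tuple[str, str]:
        key, data = item
        try:
            upload_one(key, data)
            return key, "ok"
        except Exception as exc:  # one failed file must not abort the whole batch
            return key, f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(_safe, files.items()))
```

Collecting a per-file result rather than raising on the first error fits a status-tracking model in which individual documents can end up as failed while the rest proceed.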
graph TD
A[Frontend React/Vite] -->|REST API| B[FastAPI Backend]
B --> C[PostgreSQL + pgvector]
B --> D[Neo4j Graph DB]
B --> E[Google Gemini API]
B --> F[S3 Storage]
subgraph Document Processing
G[Document Upload] --> H[Chunking]
H --> I[Vector Embedding]
H --> J[Knowledge Extraction]
end
subgraph Query Processing
K[User Query] --> L[Query Decomposition]
L --> M[Vector Search]
L --> N[Graph Traversal]
M --> O[Response Synthesis]
N --> O
end
C -->|Semantic Search| B
D -->|Knowledge Graph| B
F -->|Document Storage| B
E -->|Embeddings & Generation| B
Prerequisites
- Python 3.8+
- PostgreSQL 14+ with pgvector extension
- Neo4j 5.0+
- S3-compatible storage
- Google Cloud API key for Gemini
# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/knowflow
VECTOR_COLLECTION_NAME=document_embeddings
# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
# Google API
GOOGLE_API_KEY=your_gemini_api_key
GEMINI_MODEL_NAME=gemini-pro
GEMINI_EMBEDDING_MODEL=embedding-001
# AWS S3
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET_NAME=knowflow-documents
# App Settings
SECRET_KEY=your_jwt_secret_key
ACCESS_TOKEN_EXPIRE_MINUTES=60
CHUNK_SIZE=1000
CHUNK_OVERLAP=100
TOP_K_RESULTS=3
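The environment variables above could be loaded and validated once at startup. This is a minimal sketch using only the standard library; the `Settings` class and `load_settings` function are hypothetical names, only a subset of the variables is shown, and the actual application may use a dedicated settings library instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    access_token_expire_minutes: int
    chunk_size: int
    chunk_overlap: int
    top_k_results: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required and optional variables, falling back to the documented defaults."""
    settings = Settings(
        database_url=env["DATABASE_URL"],  # required: a KeyError here signals a missing variable
        secret_key=env["SECRET_KEY"],
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "60")),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "100")),
        top_k_results=int(env.get("TOP_K_RESULTS", "3")),
    )
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Failing fast on a missing `DATABASE_URL` or an inconsistent chunk configuration surfaces setup mistakes before the server starts accepting requests.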
Installation
- Clone the repository:
  git clone https://github.com/yourusername/knowflow.git
  cd knowflow
- Install dependencies:
  pip install -r requirements.txt
- Run migrations:
  alembic upgrade head
- Start the development server:
  uvicorn src.main:app --reload
Authentication endpoints
- POST /auth/register: Register new user
- POST /auth/login: Login and get JWT token
- GET /auth/me: Get current user info
Document endpoints
- POST /documents/upload: Upload multiple documents
- POST /documents/{doc_id}/index: Index document content
- GET /documents: List user documents
- GET /documents/{doc_id}: Get document details
Chat endpoints
- POST /chat/query: Process a new query
- POST /chat/sessions/{session_id}/messages: Send follow-up message
- GET /chat/sessions: List chat sessions
- PUT /chat/sessions/{session_id}/rename: Rename session
- DELETE /chat/sessions/{session_id}: Delete session
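A client call to the chat endpoint might look like the sketch below, which only builds the request so it can be inspected without a running server. Several details are assumptions: the base URL (uvicorn's default host and port), the `Bearer` authorization scheme, and the JSON field name `query`. Check the API's generated documentation for the real request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port


def build_chat_query(token: str, question: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /chat/query request."""
    body = json.dumps({"query": question}).encode("utf-8")  # "query" is a hypothetical field name
    return urllib.request.Request(
        f"{BASE_URL}/chat/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # token as returned by POST /auth/login
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server.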
Security
- JWT-based authentication with expiration
- Bcrypt password hashing
- Per-user document isolation
- Access control verification
- Secure file storage paths
- Input validation and sanitization
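To show what JWT-based authentication with expiration involves, here is a standard-library sketch of HS256 signing and verification. It is for illustration only: the function names are hypothetical, the application most likely relies on an established JWT library, and the `secret` and `expires_minutes` arguments correspond to `SECRET_KEY` and `ACCESS_TOKEN_EXPIRE_MINUTES` by assumption.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def create_token(user_id: str, secret: str, expires_minutes: int = 60,
                 now: Optional[float] = None) -> str:
    """Issue an HS256 token with `sub` and `exp` claims."""
    issued = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": user_id, "exp": issued + expires_minutes * 60}).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the claims if the signature matches and the token has not expired."""
    header, payload, signature = token.split(".")
    # Constant-time comparison avoids leaking how much of the signature matched
    if not hmac.compare_digest(signature, _sign(f"{header}.{payload}", secret)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

The optional `now` parameter exists only to make expiry behaviour easy to test deterministically.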
Logging and Monitoring
- Structured logging with levels
- Request/response tracking
- Error handling and reporting
- Performance metrics
- Document processing status
- Chat session analytics
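Structured logging with levels and request tracking can be sketched with the standard `logging` module and a JSON formatter. The `JsonFormatter` class, the `get_logger` helper, and the `request_id` field are hypothetical; the project's actual log format is not shown in the source.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `request_id` is a hypothetical field supplied through logging's `extra=` argument
        if hasattr(record, "request_id"):
            entry["request_id"] = record.request_id
        return json.dumps(entry)


def get_logger(name: str = "knowflow") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr at INFO level and above."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_logger().info("document indexed", extra={"request_id": "r-1"})` would then emit one JSON line that log tooling can filter by level or request.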
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the terms of the LICENSE file included in the repository.