🧠 Schema-Aware NL2SQL: Production-Ready Natural Language to SQL

🚀 Overview

A comprehensive Schema-Aware Natural Language to SQL (NL2SQL) system that converts natural language questions into accurate SQL queries across dynamic database schemas. Features both a web interface and production-ready REST API with deployment support for any cloud platform.

✨ Key Features

🧠 Schema-Aware Intelligence: Dynamic schema extraction and understanding
🔄 Multi-Database Support: SQLite, PostgreSQL, MySQL with dialect transpilation
🌐 Production API: Complete REST API with authentication and monitoring
🖥️ Web Interface: Intuitive Streamlit UI for interactive querying
🚀 Cloud Ready: Docker, Kubernetes, and multi-cloud deployment support
🔒 Security First: API authentication, SQL injection prevention, query validation
📊 Analytics: Query history, confidence scoring, and usage statistics
🧪 Fully Tested: Comprehensive test suite with CI/CD ready structure

📁 Complete Project Structure

Schema-Aware-NL2SQL/
├── 📄 README.md                       # Main documentation
├── 📄 requirements.txt                # Python dependencies
├── 📄 setup.py                        # Package setup
├── 📄 config.py                       # Configuration management
├── 📄 .env.example                    # Environment template
├── 📄 .gitignore                      # Git ignore rules
│
├── 🔧 api.py                          # FastAPI REST API server
├── 🖥️ app.py                          # Streamlit web interface
├── 🎯 demo.py                         # Comprehensive demo script
│
├── 📂 src/                            # Core source code
│   ├── __init__.py
│   ├── nl2sql_agent.py                # Main orchestrator
│   ├── nl2sql_model.py                # T5 model wrapper
│   └── schema_retriever.py            # Database schema extraction
│
├── 📂 docs/                           # Documentation
│   ├── README.md                      # Detailed documentation
│   ├── API_DOCUMENTATION.md           # API reference
│   ├── SETUP_COMPLETE.md              # Setup guide
│   ├── ENVIRONMENT_SETUP.md           # Environment guide
│   └── GITHUB_SETUP.md                # GitHub integration
│
├── 📂 examples/                       # Example scripts
│   ├── quickstart.py                  # Quick start demo
│   └── client_example.py              # API client example
│
├── 📂 tests/                          # Test suite
│   ├── __init__.py
│   ├── test_api.py                    # API endpoint tests
│   └── test_nl2sql_agent.py           # Core functionality tests
│
├── 📂 scripts/                        # Utility scripts
│   ├── deploy.sh                      # Deployment automation
│   ├── run_tests.sh                   # Test runner
│   └── setup_new_environment.py       # Environment setup
│
├── 📂 deployment/                     # Deployment configurations
│   ├── docker/
│   │   ├── Dockerfile                 # Container definition
│   │   └── docker-compose.yml         # Multi-service orchestration
│   ├── kubernetes/
│   │   └── deployment.yaml            # K8s deployment config
│   └── cloud/
│       └── aws-ecs-task.json          # AWS ECS task definition
│
├── 📂 data/                           # Database files
│   └── quickstart_sample.db           # Sample SQLite database
│
├── 📂 models/                         # Model cache (auto-created)
├── 📂 logs/                           # Application logs (auto-created)
└── 📂 nl2sql_env/                     # Virtual environment

🛠️ Quick Start

1. Clone and Setup

# Clone repository
git clone https://github.com/Srijan-Ratrey/Schema-Aware-Natural-Language-to-SQL-Agent.git
cd Schema-Aware-Natural-Language-to-SQL-Agent

# Quick setup with deployment script
chmod +x scripts/deploy.sh
./scripts/deploy.sh dev

2. Start the Web Interface

streamlit run app.py

3. Start the API Server

python api.py

Access API documentation at: http://localhost:8000/docs

🌐 API Quick Start

import requests

# API configuration
API_BASE = "http://localhost:8000"
API_KEY = "your-api-key-here"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Connect to database
requests.post(f"{API_BASE}/connect", 
    json={"db_type": "sqlite", "db_path": "data/quickstart_sample.db"},
    headers=headers
)

# Query database
response = requests.post(f"{API_BASE}/query",
    json={"query": "Show all books with rating above 4.5"},
    headers=headers
)

print(response.json())

🚀 Deployment Options

Development

./scripts/deploy.sh dev

Docker

./scripts/deploy.sh docker

Docker Compose (Full Stack)

./scripts/deploy.sh compose

Kubernetes

kubectl apply -f deployment/kubernetes/deployment.yaml

Cloud Platforms

AWS ECS: Use deployment/cloud/aws-ecs-task.json
Google Cloud Run: Build with Docker and deploy
Azure Container Instances: Deploy with Docker image
Heroku: Deploy with git push

🧪 Testing

# Run comprehensive test suite
./scripts/run_tests.sh

# Run specific tests
python -m pytest tests/test_api.py -v
python -m pytest tests/test_nl2sql_agent.py -v

📊 Features Overview

Core Capabilities

✅ Dynamic schema extraction and understanding
✅ Fine-tuned T5 models (Spider dataset trained)
✅ Multi-database support (SQLite, PostgreSQL, MySQL)
✅ Real-time SQL generation and execution
✅ Confidence scoring and query validation
✅ Query history and analytics

Web Interface Features

✅ Interactive Streamlit UI
✅ Schema visualization
✅ Query result visualization
✅ Batch query processing
✅ Export capabilities

API Features

✅ RESTful API with OpenAPI documentation
✅ Bearer token authentication
✅ Rate limiting and security
✅ Batch query processing
✅ Health monitoring
✅ Comprehensive error handling

Production Features

✅ Docker containerization
✅ Kubernetes deployment
✅ Multi-cloud support
✅ Logging and monitoring
✅ Auto-scaling ready
✅ Security best practices

🛡️ Security Features

🔐 API key authentication
🛡️ SQL injection prevention
✅ Query validation and sanitization
🔒 Read-only query enforcement
📊 Rate limiting and monitoring
🔍 Comprehensive logging

📈 Performance

⚡ Optimized T5 model inference
🚀 Async API endpoints
💾 Schema caching
📊 Query result caching
🔄 Connection pooling
📈 Horizontal scaling support

🤝 Contributing

Fork the repository
Create feature branch: git checkout -b feature/amazing-feature
Run tests: ./scripts/run_tests.sh
Commit changes: git commit -m 'Add amazing feature'
Push to branch: git push origin feature/amazing-feature
Open Pull Request

📚 Documentation

🔗 Related Resources

📘 Spider Dataset - Training data
🤖 Hugging Face Models - Pre-trained models
⚙️ SQLGlot - SQL transpilation
📚 NL2SQL Papers - Research

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Spider Dataset Team for high-quality NL2SQL benchmarks
Hugging Face for transformer models and infrastructure
FastAPI & Streamlit teams for excellent frameworks
SQLAlchemy & SQLGlot for robust SQL handling

🌟 Star this repo if you find it useful!

"Making databases conversational, one query at a time." 🚀

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧠 Schema-Aware NL2SQL: Production-Ready Natural Language to SQL

🚀 Overview

✨ Key Features

📁 Complete Project Structure

🛠️ Quick Start

1. Clone and Setup

2. Start the Web Interface

3. Start the API Server

🌐 API Quick Start

🚀 Deployment Options

Development

Docker

Docker Compose (Full Stack)

Kubernetes

Cloud Platforms

🧪 Testing

📊 Features Overview

Core Capabilities

Web Interface Features

API Features

Production Features

🛡️ Security Features

📈 Performance

🤝 Contributing

📚 Documentation

🔗 Related Resources

📄 License

🙏 Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.devcontainer		.devcontainer
.streamlit		.streamlit
data		data
deployment		deployment
docs		docs
examples		examples
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
PROJECT_STRUCTURE.md		PROJECT_STRUCTURE.md
README.md		README.md
TRANSFORMATION_LOG.md		TRANSFORMATION_LOG.md
api.py		api.py
app.py		app.py
config.py		config.py
demo.py		demo.py
env.example		env.example
requirements-minimal.txt		requirements-minimal.txt
requirements.txt		requirements.txt
setup.py		setup.py

Srijan-Ratrey/Schema-Aware-Natural-Language-to-SQL-Agent

Folders and files

Latest commit

History

Repository files navigation

🧠 Schema-Aware NL2SQL: Production-Ready Natural Language to SQL

🚀 Overview

✨ Key Features

📁 Complete Project Structure

🛠️ Quick Start

1. Clone and Setup

2. Start the Web Interface

3. Start the API Server

🌐 API Quick Start

🚀 Deployment Options

Development

Docker

Docker Compose (Full Stack)

Kubernetes

Cloud Platforms

🧪 Testing

📊 Features Overview

Core Capabilities

Web Interface Features

API Features

Production Features

🛡️ Security Features

📈 Performance

🤝 Contributing

📚 Documentation

🔗 Related Resources

📄 License

🙏 Acknowledgments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages