Skip to content

Srijan-Ratrey/Schema-Aware-Natural-Language-to-SQL-Agent

Repository files navigation

🧠 Schema-Aware NL2SQL: Production-Ready Natural Language to SQL

Python 3.8+ License: MIT API Docs

πŸš€ Overview

A comprehensive Schema-Aware Natural Language to SQL (NL2SQL) system that converts natural language questions into accurate SQL queries across dynamic database schemas. Features both a web interface and production-ready REST API with deployment support for any cloud platform.

✨ Key Features

  • 🧠 Schema-Aware Intelligence: Dynamic schema extraction and understanding
  • πŸ”„ Multi-Database Support: SQLite, PostgreSQL, MySQL with dialect transpilation
  • 🌐 Production API: Complete REST API with authentication and monitoring
  • πŸ–₯️ Web Interface: Intuitive Streamlit UI for interactive querying
  • πŸš€ Cloud Ready: Docker, Kubernetes, and multi-cloud deployment support
  • πŸ”’ Security First: API authentication, SQL injection prevention, query validation
  • πŸ“Š Analytics: Query history, confidence scoring, and usage statistics
  • πŸ§ͺ Fully Tested: Comprehensive test suite with CI/CD ready structure

πŸ“ Complete Project Structure

Schema-Aware-NL2SQL/
β”œβ”€β”€ πŸ“„ README.md                       # Main documentation
β”œβ”€β”€ πŸ“„ requirements.txt                # Python dependencies
β”œβ”€β”€ πŸ“„ setup.py                        # Package setup
β”œβ”€β”€ πŸ“„ config.py                       # Configuration management
β”œβ”€β”€ πŸ“„ .env.example                    # Environment template
β”œβ”€β”€ πŸ“„ .gitignore                      # Git ignore rules
β”‚
β”œβ”€β”€ πŸ”§ api.py                          # FastAPI REST API server
β”œβ”€β”€ πŸ–₯️ app.py                          # Streamlit web interface
β”œβ”€β”€ 🎯 demo.py                         # Comprehensive demo script
β”‚
β”œβ”€β”€ πŸ“‚ src/                            # Core source code
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ nl2sql_agent.py                # Main orchestrator
β”‚   β”œβ”€β”€ nl2sql_model.py                # T5 model wrapper
β”‚   └── schema_retriever.py            # Database schema extraction
β”‚
β”œβ”€β”€ πŸ“‚ docs/                           # Documentation
β”‚   β”œβ”€β”€ README.md                      # Detailed documentation
β”‚   β”œβ”€β”€ API_DOCUMENTATION.md           # API reference
β”‚   β”œβ”€β”€ SETUP_COMPLETE.md              # Setup guide
β”‚   β”œβ”€β”€ ENVIRONMENT_SETUP.md           # Environment guide
β”‚   └── GITHUB_SETUP.md                # GitHub integration
β”‚
β”œβ”€β”€ πŸ“‚ examples/                       # Example scripts
β”‚   β”œβ”€β”€ quickstart.py                  # Quick start demo
β”‚   └── client_example.py              # API client example
β”‚
β”œβ”€β”€ πŸ“‚ tests/                          # Test suite
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ test_api.py                    # API endpoint tests
β”‚   └── test_nl2sql_agent.py           # Core functionality tests
β”‚
β”œβ”€β”€ πŸ“‚ scripts/                        # Utility scripts
β”‚   β”œβ”€β”€ deploy.sh                      # Deployment automation
β”‚   β”œβ”€β”€ run_tests.sh                   # Test runner
β”‚   └── setup_new_environment.py       # Environment setup
β”‚
β”œβ”€β”€ πŸ“‚ deployment/                     # Deployment configurations
β”‚   β”œβ”€β”€ docker/
β”‚   β”‚   β”œβ”€β”€ Dockerfile                 # Container definition
β”‚   β”‚   └── docker-compose.yml         # Multi-service orchestration
β”‚   β”œβ”€β”€ kubernetes/
β”‚   β”‚   └── deployment.yaml            # K8s deployment config
β”‚   └── cloud/
β”‚       └── aws-ecs-task.json          # AWS ECS task definition
β”‚
β”œβ”€β”€ πŸ“‚ data/                           # Database files
β”‚   └── quickstart_sample.db           # Sample SQLite database
β”‚
β”œβ”€β”€ πŸ“‚ models/                         # Model cache (auto-created)
β”œβ”€β”€ πŸ“‚ logs/                           # Application logs (auto-created)
└── πŸ“‚ nl2sql_env/                     # Virtual environment

πŸ› οΈ Quick Start

1. Clone and Setup

# Clone repository
git clone https://github.com/Srijan-Ratrey/Schema-Aware-Natural-Language-to-SQL-Agent.git
cd Schema-Aware-Natural-Language-to-SQL-Agent

# Quick setup with deployment script
chmod +x scripts/deploy.sh
./scripts/deploy.sh dev

2. Start the Web Interface

streamlit run app.py

3. Start the API Server

python api.py

Access API documentation at: http://localhost:8000/docs

🌐 API Quick Start

import requests

# API configuration
API_BASE = "http://localhost:8000"
API_KEY = "your-api-key-here"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Connect to database
requests.post(f"{API_BASE}/connect", 
    json={"db_type": "sqlite", "db_path": "data/quickstart_sample.db"},
    headers=headers
)

# Query database
response = requests.post(f"{API_BASE}/query",
    json={"query": "Show all books with rating above 4.5"},
    headers=headers
)

print(response.json())

πŸš€ Deployment Options

Development

./scripts/deploy.sh dev

Docker

./scripts/deploy.sh docker

Docker Compose (Full Stack)

./scripts/deploy.sh compose

Kubernetes

kubectl apply -f deployment/kubernetes/deployment.yaml

Cloud Platforms

  • AWS ECS: Use deployment/cloud/aws-ecs-task.json
  • Google Cloud Run: Build with Docker and deploy
  • Azure Container Instances: Deploy with Docker image
  • Heroku: Deploy with git push

πŸ§ͺ Testing

# Run comprehensive test suite
./scripts/run_tests.sh

# Run specific tests
python -m pytest tests/test_api.py -v
python -m pytest tests/test_nl2sql_agent.py -v

πŸ“Š Features Overview

Core Capabilities

  • βœ… Dynamic schema extraction and understanding
  • βœ… Fine-tuned T5 models (Spider dataset trained)
  • βœ… Multi-database support (SQLite, PostgreSQL, MySQL)
  • βœ… Real-time SQL generation and execution
  • βœ… Confidence scoring and query validation
  • βœ… Query history and analytics

Web Interface Features

  • βœ… Interactive Streamlit UI
  • βœ… Schema visualization
  • βœ… Query result visualization
  • βœ… Batch query processing
  • βœ… Export capabilities

API Features

  • βœ… RESTful API with OpenAPI documentation
  • βœ… Bearer token authentication
  • βœ… Rate limiting and security
  • βœ… Batch query processing
  • βœ… Health monitoring
  • βœ… Comprehensive error handling

Production Features

  • βœ… Docker containerization
  • βœ… Kubernetes deployment
  • βœ… Multi-cloud support
  • βœ… Logging and monitoring
  • βœ… Auto-scaling ready
  • βœ… Security best practices

πŸ›‘οΈ Security Features

  • πŸ” API key authentication
  • πŸ›‘οΈ SQL injection prevention
  • βœ… Query validation and sanitization
  • πŸ”’ Read-only query enforcement
  • πŸ“Š Rate limiting and monitoring
  • πŸ” Comprehensive logging

πŸ“ˆ Performance

  • ⚑ Optimized T5 model inference
  • πŸš€ Async API endpoints
  • πŸ’Ύ Schema caching
  • πŸ“Š Query result caching
  • πŸ”„ Connection pooling
  • πŸ“ˆ Horizontal scaling support

🀝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Run tests: ./scripts/run_tests.sh
  4. Commit changes: git commit -m 'Add amazing feature'
  5. Push to branch: git push origin feature/amazing-feature
  6. Open Pull Request

πŸ“š Documentation

πŸ”— Related Resources

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Spider Dataset Team for high-quality NL2SQL benchmarks
  • Hugging Face for transformer models and infrastructure
  • FastAPI & Streamlit teams for excellent frameworks
  • SQLAlchemy & SQLGlot for robust SQL handling

🌟 Star this repo if you find it useful!

"Making databases conversational, one query at a time." πŸš€

Packages

No packages published