A comprehensive Schema-Aware Natural Language to SQL (NL2SQL) system that converts natural language questions into accurate SQL queries across dynamic database schemas. It provides both a Streamlit web interface and a production-ready FastAPI REST API, with Docker, Kubernetes, and cloud deployment configurations.
- Schema-Aware Intelligence: Dynamic schema extraction and understanding
- Multi-Database Support: SQLite, PostgreSQL, MySQL with dialect transpilation
- Production API: Complete REST API with authentication and monitoring
- Web Interface: Intuitive Streamlit UI for interactive querying
- Cloud Ready: Docker, Kubernetes, and multi-cloud deployment support
- Security First: API authentication, SQL injection prevention, query validation
- Analytics: Query history, confidence scoring, and usage statistics
- Fully Tested: Comprehensive test suite with a CI/CD-ready structure
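The schema-aware step can be illustrated with a short sketch: read table and column names from the live database, then flatten them into text that a sequence-to-sequence model such as T5 can consume alongside the question. This is a minimal sketch assuming a SQLite database and a Spider-style "table: col, col | ..." serialization; `extract_schema`, `serialize_schema`, and the prompt format are hypothetical and are not taken from `src/schema_retriever.py`.

```python
import sqlite3


def extract_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {table_name: [column_name, ...]} for every user table (hypothetical helper)."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape embedded double quotes in the identifier
        # PRAGMA table_info yields one row per column; index 1 holds the column name.
        cols = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        schema[table] = [col[1] for col in cols]
    return schema


def serialize_schema(schema: dict[str, list[str]]) -> str:
    """Flatten the schema into one line (assumed Spider-style format)."""
    return " | ".join(f"{table}: {', '.join(cols)}" for table, cols in schema.items())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, rating REAL)")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    question = "Show all books with rating above 4.5"
    # Hypothetical model input: the question followed by the serialized schema.
    print(f"question: {question} | schema: {serialize_schema(extract_schema(conn))}")
```

Because the schema is read at query time rather than hard-coded, the same model input format works for any database the agent connects to; caching the result per connection avoids repeating the catalog queries.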
```
Schema-Aware-NL2SQL/
├── README.md                    # Main documentation
├── requirements.txt             # Python dependencies
├── setup.py                     # Package setup
├── config.py                    # Configuration management
├── .env.example                 # Environment template
├── .gitignore                   # Git ignore rules
│
├── api.py                       # FastAPI REST API server
├── app.py                       # Streamlit web interface
├── demo.py                      # Comprehensive demo script
│
├── src/                         # Core source code
│   ├── __init__.py
│   ├── nl2sql_agent.py          # Main orchestrator
│   ├── nl2sql_model.py          # T5 model wrapper
│   └── schema_retriever.py      # Database schema extraction
│
├── docs/                        # Documentation
│   ├── README.md                # Detailed documentation
│   ├── API_DOCUMENTATION.md     # API reference
│   ├── SETUP_COMPLETE.md        # Setup guide
│   ├── ENVIRONMENT_SETUP.md     # Environment guide
│   └── GITHUB_SETUP.md          # GitHub integration
│
├── examples/                    # Example scripts
│   ├── quickstart.py            # Quick start demo
│   └── client_example.py        # API client example
│
├── tests/                       # Test suite
│   ├── __init__.py
│   ├── test_api.py              # API endpoint tests
│   └── test_nl2sql_agent.py     # Core functionality tests
│
├── scripts/                     # Utility scripts
│   ├── deploy.sh                # Deployment automation
│   ├── run_tests.sh             # Test runner
│   └── setup_new_environment.py # Environment setup
│
├── deployment/                  # Deployment configurations
│   ├── docker/
│   │   ├── Dockerfile           # Container definition
│   │   └── docker-compose.yml   # Multi-service orchestration
│   ├── kubernetes/
│   │   └── deployment.yaml      # K8s deployment config
│   └── cloud/
│       └── aws-ecs-task.json    # AWS ECS task definition
│
├── data/                        # Database files
│   └── quickstart_sample.db     # Sample SQLite database
│
├── models/                      # Model cache (auto-created)
├── logs/                        # Application logs (auto-created)
└── nl2sql_env/                  # Virtual environment
```
```bash
# Clone repository
git clone https://github.com/Srijan-Ratrey/Schema-Aware-Natural-Language-to-SQL-Agent.git
cd Schema-Aware-Natural-Language-to-SQL-Agent

# Quick setup with deployment script
chmod +x scripts/deploy.sh
./scripts/deploy.sh dev
```

Start the Streamlit web interface:

```bash
streamlit run app.py
```

Start the FastAPI server:

```bash
python api.py
```

Interactive API documentation is then available at http://localhost:8000/docs.
Calling the API from Python:

```python
import requests

# API configuration
API_BASE = "http://localhost:8000"
API_KEY = "your-api-key-here"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Connect to the sample SQLite database
response = requests.post(
    f"{API_BASE}/connect",
    json={"db_type": "sqlite", "db_path": "data/quickstart_sample.db"},
    headers=headers,
    timeout=30,
)
response.raise_for_status()  # fail fast on auth or connection errors

# Ask a question in natural language
response = requests.post(
    f"{API_BASE}/query",
    json={"query": "Show all books with rating above 4.5"},
    headers=headers,
    timeout=60,
)
response.raise_for_status()
print(response.json())
```
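The client example sends the API key as a bearer token in the `Authorization` header. As a hedged illustration of what the matching server-side check could look like, here is a standard-library sketch; `is_authorized` is a hypothetical name, and `api.py` may instead rely on FastAPI's own security utilities.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Check an 'Authorization: Bearer <key>' header against the configured key (hypothetical)."""
    if not authorization_header or not expected_key:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    # compare_digest takes time independent of where the inputs first differ,
    # so response timing does not reveal how much of the key was correct.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())


if __name__ == "__main__":
    key = "your-api-key-here"
    print(is_authorized(f"Bearer {key}", key))     # True
    print(is_authorized("Bearer wrong-key", key))  # False
    print(is_authorized("Basic abc", key))         # False
```

A plain `token == expected_key` would also return the right answer; the constant-time comparison is the usual choice for secrets because it avoids a timing side channel.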
```bash
# Local development
./scripts/deploy.sh dev

# Docker
./scripts/deploy.sh docker

# Docker Compose
./scripts/deploy.sh compose

# Kubernetes
kubectl apply -f deployment/kubernetes/deployment.yaml
```
- AWS ECS: Use the task definition in `deployment/cloud/aws-ecs-task.json`
- Google Cloud Run: Build the Docker image and deploy it to Cloud Run
- Azure Container Instances: Deploy the Docker image
- Heroku: Deploy with `git push`
```bash
# Run comprehensive test suite
./scripts/run_tests.sh

# Run specific tests
python -m pytest tests/test_api.py -v
python -m pytest tests/test_nl2sql_agent.py -v
```
- Dynamic schema extraction and understanding
- Fine-tuned T5 models (trained on the Spider dataset)
- Multi-database support (SQLite, PostgreSQL, MySQL)
- Real-time SQL generation and execution
- Confidence scoring and query validation
- Query history and analytics
- Interactive Streamlit UI
- Schema visualization
- Query result visualization
- Batch query processing
- Export capabilities
- RESTful API with OpenAPI documentation
- Bearer token authentication
- Rate limiting and security
- Batch query processing
- Health monitoring
- Comprehensive error handling
- Docker containerization
- Kubernetes deployment
- Multi-cloud support
- Logging and monitoring
- Auto-scaling ready
- Security best practices
- API key authentication
- SQL injection prevention
- Query validation and sanitization
- Read-only query enforcement
- Rate limiting and monitoring
- Comprehensive logging
- Optimized T5 model inference
- Async API endpoints
- Schema caching
- Query result caching
- Connection pooling
- Horizontal scaling support
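Read-only enforcement and query validation can be approximated for SQLite in a few lines: accept a single statement, require a leading `SELECT` or `WITH`, and let SQLite compile it with `EXPLAIN` so syntax errors and unknown tables or columns are caught before execution. This is a hedged sketch, not the project's implementation; `validate_readonly_sql` is a hypothetical name, and the semicolon check is deliberately conservative (it also rejects semicolons inside string literals).

```python
import re
import sqlite3

ALLOWED_LEADING_KEYWORDS = ("select", "with")


def validate_readonly_sql(conn: sqlite3.Connection, sql: str) -> str:
    """Return the cleaned statement, or raise ValueError (hypothetical validator)."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty query")
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    first_word = re.match(r"[A-Za-z]*", statement).group(0).lower()
    if first_word not in ALLOWED_LEADING_KEYWORDS:
        raise ValueError("only SELECT queries are allowed")
    # EXPLAIN compiles the statement without running it, so bad syntax or
    # unknown tables/columns raise sqlite3.Error here instead of at execution.
    try:
        conn.execute(f"EXPLAIN {statement}")
    except sqlite3.Error as exc:
        raise ValueError(f"invalid SQL: {exc}") from exc
    return statement


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, rating REAL)")
    print(validate_readonly_sql(conn, "SELECT title FROM books WHERE rating > 4.5;"))
    for bad in ("DROP TABLE books", "SELECT 1; DELETE FROM books", "SELECT nope FROM books"):
        try:
            validate_readonly_sql(conn, bad)
        except ValueError as err:
            print("rejected:", err)
```

Keyword filtering alone is not a security boundary: SQLite accepts `WITH ... DELETE`, for example. Pair it with a connection that SQLite itself treats as read-only, such as `sqlite3.connect("file:data/quickstart_sample.db?mode=ro", uri=True)`.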
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Run tests: `./scripts/run_tests.sh`
- Commit changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Complete Setup Guide
- API Documentation
- Deployment Guide
- Testing Guide
- Environment Setup
- Spider Dataset - Training data
- Hugging Face Models - Pre-trained models
- SQLGlot - SQL transpilation
- NL2SQL Papers - Research
This project is licensed under the MIT License - see the LICENSE file for details.
- Spider Dataset Team for high-quality NL2SQL benchmarks
- Hugging Face for transformer models and infrastructure
- FastAPI & Streamlit teams for excellent frameworks
- SQLAlchemy & SQLGlot for robust SQL handling
Star this repo if you find it useful!
"Making databases conversational, one query at a time."