AI Development Lab with MCP Server for secure, auditable AI tool interactions and RAG evaluation gates.
- Install dependencies: `pip install -r requirements.txt`
- Start the MCP server: `.venv/bin/python -m mcp_server.simple_server`
- Run tests: `pytest`
Run the evaluation locally:

```bash
# Run the full evaluation
python eval/run.py --dataset eval/data/lab/lab_dev.jsonl --output eval/runs/$(date +%Y%m%d-%H%M%S)

# Check the gates
python scripts/ci/parse_metrics.py eval/runs/*/metrics.json
```
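For orientation, here is a minimal sketch of the kind of check `scripts/ci/parse_metrics.py` performs against `metrics.json`; the metric names and thresholds below are assumptions for illustration, not the repository's actual gates:

```python
# gate_check_sketch.py -- illustrative only; the real gates live in scripts/ci/parse_metrics.py
import json
import sys

# Hypothetical gate thresholds; actual metric names and cutoffs are assumptions.
GATES = {
    "faithfulness": 0.80,
    "answer_relevancy": 0.75,
}

def check_gates(metrics_path: str) -> bool:
    """Return True only if every gated metric meets its threshold."""
    with open(metrics_path) as f:
        metrics = json.load(f)
    passed = True
    for name, threshold in GATES.items():
        value = metrics.get(name)
        if value is None or value < threshold:
            print(f"FAIL {name}: {value} < {threshold}")
            passed = False
        else:
            print(f"PASS {name}: {value} >= {threshold}")
    return passed

if __name__ == "__main__":
    # Exit nonzero on failure so CI can block the merge.
    sys.exit(0 if check_gates(sys.argv[1]) else 1)
```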
The MCP server provides the following tools:

- `run_command`: Execute terminal commands safely, with a timeout
- `check_file`: Check whether files exist and get their metadata
- `read_file`: Safely read files, with line limits
- `list_directory`: List directory contents, with limits
- `run_eval`: Run a RAG evaluation safely
- `check_gates`: Check whether evaluation gates pass
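To show how a tool like `run_command` can enforce its timeout, here is a minimal FastAPI endpoint sketch; the request model and route shape are assumptions for illustration, not the actual `mcp_server.simple_server` code:

```python
# Illustrative sketch only; not the actual mcp_server.simple_server implementation.
import subprocess

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RunCommandRequest(BaseModel):
    command: str
    timeout: int = 10  # seconds; caps how long the subprocess may run

@app.post("/tools/run_command")
def run_command(req: RunCommandRequest):
    """Run a shell command, killing it if it exceeds the timeout."""
    try:
        result = subprocess.run(
            req.command, shell=True, capture_output=True,
            text=True, timeout=req.timeout,
        )
        return {
            "exit_code": result.returncode,
            "stdout": result.stdout,
            "stderr": result.stderr,
        }
    except subprocess.TimeoutExpired:
        return {"error": f"command timed out after {req.timeout}s"}
```

The curl examples below exercise these endpoints over HTTP.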
```bash
# Test the MCP server
curl -X POST http://localhost:8000/tools/run_command \
  -H "Content-Type: application/json" \
  -d '{"command": "ls -la", "timeout": 10}'

# Check file existence
curl -X POST http://localhost:8000/tools/check_file \
  -H "Content-Type: application/json" \
  -d '{"filepath": "eval/run.py"}'

# Run an evaluation
curl -X POST http://localhost:8000/tools/run_eval \
  -H "Content-Type: application/json" \
  -d '{"dataset": "eval/data/lab/lab_dev.jsonl", "output_dir": "eval/runs/test"}'
```
Key components:

- MCP Server: FastAPI-based server exposing AI tools via the MCP protocol
- Security: Guardian-based access control and PII redaction
- Audit: Comprehensive logging of all tool interactions
- Evaluation: Automated testing and metrics for AI models
- RAG Gates: Evaluation framework with automated CI integration
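To make the security layer concrete, here is a minimal sketch of regex-based PII redaction; the patterns and function name are illustrative assumptions, not the repository's Guardian implementation:

```python
# redact_sketch.py -- illustrative PII redaction; not the actual Guardian code
import re

# Hypothetical patterns; a real redactor would cover many more PII categories.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

if __name__ == "__main__":
    print(redact("Contact jane@example.com, SSN 123-45-6789"))
    # -> Contact [REDACTED_EMAIL], SSN [REDACTED_SSN]
```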
Repository layout:

- `lab/`: Research and development experiments
- `eval/`: Evaluation framework and gates
- `mcp_server/`: MCP server implementation
- `evidence/`: Evaluation evidence and reports
See `docs/cursor-usage.md` for Cursor IDE setup and usage.