This repo presents resilience patterns for scaling generative AI inference on AWS: Amazon Bedrock cross-Region inference, AWS account sharding, and intelligent routing with LLM gateways.
Topics: fallback, throttling, load-balancing, quotable-api, litellm-ai-gateway, bedrock-cross-region-inference, genai-resilience, aws-account-sharding
Updated Oct 2, 2025 - Python
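
The fallback idea behind account sharding and cross-Region routing can be illustrated with a minimal sketch. This is not the repo's code: `Target`, `ThrottledError`, `invoke_with_fallback`, and `fake_call` are hypothetical names, and the model call is simulated so the example runs without AWS credentials. It assumes the caller keeps an ordered list of (account, Region) targets and moves to the next one only when the current target reports throttling.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider throttling error (e.g. HTTP 429)."""


@dataclass(frozen=True)
class Target:
    """One place a request can be sent: an (account, Region) pair."""
    account: str
    region: str


def invoke_with_fallback(
    targets: Sequence[Target],
    call: Callable[[Target, str], str],
    prompt: str,
) -> tuple[Target, str]:
    """Try each target in order; move on only when a target is throttled."""
    last_error = None
    for target in targets:
        try:
            return target, call(target, prompt)
        except ThrottledError as err:
            last_error = err  # throttled here, so try the next target
    raise RuntimeError("all targets were throttled") from last_error


def fake_call(target: Target, prompt: str) -> str:
    """Simulated model call: every us-east-1 target is throttled."""
    if target.region == "us-east-1":
        raise ThrottledError(f"{target.account}/{target.region} throttled")
    return f"[{target.account}/{target.region}] echo: {prompt}"


# Hypothetical shard list: two accounts, two Regions.
targets = [
    Target("account-a", "us-east-1"),
    Target("account-a", "us-west-2"),
    Target("account-b", "us-east-1"),
]
used, reply = invoke_with_fallback(targets, fake_call, "hello")
print(used.region, reply)
```

In a real deployment, `call` would wrap an actual model client for each account and Region, and an LLM gateway would typically add load balancing and retry budgets on top of this simple ordered fallback.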