A complete AWX-based network device upgrade management system designed for managing firmware upgrades across 1000+ heterogeneous network devices with comprehensive validation, security, and monitoring.
This system provides automated firmware upgrade capabilities for:
- Cisco NX-OS (Nexus Switches) with ISSU support
- Cisco IOS-XE (Enterprise Routers/Switches) with Install Mode
- Metamako MOS (Ultra-Low Latency Switches) with Application Management
- Opengear (Console Servers/Smart PDUs) with multi-architecture support
- FortiOS (Fortinet Firewalls) with HA coordination
Status: Production ready for all platforms. See Platform Implementation Status for detailed status.
- Phase 1: Image Loading (business hours safe)
- Phase 2: Image Installation (maintenance window)
- Complete rollback capabilities
- Server-Initiated PUSH Transfers Only - All firmware pushed from upgrade server to devices
- Zero Device-Initiated Operations - No device-to-server connections for firmware retrieval
- SSH Key Authentication Priority - SSH keys preferred over password authentication
- SHA512 Hash Verification - Complete integrity validation for all firmware images
- Cryptographic Signature Verification - Where supported by platform
- Complete Security Audit Trail - All operations logged and verified
- Pre/post upgrade network state comparison
- BGP, BFD, IGMP/multicast, routing validation
- IPSec tunnel and VPN connectivity validation
- Interface optics and transceiver health monitoring
- Protocol convergence timing with baseline comparison
- Native systemd service deployment (AWX and NetBox)
- Pre-existing NetBox integration
- InfluxDB v2 metrics integration
- β Complete Grafana dashboard automation with multi-environment support
- β Real-time operational monitoring with 15-second refresh dashboards
- Existing monitoring system integration
# 1. Install base system
./install/setup-system.sh
# 2. Setup AWX with native services
./install/setup-awx.sh
# 3. Setup NetBox with native services
./install/setup-netbox.sh
# 4. Configure monitoring integration
./install/configure-telegraf.sh
# 5. Set up SSL certificates
./install/setup-ssl.sh
# 6. Start all services
./install/create-services.sh
# 7. Deploy Grafana dashboards
cd integration/grafana
export INFLUXDB_TOKEN="your_token_here"
./provision-dashboards.sh
Comprehensive testing capabilities for Mac/Linux development without physical devices:
- β Syntax Validation: 100% CLEAN - All 69+ Ansible files pass syntax checks
- β Security Validation: 100% COMPLIANT - All secure transfer tests pass (10/10)
- β Test Suite Pass Rate: 100% - All 23 test suites passing cleanly β
- β Container Integration: SUCCESS - Multi-architecture images (amd64/arm64) available
- β Molecule Testing: 5/9 ROLES - Critical roles configured with Docker testing
- β Container Tests: OPTIMIZED - Parallel execution in 18 minutes (6 test suites)
# Syntax validation (100% clean)
ansible-playbook --syntax-check ansible-content/playbooks/main-upgrade-workflow.yml
# Mock device testing (all 5 platforms)
ansible-playbook -i tests/mock-inventories/all-platforms.yml --check \
ansible-content/playbooks/main-upgrade-workflow.yml
# Complete test suite
./tests/run-all-tests.sh
# Molecule testing (requires Docker)
cd tests/molecule-tests && molecule test
# Container testing (production ready)
docker run --rm ghcr.io/garryshtern/network-device-upgrade-system:latest
podman run --rm ghcr.io/garryshtern/network-device-upgrade-system:latest
- Mock Inventory Testing - Simulated device testing for all platforms β
- Variable Validation - Requirements and constraint validation β
- Template Rendering - Jinja2 template testing without connections β
- Workflow Logic - Decision path and conditional testing β
- Error Handling - Error condition and recovery validation β
- Integration Testing - Complete workflow with mock devices β
- Performance Testing - Execution time and resource measurement β
- Molecule Testing - Container-based advanced testing β
- Platform-Specific Testing - Vendor-specific comprehensive testing β
- YAML/JSON Validation - File syntax and structure validation β
- CI/CD Integration - GitHub Actions automated testing β
See comprehensive guide: Testing Framework Guide
Complete documentation with architectural diagrams and implementation guides:
- π Documentation Hub - Start here for comprehensive guides
- βοΈ Installation Guide - Step-by-step deployment
- π Workflow Guide - Upgrade process and safety mechanisms
- ποΈ Platform Guide - Technical implementation details
- π Implementation Status - Current completion analysis
- π§ͺ Testing Framework Guide - Comprehensive testing without physical devices
- π§ͺ Molecule Testing Guide - Role-specific container testing
graph TD
A[AWX Services<br/>Job Control<br/>systemd] --> B[Ansible Engine<br/>Playbook Execution<br/>Role-Based]
B --> C[Network Devices<br/>1000+ Supported<br/>Multi-Vendor]
D[NetBox<br/>Inventory DB<br/>Pre-existing] --> B
E[Telegraf<br/>Metrics Agent<br/>Collection] --> F[InfluxDB v2<br/>Time Series<br/>Existing]
C --> F
F --> H[Grafana<br/>Dashboards<br/>Existing]
C -.-> I[Cisco NX-OS]
C -.-> J[Cisco IOS-XE]
C -.-> K[FortiOS]
C -.-> L[Metamako MOS]
C -.-> M[Opengear]
style A fill:#e1f5fe
style C fill:#f3e5f5
style F fill:#e8f5e8
style H fill:#fff3e0
Alternative System Flow:
Component | Function | Integration |
---|---|---|
AWX Services (systemd) | Job orchestration and workflow control | β Ansible Engine |
Ansible Engine | Playbook execution and device automation | β Network Devices |
NetBox (Pre-existing) | Device inventory and IPAM management | β Ansible Engine |
Telegraf | Metrics collection agent | β InfluxDB v2 |
Network Devices | Target devices for upgrades | β Metrics Export |
InfluxDB v2 | Time-series metrics storage | β Grafana |
Grafana | Monitoring dashboards and visualization | Final consumer |
flowchart TD
U[User Request] --> A[AWX Web UI]
A --> B[Job Templates]
B --> C[Workflows]
B --> D[Dynamic Inventory]
D --> E[NetBox<br/>Device Data<br/>Variables]
C --> F[Ansible Execution]
D --> F
F --> G[Network Devices]
G --> H[Metrics Collection]
H --> I[InfluxDB]
I --> J[Grafana<br/>Dashboards]
subgraph "Job Templates"
B1[Health Check]
B2[Image Load]
B3[Validation]
end
subgraph "Workflows"
C1[Phase 1: Load]
C2[Phase 2: Install]
C3[Phase 3: Verify]
end
style U fill:#ffeb3b
style G fill:#f3e5f5
style I fill:#e8f5e8
style J fill:#fff3e0
Simplified Data Flow:
- User Request β AWX Web Interface
- AWX β Executes Ansible playbooks
- Ansible β Connects to network devices via SSH/API
- NetBox β Provides device inventory to Ansible
- Network Devices β Export metrics during operations
- Telegraf β Collects metrics and sends to InfluxDB
- InfluxDB β Stores time-series data for Grafana
- Grafana β Displays dashboards and reports to users
- OS: RHEL/CentOS 8+ or Ubuntu 20.04+
- CPU: 4 cores minimum
- RAM: 8GB minimum
- Storage: 100GB+ for firmware and logs
- Network: Reliable connectivity to all managed devices
- Python: 3.13 with pip
- Ansible: 11.9.0 (ansible-core 2.19.1) - Latest stable version
- Git: Latest stable version
- Single Server Deployment: No clustering required
- Container-based AWX: Podman/Docker container deployment
- Pre-existing NetBox: Uses existing NetBox installation
- SystemD User Services: Native Linux user service management for base components
network-upgrade-system/
βββ deployment/ # Service-based deployment structure
β βββ system/ # Base system setup (SSL, system config)
β βββ services/ # Individual service deployments
β β βββ awx/ # AWX automation platform
β β βββ netbox/ # NetBox IPAM & device inventory
β β βββ grafana/ # β
Complete dashboard automation
β β βββ telegraf/ # Metrics collection
β β βββ redis/ # Caching & job queue
β βββ scripts/ # General deployment scripts
βββ ansible-content/ # Ansible automation content
β βββ playbooks/ # Main orchestration playbooks
β βββ roles/ # Vendor-specific upgrade roles
β βββ collections/ # Ansible collection requirements
βββ tests/ # Comprehensive test suites
βββ docs/ # Complete documentation
βββ tools/ # Development and utility tools
βββ .claude/ # Claude Code commands and workflows
- π Installation Guide - Complete setup instructions
- π Upgrade Workflow Guide - Upgrade process and safety mechanisms
- ποΈ Platform Implementation Guide - Technical implementation details
- π Grafana Integration - Dashboard automation and monitoring
- π Documentation Hub - Complete documentation index
For technical support and questions:
- Check the Installation Guide troubleshooting section
- Review platform-specific procedures in Platform Implementation Guide
- Examine log files in
$HOME/.local/share/network-upgrade/logs/
- Use the built-in health check:
./scripts/system-health.sh
This project is licensed under the MIT License - see the LICENSE file for details.