Production-ready AWX-based network device upgrade management system for 1000+ heterogeneous network devices with comprehensive validation, security, and monitoring.

garryshtern/network-device-upgrade-system

Network Device Upgrade Management System

A complete AWX-based network device upgrade management system designed for managing firmware upgrades across 1000+ heterogeneous network devices with comprehensive validation, security, and monitoring.

Overview

This system provides automated firmware upgrade capabilities for:

  • Cisco NX-OS (Nexus Switches) with ISSU support
  • Cisco IOS-XE (Enterprise Routers/Switches) with Install Mode
  • Metamako MOS (Ultra-Low Latency Switches) with Application Management
  • Opengear (Console Servers/Smart PDUs) with multi-architecture support
  • FortiOS (Fortinet Firewalls) with HA coordination

Status: Production ready for all platforms. See Platform Implementation Status for per-platform details.

Key Features

✅ Phase-Separated Upgrade Process

  • Phase 1: Image Loading (business hours safe)
  • Phase 2: Image Installation (maintenance window)
  • Complete rollback capabilities
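
One way to keep Phase 2 out of business hours is to gate it on a time check. The sketch below is an illustration only: the function name `in_maintenance_window` and the 01:00 to 05:00 window are assumptions, not part of this project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the given hour (0-23) falls inside
# the maintenance window [start, end). Windows that wrap past midnight
# (for example 22 to 4) are handled by the second branch.
in_maintenance_window() {
  local hour start end
  hour=${1#0}          # normalize zero-padded hours such as "08" from `date +%H`
  hour=${hour:-0}      # "0" or "00" ends up as 0
  start=$2
  end=$3
  if [ "$start" -le "$end" ]; then
    [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
  else
    [ "$hour" -ge "$start" ] || [ "$hour" -lt "$end" ]
  fi
}

# Example gate (the 01:00-05:00 window is an assumed value):
# if in_maintenance_window "$(date +%H)" 1 5; then
#   echo "Inside maintenance window: Phase 2 may proceed"
# else
#   echo "Outside maintenance window: run Phase 1 only"
# fi
```

A check like this would sit in front of whatever launches the installation phase; Phase 1 needs no gate because image loading is described as safe during business hours.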

🔒 Maximum Security Compliance

  • Server-Initiated PUSH Transfers Only - All firmware pushed from upgrade server to devices
  • Zero Device-Initiated Operations - No device-to-server connections for firmware retrieval
  • SSH Key Authentication Priority - SSH keys preferred over password authentication
  • SHA512 Hash Verification - Complete integrity validation for all firmware images
  • Cryptographic Signature Verification - Where supported by platform
  • Complete Security Audit Trail - All operations logged and verified
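
The SHA512 check can be reproduced by hand before an image is pushed. This is a minimal sketch, not the project's own implementation; the function name `verify_sha512` is hypothetical, and the expected digest is assumed to come from the vendor's release notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA512 digest with an expected value.
# Uses sha512sum (GNU coreutils) when present, otherwise `shasum -a 512`.
verify_sha512() {
  local file expected actual
  file=$1
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')   # accept upper-case hex
  [ -r "$file" ] || return 1
  if command -v sha512sum >/dev/null 2>&1; then
    actual=$(sha512sum "$file" | awk '{print $1}')
  else
    actual=$(shasum -a 512 "$file" | awk '{print $1}')
  fi
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage (file name and digest are placeholders):
# verify_sha512 firmware.bin "<expected sha512 hex digest>" \
#   && echo "hash OK" || echo "hash MISMATCH"
```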

📊 Advanced Validation

  • Pre/post upgrade network state comparison
  • BGP, BFD, IGMP/multicast, routing validation
  • IPSec tunnel and VPN connectivity validation
  • Interface optics and transceiver health monitoring
  • Protocol convergence timing with baseline comparison
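
The idea behind pre/post state comparison can be shown with two plain-text snapshots. The sketch below assumes one entry per line (for example, one established BGP neighbor address per line); the snapshot format and the function name `missing_after_upgrade` are assumptions, not the project's actual data model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print entries that appear in the pre-upgrade snapshot
# ($1) but not in the post-upgrade snapshot ($2). Assumes one entry per line.
# -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
missing_after_upgrade() {
  grep -F -x -v -f "$2" "$1"
}

# Usage (file names are placeholders):
# missing_after_upgrade bgp-neighbors.pre.txt bgp-neighbors.post.txt
```

Any line printed is something that was present before the upgrade and is gone afterwards, which is the condition a validation step would flag.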

🚀 Enterprise Integration

  • Native systemd service deployment (AWX and NetBox)
  • Pre-existing NetBox integration
  • InfluxDB v2 metrics integration
  • ✅ Complete Grafana dashboard automation with multi-environment support
  • ✅ Real-time operational monitoring with 15-second refresh dashboards
  • Existing monitoring system integration
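
For orientation, this is roughly what a metric sent to InfluxDB v2 looks like. The sketch formats one line-protocol point and shows, as a comment, how it could be posted to the standard `/api/v2/write` endpoint. The measurement, tag, and field names, the function name `upgrade_point`, and the `INFLUX_URL`, `INFLUX_ORG`, and `INFLUX_BUCKET` variables are illustrative assumptions; the project's real schema may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format one InfluxDB line-protocol point.
# $1 = device name, $2 = phase, $3 = duration in seconds, $4 = Unix timestamp.
# Assumes tag values contain no spaces or commas (they would need escaping).
upgrade_point() {
  printf 'upgrade_event,device=%s,phase=%s duration_s=%si %s\n' "$1" "$2" "$3" "$4"
}

# Sending the point (URL, org, and bucket are placeholders):
# upgrade_point sw01 install 420 "$(date +%s)" |
#   curl -sS -X POST \
#     "$INFLUX_URL/api/v2/write?org=$INFLUX_ORG&bucket=$INFLUX_BUCKET&precision=s" \
#     -H "Authorization: Token $INFLUXDB_TOKEN" --data-binary @-
```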

Quick Start

# 1. Install base system
./install/setup-system.sh

# 2. Set up AWX with native services
./install/setup-awx.sh

# 3. Set up NetBox with native services
./install/setup-netbox.sh

# 4. Configure monitoring integration
./install/configure-telegraf.sh

# 5. Set up SSL certificates
./install/setup-ssl.sh

# 6. Start all services
./install/create-services.sh

# 7. Deploy Grafana dashboards
cd integration/grafana
export INFLUXDB_TOKEN="your_token_here"
./provision-dashboards.sh
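
Because each step depends on the previous one, it can help to run them through a wrapper that stops at the first failure. The sketch below is one possible approach; `run_steps` is a hypothetical helper, and the script paths are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run each command in order and stop at the first failure.
run_steps() {
  local step
  for step in "$@"; do
    echo "==> $step"
    "$step" || { echo "FAILED: $step" >&2; return 1; }
  done
}

# Usage with the installation scripts from the steps above:
# run_steps ./install/setup-system.sh ./install/setup-awx.sh \
#   ./install/setup-netbox.sh ./install/configure-telegraf.sh \
#   ./install/setup-ssl.sh ./install/create-services.sh
```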

🧪 Testing Framework

Comprehensive testing capabilities for Mac/Linux development without physical devices:

📊 Current Test Results (Updated: October 1, 2025)

  • ✅ Syntax Validation: 100% CLEAN - All 69+ Ansible files pass syntax checks
  • ✅ Security Validation: 100% COMPLIANT - All secure transfer tests pass (10/10)
  • ✅ Test Suite Pass Rate: 100% - All 23 test suites passing cleanly ✅
  • ✅ Container Integration: SUCCESS - Multi-architecture images (amd64/arm64) available
  • ✅ Molecule Testing: 5/9 ROLES - Critical roles configured with Docker testing
  • ✅ Container Tests: OPTIMIZED - Parallel execution in 18 minutes (6 test suites)

🚀 Quick Testing

# Syntax validation (100% clean)
ansible-playbook --syntax-check ansible-content/playbooks/main-upgrade-workflow.yml

# Mock device testing (all 5 platforms)
ansible-playbook -i tests/mock-inventories/all-platforms.yml --check \
  ansible-content/playbooks/main-upgrade-workflow.yml

# Complete test suite
./tests/run-all-tests.sh

# Molecule testing (requires Docker)
cd tests/molecule-tests && molecule test

# Container testing (production ready)
docker run --rm ghcr.io/garryshtern/network-device-upgrade-system:latest
podman run --rm ghcr.io/garryshtern/network-device-upgrade-system:latest
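
The single-playbook syntax check above can be extended to a whole directory. This is a sketch under assumptions: `check_playbooks` is a hypothetical helper, and it treats every `.yml` file directly inside the given directory as a playbook, which may not hold for this repository. The checker command is passed as arguments so the loop itself can be tried without Ansible installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command against every .yml file in a
# directory, print the files that fail, and return non-zero if any failed.
# $1 = directory, remaining arguments = checker command and its options.
check_playbooks() {
  local dir file failures
  dir=$1
  failures=0
  shift
  for file in "$dir"/*.yml; do
    [ -e "$file" ] || continue          # directory had no .yml files
    if ! "$@" "$file" >/dev/null 2>&1; then
      echo "FAIL: $file"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}

# Usage:
# check_playbooks ansible-content/playbooks ansible-playbook --syntax-check
```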

✅ Testing Categories - FULLY IMPLEMENTED

  • Mock Inventory Testing - Simulated device testing for all platforms ✅
  • Variable Validation - Requirements and constraint validation ✅
  • Template Rendering - Jinja2 template testing without connections ✅
  • Workflow Logic - Decision path and conditional testing ✅
  • Error Handling - Error condition and recovery validation ✅
  • Integration Testing - Complete workflow with mock devices ✅
  • Performance Testing - Execution time and resource measurement ✅
  • Molecule Testing - Container-based advanced testing ✅
  • Platform-Specific Testing - Vendor-specific comprehensive testing ✅
  • YAML/JSON Validation - File syntax and structure validation ✅
  • CI/CD Integration - GitHub Actions automated testing ✅

See comprehensive guide: Testing Framework Guide

📚 Documentation

Complete documentation, including architectural diagrams and implementation guides, is in the docs/ directory.

Architecture

System Overview

graph TD
    A[AWX Services<br/>Job Control<br/>systemd] --> B[Ansible Engine<br/>Playbook Execution<br/>Role-Based]
    B --> C[Network Devices<br/>1000+ Supported<br/>Multi-Vendor]
    
    D[NetBox<br/>Inventory DB<br/>Pre-existing] --> B
    E[Telegraf<br/>Metrics Agent<br/>Collection] --> F[InfluxDB v2<br/>Time Series<br/>Existing]
    
    C --> F
    F --> H[Grafana<br/>Dashboards<br/>Existing]
    
    C -.-> I[Cisco NX-OS]
    C -.-> J[Cisco IOS-XE]  
    C -.-> K[FortiOS]
    C -.-> L[Metamako MOS]
    C -.-> M[Opengear]
    
    style A fill:#e1f5fe
    style C fill:#f3e5f5
    style F fill:#e8f5e8
    style H fill:#fff3e0

Alternative System Flow:

| Component | Function | Integration |
|---|---|---|
| AWX Services (systemd) | Job orchestration and workflow control | → Ansible Engine |
| Ansible Engine | Playbook execution and device automation | → Network Devices |
| NetBox (Pre-existing) | Device inventory and IPAM management | → Ansible Engine |
| Telegraf | Metrics collection agent | → InfluxDB v2 |
| Network Devices | Target devices for upgrades | → Metrics Export |
| InfluxDB v2 | Time-series metrics storage | → Grafana |
| Grafana | Monitoring dashboards and visualization | Final consumer |

Component Interaction Flow

flowchart TD
    U[User Request] --> A[AWX Web UI]
    A --> B[Job Templates]
    B --> C[Workflows]
    
    B --> D[Dynamic Inventory]
    D --> E[NetBox<br/>Device Data<br/>Variables]
    C --> F[Ansible Execution]
    D --> F
    
    F --> G[Network Devices]
    G --> H[Metrics Collection]
    H --> I[InfluxDB]
    I --> J[Grafana<br/>Dashboards]
    
    subgraph "Job Templates"
        B1[Health Check]
        B2[Image Load]
        B3[Validation]
    end
    
    subgraph "Workflows"  
        C1[Phase 1: Load]
        C2[Phase 2: Install]
        C3[Phase 3: Verify]
    end
    
    style U fill:#ffeb3b
    style G fill:#f3e5f5
    style I fill:#e8f5e8
    style J fill:#fff3e0

Simplified Data Flow:

  1. User Request → AWX Web Interface
  2. AWX → Executes Ansible playbooks
  3. Ansible → Connects to network devices via SSH/API
  4. NetBox → Provides device inventory to Ansible
  5. Network Devices → Export metrics during operations
  6. Telegraf → Collects metrics and sends to InfluxDB
  7. InfluxDB → Stores time-series data for Grafana
  8. Grafana → Displays dashboards and reports to users

Resource Requirements

Minimum System Requirements

  • OS: RHEL/CentOS 8+ or Ubuntu 20.04+
  • CPU: 4 cores minimum
  • RAM: 8GB minimum
  • Storage: 100GB+ for firmware and logs
  • Network: Reliable connectivity to all managed devices
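
The minimums above can be checked before installation. The following is a Linux-only sketch, not a script shipped with this project; `meets_minimum` is a hypothetical helper, and the commented commands show one way to gather the measured values.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper: compare a measured integer with a minimum.
# $1 = label, $2 = measured value, $3 = required minimum.
meets_minimum() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 (minimum $3)"
  else
    echo "FAIL $1: $2 (minimum $3)"
    return 1
  fi
}

# Gathering values on Linux (RAM is rounded, since a nominal 8GB host reports
# slightly less than 8 GiB in /proc/meminfo):
# cores=$(nproc)
# ram_gb=$(awk '/MemTotal/ {printf "%.0f", $2 / 1024 / 1024}' /proc/meminfo)
# disk_gb=$(df -Pk . | awk 'NR == 2 {printf "%d", $4 / 1024 / 1024}')
# meets_minimum "CPU cores" "$cores" 4
# meets_minimum "RAM (GB)" "$ram_gb" 8
# meets_minimum "Free disk (GB)" "$disk_gb" 100
```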

Software Requirements

  • Python: 3.13 with pip
  • Ansible: 11.9.0 (ansible-core 2.19.1) - Latest stable version
  • Git: Latest stable version
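
Installed versions can be compared against this list with a small helper. The sketch treats the listed versions as minimums, which is an assumption (the project may require exact versions); `version_ge` is a hypothetical name, and it relies on `sort -V` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is at least version $2.
# `sort -V` orders version strings numerically per component, so the smaller
# of the two comes first; $1 >= $2 exactly when that first line is $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
# py=$(python3 -c 'import platform; print(platform.python_version())')
# version_ge "$py" 3.13 || echo "Python $py is older than 3.13"
```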

Supported Platforms

  • Single Server Deployment: No clustering required
  • Container-based AWX: Podman/Docker container deployment
  • Pre-existing NetBox: Uses existing NetBox installation
  • systemd User Services: Native Linux user service management for base components

Directory Structure

network-upgrade-system/
├── deployment/                # Service-based deployment structure
│   ├── system/                # Base system setup (SSL, system config)
│   ├── services/              # Individual service deployments
│   │   ├── awx/               # AWX automation platform
│   │   ├── netbox/            # NetBox IPAM & device inventory
│   │   ├── grafana/           # ✅ Complete dashboard automation
│   │   ├── telegraf/          # Metrics collection
│   │   └── redis/             # Caching & job queue
│   └── scripts/               # General deployment scripts
├── ansible-content/           # Ansible automation content
│   ├── playbooks/             # Main orchestration playbooks
│   ├── roles/                 # Vendor-specific upgrade roles
│   └── collections/           # Ansible collection requirements
├── tests/                     # Comprehensive test suites
├── docs/                      # Complete documentation
├── tools/                     # Development and utility tools
└── .claude/                   # Claude Code commands and workflows

Support

For technical support and questions:

  • Check the Installation Guide troubleshooting section
  • Review platform-specific procedures in Platform Implementation Guide
  • Examine log files in $HOME/.local/share/network-upgrade/logs/
  • Use the built-in health check: ./scripts/system-health.sh
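
When examining the log directory, a quick first pass is to list which files mention an error at all. The log format is not documented here, so the search term below is an assumption, and `logs_with_errors` is a hypothetical helper rather than a project script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory that contain the word
# "error" (case-insensitive). -r: recurse, -l: print file names only.
logs_with_errors() {
  grep -r -l -i 'error' "$1" 2>/dev/null
}

# Usage with the log location mentioned above:
# logs_with_errors "$HOME/.local/share/network-upgrade/logs/"
```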

License

This project is licensed under the MIT License - see the LICENSE file for details.
