How to Self-Host AI on Your Proxmox Homelab with Ollama and Open WebUI

By NetworkThinkTank | April 23, 2026

Introduction

I got tired of sending my data to cloud AI services. Every prompt I typed into ChatGPT or Claude could be stored, analyzed, and potentially used for training. For personal questions, code snippets with API keys, and private brainstorming sessions, that never sat well with me.

So I built my own. A fully self-hosted AI assistant running on my Proxmox homelab, powered by Ollama for local LLM inference and Open WebUI for a polished ChatGPT-like interface. The models run on my own NVIDIA GPU, the data stays on my NAS, and nothing leaves my network.

This guide walks you through exactly how I did it – from VM creation to pulling your first model and chatting with it through a clean web interface. If you have a Proxmox server and a spare GPU, you can have this running in an afternoon.

Prerequisites and Hardware Requirements

Here is what you need before starting:

Hardware

  • Proxmox VE host (version 7.x or 8.x)
  • NVIDIA GPU with at least 8GB VRAM (I use an RTX 3060 Ti 8GB)
  • Minimum 16GB RAM allocated to the AI VM (32GB recommended)
  • NAS with NFS or SMB shares available (Synology, TrueNAS, etc.)
  • At least 50GB free storage for models (100GB+ recommended)

Software

  • Proxmox VE installed and running
  • Ubuntu Server 22.04 or 24.04 LTS ISO
  • Docker and Docker Compose
  • NVIDIA drivers (535+ recommended)
  • NVIDIA Container Toolkit

Network

  • Static IP or DHCP reservation for the AI VM
  • Access to your NAS from the VM subnet
  • Optional: domain name for reverse proxy

Architecture Overview

The stack is straightforward: Proxmox hosts an Ubuntu VM with GPU passthrough, and inside that VM, Docker runs Ollama and Open WebUI side by side.

All AI processing happens locally on the GPU inside the VM. Open WebUI provides the browser-based chat interface and connects to Ollama's API on the backend. The NAS stores all model files and conversation data so nothing is lost if the VM needs rebuilding.

Step 1: Preparing the Proxmox VM

First, create a new VM in Proxmox optimized for AI workloads.

VM Configuration

  • VM ID: 205
  • Name: ollama-gpu
  • OS: Ubuntu Server 22.04 LTS
  • CPU: host type, 8 cores
  • RAM: 16GB minimum (I use 32GB)
  • Disk: 100GB on local-lvm (SSD preferred)
  • Network: vmbr0, bridge mode
  • BIOS: OVMF (UEFI) for GPU passthrough

Enabling IOMMU on the Proxmox Host

Edit GRUB configuration:

nano /etc/default/grub

For AMD CPUs, set:

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

For Intel CPUs, set:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

Update GRUB and reboot:

update-grub
reboot

Add VFIO modules to /etc/modules (note: on kernel 6.2 and newer, as shipped with Proxmox 8, vfio_virqfd has been merged into vfio and can be omitted):

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Blacklist the NVIDIA drivers on the host:

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf

Find your GPU’s PCI IDs:

lspci -nn | grep NVIDIA

Example output:

27:00.0 VGA compatible controller [0300]: NVIDIA [10de:2489]
27:00.1 Audio device [0403]: NVIDIA [10de:228b]

Bind to VFIO:

echo "options vfio-pci ids=10de:2489,10de:228b" > /etc/modprobe.d/vfio.conf

Update initramfs and reboot:

update-initramfs -u -k all
reboot
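
Extracting and formatting the IDs by hand is error-prone. As an optional helper, here is a short Python sketch (my own addition, not part of the standard passthrough workflow) that turns `lspci -nn` output into the vfio.conf line:

```python
import re

def extract_vfio_ids(lspci_output: str, vendor: str = "10de") -> str:
    """Pull [vendor:device] ID pairs for one vendor (10de = NVIDIA)
    out of `lspci -nn` output and format them for vfio.conf."""
    ids = re.findall(r'\[([0-9a-f]{4}:[0-9a-f]{4})\]', lspci_output)
    wanted = [i for i in ids if i.startswith(vendor + ":")]
    return "options vfio-pci ids=" + ",".join(wanted)

# Example, using the lspci output shown above
sample = """27:00.0 VGA compatible controller [0300]: NVIDIA [10de:2489]
27:00.1 Audio device [0403]: NVIDIA [10de:228b]"""
print(extract_vfio_ids(sample))
# options vfio-pci ids=10de:2489,10de:228b
```

The class codes like [0300] are skipped automatically because they lack the vendor:device colon.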

Add GPU to the VM

In the Proxmox web UI:

VM 205 > Hardware > Add > PCI Device
- Select your NVIDIA GPU
- Check "All Functions"
- Check "ROM-Bar"
- Check "PCI-Express"
- Set "Primary GPU" if this VM has no other display

Verify GPU Inside the VM

After booting the VM, run:

lspci | grep -i nvidia

You should see your GPU listed. If not, check IOMMU groups and VFIO binding on the Proxmox host.

Step 2: Installing NVIDIA Drivers and Docker

With the GPU visible inside the VM, install the required software.

Update the System

sudo apt update && sudo apt upgrade -y
sudo reboot

Install NVIDIA Drivers

sudo apt install -y nvidia-driver-535 nvidia-utils-535
sudo reboot

Verify GPU Access

nvidia-smi

Expected output shows your RTX 3060 Ti with driver version and CUDA version. If nvidia-smi fails, check that the GPU passthrough is configured correctly on the Proxmox host.

Install Docker

sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) \
signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io \
docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER

Install NVIDIA Container Toolkit

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify GPU in Docker

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If this shows your GPU info inside the container, you are ready to deploy Ollama.

Step 3: Deploying Ollama

Create a project directory:

mkdir -p ~/ai-stack
cd ~/ai-stack

Create docker-compose.yml:

version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/nas/ollama-models:/root/.ollama
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - /mnt/nas/openwebui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

Start the stack:

docker compose up -d

Pull your first model:

docker exec -it ollama ollama pull llama3.1:8b

This downloads the Llama 3.1 8B parameter model, which is an excellent starting point for an 8GB GPU. The download is roughly 4.7GB and will be stored on your NAS mount.

Other Recommended Models for 8GB VRAM

docker exec -it ollama ollama pull mistral:7b     # great for general tasks
docker exec -it ollama ollama pull codellama:7b   # optimized for coding
docker exec -it ollama ollama pull phi3:mini      # Microsoft's compact model
docker exec -it ollama ollama pull gemma2:9b      # Google's open model

(llama3.1:8b, pulled above, is already the instruct-tuned chat variant, so there is no need for a separate chat model.)

Test the Ollama API

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Hello, how are you?",
"stream": false
}'

If you get a JSON response with generated text, Ollama is working.
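
The same call works from Python using only the standard library. A minimal sketch, mirroring the curl example above (the `build_payload` helper is my own addition):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # change to your VM's IP when calling remotely

def build_payload(prompt: str, model: str = "llama3.1:8b") -> dict:
    # Same fields as the curl example above
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama3.1:8b", url: str = OLLAMA_URL) -> str:
    """POST a non-streaming generate request and return the reply text."""
    req = urllib.request.Request(
        url + "/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the stack to be running):
# print(generate("Hello, how are you?"))
```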

Step 4: Configuring Open WebUI

Open your browser and navigate to:

http://<VM-IP>:3000

First-Time Setup

  1. Create an admin account (first user automatically becomes admin)
  2. Set a strong password – this is your AI assistant gateway
  3. Open WebUI will auto-detect Ollama at the configured URL

Connecting to Ollama

Open WebUI should automatically connect to Ollama using the OLLAMA_BASE_URL environment variable we set in Docker Compose. Verify by clicking Settings > Connections and confirming the Ollama URL shows http://ollama:11434 with a green status.

Key Settings to Configure

  • Settings > General: Set default model to llama3.1:8b
  • Settings > Interface: Enable chat history, code highlighting
  • Settings > Models: View and manage downloaded models
  • Settings > Audio: Enable speech-to-text if desired
  • Settings > Images: Configure image generation if using a compatible model

Creating Custom Modelfiles

You can create specialized assistants using Ollama Modelfiles. Example – A coding assistant:

FROM codellama:7b
SYSTEM """You are an expert programmer. You write clean, efficient
code with clear comments. When asked about code, provide working
examples with explanations."""
PARAMETER temperature 0.3
PARAMETER num_ctx 4096

Save this as coding-assistant.modelfile, place it where the Ollama container can read it (for example inside the mounted /mnt/nas/ollama-models share, which the container sees under /root/.ollama), and create it:

docker exec -it ollama ollama create coding-assistant \
-f /path/to/coding-assistant.modelfile

This model then appears in Open WebUI as a selectable assistant.

Step 5: NAS Storage Integration

Storing models and data on your NAS ensures persistence and makes backups straightforward.

Mount NFS Shares on the VM

Install NFS client:

sudo apt install -y nfs-common

Create mount points:

sudo mkdir -p /mnt/nas/ollama-models
sudo mkdir -p /mnt/nas/openwebui-data

Add to /etc/fstab for persistent mounts:

192.168.1.100:/volume1/ai-models /mnt/nas/ollama-models nfs defaults,_netdev 0 0
192.168.1.100:/volume1/ai-data /mnt/nas/openwebui-data nfs defaults,_netdev 0 0

Mount everything:

sudo mount -a

Verify mounts:

df -h | grep nas

Replace 192.168.1.100 with your NAS IP and adjust the share paths to match your NAS configuration (Synology, TrueNAS, etc.).

Important: Make sure Docker containers have write permissions to these mount points. Set ownership if needed:

sudo chown -R 1000:1000 /mnt/nas/ollama-models
sudo chown -R 1000:1000 /mnt/nas/openwebui-data
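
Before starting the stack, it is worth verifying the mounts are live and writable. A minimal pre-flight sketch (my own addition; the paths assume this guide's layout):

```python
import os
import tempfile

def check_data_dir(path: str) -> list:
    """Return a list of problems with a container data directory
    (missing, not a mount point, not writable); empty list means OK."""
    if not os.path.isdir(path):
        return [f"{path}: does not exist"]
    problems = []
    if not os.path.ismount(path):
        problems.append(f"{path}: not a mount point (NFS share may not be mounted)")
    try:
        # Probe write access by creating and deleting a temp file
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        problems.append(f"{path}: not writable by this user")
    return problems

# Usage:
# for d in ("/mnt/nas/ollama-models", "/mnt/nas/openwebui-data"):
#     for problem in check_data_dir(d):
#         print("WARN:", problem)
```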

Backup Strategy

  • NAS snapshots protect model data and conversations
  • Export Open WebUI settings periodically from the admin panel
  • Keep docker-compose.yml in a Git repository
  • Document your Modelfile customizations

Step 6: Networking and Remote Access

Expose Services on Your LAN

By default, the services are accessible at:

  • Ollama API: http://<VM-IP>:11434
  • Open WebUI: http://<VM-IP>:3000

To make Ollama accessible to other machines on your network, ensure the Ollama container binds to 0.0.0.0 (default in our Docker Compose config).

Reverse Proxy with Nginx

Install Nginx on the VM (or use a dedicated reverse proxy VM):

sudo apt install -y nginx

Create /etc/nginx/sites-available/ai-assistant:

server {
    listen 80;
    server_name ai.homelab.local;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support, needed for streaming chat responses
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}

Enable the site:

sudo ln -s /etc/nginx/sites-available/ai-assistant \
/etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

Add ai.homelab.local to your DNS server or local hosts file pointing to the VM IP address.

SSL with Let’s Encrypt (if publicly accessible)

sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d ai.yourdomain.com

Firewall Rules

sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 3000/tcp    # Open WebUI
sudo ufw allow 11434/tcp   # Ollama API - only needed if other hosts call it directly
sudo ufw enable

Performance Tuning and Optimization

GPU Memory Management

  • The RTX 3060 Ti has 8GB VRAM, which limits model size
  • Stick to 7B-8B parameter models for best performance
  • Use Q4_K_M quantized models for the best speed/quality balance
  • Monitor VRAM usage: nvidia-smi -l 1 (updates every second)

Model Quantization Guide

Q4_K_M - Best balance of speed and quality (recommended)
Q5_K_M - Slightly better quality, slightly slower
Q8_0 - Near full quality, uses significantly more VRAM
F16 - 16-bit weights, essentially original quality, roughly double the VRAM of Q8_0 (not for 8GB cards)
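
These trade-offs are easy to sanity-check with back-of-the-envelope math. A rough size estimator (my own sketch; the bits-per-weight figures are approximations that vary per model and GGUF version):

```python
# Approximate effective bits per weight for common GGUF quantizations
# (includes scale/metadata overhead; rough values, not exact)
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def model_size_gb(params_billion: float, quant: str) -> float:
    """Rough on-disk / in-VRAM size of the weights alone, in GB.
    KV cache and runtime overhead come on top of this."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"8B @ {quant}: ~{model_size_gb(8, quant):.1f} GB")
```

For an 8B model this lands around 4.8 GB at Q4_K_M and 16 GB at F16, which is why F16 is off the table for an 8GB card.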

Context Length vs. Speed

  • Default context: 2048 tokens (fast, limited memory)
  • Extended context: 4096 tokens (good balance)
  • Maximum context: 8192+ tokens (slower, more VRAM usage)

Set in your Modelfile or at runtime:

PARAMETER num_ctx 4096
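
The extra VRAM a longer context consumes comes mostly from the KV cache, which scales linearly with num_ctx. A rough estimate using Llama-3.1-8B-like architecture numbers (32 layers, 8 KV heads, head dimension 128; assumed values, not measured):

```python
def kv_cache_gb(num_ctx: int, layers: int = 32, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """Approximate fp16 KV-cache size in GB for a GQA model.
    Defaults are Llama-3.1-8B-like values (assumptions)."""
    # 2 tensors (K and V) per layer, each num_ctx x kv_heads x head_dim
    return 2 * layers * num_ctx * kv_heads * head_dim * bytes_per_val / 1e9

for ctx in (2048, 4096, 8192):
    print(f"num_ctx={ctx}: ~{kv_cache_gb(ctx):.2f} GB")
```

At 4096 tokens that is roughly half a gigabyte on top of the model weights, and it doubles every time the context doubles.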

Monitoring Resource Usage

watch -n 1 nvidia-smi # GPU monitoring
htop # CPU and RAM monitoring
docker stats # Container resource usage
iostat -x 1 # Disk I/O monitoring

Conclusion

After following this guide, you now have a fully self-hosted AI assistant running on your Proxmox homelab. Your data stays private, your models run locally on your GPU, and you have a clean web interface for interacting with multiple AI models.

The entire stack – Ollama for inference, Open WebUI for the interface, NAS for storage – runs reliably as a set of Docker containers inside a Proxmox VM with GPU passthrough. It survives reboots, updates cleanly, and scales as you add more models.

This is what homelabbing is about: taking control of your own infrastructure and running services that matter to you. A private AI assistant is one of the most practical and rewarding projects you can build today.


Real-World Deployment Tips

  1. Start with small models first: Pull llama3.1:8b before anything else. It fits comfortably in 8GB VRAM and responds fast. Get everything working before experimenting with larger models.
  2. Use NAS storage from day one: Do not store models on the VM’s local disk. When you inevitably rebuild the VM, you will lose hours re-downloading models. NAS storage makes rebuilds trivial.
  3. Pin your Docker image versions: Use specific tags instead of “latest” in production. An unexpected update broke my Open WebUI setup once when the API format changed between versions.
  4. Set OLLAMA_NUM_PARALLEL=1: On an 8GB card, running multiple concurrent requests causes out-of-memory crashes. Limit Ollama to one request at a time with this environment variable.
  5. Monitor VRAM proactively: Add nvidia-smi -l 5 to a tmux session so you always see GPU memory usage. VRAM exhaustion causes silent failures that are hard to debug.
  6. Enable Docker restart policies: The “unless-stopped” restart policy in our Docker Compose file means containers recover automatically after host reboots or crashes.
  7. Test your NFS mounts under load: Some NAS devices throttle NFS under heavy I/O. Run model inference while monitoring NAS performance to catch bottlenecks early.
  8. Keep a shell alias for quick model pulls: Add alias opull='docker exec -it ollama ollama pull' to your .bashrc. Pulling a model then becomes simply opull mistral:7b.

Honest Takeaways and Lessons Learned

  1. Local LLMs are not ChatGPT replacements (yet): The 7B-9B models that fit on an 8GB GPU are impressive but noticeably less capable than GPT-4 or Claude for complex reasoning. They excel at drafting, summarization, code completion, and brainstorming. Manage your expectations accordingly.
  2. GPU passthrough is the hardest part: Getting IOMMU groups clean, VFIO binding correct, and the GPU visible inside the VM took more troubleshooting than the entire rest of the stack combined. Once it works, it stays working, but expect 2-4 hours of debugging on your first attempt.
  3. Open WebUI is surprisingly polished: I expected a rough open-source interface. Instead, Open WebUI is genuinely pleasant to use daily. The chat interface, model switching, conversation history, and document upload features rival commercial products.
  4. Storage adds up fast: Each 7B model is 4-5GB. If you start collecting models (and you will), budget 100-200GB of NAS storage. I currently have 12 models taking up 67GB.
  5. The privacy benefit is real: Once you start using a local AI for sensitive queries – tax questions, medical research, private code review – you realize how uncomfortable it was sending that data to third-party servers. This alone justifies the project.
  6. Docker makes everything easier: Without Docker and the NVIDIA Container Toolkit, this setup would involve painful manual dependency management. The containerized approach means clean upgrades and easy rollbacks.
  7. Community models keep getting better: The open-source LLM ecosystem is evolving rapidly. Models that were state-of-the-art six months ago are now outperformed by newer releases. Check Ollama’s model library regularly for improvements.

Common Pitfalls and How to Avoid Them

Pitfall 1: IOMMU Group Conflicts

Problem: Your GPU shares an IOMMU group with other devices.
Solution: Check groups with: find /sys/kernel/iommu_groups/ -type l
If your GPU is not in a clean group, you may need an ACS override patch or a different PCIe slot. Move the GPU to a slot that isolates it in its own IOMMU group.

Pitfall 2: NVIDIA Driver Conflicts on Proxmox Host

Problem: The Proxmox host loads NVIDIA drivers before VFIO can claim the GPU.
Solution: Blacklist nouveau and nvidia in /etc/modprobe.d/blacklist.conf and ensure VFIO modules load first. Add softdep nvidia pre: vfio-pci to modprobe configuration.

Pitfall 3: Docker Cannot See the GPU

Problem: docker run --gpus all fails with “could not select device driver”.
Solution: The NVIDIA Container Toolkit is not installed or not configured. Run:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Pitfall 4: Open WebUI Cannot Connect to Ollama

Problem: Open WebUI shows “Connection failed” for the Ollama backend.
Solution: Ensure both containers are on the same Docker network (Docker Compose handles this automatically). Verify the OLLAMA_BASE_URL is set to http://ollama:11434 (using the container name, not localhost).

Pitfall 5: Models Disappear After VM Reboot

Problem: Downloaded models are gone after restarting the VM.
Solution: The NFS/SMB mount is not persisting across reboots. Add the mount to /etc/fstab with the _netdev option and verify with sudo mount -a after reboot.

Pitfall 6: Out of Memory (OOM) Crashes

Problem: Ollama crashes or returns errors during inference.
Solution: You are likely running a model too large for your VRAM. Stick to 7B-8B models on 8GB cards. Set OLLAMA_NUM_PARALLEL=1 to prevent concurrent requests from exceeding VRAM. Monitor with nvidia-smi.

Pitfall 7: Slow Model Loading from NAS

Problem: Models take a very long time to load initially.
Solution: NFS over a 1Gbps connection is the bottleneck. Models are 4-5GB each, so initial load takes 30-40 seconds. Consider 10Gbps networking or storing frequently-used models on local SSD with NAS as backup.
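
The 30-40 second figure checks out with simple arithmetic. A quick sketch (assuming an idealized link with no protocol overhead; real NFS throughput will be somewhat lower):

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Time to move size_gb gigabytes over a link_gbps link.
    efficiency < 1.0 approximates NFS/TCP overhead (assumption)."""
    return (size_gb * 8) / (link_gbps * efficiency)

print(f"4.7 GB over 1 Gbps:  {transfer_seconds(4.7, 1.0):.0f} s")   # ~38 s
print(f"4.7 GB over 10 Gbps: {transfer_seconds(4.7, 10.0):.0f} s")  # ~4 s
```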

Pitfall 8: GPU Passthrough Breaks After Proxmox Update

Problem: GPU passthrough stops working after a Proxmox kernel update.
Solution: Kernel updates can change IOMMU behavior. After updates, verify VFIO binding:

lspci -nnk -s 27:00
dmesg | grep -i vfio
update-initramfs -u -k all

Always test GPU passthrough after host kernel updates before relying on the AI assistant for important work.


LinkedIn Version

BUILDING A PRIVATE AI ASSISTANT ON MY HOMELAB

I built a self-hosted AI assistant using Ollama and Open WebUI, running on my Proxmox homelab with an NVIDIA RTX 3060 Ti.

Why? Privacy. Control. Learning.

Every prompt I type stays on my network. My models run on my GPU. My conversations are stored on my NAS. Nothing goes to the cloud.

The stack:

  • Proxmox VE for virtualization
  • Ubuntu VM with GPU passthrough (PCIe/IOMMU)
  • Ollama for local LLM inference
  • Open WebUI for a ChatGPT-like interface
  • NAS integration for persistent model storage

What surprised me:

  • Open WebUI is genuinely polished – rivals commercial AI interfaces
  • GPU passthrough was the hardest part (expect 2-4 hours first time)
  • 7B/8B models on an 8GB GPU are great for daily tasks
  • The privacy benefit is more significant than I expected

The open-source AI ecosystem has matured to the point where running your own AI assistant is not just possible – it is practical.

If you have a homelab with a spare GPU, this is one of the most rewarding projects you can build right now.

Full setup guide on my blog: NetworkThinkTank.blog

#homelab #AI #selfhosted #Ollama #OpenWebUI #Proxmox #privacy #LLM #artificialintelligence #homelabbing


Social Media Teaser

Just deployed a fully self-hosted AI assistant on my Proxmox homelab using Ollama and Open WebUI – complete with GPU passthrough and NAS storage integration. Every prompt stays private, every model runs locally, and the web interface rivals ChatGPT. Full build guide with Docker configs and real deployment tips on the blog.


Follow-Up Article Ideas

  1. “Scaling Up: Adding a Second GPU to Your Ollama Homelab for Larger Language Models” – Covers multi-GPU passthrough in Proxmox, running 13B and 70B parameter models across multiple GPUs, VRAM pooling strategies, and benchmarking multi-GPU vs. single-GPU inference performance.
  2. “Building a RAG Pipeline: Teaching Your Self-Hosted AI About Your Own Documents” – Covers Retrieval Augmented Generation (RAG) setup with Open WebUI’s document upload feature, embedding models, vector databases (ChromaDB), indexing your personal knowledge base, and making your AI assistant an expert on your own files.
  3. “Hardening Your Self-Hosted AI: Security Best Practices for Homelab LLM Deployments” – Covers network segmentation for AI services, authentication and access control in Open WebUI, SSL/TLS configuration, firewall rules, monitoring for unauthorized access, Docker security hardening, and safely exposing your AI assistant outside your home network with VPN or Cloudflare Tunnel.

Building a Complete CCNA Lab: EVE-NG to Proxmox Migration with Infrastructure as Code

After weeks of building, troubleshooting, and optimizing my CCNA lab environment, I am excited to share the entire project — now fully documented and open-sourced on GitHub. This post walks through the journey from an initial EVE-NG deployment to a fully automated Proxmox-based lab using Terraform, Ansible, and custom shell scripts.

You can find the complete repository here: github.com/jczaldivar71/eve-ng-ccna-lab

Project Overview

The EVE-NG CCNA Lab project started as a straightforward network simulation environment for CCNA study. It quickly evolved into a full infrastructure-as-code project covering:

  • EVE-NG lab deployment with API-driven automation
  • Migration from EVE-NG to Proxmox for better performance and scalability
  • Custom shell scripts for image management, licensing, and node orchestration
  • A Python script (generate_readme.py) to auto-generate comprehensive documentation
  • qcow2 disk image optimization achieving a 39% storage reduction
  • Terraform and Ansible playbooks for reproducible infrastructure deployment

GitHub Documentation and the generate_readme.py Script

One of the key pieces of this project is the generate_readme.py Python script. Rather than manually maintaining a README that would inevitably fall out of sync with the actual project structure, I wrote a script that scans the repository and automatically generates a comprehensive README.md file.

The script inspects every directory — configs/, scripts/, terraform-ansible/, topology/, and images/ — and produces a fully formatted document with a table of contents, script references, setup instructions, and troubleshooting tips. Running it is as simple as:

cd scripts/
python generate_readme.py

The generated README covers 13 sections including Overview, Project Structure, Prerequisites, Quick Start, Lab Topology, Scripts Reference, qcow2 Image Management, EVE-NG API Usage, Proxmox Deployment, Configuration Files, Troubleshooting, Known Limitations, and License information. At 340 lines, it serves as a complete guide for anyone wanting to replicate or build upon this lab.
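
The core idea is simple enough to sketch in a few lines. This is a heavily simplified illustration of the approach (the actual generate_readme.py in the repository is far more thorough):

```python
import os

KNOWN_DIRS = ("configs", "scripts", "terraform-ansible", "topology", "images")

def generate_readme(root: str, dirs=KNOWN_DIRS) -> str:
    """Walk the known project directories and emit a README skeleton:
    a table of contents plus a file listing per section."""
    lines = ["# EVE-NG CCNA Lab", "", "## Table of Contents"]
    present = [d for d in dirs if os.path.isdir(os.path.join(root, d))]
    for d in present:
        lines.append(f"- [{d}](#{d})")
    for d in present:
        lines += ["", f"## {d}"]
        for name in sorted(os.listdir(os.path.join(root, d))):
            lines.append(f"- `{name}`")
    return "\n".join(lines)

# Usage, from the repo root:
# print(generate_readme("."))
```

Because the README is derived from the actual directory contents on every run, it can never drift out of sync with the repository.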

EVE-NG to Proxmox Migration

EVE-NG is a fantastic network emulation platform, but I ran into limitations around resource management and integration with modern IaC tools. The decision to migrate to Proxmox was driven by several factors:

  • Better resource control: Proxmox provides fine-grained CPU, memory, and storage allocation through its API
  • Terraform integration: The Proxmox Terraform provider enables declarative infrastructure definitions
  • Thin provisioning: Proxmox handles thin-provisioned qcow2 images natively, which was critical for storage optimization
  • Ansible compatibility: Post-deployment configuration is seamless with Ansible playbooks targeting Proxmox VMs

The migration involved exporting router and switch images from EVE-NG, converting and optimizing the qcow2 disk images, and then redeploying them on Proxmox using Terraform. The entire workflow is captured in the terraform-ansible/ directory of the repository.

Automation Scripts

The scripts/ directory contains six purpose-built shell scripts that automate every aspect of lab management:

  • eve-ng-api-auth.sh: Handles cookie-based API authentication with EVE-NG, exporting session tokens for use in subsequent API calls. Includes examples for listing labs, getting node details, and starting all nodes.
  • start-lab-nodes.sh: Automates the process of starting all lab nodes through the EVE-NG REST API with proper sequencing and health checks.
  • scp-upload-images.sh: Securely transfers qcow2 images to the EVE-NG or Proxmox host via SCP with progress tracking and integrity verification.
  • qcow2-optimize.sh: The image optimization workhorse — converts, compresses, and thin-provisions qcow2 disk images (more on this below).
  • fix-permissions.sh: Ensures correct file ownership and permissions on EVE-NG image directories, a common source of lab startup failures.
  • iol-license-fix.sh: Generates and applies the proper IOL (IOS on Linux) license file, which is required for Cisco IOL images to boot correctly.

Each script is documented with usage instructions and can be run independently or chained together for a complete deployment workflow.

qcow2 Image Optimization

One of the most impactful parts of this project was optimizing the qcow2 disk images. Network appliance images (Cisco IOSv, IOSvL2, CSR1000v, etc.) often ship with significant wasted space — preallocated but unused disk blocks that consume real storage.

The qcow2-optimize.sh script automates a multi-step optimization pipeline:

  1. Sparsification: Uses virt-sparsify to zero out unused blocks within the guest filesystem
  2. Compression: Applies qcow2 internal compression via qemu-img convert -c
  3. Thin provisioning: Ensures metadata is set for thin-provisioned allocation on the hypervisor
  4. Integrity check: Runs qemu-img check to verify image health post-optimization

The results were significant: total image storage dropped from 30GB to 18.3GB — a 39% reduction. This is especially meaningful in a home lab where storage is often limited. The optimized images boot identically to the originals but consume far less disk space on the Proxmox host.
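
The pipeline steps above can be sketched as a thin Python wrapper (an illustration only; the repo's qcow2-optimize.sh is the authoritative version, and this sketch requires virt-sparsify and qemu-img on the PATH):

```python
import subprocess

def optimize_qcow2(src: str, dst: str) -> None:
    """Sketch of the sparsify -> compress -> verify pipeline described above."""
    # 1. Sparsify: zero out unused blocks in the guest filesystem
    subprocess.run(["virt-sparsify", src, src + ".sparse"], check=True)
    # 2. Re-convert with qcow2 internal compression
    subprocess.run(["qemu-img", "convert", "-O", "qcow2", "-c",
                    src + ".sparse", dst], check=True)
    # 3. Verify image health post-optimization
    subprocess.run(["qemu-img", "check", dst], check=True)

def reduction_pct(before_gb: float, after_gb: float) -> float:
    """Storage saved, as a percentage of the original size."""
    return (before_gb - after_gb) / before_gb * 100

# The numbers from this project: 30GB -> 18.3GB
print(f"{reduction_pct(30, 18.3):.0f}% reduction")
```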

Terraform and Ansible Deployment

The final piece of the puzzle is fully automated deployment using Terraform and Ansible. The terraform-ansible/ directory contains everything needed to stand up the lab from scratch:

Terraform handles the infrastructure provisioning:

  • VM creation on Proxmox with defined CPU, memory, and disk parameters
  • Network interface configuration with VLAN tagging
  • Cloud-init integration for initial bootstrapping
  • State management for tracking deployed resources

Ansible manages the post-deployment configuration:

  • init-proxmox.yml: Initializes the Proxmox host with required packages, storage configuration, and network bridges
  • deploy-vm.yml: Deploys individual VMs with their specific configurations
  • remove-gateway.yml: Cleans up default gateway routes that can interfere with lab routing exercises

Configuration variables are stored in group_vars/all.yml (with a .sample template provided), and the hosts inventory file defines the Proxmox target. The ansible.cfg sets sensible defaults for host key checking and privilege escalation.

With this setup, spinning up a complete CCNA lab goes from a manual multi-hour process to a single command:

cd terraform-ansible/
terraform init && terraform apply
ansible-playbook -i hosts deploy-vm.yml

What’s Next

This project is a living repository — I plan to continue adding to it as I progress through my CCNA studies and expand the lab. Future additions may include:

  • Additional topology configurations for specific CCNA exam topics
  • Integration with network monitoring tools
  • CI/CD pipeline for automated lab testing
  • Support for additional platforms (VIRL, GNS3)

If you are studying for the CCNA or building your own home lab, feel free to fork the repository and adapt it to your needs. Contributions and feedback are always welcome.

GitHub Repository: https://github.com/jczaldivar71/eve-ng-ccna-lab

How I Built an AI Network Monitoring Tool (Beginner Friendly)

Hey everyone! If you have been following my blog, you know I love combining Python with network engineering. From automating backups with Netmiko to monitoring IP SLAs with DNA Center, I am always looking for ways to make our lives as network engineers easier. Today, I am excited to walk you through my latest project: an AI-Powered Network Health Checker. Don’t worry — this is totally beginner friendly. If you can write a basic Python script, you can follow along!

What Does This Tool Do?

In a nutshell, this tool pulls real-time data from your network devices (think CPU usage, memory utilization, interface errors, etc.), feeds that data into a simple machine learning model, and tells you whether each device is healthy or if there might be an issue. The output is super straightforward — you will see messages like “Device is healthy” or “Potential issue detected.” No PhD in data science required!

Step 1: Pulling Device Data with Python

Just like in my previous posts on network automation, we start by connecting to our devices and grabbing the data we need. I used the Netmiko library to SSH into each device and pull key metrics. Here is a simplified version of the script:

from netmiko import ConnectHandler
import re

device = {
    'device_type': 'cisco_ios',
    'host': '192.168.1.1',
    'username': 'admin',
    'password': 'yourpassword',
}

connection = ConnectHandler(**device)

# CPU utilization
cpu_output = connection.send_command('show processes cpu')
cpu_match = re.search(r'CPU utilization for five seconds: (\d+)%', cpu_output)
cpu_usage = int(cpu_match.group(1)) if cpu_match else 0

# Memory utilization
mem_output = connection.send_command('show processes memory')
mem_match = re.search(r'Processor Pool Total:\s+(\d+)\s+Used:\s+(\d+)', mem_output)
if mem_match:
    mem_total = int(mem_match.group(1))
    mem_used = int(mem_match.group(2))
    mem_usage = (mem_used / mem_total) * 100
else:
    mem_usage = 0

# Interface input errors
intf_output = connection.send_command('show interfaces')
error_matches = re.findall(r'(\d+) input errors', intf_output)
total_errors = sum(int(e) for e in error_matches)

print(f"CPU Usage: {cpu_usage}%")
print(f"Memory Usage: {mem_usage:.1f}%")
print(f"Total Interface Errors: {total_errors}")

connection.disconnect()

This script connects to a Cisco IOS device, grabs CPU usage, memory utilization, and interface error counts. You can easily expand this to loop through multiple devices from an inventory file — just like we did in the backup config script project.

Step 2: Building a Simple ML Model for Anomaly Detection

Here is where the AI magic comes in — but I promise it is simpler than it sounds. We are using scikit-learn’s Isolation Forest algorithm, which is perfect for anomaly detection. It learns what “normal” looks like from your data and flags anything that seems off.

import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: [cpu_usage, mem_usage, total_errors] from a known-healthy period
training_data = np.array([
    [15, 40, 0], [20, 45, 1], [18, 42, 0],
    [22, 50, 2], [17, 38, 0], [19, 44, 1],
    [21, 47, 0], [16, 41, 1], [20, 43, 0], [18, 46, 2],
])

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(training_data)

new_device_data = np.array([[cpu_usage, mem_usage, total_errors]])
prediction = model.predict(new_device_data)

if prediction[0] == 1:
    print("Device is healthy")
else:
    print("Potential issue detected")

The Isolation Forest works by randomly partitioning data points. Anomalies are isolated faster because they are different from the majority of the data. The contamination parameter tells the model roughly what percentage of data points are expected to be anomalies — I set it to 0.1 (10%) as a starting point, but you can tune this for your environment.
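
To build intuition for what "anomalous" means here, the same obvious outlier jumps out even with a dependency-free z-score check. This is a crude baseline for comparison only (my own addition, not the Isolation Forest approach this article uses):

```python
from statistics import mean, stdev

def zscore_flags(training, sample, threshold=3.0):
    """Return the indices of metrics in `sample` that sit more than
    `threshold` standard deviations from the training mean."""
    flags = []
    for i in range(len(sample)):
        col = [row[i] for row in training]
        mu, sigma = mean(col), stdev(col)
        if sigma and abs(sample[i] - mu) / sigma > threshold:
            flags.append(i)
    return flags

training = [
    [15, 40, 0], [20, 45, 1], [18, 42, 0],
    [22, 50, 2], [17, 38, 0], [19, 44, 1],
    [21, 47, 0], [16, 41, 1], [20, 43, 0], [18, 46, 2],
]
print(zscore_flags(training, [18, 43, 0]))    # []  (healthy)
print(zscore_flags(training, [85, 92, 47]))   # [0, 1, 2]  (all three metrics flagged)
```

Unlike this per-metric threshold, Isolation Forest considers the metrics jointly, so it can also catch combinations that are individually normal but unusual together.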

Step 3: Putting It All Together

Now let us combine everything into a single script that loops through your devices, pulls the data, and runs it through the model:

from netmiko import ConnectHandler
from sklearn.ensemble import IsolationForest
import numpy as np
import re

devices = [
    {'device_type': 'cisco_ios', 'host': '192.168.1.1', 'username': 'admin', 'password': 'yourpassword'},
    {'device_type': 'cisco_ios', 'host': '192.168.1.2', 'username': 'admin', 'password': 'yourpassword'},
]

training_data = np.array([
    [15, 40, 0], [20, 45, 1], [18, 42, 0],
    [22, 50, 2], [17, 38, 0], [19, 44, 1],
    [21, 47, 0], [16, 41, 1], [20, 43, 0], [18, 46, 2],
])

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(training_data)

def get_device_metrics(device):
    connection = ConnectHandler(**device)
    cpu_output = connection.send_command('show processes cpu')
    cpu_match = re.search(r'CPU utilization for five seconds: (\d+)%', cpu_output)
    cpu_usage = int(cpu_match.group(1)) if cpu_match else 0
    mem_output = connection.send_command('show processes memory')
    mem_match = re.search(r'Processor Pool Total:\s+(\d+)\s+Used:\s+(\d+)', mem_output)
    mem_usage = (int(mem_match.group(2)) / int(mem_match.group(1))) * 100 if mem_match else 0
    intf_output = connection.send_command('show interfaces')
    error_matches = re.findall(r'(\d+) input errors', intf_output)
    total_errors = sum(int(e) for e in error_matches)
    connection.disconnect()
    return [cpu_usage, mem_usage, total_errors]

for device in devices:
    print(f"Checking device: {device['host']}")
    metrics = get_device_metrics(device)
    print(f"  CPU: {metrics[0]}% | Memory: {metrics[1]:.1f}% | Errors: {metrics[2]}")
    prediction = model.predict(np.array([metrics]))
    if prediction[0] == 1:
        print("  Status: Device is healthy")
    else:
        print("  Status: Potential issue detected")

Here is what the output looks like:

Checking device: 192.168.1.1
  CPU: 18% | Memory: 43.2% | Errors: 0
  Status: Device is healthy
Checking device: 192.168.1.2
  CPU: 85% | Memory: 92.1% | Errors: 47
  Status: Potential issue detected

What’s Next?

This is just the starting point. Here are some ideas to take it further:

  • Add more metrics like uplink bandwidth utilization or BGP neighbor status
  • Save your model to a file using joblib so you don’t retrain every time
  • Set up a cron job or scheduled task to run the script at regular intervals
  • Send alerts via email or Slack when an issue is detected
  • Build a simple dashboard with Flask to visualize device health
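Picking up the second idea from the list: persisting the trained model with joblib means future runs can skip retraining. This is a minimal sketch using the same toy training data as above; the filename is arbitrary.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
import joblib

# Same toy baseline data as above: [cpu_percent, memory_percent, interface_errors]
training_data = np.array([
    [15, 40, 0], [20, 45, 1], [18, 42, 0], [22, 50, 2], [17, 38, 0],
    [19, 44, 1], [21, 47, 0], [16, 41, 1], [20, 43, 0], [18, 46, 2],
])
model = IsolationForest(contamination=0.1, random_state=42).fit(training_data)

# Persist the fitted model to disk...
joblib.dump(model, "health_model.joblib")

# ...and reload it later without refitting
loaded = joblib.load("health_model.joblib")
print(loaded.predict(np.array([[18, 42, 0]])))  # predicts on a normal-looking sample
```

In a scheduled job, you would only refit (and re-dump) when you collect a fresh batch of baseline data; every other run just loads and predicts.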

Get the Code

I have uploaded the full project to my GitHub repo. Feel free to clone it, play around with it, and make it your own:

GitHub: https://github.com/NetworkThinkTank-Labs/ai-network-health-checker

Final Thoughts

If you are a network engineer who is curious about AI and machine learning, this is a great beginner project to get your feet wet. You don’t need to understand every detail of how Isolation Forest works under the hood — just know that it is a tool that can help you spot problems before they become outages.

As always, if you have questions or want to share how you have customized this for your own network, drop a comment below or reach out to me. Happy automating!

Automating Network Device Backups with Python and Netmiko

Backing up network device configurations is one of the most critical tasks in network administration. A missed backup could mean hours of manual reconfiguration after a failure. In this post, we will walk through a Python script that automates this process by connecting to a router or switch via SSH and saving the running-config to a local file.
We will use Netmiko, a popular Python library that simplifies SSH connections to network devices. Whether you manage a handful of switches or hundreds of routers, this script gives you a repeatable, automated way to capture configurations on demand.

Prerequisites

Before getting started, make sure you have:

  • Python 3.8 or higher installed
  • SSH access to your target network device
  • Device credentials (username and password)
  • The Netmiko library installed

To install Netmiko, run:

pip install netmiko

You can also clone the full project repository and install from the requirements file:

git clone https://github.com/NetworkThinkTank-Labs/backup-config-script.git
cd backup-config-script
pip install -r requirements.txt

Connecting to a Router or Switch

The core of our script uses Netmiko’s ConnectHandler to establish an SSH session. You provide the device type, hostname or IP address, and your credentials. Netmiko handles the SSH negotiation and drops you into an authenticated session.

Here is the connection function from our script:

def connect_to_device(host, username, password, device_type, port=22, enable_secret=None):
    device = {
        'device_type': device_type,
        'host': host,
        'username': username,
        'password': password,
        'port': port,
    }
    if enable_secret:
        device['secret'] = enable_secret
    connection = ConnectHandler(**device)
    if enable_secret:
        connection.enable()
    return connection

Netmiko supports a wide range of device types including cisco_ios, cisco_nxos, arista_eos, and juniper_junos. You simply pass the appropriate device type string and Netmiko adapts its behavior accordingly.

Retrieving the Running Configuration

Once connected, pulling the running configuration is a single method call. We use send_command to execute show running-config on the device and capture the output as a string:

def backup_running_config(connection):
    running_config = connection.send_command('show running-config')
    return running_config

Netmiko handles paging automatically, so even if your configuration is long, you will get the complete output without needing to send space or press Enter to page through it.

Saving the Configuration to a File

With the configuration captured in memory, the next step is writing it to a file. Our script creates a backups directory automatically and saves each configuration with a timestamped filename so you never overwrite a previous backup:

def save_config_to_file(config, hostname, output_dir='backups'):
    os.makedirs(output_dir, exist_ok=True)
    timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    safe_hostname = hostname.replace('.', '_').replace(':', '_')
    filename = f'{safe_hostname}_running-config_{timestamp}.txt'
    filepath = os.path.join(output_dir, filename)
    with open(filepath, 'w') as f:
        f.write(config)
    return filepath

This produces backup files like:

backups/192_168_1_1_running-config_2026-04-10_14-30-00.txt

Running the Script

The script accepts command-line arguments so you can target any device without editing the code:

python backup_config.py --host 192.168.1.1 --username admin --password mypassword

You can also specify a different device type, SSH port, output directory, or enable password:

python backup_config.py --host 10.0.0.1 --username admin --password mypassword --device-type cisco_nxos --output-dir /var/backups/network

When you run the script, you will see output like this:

[*] Connecting to 192.168.1.1:22 (cisco_ios)...
[+] Successfully connected to 192.168.1.1
[*] Retrieving running-config...
[+] Retrieved 15234 characters of configuration
[+] Configuration saved to backups/192_168_1_1_running-config_2026-04-10_14-30-00.txt
[+] Disconnected. Backup complete!

What is Next

This script provides a solid foundation for network configuration backups. Here are some ideas for extending it:

  • Loop through a list of devices from a CSV or YAML inventory file to back up your entire network in one run
  • Schedule the script with cron (Linux) or Task Scheduler (Windows) for automatic daily backups
  • Add email or webhook notifications on success or failure
  • Compare configurations between backups to detect unauthorized changes using difflib
  • Integrate with Git to version-control your configurations automatically
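The difflib idea from the list above is only a few lines. Here is a minimal sketch comparing two configuration snapshots held in memory; in practice, you would read them from two of the timestamped backup files.

```python
import difflib

# Two configuration snapshots; in practice these come from backup files
old_config = [
    "hostname R1\n",
    "interface GigabitEthernet0/1\n",
    " ip address 10.0.0.1 255.255.255.0\n",
]
new_config = [
    "hostname R1\n",
    "interface GigabitEthernet0/1\n",
    " ip address 10.0.0.2 255.255.255.0\n",
]

# unified_diff yields only the changed lines, with file labels and context
diff = list(difflib.unified_diff(old_config, new_config,
                                 fromfile="backup_old", tofile="backup_new"))
print("".join(diff))
```

An empty diff means no change between backups; any output is a candidate for an "unauthorized change" alert.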

Wrapping Up

Automating network device backups does not have to be complicated. With Python and Netmiko, you can connect to any router or switch, pull the running configuration, and save it to a timestamped file in just a few lines of code.

Check out the full source code and setup instructions in the GitHub repository: https://github.com/NetworkThinkTank-Labs/backup-config-script

If you found this useful, check out my other posts on network automation including Monitoring IP SLAs with Python, DNA Center, and NetBox and Build a Home Lab Like a Pro. Stay tuned for more content from the NetworkThinkTank!

Monitoring IP SLAs with Python, DNA Center, and NetBox

Automate IP SLA monitoring across your network using Python, Cisco DNA Center APIs, and NetBox as your source of truth.

GitHub Repo: https://github.com/NetworkThinkTank-Labs/ip-sla-monitor

Introduction

If you manage a network of any size, you know that keeping tabs on performance metrics like latency, jitter, and packet loss is critical. Cisco IP SLA (Service Level Agreement) operations are the go-to feature for probing network paths and measuring these metrics directly from your routers and switches. But manually checking IP SLA statistics across dozens or hundreds of devices? That does not scale.

In this post, I will walk you through a Python-based tool I built that pulls IP SLA data from Cisco DNA Center via its REST API, enriches it with device metadata from NetBox, and generates automated performance reports. Whether you are running a handful of branch routers or a large enterprise campus, this approach gives you a scalable, repeatable way to monitor network performance.

What is IP SLA?

Cisco IP SLA is a built-in feature on Cisco IOS and IOS-XE devices that allows you to generate synthetic traffic to measure network performance. You can configure operations like ICMP echo (ping), UDP jitter, HTTP GET, and more. Each operation continuously measures metrics such as round-trip time (RTT), latency, jitter, packet loss, and availability. These metrics are essential for validating SLA compliance, troubleshooting performance issues, and capacity planning.
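For reference, a basic ICMP echo operation configured on an IOS device looks something like this (the destination, source, and timer values are placeholders, not from the project):

```
ip sla 10
 icmp-echo 8.8.8.8 source-ip 192.0.2.1
 frequency 60
ip sla schedule 10 life forever start-time now
```

Once scheduled, `show ip sla statistics` on the device reports the measured RTT and related metrics, which is exactly the data this tool collects centrally instead of per-device.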

The Tools

This project brings together three key components. First, Python does the heavy lifting for API calls, data parsing, and report generation. Second, Cisco DNA Center provides a centralized REST API for pulling device inventory and running CLI commands across your entire network without SSH-ing into each device individually. Third, NetBox acts as our network source of truth, storing device metadata like site assignments, roles, platforms, and IP addresses that we use to enrich the raw SLA data.

How It Works

The IP SLA Monitor tool follows a simple three-step workflow:

  1. Authenticate with DNA Center and pull IP SLA operation statistics from all monitored devices using the command-runner API.
  2. Query NetBox for each device to enrich the data with site name, device role, platform, and management IP.
  3. Evaluate each SLA operation against configurable thresholds for latency, jitter, and packet loss, then generate JSON and CSV reports.

The tool can run as a one-shot collection or in a continuous monitoring loop with a configurable polling interval. Alerts are logged to the console when any operation exceeds your defined thresholds.

The Python Code

The project is organized into three main scripts:

ip_sla_monitor.py is the main orchestration script that ties everything together. It loads configuration from a .env file, initializes the DNA Center and NetBox clients, collects SLA data, enriches it, evaluates thresholds, and saves reports.

dnac_integration.py handles all communication with the Cisco DNA Center REST API including authentication, device inventory retrieval, and IP SLA data collection via the command-runner API.

netbox_integration.py connects to the NetBox API to look up device metadata by hostname, returning site assignments, device roles, platform types, and IP addresses.

Getting Started

Getting up and running is straightforward. Clone the repository, set up a virtual environment, install the dependencies from requirements.txt, and configure your .env file with your DNA Center and NetBox credentials. The only Python packages required are requests for HTTP API calls, python-dotenv for environment variable management, and urllib3. No complex frameworks or heavy dependencies. Full setup instructions are in the GitHub repo README.

Threshold Alerting

One of the most useful features is configurable threshold alerting. You define your acceptable limits for latency, jitter, and packet loss in the .env file, and the tool flags any SLA operation that exceeds those limits. For example, with default thresholds of 100ms latency, 30ms jitter, and 1% packet loss, a branch router showing 115ms latency and 2.1% packet loss would be immediately flagged as an alert in the console output and reports.
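The check itself is simple. Here is a minimal sketch of the logic described above, using the post's default limits; the metric names and data structure are illustrative, not the repo's actual code.

```python
# Default limits from the post: 100 ms latency, 30 ms jitter, 1% packet loss
THRESHOLDS = {"latency_ms": 100.0, "jitter_ms": 30.0, "packet_loss_pct": 1.0}

def evaluate(operation):
    """Return a human-readable violation for each metric over its limit."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = operation.get(metric, 0.0)
        if value > limit:
            alerts.append(f"{metric}={value} exceeds limit {limit}")
    return alerts

# The branch-router example from the post: 115 ms latency, 2.1% packet loss
alerts = evaluate({"latency_ms": 115.0, "jitter_ms": 12.0, "packet_loss_pct": 2.1})
for a in alerts:
    print("ALERT:", a)
```

In the real tool the limits come from the .env file rather than a hard-coded dictionary, but the comparison logic is the same idea.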

Sample Output

The tool generates both JSON and CSV reports. The JSON report includes a summary section with total operations, passing and failing counts, and average latency, followed by detailed per-operation data enriched with NetBox metadata. The CSV report provides the same data in a tabular format that you can easily import into Excel or feed into other monitoring tools. Sample output files are included in the GitHub repository under the output directory.

What is Next

This project is a solid foundation, but there is plenty of room to extend it. Some ideas for future enhancements include adding webhook or email notifications for alerts, integrating with Grafana for real-time dashboards, storing historical data in a time-series database like InfluxDB, and expanding the command-runner integration to pull live SLA statistics directly from devices.

Wrapping Up

Python automation combined with APIs from DNA Center and NetBox gives network engineers a powerful toolkit for monitoring IP SLAs at scale. Instead of manually checking IP SLA stats on individual devices, you can automate the entire workflow and get enriched reports in minutes.

Check out the full source code, sample outputs, and setup instructions in the GitHub repository: https://github.com/NetworkThinkTank-Labs/ip-sla-monitor

If you found this useful, check out my other posts on network automation including Automating Network Device Backups with Python and Netmiko and Build a Home Lab Like a Pro. Stay tuned for more content from the NetworkThinkTank!

Build a Home Lab Like a Pro

Transform your spare room into a career-accelerating network laboratory



If you’re serious about leveling up your networking skills, there’s no substitute for hands-on experience. Certifications are great. Books are essential. But nothing cements your understanding of BGP peering, VXLAN fabrics, or automation workflows like building it yourself and watching packets traverse your own infrastructure.

After 20+ years in networking — from pulling cable to managing backbone infrastructure at Lumen — I can tell you that the engineers who stand out are the ones who lab. They break things on purpose, fix them under pressure, and walk into production environments with confidence.

In this guide, I’ll walk you through everything you need to build a professional-grade home lab, from hardware selection to virtualization platforms to topology design. And because I believe in open-source learning, I’ve created a companion GitHub repository with sample configs, topology files, and scripts you can use to get started immediately.

GitHub Repo: NetworkThinkTank-Labs/home-lab-guide



Why Build a Home Lab?

Let’s get the obvious out of the way: you can’t become a great network engineer by reading alone. Here’s why a home lab is one of the best investments you can make in your career:


  • Hands-on practice for certifications — CCNA, CCNP, JNCIA, and beyond all require you to understand how protocols actually behave, not just how they’re described in RFCs.
  • Safe environment to break things — Production networks don’t forgive mistakes. Your lab does. Misconfigure OSPF areas? Blow up a spanning-tree topology? No pager going off at 2 AM.
  • Real-world skill building — Employers want engineers who can troubleshoot, not just configure. A lab gives you the reps.
  • Automation testing ground — Python scripts, Ansible playbooks, Netmiko sessions — test them here before you touch production.
  • Portfolio building — Document your labs on GitHub and your blog. Show hiring managers what you can do, not just what you’ve memorized.


Hardware Recommendations

You don’t need to spend thousands of dollars to build an effective lab. Here’s a tiered approach based on budget and goals.

Tier 1: Budget Build ($200-$500)

Perfect for getting started with virtualization and basic routing/switching labs.

Component | Recommendation | Estimated Cost
Server | Dell OptiPlex 7050/7060 (used) or HP EliteDesk 800 G3 | $100-$200
RAM | 32 GB DDR4 (upgrade if needed) | $40-$60
Storage | 500 GB SSD + 1 TB HDD | $50-$80
Switch | Managed switch (Netgear GS308T or TP-Link TL-SG108E) | $30-$50
Misc | USB-to-serial console cable, Ethernet cables, power strip | $20-$40

Tier 2: Intermediate Build ($500-$1,500)

For engineers running multiple VMs, nested virtualization, and more complex topologies.

Component | Recommendation | Estimated Cost
Server | Dell PowerEdge R720/R730 or HP ProLiant DL380 Gen9 | $200-$500
RAM | 64-128 GB DDR4 ECC | $100-$200
Storage | 1 TB NVMe SSD + 2 TB HDD | $100-$200
Network | Intel X520-DA2 10GbE NIC (SFP+) | $30-$50
Switch | Cisco Catalyst 2960-X or Arista 7010T (used) | $50-$150
Firewall | Netgate SG-1100 (pfSense) or old PC with OPNsense | $100-$200
UPS | APC Back-UPS 600VA | $60-$80

Virtualization Platforms

This is where the magic happens. Modern network labs are predominantly virtual, and the platform you choose shapes your entire lab experience.

Proxmox VE (My Top Pick for Home Labs)

Why: Free, open-source virtualization built on KVM and LXC. Runs full network-OS VMs and Linux containers side by side, supports GPU passthrough, and can host GNS3 or EVE-NG as nested VMs. Best as the general-purpose foundation under everything else in your lab.

GNS3

Why: The OG network emulator. Runs real Cisco IOS/IOU images and integrates with QEMU/KVM for full OS emulation. Best for Cisco-centric labs and certification prep (CCNA/CCNP). Pair with our BGP Fundamentals Lab and VPN IPSec/GRE Lab.


EVE-NG

Why: Web-based, multi-vendor support, great for complex topologies with dozens of nodes. Best for multi-vendor labs (Cisco, Juniper, Arista, Palo Alto, Fortinet). Pair with our EVPN-VXLAN Lab for data center fabric simulation.


Containerlab

Why: Lightweight, code-defined network topologies using containers. Perfect for modern network automation workflows. Uses Docker containers running network OS images (cEOS, FRR, Nokia SR Linux). Define topologies in YAML and version control them in Git. Pair with our Network Automation with Ansible Lab.
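To give a flavor of the code-defined approach, a two-node containerlab topology is just a few lines of YAML (node kinds and image names here are illustrative; check the containerlab docs for current values):

```
# save as demo.clab.yml and deploy with: containerlab deploy -t demo.clab.yml
name: demo
topology:
  nodes:
    r1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux
    r2:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux
  links:
    - endpoints: ["r1:e1-1", "r2:e1-1"]
```

Because the whole topology is one text file, it drops straight into Git alongside your configs and automation scripts.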



Designing Your Network Topology

A well-designed lab topology mirrors real-world architectures. Here are three topologies I recommend, progressing from simple to advanced.

Topology 1: The Starter Lab

Internet > Firewall (pfSense/OPNsense) > Core Switch (L3, VLAN trunking) > VLANs (Management, Lab, Servers)

What you’ll learn: VLANs, inter-VLAN routing, firewall rules, NAT, DHCP


Topology 2: The Enterprise Lab

Dual ISP routers with eBGP > Edge Router (BGP AS 65000) > Core switches (OSPF, LACP) > Access switches

What you’ll learn: BGP peering, OSPF areas, HSRP/VRRP, link aggregation, redundancy


Topology 3: The Data Center Fabric

Spine-01/Spine-02 > Leaf-01 through Leaf-04 > Server clusters

What you’ll learn: EVPN-VXLAN, BGP underlay/overlay, spine-leaf architecture, network automation at scale. Check out our EVPN-VXLAN Lab for a full walkthrough.



Essential Software Tools

Network Management and Monitoring

  • LibreNMS or Zabbix – SNMP-based monitoring, alerting, graphing
  • Grafana + Prometheus – Modern dashboards and metrics visualization
  • Wireshark – Packet capture and protocol analysis

Automation and DevOps

  • Ansible – Agentless configuration management (Ansible automation lab)
  • Python + Netmiko/NAPALM – Script-driven device management (network-backup-automation repo)
  • Git – Version control everything: configs, scripts, documentation
  • Docker – Containerize your tools and services


The Lab Exercise Workflow

Here’s how I approach lab exercises, and how you can use the NetworkThinkTank-Labs repositories to follow along:

  1. Define Your Objective – Pick a protocol or concept. Example: “I want to understand eBGP peering between two autonomous systems.”
  2. Build the Topology – Use GNS3, EVE-NG, or Containerlab to spin up the required nodes. Clone the relevant repo.
  3. Configure – Follow the lab guide’s step-by-step configs. Type them manually — don’t just paste. Muscle memory matters.
  4. Verify – Use show commands, ping tests, traceroutes, and Wireshark captures to verify your work.
  5. Break It – Intentionally introduce failures. Shut down a link. Add a route filter. Change an AS number. Watch what happens and troubleshoot.
  6. Document – Write up what you learned. Push your configs to your own GitHub fork. Blog about it.


Automating Network Device Backups with Python and Netmiko

As network engineers, one of our most critical yet tedious responsibilities is maintaining up-to-date backups of device configurations. Whether you manage a handful of switches or hundreds of routers across multiple sites, manually logging into each device to copy its running configuration is time-consuming, error-prone, and simply does not scale.

In this post, I will walk you through a Python automation script I built that connects to network devices via SSH, pulls their running configurations, and saves them to organized backup files, all in parallel. The full project is available on my GitHub repository.

GitHub Repository: https://github.com/jczaldivar71/network-backup-automation

Why Automate Network Backups?

Before diving into the code, let us consider why automation matters here. Manual backups are inconsistent since engineers may forget devices or skip them during busy periods. They are slow because logging into devices one at a time does not scale. They lack accountability with no automatic logging of what was backed up and when. They are error-prone since copy-paste mistakes can result in incomplete or corrupted backup files.

An automated solution runs on a schedule, covers every device in your inventory, logs every action, and produces consistent, timestamped backup files every single time.

What the Script Does

The network backup automation script provides several key features. It supports multiple platforms including Cisco IOS, Cisco ASA, Cisco NX-OS, Juniper JunOS, and Arista EOS. It uses concurrent connections via Python ThreadPoolExecutor to back up multiple devices simultaneously. Backups are organized into date-stamped directories for easy retrieval. Each backup includes metadata headers showing the device hostname, IP address, device type, and timestamp. The script generates a JSON summary report after each run showing successes and failures. It also provides comprehensive error handling for timeouts, authentication failures, and connection issues.

Project Structure

The project is organized as follows. The main script is network_backup.py which contains all the automation logic. The requirements.txt file lists the Python dependencies, primarily Netmiko and Paramiko. The inventory.json file is a sample device inventory template where you define your network devices. The .gitignore file ensures backup files and sensitive data are not accidentally committed to version control. The LICENSE file contains the MIT open-source license.

How It Works

The script follows a straightforward workflow. First, it loads your device inventory from a JSON file. Each device entry includes the hostname, IP address, device type, credentials, and optional enable secret. Second, it creates a date-stamped backup directory to keep backups organized by day. Third, it spawns multiple worker threads using ThreadPoolExecutor to connect to devices in parallel. Fourth, for each device, it establishes an SSH connection using Netmiko, enters enable mode if needed, and runs the appropriate show command for that platform. Fifth, the configuration output is saved to a file with a metadata header. Finally, a JSON report is generated summarizing the results of the entire backup run.

The Device Inventory

The inventory.json file is where you define all the devices you want to back up. Here is an example of what it looks like. Each device entry includes the hostname for identification, the host IP address, the device_type which tells Netmiko how to communicate with the device, the username and password for SSH authentication, the port number which defaults to 22, and an optional secret for enable mode on Cisco devices.
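A plausible shape for that inventory, reconstructed from the field list above (all values are placeholders; see the repo's inventory.json template for the authoritative format):

```
[
  {
    "hostname": "core-sw1",
    "host": "192.168.1.10",
    "device_type": "cisco_ios",
    "username": "admin",
    "password": "yourpassword",
    "port": 22,
    "secret": "yourenablesecret"
  }
]
```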

Key Code Highlights

The BACKUP_COMMANDS dictionary maps each device type to the appropriate command for retrieving the running configuration. For Cisco IOS, ASA, and NX-OS devices, it uses “show running-config”. For Juniper JunOS devices, it uses “show configuration | display set”. For Arista EOS, it also uses “show running-config”.
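In sketch form, that mapping looks like this (an illustrative reconstruction from the description above, not the repo's exact source):

```python
# Map each Netmiko device_type to the command that dumps its configuration
BACKUP_COMMANDS = {
    "cisco_ios": "show running-config",
    "cisco_asa": "show running-config",
    "cisco_nxos": "show running-config",
    "juniper_junos": "show configuration | display set",
    "arista_eos": "show running-config",
}

print(BACKUP_COMMANDS["juniper_junos"])  # show configuration | display set
```

Adding support for a new platform is then a one-line change: add the Netmiko device_type and its config-dump command to the dictionary.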

The backup_device function is the core of the script. It takes a device dictionary and backup directory path, establishes the SSH connection using Netmiko ConnectHandler, enters enable mode if a secret is provided, runs the backup command, and saves the output with metadata headers to a timestamped config file.

The run_backups function orchestrates the entire process. It loads the inventory, creates the backup directory, then uses ThreadPoolExecutor to run backups in parallel across all devices. After all backups complete, it logs a summary and generates a JSON report file.
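The fan-out pattern looks roughly like this (a self-contained sketch with a stub standing in for the real Netmiko-based backup function; names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def backup_device(device):
    # Stub standing in for the Netmiko connect-and-backup logic
    return {"hostname": device["hostname"], "status": "success"}

inventory = [{"hostname": "core-sw1"}, {"hostname": "edge-rtr1"}, {"hostname": "dc-leaf1"}]

# Run backups in parallel; map() preserves inventory order in the results
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(backup_device, inventory))

summary = {"success": sum(r["status"] == "success" for r in results),
           "failed": sum(r["status"] != "success" for r in results)}
print(summary)  # {'success': 3, 'failed': 0}
```

Because each backup is I/O-bound (waiting on SSH), threads give a near-linear speedup up to the worker count without any multiprocessing complexity.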

Getting Started

To use this script in your own environment, follow these steps. First, clone the repository from GitHub. Second, install the dependencies using pip install with the requirements.txt file. Third, edit inventory.json with your actual device information including hostnames, IP addresses, credentials, and device types. Fourth, run the script using python network_backup.py. You can also customize the behavior with command-line arguments such as specifying a different inventory file with the -i flag, changing the output directory with -o, adjusting the number of concurrent workers with -w, or enabling verbose debug logging with -v.

Security Considerations

Since this script handles network device credentials, security is paramount. Never commit your actual inventory.json file with real credentials to version control, which is why it is included in the .gitignore file. Consider using environment variables or a secrets manager for credentials in production. Restrict file permissions on the backup directory since configuration files may contain sensitive information. Use SSH key-based authentication instead of passwords when possible. Run the script from a secured management workstation or jump box.

What is Next

This script provides a solid foundation, but there are many ways to extend it. You could add email or Slack notifications for backup failures. You could integrate with a scheduling system like cron or Windows Task Scheduler to run backups automatically. You could implement configuration diff detection to alert you when configurations change unexpectedly. You could add support for additional device types. You could also store backups in a Git repository for version-controlled configuration management.

Conclusion

Network automation does not have to be overwhelming. Starting with a practical use case like configuration backups is an excellent way to build your Python skills while immediately adding value to your organization. The script handles the complexity of multi-vendor support, concurrent connections, and error handling so you can focus on what matters most, keeping your network safe and well-documented.

Check out the full project on GitHub: https://github.com/jczaldivar71/network-backup-automation

Feel free to fork it, customize it for your environment, and let me know how it works for you. Happy automating!

AI-Powered Networking: How Artificial Intelligence is Transforming Network Management in 2026

The networking landscape is undergoing a seismic shift. Artificial Intelligence (AI) is no longer a futuristic concept — it’s actively reshaping how networks are designed, monitored, and managed. From self-healing infrastructure to predictive threat detection, AI-powered networking is the hottest trend in IT right now.

What is AI-Powered Networking?

AI-powered networking refers to the use of machine learning (ML), deep learning, and intelligent automation to manage, optimize, and secure networks. Unlike traditional networks that rely on manual configurations and reactive troubleshooting, AI-driven networks are proactive, adaptive, and self-optimizing.

Key Trends Driving AI in Networking in 2026

1. Intent-Based Networking (IBN)

Intent-Based Networking allows administrators to define desired network outcomes in plain language, and the AI automatically translates those intentions into configurations. Cisco’s DNA Center and similar platforms are leading this revolution, dramatically reducing human error and configuration time.

2. AIOps for Network Operations

AIOps (Artificial Intelligence for IT Operations) platforms are now mainstream in large enterprises. These tools correlate data from multiple sources, detect anomalies before they cause outages, and even recommend or automatically apply fixes. Tools like Moogsoft, Splunk, and Cisco ThousandEyes are at the forefront of this trend.

3. AI-Driven Network Security

Cybersecurity threats are evolving faster than human analysts can respond. AI-powered security tools like Darktrace, CrowdStrike, and Palo Alto’s Cortex XDR use behavioral analytics and machine learning to detect zero-day threats, insider attacks, and advanced persistent threats (APTs) in real time.

4. Smart SD-WAN with AI Optimization

SD-WAN has been a hot topic for years, but in 2026, AI is taking it to the next level. AI-enhanced SD-WAN solutions dynamically route traffic based on real-time application performance data, automatically shifting workloads between MPLS, broadband, and 5G links to guarantee optimal user experience.

5. Autonomous Networks (Zero-Touch Provisioning)

Zero-touch provisioning powered by AI enables network devices to be automatically configured and deployed without manual intervention. This is critical for the massive scale of IoT deployments, edge computing, and 5G infrastructure rollouts happening globally.

Real-World Benefits of AI in Networking

  • Reduced downtime: Predictive analytics identify potential failures hours or days before they occur.
  • Faster troubleshooting: AI reduces Mean Time to Resolution (MTTR) by up to 90% in some deployments.
  • Enhanced security posture: Continuous behavioral monitoring catches threats that signature-based tools miss.
  • Operational cost savings: Automation reduces the need for manual intervention, lowering OpEx significantly.
  • Improved user experience: Dynamic traffic shaping ensures applications always have the bandwidth they need.

Challenges and Considerations

While the promise of AI networking is immense, it’s not without challenges. Data privacy concerns, the need for large volumes of quality training data, the risk of AI model bias, and the shortage of skilled professionals who understand both networking and AI are all hurdles that organizations must navigate carefully.

Conclusion

AI-powered networking is no longer optional for organizations that want to stay competitive. Whether you’re a network engineer looking to upskill, an IT manager evaluating new solutions, or a business leader planning digital transformation, understanding AI’s role in networking is essential. The future of networking is intelligent, autonomous, and AI-driven — and that future is already here.

Retro Vibes and Pixel Dreams: Unleashing Creativity in a Digital World

Welcome to the digital den of dreams where creativity flows as freely as the pixels on a screen! Today, let’s dive into a realm that marries nostalgia with cutting-edge digital artistry, as displayed in the fascinating pixel art illustrations before us.

First up, we’re thrust into a nostalgic setup reminiscent of the late 80s computing era, but with a modern twist. The scene is littered with the tools of the digital artisan: floppy disks, a pixelated lamp shedding light on a keyboard below, and a book that proclaims, “The best fight is the one you’re not in,” a playful nod to both strategic pacifism and perhaps the avoidance of digital overload.

The central piece of this pixelated puzzle is the computer screen, displaying an intricate graphic that seems to burst forth from the confines of its digital bounds. This isn’t just any display; it’s a celebration of digital potential, of binary bits exploding into a symphony of information. Is it a new software or perhaps a gateway into the ever-expanding internet universe? The mysteries of technology beckon!

Now, let’s focus on the portrait of our digital pioneer in the second illustration. Here, we meet a visionary, the face behind the creativity. With a stylish pair of glasses and a look of serene confidence, he represents the modern creator. His environment is modest, with subtle hints of a life dedicated to digital creation: a strategically placed gaming chair for those marathon coding sessions, and a plush companion to keep the atmosphere light.

This isn’t just a picture; it’s a story. It tells of late nights turned into early mornings, where the glow from the monitor is the only light. It speaks of the quiet dedication of turning code into art, and art into a shared experience that transcends physical boundaries. Our creator isn’t just making games, software, or graphics; he’s crafting worlds.

In a blend of retro aesthetics and contemporary digital craft, these images offer more than just a visual treat; they are a call to embrace our digital tools and create something meaningful. Whether it’s writing code, designing graphics, or simply dreaming up worlds, the digital canvas is vast and forgiving.

So, dear readers, let’s take inspiration from our digital artisan here. Grab your tools of creation—be they as outdated as a floppy disk or as advanced as the latest graphics tablet—and paint your pixels. After all, in the world of digital art, the only limit is your imagination. Ready, set, create!

Iron Man Loses Control of His Suit and Crashes into Half the City

Iron Man was flying over New York City, enjoying the view and feeling proud of his latest invention. He had upgraded his suit with a new feature that allowed him to control it with his mind. He could fly faster, shoot more accurately, and perform amazing maneuvers without using his hands or voice.

He decided to test his new feature by doing a loop in the air. He focused his mind on the direction he wanted to go and felt the suit respond. He soared up into the sky, then curved down in a perfect circle. He was about to complete the loop when he heard a loud noise in his ear.

“Hey, Tony, are you there?” It was Pepper, his girlfriend and assistant, calling him on his phone. She sounded worried.

“Uh, yeah, I’m here. What’s up?” Iron Man said, trying to sound casual.

“I need you to come back to the tower right now. There’s an emergency.”

“What kind of emergency?”

“It’s…it’s hard to explain. Just trust me, you need to see this.”

“Okay, okay, I’m on my way. Just give me a minute.”

Iron Man tried to end the call, but he realized he had a problem. He had forgotten to turn off the mind control feature. His suit was still following his thoughts, and his thoughts were now distracted by Pepper’s call. He felt the suit veer off course and lose altitude. He looked down and saw the buildings and cars below him getting closer.

“Uh-oh,” he said.

He tried to regain control of the suit, but it was too late. He crashed into a billboard, then bounced off and hit a bus, then a fire hydrant, then a hot dog stand. He caused a huge mess and a lot of damage. People screamed and ran away from him. He finally came to a stop on the sidewalk, lying on his back, covered in ketchup and mustard.

He groaned and looked up. He saw a crowd of people staring at him, some with cameras and phones. He saw a news helicopter hovering above him. He saw the billboard he had hit, which read: “Lose your balance? Try our new yoga classes!”

He felt a surge of embarrassment and anger. He activated his phone and called Pepper back.

“Pepper, what was the emergency?” he asked.

“Oh, nothing, really. I just wanted to tell you that I love you and I miss you.”

“You…you what?”

“I love you and I miss you. And I’m sorry for interrupting your flight. I just wanted to hear your voice.”

“Pepper, do you have any idea what you just did? You made me lose control of my suit and crash into half of the city. I’m on live TV right now, looking like a fool. Everyone thinks I’m a clumsy idiot. Do you know how bad this is for my reputation?”

“Oh, Tony, I’m so sorry. I didn’t know you were using your mind control feature. I thought you were just flying normally. I didn’t mean to cause you any trouble. Please forgive me.”

Iron Man sighed. He looked at the crowd again. He saw some kids laughing and pointing at him. He saw a cop approaching him with a ticket. He saw a reporter running towards him with a microphone.

He realized he had no choice. He had to forgive Pepper. He loved her too much to stay mad at her. He also realized he had to fix his suit. He needed to add a switch or a button or something to turn off the mind control feature. He couldn’t risk another accident like this.

He smiled and said to Pepper: “It’s okay, Pepper. I forgive you. I love you and I miss you too. But I have to go now. I have some explaining to do.”

He hung up the phone and got up. He put on his helmet and activated his thrusters. He flew away from the scene, hoping to avoid any more trouble.

He hoped that the next time he used his mind control feature, he would be more careful and focused. He hoped that the next time he talked to Pepper, he would be more attentive and sweet. He hoped that the next time he flew over New York City, he would not lose his balance.