This guide covers the essential steps for troubleshooting Proxmox boot failures and network recovery issues. Whether you’re dealing with a Proxmox VE node that won’t boot, network interfaces that fail to initialize, or connectivity problems after an update, this visual reference provides a structured approach to diagnosing and resolving common Proxmox infrastructure issues.
Today I completed a BIOS update on my MSI MAG X570S Tomahawk Max WiFi motherboard, upgrading from the original BIOS Version 1.00 (dated 07/06/2021) to the latest Version 1.D1 (7D54v1D1, dated 09/19/2025). This update was performed using MSI’s M-FLASH utility as part of my Proxmox homelab infrastructure maintenance.
The visual guide above outlines the complete BIOS update process using MSI’s M-FLASH utility. Here’s a summary of the key steps performed:
Downloaded BIOS version 7D54v1D1 from MSI official support page
Extracted BIOS file (E7D54AMS.1D1) to a FAT32-formatted USB drive
Stopped all Proxmox VMs and containers before rebooting
Rebooted into BIOS and used M-FLASH to flash the new BIOS
Confirmed BIOS updated to Version 1.D1 (Release Date: 09/19/2025) via dmidecode
For detailed step-by-step documentation of this BIOS update process, visit the NetworkThinkTank-Labs GitHub repository. This motherboard serves as the foundation for my Proxmox homelab running GPU passthrough with an NVIDIA RTX 3060 Ti for AI workloads.
Some days in the homelab are quiet — a config tweak here, a firmware update there. And then there are days like today. April 29, 2026, turned into a full-blown infrastructure marathon: eight distinct projects spanning networking, virtualization, AI deployment, storage management, and documentation. Here is a complete rundown of everything that got done.
1. GitHub Documentation — 565 Lines of Technical Writing
Documentation is the backbone of any serious homelab. Today I pushed 565 lines of new documentation across multiple GitHub repositories. This included updated READMEs, configuration guides, topology diagrams, and step-by-step walkthroughs. Every project in my lab now has proper technical documentation that anyone can follow to replicate the setup. If it is not documented, it did not happen — and today, it all got documented.
2. EVE-NG CCNA Lab Updates — 29 Nodes
My EVE-NG CCNA lab got a major overhaul. The lab now contains 29 active nodes, covering routing, switching, and network services. This includes Cisco IOS routers and switches configured for OSPF, EIGRP, BGP, VLANs, STP, ACLs, NAT, and DHCP. The lab features API troubleshooting support, Proxmox migration readiness via Terraform and Ansible, and qcow2 image management. Whether you are studying for the CCNA or just want a robust network simulation environment, this lab has you covered.
3. Blog Post Publishing — 2,943 Words
Earlier this month, I published a comprehensive 2,943-word blog post on the Network ThinkTank blog covering how to self-host AI on a Proxmox homelab with Ollama and Open WebUI. Today’s writing adds to that momentum. Consistent publishing is key to building a knowledge base that helps both myself and the broader homelab community.
4. Ollama Models Deployment — 4 Models
Local AI is the future of privacy-conscious computing. Today I deployed four Ollama models on my Proxmox homelab, running inference entirely on local hardware. The models are served through Open WebUI, giving me a polished ChatGPT-like interface without any data leaving my network. No API keys, no cloud dependency, no privacy concerns — just pure local LLM power. The models cover different use cases from general conversation to code generation and technical assistance.
5. OpenClaw AI Agent Deployment
OpenClaw, an AI agent framework, was deployed and configured in the homelab today. This adds autonomous AI agent capabilities to the infrastructure, enabling task automation and intelligent workflows. The deployment involved setting up the agent runtime, configuring API endpoints, and testing basic agent interactions. This is a step toward building a more intelligent, self-managing homelab environment.
6. Windows Server VM Build
A fresh Windows Server virtual machine was built from scratch today. This VM will serve as a core infrastructure component for Active Directory, DNS, DHCP, and Group Policy management. The build process included creating the VM in the hypervisor, installing the OS, applying initial configurations, and setting up remote management. Having a Windows Server in the lab opens up enterprise-grade identity and access management capabilities.
7. NAS Storage Cleanup — ~20GB Freed
Storage hygiene is critical in any homelab environment. Today’s cleanup operation freed approximately 20GB of space on the NAS. This involved removing outdated VM snapshots, clearing old ISO images, purging stale Docker volumes, and archiving completed project files. A clean NAS is a happy NAS — and with 20GB reclaimed, there is plenty of room for new projects.
8. UniFi Network Server Installation
The UniFi Network Server was installed and configured today, bringing enterprise-grade network management to the homelab. This provides centralized control over UniFi access points, switches, and security gateways. The installation included setting up the controller software, adopting network devices, configuring wireless networks, and establishing monitoring dashboards. With UniFi in place, the entire network infrastructure can be managed from a single pane of glass.
Wrapping Up
Eight projects. One day. From AI deployments to network labs, from storage cleanup to documentation — today was a masterclass in homelab productivity. Every one of these projects builds on the others, creating a more capable, better-documented, and more resilient infrastructure.
The key takeaway? Documentation makes everything better. By writing things down — both in GitHub repos and blog posts — I am building a knowledge base that pays dividends every time I need to troubleshoot, replicate, or expand my setup.
If you are running a homelab, I encourage you to document your work, share your configs, and keep building. The community is stronger when we share what we learn.
Until next time — keep labbing.
Follow the Network ThinkTank blog for more homelab guides, networking tutorials, and infrastructure deep-dives. Check out the companion GitHub repositories at github.com/jczaldivar71 for configs, scripts, and technical documentation.
I got tired of sending my data to cloud AI services. Every prompt I typed into ChatGPT or Claude was being stored, analyzed, and used for training. For personal questions, code snippets with API keys, and private brainstorming sessions, that never sat well with me.
So I built my own. A fully self-hosted AI assistant running on my Proxmox homelab, powered by Ollama for local LLM inference and Open WebUI for a polished ChatGPT-like interface. The models run on my own NVIDIA GPU, the data stays on my NAS, and nothing leaves my network.
This guide walks you through exactly how I did it – from VM creation to pulling your first model and chatting with it through a clean web interface. If you have a Proxmox server and a spare GPU, you can have this running in an afternoon.
Prerequisites and Hardware Requirements
Here is what you need before starting:
Hardware
Proxmox VE host (version 7.x or 8.x)
NVIDIA GPU with at least 8GB VRAM (I use an RTX 3060 Ti 8GB)
Minimum 16GB RAM allocated to the AI VM (32GB recommended)
NAS with NFS or SMB shares available (Synology, TrueNAS, etc.)
At least 50GB free storage for models (100GB+ recommended)
Software
Proxmox VE installed and running
Ubuntu Server 22.04 or 24.04 LTS ISO
Docker and Docker Compose
NVIDIA drivers (535+ recommended)
NVIDIA Container Toolkit
Network
Static IP or DHCP reservation for the AI VM
Access to your NAS from the VM subnet
Optional: domain name for reverse proxy
Architecture Overview
The stack looks like this:
All AI processing happens locally on the GPU inside the VM. Open WebUI provides the browser-based chat interface and connects to Ollama’s API on the backend. The NAS stores all model files and conversation data so nothing is lost if the VM needs rebuilding.
Step 1: Preparing the Proxmox VM
First, create a new VM in Proxmox optimized for AI workloads.
Step 2: Installing NVIDIA Drivers and the Container Toolkit
Inside the VM, install the NVIDIA drivers (535 or newer) and the NVIDIA Container Toolkit, then run nvidia-smi to verify. Expected output shows your RTX 3060 Ti with the driver version and CUDA version. If nvidia-smi fails, check that GPU passthrough is configured correctly on the Proxmox host.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If this shows your GPU info inside the container, you are ready to deploy Ollama.
Step 3: Deploying Ollama
Create a project directory:
mkdir -p ~/ai-stack
cd ~/ai-stack
Create docker-compose.yml:
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/nas/ollama-models:/root/.ollama
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - /mnt/nas/openwebui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
Start the stack:
docker compose up -d
Pull your first model:
docker exec -it ollama ollama pull llama3.1:8b
This downloads the Llama 3.1 8B parameter model, which is an excellent starting point for an 8GB GPU. The download is roughly 4.7GB and will be stored on your NAS mount.
Other Recommended Models for 8GB VRAM
ollama pull mistral:7b # Great for general tasks
ollama pull codellama:7b # Optimized for coding
ollama pull llama3.1:8b-instruct # Best for chat interactions
ollama pull phi3:mini # Microsoft's compact model
ollama pull gemma2:9b # Google's open model
Test the Ollama API
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Hello, how are you?",
"stream": false
}'
If you get a JSON response with generated text, Ollama is working.
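The same test can be driven from Python using only the standard library. A minimal sketch, assuming Ollama is reachable at localhost:11434 (swap in your VM's IP if calling from another machine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # adjust to your VM's IP if remote

def build_generate_request(model: str, prompt: str, stream: bool = False) -> bytes:
    """Serialize the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()

def generate(model: str, prompt: str) -> str:
    """Send a non-streaming generate request and return the generated text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_generate_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example: print(generate("llama3.1:8b", "Hello, how are you?"))
```

With stream set to false, the full response arrives as one JSON object whose "response" field holds the generated text.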
Step 4: Configuring Open WebUI
Open your browser and navigate to:
http://<VM-IP>:3000
First-Time Setup
Create an admin account (first user automatically becomes admin)
Set a strong password – this is your AI assistant gateway
Open WebUI will auto-detect Ollama at the configured URL
Connecting to Ollama
Open WebUI should automatically connect to Ollama using the OLLAMA_BASE_URL environment variable we set in Docker Compose. Verify by clicking Settings > Connections and confirming the Ollama URL shows http://ollama:11434 with a green status.
Key Settings to Configure
Settings > General: Set default model to llama3.1:8b
Maximum context: 8192+ tokens (slower, more VRAM usage)
Set in your Modelfile or at runtime:
PARAMETER num_ctx 4096
Monitoring Resource Usage
watch -n 1 nvidia-smi # GPU monitoring
htop # CPU and RAM monitoring
docker stats # Container resource usage
iostat -x 1 # Disk I/O monitoring
Conclusion
After following this guide, you now have a fully self-hosted AI assistant running on your Proxmox homelab. Your data stays private, your models run locally on your GPU, and you have a clean web interface for interacting with multiple AI models.
The entire stack – Ollama for inference, Open WebUI for the interface, NAS for storage – runs reliably as a set of Docker containers inside a Proxmox VM with GPU passthrough. It survives reboots, updates cleanly, and scales as you add more models.
This is what homelabbing is about: taking control of your own infrastructure and running services that matter to you. A private AI assistant is one of the most practical and rewarding projects you can build today.
Real-World Deployment Tips
Start with small models first: Pull llama3.1:8b before anything else. It fits comfortably in 8GB VRAM and responds fast. Get everything working before experimenting with larger models.
Use NAS storage from day one: Do not store models on the VM’s local disk. When you inevitably rebuild the VM, you will lose hours re-downloading models. NAS storage makes rebuilds trivial.
Pin your Docker image versions: Use specific tags instead of “latest” in production. An unexpected update broke my Open WebUI setup once when the API format changed between versions.
Set OLLAMA_NUM_PARALLEL=1: On an 8GB card, running multiple concurrent requests causes out-of-memory crashes. Limit Ollama to one request at a time with this environment variable.
Monitor VRAM proactively: Add nvidia-smi -l 5 to a tmux session so you always see GPU memory usage. VRAM exhaustion causes silent failures that are hard to debug.
Enable Docker restart policies: The “unless-stopped” restart policy in our Docker Compose file means containers recover automatically after host reboots or crashes.
Test your NFS mounts under load: Some NAS devices throttle NFS under heavy I/O. Run model inference while monitoring NAS performance to catch bottlenecks early.
Keep a shell alias for quick model pulls: Add to your .bashrc:
alias opull='docker exec -it ollama ollama pull'
Then pulling models is just: opull mistral:7b
Honest Takeaways and Lessons Learned
Local LLMs are not ChatGPT replacements (yet): The 7B-9B models that fit on an 8GB GPU are impressive but noticeably less capable than GPT-4 or Claude for complex reasoning. They excel at drafting, summarization, code completion, and brainstorming. Manage your expectations accordingly.
GPU passthrough is the hardest part: Getting IOMMU groups clean, VFIO binding correct, and the GPU visible inside the VM took more troubleshooting than the entire rest of the stack combined. Once it works, it stays working, but expect 2-4 hours of debugging on your first attempt.
Open WebUI is surprisingly polished: I expected a rough open-source interface. Instead, Open WebUI is genuinely pleasant to use daily. The chat interface, model switching, conversation history, and document upload features rival commercial products.
Storage adds up fast: Each 7B model is 4-5GB. If you start collecting models (and you will), budget 100-200GB of NAS storage. I currently have 12 models taking up 67GB.
The privacy benefit is real: Once you start using a local AI for sensitive queries – tax questions, medical research, private code review – you realize how uncomfortable it was sending that data to third-party servers. This alone justifies the project.
Docker makes everything easier: Without Docker and the NVIDIA Container Toolkit, this setup would involve painful manual dependency management. The containerized approach means clean upgrades and easy rollbacks.
Community models keep getting better: The open-source LLM ecosystem is evolving rapidly. Models that were state-of-the-art six months ago are now outperformed by newer releases. Check Ollama’s model library regularly for improvements.
Common Pitfalls and How to Avoid Them
Pitfall 1: IOMMU Group Conflicts
Problem: Your GPU shares an IOMMU group with other devices. Solution: Check groups with:
find /sys/kernel/iommu_groups/ -type l
If your GPU is not in a clean group, you may need an ACS override patch or a different PCIe slot. Move the GPU to a slot that isolates it in its own IOMMU group.
Pitfall 2: NVIDIA Driver Conflicts on Proxmox Host
Problem: The Proxmox host loads NVIDIA drivers before VFIO can claim the GPU. Solution: Blacklist nouveau and nvidia in /etc/modprobe.d/blacklist.conf and ensure VFIO modules load first. Add softdep nvidia pre: vfio-pci to modprobe configuration.
Pitfall 3: Docker Cannot See the GPU
Problem: docker run --gpus all fails with “could not select device driver”. Solution: The NVIDIA Container Toolkit is not installed or not configured. Run:
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Pitfall 4: Open WebUI Cannot Reach Ollama
Problem: Open WebUI shows “Connection failed” for the Ollama backend. Solution: Ensure both containers are on the same Docker network (Docker Compose handles this automatically). Verify the OLLAMA_BASE_URL is set to http://ollama:11434 (using the container name, not localhost).
Pitfall 5: Models Disappear After VM Reboot
Problem: Downloaded models are gone after restarting the VM. Solution: The NFS/SMB mount is not persisting across reboots. Add the mount to /etc/fstab with the _netdev option, for example (adjust the NAS hostname and export path for your environment):
nas:/volume1/ai /mnt/nas nfs defaults,_netdev 0 0
Verify with sudo mount -a and confirm the mount survives a reboot.
Pitfall 6: Out of Memory (OOM) Crashes
Problem: Ollama crashes or returns errors during inference. Solution: You are likely running a model too large for your VRAM. Stick to 7B-8B models on 8GB cards. Set OLLAMA_NUM_PARALLEL=1 to prevent concurrent requests from exceeding VRAM. Monitor with nvidia-smi.
Pitfall 7: Slow Model Loading from NAS
Problem: Models take a very long time to load initially. Solution: NFS over a 1Gbps connection is the bottleneck. Models are 4-5GB each, so initial load takes 30-40 seconds. Consider 10Gbps networking or storing frequently-used models on local SSD with NAS as backup.
Pitfall 8: GPU Passthrough Breaks After Proxmox Update
Problem: GPU passthrough stops working after a Proxmox kernel update. Solution: Kernel updates can change IOMMU behavior. After updates, verify VFIO binding:
lspci -nnk -s 27:00
dmesg | grep -i vfio
update-initramfs -u -k all
Always test GPU passthrough after host kernel updates before relying on the AI assistant for important work.
LinkedIn Version
BUILDING A PRIVATE AI ASSISTANT ON MY HOMELAB
I built a self-hosted AI assistant using Ollama and Open WebUI, running on my Proxmox homelab with an NVIDIA RTX 3060 Ti.
Why? Privacy. Control. Learning.
Every prompt I type stays on my network. My models run on my GPU. My conversations are stored on my NAS. Nothing goes to the cloud.
The stack:
Proxmox VE for virtualization
Ubuntu VM with GPU passthrough (PCIe/IOMMU)
Ollama for local LLM inference
Open WebUI for a ChatGPT-like interface
NAS integration for persistent model storage
What surprised me:
Open WebUI is genuinely polished – rivals commercial AI interfaces
GPU passthrough was the hardest part (expect 2-4 hours first time)
7B/8B models on an 8GB GPU are great for daily tasks
The privacy benefit is more significant than I expected
The open-source AI ecosystem has matured to the point where running your own AI assistant is not just possible – it is practical.
If you have a homelab with a spare GPU, this is one of the most rewarding projects you can build right now.
Full setup guide on my blog: NetworkThinkTank.blog
Just deployed a fully self-hosted AI assistant on my Proxmox homelab using Ollama and Open WebUI – complete with GPU passthrough and NAS storage integration. Every prompt stays private, every model runs locally, and the web interface rivals ChatGPT. Full build guide with Docker configs and real deployment tips on the blog.
Follow-Up Article Ideas
“Scaling Up: Adding a Second GPU to Your Ollama Homelab for Larger Language Models” – Covers multi-GPU passthrough in Proxmox, running 13B and 70B parameter models across multiple GPUs, VRAM pooling strategies, and benchmarking multi-GPU vs. single-GPU inference performance.
“Building a RAG Pipeline: Teaching Your Self-Hosted AI About Your Own Documents” – Covers Retrieval Augmented Generation (RAG) setup with Open WebUI’s document upload feature, embedding models, vector databases (ChromaDB), indexing your personal knowledge base, and making your AI assistant an expert on your own files.
“Hardening Your Self-Hosted AI: Security Best Practices for Homelab LLM Deployments” – Covers network segmentation for AI services, authentication and access control in Open WebUI, SSL/TLS configuration, firewall rules, monitoring for unauthorized access, Docker security hardening, and safely exposing your AI assistant outside your home network with VPN or Cloudflare Tunnel.
After weeks of building, troubleshooting, and optimizing my CCNA lab environment, I am excited to share the entire project — now fully documented and open-sourced on GitHub. This post walks through the journey from an initial EVE-NG deployment to a fully automated Proxmox-based lab using Terraform, Ansible, and custom shell scripts.
The EVE-NG CCNA Lab project started as a straightforward network simulation environment for CCNA study. It quickly evolved into a full infrastructure-as-code project covering:
EVE-NG lab deployment with API-driven automation
Migration from EVE-NG to Proxmox for better performance and scalability
Custom shell scripts for image management, licensing, and node orchestration
A Python script (generate_readme.py) to auto-generate comprehensive documentation
qcow2 disk image optimization achieving a 39% storage reduction
Terraform and Ansible playbooks for reproducible infrastructure deployment
GitHub Documentation and the generate_readme.py Script
One of the key pieces of this project is the generate_readme.py Python script. Rather than manually maintaining a README that would inevitably fall out of sync with the actual project structure, I wrote a script that scans the repository and automatically generates a comprehensive README.md file.
The script inspects every directory — configs/, scripts/, terraform-ansible/, topology/, and images/ — and produces a fully formatted document with a table of contents, script references, setup instructions, and troubleshooting tips. Running it is as simple as:
cd scripts/
python generate_readme.py
The generated README covers 13 sections including Overview, Project Structure, Prerequisites, Quick Start, Lab Topology, Scripts Reference, qcow2 Image Management, EVE-NG API Usage, Proxmox Deployment, Configuration Files, Troubleshooting, Known Limitations, and License information. At 340 lines, it serves as a complete guide for anyone wanting to replicate or build upon this lab.
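The core idea of the script can be sketched in a few lines. This is a simplified illustration, not the full generate_readme.py from the repository, and the directory names are taken from the project layout described above:

```python
from pathlib import Path

# Directories the generator scans, per the project layout
SECTION_DIRS = ["configs", "scripts", "terraform-ansible", "topology", "images"]

def generate_readme(repo_root: str) -> str:
    """Walk the repo and emit a Markdown README listing each directory's files."""
    root = Path(repo_root)
    lines = ["# EVE-NG CCNA Lab", "", "## Project Structure", ""]
    for name in SECTION_DIRS:
        section = root / name
        if not section.is_dir():
            continue
        lines.append(f"### {name}/")
        for item in sorted(p.name for p in section.iterdir() if p.is_file()):
            lines.append(f"- `{item}`")
        lines.append("")
    return "\n".join(lines)

# Example: Path("README.md").write_text(generate_readme(".."))
```

Because the README is derived from the repository itself, re-running the script after any change keeps the documentation in sync with the actual project structure.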
EVE-NG to Proxmox Migration
EVE-NG is a fantastic network emulation platform, but I ran into limitations around resource management and integration with modern IaC tools. The decision to migrate to Proxmox was driven by several factors:
Better resource control: Proxmox provides fine-grained CPU, memory, and storage allocation through its API
Terraform integration: The Proxmox Terraform provider enables declarative infrastructure definitions
Thin provisioning: Proxmox handles thin-provisioned qcow2 images natively, which was critical for storage optimization
Ansible compatibility: Post-deployment configuration is seamless with Ansible playbooks targeting Proxmox VMs
The migration involved exporting router and switch images from EVE-NG, converting and optimizing the qcow2 disk images, and then redeploying them on Proxmox using Terraform. The entire workflow is captured in the terraform-ansible/ directory of the repository.
Automation Scripts
The scripts/ directory contains six purpose-built shell scripts that automate every aspect of lab management:
eve-ng-api-auth.sh: Handles cookie-based API authentication with EVE-NG, exporting session tokens for use in subsequent API calls. Includes examples for listing labs, getting node details, and starting all nodes.
start-lab-nodes.sh: Automates the process of starting all lab nodes through the EVE-NG REST API with proper sequencing and health checks.
scp-upload-images.sh: Securely transfers qcow2 images to the EVE-NG or Proxmox host via SCP with progress tracking and integrity verification.
qcow2-optimize.sh: The image optimization workhorse — converts, compresses, and thin-provisions qcow2 disk images (more on this below).
fix-permissions.sh: Ensures correct file ownership and permissions on EVE-NG image directories, a common source of lab startup failures.
iol-license-fix.sh: Generates and applies the proper IOL (IOS on Linux) license file, which is required for Cisco IOL images to boot correctly.
Each script is documented with usage instructions and can be run independently or chained together for a complete deployment workflow.
qcow2 Image Optimization
One of the most impactful parts of this project was optimizing the qcow2 disk images. Network appliance images (Cisco IOSv, IOSvL2, CSR1000v, etc.) often ship with significant wasted space — preallocated but unused disk blocks that consume real storage.
The qcow2-optimize.sh script automates a multi-step optimization pipeline:
Sparsification: Uses virt-sparsify to zero out unused blocks within the guest filesystem
Compression: Applies qcow2 internal compression via qemu-img convert -c
Thin provisioning: Ensures metadata is set for thin-provisioned allocation on the hypervisor
Integrity check: Runs qemu-img check to verify image health post-optimization
The results were significant: total image storage dropped from 30GB to 18.3GB — a 39% reduction. This is especially meaningful in a home lab where storage is often limited. The optimized images boot identically to the originals but consume far less disk space on the Proxmox host.
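The four-step pipeline can be sketched as a thin Python wrapper around the same tools the shell script drives (virt-sparsify and qemu-img). This is an illustrative sketch, with source and destination paths as placeholders:

```python
import subprocess

def optimize_pipeline(src: str, dst: str) -> list[list[str]]:
    """Build the sparsify -> compress -> check command sequence for one image."""
    return [
        ["virt-sparsify", "--in-place", src],                    # zero unused guest blocks
        ["qemu-img", "convert", "-O", "qcow2", "-c", src, dst],  # rewrite with compression
        ["qemu-img", "check", dst],                              # verify image health
    ]

def run_pipeline(src: str, dst: str) -> None:
    """Run each step in order, stopping on the first failure."""
    for cmd in optimize_pipeline(src, dst):
        subprocess.run(cmd, check=True)
```

Running the compressed convert to a new file (rather than in place) also gives a thin-provisioned copy, since qemu-img only writes allocated clusters.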
Terraform and Ansible Deployment
The final piece of the puzzle is fully automated deployment using Terraform and Ansible. The terraform-ansible/ directory contains everything needed to stand up the lab from scratch:
Terraform handles the infrastructure provisioning:
VM creation on Proxmox with defined CPU, memory, and disk parameters
Network interface configuration with VLAN tagging
Cloud-init integration for initial bootstrapping
State management for tracking deployed resources
Ansible manages the post-deployment configuration:
init-proxmox.yml: Initializes the Proxmox host with required packages, storage configuration, and network bridges
deploy-vm.yml: Deploys individual VMs with their specific configurations
remove-gateway.yml: Cleans up default gateway routes that can interfere with lab routing exercises
Configuration variables are stored in group_vars/all.yml (with a .sample template provided), and the hosts inventory file defines the Proxmox target. The ansible.cfg sets sensible defaults for host key checking and privilege escalation.
With this setup, spinning up a complete CCNA lab goes from a manual multi-hour process to a single scripted run of Terraform and Ansible.
This project is a living repository — I plan to continue adding to it as I progress through my CCNA studies and expand the lab. Future additions may include:
Additional topology configurations for specific CCNA exam topics
Integration with network monitoring tools
CI/CD pipeline for automated lab testing
Support for additional platforms (VIRL, GNS3)
If you are studying for the CCNA or building your own home lab, feel free to fork the repository and adapt it to your needs. Contributions and feedback are always welcome.
Hey everyone! If you have been following my blog, you know I love combining Python with network engineering. From automating backups with Netmiko to monitoring IP SLAs with DNA Center, I am always looking for ways to make our lives as network engineers easier. Today, I am excited to walk you through my latest project: an AI-Powered Network Health Checker. Don’t worry — this is totally beginner friendly. If you can write a basic Python script, you can follow along!
What Does This Tool Do?
In a nutshell, this tool pulls real-time data from your network devices (think CPU usage, memory utilization, interface errors, etc.), feeds that data into a simple machine learning model, and tells you whether each device is healthy or if there might be an issue. The output is super straightforward — you will see messages like “Device is healthy” or “Potential issue detected.” No PhD in data science required!
Step 1: Pulling Device Data with Python
Just like in my previous posts on network automation, we start by connecting to our devices and grabbing the data we need. I used the Netmiko library to SSH into each device and pull key metrics. Here is a simplified version of the script:
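Here is a minimal sketch of that idea. Exact show-command output varies by platform and IOS version, so treat the regexes as starting points rather than a finished parser:

```python
import re

def parse_cpu(output: str) -> float:
    """Pull the 5-second CPU utilization out of 'show processes cpu'."""
    match = re.search(r"five seconds: (\d+)%", output)
    return float(match.group(1)) if match else 0.0

def parse_memory(output: str) -> float:
    """Percent of processor-pool memory in use, from 'show memory statistics'."""
    match = re.search(r"Processor\s+\S+\s+(\d+)\s+(\d+)", output)
    return 100.0 * int(match.group(2)) / int(match.group(1)) if match else 0.0

def collect_metrics(device: dict) -> dict:
    """SSH to a device with Netmiko and return a small metrics dictionary."""
    from netmiko import ConnectHandler  # third-party: pip install netmiko
    with ConnectHandler(**device) as conn:
        return {
            "cpu": parse_cpu(conn.send_command("show processes cpu")),
            "memory": parse_memory(conn.send_command("show memory statistics")),
        }
```
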
This script connects to a Cisco IOS device, grabs CPU usage, memory utilization, and interface error counts. You can easily expand this to loop through multiple devices from an inventory file — just like we did in the backup config script project.
Step 2: Building a Simple ML Model for Anomaly Detection
Here is where the AI magic comes in — but I promise it is simpler than it sounds. We are using scikit-learn’s Isolation Forest algorithm, which is perfect for anomaly detection. It learns what “normal” looks like from your data and flags anything that seems off.
The Isolation Forest works by randomly partitioning data points. Anomalies are isolated faster because they are different from the majority of the data. The contamination parameter tells the model roughly what percentage of data points are expected to be anomalies — I set it to 0.1 (10%) as a starting point, but you can tune this for your environment.
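A minimal version of this step looks like the following, assuming you have a list of historical metric samples to train on (scikit-learn is required for the model itself):

```python
def to_features(metrics: dict) -> list[float]:
    """Fix the feature order so every sample lines up column-for-column."""
    return [metrics["cpu"], metrics["memory"], metrics["interface_errors"]]

def fit_detector(samples: list[list[float]], contamination: float = 0.1):
    """Train an Isolation Forest on what 'normal' looks like for your network."""
    from sklearn.ensemble import IsolationForest  # pip install scikit-learn
    model = IsolationForest(contamination=contamination, random_state=42)
    model.fit(samples)
    return model

def is_healthy(model, metrics: dict) -> bool:
    """IsolationForest.predict returns 1 for inliers and -1 for anomalies."""
    return model.predict([to_features(metrics)])[0] == 1
```

Fixing the random_state makes training reproducible, which helps while you tune the contamination value for your environment.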
Step 3: Putting It All Together
Now let us combine everything into a single script that loops through your devices, pulls the data, and runs it through the model:
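A sketch of the glue code: fetch and model stand in for whatever metrics collector and trained detector you built in the previous steps, injected here as parameters so the loop stays easy to test:

```python
def check_all(devices: list[dict], fetch, model) -> dict:
    """Map each device host to a plain-English health verdict."""
    results = {}
    for device in devices:
        vector = fetch(device)                # e.g. [cpu%, mem%, interface errors]
        verdict = model.predict([vector])[0]  # 1 = normal, -1 = anomaly
        results[device["host"]] = (
            "Device is healthy" if verdict == 1 else "Potential issue detected"
        )
    return results
```
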
If you are a network engineer who is curious about AI and machine learning, this is a great beginner project to get your feet wet. You don’t need to understand every detail of how Isolation Forest works under the hood — just know that it is a tool that can help you spot problems before they become outages.
As always, if you have questions or want to share how you have customized this for your own network, drop a comment below or reach out to me. Happy automating!
Backing up network device configurations is one of the most critical tasks in network administration. A missed backup could mean hours of manual reconfiguration after a failure. In this post, we will walk through a Python script that automates this process by connecting to a router or switch via SSH and saving the running-config to a local file. We will use Netmiko, a popular Python library that simplifies SSH connections to network devices. Whether you manage a handful of switches or hundreds of routers, this script gives you a repeatable, automated way to capture configurations on demand.
Prerequisites
Before getting started, make sure you have:
Python 3.8 or higher installed
SSH access to your target network device
Device credentials (username and password)
The Netmiko library installed
To install Netmiko, run:
pip install netmiko
You can also clone the full project repository and install everything from the requirements file with pip install -r requirements.txt.
The core of our script uses Netmiko’s ConnectHandler to establish an SSH session. You provide the device type, hostname or IP address, and your credentials. Netmiko handles the SSH negotiation and drops you into an authenticated session.
Netmiko supports a wide range of device types including cisco_ios, cisco_nxos, arista_eos, and juniper_junos. You simply pass the appropriate device type string and Netmiko adapts its behavior accordingly.
Retrieving the Running Configuration
Once connected, pulling the running configuration is a single method call. We use send_command to execute show running-config on the device and capture the output as a string:
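In code, that looks roughly like this (hostnames and credentials are placeholders):

```python
def build_device(host: str, username: str, password: str,
                 device_type: str = "cisco_ios", port: int = 22) -> dict:
    """Assemble the connection parameters Netmiko's ConnectHandler expects."""
    return {"device_type": device_type, "host": host,
            "username": username, "password": password, "port": port}

def fetch_running_config(device: dict) -> str:
    """Open an SSH session and capture 'show running-config' as one string."""
    from netmiko import ConnectHandler  # third-party: pip install netmiko
    with ConnectHandler(**device) as conn:
        return conn.send_command("show running-config")
```
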
Netmiko handles paging automatically, so even if your configuration is long, you will get the complete output without needing to send space or press Enter to page through it.
Saving the Configuration to a File
With the configuration captured in memory, the next step is writing it to a file. Our script creates a backups directory automatically and saves each configuration with a timestamped filename so you never overwrite a previous backup:
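A sketch of the save step, producing filenames in the same host_running-config_timestamp pattern:

```python
import re
from datetime import datetime
from pathlib import Path

def backup_filename(host: str, when: datetime) -> str:
    """e.g. 192.168.1.1 -> 192_168_1_1_running-config_2026-04-10_14-30-00.txt"""
    safe = re.sub(r"[.:]", "_", host)  # make IPs filesystem-friendly
    return f"{safe}_running-config_{when.strftime('%Y-%m-%d_%H-%M-%S')}.txt"

def save_config(host: str, config: str, backup_dir: str = "backups") -> Path:
    """Create the backups directory if needed and write a timestamped copy."""
    Path(backup_dir).mkdir(exist_ok=True)
    path = Path(backup_dir) / backup_filename(host, datetime.now())
    path.write_text(config)
    return path
```
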
When you run the script, you will see output like this:
[*] Connecting to 192.168.1.1:22 (cisco_ios)…
[+] Successfully connected to 192.168.1.1
[*] Retrieving running-config…
[+] Retrieved 15234 characters of configuration
[+] Configuration saved to backups/192_168_1_1_running-config_2026-04-10_14-30-00.txt
[+] Disconnected. Backup complete!
What is Next
This script provides a solid foundation for network configuration backups. Here are some ideas for extending it:
Loop through a list of devices from a CSV or YAML inventory file to back up your entire network in one run
Schedule the script with cron (Linux) or Task Scheduler (Windows) for automatic daily backups
Add email or webhook notifications on success or failure
Compare configurations between backups to detect unauthorized changes using difflib
Integrate with Git to version-control your configurations automatically
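The configuration-comparison idea from the list above can be done entirely with the standard library's difflib:

```python
import difflib

def config_diff(old: str, new: str) -> str:
    """Unified diff of two saved configs; an empty string means no changes."""
    return "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="previous-backup", tofile="current-backup", lineterm=""))
```

Run it against the two most recent backup files and alert whenever the result is non-empty to catch unauthorized changes.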
Wrapping Up
Automating network device backups does not have to be complicated. With Python and Netmiko, you can connect to any router or switch, pull the running configuration, and save it to a timestamped file in just a few lines of code.
If you found this useful, check out my other posts on network automation including Monitoring IP SLAs with Python, DNA Center, and NetBox and Build a Home Lab Like a Pro. Stay tuned for more content from the NetworkThinkTank!
If you manage a network of any size, you know that keeping tabs on performance metrics like latency, jitter, and packet loss is critical. Cisco IP SLA (Service Level Agreement) operations are the go-to feature for probing network paths and measuring these metrics directly from your routers and switches. But manually checking IP SLA statistics across dozens or hundreds of devices? That does not scale.
In this post, I will walk you through a Python-based tool I built that pulls IP SLA data from Cisco DNA Center via its REST API, enriches it with device metadata from NetBox, and generates automated performance reports. Whether you are running a handful of branch routers or a large enterprise campus, this approach gives you a scalable, repeatable way to monitor network performance.
What is IP SLA?
Cisco IP SLA is a built-in feature on Cisco IOS and IOS-XE devices that allows you to generate synthetic traffic to measure network performance. You can configure operations like ICMP echo (ping), UDP jitter, HTTP GET, and more. Each operation continuously measures metrics such as round-trip time (RTT), latency, jitter, packet loss, and availability. These metrics are essential for validating SLA compliance, troubleshooting performance issues, and capacity planning.
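As a point of reference, configuring the two most common operation types on IOS/IOS-XE looks like this (addresses are illustrative documentation prefixes; the UDP jitter probe also requires `ip sla responder` on the far-end device):

```
ip sla 10
 icmp-echo 203.0.113.1 source-ip 192.0.2.10
 frequency 60
ip sla schedule 10 life forever start-time now
!
ip sla 20
 udp-jitter 203.0.113.1 16384 codec g711ulaw
 frequency 60
ip sla schedule 20 life forever start-time now
```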
The Tools
This project brings together three key components. First, Python does the heavy lifting for API calls, data parsing, and report generation. Second, Cisco DNA Center provides a centralized REST API for pulling device inventory and running CLI commands across your entire network without SSH-ing into each device individually. Third, NetBox acts as our network source of truth, storing device metadata like site assignments, roles, platforms, and IP addresses that we use to enrich the raw SLA data.
How It Works
The IP SLA Monitor tool follows a simple three-step workflow:
1. Authenticate with DNA Center and pull IP SLA operation statistics from all monitored devices using the command-runner API.
2. Query NetBox for each device to enrich the data with site name, device role, platform, and management IP.
3. Evaluate each SLA operation against configurable thresholds for latency, jitter, and packet loss, then generate JSON and CSV reports.
The tool can run as a one-shot collection or in a continuous monitoring loop with a configurable polling interval. Alerts are logged to the console when any operation exceeds your defined thresholds.
The Python Code
The project is organized into three main scripts:
ip_sla_monitor.py is the main orchestration script that ties everything together. It loads configuration from a .env file, initializes the DNA Center and NetBox clients, collects SLA data, enriches it, evaluates thresholds, and saves reports.
dnac_integration.py handles all communication with the Cisco DNA Center REST API including authentication, device inventory retrieval, and IP SLA data collection via the command-runner API.
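The DNA Center authentication step uses the standard token endpoint; a hedged sketch of how the client might obtain a token (function names are illustrative, and `verify=False` is only appropriate for lab appliances with self-signed certificates):

```python
def token_url(base_url: str) -> str:
    """Build the DNA Center authentication endpoint URL."""
    return f"{base_url.rstrip('/')}/dna/system/api/v1/auth/token"

def get_dnac_token(base_url: str, username: str, password: str) -> str:
    """Authenticate with basic auth and return the X-Auth-Token value."""
    import requests  # lazy import so the sketch loads without requests installed
    resp = requests.post(token_url(base_url), auth=(username, password),
                         verify=False, timeout=30)
    resp.raise_for_status()
    # Subsequent API calls send this token in the X-Auth-Token header
    return resp.json()["Token"]
```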
netbox_integration.py connects to the NetBox API to look up device metadata by hostname, returning site assignments, device roles, platform types, and IP addresses.
Getting Started
Getting up and running is straightforward. Clone the repository, set up a virtual environment, install the dependencies from requirements.txt, and configure your .env file with your DNA Center and NetBox credentials. The only Python packages required are requests for HTTP API calls, python-dotenv for environment variable management, and urllib3. No complex frameworks or heavy dependencies. Full setup instructions are in the GitHub repo README.
Threshold Alerting
One of the most useful features is configurable threshold alerting. You define your acceptable limits for latency, jitter, and packet loss in the .env file, and the tool flags any SLA operation that exceeds those limits. For example, with default thresholds of 100ms latency, 30ms jitter, and 1% packet loss, a branch router showing 115ms latency and 2.1% packet loss would be immediately flagged as an alert in the console output and reports.
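The threshold check itself is simple; a sketch using the default limits and the branch-router example above (metric key names are illustrative, not the tool's exact field names):

```python
# Default thresholds mirroring the .env values described above
THRESHOLDS = {"latency_ms": 100.0, "jitter_ms": 30.0, "packet_loss_pct": 1.0}

def evaluate(op: dict, thresholds: dict = THRESHOLDS) -> list[str]:
    """Return the violated metrics for one SLA operation (empty list = passing)."""
    alerts = []
    for metric, limit in thresholds.items():
        value = op.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts

# Branch router from the example: 115 ms latency, 2.1% packet loss -> two alerts
branch = {"latency_ms": 115.0, "jitter_ms": 12.0, "packet_loss_pct": 2.1}
alerts = evaluate(branch)
```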
Sample Output
The tool generates both JSON and CSV reports. The JSON report includes a summary section with total operations, passing and failing counts, and average latency, followed by detailed per-operation data enriched with NetBox metadata. The CSV report provides the same data in a tabular format that you can easily import into Excel or feed into other monitoring tools. Sample output files are included in the GitHub repository under the output directory.
What is Next
This project is a solid foundation, but there is plenty of room to extend it. Some ideas for future enhancements include adding webhook or email notifications for alerts, integrating with Grafana for real-time dashboards, storing historical data in a time-series database like InfluxDB, and expanding the command-runner integration to pull live SLA statistics directly from devices.
Wrapping Up
Python automation combined with APIs from DNA Center and NetBox gives network engineers a powerful toolkit for monitoring IP SLAs at scale. Instead of manually checking IP SLA stats on individual devices, you can automate the entire workflow and get enriched reports in minutes.
If you found this useful, check out my other posts on network automation including Automating Network Device Backups with Python and Netmiko and Build a Home Lab Like a Pro. Stay tuned for more content from the NetworkThinkTank!
Transform your spare room into a career-accelerating network laboratory
If you’re serious about leveling up your networking skills, there’s no substitute for hands-on experience. Certifications are great. Books are essential. But nothing cements your understanding of BGP peering, VXLAN fabrics, or automation workflows like building it yourself and watching packets traverse your own infrastructure.
After 20+ years in networking — from pulling cable to managing backbone infrastructure at Lumen — I can tell you that the engineers who stand out are the ones who lab. They break things on purpose, fix them under pressure, and walk into production environments with confidence.
In this guide, I’ll walk you through everything you need to build a professional-grade home lab, from hardware selection to virtualization platforms to topology design. And because I believe in open-source learning, I’ve created a companion GitHub repository with sample configs, topology files, and scripts you can use to get started immediately.
Let’s get the obvious out of the way: you can’t become a great network engineer by reading alone. Here’s why a home lab is one of the best investments you can make in your career:
Hands-on practice for certifications — CCNA, CCNP, JNCIA, and beyond all require you to understand how protocols actually behave, not just how they’re described in RFCs.
Safe environment to break things — Production networks don’t forgive mistakes. Your lab does. Misconfigure OSPF areas? Blow up a spanning-tree topology? No pager going off at 2 AM.
Real-world skill building — Employers want engineers who can troubleshoot, not just configure. A lab gives you the reps.
Automation testing ground — Python scripts, Ansible playbooks, Netmiko sessions — test them here before you touch production.
Portfolio building — Document your labs on GitHub and your blog. Show hiring managers what you can do, not just what you’ve memorized.
Hardware Recommendations
You don’t need to spend thousands of dollars to build an effective lab. Here’s a tiered approach based on budget and goals.
Tier 1: Budget Build ($200-$500)
Perfect for getting started with virtualization and basic routing/switching labs.
| Component | Recommendation | Estimated Cost |
|---|---|---|
| Server | Dell OptiPlex 7050/7060 (used) or HP EliteDesk 800 G3 | $100-$200 |
| RAM | 32 GB DDR4 (upgrade if needed) | $40-$60 |
| Storage | 500 GB SSD + 1 TB HDD | $50-$80 |
| Switch | Managed switch (Netgear GS308T or TP-Link TL-SG108E) | $30-$50 |
| Misc | USB-to-serial console cable, Ethernet cables, power strip | $20-$40 |
Tier 2: Intermediate Build ($500-$1,500)
For engineers running multiple VMs, nested virtualization, and more complex topologies.
| Component | Recommendation | Estimated Cost |
|---|---|---|
| Server | Dell PowerEdge R720/R730 or HP ProLiant DL380 Gen9 | $200-$500 |
| RAM | 64-128 GB DDR4 ECC | $100-$200 |
| Storage | 1 TB NVMe SSD + 2 TB HDD | $100-$200 |
| Network | Intel X520-DA2 10GbE NIC (SFP+) | $30-$50 |
| Switch | Cisco Catalyst 2960-X or Arista 7010T (used) | $50-$150 |
| Firewall | Netgate SG-1100 (pfSense) or old PC with OPNsense | $100-$200 |
| UPS | APC Back-UPS 600VA | $60-$80 |
Virtualization Platforms
This is where the magic happens. Modern network labs are predominantly virtual, and the platform you choose shapes your entire lab experience.
Proxmox VE (My Top Pick for Home Labs)
GNS3
Why: The OG network emulator. Runs real Cisco IOS/IOU images and integrates with QEMU/KVM for full OS emulation. Best for Cisco-centric labs and certification prep (CCNA/CCNP). Pair with our BGP Fundamentals Lab and VPN IPSec/GRE Lab.
EVE-NG
Why: Web-based, multi-vendor support, great for complex topologies with dozens of nodes. Best for multi-vendor labs (Cisco, Juniper, Arista, Palo Alto, Fortinet). Pair with our EVPN-VXLAN Lab for data center fabric simulation.
Containerlab
Why: Lightweight, code-defined network topologies using containers. Perfect for modern network automation workflows. Uses Docker containers running network OS images (cEOS, FRR, Nokia SR Linux). Define topologies in YAML and version control them in Git. Pair with our Network Automation with Ansible Lab.
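To give a feel for the code-defined approach, here is a hedged sketch of a minimal Containerlab topology file: two FRR routers linked back-to-back. The lab name and image tag are illustrative.

```yaml
# topology.clab.yml -- two FRR nodes connected on eth1 (image tag is illustrative)
name: frr-pair
topology:
  nodes:
    r1:
      kind: linux
      image: quay.io/frrouting/frr:9.1.0
    r2:
      kind: linux
      image: quay.io/frrouting/frr:9.1.0
  links:
    - endpoints: ["r1:eth1", "r2:eth1"]
```

You deploy it with `containerlab deploy -t topology.clab.yml`, and because the whole topology is one YAML file, it versions cleanly in Git alongside your configs.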
Designing Your Network Topology
A well-designed lab topology mirrors real-world architectures. Here are three topologies I recommend, progressing from simple to advanced.
What you’ll learn: VLANs, inter-VLAN routing, firewall rules, NAT, DHCP
Topology 2: The Enterprise Lab
Dual ISP routers with eBGP > Edge Router (BGP AS 65000) > Core switches (OSPF, LACP) > Access switches
What you’ll learn: BGP peering, OSPF areas, HSRP/VRRP, link aggregation, redundancy
Topology 3: The Data Center Fabric
Spine-01/Spine-02 > Leaf-01 through Leaf-04 > Server clusters
What you’ll learn: EVPN-VXLAN, BGP underlay/overlay, spine-leaf architecture, network automation at scale. Check out our EVPN-VXLAN Lab for a full walkthrough.
Essential Software Tools
Network Management and Monitoring
LibreNMS or Zabbix – SNMP-based monitoring, alerting, graphing
Grafana + Prometheus – Modern dashboards and metrics visualization
As network engineers, one of our most critical yet tedious responsibilities is maintaining up-to-date backups of device configurations. Whether you manage a handful of switches or hundreds of routers across multiple sites, manually logging into each device to copy its running configuration is time-consuming, error-prone, and simply does not scale.
In this post, I will walk you through a Python automation script I built that connects to network devices via SSH, pulls their running configurations, and saves them to organized backup files, all in parallel. The full project is available on my GitHub repository.
Before diving into the code, let us consider why automation matters here. Manual backups are inconsistent since engineers may forget devices or skip them during busy periods. They are slow because logging into devices one at a time does not scale. They lack accountability with no automatic logging of what was backed up and when. They are error-prone since copy-paste mistakes can result in incomplete or corrupted backup files.
An automated solution runs on a schedule, covers every device in your inventory, logs every action, and produces consistent, timestamped backup files every single time.
What the Script Does
The network backup automation script provides several key features. It supports multiple platforms including Cisco IOS, Cisco ASA, Cisco NX-OS, Juniper JunOS, and Arista EOS. It uses concurrent connections via Python ThreadPoolExecutor to back up multiple devices simultaneously. Backups are organized into date-stamped directories for easy retrieval. Each backup includes metadata headers showing the device hostname, IP address, device type, and timestamp. The script generates a JSON summary report after each run showing successes and failures. It also provides comprehensive error handling for timeouts, authentication failures, and connection issues.
Project Structure
The project is organized as follows. The main script is network_backup.py which contains all the automation logic. The requirements.txt file lists the Python dependencies, primarily Netmiko and Paramiko. The inventory.json file is a sample device inventory template where you define your network devices. The .gitignore file ensures backup files and sensitive data are not accidentally committed to version control. The LICENSE file contains the MIT open-source license.
How It Works
The script follows a straightforward workflow. First, it loads your device inventory from a JSON file. Each device entry includes the hostname, IP address, device type, credentials, and optional enable secret. Second, it creates a date-stamped backup directory to keep backups organized by day. Third, it spawns multiple worker threads using ThreadPoolExecutor to connect to devices in parallel. Fourth, for each device, it establishes an SSH connection using Netmiko, enters enable mode if needed, and runs the appropriate show command for that platform. Fifth, the configuration output is saved to a file with a metadata header. Finally, a JSON report is generated summarizing the results of the entire backup run.
The Device Inventory
The inventory.json file is where you define all the devices you want to back up. Here is an example of what it looks like. Each device entry includes the hostname for identification, the host IP address, the device_type which tells Netmiko how to communicate with the device, the username and password for SSH authentication, the port number which defaults to 22, and an optional secret for enable mode on Cisco devices.
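A sketch of that inventory using the fields just described (the overall shape shown here, a JSON array of device objects, is an assumption; check the repository template for the exact layout):

```json
[
  {
    "hostname": "core-sw-01",
    "host": "192.168.1.10",
    "device_type": "cisco_ios",
    "username": "admin",
    "password": "your_password",
    "port": 22,
    "secret": "enable_secret"
  }
]
```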
Key Code Highlights
The BACKUP_COMMANDS dictionary maps each device type to the appropriate command for retrieving the running configuration. For Cisco IOS, ASA, and NX-OS devices, it uses “show running-config”. For Juniper JunOS devices, it uses “show configuration | display set”. For Arista EOS, it also uses “show running-config”.
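That mapping looks like this:

```python
# Map Netmiko device_type strings to the platform-appropriate backup command
BACKUP_COMMANDS = {
    "cisco_ios": "show running-config",
    "cisco_asa": "show running-config",
    "cisco_nxos": "show running-config",
    "juniper_junos": "show configuration | display set",
    "arista_eos": "show running-config",
}
```

Adding support for a new platform is then a one-line change: add its Netmiko device type and backup command to the dictionary.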
The backup_device function is the core of the script. It takes a device dictionary and backup directory path, establishes the SSH connection using Netmiko ConnectHandler, enters enable mode if a secret is provided, runs the backup command, and saves the output with metadata headers to a timestamped config file.
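A simplified sketch of that function, assuming the inventory fields shown earlier (the header format and helper names here are illustrative, not the repository's exact code):

```python
from datetime import datetime
from pathlib import Path

def metadata_header(device: dict) -> str:
    """Comment header written at the top of each backup file."""
    return (
        f"! Hostname   : {device['hostname']}\n"
        f"! IP address : {device['host']}\n"
        f"! Device type: {device['device_type']}\n"
        f"! Backed up  : {datetime.now():%Y-%m-%d %H:%M:%S}\n!\n"
    )

def backup_device(device: dict, backup_dir: Path,
                  command: str = "show running-config") -> Path:
    """Connect over SSH, pull the running config, and save it with a header."""
    from netmiko import ConnectHandler  # lazy import: sketch loads without netmiko
    params = {k: v for k, v in device.items() if k != "hostname"}
    conn = ConnectHandler(**params)
    if device.get("secret"):
        conn.enable()                   # enter enable mode when a secret is set
    output = conn.send_command(command, read_timeout=120)
    conn.disconnect()
    path = backup_dir / f"{device['hostname']}_config.txt"
    path.write_text(metadata_header(device) + output)
    return path
```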
The run_backups function orchestrates the entire process. It loads the inventory, creates the backup directory, then uses ThreadPoolExecutor to run backups in parallel across all devices. After all backups complete, it logs a summary and generates a JSON report file.
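The parallel orchestration can be sketched with the stdlib ThreadPoolExecutor; this simplified version takes the device list and worker function as arguments rather than loading the inventory itself:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_backups(devices: list[dict], worker, max_workers: int = 5) -> dict:
    """Back up all devices in parallel; return per-host success/failure results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, dev): dev["hostname"] for dev in devices}
        for fut in as_completed(futures):
            host = futures[fut]
            try:
                fut.result()            # re-raises any exception from the worker
                results[host] = "success"
            except Exception as exc:    # timeouts, auth failures, etc.
                results[host] = f"failed: {exc}"
    return results
```

Because each worker runs independently, one unreachable device does not stall the rest of the run; its failure simply lands in the results dictionary for the JSON report.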
Getting Started
To use this script in your own environment, follow these steps. First, clone the repository from GitHub. Second, install the dependencies using pip install with the requirements.txt file. Third, edit inventory.json with your actual device information including hostnames, IP addresses, credentials, and device types. Fourth, run the script using python network_backup.py. You can also customize the behavior with command-line arguments such as specifying a different inventory file with the -i flag, changing the output directory with -o, adjusting the number of concurrent workers with -w, or enabling verbose debug logging with -v.
Security Considerations
Since this script handles network device credentials, security is paramount. Never commit your actual inventory.json file with real credentials to version control, which is why it is included in the .gitignore file. Consider using environment variables or a secrets manager for credentials in production. Restrict file permissions on the backup directory since configuration files may contain sensitive information. Use SSH key-based authentication instead of passwords when possible. Run the script from a secured management workstation or jump box.
What is Next
This script provides a solid foundation, but there are many ways to extend it. You could add email or Slack notifications for backup failures. You could integrate with a scheduling system like cron or Windows Task Scheduler to run backups automatically. You could implement configuration diff detection to alert you when configurations change unexpectedly. You could add support for additional device types. You could also store backups in a Git repository for version-controlled configuration management.
Conclusion
Network automation does not have to be overwhelming. Starting with a practical use case like configuration backups is an excellent way to build your Python skills while immediately adding value to your organization. The script handles the complexity of multi-vendor support, concurrent connections, and error handling so you can focus on what matters most: keeping your network safe and well-documented.