Date: May 10, 2026 | Tags: OpenClaw, Health Monitoring, Proxmox, systemd, Discord, Automation
Overview
The OpenClaw Health Monitor is a systemd-based health checking system that runs on the Proxmox host (10.10.20.252). It monitors critical OpenClaw services every 15 minutes and sends alerts to Discord when issues are detected. This post covers the architecture, components, and management of the health monitoring system.
How It Works
The health monitoring system uses a systemd timer (openclaw-health-alerts.timer) that triggers a Python health check script every 15 minutes. The script checks each monitored service via HTTP requests and reports any failures to Discord through a relay server.
The alert flow is as follows:
1. Timer triggers the service every 15 minutes
2. Health check script checks each monitored service
3. If any service is CRITICAL, an alert payload is built
4. Alert is sent via HTTP POST to the relay server
5. Relay forwards the payload to Discord webhook
6. Success is indicated by HTTP 204 response from Discord
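For reference, a systemd timer/service pair for this kind of setup looks roughly like the sketch below. Only the timer name and the 15-minute cadence come from the deployment described above; the service unit contents and the script path are illustrative.
# /etc/systemd/system/openclaw-health-alerts.timer
[Unit]
Description=Run the OpenClaw health check every 15 minutes

[Timer]
OnBootSec=5min
OnUnitActiveSec=15min
Unit=openclaw-health-alerts.service

[Install]
WantedBy=timers.target

# /etc/systemd/system/openclaw-health-alerts.service (script path is an assumption)
[Unit]
Description=OpenClaw health check and Discord alerting

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /opt/openclaw/health_check.py
Enable it with systemctl enable --now openclaw-health-alerts.timer and confirm the schedule with systemctl list-timers.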
Successfully completed the migration of both Ollama model data and OpenWebUI application data from local VM storage to a QNAP NAS (TS-453D) via NFS. This involved diagnosing and fixing a broken NAS mount configuration, correcting NFS export issues on the QNAP, and performing live data migrations with minimal downtime.
Background
The Ollama AI stack (Ollama + Open WebUI) runs on VM 205 (IP: 10.10.20.39) inside a Proxmox VE cluster (host IP: 10.10.20.252). The stack was originally configured to store both model weights and OpenWebUI application data on a NAS via NFS mounts, but at some point the NAS mounts broke, causing the stack to fail. A previous workaround switched the Docker volumes to local paths to restore service, but the goal was always to move the data back to the NAS for centralized storage and backup.
Phase 1: NAS Mount Investigation & Diagnosis
The Problem
The NFS mounts on VM 205 were broken. The original /etc/fstab had two problems:
Wrong NAS IP in fstab: The fstab pointed to 10.10.20.6, but the actual NAS IP is 10.10.20.4. Pinging 10.10.20.6 returned 100% packet loss. Pinging 10.10.20.4 succeeded with <1ms latency.
Wrong path format: The fstab used Synology-style paths (/volume1/...) but the NAS is a QNAP TS-453D running QTS 5.2.3. QNAP uses different NFS export paths.
NAS Discovery
Model: QNAP TS-453D
Firmware: QTS 5.2.3.3451
CPU: Intel Celeron J4125 (4C/4T, 2.7GHz)
RAM: 4 GB
IP: 10.10.20.4
Hostname: NAS5F20EF
MAC: 24:5A:8E:5F:20:F0
Open Ports on NAS
22 (SSH): open
111 (rpcbind): open
443 (HTTPS): open
445 (SMB): open
2049 (NFS): open
8080 (HTTP): open
NFS Export Issue
Even though the NFS service was running (confirmed via rpcinfo showing NFS v2/3/4 on TCP/UDP port 2049), showmount -e 10.10.20.4 returned an empty export list. SSH into the QNAP revealed the /etc/exports file contained an invalid NFS option read-wr instead of rw, causing exportfs to silently fail with no active exports.
Phase 2: NFS Export Fix
SSH into QNAP NAS: ssh jczaldivar@10.10.20.4
Backup original exports: sudo cp /etc/exports /etc/exports.bak
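For reference, the corrected exports look roughly like this. Only the rw fix and the share paths come from this migration; the client scope and the extra options are assumptions:
"/share/CACHEDEV1_DATA/ollama-models" 10.10.20.0/24(rw,async,no_subtree_check,insecure,no_root_squash)
"/share/CACHEDEV1_DATA/openwebui-data" 10.10.20.0/24(rw,async,no_subtree_check,insecure,no_root_squash)
After editing, re-export and confirm the shares are actually published:
sudo exportfs -ra
showmount -e 10.10.20.4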
Phase 3: Ollama Model Migration
The NAS already contained a full set of Ollama models from a previous configuration (9.8 GB):
nomic-embed-text:latest – 274 MB (ID 0a195f422b47)
qwen2.5-coder:7b – 4.7 GB (ID dae161a2)
llama3.1:8b – 4.9 GB (ID 46e0d1c039e)
tinyllama:latest – 637 MB (ID 26449156de35)
The NFS share was mounted on VM 205 at /mnt/nas/ollama-models, docker-compose.yml was updated to use the NAS path, and the container was recreated, immediately picking up all 4 models from the NAS.
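For reference, the mount and the compose change were along these lines (mount options trimmed):
sudo mkdir -p /mnt/nas/ollama-models
sudo mount -t nfs 10.10.20.4:/ollama-models /mnt/nas/ollama-models
And in docker-compose.yml, the ollama service volume becomes:
    volumes:
      - /mnt/nas/ollama-models:/root/.ollama
A docker compose up -d --force-recreate then brings the container back up against the NAS-backed model store.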
Phase 4: OpenWebUI Data Migration
Two copies of the OpenWebUI data existed:
NAS (pre-existing): 890 MB – older copy from the previous config
Local VM (active): 934 MB – the current working copy
Mounted NFS share: mount -t nfs 10.10.20.4:/openwebui-data /mnt/nas/openwebui-data
Stopped open-webui container for data consistency
Rsync’d local data to NAS: rsync -av --delete – transferred 934 MB at ~110 MB/sec
Updated docker-compose.yml volume to NAS path
Updated /etc/fstab with correct NAS IP (10.10.20.4) and paths
Restarted all containers – both recreated with NAS mounts and running healthy
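For reference, the corrected /etc/fstab entries on VM 205 look roughly like this (mount options are an assumption; _netdev keeps the NFS mounts from racing the network at boot):
10.10.20.4:/ollama-models    /mnt/nas/ollama-models    nfs  defaults,_netdev  0  0
10.10.20.4:/openwebui-data   /mnt/nas/openwebui-data   nfs  defaults,_netdev  0  0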
Storage path: Ollama reads/writes model data from /root/.ollama inside the container, which maps to /mnt/nas/ollama-models on VM 205, which is an NFS mount to 10.10.20.4:/ollama-models on the QNAP NAS (internal path: /share/CACHEDEV1_DATA/ollama-models).
WebUI data path: Open WebUI reads/writes app data from /app/backend/data inside the container, which maps to /mnt/nas/openwebui-data on VM 205, which is an NFS mount to 10.10.20.4:/openwebui-data on the QNAP NAS (internal path: /share/CACHEDEV1_DATA/openwebui-data).
Always verify the NAS IP: The fstab had a stale IP (10.10.20.6) that no longer existed. The actual NAS was at 10.10.20.4.
QNAP != Synology paths: Synology uses /volume1/share-name, QNAP uses /share/CACHEDEV1_DATA/share-name internally and exports as /share-name.
Check /etc/exports directly: The QNAP web UI showed NFS enabled, but the underlying exports file had an invalid option (read-wr instead of rw), causing silent export failures.
showmount -e is your friend: Quick way to verify NFS exports are actually published.
Keep local backups: The docker-compose.yml.bak file preserved the original NAS-based config for easy restoration.
Status: Complete
Both Ollama models (9.8 GB, 4 models) and OpenWebUI data (890 MB) are now running from the QNAP NAS via NFS. The stack is fully operational with all 4 models accessible and OpenWebUI responding healthy on port 3000. Migration completed April 30, 2026.
This guide covers the essential steps for troubleshooting Proxmox boot failures and network recovery issues. Whether you’re dealing with a Proxmox VE node that won’t boot, network interfaces that fail to initialize, or connectivity problems after an update, this visual reference provides a structured approach to diagnosing and resolving common Proxmox infrastructure issues.
Today I completed a BIOS update on my MSI MAG X570S Tomahawk Max WiFi motherboard, upgrading from the original BIOS Version 1.00 (dated 07/06/2021) to the latest Version 1.D1 (7D54v1D1, dated 09/19/2025). This update was performed using MSI’s M-FLASH utility as part of my Proxmox homelab infrastructure maintenance.
The visual guide above outlines the complete BIOS update process using MSI’s M-FLASH utility. Here’s a summary of the key steps performed:
Downloaded BIOS version 7D54v1D1 from MSI official support page
Extracted BIOS file (E7D54AMS.1D1) to a FAT32-formatted USB drive
Stopped all Proxmox VMs and containers before rebooting
Rebooted into BIOS and used M-FLASH to flash the new BIOS
Confirmed BIOS updated to Version 1.D1 (Release Date: 09/19/2025) via dmidecode
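The post-flash verification is a one-liner from the Proxmox shell (output trimmed to the relevant fields):
sudo dmidecode -t bios | grep -E "Version|Release Date"
        Version: 1.D1
        Release Date: 09/19/2025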
For detailed step-by-step documentation of this BIOS update process, visit the NetworkThinkTank-Labs GitHub repository. This motherboard serves as the foundation for my Proxmox homelab running GPU passthrough with an NVIDIA RTX 3060 Ti for AI workloads.
Some days in the homelab are quiet — a config tweak here, a firmware update there. And then there are days like today. April 29, 2026, turned into a full-blown infrastructure marathon: eight distinct projects spanning networking, virtualization, AI deployment, storage management, and documentation. Here is a complete rundown of everything that got done.
1. GitHub Documentation — 565 Lines of Technical Writing
Documentation is the backbone of any serious homelab. Today I pushed 565 lines of new documentation across multiple GitHub repositories. This included updated READMEs, configuration guides, topology diagrams, and step-by-step walkthroughs. Every project in my lab now has proper technical documentation that anyone can follow to replicate the setup. If it is not documented, it did not happen — and today, it all got documented.
2. EVE-NG CCNA Lab Updates — 29 Nodes
My EVE-NG CCNA lab got a major overhaul. The lab now contains 29 active nodes, covering routing, switching, and network services. This includes Cisco IOS routers and switches configured for OSPF, EIGRP, BGP, VLANs, STP, ACLs, NAT, and DHCP. The lab features API troubleshooting support, Proxmox migration readiness via Terraform and Ansible, and qcow2 image management. Whether you are studying for the CCNA or just want a robust network simulation environment, this lab has you covered.
3. Blog Post Publishing — 2,943 Words
Earlier this month, I published a comprehensive 2,943-word blog post on the Network ThinkTank blog covering how to self-host AI on a Proxmox homelab with Ollama and Open WebUI. Today’s writing adds to that momentum. Consistent publishing is key to building a knowledge base that helps both myself and the broader homelab community.
4. Ollama Models Deployment — 4 Models
Local AI is the future of privacy-conscious computing. Today I deployed four Ollama models on my Proxmox homelab, running inference entirely on local hardware. The models are served through Open WebUI, giving me a polished ChatGPT-like interface without any data leaving my network. No API keys, no cloud dependency, no privacy concerns — just pure local LLM power. The models cover different use cases from general conversation to code generation and technical assistance.
5. OpenClaw AI Agent Deployment
OpenClaw, an AI agent framework, was deployed and configured in the homelab today. This adds autonomous AI agent capabilities to the infrastructure, enabling task automation and intelligent workflows. The deployment involved setting up the agent runtime, configuring API endpoints, and testing basic agent interactions. This is a step toward building a more intelligent, self-managing homelab environment.
6. Windows Server VM Build
A fresh Windows Server virtual machine was built from scratch today. This VM will serve as a core infrastructure component for Active Directory, DNS, DHCP, and Group Policy management. The build process included creating the VM in the hypervisor, installing the OS, applying initial configurations, and setting up remote management. Having a Windows Server in the lab opens up enterprise-grade identity and access management capabilities.
7. NAS Storage Cleanup — ~20GB Freed
Storage hygiene is critical in any homelab environment. Today’s cleanup operation freed approximately 20GB of space on the NAS. This involved removing outdated VM snapshots, clearing old ISO images, purging stale Docker volumes, and archiving completed project files. A clean NAS is a happy NAS — and with 20GB reclaimed, there is plenty of room for new projects.
8. UniFi Network Server Installation
The UniFi Network Server was installed and configured today, bringing enterprise-grade network management to the homelab. This provides centralized control over UniFi access points, switches, and security gateways. The installation included setting up the controller software, adopting network devices, configuring wireless networks, and establishing monitoring dashboards. With UniFi in place, the entire network infrastructure can be managed from a single pane of glass.
Wrapping Up
Eight projects. One day. From AI deployments to network labs, from storage cleanup to documentation — today was a masterclass in homelab productivity. Every one of these projects builds on the others, creating a more capable, better-documented, and more resilient infrastructure.
The key takeaway? Documentation makes everything better. By writing things down — both in GitHub repos and blog posts — I am building a knowledge base that pays dividends every time I need to troubleshoot, replicate, or expand my setup.
If you are running a homelab, I encourage you to document your work, share your configs, and keep building. The community is stronger when we share what we learn.
Until next time — keep labbing.
Follow the Network ThinkTank blog for more homelab guides, networking tutorials, and infrastructure deep-dives. Check out the companion GitHub repositories at github.com/jczaldivar71 for configs, scripts, and technical documentation.
I got tired of sending my data to cloud AI services. Every prompt I typed into ChatGPT or Claude was being stored, analyzed, and used for training. For personal questions, code snippets with API keys, and private brainstorming sessions, that never sat well with me.
So I built my own. A fully self-hosted AI assistant running on my Proxmox homelab, powered by Ollama for local LLM inference and Open WebUI for a polished ChatGPT-like interface. The models run on my own NVIDIA GPU, the data stays on my NAS, and nothing leaves my network.
This guide walks you through exactly how I did it – from VM creation to pulling your first model and chatting with it through a clean web interface. If you have a Proxmox server and a spare GPU, you can have this running in an afternoon.
Prerequisites and Hardware Requirements
Here is what you need before starting:
Hardware
Proxmox VE host (version 7.x or 8.x)
NVIDIA GPU with at least 8GB VRAM (I use an RTX 3060 Ti 8GB)
Minimum 16GB RAM allocated to the AI VM (32GB recommended)
NAS with NFS or SMB shares available (Synology, TrueNAS, etc.)
At least 50GB free storage for models (100GB+ recommended)
Software
Proxmox VE installed and running
Ubuntu Server 22.04 or 24.04 LTS ISO
Docker and Docker Compose
NVIDIA drivers (535+ recommended)
NVIDIA Container Toolkit
Network
Static IP or DHCP reservation for the AI VM
Access to your NAS from the VM subnet
Optional: domain name for reverse proxy
Architecture Overview
The stack looks like this:
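(A rough text sketch; ports and NAS paths match the Docker Compose file later in this guide.)
Browser
  -> Open WebUI (port 3000), app data on /mnt/nas/openwebui-data (NFS to the NAS)
       -> Ollama API (port 11434), model files on /mnt/nas/ollama-models (NFS to the NAS)
            -> NVIDIA RTX 3060 Ti passed through to the Ubuntu VM on Proxmox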
All AI processing happens locally on the GPU inside the VM. Open WebUI provides the browser-based chat interface and connects to Ollama’s API on the backend. The NAS stores all model files and conversation data so nothing is lost if the VM needs rebuilding.
Step 1: Preparing the Proxmox VM
First, create a new VM in Proxmox optimized for AI workloads: Ubuntu Server 22.04 or 24.04, the RAM and storage from the prerequisites above, and the GPU passed through over PCIe.
Step 2: Verifying the GPU and Docker GPU Support
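With Ubuntu installed and the NVIDIA drivers (535+) in place inside the VM, confirm the card is visible:
nvidia-smi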
Expected output shows your RTX 3060 Ti with driver version and CUDA version. If nvidia-smi fails, check that the GPU passthrough is configured correctly on the Proxmox host.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If this shows your GPU info inside the container, you are ready to deploy Ollama.
Step 3: Deploying Ollama
Create a project directory:
mkdir -p ~/ai-stack
cd ~/ai-stack
Create docker-compose.yml:
version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/nas/ollama-models:/root/.ollama
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - /mnt/nas/openwebui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
Start the stack:
docker compose up -d
Pull your first model:
docker exec -it ollama ollama pull llama3.1:8b
This downloads the Llama 3.1 8B parameter model, which is an excellent starting point for an 8GB GPU. The download is roughly 4.7GB and will be stored on your NAS mount.
Other Recommended Models for 8GB VRAM
ollama pull mistral:7b # Great for general tasks
ollama pull codellama:7b # Optimized for coding
ollama pull llama3.1:8b-instruct # Best for chat interactions
ollama pull phi3:mini # Microsoft's compact model
ollama pull gemma2:9b # Google's open model
Test the Ollama API
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Hello, how are you?",
"stream": false
}'
If you get a JSON response with generated text, Ollama is working.
Step 4: Configuring Open WebUI
Open your browser and navigate to:
http://<VM-IP>:3000
First-Time Setup
Create an admin account (first user automatically becomes admin)
Set a strong password – this is your AI assistant gateway
Open WebUI will auto-detect Ollama at the configured URL
Connecting to Ollama
Open WebUI should automatically connect to Ollama using the OLLAMA_BASE_URL environment variable we set in Docker Compose. Verify by clicking Settings > Connections and confirming the Ollama URL shows http://ollama:11434 with a green status.
Key Settings to Configure
Settings > General: Set default model to llama3.1:8b
Maximum context: 8192+ tokens (slower, more VRAM usage)
Set in your Modelfile or at runtime:
PARAMETER num_ctx 4096
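As a concrete example, a tiny Modelfile that builds a 4K-context variant (the custom model name is just an illustration):
FROM llama3.1:8b
PARAMETER num_ctx 4096
Build it with docker exec -it ollama ollama create llama31-4k -f /root/.ollama/Modelfile (drop the Modelfile into the NAS-backed models directory so the container can see it), then select the new model in Open WebUI.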
Monitoring Resource Usage
watch -n 1 nvidia-smi # GPU monitoring
htop # CPU and RAM monitoring
docker stats # Container resource usage
iostat -x 1 # Disk I/O monitoring
Conclusion
After following this guide, you now have a fully self-hosted AI assistant running on your Proxmox homelab. Your data stays private, your models run locally on your GPU, and you have a clean web interface for interacting with multiple AI models.
The entire stack – Ollama for inference, Open WebUI for the interface, NAS for storage – runs reliably as a set of Docker containers inside a Proxmox VM with GPU passthrough. It survives reboots, updates cleanly, and scales as you add more models.
This is what homelabbing is about: taking control of your own infrastructure and running services that matter to you. A private AI assistant is one of the most practical and rewarding projects you can build today.
Real-World Deployment Tips
Start with small models first: Pull llama3.1:8b before anything else. It fits comfortably in 8GB VRAM and responds fast. Get everything working before experimenting with larger models.
Use NAS storage from day one: Do not store models on the VM’s local disk. When you inevitably rebuild the VM, you will lose hours re-downloading models. NAS storage makes rebuilds trivial.
Pin your Docker image versions: Use specific tags instead of “latest” in production. An unexpected update broke my Open WebUI setup once when the API format changed between versions.
Set OLLAMA_NUM_PARALLEL=1: On an 8GB card, running multiple concurrent requests causes out-of-memory crashes. Limit Ollama to one request at a time with this environment variable.
Monitor VRAM proactively: Add nvidia-smi -l 5 to a tmux session so you always see GPU memory usage. VRAM exhaustion causes silent failures that are hard to debug.
Enable Docker restart policies: The “unless-stopped” restart policy in our Docker Compose file means containers recover automatically after host reboots or crashes.
Test your NFS mounts under load: Some NAS devices throttle NFS under heavy I/O. Run model inference while monitoring NAS performance to catch bottlenecks early.
Keep a shell alias for quick model pulls: Add to your .bashrc:
alias opull='docker exec -it ollama ollama pull'
Then pulling models is just opull mistral:7b.
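To wire the OLLAMA_NUM_PARALLEL tip above into the stack, the ollama service's environment block in docker-compose.yml simply gains one line:
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - OLLAMA_NUM_PARALLEL=1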
Honest Takeaways and Lessons Learned
Local LLMs are not ChatGPT replacements (yet): The 7B-9B models that fit on an 8GB GPU are impressive but noticeably less capable than GPT-4 or Claude for complex reasoning. They excel at drafting, summarization, code completion, and brainstorming. Manage your expectations accordingly.
GPU passthrough is the hardest part: Getting IOMMU groups clean, VFIO binding correct, and the GPU visible inside the VM took more troubleshooting than the entire rest of the stack combined. Once it works, it stays working, but expect 2-4 hours of debugging on your first attempt.
Open WebUI is surprisingly polished: I expected a rough open-source interface. Instead, Open WebUI is genuinely pleasant to use daily. The chat interface, model switching, conversation history, and document upload features rival commercial products.
Storage adds up fast: Each 7B model is 4-5GB. If you start collecting models (and you will), budget 100-200GB of NAS storage. I currently have 12 models taking up 67GB.
The privacy benefit is real: Once you start using a local AI for sensitive queries – tax questions, medical research, private code review – you realize how uncomfortable it was sending that data to third-party servers. This alone justifies the project.
Docker makes everything easier: Without Docker and the NVIDIA Container Toolkit, this setup would involve painful manual dependency management. The containerized approach means clean upgrades and easy rollbacks.
Community models keep getting better: The open-source LLM ecosystem is evolving rapidly. Models that were state-of-the-art six months ago are now outperformed by newer releases. Check Ollama’s model library regularly for improvements.
Common Pitfalls and How to Avoid Them
Pitfall 1: IOMMU Group Conflicts
Problem: Your GPU shares an IOMMU group with other devices. Solution: Check groups with:
find /sys/kernel/iommu_groups/ -type l
If your GPU is not in a clean group, you may need an ACS override patch or a different PCIe slot. Move the GPU to a slot that isolates it in its own IOMMU group.
Pitfall 2: NVIDIA Driver Conflicts on Proxmox Host
Problem: The Proxmox host loads NVIDIA drivers before VFIO can claim the GPU. Solution: Blacklist nouveau and nvidia in /etc/modprobe.d/blacklist.conf and ensure VFIO modules load first. Add softdep nvidia pre: vfio-pci to modprobe configuration.
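Roughly what that configuration looks like on the Proxmox host (file names under /etc/modprobe.d/ vary; the blacklist entries and the softdep line are the ones called out above):
# /etc/modprobe.d/blacklist.conf
blacklist nouveau
blacklist nvidia

# /etc/modprobe.d/vfio.conf
softdep nvidia pre: vfio-pci
# options vfio-pci ids=<vendor:device IDs of your GPU and its audio function>
Rebuild the initramfs with update-initramfs -u -k all and reboot for the change to take effect.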
Pitfall 3: Docker Cannot See the GPU
Problem: docker run --gpus all fails with “could not select device driver”. Solution: The NVIDIA Container Toolkit is not installed or not configured. Run:
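(A typical install-and-register sequence on Ubuntu, assuming NVIDIA's apt repository has already been added:)
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker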
Pitfall 4: Open WebUI Cannot Connect to Ollama
Problem: Open WebUI shows “Connection failed” for the Ollama backend. Solution: Ensure both containers are on the same Docker network (Docker Compose handles this automatically). Verify the OLLAMA_BASE_URL is set to http://ollama:11434 (using the container name, not localhost).
Pitfall 5: Models Disappear After VM Reboot
Problem: Downloaded models are gone after restarting the VM. Solution: The NFS/SMB mount is not persisting across reboots. Add the mount to /etc/fstab with the _netdev option and verify with sudo mount -a after reboot.
Pitfall 6: Out of Memory (OOM) Crashes
Problem: Ollama crashes or returns errors during inference. Solution: You are likely running a model too large for your VRAM. Stick to 7B-8B models on 8GB cards. Set OLLAMA_NUM_PARALLEL=1 to prevent concurrent requests from exceeding VRAM. Monitor with nvidia-smi.
Pitfall 7: Slow Model Loading from NAS
Problem: Models take a very long time to load initially. Solution: NFS over a 1Gbps connection is the bottleneck. Models are 4-5GB each, so initial load takes 30-40 seconds. Consider 10Gbps networking or storing frequently-used models on local SSD with NAS as backup.
Pitfall 8: GPU Passthrough Breaks After Proxmox Update
Problem: GPU passthrough stops working after a Proxmox kernel update. Solution: Kernel updates can change IOMMU behavior. After updates, verify VFIO binding:
lspci -nnk -s 27:00
dmesg | grep -i vfio
update-initramfs -u -k all
Always test GPU passthrough after host kernel updates before relying on the AI assistant for important work.
LinkedIn Version
BUILDING A PRIVATE AI ASSISTANT ON MY HOMELAB
I built a self-hosted AI assistant using Ollama and Open WebUI, running on my Proxmox homelab with an NVIDIA RTX 3060 Ti.
Why? Privacy. Control. Learning.
Every prompt I type stays on my network. My models run on my GPU. My conversations are stored on my NAS. Nothing goes to the cloud.
The stack:
Proxmox VE for virtualization
Ubuntu VM with GPU passthrough (PCIe/IOMMU)
Ollama for local LLM inference
Open WebUI for a ChatGPT-like interface
NAS integration for persistent model storage
What surprised me:
Open WebUI is genuinely polished – rivals commercial AI interfaces
GPU passthrough was the hardest part (expect 2-4 hours first time)
7B/8B models on an 8GB GPU are great for daily tasks
The privacy benefit is more significant than I expected
The open-source AI ecosystem has matured to the point where running your own AI assistant is not just possible – it is practical.
If you have a homelab with a spare GPU, this is one of the most rewarding projects you can build right now.
Full setup guide on my blog: NetworkThinkTank.blog
Just deployed a fully self-hosted AI assistant on my Proxmox homelab using Ollama and Open WebUI – complete with GPU passthrough and NAS storage integration. Every prompt stays private, every model runs locally, and the web interface rivals ChatGPT. Full build guide with Docker configs and real deployment tips on the blog.
Follow-Up Article Ideas
“Scaling Up: Adding a Second GPU to Your Ollama Homelab for Larger Language Models” – Covers multi-GPU passthrough in Proxmox, running 13B and 70B parameter models across multiple GPUs, VRAM pooling strategies, and benchmarking multi-GPU vs. single-GPU inference performance.
“Building a RAG Pipeline: Teaching Your Self-Hosted AI About Your Own Documents” – Covers Retrieval Augmented Generation (RAG) setup with Open WebUI’s document upload feature, embedding models, vector databases (ChromaDB), indexing your personal knowledge base, and making your AI assistant an expert on your own files.
“Hardening Your Self-Hosted AI: Security Best Practices for Homelab LLM Deployments” – Covers network segmentation for AI services, authentication and access control in Open WebUI, SSL/TLS configuration, firewall rules, monitoring for unauthorized access, Docker security hardening, and safely exposing your AI assistant outside your home network with VPN or Cloudflare Tunnel.
After weeks of building, troubleshooting, and optimizing my CCNA lab environment, I am excited to share the entire project — now fully documented and open-sourced on GitHub. This post walks through the journey from an initial EVE-NG deployment to a fully automated Proxmox-based lab using Terraform, Ansible, and custom shell scripts.
The EVE-NG CCNA Lab project started as a straightforward network simulation environment for CCNA study. It quickly evolved into a full infrastructure-as-code project covering:
EVE-NG lab deployment with API-driven automation
Migration from EVE-NG to Proxmox for better performance and scalability
Custom shell scripts for image management, licensing, and node orchestration
A Python script (generate_readme.py) to auto-generate comprehensive documentation
qcow2 disk image optimization achieving a 39% storage reduction
Terraform and Ansible playbooks for reproducible infrastructure deployment
GitHub Documentation and the generate_readme.py Script
One of the key pieces of this project is the generate_readme.py Python script. Rather than manually maintaining a README that would inevitably fall out of sync with the actual project structure, I wrote a script that scans the repository and automatically generates a comprehensive README.md file.
The script inspects every directory — configs/, scripts/, terraform-ansible/, topology/, and images/ — and produces a fully formatted document with a table of contents, script references, setup instructions, and troubleshooting tips. Running it is as simple as:
cd scripts/
python generate_readme.py
The generated README covers 13 sections including Overview, Project Structure, Prerequisites, Quick Start, Lab Topology, Scripts Reference, qcow2 Image Management, EVE-NG API Usage, Proxmox Deployment, Configuration Files, Troubleshooting, Known Limitations, and License information. At 340 lines, it serves as a complete guide for anyone wanting to replicate or build upon this lab.
EVE-NG to Proxmox Migration
EVE-NG is a fantastic network emulation platform, but I ran into limitations around resource management and integration with modern IaC tools. The decision to migrate to Proxmox was driven by several factors:
Better resource control: Proxmox provides fine-grained CPU, memory, and storage allocation through its API
Terraform integration: The Proxmox Terraform provider enables declarative infrastructure definitions
Thin provisioning: Proxmox handles thin-provisioned qcow2 images natively, which was critical for storage optimization
Ansible compatibility: Post-deployment configuration is seamless with Ansible playbooks targeting Proxmox VMs
The migration involved exporting router and switch images from EVE-NG, converting and optimizing the qcow2 disk images, and then redeploying them on Proxmox using Terraform. The entire workflow is captured in the terraform-ansible/ directory of the repository.
Automation Scripts
The scripts/ directory contains six purpose-built shell scripts that automate every aspect of lab management:
eve-ng-api-auth.sh: Handles cookie-based API authentication with EVE-NG, exporting session tokens for use in subsequent API calls. Includes examples for listing labs, getting node details, and starting all nodes.
start-lab-nodes.sh: Automates the process of starting all lab nodes through the EVE-NG REST API with proper sequencing and health checks.
scp-upload-images.sh: Securely transfers qcow2 images to the EVE-NG or Proxmox host via SCP with progress tracking and integrity verification.
qcow2-optimize.sh: The image optimization workhorse — converts, compresses, and thin-provisions qcow2 disk images (more on this below).
fix-permissions.sh: Ensures correct file ownership and permissions on EVE-NG image directories, a common source of lab startup failures.
iol-license-fix.sh: Generates and applies the proper IOL (IOS on Linux) license file, which is required for Cisco IOL images to boot correctly.
Each script is documented with usage instructions and can be run independently or chained together for a complete deployment workflow.
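For context, the cookie-based login that eve-ng-api-auth.sh wraps looks roughly like this (host and credentials are placeholders, and payload fields may vary by EVE-NG version):
# Log in and save the session cookie
curl -s -c /tmp/eve-cookie -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"eve","html5":"-1"}' \
  http://<eve-ng-host>/api/auth/login

# Reuse the cookie for later calls, e.g. listing folders and labs
curl -s -b /tmp/eve-cookie http://<eve-ng-host>/api/folders/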
qcow2 Image Optimization
One of the most impactful parts of this project was optimizing the qcow2 disk images. Network appliance images (Cisco IOSv, IOSvL2, CSR1000v, etc.) often ship with significant wasted space — preallocated but unused disk blocks that consume real storage.
The qcow2-optimize.sh script automates a multi-step optimization pipeline:
Sparsification: Uses virt-sparsify to zero out unused blocks within the guest filesystem
Compression: Applies qcow2 internal compression via qemu-img convert -c
Thin provisioning: Ensures metadata is set for thin-provisioned allocation on the hypervisor
Integrity check: Runs qemu-img check to verify image health post-optimization
The results were significant: total image storage dropped from 30GB to 18.3GB — a 39% reduction. This is especially meaningful in a home lab where storage is often limited. The optimized images boot identically to the originals but consume far less disk space on the Proxmox host.
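Per image, the pipeline boils down to something like this (file names are placeholders):
virt-sparsify ios-router.qcow2 ios-router.sparse.qcow2                        # zero out unused guest blocks
qemu-img convert -O qcow2 -c ios-router.sparse.qcow2 ios-router.opt.qcow2     # recompress the image
qemu-img check ios-router.opt.qcow2                                           # verify integrity before deploying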
Terraform and Ansible Deployment
The final piece of the puzzle is fully automated deployment using Terraform and Ansible. The terraform-ansible/ directory contains everything needed to stand up the lab from scratch:
Terraform handles the infrastructure provisioning:
VM creation on Proxmox with defined CPU, memory, and disk parameters
Network interface configuration with VLAN tagging
Cloud-init integration for initial bootstrapping
State management for tracking deployed resources
Ansible manages the post-deployment configuration:
init-proxmox.yml: Initializes the Proxmox host with required packages, storage configuration, and network bridges
deploy-vm.yml: Deploys individual VMs with their specific configurations
remove-gateway.yml: Cleans up default gateway routes that can interfere with lab routing exercises
Configuration variables are stored in group_vars/all.yml (with a .sample template provided), and the hosts inventory file defines the Proxmox target. The ansible.cfg sets sensible defaults for host key checking and privilege escalation.
With this setup, spinning up a complete CCNA lab goes from a manual multi-hour process to a single command:
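(Roughly speaking – the repo's exact wrapper may differ, but it boils down to one Terraform apply plus the Ansible playbook run:)
cd terraform-ansible/
terraform init && terraform apply -auto-approve   # provision the lab VMs on Proxmox
ansible-playbook -i hosts deploy-vm.yml           # post-deployment configuration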
This project is a living repository — I plan to continue adding to it as I progress through my CCNA studies and expand the lab. Future additions may include:
Additional topology configurations for specific CCNA exam topics
Integration with network monitoring tools
CI/CD pipeline for automated lab testing
Support for additional platforms (VIRL, GNS3)
If you are studying for the CCNA or building your own home lab, feel free to fork the repository and adapt it to your needs. Contributions and feedback are always welcome.
Hey everyone! If you have been following my blog, you know I love combining Python with network engineering. From automating backups with Netmiko to monitoring IP SLAs with DNA Center, I am always looking for ways to make our lives as network engineers easier. Today, I am excited to walk you through my latest project: an AI-Powered Network Health Checker. Don’t worry — this is totally beginner friendly. If you can write a basic Python script, you can follow along!
What Does This Tool Do?
In a nutshell, this tool pulls real-time data from your network devices (think CPU usage, memory utilization, interface errors, etc.), feeds that data into a simple machine learning model, and tells you whether each device is healthy or if there might be an issue. The output is super straightforward — you will see messages like “Device is healthy” or “Potential issue detected.” No PhD in data science required!
Step 1: Pulling Device Data with Python
Just like in my previous posts on network automation, we start by connecting to our devices and grabbing the data we need. I used the Netmiko library to SSH into each device and pull key metrics. Here is a simplified version of the script:
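(Credentials here are placeholders and the parsing is deliberately simple; treat this as a sketch of the approach rather than the exact script.)
import re
from netmiko import ConnectHandler

device = {
    "device_type": "cisco_ios",
    "host": "192.168.1.1",      # replace with your device
    "username": "admin",
    "password": "password",
}

def get_metrics(dev):
    conn = ConnectHandler(**dev)
    cpu_out = conn.send_command("show processes cpu | include CPU utilization")
    mem_out = conn.send_command("show processes memory | include Processor")
    err_out = conn.send_command("show interfaces | include errors")
    conn.disconnect()

    # Five-second CPU percentage, processor memory in use, and total input errors
    cpu = int(re.search(r"five seconds: (\d+)%", cpu_out).group(1))
    total, used = (int(x) for x in re.search(r"Total:\s*(\d+)\s*Used:\s*(\d+)", mem_out).groups())
    mem_pct = round(100 * used / total, 1)
    input_errors = sum(int(n) for n in re.findall(r"(\d+) input errors", err_out))
    return [cpu, mem_pct, input_errors]

print(get_metrics(device))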
This script connects to a Cisco IOS device, grabs CPU usage, memory utilization, and interface error counts. You can easily expand this to loop through multiple devices from an inventory file — just like we did in the backup config script project.
Step 2: Building a Simple ML Model for Anomaly Detection
Here is where the AI magic comes in — but I promise it is simpler than it sounds. We are using scikit-learn’s Isolation Forest algorithm, which is perfect for anomaly detection. It learns what “normal” looks like from your data and flags anything that seems off.
The Isolation Forest works by randomly partitioning data points. Anomalies are isolated faster because they are different from the majority of the data. The contamination parameter tells the model roughly what percentage of data points are expected to be anomalies — I set it to 0.1 (10%) as a starting point, but you can tune this for your environment.
Step 3: Putting It All Together
Now let us combine everything into a single script that loops through your devices, pulls the data, and runs it through the model:
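(The feature rows below are hard-coded stand-ins for what the get_metrics() collection in Step 1 returns, and contamination=0.1 is the 10% starting point mentioned in Step 2.)
from sklearn.ensemble import IsolationForest

# [cpu %, memory %, input errors] per device, as collected in Step 1
devices = ["core-sw1", "dist-rtr1", "edge-rtr1", "edge-rtr2"]
samples = [
    [12, 45, 0],
    [15, 47, 1],
    [11, 44, 0],
    [93, 88, 240],   # this one should stand out
]

model = IsolationForest(contamination=0.1, random_state=42)
labels = model.fit_predict(samples)   # 1 = normal, -1 = anomaly

for name, label in zip(devices, labels):
    if label == -1:
        print(f"{name}: Potential issue detected")
    else:
        print(f"{name}: Device is healthy")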
If you are a network engineer who is curious about AI and machine learning, this is a great beginner project to get your feet wet. You don’t need to understand every detail of how Isolation Forest works under the hood — just know that it is a tool that can help you spot problems before they become outages.
As always, if you have questions or want to share how you have customized this for your own network, drop a comment below or reach out to me. Happy automating!
Backing up network device configurations is one of the most critical tasks in network administration. A missed backup could mean hours of manual reconfiguration after a failure. In this post, we will walk through a Python script that automates this process by connecting to a router or switch via SSH and saving the running-config to a local file. We will use Netmiko, a popular Python library that simplifies SSH connections to network devices. Whether you manage a handful of switches or hundreds of routers, this script gives you a repeatable, automated way to capture configurations on demand.
Prerequisites
Before getting started, make sure you have:
Python 3.8 or higher installed
SSH access to your target network device
Device credentials (username and password)
The Netmiko library installed
To install Netmiko, run:
pip install netmiko
You can also clone the full project repository and install from its requirements file.
The core of our script uses Netmiko’s ConnectHandler to establish an SSH session. You provide the device type, hostname or IP address, and your credentials. Netmiko handles the SSH negotiation and drops you into an authenticated session.
Netmiko supports a wide range of device types including cisco_ios, cisco_nxos, arista_eos, and juniper_junos. You simply pass the appropriate device type string and Netmiko adapts its behavior accordingly.
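A minimal connection looks like this (host and credentials are placeholders):
from netmiko import ConnectHandler

device = {
    "device_type": "cisco_ios",
    "host": "192.168.1.1",
    "username": "admin",
    "password": "password",
    "port": 22,
}

net_connect = ConnectHandler(**device)
print(net_connect.find_prompt())   # prints the device prompt, confirming the session is live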
Retrieving the Running Configuration
Once connected, pulling the running configuration is a single method call. We use send_command to execute show running-config on the device and capture the output as a string:
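(This continues the net_connect session from the connection sketch above.)
running_config = net_connect.send_command("show running-config")
print(f"Retrieved {len(running_config)} characters of configuration")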
Netmiko handles paging automatically, so even if your configuration is long, you will get the complete output without needing to send space or press Enter to page through it.
Saving the Configuration to a File
With the configuration captured in memory, the next step is writing it to a file. Our script creates a backups directory automatically and saves each configuration with a timestamped filename so you never overwrite a previous backup:
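(A sketch that matches the timestamped filename pattern shown in the sample output below; the backups directory is created automatically if it does not exist.)
import os
from datetime import datetime

os.makedirs("backups", exist_ok=True)
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
safe_host = device["host"].replace(".", "_")
filename = f"backups/{safe_host}_running-config_{timestamp}.txt"

with open(filename, "w") as f:
    f.write(running_config)

net_connect.disconnect()
print(f"[+] Configuration saved to {filename}")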
When you run the script, you will see output like this:
[*] Connecting to 192.168.1.1:22 (cisco_ios)…
[+] Successfully connected to 192.168.1.1
[*] Retrieving running-config…
[+] Retrieved 15234 characters of configuration
[+] Configuration saved to backups/192_168_1_1_running-config_2026-04-10_14-30-00.txt
[+] Disconnected. Backup complete!
What is Next
This script provides a solid foundation for network configuration backups. Here are some ideas for extending it:
Loop through a list of devices from a CSV or YAML inventory file to back up your entire network in one run
Schedule the script with cron (Linux) or Task Scheduler (Windows) for automatic daily backups
Add email or webhook notifications on success or failure
Compare configurations between backups to detect unauthorized changes using difflib
Integrate with Git to version-control your configurations automatically
Wrapping Up
Automating network device backups does not have to be complicated. With Python and Netmiko, you can connect to any router or switch, pull the running configuration, and save it to a timestamped file in just a few lines of code.
If you found this useful, check out my other posts on network automation including Monitoring IP SLAs with Python, DNA Center, and NetBox and Build a Home Lab Like a Pro. Stay tuned for more content from the NetworkThinkTank!
If you manage a network of any size, you know that keeping tabs on performance metrics like latency, jitter, and packet loss is critical. Cisco IP SLA (Service Level Agreement) operations are the go-to feature for probing network paths and measuring these metrics directly from your routers and switches. But manually checking IP SLA statistics across dozens or hundreds of devices? That does not scale.
In this post, I will walk you through a Python-based tool I built that pulls IP SLA data from Cisco DNA Center via its REST API, enriches it with device metadata from NetBox, and generates automated performance reports. Whether you are running a handful of branch routers or a large enterprise campus, this approach gives you a scalable, repeatable way to monitor network performance.
What is IP SLA?
Cisco IP SLA is a built-in feature on Cisco IOS and IOS-XE devices that allows you to generate synthetic traffic to measure network performance. You can configure operations like ICMP echo (ping), UDP jitter, HTTP GET, and more. Each operation continuously measures metrics such as round-trip time (RTT), latency, jitter, packet loss, and availability. These metrics are essential for validating SLA compliance, troubleshooting performance issues, and capacity planning.
The Tools
This project brings together three key components. First, Python does the heavy lifting for API calls, data parsing, and report generation. Second, Cisco DNA Center provides a centralized REST API for pulling device inventory and running CLI commands across your entire network without SSH-ing into each device individually. Third, NetBox acts as our network source of truth, storing device metadata like site assignments, roles, platforms, and IP addresses that we use to enrich the raw SLA data.
How It Works
The IP SLA Monitor tool follows a simple three-step workflow:
1. Authenticate with DNA Center and pull IP SLA operation statistics from all monitored devices using the command-runner API.
2. Query NetBox for each device to enrich the data with site name, device role, platform, and management IP.
3. Evaluate each SLA operation against configurable thresholds for latency, jitter, and packet loss, then generate JSON and CSV reports.
The tool can run as a one-shot collection or in a continuous monitoring loop with a configurable polling interval. Alerts are logged to the console when any operation exceeds your defined thresholds.
The Python Code
The project is organized into three main scripts:
ip_sla_monitor.py is the main orchestration script that ties everything together. It loads configuration from a .env file, initializes the DNA Center and NetBox clients, collects SLA data, enriches it, evaluates thresholds, and saves reports.
dnac_integration.py handles all communication with the Cisco DNA Center REST API including authentication, device inventory retrieval, and IP SLA data collection via the command-runner API.
netbox_integration.py connects to the NetBox API to look up device metadata by hostname, returning site assignments, device roles, platform types, and IP addresses.
Getting Started
Getting up and running is straightforward. Clone the repository, set up a virtual environment, install the dependencies from requirements.txt, and configure your .env file with your DNA Center and NetBox credentials. The only Python packages required are requests for HTTP API calls, python-dotenv for environment variable management, and urllib3. No complex frameworks or heavy dependencies. Full setup instructions are in the GitHub repo README.
Threshold Alerting
One of the most useful features is configurable threshold alerting. You define your acceptable limits for latency, jitter, and packet loss in the .env file, and the tool flags any SLA operation that exceeds those limits. For example, with default thresholds of 100ms latency, 30ms jitter, and 1% packet loss, a branch router showing 115ms latency and 2.1% packet loss would be immediately flagged as an alert in the console output and reports.
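A sketch of the relevant .env entries (variable names here are illustrative; check the repo's sample file for the exact keys):
DNAC_HOST=https://dnac.example.local
DNAC_USERNAME=admin
DNAC_PASSWORD=********
NETBOX_URL=https://netbox.example.local
NETBOX_TOKEN=********
LATENCY_THRESHOLD_MS=100
JITTER_THRESHOLD_MS=30
PACKET_LOSS_THRESHOLD_PCT=1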
Sample Output
The tool generates both JSON and CSV reports. The JSON report includes a summary section with total operations, passing and failing counts, and average latency, followed by detailed per-operation data enriched with NetBox metadata. The CSV report provides the same data in a tabular format that you can easily import into Excel or feed into other monitoring tools. Sample output files are included in the GitHub repository under the output directory.
What is Next
This project is a solid foundation, but there is plenty of room to extend it. Some ideas for future enhancements include adding webhook or email notifications for alerts, integrating with Grafana for real-time dashboards, storing historical data in a time-series database like InfluxDB, and expanding the command-runner integration to pull live SLA statistics directly from devices.
Wrapping Up
Python automation combined with APIs from DNA Center and NetBox gives network engineers a powerful toolkit for monitoring IP SLAs at scale. Instead of manually checking IP SLA stats on individual devices, you can automate the entire workflow and get enriched reports in minutes.
If you found this useful, check out my other posts on network automation including Automating Network Device Backups with Python and Netmiko and Build a Home Lab Like a Pro. Stay tuned for more content from the NetworkThinkTank!