By NetworkThinkTank | April 23, 2026
Introduction
I got tired of sending my data to cloud AI services. Every prompt I typed into ChatGPT or Claude left my network and could be stored, analyzed, or used for training. For personal questions, code snippets with API keys, and private brainstorming sessions, that never sat well with me.
So I built my own. A fully self-hosted AI assistant running on my Proxmox homelab, powered by Ollama for local LLM inference and Open WebUI for a polished ChatGPT-like interface. The models run on my own NVIDIA GPU, the data stays on my NAS, and nothing leaves my network.
This guide walks you through exactly how I did it – from VM creation to pulling your first model and chatting with it through a clean web interface. If you have a Proxmox server and a spare GPU, you can have this running in an afternoon.
Prerequisites and Hardware Requirements
Here is what you need before starting:
Hardware
- Proxmox VE host (version 7.x or 8.x)
- NVIDIA GPU with at least 8GB VRAM (I use an RTX 3060 Ti 8GB)
- Minimum 16GB RAM allocated to the AI VM (32GB recommended)
- NAS with NFS or SMB shares available (Synology, TrueNAS, etc.)
- At least 50GB free storage for models (100GB+ recommended)
Software
- Proxmox VE installed and running
- Ubuntu Server 22.04 or 24.04 LTS ISO
- Docker and Docker Compose
- NVIDIA drivers (535+ recommended)
- NVIDIA Container Toolkit
Network
- Static IP or DHCP reservation for the AI VM
- Access to your NAS from the VM subnet
- Optional: domain name for reverse proxy
Architecture Overview
The stack looks like this:
Browser → Open WebUI (port 3000) → Ollama API (port 11434) → NVIDIA GPU passed through to the VM, with model files and conversation data stored on the NAS.
All AI processing happens locally on the GPU inside the VM. Open WebUI provides the browser-based chat interface and connects to Ollama’s API on the backend. The NAS stores all model files and conversation data so nothing is lost if the VM needs rebuilding.
Step 1: Preparing the Proxmox VM
First, create a new VM in Proxmox optimized for AI workloads.
VM Configuration
- VM ID: 205
- Name: ollama-gpu
- OS: Ubuntu Server 22.04 LTS
- CPU: host type, 8 cores
- RAM: 16GB minimum (I use 32GB)
- Disk: 100GB on local-lvm (SSD preferred)
- Network: vmbr0, bridge mode
- BIOS: OVMF (UEFI) for GPU passthrough
Enabling IOMMU on the Proxmox Host
Edit GRUB configuration:
nano /etc/default/grub
For AMD CPUs, set:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
For Intel CPUs, set:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
Update GRUB and reboot:
update-grub
reboot
Add VFIO modules to /etc/modules:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Blacklist the NVIDIA drivers on the host:
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.confecho "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
Find your GPU’s PCI IDs:
lspci -nn | grep NVIDIA
Example output:
27:00.0 VGA compatible controller [0300]: NVIDIA [10de:2489]
27:00.1 Audio device [0403]: NVIDIA [10de:228b]
Bind to VFIO:
echo "options vfio-pci ids=10de:2489,10de:228b" > /etc/modprobe.d/vfio.conf
Update initramfs and reboot:
update-initramfs -u -k all
reboot
Add GPU to the VM
In the Proxmox web UI:
VM 205 > Hardware > Add > PCI Device
- Select your NVIDIA GPU
- Check "All Functions"
- Check "ROM-Bar"
- Check "PCI-Express"
- Set "Primary GPU" if this VM has no other display
Verify GPU Inside the VM
After booting the VM, run:
lspci | grep -i nvidia
You should see your GPU listed. If not, check IOMMU groups and VFIO binding on the Proxmox host.
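If you need to dig further, two host-side checks (run on the Proxmox host, not inside the VM) usually reveal the problem. The 27:00 PCI address below is the example address used earlier in this guide; substitute your own.
dmesg | grep -e DMAR -e IOMMU     # confirm IOMMU is active on the host
lspci -nnk -s 27:00               # confirm the GPU is bound to vfio-pci, not nouveau/nvidia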
Step 2: Installing NVIDIA Drivers and Docker
With the GPU visible inside the VM, install the required software.
Update the System
sudo apt update && sudo apt upgrade -ysudo reboot
Install NVIDIA Drivers
sudo apt install -y nvidia-driver-535 nvidia-utils-535
sudo reboot
Verify GPU Access
nvidia-smi
Expected output shows your RTX 3060 Ti with driver version and CUDA version. If nvidia-smi fails, check that the GPU passthrough is configured correctly on the Proxmox host.
Install Docker
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) \
  signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify GPU in Docker
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If this shows your GPU info inside the container, you are ready to deploy Ollama.
Step 3: Deploying Ollama
Create a project directory:
mkdir -p ~/ai-stack
cd ~/ai-stack
Create docker-compose.yml:
version: "3.8"services: ollama: image: ollama/ollama:latest container_name: ollama restart: unless-stopped ports: - "11434:11434" volumes: - /mnt/nas/ollama-models:/root/.ollama environment: - NVIDIA_VISIBLE_DEVICES=all deploy: resources: reservations: devices: - driver: nvidia count: all capabilities: [gpu] open-webui: image: ghcr.io/open-webui/open-webui:main container_name: open-webui restart: unless-stopped ports: - "3000:8080" volumes: - /mnt/nas/openwebui-data:/app/backend/data environment: - OLLAMA_BASE_URL=http://ollama:11434 depends_on: - ollama
Start the stack:
docker compose up -d
Pull your first model:
docker exec -it ollama ollama pull llama3.1:8b
This downloads the Llama 3.1 8B parameter model, which is an excellent starting point for an 8GB GPU. The download is roughly 4.7GB and will be stored on your NAS mount.
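You can confirm the model is registered with Ollama and actually landed on the NAS-backed volume with a couple of quick checks:
docker exec -it ollama ollama list     # list models Ollama knows about
du -sh /mnt/nas/ollama-models          # check disk usage on the NAS share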
Other Recommended Models for 8GB VRAM
ollama pull mistral:7b             # Great for general tasks
ollama pull codellama:7b           # Optimized for coding
ollama pull llama3.1:8b-instruct   # Best for chat interactions
ollama pull phi3:mini              # Microsoft's compact model
ollama pull gemma2:9b              # Google's open model
Test the Ollama API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Hello, how are you?",
  "stream": false
}'
If you get a JSON response with generated text, Ollama is working.
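You can also chat with the model straight from the terminal, which is a handy sanity check before Open WebUI is running:
docker exec -it ollama ollama run llama3.1:8b   # interactive chat; type /bye to exit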
Step 4: Configuring Open WebUI
Open your browser and navigate to:
http://<VM-IP>:3000
First-Time Setup
- Create an admin account (first user automatically becomes admin)
- Set a strong password – this is your AI assistant gateway
- Open WebUI will auto-detect Ollama at the configured URL
Connecting to Ollama
Open WebUI should automatically connect to Ollama using the OLLAMA_BASE_URL environment variable we set in Docker Compose. Verify by clicking Settings > Connections and confirming the Ollama URL shows http://ollama:11434 with a green status.
Key Settings to Configure
- Settings > General: Set default model to llama3.1:8b
- Settings > Interface: Enable chat history, code highlighting
- Settings > Models: View and manage downloaded models
- Settings > Audio: Enable speech-to-text if desired
- Settings > Images: Configure image generation if using a compatible model
Creating Custom Modelfiles
You can create specialized assistants using Ollama Modelfiles. Example – A coding assistant:
FROM codellama:7b
SYSTEM """You are an expert programmer. You write clean, efficient
code with clear comments. When asked about code, provide working
examples with explanations."""
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
Save this as coding-assistant.modelfile and create it:
docker exec -it ollama ollama create coding-assistant \
  -f /path/to/coding-assistant.modelfile
This model then appears in Open WebUI as a selectable assistant.
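One gotcha: the -f path in the create command must exist inside the Ollama container, not just on the VM. A simple workaround (a sketch; the /tmp destination is an arbitrary choice) is to copy the Modelfile in first:
docker cp coding-assistant.modelfile ollama:/tmp/coding-assistant.modelfile   # copy into the container
docker exec -it ollama ollama create coding-assistant \
  -f /tmp/coding-assistant.modelfile                                          # build from inside the container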
Step 5: NAS Storage Integration
Storing models and data on your NAS ensures persistence and makes backups straightforward.
Mount NFS Shares on the VM
Install NFS client:
sudo apt install -y nfs-common
Create mount points:
sudo mkdir -p /mnt/nas/ollama-models
sudo mkdir -p /mnt/nas/openwebui-data
Add to /etc/fstab for persistent mounts:
192.168.1.100:/volume1/ai-models /mnt/nas/ollama-models nfs defaults,_netdev 0 0
192.168.1.100:/volume1/ai-data /mnt/nas/openwebui-data nfs defaults,_netdev 0 0
Mount everything:
sudo mount -a
Verify mounts:
df -h | grep nas
Replace 192.168.1.100 with your NAS IP and adjust the share paths to match your NAS configuration (Synology, TrueNAS, etc.).
Important: Make sure Docker containers have write permissions to these mount points. Set ownership if needed:
sudo chown -R 1000:1000 /mnt/nas/ollama-models
sudo chown -R 1000:1000 /mnt/nas/openwebui-data
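To confirm writes actually work over NFS (squash settings on the NAS can silently block them), a quick test as UID 1000, the UID used above, is:
sudo -u '#1000' touch /mnt/nas/ollama-models/.write-test && \
  sudo -u '#1000' rm /mnt/nas/ollama-models/.write-test && echo "ollama-models is writable"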
Backup Strategy
- NAS snapshots protect model data and conversations
- Export Open WebUI settings periodically from the admin panel
- Keep docker-compose.yml in a Git repository
- Document your Modelfile customizations
Step 6: Networking and Remote Access
Expose Services on Your LAN
By default, the services are accessible at:
- Ollama API: http://<VM-IP>:11434
- Open WebUI: http://<VM-IP>:3000
To make Ollama accessible to other machines on your network, ensure the Ollama container binds to 0.0.0.0 (default in our Docker Compose config).
Reverse Proxy with Nginx
Install Nginx on the VM (or use a dedicated reverse proxy VM):
sudo apt install -y nginx
Create /etc/nginx/sites-available/ai-assistant:
server {
    listen 80;
    server_name ai.homelab.local;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
Enable the site:
sudo ln -s /etc/nginx/sites-available/ai-assistant \
  /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
Add ai.homelab.local to your DNS server or local hosts file pointing to the VM IP address.
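If you do not run a local DNS server, a hosts-file entry on each client machine works for testing. The 192.168.1.205 address below is an assumed VM IP; substitute your own:
192.168.1.205   ai.homelab.local   # add to /etc/hosts on each client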
SSL with Let’s Encrypt (if publicly accessible)
sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d ai.yourdomain.com
Firewall Rules
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 3000/tcp
sudo ufw allow 11434/tcp
sudo ufw enable
Performance Tuning and Optimization
GPU Memory Management
- The RTX 3060 Ti has 8GB VRAM, which limits model size
- Stick to 7B-8B parameter models for best performance
- Use Q4_K_M quantized models for the best speed/quality balance
- Monitor VRAM usage: nvidia-smi -l 1 (updates every second)
Model Quantization Guide
Q4_K_M - Best balance of speed and quality (recommended)
Q5_K_M - Slightly better quality, slightly slower
Q8_0   - Near full quality, uses significantly more VRAM
F16    - Full precision, requires 2x the VRAM (not for 8GB cards)
Context Length vs. Speed
- Default context: 2048 tokens (fast, limited memory)
- Extended context: 4096 tokens (good balance)
- Maximum context: 8192+ tokens (slower, more VRAM usage)
Set in your Modelfile or at runtime:
PARAMETER num_ctx 4096
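At runtime, the same option can be passed per request through the options field of the Ollama API. A sketch against the /api/generate endpoint used earlier (prompt text is arbitrary):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize the benefits of GPU passthrough in Proxmox.",
  "stream": false,
  "options": { "num_ctx": 4096 }
}'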
Monitoring Resource Usage
watch -n 1 nvidia-smi   # GPU monitoring
htop                    # CPU and RAM monitoring
docker stats            # Container resource usage
iostat -x 1             # Disk I/O monitoring
Conclusion
After following this guide, you now have a fully self-hosted AI assistant running on your Proxmox homelab. Your data stays private, your models run locally on your GPU, and you have a clean web interface for interacting with multiple AI models.
The entire stack – Ollama for inference, Open WebUI for the interface, NAS for storage – runs reliably as a set of Docker containers inside a Proxmox VM with GPU passthrough. It survives reboots, updates cleanly, and scales as you add more models.
This is what homelabbing is about: taking control of your own infrastructure and running services that matter to you. A private AI assistant is one of the most practical and rewarding projects you can build today.
Real-World Deployment Tips
- Start with small models first: Pull llama3.1:8b before anything else. It fits comfortably in 8GB VRAM and responds fast. Get everything working before experimenting with larger models.
- Use NAS storage from day one: Do not store models on the VM’s local disk. When you inevitably rebuild the VM, you will lose hours re-downloading models. NAS storage makes rebuilds trivial.
- Pin your Docker image versions: Use specific tags instead of “latest” in production. An unexpected update broke my Open WebUI setup once when the API format changed between versions.
- Set OLLAMA_NUM_PARALLEL=1: On an 8GB card, running multiple concurrent requests causes out-of-memory crashes. Limit Ollama to one request at a time with this environment variable (see the compose snippet after this list).
- Monitor VRAM proactively: Add nvidia-smi -l 5 to a tmux session so you always see GPU memory usage. VRAM exhaustion causes silent failures that are hard to debug.
- Enable Docker restart policies: The "unless-stopped" restart policy in our Docker Compose file means containers recover automatically after host reboots or crashes.
- Test your NFS mounts under load: Some NAS devices throttle NFS under heavy I/O. Run model inference while monitoring NAS performance to catch bottlenecks early.
- Keep a shell alias for quick model pulls: Add to your .bashrc:
alias opull='docker exec -it ollama ollama pull'
Then pulling models is just:
opull mistral:7b
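As referenced in the OLLAMA_NUM_PARALLEL tip above, the variable belongs in the ollama service's environment block of docker-compose.yml. A minimal sketch of just that fragment:
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - OLLAMA_NUM_PARALLEL=1   # serve one request at a time to avoid VRAM exhaustion
Re-run docker compose up -d afterwards so the container is recreated with the new variable.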
Honest Takeaways and Lessons Learned
- Local LLMs are not ChatGPT replacements (yet): The 7B-9B models that fit on an 8GB GPU are impressive but noticeably less capable than GPT-4 or Claude for complex reasoning. They excel at drafting, summarization, code completion, and brainstorming. Manage your expectations accordingly.
- GPU passthrough is the hardest part: Getting IOMMU groups clean, VFIO binding correct, and the GPU visible inside the VM took more troubleshooting than the entire rest of the stack combined. Once it works, it stays working, but expect 2-4 hours of debugging on your first attempt.
- Open WebUI is surprisingly polished: I expected a rough open-source interface. Instead, Open WebUI is genuinely pleasant to use daily. The chat interface, model switching, conversation history, and document upload features rival commercial products.
- Storage adds up fast: Each 7B model is 4-5GB. If you start collecting models (and you will), budget 100-200GB of NAS storage. I currently have 12 models taking up 67GB.
- The privacy benefit is real: Once you start using a local AI for sensitive queries – tax questions, medical research, private code review – you realize how uncomfortable it was sending that data to third-party servers. This alone justifies the project.
- Docker makes everything easier: Without Docker and the NVIDIA Container Toolkit, this setup would involve painful manual dependency management. The containerized approach means clean upgrades and easy rollbacks.
- Community models keep getting better: The open-source LLM ecosystem is evolving rapidly. Models that were state-of-the-art six months ago are now outperformed by newer releases. Check Ollama’s model library regularly for improvements.
Common Pitfalls and How to Avoid Them
Pitfall 1: IOMMU Group Conflicts
Problem: Your GPU shares an IOMMU group with other devices.
Solution: Check groups with:
find /sys/kernel/iommu_groups/ -type l
If your GPU is not in a clean group, you may need an ACS override patch or a different PCIe slot. Move the GPU to a slot that isolates it in its own IOMMU group.
Pitfall 2: NVIDIA Driver Conflicts on Proxmox Host
Problem: The Proxmox host loads NVIDIA drivers before VFIO can claim the GPU.
Solution: Blacklist nouveau and nvidia in /etc/modprobe.d/blacklist.conf and ensure VFIO modules load first. Add softdep nvidia pre: vfio-pci to modprobe configuration.
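A hedged example of how those softdep lines can be added on the Proxmox host (appended here to the vfio.conf created earlier; the exact filename is flexible):
echo "softdep nouveau pre: vfio-pci" >> /etc/modprobe.d/vfio.conf
echo "softdep nvidia pre: vfio-pci" >> /etc/modprobe.d/vfio.conf
update-initramfs -u -k all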
Pitfall 3: Docker Cannot See the GPU
Problem: docker run --gpus all fails with “could not select device driver”.
Solution: The NVIDIA Container Toolkit is not installed or not configured. Run:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Pitfall 4: Open WebUI Cannot Connect to Ollama
Problem: Open WebUI shows “Connection failed” for the Ollama backend.
Solution: Ensure both containers are on the same Docker network (Docker Compose handles this automatically). Verify the OLLAMA_BASE_URL is set to http://ollama:11434 (using the container name, not localhost).
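Two quick checks from the VM help narrow this down (the ai-stack network name assumes the compose project directory from Step 3):
curl http://localhost:11434/api/tags      # confirm Ollama itself answers on the published port
docker network ls | grep ai-stack         # confirm the Compose network both containers join exists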
Pitfall 5: Models Disappear After VM Reboot
Problem: Downloaded models are gone after restarting the VM.
Solution: The NFS/SMB mount is not persisting across reboots. Add the mount to /etc/fstab with the _netdev option and verify with sudo mount -a after reboot.
Pitfall 6: Out of Memory (OOM) Crashes
Problem: Ollama crashes or returns errors during inference.
Solution: You are likely running a model too large for your VRAM. Stick to 7B-8B models on 8GB cards. Set OLLAMA_NUM_PARALLEL=1 to prevent concurrent requests from exceeding VRAM. Monitor with nvidia-smi.
Pitfall 7: Slow Model Loading from NAS
Problem: Models take a very long time to load initially.
Solution: NFS over a 1Gbps connection is the bottleneck. Models are 4-5GB each, so initial load takes 30-40 seconds. Consider 10Gbps networking or storing frequently-used models on local SSD with NAS as backup.
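If you go the local-SSD route, one simple pattern (a sketch, assuming a local /opt/ollama-models directory replaces the NAS path in the compose volume) is to serve models from the SSD and mirror them to the NAS as a backup:
# In docker-compose.yml, point the ollama volume at local SSD, e.g.:
#   - /opt/ollama-models:/root/.ollama
rsync -a --delete /opt/ollama-models/ /mnt/nas/ollama-models/   # periodic backup to the NAS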
Pitfall 8: GPU Passthrough Breaks After Proxmox Update
Problem: GPU passthrough stops working after a Proxmox kernel update.
Solution: Kernel updates can change IOMMU behavior. After updates, verify VFIO binding:
lspci -nnk -s 27:00
dmesg | grep -i vfio
update-initramfs -u -k all
Always test GPU passthrough after host kernel updates before relying on the AI assistant for important work.
LinkedIn Version
BUILDING A PRIVATE AI ASSISTANT ON MY HOMELAB
I built a self-hosted AI assistant using Ollama and Open WebUI, running on my Proxmox homelab with an NVIDIA RTX 3060 Ti.
Why? Privacy. Control. Learning.
Every prompt I type stays on my network. My models run on my GPU. My conversations are stored on my NAS. Nothing goes to the cloud.
The stack:
- Proxmox VE for virtualization
- Ubuntu VM with GPU passthrough (PCIe/IOMMU)
- Ollama for local LLM inference
- Open WebUI for a ChatGPT-like interface
- NAS integration for persistent model storage
What surprised me:
- Open WebUI is genuinely polished – rivals commercial AI interfaces
- GPU passthrough was the hardest part (expect 2-4 hours first time)
- 7B/8B models on an 8GB GPU are great for daily tasks
- The privacy benefit is more significant than I expected
The open-source AI ecosystem has matured to the point where running your own AI assistant is not just possible – it is practical.
If you have a homelab with a spare GPU, this is one of the most rewarding projects you can build right now.
Full setup guide on my blog: NetworkThinkTank.blog
#homelab #AI #selfhosted #Ollama #OpenWebUI #Proxmox #privacy #LLM #artificialintelligence #homelabbing
Social Media Teaser
Just deployed a fully self-hosted AI assistant on my Proxmox homelab using Ollama and Open WebUI – complete with GPU passthrough and NAS storage integration. Every prompt stays private, every model runs locally, and the web interface rivals ChatGPT. Full build guide with Docker configs and real deployment tips on the blog.
Follow-Up Article Ideas
- “Scaling Up: Adding a Second GPU to Your Ollama Homelab for Larger Language Models” – Covers multi-GPU passthrough in Proxmox, running 13B and 70B parameter models across multiple GPUs, VRAM pooling strategies, and benchmarking multi-GPU vs. single-GPU inference performance.
- “Building a RAG Pipeline: Teaching Your Self-Hosted AI About Your Own Documents” – Covers Retrieval Augmented Generation (RAG) setup with Open WebUI’s document upload feature, embedding models, vector databases (ChromaDB), indexing your personal knowledge base, and making your AI assistant an expert on your own files.
- “Hardening Your Self-Hosted AI: Security Best Practices for Homelab LLM Deployments” – Covers network segmentation for AI services, authentication and access control in Open WebUI, SSL/TLS configuration, firewall rules, monitoring for unauthorized access, Docker security hardening, and safely exposing your AI assistant outside your home network with VPN or Cloudflare Tunnel.