OpenClaw Health Monitoring System: Automated Service Health Checks with Discord Alerts

Date: May 10, 2026 | Tags: OpenClaw, Health Monitoring, Proxmox, systemd, Discord, Automation

Overview

The OpenClaw Health Monitor is a systemd-based health checking system that runs on the Proxmox host (10.10.20.252). It monitors critical OpenClaw services every 15 minutes and sends alerts to Discord when issues are detected. This post covers the architecture, components, and management of the health monitoring system.

How It Works

The health monitoring system uses a systemd timer (openclaw-health-alerts.timer) that triggers a Python health check script every 15 minutes. The script checks each monitored service via HTTP requests and reports any failures to Discord through a relay server.

The alert flow is as follows:

  1. Timer triggers the service every 15 minutes
  2. Health check script checks each monitored service
  3. If any service is CRITICAL, an alert payload is built
  4. Alert is sent via HTTP POST to the relay server
  5. Relay forwards the payload to the Discord webhook
  6. Success is indicated by an HTTP 204 response from Discord
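
To make the flow concrete, here is a minimal Python sketch of one timer-triggered cycle. It is not the actual script: the function name run_health_cycle, the injected check and post_to_relay callables, and the payload shape (a JSON body with a "content" field, which Discord webhooks accept) are all assumptions made for illustration. It also assumes the relay passes Discord's status code back to the caller.

```python
import json
from typing import Callable, Dict, List


def run_health_cycle(
    services: List[str],
    check: Callable[[str], str],            # hypothetical: returns "OK" or "CRITICAL"
    post_to_relay: Callable[[bytes], int],  # hypothetical: sends body, returns HTTP status
) -> Dict[str, object]:
    """Run one cycle: check every service, alert only if something is CRITICAL."""
    results = {name: check(name) for name in services}
    critical = sorted(name for name, status in results.items() if status == "CRITICAL")

    if not critical:
        # All services are OK, so no alert payload is built.
        return {"results": results, "alert_sent": False, "delivered": False}

    # Assumed payload shape; the real script's payload is not shown in this post.
    payload = json.dumps({"content": "CRITICAL: " + ", ".join(critical)}).encode("utf-8")
    status = post_to_relay(payload)

    # Discord answers a successful webhook post with 204 No Content.
    return {"results": results, "alert_sent": True, "delivered": status == 204}
```

Passing the check and send steps in as callables keeps the cycle logic testable without touching the network.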

Components

Systemd Timer

  • Unit: openclaw-health-alerts.timer
  • Schedule: Every 15 minutes
  • Triggers: openclaw-health-alerts.service
  • Location: /etc/systemd/system/openclaw-health-alerts.timer

Health Check Script

  • Location: /opt/openclaw/scripts/openclaw_health_alerts.py
  • Log File: /var/log/openclaw/health_alerts.log

Monitored Services

The following services are monitored:

  • openclaw-gateway (10.10.20.207, port 15750) – HTTP 200 check
  • ollama (10.10.20.29, port 11434) – HTTP 200 check
  • openwebui (10.10.29.29, port 3000) – HTTP 200 check
  • openclaw-agent (CT 206, port 3000) – get_exec > HTTP 200 check
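
An HTTP 200 check of this kind could look roughly like the sketch below, which uses only the Python standard library. The SERVICES mapping, the probe and is_healthy names, the root path "/", and the 5-second timeout are assumptions, since the post does not show which path or timeout the real script uses. The openclaw-agent entry is left out because it is checked through CT 206 rather than a plain host address.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional

# Hosts and ports copied from the list above; the "/" path is an assumption.
SERVICES = {
    "openclaw-gateway": "http://10.10.20.207:15750/",
    "ollama": "http://10.10.20.29:11434/",
    "openwebui": "http://10.10.29.29:3000/",
}


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed response, and similar failures.
        return None


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """A service passes the check only when it answers with HTTP 200."""
    return probe(url, timeout) == 200
```

For example, is_healthy(SERVICES["ollama"]) would return True only when the ollama endpoint answers with HTTP 200 within the timeout.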

Discord Alerting

Alerts are sent to Discord via a relay server:

  • Relay URL: http://10.10.20.36:8888/relay
  • Relay Host: Windows machine at 10.10.20.36
  • Relay Port: TCP 8888
  • The relay forwards health alert payloads to the Discord webhook endpoint.
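
Posting an alert to the relay might be sketched as follows. The relay URL comes from the list above; everything else is an assumption: the build_relay_request and send_alert names are hypothetical, and the JSON body with a "content" field mirrors what Discord webhooks accept, which may differ from the format the relay actually expects.

```python
import json
import urllib.request

RELAY_URL = "http://10.10.20.36:8888/relay"  # from the post


def build_relay_request(message: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying the alert as JSON."""
    body = json.dumps({"content": message}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        RELAY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(message: str, timeout: float = 10.0) -> int:
    """Send the alert and return the HTTP status code reported by the relay."""
    with urllib.request.urlopen(build_relay_request(message), timeout=timeout) as response:
        return response.status
```

Separating request construction from sending makes it possible to inspect the payload without a running relay.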

Status Indicators

  • OK: Service responding with HTTP 200
  • CRITICAL: Service unreachable or not responding
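
The two states could be modeled as in the sketch below. The Status enum and classify function are hypothetical names. The post only defines OK (HTTP 200) and CRITICAL (unreachable or not responding), so treating any other HTTP status, such as a 503, as CRITICAL is an assumption.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "OK"
    CRITICAL = "CRITICAL"


def classify(http_status: Optional[int]) -> Status:
    """Map a check result to a status; None means no response was received."""
    # Assumption: anything other than HTTP 200 counts as CRITICAL.
    return Status.OK if http_status == 200 else Status.CRITICAL
```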

Management Commands

Check timer status:

systemctl status openclaw-health-alerts.timer --no-pager

List active timers:

systemctl list-timers | grep -i openclaw

View recent service logs (last 80 lines):

journalctl -u openclaw-health-alerts.service -n 80 --no-pager

Manually trigger a health check:

systemctl start openclaw-health-alerts.service

Enable/disable the timer:

systemctl enable openclaw-health-alerts.timer

systemctl disable openclaw-health-alerts.timer

Troubleshooting

Timer not running:

systemctl start openclaw-health-alerts.timer

systemctl enable openclaw-health-alerts.timer

Discord alerts not sending:

  1. Check relay server is running on 10.10.20.36:8888
  2. Verify Windows firewall allows inbound TCP 8888
  3. Check Discord webhook URL is valid (not expired/deleted)

Service showing CRITICAL:

  1. SSH to the relevant host
  2. Check if the service container/process is running
  3. Verify network connectivity between Proxmox and service host
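
For the connectivity step, a quick TCP reachability test run from the Proxmox host can help separate network problems from application problems. The sketch below is one way to do that with the Python standard library. The tcp_reachable name and the 3-second timeout are assumptions, and a successful connection only shows that the port accepts connections, not that the service returns HTTP 200.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, tcp_reachable("10.10.20.29", 11434) tests the ollama port, and tcp_reachable("10.10.20.36", 8888) tests the relay port.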

The full documentation is available in the NetworkThinkTank-Labs GitHub repository in the README-OpenClaw-Health.md file.
