Configures automated infrastructure monitoring with mobile alerts (ntfy.sh and Home Assistant) and implements auto-recovery for common failures. Use when setting up monitoring, configuring mobile notifications, enabling auto-recovery, or troubleshooting alert delivery. Triggers on "setup monitoring", "configure alerts", "mobile notifications", "enable auto-recovery", "monitoring not working", or "not getting alerts". Works with ntfy.sh push notifications, Docker container health checks, Bash monitoring scripts, and optional Home Assistant automation integration.
```bash
npx skill4agent add dawiddutoit/custom-claude infrastructure-monitoring-setup
```

```bash
# 1. Create a unique ntfy topic
TOPIC="infra-$(openssl rand -hex 8)"
echo "Your topic: $TOPIC"
# 2. Add the monitoring settings to .env
echo "ALERT_ENABLED=true" >> /home/dawiddutoit/projects/network/.env
echo "NTFY_SERVER=https://ntfy.sh" >> /home/dawiddutoit/projects/network/.env
echo "NTFY_TOPIC=$TOPIC" >> /home/dawiddutoit/projects/network/.env
echo "AUTO_RECOVER=true" >> /home/dawiddutoit/projects/network/.env
# 3. Install and start the systemd timer
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.* /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now infrastructure-monitor.timer
# 4. Test with a manual run
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
```

Topic format: `infra-YOUR-RANDOM-ID` on `https://ntfy.sh`. Anyone who knows a topic name on the public ntfy.sh server can subscribe to it, so keep the random suffix.

```bash
TOPIC="infra-$(openssl rand -hex 8)"
echo "Your unique topic: $TOPIC"
```

```bash
# Navigate to project directory
cd /home/dawiddutoit/projects/network
# Append monitoring variables ($TOPIC is expanded because EOF is unquoted)
cat >> .env << EOF
# Monitoring & Alerts
ALERT_ENABLED=true
NTFY_SERVER=https://ntfy.sh
NTFY_TOPIC=$TOPIC
AUTO_RECOVER=true
EOF
```

Verify:

```bash
grep -A4 "Monitoring & Alerts" /home/dawiddutoit/projects/network/.env
```

Expected output:

```text
# Monitoring & Alerts
ALERT_ENABLED=true
NTFY_SERVER=https://ntfy.sh
NTFY_TOPIC=infra-a3f7d92b4c8e1f56
AUTO_RECOVER=true
```

| Variable | Purpose | Default |
|---|---|---|
| `ALERT_ENABLED` | Enable mobile push notifications | |
| `NTFY_SERVER` | ntfy.sh server URL | |
| `NTFY_TOPIC` | Unique topic for your alerts | None (required) |
| `AUTO_RECOVER` | Enable automatic recovery | |
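To show how `ALERT_ENABLED`, `NTFY_SERVER` and `NTFY_TOPIC` work together, here is a minimal sketch of a publish helper. The function name `send_ntfy_alert`, the fallback values used when a variable is unset, and the `warning` tag are assumptions for illustration, not taken from the monitoring script; the `Title`, `Priority` and `Tags` headers are standard ntfy publish headers.

```bash
# Hypothetical helper: send one alert to ntfy using the .env variables above.
# Assumed: ALERT_ENABLED counts as off and NTFY_SERVER falls back to
# https://ntfy.sh when unset; the "warning" tag is an illustrative choice.
send_ntfy_alert() {
  local title="$1"
  local message="$2"
  local priority="${3:-default}"   # ntfy accepts min, low, default, high, max/urgent (or 1-5)

  # Treat anything other than the literal "true" as "alerts disabled".
  if [ "${ALERT_ENABLED:-false}" != "true" ]; then
    return 0
  fi

  # NTFY_TOPIC has no default, so refuse to publish without it.
  if [ -z "${NTFY_TOPIC:-}" ]; then
    echo "send_ntfy_alert: NTFY_TOPIC is not set" >&2
    return 1
  fi

  local server="${NTFY_SERVER:-https://ntfy.sh}"

  # Publish: the request body is the message; Title/Priority/Tags are headers.
  curl -fsS \
    -H "Title: ${title}" \
    -H "Priority: ${priority}" \
    -H "Tags: warning" \
    -d "${message}" \
    "${server%/}/${NTFY_TOPIC}"
}
```

For example, `send_ntfy_alert "cloudflared restarted" "Tunnel had no registrations; container restarted" high` would push a high-priority notification to the topic stored in `.env`.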
```bash
# Edit .env
nano /home/dawiddutoit/projects/network/.env
# To disable auto-recovery, set: AUTO_RECOVER=false
```

```bash
# Copy service files
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.service /etc/systemd/system/
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.timer /etc/systemd/system/
# Reload systemd
sudo systemctl daemon-reload
# Enable and start timer
sudo systemctl enable infrastructure-monitor.timer
sudo systemctl start infrastructure-monitor.timer
```

```bash
# Check timer schedule (last and next run)
systemctl list-timers infrastructure-monitor.timer
# Check timer status
sudo systemctl status infrastructure-monitor.timer
```

Expected output:

```text
● infrastructure-monitor.timer - Run infrastructure monitoring every 5 minutes
     Loaded: loaded (/etc/systemd/system/infrastructure-monitor.timer; enabled)
     Active: active (waiting) since...
```

```bash
# Run monitoring manually
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
```

```bash
# Test sending to the topic directly
curl -d "Test from infrastructure monitoring" "https://ntfy.sh/$TOPIC"
```

Home Assistant notify service (example name): `notify.mobile_app_your_phone`

```bash
# Edit .env
nano /home/dawiddutoit/projects/network/.env
# Add HA configuration
HA_NOTIFICATIONS_ENABLED=true
HA_BASE_URL=http://192.168.68.123:8123
HA_ACCESS_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
HA_NOTIFY_SERVICE=notify.mobile_app_your_phone
```

```bash
# Run monitoring (should send to both ntfy and HA)
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
```

```bash
# Test HA API access
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://192.168.68.123:8123/api/
# Test notification service
curl -X POST \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "Test from infrastructure monitoring"}' \
  http://192.168.68.123:8123/api/services/notify/mobile_app_your_phone
```

```bash
# View live monitoring logs
sudo journalctl -u infrastructure-monitor.service -f
# Or check persistent log
tail -f /var/log/infrastructure-monitor.log
```

| Issue | Detection | Recovery Action |
|---|---|---|
| Stuck cloudflared | No registrations in 10 min | Restart cloudflared container |
| Docker network isolation | Ping fails between containers | Recreate bridge network |
| Inactive Ethernet | WiFi used instead of eth0 | Activate Ethernet connection |
| Service failures | HTTP health checks fail | Restart affected containers |
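The "Stuck cloudflared" row can be pictured as a small check-then-recover step. This is a hypothetical sketch, not the logic of `infrastructure-monitor.sh`: it assumes the container is named `cloudflared`, treats a `Registered tunnel` log line within the last 10 minutes as healthy, and only restarts when `AUTO_RECOVER` is `true`. The helper name `check_cloudflared` is invented for illustration.

```bash
# Hypothetical sketch of the "Stuck cloudflared" check: look for a tunnel
# registration in the last 10 minutes of container logs and restart the
# container only when AUTO_RECOVER is "true".
# Assumed: container name "cloudflared"; helper name check_cloudflared.
check_cloudflared() {
  local container="${1:-cloudflared}"

  # Merge stderr into stdout in case the container logs to stderr.
  if docker logs --since 10m "$container" 2>&1 | grep -q "Registered tunnel"; then
    echo "ok: $container registered a tunnel connection in the last 10 min"
    return 0
  fi

  echo "warn: no tunnel registrations from $container in the last 10 min"
  if [ "${AUTO_RECOVER:-false}" = "true" ]; then
    # Restarting the container is the recovery action listed in the table.
    docker restart "$container" >/dev/null && echo "recovered: restarted $container"
  fi
  return 1
}
```

cloudflared typically logs registrations when connections are (re)established, so a log-only check like this can flag a healthy, long-running tunnel. Treat it as an illustration of the table row rather than a drop-in check.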
```bash
# Simulate a stuck tunnel
docker stop cloudflared
# Wait 5 minutes (next monitoring run)
# Check logs - should show tunnel restarted
# Verify tunnel recovered
docker ps | grep cloudflared
# docker logs replays the container's stderr on stderr, so merge it before grep
docker logs cloudflared 2>&1 | grep "Registered tunnel"
```

```bash
# Live monitoring logs
sudo journalctl -u infrastructure-monitor.service -f
# Last 50 lines
sudo journalctl -u infrastructure-monitor.service -n 50
# Logs from today
sudo journalctl -u infrastructure-monitor.service --since today
# Logs with ISO timestamps
sudo journalctl -u infrastructure-monitor.service -o short-iso
```

```bash
# Live tail
tail -f /var/log/infrastructure-monitor.log
# Last 100 lines
tail -n 100 /var/log/infrastructure-monitor.log
# Search for errors
grep -i error /var/log/infrastructure-monitor.log
# Search for recoveries
grep -i "recovered" /var/log/infrastructure-monitor.log
```

```bash
# Show next run time
systemctl list-timers infrastructure-monitor.timer
# Show timer configuration
systemctl cat infrastructure-monitor.timer
```

```bash
# Stop monitoring temporarily (resumes at next boot while the timer is enabled)
sudo systemctl stop infrastructure-monitor.timer
# Restart monitoring
sudo systemctl start infrastructure-monitor.timer
# Disable monitoring now and across reboots
sudo systemctl disable --now infrastructure-monitor.timer
# Re-enable and start monitoring
sudo systemctl enable --now infrastructure-monitor.timer
```

| File | Purpose |
|---|---|
| Monitoring architecture, recovery strategies, ntfy.sh details |
| Example configurations, alert formats, log outputs |
| Test script for alert delivery |
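The alert-delivery test script listed above is not reproduced here. As a hypothetical stand-in (not that script), this sketch posts one message to the same Home Assistant endpoint the manual `curl` test uses, deriving the URL path from `HA_NOTIFY_SERVICE`: `.env` stores the dotted form `notify.mobile_app_your_phone`, while the REST API expects `/api/services/<domain>/<service>`. The function name `send_ha_test_alert` and the minimal JSON escaping are assumptions.

```bash
# Hypothetical helper: send a test notification through Home Assistant's
# REST API using the HA_* variables from .env. HA_NOTIFY_SERVICE is stored as
# "<domain>.<service>", while the endpoint is /api/services/<domain>/<service>.
send_ha_test_alert() {
  local message="${1:-Test from infrastructure monitoring}"

  if [ "${HA_NOTIFICATIONS_ENABLED:-false}" != "true" ]; then
    echo "send_ha_test_alert: HA_NOTIFICATIONS_ENABLED is not true" >&2
    return 0
  fi
  if [ -z "${HA_BASE_URL:-}" ] || [ -z "${HA_ACCESS_TOKEN:-}" ] || [ -z "${HA_NOTIFY_SERVICE:-}" ]; then
    echo "send_ha_test_alert: HA_BASE_URL, HA_ACCESS_TOKEN and HA_NOTIFY_SERVICE are required" >&2
    return 1
  fi

  # notify.mobile_app_your_phone -> domain "notify", service "mobile_app_your_phone"
  local domain="${HA_NOTIFY_SERVICE%%.*}"
  local service="${HA_NOTIFY_SERVICE#*.}"

  # Minimal JSON escaping (backslashes and double quotes only).
  local escaped
  escaped=$(printf '%s' "$message" | sed 's/\\/\\\\/g; s/"/\\"/g')

  curl -fsS -X POST \
    -H "Authorization: Bearer ${HA_ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"message\": \"${escaped}\"}" \
    "${HA_BASE_URL%/}/api/services/${domain}/${service}"
}
```

After sourcing `.env`, running `send_ha_test_alert` with no arguments sends the same test message as the manual `curl` command, so a missing phone notification points at the token or service name rather than the monitoring script.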