infrastructure-monitoring-setup

Works with infrastructure-monitor.sh script, systemd timer, ntfy.sh push notifications,

Infrastructure Monitoring Setup Skill

Complete setup and configuration of automated infrastructure monitoring with mobile push notifications and auto-recovery capabilities.

Quick Start

Quick setup for monitoring (5 minutes):

```bash
# 1. Create unique ntfy topic
TOPIC="infra-$(openssl rand -hex 8)"
echo "Your topic: $TOPIC"

# 2. Add to .env
echo "ALERT_ENABLED=true" >> /home/dawiddutoit/projects/network/.env
echo "NTFY_SERVER=https://ntfy.sh" >> /home/dawiddutoit/projects/network/.env
echo "NTFY_TOPIC=$TOPIC" >> /home/dawiddutoit/projects/network/.env
echo "AUTO_RECOVER=true" >> /home/dawiddutoit/projects/network/.env

# 3. Install systemd service
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.* /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now infrastructure-monitor.timer

# 4. Test
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
```

Then install the ntfy app on your phone and subscribe to your topic.

Table of Contents

  1. When to Use This Skill
  2. What This Skill Does
  3. Instructions
    • 3.1 Install ntfy Mobile App
    • 3.2 Configure Monitoring in .env
    • 3.3 Install Systemd Timer
    • 3.4 Test Monitoring and Alerts
    • 3.5 Configure Home Assistant Integration (Optional)
    • 3.6 Verify Auto-Recovery
    • 3.7 View Monitoring Logs
  4. Supporting Files
  5. Expected Outcomes
  6. Requirements
  7. Red Flags to Avoid

When to Use This Skill

Explicit Triggers:
  • "Setup monitoring"
  • "Configure mobile alerts"
  • "Enable auto-recovery"
  • "Setup ntfy notifications"
  • "Configure Home Assistant alerts"
Implicit Triggers:
  • Want to be notified of infrastructure failures
  • Need automated recovery for common issues
  • Infrastructure has been down without detection
  • Want proactive monitoring
Debugging Triggers:
  • "Why am I not getting alerts?"
  • "Is monitoring working?"
  • "How to test notifications?"

What This Skill Does

  1. Mobile Alerts - Configures ntfy.sh push notifications to phone
  2. Auto-Recovery - Enables automatic fixes for common failures
  3. HA Integration - Optional Home Assistant notification integration
  4. Systemd Service - Installs timer to run monitoring every 5 minutes
  5. Tests Setup - Verifies notifications and recovery work
  6. Logs Access - Shows how to view monitoring logs
  7. Troubleshooting - Diagnoses alert delivery issues

Instructions

3.1 Install ntfy Mobile App

Install app: search for "ntfy" in Google Play, F-Droid, or the App Store.

Subscribe to topic:
  1. Open the ntfy app
  2. Tap "+" to add a subscription
  3. Enter topic: infra-YOUR-RANDOM-ID (you'll generate this in step 3.2)
  4. Server: https://ntfy.sh
  5. Tap "Subscribe"
Note: You need the topic ID from step 3.2 before subscribing. Come back here after generating it.

3.2 Configure Monitoring in .env

Generate unique topic ID:

```bash
TOPIC="infra-$(openssl rand -hex 8)"
echo "Your unique topic: $TOPIC"
```

Save this topic ID - you'll use it in the ntfy app.
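
If `openssl` is not installed, a topic of the same shape can be generated from `/dev/urandom` with standard tools. This is a sketch of an alternative, not part of the project's scripts: `generate_topic` and `is_valid_topic` are hypothetical helper names, and the `infra-` prefix plus 16 lowercase hex characters simply mirrors the `openssl rand -hex 8` command above.

```bash
# Hypothetical helpers: create and sanity-check an ntfy topic name
# in the same "infra-<16 hex chars>" shape as `openssl rand -hex 8`.
generate_topic() {
  # Read 8 random bytes, print them as hex, strip od's spaces and newline.
  hex=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  printf 'infra-%s\n' "$hex"
}

is_valid_topic() {
  # Succeeds only for "infra-" followed by exactly 16 lowercase hex digits.
  printf '%s\n' "$1" | grep -Eq '^infra-[0-9a-f]{16}$'
}

TOPIC=$(generate_topic)
if is_valid_topic "$TOPIC"; then
  echo "Your unique topic: $TOPIC"
else
  echo "Unexpected topic format: $TOPIC" >&2
fi
```

The validator is also handy for catching a topic that was pasted into `.env` with a typo or trailing characters.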
Add monitoring configuration to .env:

```bash
# Navigate to project directory
cd /home/dawiddutoit/projects/network

# Add monitoring variables ($TOPIC must still be set from the previous step)
cat >> .env << EOF

# Monitoring & Alerts
ALERT_ENABLED=true
NTFY_SERVER=https://ntfy.sh
NTFY_TOPIC=$TOPIC
AUTO_RECOVER=true
EOF
```

**Verify configuration:**

```bash
grep -A4 "Monitoring & Alerts" /home/dawiddutoit/projects/network/.env
```

Expected (your topic ID will differ):

```
# Monitoring & Alerts
ALERT_ENABLED=true
NTFY_SERVER=https://ntfy.sh
NTFY_TOPIC=infra-a3f7d92b4c8e1f56
AUTO_RECOVER=true
```
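
Both the quick-start `echo >>` lines and the heredoc append to `.env`, so running them twice leaves duplicate keys. A small helper can instead update a key in place, or append it when missing. This is a sketch with the hypothetical name `set_env_var`; it assumes plain `KEY=value` lines (no `export`, no quoting), and because `awk -v` interprets backslash escapes, values containing backslashes would need extra escaping.

```bash
# Hypothetical helper: set KEY=VALUE in an env file without duplicating keys.
# Example (against your real .env):
#   set_env_var NTFY_TOPIC "$TOPIC" /home/dawiddutoit/projects/network/.env
set_env_var() {
  key=$1 value=$2 file=$3
  touch "$file" || return 1
  tmp=$(mktemp) || return 1
  # Replace the first KEY= line, drop later duplicates of KEY,
  # and append KEY=VALUE if the key was not present at all.
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { if (!done) { print k "=" v; done = 1 }; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the file keeps its original owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Running the same `set_env_var` calls repeatedly always leaves exactly one line per key.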

**Configuration options:**

| Variable | Purpose | Default |
|----------|---------|---------|
| `ALERT_ENABLED` | Enable mobile push notifications | `false` |
| `NTFY_SERVER` | ntfy.sh server URL | `https://ntfy.sh` |
| `NTFY_TOPIC` | Unique topic for your alerts | None (required) |
| `AUTO_RECOVER` | Enable automatic recovery | `true` |

**To disable auto-recovery but keep alerts:**

```bash
# Edit .env
nano /home/dawiddutoit/projects/network/.env

# Change: AUTO_RECOVER=false
```
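
The Default column in the configuration table can be read as "the value used when the variable is unset". Assuming the monitor script loads `.env` into its environment and applies defaults this way (its actual code is not shown in this guide), the logic is roughly the following; `load_alert_config` is a hypothetical name.

```bash
# Hypothetical sketch: apply the documented defaults to the monitoring variables.
load_alert_config() {
  : "${ALERT_ENABLED:=false}"          # alerts stay off unless explicitly enabled
  : "${NTFY_SERVER:=https://ntfy.sh}"  # public ntfy.sh server
  : "${AUTO_RECOVER:=true}"            # recovery is on by default
  # NTFY_TOPIC has no default, so refuse to enable alerts without it.
  if [ "$ALERT_ENABLED" = "true" ] && [ -z "${NTFY_TOPIC:-}" ]; then
    echo "NTFY_TOPIC is required when ALERT_ENABLED=true" >&2
    return 1
  fi
  return 0
}
```

This makes the practical consequence of the table explicit: leaving `AUTO_RECOVER` out of `.env` keeps recovery enabled, while leaving `ALERT_ENABLED` out silently disables alerts.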

3.3 Install Systemd Timer

Install systemd service and timer to run monitoring every 5 minutes:

```bash
# Copy service files
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.service /etc/systemd/system/
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.timer /etc/systemd/system/

# Reload systemd
sudo systemctl daemon-reload

# Enable and start timer
sudo systemctl enable infrastructure-monitor.timer
sudo systemctl start infrastructure-monitor.timer
```

**Verify timer is active:**

```bash
# Show the timer's last and next run times
systemctl list-timers infrastructure-monitor.timer

# Check timer unit status
sudo systemctl status infrastructure-monitor.timer
```

Expected:

```
● infrastructure-monitor.timer - Run infrastructure monitoring every 5 minutes
     Loaded: loaded (/etc/systemd/system/infrastructure-monitor.timer; enabled)
     Active: active (waiting) since...
```

**Timer configuration:**
- Runs every 5 minutes
- Starts 1 minute after boot
- Persistent (survives reboots)
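
The repository's timer file is not reproduced in this guide. Assuming it implements the three bullets above with standard systemd timer options, it might look like the sketch below, which is written to a scratch directory so nothing under `/etc/systemd/system/` changes. The option values are assumptions to compare against the real file. One caveat: systemd's `Persistent=true` only applies to `OnCalendar=` timers, so with relative triggers like these, running again after a reboot comes from the timer being enabled (`WantedBy=timers.target`) together with `OnBootSec=`.

```bash
# Hypothetical sketch of infrastructure-monitor.timer, written to a temp dir.
UNIT_DIR=$(mktemp -d)
cat > "$UNIT_DIR/infrastructure-monitor.timer" <<'EOF'
[Unit]
Description=Run infrastructure monitoring every 5 minutes

[Timer]
# First run 1 minute after boot
OnBootSec=1min
# Then 5 minutes after each activation of the service
OnUnitActiveSec=5min
Unit=infrastructure-monitor.service

[Install]
# Enabling the timer hooks it into timers.target, so it starts again after a reboot
WantedBy=timers.target
EOF
echo "Sketch written to $UNIT_DIR/infrastructure-monitor.timer"
```

To compare with the project's version: `diff "$UNIT_DIR/infrastructure-monitor.timer" /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.timer`.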

3.4 Test Monitoring and Alerts

Test monitoring script:

```bash
# Run monitoring manually
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
```

Expected output shows:
- Docker containers checked
- Tunnel connectivity tested
- Service health verified
- Network interface status
- Alert sent to ntfy topic

**Test alert delivery:**

Within 30 seconds, you should receive a push notification on your phone with the infrastructure status.

**If no notification received:**

Check the ntfy topic subscription:

```bash
# Test sending to topic directly ($TOPIC must hold your topic ID)
curl -d "Test from infrastructure monitoring" "https://ntfy.sh/$TOPIC"
```

If direct curl works but monitoring doesn't:
- Check ALERT_ENABLED=true in .env
- Verify NTFY_TOPIC matches the app subscription
- Check the script has network access
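
The first two checklist items can be automated with a read-only check of `.env`. This sketch uses the hypothetical names `env_value` and `check_alert_config`, assumes plain `KEY=value` lines, and takes the topic you subscribed to in the app as the second argument. It never sends anything, so the network-access item still needs the direct `curl` test.

```bash
# Hypothetical helpers: check .env against the "no notification" checklist.
# Usage: check_alert_config /home/dawiddutoit/projects/network/.env infra-YOUR-RANDOM-ID
env_value() {
  # Print the value of the last "KEY=" line (empty if absent),
  # since a later duplicate wins when the file is sourced by a shell.
  grep "^$1=" "$2" | tail -n 1 | cut -d= -f2-
}

check_alert_config() {
  file=$1 expected_topic=$2 rc=0
  if [ "$(env_value ALERT_ENABLED "$file")" != "true" ]; then
    echo "FAIL: ALERT_ENABLED is not true"; rc=1
  fi
  topic=$(env_value NTFY_TOPIC "$file")
  if [ -z "$topic" ]; then
    echo "FAIL: NTFY_TOPIC is missing"; rc=1
  elif [ "$topic" != "$expected_topic" ]; then
    echo "FAIL: NTFY_TOPIC ($topic) does not match the app subscription ($expected_topic)"; rc=1
  fi
  if [ "$rc" -eq 0 ]; then
    echo "OK: ALERT_ENABLED and NTFY_TOPIC look correct"
  fi
  return "$rc"
}
```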

3.5 Configure Home Assistant Integration (Optional)

Why use Home Assistant integration:
  • Centralized home automation alerts
  • Can trigger automations based on infrastructure status
  • Redundancy with ntfy.sh
  • Integration with existing HA notifications
Prerequisites:
  • Home Assistant running and accessible
  • HA mobile app installed (for notify.mobile_app_* service)
Step 1: Create Long-Lived Access Token
  1. Go to Home Assistant: http://192.168.68.123:8123
  2. Click your profile (bottom left)
  3. Scroll to "Long-Lived Access Tokens"
  4. Click "Create Token"
  5. Name: "Infrastructure Monitoring"
  6. Copy token (shown only once)
Step 2: Find Notification Service Name
  1. In Home Assistant: Developer Tools → Services
  2. Filter by "notify"
  3. Find your mobile app service:
    notify.mobile_app_your_phone
Step 3: Add to .env

```bash
# Edit .env
nano /home/dawiddutoit/projects/network/.env

# Add these HA configuration lines to .env
HA_NOTIFICATIONS_ENABLED=true
HA_BASE_URL=http://192.168.68.123:8123
HA_ACCESS_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
HA_NOTIFY_SERVICE=notify.mobile_app_your_phone
```

**Step 4: Test HA Notifications**

```bash
# Run monitoring (should send to both ntfy and HA)
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
```

Check that you receive a notification in the Home Assistant companion app.

**Troubleshooting HA notifications:**

```bash
# Test HA API access (a valid token returns {"message": "API running."})
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://192.168.68.123:8123/api/

# Test notification service
curl -X POST \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "Test from infrastructure monitoring"}' \
  http://192.168.68.123:8123/api/services/notify/mobile_app_your_phone
```
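
Home Assistant's REST API calls a service with `POST /api/services/<domain>/<service>`, which is why `HA_NOTIFY_SERVICE=notify.mobile_app_your_phone` turns into `/api/services/notify/mobile_app_your_phone` in the test above. Deriving the URL from the `.env` values keeps the manual test in sync with the configuration. A sketch with the hypothetical helpers `ha_service_url` and `send_ha_test`:

```bash
# Hypothetical helpers: build the HA service URL from .env values and send a test.
ha_service_url() {
  # $1 = HA_BASE_URL, $2 = service in "domain.service" form
  domain=${2%%.*}   # text before the first dot, e.g. "notify"
  service=${2#*.}   # text after the first dot, e.g. "mobile_app_your_phone"
  printf '%s/api/services/%s/%s\n' "${1%/}" "$domain" "$service"
}

send_ha_test() {
  # Expects HA_BASE_URL, HA_ACCESS_TOKEN and HA_NOTIFY_SERVICE in the environment.
  curl -fsS -X POST \
    -H "Authorization: Bearer $HA_ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"message": "Test from infrastructure monitoring"}' \
    "$(ha_service_url "$HA_BASE_URL" "$HA_NOTIFY_SERVICE")"
}
```

With `-f`, `curl` exits non-zero on an HTTP error such as 401 (bad token) instead of printing the error body as if it succeeded.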

3.6 Verify Auto-Recovery

Monitor logs to see auto-recovery in action:

```bash
# View live monitoring logs
sudo journalctl -u infrastructure-monitor.service -f

# Or check persistent log
tail -f /var/log/infrastructure-monitor.log
```

**Auto-recovery capabilities:**

| Issue | Detection | Recovery Action |
|-------|-----------|----------------|
| Stuck cloudflared | No registrations in 10 min | Restart cloudflared container |
| Docker network isolation | Ping fails between containers | Recreate bridge network |
| Inactive Ethernet | WiFi used instead of eth0 | Activate Ethernet connection |
| Service failures | HTTP health checks fail | Restart affected containers |

**Test auto-recovery:**

```bash
# Simulate stuck tunnel
docker stop cloudflared

# Wait for the next monitoring run (up to 5 minutes)
sleep 300

# Check logs - should show tunnel restarted
sudo journalctl -u infrastructure-monitor.service -n 50

# Verify tunnel recovered (2>&1 also captures the container's stderr output)
docker ps | grep cloudflared
docker logs cloudflared 2>&1 | grep "Registered tunnel"
```
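
The first row of the recovery table (restart cloudflared when no tunnel registrations appear within 10 minutes) can be sketched as below. This illustrates that rule only and is not the code in `infrastructure-monitor.sh`; `tunnel_looks_stuck` and `recover_cloudflared` are hypothetical names, and the check keys on the same `Registered tunnel` text used in the verification step. Treat it as a starting point: depending on how the real script scopes its log window, a long-running healthy tunnel may not log a new registration every 10 minutes.

```bash
# Hypothetical sketch of the "stuck cloudflared" rule from the recovery table.
tunnel_looks_stuck() {
  # Reads log text on stdin; succeeds when no "Registered tunnel" line is present.
  ! grep -q "Registered tunnel"
}

recover_cloudflared() {
  # Only the last 10 minutes of container logs; 2>&1 includes the container's stderr.
  if docker logs --since 10m cloudflared 2>&1 | tunnel_looks_stuck; then
    echo "No tunnel registrations in 10 minutes; restarting cloudflared"
    docker restart cloudflared
  fi
}
```

Keeping the log test as a stdin filter means it can be checked against saved log excerpts without touching Docker.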

3.7 View Monitoring Logs

View systemd service logs:

```bash
# Live monitoring logs
sudo journalctl -u infrastructure-monitor.service -f

# Last 50 lines
sudo journalctl -u infrastructure-monitor.service -n 50

# Logs from today
sudo journalctl -u infrastructure-monitor.service --since today

# Logs with timestamps
sudo journalctl -u infrastructure-monitor.service -o short-iso
```

**View persistent log file:**

```bash
# Live tail
tail -f /var/log/infrastructure-monitor.log

# Last 100 lines
tail -n 100 /var/log/infrastructure-monitor.log

# Search for errors
grep -i error /var/log/infrastructure-monitor.log

# Search for recoveries
grep -i "recovered" /var/log/infrastructure-monitor.log
```
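
For a quick count instead of scrolling through matches, the two `grep` searches above can be combined into one summary. A minimal sketch with the hypothetical name `summarize_monitor_log`; it counts lines containing "error" and "recovered" (case-insensitive, like the searches above) and defaults to `/var/log/infrastructure-monitor.log`.

```bash
# Hypothetical helper: count error and recovery lines in the monitoring log.
summarize_monitor_log() {
  log=${1:-/var/log/infrastructure-monitor.log}
  if [ ! -r "$log" ]; then
    echo "Cannot read $log" >&2
    return 1
  fi
  # grep -c prints 0 but exits non-zero when nothing matches, so ignore its status.
  errors=$(grep -ci error "$log" || true)
  recoveries=$(grep -ci recovered "$log" || true)
  echo "errors=$errors recoveries=$recoveries"
}
```

A rising error count with no matching recoveries is a sign of the persistent alerts that need a root-cause investigation.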

**Check timer schedule:**

```bash
# Show next run time
systemctl list-timers infrastructure-monitor.timer

# Show timer configuration
systemctl cat infrastructure-monitor.timer
```

**Monitoring controls:**

```bash
# Stop monitoring temporarily
sudo systemctl stop infrastructure-monitor.timer

# Resume monitoring
sudo systemctl start infrastructure-monitor.timer

# Disable monitoring (stays disabled after reboot; add --now to also stop it immediately)
sudo systemctl disable infrastructure-monitor.timer

# Re-enable monitoring (add --now to also start it immediately)
sudo systemctl enable infrastructure-monitor.timer
```

Supporting Files

| File | Purpose |
|------|---------|
| `references/reference.md` | Monitoring architecture, recovery strategies, ntfy.sh details |
| `examples/examples.md` | Example configurations, alert formats, log outputs |
| `scripts/test-notifications.sh` | Test script for alert delivery |

Expected Outcomes

Success:
  • ntfy app receives push notifications
  • Monitoring runs every 5 minutes
  • Auto-recovery fixes common failures within 5 minutes
  • Logs show monitoring activity
  • Home Assistant notifications working (if configured)
Partial Success:
  • Monitoring runs but alerts not received (check topic subscription)
  • Alerts received but auto-recovery disabled (set AUTO_RECOVER=true)
Failure Indicators:
  • No notifications received after 10 minutes
  • Timer not running (check systemctl status)
  • Script fails with errors (check logs)
  • HA notifications not working (check token/service name)

Requirements

  • Infrastructure server running Linux with systemd
  • Mobile device with ntfy app installed
  • Internet connectivity for ntfy.sh
  • .env file with monitoring configuration
  • Home Assistant (optional, for HA integration)

Red Flags to Avoid

  • Do not use a public/guessable ntfy topic (security risk)
  • Do not share your ntfy topic publicly (anyone who knows it can subscribe)
  • Do not disable monitoring without alternative alerting
  • Do not ignore persistent alerts (investigate the root cause)
  • Do not run the monitoring script too frequently (causes alert noise)
  • Do not commit .env to git (it contains your ntfy topic and HA access token)
  • Do not use AUTO_RECOVER=false without manual monitoring
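
One way to act on the `.env` red flag is to make sure the file is ignored before the first commit. A sketch with the hypothetical helper `ensure_gitignored`, which appends an exact `.env` line to the project's `.gitignore` only if it is not already there:

```bash
# Hypothetical helper: make sure a project's .gitignore lists .env.
# Usage: ensure_gitignored /home/dawiddutoit/projects/network
ensure_gitignored() {
  gitignore="${1:-.}/.gitignore"
  touch "$gitignore" || return 1
  # -x: whole-line match, -F: literal string, so ".env" is not treated as a regex
  if ! grep -qxF '.env' "$gitignore"; then
    # Start a new line first if the file does not end with a newline
    if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
      printf '\n' >> "$gitignore"
    fi
    printf '.env\n' >> "$gitignore"
    echo "Added .env to $gitignore"
  fi
}
```

An ignore rule does not untrack a file that was already committed: in that case `git rm --cached .env` stops tracking it while keeping your local copy, and `git check-ignore -v .env` shows which rule now ignores it.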

Notes

  • Monitoring checks run every 5 minutes via the systemd timer
  • ntfy.sh is free and doesn't require an account
  • Topic ID should be random and private (security by obscurity)
  • Auto-recovery attempts fixes before alerting as critical
  • Alert levels: 🔴 Critical (manual intervention needed), ⚠️ Warning (recovery in progress)
  • HA integration is optional and works alongside ntfy.sh
  • Logs persist across reboots at /var/log/infrastructure-monitor.log
  • Maximum detection time: 5 minutes (timer interval)
  • Monitoring survives server reboots (systemd timer enabled)
  • Use the infrastructure-health-check skill for manual on-demand checks