infrastructure-health-check

Works with docker-compose, Caddy, Pi-hole, and Cloudflare services.

Infrastructure Health Check

Comprehensive health verification for all network infrastructure services.

Quick Start

Run a full infrastructure health check:
```bash
cd /home/dawiddutoit/projects/network && ./scripts/health-check.sh
```
Or invoke this skill with: "Check infrastructure health" or "Is everything running?"

Table of Contents

  1. When to Use This Skill
  2. What This Skill Does
  3. Instructions
    • 3.1 Docker Container Status
    • 3.2 Caddy HTTPS Verification
    • 3.3 Pi-hole DNS Check
    • 3.4 Cloudflare Tunnel Status
    • 3.5 Webhook Endpoint Test
    • 3.6 SSL Certificate Validity
    • 3.7 Cloudflare Access Verification
    • 3.8 Generate Health Report
  4. Supporting Files
  5. Expected Outcomes
  6. Requirements
  7. Red Flags to Avoid

When to Use This Skill

Explicit Triggers:
  • "Check infrastructure health"
  • "Is everything running?"
  • "Check service status"
  • "Verify SSL certificates"
  • "Check tunnel connection"
  • "Diagnose network issues"
Implicit Triggers:
  • After restarting Docker services
  • After network configuration changes
  • Before deploying new services
  • When services seem unresponsive
Debugging Triggers:
  • "Why can't I access pihole.temet.ai?"
  • "Services are not responding"
  • "SSL certificate errors"
  • "Authentication not working"

What This Skill Does

Performs 8 health checks and generates a comprehensive status report:
  1. Docker Containers - Verifies all containers are running and healthy
  2. Caddy HTTPS - Tests reverse proxy is serving HTTPS correctly
  3. Pi-hole DNS - Confirms DNS resolution is working
  4. Cloudflare Tunnel - Checks tunnel connectivity to Cloudflare
  5. Webhook Endpoint - Tests GitHub webhook accessibility
  6. SSL Certificates - Validates certificate validity and expiration
  7. Cloudflare Access - Verifies authentication is configured
  8. Overall Status - Aggregates results into pass/fail summary

Instructions

3.1 Docker Container Status

Check all containers are running:
```bash
cd /home/dawiddutoit/projects/network && docker compose ps --format "table {{.Name}}\t{{.Status}}\t{{.Health}}"
```
Expected containers:

| Container | Status | Purpose |
|-----------|--------|---------|
| pihole | Up (healthy) | DNS + Ad blocking |
| caddy | Up | Reverse proxy |
| cloudflared | Up | Cloudflare Tunnel |
| webhook | Up | GitHub auto-deploy |

Check for issues:
```bash
docker compose ps --filter "status=exited"
docker compose ps --filter "health=unhealthy"
```
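
The status column above can be turned into a pass/fail line with a small helper. The following is a minimal sketch, not the contents of `scripts/health-check.sh`; the function name `check_container` and its output wording are assumptions made for illustration.

```bash
# Hypothetical helper: classify one container from its status text.
# Usage: check_container NAME "STATUS TEXT"
check_container() {
  local name="$1" status="$2"
  case "$status" in
    *unhealthy*) echo "[FAIL] $name: unhealthy" ;;            # health check is failing
    Up*)         echo "[PASS] $name: running" ;;              # "Up", "Up 2 hours (healthy)", ...
    *)           echo "[FAIL] $name: ${status:-not found}" ;; # exited, restarting, or missing
  esac
}
```

One way to use it is to loop over the four expected container names and pass each one's status text as reported by `docker compose ps`; an empty status is treated as a missing container.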

3.2 Caddy HTTPS Verification

Test that Caddy is serving HTTPS for each domain:
```bash
# Test Pi-hole
curl -sI https://pihole.temet.ai --max-time 5 | head -1

# Test Jaeger
curl -sI https://jaeger.temet.ai --max-time 5 | head -1

# Test Langfuse
curl -sI https://langfuse.temet.ai --max-time 5 | head -1
```

**Expected:** `HTTP/2 200` or `HTTP/2 302` (redirect to auth)

**Check Caddy logs for errors:**
```bash
docker logs caddy --tail 20 2>&1 | grep -iE "error|warn|fail"
```
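
To apply the 200/302 rule consistently across domains, the status line can be classified by a helper. This is a rough sketch under the assumption that only those two codes count as healthy; `classify_http_status` is a hypothetical name, not something the skill defines.

```bash
# Hypothetical helper: map the first line of `curl -sI` output to PASS/FAIL.
classify_http_status() {
  local line="$1" code
  # The numeric code is the second whitespace-separated field; curl's
  # header lines end in CR, so strip it first.
  code="$(printf '%s' "$line" | tr -d '\r' | awk '{print $2}')"
  case "$code" in
    200|302) echo "PASS" ;;
    "")      echo "FAIL (no response)" ;;
    *)       echo "FAIL (HTTP $code)" ;;
  esac
}
```

It could be called as `classify_http_status "$(curl -sI https://pihole.temet.ai --max-time 5 | head -1)"`. A timeout produces an empty line and is reported as no response.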

3.3 Pi-hole DNS Check

Verify DNS resolution is working:
```bash
# Check Pi-hole can resolve local domains
docker exec pihole dig +short @127.0.0.1 pihole.temet.ai

# Check from host
dig @localhost pihole.temet.ai +short

# Check external DNS
dig @1.1.1.1 pihole.temet.ai +short
```

**Expected:** Returns IP address (192.168.68.135 for local, Cloudflare IP for external)

**Check Pi-hole status:**
```bash
docker exec pihole pihole status
```
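
Since `dig +short` prints nothing when resolution fails, a check can simply test whether the answer looks like an IPv4 address. The sketch below assumes the first line of the answer is the one of interest; `is_ipv4` and `check_dns_answer` are illustrative names only.

```bash
# Hypothetical helper: succeed only if the argument is a dotted-quad IPv4 address.
is_ipv4() {
  local ip="$1" old_ifs octet
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;   # empty, stray characters, or empty octets
  esac
  old_ifs="$IFS"; IFS=.
  set -- $ip                               # split into octets on "."
  IFS="$old_ifs"
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
}

# Hypothetical helper: report PASS if the first line of a dig answer is an IPv4 address.
check_dns_answer() {
  local label="$1" answer="$2" first
  first="$(printf '%s\n' "$answer" | head -n 1)"
  if is_ipv4 "$first"; then
    echo "[PASS] $label: $first"
  else
    echo "[FAIL] $label: no A record returned"
  fi
}
```

For example, `check_dns_answer "Local DNS" "$(dig @localhost pihole.temet.ai +short)"` would report the local result. An answer that begins with a CNAME would be reported as a failure by this simplified version.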

3.4 Cloudflare Tunnel Status

Verify tunnel is connected:
```bash
# Check tunnel logs for connection status
docker logs cloudflared --tail 30 2>&1 | grep -iE "connected|registered|error|failed"

# Check tunnel process is running
docker exec cloudflared pgrep -f cloudflared
```

**Expected output contains:**
- `Registered tunnel connection` - Tunnel is connected
- `Connection ... registered` - Healthy connection

**Warning signs:**
- `connection failed` - Network issues
- `error` - Configuration problems
- No recent log entries - Process may be stuck
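
The expected messages and warning signs above can be checked mechanically. The following sketch reads log text on standard input and applies them in order of severity; the function name and result strings are assumptions, and it deliberately errs toward reporting a failure even if the tunnel re-registered later in the same log window.

```bash
# Hypothetical helper: classify cloudflared log text read from stdin.
classify_tunnel_logs() {
  local logs
  logs="$(cat)"
  if [ -z "$logs" ]; then
    echo "WARN: no recent log entries"           # process may be stuck
  elif printf '%s\n' "$logs" | grep -qiE "connection failed|error"; then
    echo "FAIL: errors in tunnel logs"           # network or configuration problem
  elif printf '%s\n' "$logs" | grep -qi "registered"; then
    echo "PASS: tunnel connected"
  else
    echo "WARN: no registration message found"
  fi
}
```

It could be fed with `docker logs cloudflared --tail 30 2>&1 | classify_tunnel_logs`.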

3.5 Webhook Endpoint Test

Verify webhook is accessible:
```bash
# Test webhook health endpoint locally

# Test via domain (if local)
curl -sI https://webhook.temet.ai/hooks/health --max-time 5 | head -1
```

**Expected:** `OK` response or `HTTP/2 200`

3.6 SSL Certificate Validity

Check certificate details for each domain:
```bash
for domain in pihole jaeger langfuse ha code; do
  echo "=== $domain.temet.ai ==="
  echo | openssl s_client -servername $domain.temet.ai \
    -connect $domain.temet.ai:443 2>/dev/null | \
    openssl x509 -noout -dates -issuer 2>/dev/null || echo "FAILED"
  echo
done
```
Expected output:
```
notBefore=<date>
notAfter=<date>
issuer=C = US, O = Let's Encrypt, CN = R11
```
Check certificate expiration:
```bash
# Check whether each certificate is still valid 30 days (2592000 seconds) from now
for domain in pihole jaeger langfuse; do
  echo -n "$domain.temet.ai: "
  echo | openssl s_client -servername $domain.temet.ai \
    -connect $domain.temet.ai:443 2>/dev/null | \
    openssl x509 -noout -checkend 2592000 && echo "OK (>30 days)" || echo "RENEW SOON"
done
```
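
The `-checkend` test only answers yes or no. To print a figure such as "expires in 67 days", the `notAfter` date can be converted to a day count. This is a sketch that assumes GNU `date` (the `-d` option is not available in BSD/macOS `date`); `days_until` is a hypothetical helper name.

```bash
# Hypothetical helper: whole days from now (or from an epoch given as $2)
# until the date string in $1, e.g. "Jun  1 12:00:00 2025 GMT".
days_until() {
  local not_after="$1" now_epoch="${2:-$(date +%s)}" expiry_epoch
  expiry_epoch="$(date -u -d "$not_after" +%s)" || return 1
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}
```

The date string can be taken from `openssl x509 -noout -enddate`, which prints `notAfter=<date>`, by cutting off everything up to the `=` (for instance with `cut -d= -f2`). A negative result means the certificate has already expired.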

3.7 Cloudflare Access Verification

Check Access is configured for protected services:
```bash
# Test that Access is intercepting (should redirect to login)
curl -sI https://pihole.temet.ai --max-time 5 | grep -E "^(HTTP|location|cf-)"
```

**Expected for protected services:**
- `HTTP/2 302` with redirect to cloudflareaccess.com login
- OR `HTTP/2 200` if already authenticated

**Check Access configuration via API:**
```bash
source /home/dawiddutoit/projects/network/.env
curl -s "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/access/apps" \
  -H "Authorization: Bearer ${CLOUDFLARE_ACCESS_API_TOKEN}" | \
  python3 -c "import sys,json; apps=json.load(sys.stdin).get('result',[]); print('\n'.join([f\"{a['name']}: {a['domain']}\" for a in apps]))"
```
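
The two expected outcomes for a protected service can be told apart from the response headers alone. The sketch below reads `curl -sI` output on standard input; the function name and the result strings are assumptions for illustration, and a 200 is reported neutrally because headers alone cannot show whether the request was authenticated or the service is public.

```bash
# Hypothetical helper: classify `curl -sI` headers read from stdin.
classify_access() {
  local headers status location
  headers="$(tr -d '\r')"                                   # strip CRs from header lines
  status="$(printf '%s\n' "$headers" | awk 'NR == 1 { print $2 }')"
  location="$(printf '%s\n' "$headers" | grep -i '^location:' | head -n 1 | tr 'A-Z' 'a-z')"
  case "$status" in
    302)
      case "$location" in
        *cloudflareaccess.com*) echo "protected" ;;
        *)                      echo "redirect, but not to Cloudflare Access" ;;
      esac ;;
    200) echo "reachable without redirect" ;;
    "")  echo "no response" ;;
    *)   echo "unexpected (HTTP $status)" ;;
  esac
}
```

Usage would look like `curl -sI https://pihole.temet.ai --max-time 5 | classify_access`.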

3.8 Generate Health Report

Aggregate all checks into a summary report:
```
========================================
  Infrastructure Health Report
  Generated: $(date)
========================================

DOCKER CONTAINERS
-----------------
[PASS] pihole: running (healthy)
[PASS] caddy: running
[PASS] cloudflared: running
[PASS] webhook: running

HTTPS ENDPOINTS
---------------
[PASS] pihole.temet.ai: HTTP/2 200
[PASS] jaeger.temet.ai: HTTP/2 200
[PASS] langfuse.temet.ai: HTTP/2 200

DNS RESOLUTION
--------------
[PASS] Local DNS: 192.168.68.135
[PASS] External DNS: resolving via Cloudflare

CLOUDFLARE TUNNEL
-----------------
[PASS] Tunnel: connected

WEBHOOK
-------
[PASS] Endpoint: responding

SSL CERTIFICATES
----------------
[PASS] pihole.temet.ai: valid, expires in 67 days
[PASS] jaeger.temet.ai: valid, expires in 67 days
[PASS] langfuse.temet.ai: valid, expires in 67 days

CLOUDFLARE ACCESS
-----------------
[PASS] pihole.temet.ai: protected
[PASS] jaeger.temet.ai: protected
[PASS] langfuse.temet.ai: protected
[PASS] webhook.temet.ai: bypass (public)

========================================
  Overall Status: ALL CHECKS PASSED
========================================
```
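
A report in this shape can be produced by recording each result as it is printed and summarising at the end. The following is a minimal sketch of that pattern; `record`, `summary`, the `FAILURES` counter, and the failure wording are all assumptions, since only the all-passed line appears in the sample report.

```bash
FAILURES=0

# Hypothetical helper: print one report line and count failures.
# Usage: record PASS|FAIL "message"
record() {
  local result="$1" message="$2"
  echo "[$result] $message"
  [ "$result" = "PASS" ] || FAILURES=$((FAILURES + 1))
}

# Hypothetical helper: print the overall status; returns non-zero if anything failed.
summary() {
  if [ "$FAILURES" -eq 0 ]; then
    echo "Overall Status: ALL CHECKS PASSED"
  else
    echo "Overall Status: $FAILURES CHECK(S) FAILED"
    return 1
  fi
}
```

Note that `record` has to run in the current shell for the counter to update, so its output should be redirected rather than captured with `$(...)`. The non-zero return from `summary` lets a calling script or cron job detect a failed run.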

Supporting Files

| File | Purpose |
|------|---------|
| `scripts/health-check.sh` | Automated health check script |
| `references/troubleshooting.md` | Common issues and solutions |
| `examples/examples.md` | Example health check outputs |

Expected Outcomes

Success (All Checks Pass):
  • All 4 containers running
  • HTTPS endpoints responding with 200/302
  • DNS resolving correctly
  • Tunnel connected to Cloudflare
  • Webhook accessible
  • Certificates valid with >30 days remaining
  • Access configured for protected services
Partial Failure:
  • One or more containers down -> Restart with `docker compose up -d`
  • Certificate expiring soon -> Will auto-renew, monitor
  • Access misconfigured -> Run `./scripts/cf-access-setup.sh setup`
Critical Failure:
  • Multiple containers down -> Check Docker daemon, disk space
  • Tunnel disconnected -> Check internet, tunnel token
  • DNS not resolving -> Check Pi-hole container, router DNS settings
  • All certificates invalid -> Check Cloudflare API token

Requirements

Environment:
  • Docker and Docker Compose running
  • Access to `/home/dawiddutoit/projects/network`
  • `.env` file with Cloudflare credentials
  • Network connectivity
Services:
  • pihole container
  • caddy container
  • cloudflared container
  • webhook container

Red Flags to Avoid

  • Do not ignore certificate expiration warnings
  • Do not skip DNS checks when troubleshooting access issues
  • Do not assume tunnel is connected without checking logs
  • Do not run health checks without network connectivity
  • Do not ignore container health status (unhealthy state)
  • Do not forget to check both local and external DNS resolution
  • Do not assume HTTP 302 is a failure (it's auth redirect)

Notes

  • Health checks should be run from the Pi (192.168.68.135) for accurate local results
  • Remote access testing requires being outside the home network
  • Certificate auto-renewal happens 30 days before expiration
  • Cloudflare Tunnel reconnects automatically after brief disconnections
  • Pi-hole DNS may cache results for up to 5 minutes
  • Run `./scripts/health-check.sh` for automated checking