check-production

/check-production

Audit production health. Output findings as structured report.

What This Does

  1. Query Sentry for unresolved issues
  2. Check Vercel logs for recent errors
  3. Test health endpoints
  4. Check GitHub Actions for CI/CD failures
  5. Output prioritized findings (P0-P3)

This is a primitive. It only investigates and reports. Use `/log-production-issues` to create GitHub issues or `/triage` to fix.

Process

1. Sentry Check

```bash
# Run triage script if available
~/.claude/skills/triage/scripts/check_sentry.sh 2>/dev/null || echo "Sentry check unavailable"
```

Or spawn a Sentry MCP query if configured.

2. Vercel Logs Check

```bash
# Check for recent errors
~/.claude/skills/triage/scripts/check_vercel_logs.sh 2>/dev/null || vercel logs --output json 2>/dev/null | head -50
```

3. Health Endpoints

```bash
# Test health endpoint
~/.claude/skills/triage/scripts/check_health_endpoints.sh 2>/dev/null || curl -sf "$(grep NEXT_PUBLIC_APP_URL .env.local 2>/dev/null | cut -d= -f2)/api/health" | jq .
```
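
Once the endpoint has been probed, the result still has to be turned into a finding. The sketch below shows one way to do that. The `ProbeResult` shape and the `describeProbe` function are hypothetical names introduced here for illustration; the 500 ms default is taken from the "should be <500ms" line in the sample report and is an assumption, not a documented threshold.

```typescript
// Hypothetical record of one probe; `httpStatus` is null when the request failed outright.
interface ProbeResult {
  url: string;
  httpStatus: number | null;
  latencyMs: number;
}

// Returns a finding line, or null when the endpoint looks healthy.
// The 500 ms default mirrors the sample report and is an assumption; adjust as needed.
function describeProbe(probe: ProbeResult, budgetMs: number = 500): string | null {
  if (probe.httpStatus === null) {
    return `Health endpoint unreachable: ${probe.url}`;
  }
  if (probe.httpStatus < 200 || probe.httpStatus >= 300) {
    return `Health endpoint failing: ${probe.url} returned HTTP ${probe.httpStatus}`;
  }
  if (probe.latencyMs >= budgetMs) {
    const seconds = (probe.latencyMs / 1000).toFixed(1);
    return `Health endpoint slow: ${probe.url} responding in ${seconds}s (should be <${budgetMs}ms)`;
  }
  return null;
}
```

A `null` return means there is nothing to report for that endpoint; any string can be filed under P1 alongside other degraded-performance findings.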

4. GitHub CI/CD Check

```bash
# Check for failed workflow runs on default branch
gh run list --branch main --status failure --limit 5 2>/dev/null ||
gh run list --branch master --status failure --limit 5 2>/dev/null

# Get details on most recent failure
gh run list --status failure --limit 1 --json databaseId,name,conclusion,createdAt,headBranch 2>/dev/null

# Check for stale/stuck workflows
gh run list --status in_progress --json databaseId,name,createdAt 2>/dev/null
```

**What to look for:**
- Failed runs on main/master branch (broken CI)
- Failed runs on feature branches blocking PRs
- Stuck/in-progress runs that should have completed
- Patterns in failure types (tests, lint, build, deploy)
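
The JSON from the `gh run list --json ...` commands above can be post-processed to pick out stuck runs and to separate default-branch failures from feature-branch ones. The following is a minimal sketch: the field names match the `--json` fields requested above, but `findStuckRuns`, `splitFailuresByBranch`, and the 60-minute threshold are assumptions made for illustration.

```typescript
// One element of the array printed by `gh run list --json databaseId,name,createdAt,headBranch`.
interface WorkflowRun {
  databaseId: number;
  name: string;
  createdAt: string; // ISO 8601 timestamp
  headBranch?: string;
}

// Returns in-progress runs that started more than `maxAgeMinutes` ago.
// The 60-minute default is an assumed threshold, not one defined by the skill.
function findStuckRuns(runs: WorkflowRun[], now: Date, maxAgeMinutes: number = 60): WorkflowRun[] {
  const cutoff = now.getTime() - maxAgeMinutes * 60 * 1000;
  return runs.filter((run) => Date.parse(run.createdAt) < cutoff);
}

// Splits failed runs into default-branch failures (broken CI) and
// feature-branch failures (blocked PRs).
function splitFailuresByBranch(
  runs: WorkflowRun[],
  defaultBranches: string[] = ["main", "master"]
): { defaultBranch: WorkflowRun[]; featureBranches: WorkflowRun[] } {
  const isDefault = (run: WorkflowRun) => defaultBranches.indexOf(run.headBranch || "") !== -1;
  return {
    defaultBranch: runs.filter(isDefault),
    featureBranches: runs.filter((run) => !isDefault(run)),
  };
}
```

Passing `now` explicitly keeps the functions deterministic, which makes them easy to check against a saved JSON sample.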

5. Quick Application Checks

```bash
# Check for error handling gaps.
# Matches single-line empty catch blocks such as `catch (e) {}` or `catch {}`.
grep -rE "catch\s*(\([^)]*\))?\s*\{\s*\}" --include="*.ts" --include="*.tsx" src/ app/ 2>/dev/null | head -5
# Empty catch blocks = silent failures
```
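
Because `grep` works line by line, it misses an empty catch block whose braces sit on separate lines. A whole-file scan can cover that case. The sketch below is one possible approach, not part of the skill: `findEmptyCatchLines` and the regular expression are assumptions, and a catch block containing only a comment is deliberately not reported.

```typescript
// Matches `catch (e) {}`, `catch {}`, and the same forms split across lines.
const EMPTY_CATCH = /catch\s*(\([^)]*\))?\s*\{\s*\}/g;

// Returns the 1-based line number where each empty catch block starts.
function findEmptyCatchLines(source: string): number[] {
  const lines: number[] = [];
  EMPTY_CATCH.lastIndex = 0; // reset state left over from earlier calls
  let match: RegExpExecArray | null;
  while ((match = EMPTY_CATCH.exec(source)) !== null) {
    // Line number = count of newlines before the match, plus one.
    lines.push(source.slice(0, match.index).split("\n").length);
  }
  return lines;
}
```

This is a text-level heuristic; a linter rule such as ESLint's `no-empty` is the more robust option where one is already configured.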

Output Format

```markdown
## Production Health Check

### P0: Critical (Active Production Issues)
- [SENTRY-123] PaymentIntent failed - 23 users affected (Score: 147)
  Location: api/checkout.ts:45
  First seen: 2h ago

### P1: High (Degraded Performance / Broken CI)
- Health endpoint slow: /api/health responding in 2.3s (should be <500ms)
- Vercel logs show 5xx errors in last hour (count: 12)
- [CI] Main branch failing: "Build" workflow (run #1234)
  Failed step: "Type check"
  Error: Type 'string' is not assignable to type 'number'

### P2: Medium (Warnings)
- 3 empty catch blocks found (silent failures)
- Health endpoint missing database connectivity check
- [CI] 3 feature branch workflows failing (blocking PRs)

### P3: Low (Improvements)
- Consider adding Sentry performance monitoring
- Health endpoint could include more service checks

## Summary
- P0: 1 | P1: 3 | P2: 3 | P3: 2
- Recommendation: Fix P0 immediately, then fix main branch CI
```
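
A report in this shape can be generated from a flat list of findings. The sketch below groups findings by priority and derives the summary counts. The `Finding` type and the `summaryLine` and `renderReport` functions are hypothetical, and the `##`/`###` heading levels are an assumption about the intended markdown; only the section titles come from the sample above.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// Hypothetical finding record: a priority plus the text shown in the report.
interface Finding {
  priority: Priority;
  title: string;
}

const PRIORITY_ORDER: Priority[] = ["P0", "P1", "P2", "P3"];

// Section titles copied from the sample report.
const SECTION_TITLES: Record<Priority, string> = {
  P0: "P0: Critical (Active Production Issues)",
  P1: "P1: High (Degraded Performance / Broken CI)",
  P2: "P2: Medium (Warnings)",
  P3: "P3: Low (Improvements)",
};

// Builds the "P0: n | P1: n | P2: n | P3: n" line.
function summaryLine(findings: Finding[]): string {
  return PRIORITY_ORDER
    .map((p) => `${p}: ${findings.filter((f) => f.priority === p).length}`)
    .join(" | ");
}

// Renders the report, skipping priority sections that have no findings.
function renderReport(findings: Finding[]): string {
  const lines: string[] = ["## Production Health Check", ""];
  for (const p of PRIORITY_ORDER) {
    const items = findings.filter((f) => f.priority === p);
    if (items.length === 0) continue;
    lines.push(`### ${SECTION_TITLES[p]}`);
    for (const item of items) lines.push(`- ${item.title}`);
    lines.push("");
  }
  lines.push("## Summary", `- ${summaryLine(findings)}`);
  return lines.join("\n");
}
```

Deriving the summary from the same list that feeds the sections keeps the counts from drifting out of sync with the findings.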

Priority Mapping

| Signal | Priority |
|--------|----------|
| Active errors affecting users | P0 |
| 5xx errors, slow responses | P1 |
| Main branch CI/CD failing | P1 |
| Feature branch CI blocking PRs | P2 |
| Silent failures, missing checks | P2 |
| Missing monitoring, improvements | P3 |
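
The mapping can be encoded as a typed lookup so that every signal is assigned a priority the same way each time. This is a minimal sketch: the `Signal` identifiers and the `priorityFor` function are hypothetical labels invented here, with one entry per row of the table.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// Hypothetical identifiers, one per row of the Priority Mapping table.
type Signal =
  | "active-user-errors"
  | "5xx-or-slow-responses"
  | "main-branch-ci-failing"
  | "feature-branch-ci-blocking-prs"
  | "silent-failures-or-missing-checks"
  | "missing-monitoring-or-improvements";

// Using Record<Signal, Priority> makes the compiler reject a missing or misspelled row.
const PRIORITY_BY_SIGNAL: Record<Signal, Priority> = {
  "active-user-errors": "P0",
  "5xx-or-slow-responses": "P1",
  "main-branch-ci-failing": "P1",
  "feature-branch-ci-blocking-prs": "P2",
  "silent-failures-or-missing-checks": "P2",
  "missing-monitoring-or-improvements": "P3",
};

function priorityFor(signal: Signal): Priority {
  return PRIORITY_BY_SIGNAL[signal];
}
```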

Health Endpoint Anti-Pattern

Health checks that lie are worse than no health check. Example:

```typescript
// ❌ BAD: Reports "ok" without checking
return { status: "ok", services: { database: "ok" } };

// ✅ GOOD: Honest liveness probe (no fake service status)
return { status: "ok", timestamp: new Date().toISOString() };

// ✅ BETTER: Real readiness probe
const dbStatus = await checkDatabase() ? "ok" : "error";
return { status: dbStatus === "ok" ? "ok" : "degraded", services: { database: dbStatus } };
```

If you can't verify a service, don't report on it. False "ok" status masks outages.
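
The fragments above can be assembled into a complete readiness probe. The sketch below reports only on services whose checks actually ran, and treats a check that throws or hangs as failed. `buildHealthReport`, `withTimeout`, `readiness`, and the 2000 ms timeout are assumptions for illustration; `checkDatabase` is passed in by the caller because the source does not define it.

```typescript
type ServiceStatus = "ok" | "error";

interface HealthReport {
  status: "ok" | "degraded";
  timestamp: string;
  services: { [name: string]: ServiceStatus };
}

// Builds the report from checks that actually ran; unverified services are simply absent.
function buildHealthReport(results: { [name: string]: boolean }, now: Date): HealthReport {
  const services: { [name: string]: ServiceStatus } = {};
  let degraded = false;
  for (const name of Object.keys(results)) {
    services[name] = results[name] ? "ok" : "error";
    if (!results[name]) degraded = true;
  }
  return { status: degraded ? "degraded" : "ok", timestamp: now.toISOString(), services };
}

// Resolves to false if the check throws, rejects, or takes longer than `timeoutMs`.
function withTimeout(check: () => Promise<boolean>, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    Promise.resolve()
      .then(check)
      .then(
        (ok) => { clearTimeout(timer); resolve(ok); },
        () => { clearTimeout(timer); resolve(false); }
      );
  });
}

// Readiness probe: `checkDatabase` is supplied by the caller (hypothetical helper).
async function readiness(checkDatabase: () => Promise<boolean>): Promise<HealthReport> {
  const dbOk = await withTimeout(checkDatabase, 2000); // assumed 2 s budget
  return buildHealthReport({ database: dbOk }, new Date());
}
```

Calling `buildHealthReport({}, new Date())` yields the honest liveness form: a status and timestamp with an empty `services` object.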

Analytics Note

This skill checks production health (errors, logs, endpoints), not product analytics. For analytics auditing, see `/check-observability`. Note:
  • PostHog is REQUIRED for product analytics (has MCP server)
  • Vercel Analytics is NOT acceptable (no CLI/API/MCP - unusable for our workflow)

If you need to investigate user behavior or funnels during incident response, query PostHog via MCP.

Related

  • `/log-production-issues` - Create GitHub issues from findings
  • `/triage` - Fix production issues
  • `/observability` - Set up monitoring infrastructure