ops-inspector
Standalone Install Note
If this environment only installed the current skill, start from the CloudBase main entry and use the published `cloudbase/references/...` paths for sibling skills.
- CloudBase main entry: https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/SKILL.md
- Current skill raw source: https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/ops-inspector/SKILL.md
Keep local `references/...` paths for files that ship with the current skill directory. When this file points to a sibling skill such as `cloud-functions` or `cloudrun-development`, use the standalone fallback URL shown next to that reference.
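A minimal Python sketch of the fallback rule, assuming every sibling skill is published under the same `skills/cloudbase/references/` prefix as the URLs above. `standalone_fallback_url` is a hypothetical helper for illustration, not part of any CloudBase tool.

```python
import posixpath

# Assumption: siblings share the prefix of the published URLs listed above.
RAW_BASE = (
    "https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/"
    "skills/cloudbase/references/"
)


def standalone_fallback_url(relative_path: str) -> str:
    """Hypothetical helper: map '../<skill>/SKILL.md' to its raw fallback URL."""
    # Resolve the path as if it were read from references/ops-inspector/.
    resolved = posixpath.normpath(posixpath.join("ops-inspector", relative_path))
    if resolved.startswith(("..", "/")):
        # The path leaves references/ (or is absolute), so no fallback applies.
        raise ValueError(f"not a sibling reference: {relative_path}")
    return RAW_BASE + resolved


print(standalone_fallback_url("../cloud-functions/SKILL.md"))
```

The printed URL matches the cloud-functions fallback listed under "Then also read".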
Activation Contract
Use this first when
- The user wants to check the health or status of CloudBase resources (cloud functions, CloudRun, databases, storage, etc.).
- The user reports errors, failures, or abnormal behavior and wants a quick diagnosis.
- The user asks for an "inspection", "health check", "巡检" (inspection), "诊断" (diagnosis), or "troubleshooting" of their CloudBase environment.
- The user wants to review recent error logs across services.
Read before writing code if
- The inspection reveals code-level issues in cloud functions or CloudRun services — then read the relevant implementation skill before suggesting fixes.
- The user wants to fix a problem found during inspection rather than just diagnose it.
Then also read
- Cloud function issues -> `../cloud-functions/SKILL.md` (standalone fallback: https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/cloud-functions/SKILL.md)
- CloudRun issues -> `../cloudrun-development/SKILL.md` (standalone fallback: https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/cloudrun-development/SKILL.md)
- Database issues -> `../relational-database-tool/SKILL.md` (standalone fallback: https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/relational-database-tool/SKILL.md) or `../no-sql-web-sdk/SKILL.md` (standalone fallback: https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/no-sql-web-sdk/SKILL.md)
- Platform overview -> `../cloudbase-platform/SKILL.md` (standalone fallback: https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/cloudbase-platform/SKILL.md)
Do NOT use for
- Deploying new resources or writing application code. This skill is read-only and diagnostic.
- Replacing proper monitoring/alerting infrastructure. It provides point-in-time inspection, not continuous monitoring.
- Directly fixing problems — it diagnoses and recommends; actual fixes should use the appropriate implementation skill.
Common mistakes / gotchas
- Running a full inspection without first confirming the environment is bound (the `auth` tool must show a logged-in, env-bound state).
- Ignoring CLS log service status: if CLS is not enabled, `queryLogs` will fail; always check first with `queryLogs(action="checkLogService")`.
- Searching logs without a time range: this can return excessive or irrelevant results. Always scope searches to a relevant time window.
- Treating a single error log as the root cause without correlating across resources. A function error may stem from a database or config issue.
Minimal checklist
- Environment is bound and accessible (`envQuery(action="info")`)
- CLS log service is enabled (`queryLogs(action="checkLogService")`)
- All target resources are listed before diving into details
- Time range is specified for any log searches
- Findings are summarized with severity levels and actionable recommendations
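The checklist can be read as a gate that runs before any drill-down. A minimal Python sketch, assuming the results of `envQuery(action="info")` and `queryLogs(action="checkLogService")` have already been reduced to the fields below; `Preflight` and `preflight_problems` are hypothetical names, and treating a disabled CLS as a warning rather than a blocker follows Step 2 of the full workflow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Preflight:
    """Hypothetical snapshot of the checklist inputs."""
    env_id: Optional[str]      # envId from envQuery(action="info"); None if unbound
    cls_enabled: bool          # result of queryLogs(action="checkLogService")
    resources_listed: bool     # target functions/services enumerated first
    start_time: Optional[str]  # start of the log search window, if any


def preflight_problems(p: Preflight) -> List[str]:
    """Return checklist violations; an empty list means inspection can proceed."""
    problems: List[str] = []
    if not p.env_id:
        problems.append("BLOCKER: environment is not bound")
    if not p.cls_enabled:
        # Resource checks still work; only log-based diagnosis is lost.
        problems.append("WARNING: CLS disabled, skip log searches")
    if not p.resources_listed:
        problems.append("BLOCKER: list target resources before drilling down")
    if p.cls_enabled and not p.start_time:
        problems.append("BLOCKER: log searches need a time range")
    return problems
```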
How to use this skill (for a coding agent)
Inspection Modes
The skill supports two modes based on user intent:
| Mode | When to use | Scope |
|---|---|---|
| Full inspection | User asks for a general health check / 巡检 (inspection) / 全面检查 (full check) | All resource types in the environment |
| Targeted inspection | User reports a specific error or asks about a specific resource | One resource type or a specific resource |
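Mode selection comes down to whether the request names a specific resource or error. A keyword heuristic in Python as a sketch; the keyword lists, the `choose_mode` name, and the fallback to a full inspection are all assumptions, and an agent would normally infer intent from the whole conversation rather than from substrings.

```python
from typing import Optional

# Illustrative trigger phrases taken from this skill; extend as needed.
FULL_KEYWORDS = ("health check", "inspection", "巡检", "全面检查")
TARGETED_HINTS = ("error", "fail", "timeout", "crash")


def choose_mode(request: str, named_resource: Optional[str] = None) -> str:
    """Return 'targeted' or 'full' following the Inspection Modes table."""
    text = request.lower()
    if named_resource:
        # A specific function or service was named: scope to it.
        return "targeted"
    if any(k in text for k in FULL_KEYWORDS):
        return "full"
    if any(h in text for h in TARGETED_HINTS):
        return "targeted"
    # Assumption: with no clear signal, start broad.
    return "full"
```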
Full Inspection Workflow
Follow these steps in order for a comprehensive environment health check:
Step 1: Environment Check
Call `envQuery(action="info")` to confirm the environment is accessible. Record the `envId` for console link generation.
Step 2: Log Service Status
Call `queryLogs(action="checkLogService")`. If CLS is not enabled, note this as a warning: log-based diagnosis will be unavailable. Recommend enabling CLS in the console:
https://tcb.cloud.tencent.com/dev?envId=${envId}#/devops/log
Step 3: Cloud Functions Inspection
Call `queryFunctions(action="listFunctions")`. For each function, check:
- Status: Is the function in an active/deployed state?
- Recent errors: `queryFunctions(action="listFunctionLogs", functionName="<name>", startTime="<recent>")`
- Common issues:
  - Timeout errors (execution exceeded limit)
  - Memory limit exceeded
  - Runtime errors (unhandled exceptions)
  - Cold start frequency
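The per-function checks in Step 3 amount to sorting log lines into the listed categories. A minimal Python sketch; the regular expressions are assumed signatures rather than documented CloudBase log formats, `classify_function_logs` is a hypothetical helper, and cold start frequency is left out because it needs timing data rather than message text.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed signatures: real messages vary by runtime, so tune these against
# the actual output of queryFunctions(action="listFunctionLogs", ...).
ISSUE_PATTERNS: Dict[str, re.Pattern] = {
    "timeout": re.compile(r"time(d)?\s*out", re.IGNORECASE),
    "memory": re.compile(r"out of memory|memory limit|\boom\b", re.IGNORECASE),
    "runtime_error": re.compile(r"unhandled|uncaught|exception|traceback", re.IGNORECASE),
}


def classify_function_logs(lines: Iterable[str]) -> Counter:
    """Count log lines per common-issue category; unmatched lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        for issue, pattern in ISSUE_PATTERNS.items():
            if pattern.search(line):
                counts[issue] += 1
                break  # first matching category wins, so each line counts once
    return counts
```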
Step 4: CloudRun Services Inspection
Call `queryCloudRun(action="list")`. For each service, check:
- Status: Is the service running?
- Detail: `queryCloudRun(action="detail", detailServerName="<name>")`
- Common issues:
  - Service not running (scaled to zero or crashed)
  - Image pull failures
  - OOMKilled events
  - Health check failures
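Step 4's common issues map naturally onto a severity per service. A Python sketch, assuming the detail response can be reduced to a running flag plus Kubernetes-style event reasons (`OOMKilled`, `ErrImagePull`, `ImagePullBackOff`, `Unhealthy`); whether `queryCloudRun(action="detail")` exposes these exact strings is an assumption to verify, and `assess_cloudrun_service` is a hypothetical helper. A stopped service is graded Critical to match the Severity Levels table, although a service deliberately scaled to zero may deserve a lower grade.

```python
from typing import Iterable, List, Tuple

# Assumed Kubernetes-style reasons; confirm against real detail output.
EVENT_FINDINGS = {
    "OOMKilled": "container killed for exceeding memory limit",
    "ErrImagePull": "image pull failure",
    "ImagePullBackOff": "image pull failure",
    "Unhealthy": "health check (probe) failure",
}


def assess_cloudrun_service(running: bool, reasons: Iterable[str]) -> Tuple[str, List[str]]:
    """Return (severity, findings) using the levels from Severity Levels."""
    # Deduplicate findings (two image-pull reasons collapse into one line).
    findings = sorted({EVENT_FINDINGS[r] for r in reasons if r in EVENT_FINDINGS})
    if not running:
        return "Critical", ["service not running (scaled to zero or crashed)"] + findings
    if findings:
        return "Warning", findings
    return "Healthy", []
```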
Step 5: Error Log Aggregation (if CLS is enabled)
Search error logs for both `service` values:
- `queryLogs(action="searchLogs", queryString="ERROR", service="tcb", startTime="<24h-ago>", limit=50)`
- `queryLogs(action="searchLogs", queryString="ERROR", service="tcbr", startTime="<24h-ago>", limit=50)`
Look for patterns:
- Repeated error messages (same error many times)
- Cascading failures (errors in multiple services around the same time)
- Timeout patterns
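Spotting repeated error messages works better once volatile tokens (request ids, durations, addresses) are masked, so that otherwise identical errors group together. A Python sketch; the masking rules and the `normalize`/`top_error_patterns` names are assumptions. Because each search above is capped at `limit=50`, the counts describe a sample and should be read as relative frequencies, not totals.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def normalize(message: str) -> str:
    """Mask volatile tokens so repeats of the same error group together."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # addresses, ids
    message = re.sub(r"\d+", "<n>", message)                  # counts, ports, ms
    return re.sub(r"\s+", " ", message).strip()


def top_error_patterns(
    messages: Iterable[str], min_count: int = 2, top: int = 5
) -> List[Tuple[str, int]]:
    """Return up to `top` normalized messages seen at least `min_count` times."""
    counts = Counter(normalize(m) for m in messages)
    return [(p, c) for p, c in counts.most_common(top) if c >= min_count]
```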
Step 6: Summary Report
Generate a structured report:
```markdown
## CloudBase Resource Inspection Report

Environment: ${envId}
Inspection Time: ${timestamp}

### Overall Health: ✅ Healthy / ⚠️ Warnings Found / ❌ Issues Found

### Cloud Functions
| Function | Status | Recent Errors | Severity |
|---|---|---|---|
| ... | ... | ... | ... |

### CloudRun Services
| Service | Status | Issues | Severity |
|---|---|---|---|
| ... | ... | ... | ... |

### Error Log Summary
- Total errors in last 24h: N
- Top error patterns: ...

### Recommendations
- ...
- ...

### Console Links
- Cloud Functions: https://tcb.cloud.tencent.com/dev?envId=${envId}#/scf
- CloudRun: https://tcb.cloud.tencent.com/dev?envId=${envId}#/platform-run
- Logs: https://tcb.cloud.tencent.com/dev?envId=${envId}#/devops/log
```
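The console links in the report differ only in the URL fragment, so they can be generated from the `envId` recorded in Step 1. A small Python sketch; the fragments are copied from the report template, while `console_links` and the sample id `my-env-1a2b` are hypothetical.

```python
from typing import Dict
from urllib.parse import quote

CONSOLE_BASE = "https://tcb.cloud.tencent.com/dev"
# Fragments copied from the Console Links section of the report template.
CONSOLE_PAGES = {
    "Cloud Functions": "#/scf",
    "CloudRun": "#/platform-run",
    "Logs": "#/devops/log",
}


def console_links(env_id: str) -> Dict[str, str]:
    """Hypothetical helper: build the report's console links for one envId."""
    env = quote(env_id, safe="")  # encode defensively in case of odd characters
    return {name: f"{CONSOLE_BASE}?envId={env}{frag}" for name, frag in CONSOLE_PAGES.items()}


for name, url in console_links("my-env-1a2b").items():
    print(f"- {name}: {url}")
```

The loop prints the links in the same bullet format the template uses.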
Targeted Inspection Workflow
When the user specifies a resource type or a specific resource:
- Cloud function errors: `queryFunctions(action="listFunctionLogs", functionName="<name>")`, then `queryLogs(action="searchLogs", queryString="* AND functionName:<name> AND level:ERROR", ...)`
- CloudRun errors: `queryCloudRun(action="detail", detailServerName="<name>")`, then `queryLogs(action="searchLogs", queryString="ERROR", service="tcbr", ...)`
- Database issues: check `querySqlDatabase` or `readNoSqlDatabaseStructure`, depending on the database type
- General error search: `queryLogs(action="searchLogs", queryString="<error-keyword>", ...)`
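The targeted workflow is essentially a lookup from resource type to an ordered list of calls. A Python sketch that records that mapping as data, so an agent can log or replay its plan; `targeted_plan`, the `kind` values, and the tuple representation are hypothetical. Tool and parameter names come from the list above, time-range arguments are omitted just as the list elides them with `...`, and the database tools get no parameters because this skill does not specify them.

```python
from typing import Dict, List, Tuple

# (tool name, parameters); time-range arguments are omitted, as with "..." above.
Call = Tuple[str, Dict[str, str]]


def targeted_plan(kind: str, name: str = "", keyword: str = "") -> List[Call]:
    """Hypothetical helper: ordered tool calls for one targeted inspection."""
    if kind == "function":
        return [
            ("queryFunctions", {"action": "listFunctionLogs", "functionName": name}),
            ("queryLogs", {"action": "searchLogs",
                           "queryString": f"* AND functionName:{name} AND level:ERROR"}),
        ]
    if kind == "cloudrun":
        return [
            ("queryCloudRun", {"action": "detail", "detailServerName": name}),
            ("queryLogs", {"action": "searchLogs", "queryString": "ERROR", "service": "tcbr"}),
        ]
    if kind == "mysql":
        return [("querySqlDatabase", {})]  # parameters not specified in this skill
    if kind == "nosql":
        return [("readNoSqlDatabaseStructure", {})]  # parameters not specified here
    if kind == "search":
        return [("queryLogs", {"action": "searchLogs", "queryString": keyword})]
    raise ValueError(f"unsupported inspection kind: {kind}")
```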
AIOps Methodology
This skill follows AIOps principles for intelligent inspection:
- Data Collection: Gather logs and resource states via MCP tools
- Pattern Recognition: Identify recurring errors, anomaly patterns, and correlations across services
- Root Cause Hypothesis: Based on error patterns, suggest likely root causes (e.g., a function timeout may be caused by a database query bottleneck)
- Actionable Recommendations: Provide specific, prioritized remediation steps with links to relevant skills and console pages
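Cross-service pattern recognition can start with a simple co-occurrence check: if errors from several services land in the same short window, a shared cause (database, config, network) becomes a hypothesis worth testing, not a confirmed root cause. A Python sketch using fixed 5-minute buckets; the bucket size, the `correlated_buckets` name, and reusing the `service` values `tcb`/`tcbr` from Step 5 as labels are assumptions. Fixed buckets can split a cluster that straddles a boundary, and timestamps are assumed to be naive and in one timezone.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple

EPOCH = datetime(1970, 1, 1)


def correlated_buckets(
    errors: Iterable[Tuple[datetime, str]],
    bucket: timedelta = timedelta(minutes=5),
    min_services: int = 2,
) -> List[Tuple[datetime, Set[str]]]:
    """Group (timestamp, service) errors into fixed buckets and keep the buckets
    where at least `min_services` distinct services failed together."""
    buckets: Dict[datetime, Set[str]] = defaultdict(set)
    for ts, service in errors:
        # Floor each naive timestamp to the start of its bucket.
        index = (ts - EPOCH) // bucket
        buckets[EPOCH + index * bucket].add(service)
    return sorted(
        (start, services)
        for start, services in buckets.items()
        if len(services) >= min_services
    )
```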
Severity Levels
| Level | Icon | Meaning |
|---|---|---|
| Critical | ❌ | Service is down or data is at risk; requires immediate action |
| Warning | ⚠️ | Errors detected but service is still partially functional; investigate soon |
| Info | ℹ️ | No errors found; informational status only |
| Healthy | ✅ | Resource is operating normally |
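The report's Overall Health line can be derived from the per-resource levels by taking the worst one. A Python sketch, assuming the table's order from Healthy up to Critical and mapping Info to the ✅ Healthy label because the report header offers only three states; `overall_health` is a hypothetical helper.

```python
from typing import Iterable

# Ordered from least to most severe, following the Severity Levels table.
SEVERITY_ORDER = ["Healthy", "Info", "Warning", "Critical"]
OVERALL_LABEL = {
    "Healthy": "✅ Healthy",
    "Info": "✅ Healthy",  # assumption: informational findings do not degrade status
    "Warning": "⚠️ Warnings Found",
    "Critical": "❌ Issues Found",
}


def overall_health(severities: Iterable[str]) -> str:
    """Return the report header label for the worst severity observed."""
    worst = max(severities, key=SEVERITY_ORDER.index, default="Healthy")
    return OVERALL_LABEL[worst]
```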
Preferred Tool Map
| Operation | MCP Tool Call |
|---|---|
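| Check environment | `envQuery(action="info")` |
| Check CLS status | `queryLogs(action="checkLogService")` |
| List cloud functions | `queryFunctions(action="listFunctions")` |
| Get function detail | |
| Get function logs | `queryFunctions(action="listFunctionLogs", functionName="<name>")` |
| Get function log detail | |
| List CloudRun services | `queryCloudRun(action="list")` |
| Get CloudRun detail | `queryCloudRun(action="detail", detailServerName="<name>")` |
| Search CLS logs | `queryLogs(action="searchLogs", ...)` |
| Check NoSQL structure | `readNoSqlDatabaseStructure` |
| Check MySQL status | `querySqlDatabase` |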
Common CLS Query Patterns
| Scenario | queryString |
|---|---|
| All errors | `ERROR` |
| Function timeout | |
| Function OOM | |
| CloudRun crash | |
| Specific function errors | `* AND functionName:<name> AND level:ERROR` |
| 5xx HTTP errors | |
| Cold start issues | |
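The function-scoped query in this table combines a free-text term with `field:value` filters joined by `AND`. A Python sketch that builds strings of that shape; `cls_query` is a hypothetical helper that only reproduces the `* AND functionName:<name> AND level:ERROR` pattern used in this skill, and it deliberately rejects values that would need quoting because CLS escaping rules are not covered here.

```python
def cls_query(*terms: str, **fields: str) -> str:
    """Hypothetical helper: AND-join free terms and field:value filters."""
    for key, value in fields.items():
        # Quoting/escaping rules are not covered here, so refuse risky values.
        if any(ch.isspace() or ch in ':"' for ch in value):
            raise ValueError(f"value for {key!r} needs quoting: {value!r}")
    parts = list(terms) + [f"{key}:{value}" for key, value in fields.items()]
    # With no terms at all, fall back to the match-everything query.
    return " AND ".join(parts) if parts else "*"
```

Keyword arguments keep their call order, so `level` comes after `functionName` exactly as written in the call.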
Time Range Guidance
- Quick check: Last 1 hour (`startTime` = 1 hour ago)
- Standard inspection: Last 24 hours
- Trend analysis: Last 7 days
- Specific incident: Narrow to the reported time window
Always use the `YYYY-MM-DD HH:MM:SS` format (ISO 8601 date and time, space-separated) for `startTime`/`endTime`, e.g., `"2025-01-15 00:00:00"`.
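The presets above translate directly into `startTime`/`endTime` strings. A Python sketch; `time_range`, `incident_range`, the preset keys, and the 15-minute incident margin are hypothetical. Which timezone the tools expect is not stated in this skill, so pass `now` (or `center`) explicitly in the timezone your environment uses.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

FMT = "%Y-%m-%d %H:%M:%S"  # same shape as "2025-01-15 00:00:00"
PRESETS = {
    "quick": timedelta(hours=1),      # quick check
    "standard": timedelta(hours=24),  # standard inspection
    "trend": timedelta(days=7),       # trend analysis
}


def time_range(preset: str, now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (startTime, endTime) strings for one of the presets above."""
    end = now if now is not None else datetime.now()
    start = end - PRESETS[preset]
    return start.strftime(FMT), end.strftime(FMT)


def incident_range(center: datetime, margin: timedelta = timedelta(minutes=15)) -> Tuple[str, str]:
    """Narrow window around a reported incident time (margin is an assumption)."""
    return (center - margin).strftime(FMT), (center + margin).strftime(FMT)
```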
Related Skills
- `cloud-functions`: Cloud function development, deployment, and debugging
- `cloudrun-development`: CloudRun backend deployment and management
- `cloudbase-platform`: General platform knowledge and console navigation
- `relational-database-tool`: MySQL database management and diagnostics