dogfood
Dogfood Skill
VCS Provider
This skill uses VCS operations through Exarchos MCP actions (`create_issue`, etc.).
These actions automatically detect and route to the correct VCS provider (GitHub, GitLab, Azure DevOps).
No `gh`/`glab`/`az` commands are needed; the MCP server handles provider dispatch.
Overview
Retrospective analysis of Exarchos MCP tool usage. Uses the MCP server's own self-service capabilities as the primary diagnostic instrument — describe APIs, views, playbooks, and runbooks turned inward to diagnose failures.
Three distinct failure modes require different fixes — code changes, documentation updates, or skill instruction improvements. Mixing them wastes effort.
Platform-Agnosticity
Per `docs/designs/2026-03-09-platform-agnosticity.md`: the MCP server is the self-sufficient, platform-agnostic core. The debug trace relies entirely on MCP tools, not conversation introspection, so it works for any MCP client. Conversation scanning is supplementary.
Diagnostic self-service tools: `describe(topology)` for HSM verification, `describe(playbook)` for adherence checks, `describe(eventTypes, emissionGuide)` for event schema/catalog comparison, `describe(actions)` for schema/gate metadata, `runbook(phase)` for step conformance, `pipeline`/`convergence`/`telemetry` views for health metrics.
Triggers
Activate this skill when:
- User runs `/dogfood`
- User asks "what went wrong this session" or "review the failures"
- User wants to triage errors from a workflow run
- A workflow session ends and its learnings should be captured
Process
Step 1: Debug Trace via MCP Self-Service
Query the MCP server's own self-service capabilities to build a ground-truth diagnostic picture. This is the primary investigation method — it uses the same tools any MCP client has access to.
1a. Identify Active Workflows
Use `exarchos_view` with `action: "pipeline"` to get an aggregated view of active workflows with their phases and task counts.
If `$ARGUMENTS` specifies a workflow or feature ID, scope to that workflow. Otherwise, inspect all non-terminal workflows.
1b. Inspect Workflow State and Topology
For each relevant workflow:
- Read state: use `exarchos_workflow get` to retrieve the current phase, tasks, reviews, and gate results.
- Read topology: use `exarchos_workflow describe(topology: "<workflowType>")` to get the HSM definition. Compare the agent's phase transition attempts against the valid transitions. An invalid transition attempt is either a documentation issue (the skill prescribed the wrong path) or user error.
- Check guard prerequisites: for `workflow.guard-failed` events, look up the guard in the topology to understand the unmet preconditions.
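The transition comparison can be sketched as a set lookup. This is a minimal illustration, assuming the topology can be reduced to a flat list of `from`/`to` edges; the `Topology`, `Transition`, and `TransitionAttempt` shapes and the `review` phase are hypothetical and may not match the real `describe(topology)` output.

```typescript
// Hypothetical shapes; the real describe(topology) output may differ.
interface Transition { from: string; to: string; guard?: string }
interface Topology { transitions: Transition[] }
interface TransitionAttempt { from: string; to: string }

// Returns the attempted transitions that have no matching edge in the topology.
function findInvalidTransitions(
  topology: Topology,
  attempts: TransitionAttempt[],
): TransitionAttempt[] {
  const valid = new Set(topology.transitions.map((t) => `${t.from}->${t.to}`));
  return attempts.filter((a) => !valid.has(`${a.from}->${a.to}`));
}

// Illustrative data only.
const sampleTopology: Topology = {
  transitions: [
    { from: "plan", to: "delegate", guard: "planReviewComplete" },
    { from: "delegate", to: "review" },
  ],
};

const invalidAttempts = findInvalidTransitions(sampleTopology, [
  { from: "plan", to: "delegate" },
  { from: "plan", to: "review" }, // skips the delegate phase
]);
console.log(invalidAttempts);
```

Each entry returned here still needs the docs-versus-user-error decision described above; the lookup only establishes that the topology does not allow the attempt.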
1c. Playbook Adherence Check
Use `exarchos_workflow describe(playbook: "<workflowType>")` to retrieve phase playbooks. For each phase executed, compare the playbook's `tools`, `events`, `transitionCriteria`, `guardPrerequisites`, `humanCheckpoint`, and `compactGuidance` against what the agent actually did and what the skill docs prescribe.
Playbook violations are diagnostic gold:
- Agent deviated and skill docs told it to → documentation issue (skill contradicts playbook)
- Agent deviated and skill docs agree with playbook → user error
- Playbook is wrong (prescribes invalid tools/events) → code bug
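The three rules above amount to a small decision function. The sketch below is one way to encode them, assuming the two facts it needs have already been established by hand; the `PlaybookDeviation` type and its field names are hypothetical, and the bucket names follow the report format used later in this skill.

```typescript
type Bucket = "code_bug" | "documentation_issue" | "user_error";

// Hypothetical record of one observed deviation from a phase playbook.
interface PlaybookDeviation {
  // False when the playbook itself prescribes invalid tools or events.
  playbookIsValid: boolean;
  // True when the skill docs prescribe the same thing as the playbook.
  skillAgreesWithPlaybook: boolean;
}

function classifyDeviation(d: PlaybookDeviation): Bucket {
  if (!d.playbookIsValid) return "code_bug";
  // Playbook is right: either the skill misled the agent, or the agent erred.
  return d.skillAgreesWithPlaybook ? "user_error" : "documentation_issue";
}

console.log(classifyDeviation({ playbookIsValid: true, skillAgreesWithPlaybook: false }));
```

Checking the playbook first matters: if the playbook is wrong, agreement between skill docs and playbook says nothing about who is at fault.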
1d. Event Log Analysis
Use `exarchos_event query(stream)` on the workflow's event stream. Look for:
- Rejected events: absent from the log despite agent attempts (corroborate with conversation errors)
- Missing events: compare against the playbook `events` field and `exarchos_event describe(emissionGuide: true)`. Missing model-emitted events indicate a documentation gap or user error.
- Sequence anomalies: wrong order, duplicates, or timeline gaps
- Schema mismatches: use `describe(eventTypes: [...])` to get the authoritative JSON Schema. Compare actual payloads against the schema for semantically wrong fields.
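Two of these checks, missing events and ordering anomalies, are mechanical once the log is in hand. The sketch below assumes each logged event carries a `type` and a monotonically increasing `sequence` number; the `LoggedEvent` shape is a hypothetical simplification, not the actual event store schema.

```typescript
// Hypothetical, simplified view of one event returned by a stream query.
interface LoggedEvent { type: string; sequence: number }

// Event types the playbook expects that never appear in the log.
function findMissingEvents(expectedTypes: string[], log: LoggedEvent[]): string[] {
  const seen = new Set(log.map((e) => e.type));
  return expectedTypes.filter((t) => !seen.has(t));
}

// Indices where the sequence number fails to increase (duplicate or out of order).
function findOrderAnomalies(log: LoggedEvent[]): number[] {
  const anomalies: number[] = [];
  let previous = Number.NEGATIVE_INFINITY;
  log.forEach((event, index) => {
    if (event.sequence <= previous) anomalies.push(index);
    previous = event.sequence;
  });
  return anomalies;
}

// Illustrative data only.
const sampleLog: LoggedEvent[] = [
  { type: "team.spawned", sequence: 1 },
  { type: "workflow.guard-failed", sequence: 2 },
  { type: "workflow.guard-failed", sequence: 2 }, // duplicate sequence number
];

const missingEvents = findMissingEvents(["team.spawned", "team.task.assigned"], sampleLog);
const orderAnomalies = findOrderAnomalies(sampleLog);
console.log(missingEvents, orderAnomalies);
```

Rejected events and schema mismatches are not covered here; they need the conversation record and the authoritative schema respectively.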
1e. Orchestrate Action and Gate Analysis
- Schema verification: use `exarchos_orchestrate describe(actions: [...])` for authoritative schemas. Compare the agent's parameters against the schema to detect stale skill docs or improvisation.
- Gate metadata: describe output includes `{ blocking, dimension, autoEmits }`. Check: did the agent treat blocking/non-blocking correctly? Did expected auto-emissions fire?
- Gate convergence: use `exarchos_view convergence` for per-dimension (D1-D5) pass rates. Low convergence suggests systemic gate issues.
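As a rough illustration of the convergence check, the sketch below flags dimensions whose pass rate falls under a cutoff. The `DimensionStats` shape and the 0.5 default threshold are assumptions made for the example; the source does not define what counts as "low".

```typescript
type Dimension = "D1" | "D2" | "D3" | "D4" | "D5";

// Hypothetical per-dimension tally derived from the convergence view.
interface DimensionStats { dimension: Dimension; passed: number; total: number }

// Dimensions whose pass rate is below the threshold (assumed default: 0.5).
function lowConvergence(stats: DimensionStats[], threshold: number = 0.5): Dimension[] {
  return stats
    .filter((s) => s.total > 0 && s.passed / s.total < threshold)
    .map((s) => s.dimension);
}

// Illustrative data only.
const flaggedDimensions = lowConvergence([
  { dimension: "D1", passed: 9, total: 10 },
  { dimension: "D3", passed: 1, total: 4 },
  { dimension: "D5", passed: 0, total: 0 }, // never evaluated, so not flagged
]);
console.log(flaggedDimensions);
```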
1f. Runbook Conformance Check
Use `exarchos_orchestrate runbook(phase)` to retrieve relevant runbooks. Check: step ordering, decision branch correctness (steps with `decide` fields), `onFail` directive adherence (`stop`/`continue`/`retry`), and `templateVars` completeness.
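The step-ordering part of this check can be sketched as follows, assuming both the runbook and the agent's actions can be reduced to lists of step identifiers. The step names are hypothetical, and the sketch deliberately ignores `decide` branches and `onFail` directives, which need case-by-case review.

```typescript
interface StepOrderReport { missing: string[]; outOfOrder: string[] }

// Compares executed steps against a linear runbook.
// Steps unknown to the runbook are ignored rather than reported.
function checkStepOrder(runbookSteps: string[], executed: string[]): StepOrderReport {
  const missing = runbookSteps.filter((step) => executed.indexOf(step) === -1);
  const outOfOrder: string[] = [];
  let furthestIndex = -1;
  executed.forEach((step) => {
    const index = runbookSteps.indexOf(step);
    if (index === -1) return;
    if (index < furthestIndex) outOfOrder.push(step);
    else furthestIndex = index;
  });
  return { missing, outOfOrder };
}

// Illustrative step names only.
const stepReport = checkStepOrder(
  ["prepare", "dispatch", "verify", "finalize"],
  ["prepare", "verify", "dispatch"],
);
console.log(stepReport);
```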
1g. Telemetry Review
Use `exarchos_view telemetry` for per-tool performance. Flag: high error rates (systemic issues), high invocation counts (retry loops), and tools never invoked that the playbook prescribes.
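The three flags can be computed in one pass over per-tool counters. In this sketch the `ToolTelemetry` shape and both thresholds (20% error rate, 50 invocations) are assumptions chosen for illustration; the telemetry view's real fields and sensible cutoffs may differ.

```typescript
// Hypothetical per-tool counters taken from the telemetry view.
interface ToolTelemetry { tool: string; invocations: number; errors: number }

interface TelemetryFlags {
  highErrorRate: string[];
  highInvocation: string[];
  neverInvoked: string[];
}

function flagTelemetry(
  rows: ToolTelemetry[],
  prescribedTools: string[],
  maxErrorRate: number = 0.2,   // assumed cutoff
  maxInvocations: number = 50,  // assumed cutoff
): TelemetryFlags {
  const invoked = new Set(rows.filter((r) => r.invocations > 0).map((r) => r.tool));
  return {
    highErrorRate: rows
      .filter((r) => r.invocations > 0 && r.errors / r.invocations > maxErrorRate)
      .map((r) => r.tool),
    highInvocation: rows.filter((r) => r.invocations > maxInvocations).map((r) => r.tool),
    neverInvoked: prescribedTools.filter((t) => !invoked.has(t)),
  };
}

// Illustrative data only.
const telemetryFlags = flagTelemetry(
  [
    { tool: "exarchos_workflow", invocations: 10, errors: 5 },
    { tool: "exarchos_event", invocations: 120, errors: 2 },
  ],
  ["exarchos_workflow", "exarchos_event", "exarchos_orchestrate"],
);
console.log(telemetryFlags);
```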
Step 2: Scan Session for Failed Tool Calls
Supplement the debug trace with client-side context: review the conversation for failed Exarchos tool calls.
Note: Platform-dependent step (requires conversation history). Skip on platforms without introspection; the debug trace is self-sufficient.
Target tools: `exarchos_workflow`, `exarchos_event`, `exarchos_orchestrate`, `exarchos_view`, `exarchos_sync`
Error signals: `INVALID_INPUT`, `VALIDATION_ERROR`, `BATCH_APPEND_FAILED`, Zod failures (`invalid_type`, `invalid_enum_value`, `unrecognized_keys`), `ENOENT`, `CLAIM_FAILED`, `SEQUENCE_CONFLICT`, CAS exhaustion, retry sequences, successful-after-retry calls.
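A conversation scan might collapse consecutive failures of the same call into one group and record which error signals appeared. The sketch below assumes the conversation has already been reduced to an ordered list of `ToolCall` records; that shape, the `append` action name, and the sample messages are hypothetical.

```typescript
const ERROR_SIGNALS: string[] = [
  "INVALID_INPUT", "VALIDATION_ERROR", "BATCH_APPEND_FAILED",
  "invalid_type", "invalid_enum_value", "unrecognized_keys",
  "ENOENT", "CLAIM_FAILED", "SEQUENCE_CONFLICT",
];

// Hypothetical record of one tool call extracted from the conversation.
interface ToolCall { tool: string; action: string; ok: boolean; message?: string }

interface FailureGroup {
  tool: string;
  action: string;
  failedAttempts: number;
  signals: string[];
  succeededAfterRetry: boolean;
}

// Groups consecutive failures of the same tool/action into a single finding.
function groupFailures(calls: ToolCall[]): FailureGroup[] {
  const groups: FailureGroup[] = [];
  let open: FailureGroup | null = null; // failure run still awaiting a success
  for (const call of calls) {
    if (call.ok) {
      if (open !== null && open.tool === call.tool && open.action === call.action) {
        open.succeededAfterRetry = true;
      }
      open = null;
      continue;
    }
    if (open === null || open.tool !== call.tool || open.action !== call.action) {
      open = { tool: call.tool, action: call.action, failedAttempts: 0, signals: [], succeededAfterRetry: false };
      groups.push(open);
    }
    open.failedAttempts += 1;
    const message = call.message || "";
    for (const signal of ERROR_SIGNALS) {
      if (message.indexOf(signal) !== -1 && open.signals.indexOf(signal) === -1) {
        open.signals.push(signal);
      }
    }
  }
  return groups;
}

// Illustrative data only.
const failureGroups = groupFailures([
  { tool: "exarchos_workflow", action: "set", ok: false, message: "INVALID_INPUT: branch" },
  { tool: "exarchos_workflow", action: "set", ok: false, message: "INVALID_INPUT: invalid_type at branch" },
  { tool: "exarchos_workflow", action: "set", ok: true },
  { tool: "exarchos_event", action: "append", ok: false, message: "SEQUENCE_CONFLICT" },
]);
console.log(failureGroups);
```

Grouping this way keeps a retry sequence as one finding while still surfacing calls that only succeeded after retrying.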
Step 3: Diagnose Each Failure
Merge the debug trace and conversation scan findings. For each failure, document:
- What was attempted: action, parameters, intent
- What went wrong: error message and validation path
- Server-side evidence: event log, state, describe output, views
- Authoritative reference: the self-service query providing ground truth (playbook, topology, schema, runbook)
- Root cause: per `references/root-cause-patterns.md`
- Fix category: code, docs, or user behavior
Flag discrepancies only visible via server-side inspection as trace-only findings.
Step 4: Categorize into Buckets
Assign each failure to exactly one root cause bucket:
Bucket 1: Code Bug
The MCP server, event store, or workflow engine has a defect.
Signals: Schema rejects valid input (confirmed via `describe`), CAS failures with no concurrent writers, gate over-enforcement, identical-parameter retry succeeds (race condition), state corruption, topology/engine mismatch, auto-emission failure.
Action: File a bug issue with reproduction steps, expected vs actual behavior, and a suggested fix.
Bucket 2: Documentation Issue
Skill docs are wrong, incomplete, or out of sync with the MCP server's self-service output.
Signals: Skill payload doesn't match `describe` schema, skill/playbook divergence, skill documents nonexistent topology paths, missing event types (compare emission guide), retry-based field discovery, runbook/skill contradictions, `compactGuidance` drift.
Action: File a docs issue with file:line, the discrepancy, and the correct information from `describe` output.
Bucket 3: User Error
The agent misused a tool in a way both docs and `describe` output correctly describe.
Signals: Format mismatch (confirmed by `describe` + docs agreement), invalid sequence (topology confirms), missing context both skill and playbook prescribe, runbook deviation without justification.
Action: Note for skill improvement if errors are frequent.
Step 5: Generate Report
Produce the report using the template from `references/report-template.md`. Include:
- Summary counts per bucket
- Debug trace summary (workflows inspected, events reviewed, describe queries issued, views consulted)
- Each failure with full diagnosis (including authoritative self-service references)
- Trace-only findings section (issues only visible via server-side inspection)
- Playbook/runbook adherence summary
- Actionable next steps (draft issue bodies for bugs/docs issues)
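The summary counts can be derived directly from the categorized findings. The sketch below assumes each finding carries one of the three bucket names; the `Finding` type is a hypothetical minimum, and rounding the failure rate to a whole percent is an assumption.

```typescript
type Bucket = "code_bug" | "documentation_issue" | "user_error";

// Hypothetical minimal finding record; real findings carry more fields.
interface Finding { id: number; bucket: Bucket }

function countByBucket(findings: Finding[]): Record<Bucket, number> {
  const counts: Record<Bucket, number> = { code_bug: 0, documentation_issue: 0, user_error: 0 };
  for (const finding of findings) counts[finding.bucket] += 1;
  return counts;
}

// Formats failed/total as a whole-number percentage string (rounding is an assumption).
function formatFailureRate(failedCalls: number, totalCalls: number): string {
  if (totalCalls === 0) return "0%";
  return `${Math.round((failedCalls / totalCalls) * 100)}%`;
}

// Illustrative data only.
const bucketCounts = countByBucket([
  { id: 1, bucket: "code_bug" },
  { id: 2, bucket: "documentation_issue" },
  { id: 3, bucket: "documentation_issue" },
]);
console.log(bucketCounts, formatFailureRate(3, 40));
```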
Step 6: Offer to File Issues
For findings in the Code Bug and Documentation Issue buckets, offer to create GitHub issues:
```typescript
exarchos_orchestrate({
  action: "create_issue",
  title: "<type>: <summary>",
  body: "<issue body>",
  labels: ["bug"]
})
```
Only file issues with user confirmation; present the draft first.
Required Output Format
```json
{
  "session_summary": {
    "total_tool_calls": 0,
    "failed_tool_calls": 0,
    "failure_rate": "0%",
    "debug_trace": {
      "workflows_inspected": 0,
      "events_reviewed": 0,
      "describe_queries": 0,
      "views_consulted": [],
      "trace_only_findings": 0
    }
  },
  "playbook_adherence": {
    "phases_checked": 0,
    "violations": [
      {
        "phase": "delegate",
        "field": "events",
        "expected": "team.spawned, team.task.assigned",
        "actual": "none emitted",
        "bucket": "documentation_issue"
      }
    ]
  },
  "runbook_conformance": {
    "runbooks_checked": 0,
    "deviations": []
  },
  "buckets": {
    "code_bug": [],
    "documentation_issue": [],
    "user_error": []
  },
  "findings": [
    {
      "id": 1,
      "bucket": "code_bug | documentation_issue | user_error",
      "tool": "exarchos_workflow",
      "action": "set",
      "error": "INVALID_INPUT: ...",
      "root_cause": "Schema rejects null branch on pending tasks",
      "trace_evidence": "describe(actions: ['set']) shows branch as required string; event log confirms no task.updated event",
      "authoritative_ref": "exarchos_workflow describe(actions: ['set']) → TaskSchema",
      "severity": "HIGH | MEDIUM | LOW",
      "suggested_fix": "Accept nullable branch in TaskSchema",
      "issue_draft": {
        "title": "bug: workflow task schema rejects null branch",
        "labels": ["bug"],
        "body": "..."
      }
    }
  ],
  "trace_only_findings": [
    {
      "id": "T1",
      "description": "State drift: agent assumed phase was 'delegate' but server shows 'plan'",
      "evidence": "exarchos_workflow get shows phase=plan; topology confirms plan→delegate requires planReviewComplete guard",
      "authoritative_ref": "exarchos_workflow describe(topology: 'feature') → guards",
      "bucket": "documentation_issue",
      "suggested_fix": "Skill should instruct agent to verify phase via get before proceeding"
    }
  ]
}
```
Anti-Patterns
| Don't | Do Instead |
|---|---|
| Skip the debug trace and only scan conversation | Always query MCP self-service tools first — conversation scan is supplementary |
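| Guess what the schema expects | Use `describe(actions: [...])` or `describe(eventTypes: [...])` |
| Assess playbook adherence from memory | Query `describe(playbook: "<workflowType>")` |
| Assume the topology without checking | Query `describe(topology: "<workflowType>")` |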
| Blame the user when skill docs contradict the playbook | If skill docs diverge from playbook/describe output, it's a documentation issue |
| File duplicate issues | Check existing open/closed issues before drafting |
| Categorize retries as separate failures | Group retry sequences as a single finding |
| Ignore successful-after-retry calls | These reveal friction even though they eventually worked |
| Include non-Exarchos failures | Scope strictly to the 5 Exarchos tools — other MCP failures are out of scope |
| Report conversation-only findings without trace corroboration | Cross-reference every finding with server-side state when possible |