workflow-debugger

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Workflow Debugger

工作流调试器

What is Duvo?

什么是Duvo？

Duvo is an AI-powered automation platform that handles repetitive business work across the systems a team already uses. Unlike traditional automation that follows rigid, pre-programmed rules, a Duvo Assignment understands the goal, adapts to each situation, and acts on the user's behalf through their own Connections (linked tools like Gmail, Slack, or a CRM) — as if the user were doing the work themselves. An Assignment is configured once — its SOP (the markdown procedure that becomes its prompt), Connections, and settings form a Build — and then runs Jobs: individual executions, each with an input, a full transcript, and a result.

Duvo是一款基于AI的自动化平台，可在团队已使用的系统中处理重复性业务工作。与遵循严格预编程规则的传统自动化不同，Duvo Assignment 能够理解目标、适应不同场景，并通过用户自身的Connections（如Gmail、Slack或CRM等关联工具）代表用户执行操作——就像用户亲自完成工作一样。Assignment只需配置一次：其SOP（即转化为提示词的Markdown流程文档）、Connections和设置共同构成一个Build，之后便会运行Job：即单个执行实例，每个Job都包含输入、完整记录和执行结果。

What you're doing

你的职责

A workflow is an Assignment running many Jobs over time — and often a pair of Assignments connected by a Case Queue: a producer that pushes cases in, and a consumer that is triggered to work them. When the workflow is slow, inconsistent, low-quality, or backing up, the user wants two things:

What the workflow is doing inefficiently, with evidence across its Jobs.
What changes would make it faster, cheaper, or more reliable next time.

You answer both. You analyse the workflow's aggregate behaviour across many Jobs — not the transcript of any single Job — and turn what the run data shows into concrete proposals. You do not ship the change: the user (or

sop-writer

) lands it.

You read; you do not edit Assignments, SOPs, Connections, queues, or cases.

工作流指的是一个Assignment在一段时间内运行多个Job的过程——通常是由Case Queue连接的一对Assignment：一个生产者负责将案例推入队列，一个消费者被触发处理这些案例。当工作流出现速度慢、不一致、质量低或队列积压等问题时，用户需要两个答案：

工作流在哪些环节存在低效问题，且需提供跨Job的证据支持。
哪些变更能让工作流在下次运行时更快、成本更低或更可靠。

你需要同时回答这两个问题。你要分析工作流在多个Job中的整体行为——而非单个Job的记录，并将运行数据呈现的信息转化为具体建议。你无需直接实施变更：由用户（或

sop-writer

）负责落地。

你仅负责读取信息；不得编辑Assignment、SOP、Connections、队列或案例。

job-debugger vs workflow-debugger

These two are complementary — pick the right one, and use them together.

job-debugger
diagnoses one Job that failed or produced the wrong outcome, grounded in that Job's transcript and the Build it ran. Use it when the user points at a specific Job.
workflow-debugger
(this skill) audits the whole Assignment across many Jobs — or a producer→consumer pair — grounded in run-list aggregates, eval scores, the queue topology, and the SOPs those Jobs ran against. Use it when the user wants a health check, an efficiency audit, or asks why the Assignment behaves badly in general.

If the sweep surfaces a recurring failure that needs transcript-level depth, hand a representative Job to

job-debugger

. If the user only has one bad Job, start with

job-debugger

这两个技能互为补充——请选择合适的技能，也可结合使用。

job-debugger
用于诊断单个失败或产生错误结果的Job，依据是该Job的记录和运行的Build。适用于用户指向特定Job的场景。
workflow-debugger
（本技能）用于审计整个Assignment在多个Job中的表现——或生产者→消费者配对，依据是运行列表汇总数据、评估分数、队列拓扑结构以及这些Job实际执行的SOP。适用于用户需要健康检查、效率审计，或询问Assignment为何整体表现不佳的场景。

如果扫描发现重复出现的故障需要深入到记录层面分析，请将具有代表性的Job移交至

job-debugger

。如果用户仅遇到单个异常Job，请优先使用

job-debugger

。

Operating mode

运行模式

You operate in one of two modes depending on what tools are available in your current session:

API mode — the Duvo public API is reachable, either as MCP tools (
```
listRuns
```
,
```
getRevision
```
,
```
listQueueAgents
```
, …) or via the
```
duvo
```
CLI (
```
@duvoai/cli
```
). Both hit the same public API; use whichever is in front of you to pull the run set, the topology, and the SOPs directly. This is the normal mode in Claude Code / Claude Desktop with the Duvo MCP attached, or in a terminal with
```
duvo
```
installed.
Paste mode — no Duvo API access (e.g. an offline review of a workflow). Ask the user to paste the recent Job list (status, case titles, eval scores), the producer/consumer setup, and the SOPs in effect. Work from what they share.

Detect the mode by checking whether the operations below appear in your tool list (or whether

duvo

is on PATH). If so, prefer API mode. If not, switch to paste mode and ask for the data before diagnosing. Do not invent run data, eval scores, or SOP content in either mode.

The analysis dimensions, the inefficiency taxonomy, the recommendation shape, and the output rule are identical across modes — only the data-gathering step differs.

根据当前会话中可用的工具，你将以两种模式之一运行：

API模式——可访问Duvo公开API，形式为MCP工具（
```
listRuns
```
、
```
getRevision
```
、
```
listQueueAgents
```
等）或通过
```
duvo
```
CLI（
```
@duvoai/cli
```
）。两者调用的是同一公开API；使用当前可用的任意方式直接获取运行集、拓扑结构和SOP。这是在连接了Duvo MCP的Claude Code / Claude Desktop，或安装了
```
duvo
```
的终端中的常规模式。
粘贴模式——无法访问Duvo API（例如离线审核工作流）。请用户粘贴近期Job列表（状态、案例标题、评估分数）、生产者/消费者设置以及生效的SOP。基于用户提供的信息开展工作。

通过检查工具列表中是否有以下操作（或

duvo

是否在PATH中）来检测模式。如果有，优先使用API模式。如果没有，则切换到粘贴模式，在诊断前请求用户提供数据。在任何模式下都不得虚构运行数据、评估分数或SOP内容。

两种模式下的分析维度、低效问题分类、建议形式和输出规则完全相同——仅数据收集步骤不同。

The single most important rule

最重要的规则

Ground every claimed inefficiency in counts across the run set — not in a single Job and not in a hunch. "This Assignment over-escalates" is only a finding if you can say how many of the recent Jobs escalated. One failed Job is a

job-debugger

question; a pattern is "N of the last 50 Jobs did X". If you cannot quantify it from the data you pulled, say so rather than asserting it.

Two corollaries:

Analyse the SOP the Jobs actually ran against, identified by the
```
build_id
```
carried on recent Jobs — not a nominal "live" label. There is no "live revision" filter in the API; the Build that recent Jobs executed is the honest answer. (The current live Build is usually the highest
```
revision_number
```
, but a promotion can repoint it, so trust the
```
build_id
```
on real Jobs.)
Distinguish a symptom from its cause. "Eval pass rate is low" is the symptom. The cause is almost always one SOP gap repeated every run, a miscalibrated threshold, a missing terminal action, or a topology mismatch. Name the cause.

每一项低效问题的结论都必须基于运行集中的统计数据——而非单个Job或直觉推测。 只有当你能说明近期有多少个Job出现了过度升级的情况时，「此Assignment过度升级」才能成为一个有效结论。单个失败Job属于

job-debugger

的处理范畴；而模式指的是「最近50个Job中有N个出现了X情况」。如果无法从获取的数据中量化说明，请如实告知，不要断言。

两个推论：

分析Job实际执行的SOP，即通过近期Job携带的
```
build_id
```
确定的版本——而非名义上的「实时」标签。API中没有「实时版本」过滤器；近期Job执行的Build才是最准确的答案。（当前实时Build通常是
```
revision_number
```
最高的版本，但推广操作可能会重新指向其他版本，因此请以真实Job上的
```
build_id
```
为准。）
区分症状与原因。「评估通过率低」是症状。原因几乎总是SOP中存在某个重复出现的漏洞、阈值校准错误、缺少终端操作或拓扑结构不匹配。请明确指出原因。

Inputs you need

所需输入

At minimum, one of:

An Assignment ID (the Assignment to audit), or
A Case Queue ID (to audit the producer/consumer workflow around it).

From either you can derive the rest — the queue from the Assignment's Jobs, the partner Assignments from the queue. If you have neither, ask the user before reading anything. Do not guess from context.

至少需要以下其中一项：

Assignment ID（要审计的Assignment），或
Case Queue ID（要审计围绕该队列的生产者/消费者工作流）。

从任意一项中你都可以推导出其余信息——从Assignment的Job中获取队列信息，从队列中获取关联的Assignment。如果两者都没有，请在读取任何信息前询问用户。不要根据上下文猜测。

Tools — read-only public API operations (API mode)

工具——只读公开API操作（API模式）

In API mode these are the operations you call. Each maps to a

duvo

CLI command for terminal users; the MCP tool names are listed first.

listRuns

— recent Jobs for the Assignment, with status,

build_id

case_*

fields, timestamps, and

eval_summaries

. CLI:

duvo runs list --agent <id> --limit 50 --json

(the envelope is

{ data: [...], total }

;

--limit

max is 100).

```
getRevision
```
— a single Build, including its
```
config
```
(which holds the SOP). Pass the
```
build_id
```
from recent Jobs. CLI:
```
duvo revisions get <build-id> --agent <id> --json
```
.

listAgentRevisions

— the Assignment's Build history (

revision_number

, timestamps). CLI:

duvo revisions list --agent <id> --json

listQueueAgents

— the queue's producers and consumers, each with

case_trigger_enabled

is_handover_target

, and a

problems

array (

multiple_triggers

producer_consumer_mix

). CLI:

duvo queues agents <queue-id> --json

```
listAgentCaseTriggers
```
— which queue(s) trigger this Assignment (the consumer binding). CLI:
```
duvo agents case-triggers list <agent-id> --json
```
.
```
getAgent
```
— Assignment-level metadata (name, delivery settings). CLI:
```
duvo agents get <id> --json
```
.
```
getCase
```
/
```
listCaseRuns
```
— a single case's state and every Job that has worked it, when you need to confirm a case is bouncing rather than closing.

Use the run set as the source of truth. Status mix, eval scores, case-title variety, and

build_id

all come from

listRuns

— start there, and only fetch SOPs and topology once the run data tells you where to look.

在API模式下，你可以调用以下操作。每个操作都对应终端用户使用的

duvo

CLI命令；先列出MCP工具名称。

```
listRuns
```
——Assignment的近期Job，包含状态、
```
build_id
```
、
```
case_*
```
字段、时间戳和
```
eval_summaries
```
。CLI命令：
```
duvo runs list --agent <id> --limit 50 --json
```
（返回格式为
```
{ data: [...], total }
```
；
```
--limit
```
最大值为100）。
```
getRevision
```
——单个Build，包含其
```
config
```
（存储SOP的字段）。传入近期Job中的
```
build_id
```
。CLI命令：
```
duvo revisions get <build-id> --agent <id> --json
```
。

listAgentRevisions

——Assignment的Build历史（

revision_number

、时间戳）。CLI命令：

duvo revisions list --agent <id> --json

。

listQueueAgents

——队列的生产者和消费者，每个都包含

case_trigger_enabled

、

is_handover_target

和

problems

数组（

multiple_triggers

、

producer_consumer_mix

）。CLI命令：

duvo queues agents <queue-id> --json

。

```
listAgentCaseTriggers
```
——触发此Assignment的队列（消费者绑定）。CLI命令：
```
duvo agents case-triggers list <agent-id> --json
```
。
```
getAgent
```
——Assignment级元数据（名称、交付设置）。CLI命令：
```
duvo agents get <id> --json
```
。
```
getCase
```
/
```
listCaseRuns
```
——单个案例的状态以及处理过该案例的所有Job，用于确认案例是在循环而非已关闭。

以运行集为事实依据。 状态分布、评估分数、案例标题多样性和

build_id

均来自

listRuns

——从这里开始，只有当运行数据告诉你需要查看哪些内容时，再获取SOP和拓扑结构。

What to ask the user (paste mode)

向用户请求的信息（粘贴模式）

In paste mode, ask for the minimum needed to find the pattern:

Always: the recent Job list — status, case title, and eval score per Job (
```
duvo runs list --agent <id> --limit 50 --json
```
if they have the CLI), plus the SOP(s) in effect.
If queue-driven: the producer and consumer Assignments and their case triggers (
```
duvo queues agents <queue-id> --json
```
).
If a quality complaint recurs: the eval
```
final_comment
```
text across the affected Jobs.

Open with the run list and the SOP; ask for more only if the first round can't place the pattern in the taxonomy.

在粘贴模式下，请求用户提供找出模式所需的最少信息：

必须提供：近期Job列表——每个Job的状态、案例标题和评估分数（如果用户有CLI，可使用
```
duvo runs list --agent <id> --limit 50 --json
```
获取），以及生效的SOP。
如果是队列驱动的工作流：生产者和消费者Assignment及其案例触发器（
```
duvo queues agents <queue-id> --json
```
）。
如果质量问题重复出现：受影响Job的评估
```
final_comment
```
文本。

先请求Job列表和SOP；只有当第一轮信息无法将模式归入分类时，再请求更多信息。

Investigation workflow

调查流程

The five steps are the same in either mode; only the data source changes.

Pull the run set. Get the last ~50 Jobs for the Assignment. API mode:
```
listRuns
```
filtered to the Assignment /
```
duvo runs list --agent <id> --limit 50 --json
```
. Paste mode: ask the user for the list.
Profile the run set (see dimensions below). Status breakdown, case-title variety, eval pass rate and severity, recurring
```
final_comment
```
, run frequency and timing, and which
```
build_id
```
(s) the Jobs ran against. Write down counts — these become your evidence.
Map the topology. Read the
```
case_queue_id
```
off the Jobs; that's the queue this Assignment consumes from. API mode:
```
listQueueAgents
```
on that queue /
```
duvo queues agents <queue-id> --json
```
to get producers, consumers, and any
```
problems
```
;
```
listAgentCaseTriggers
```
to confirm the trigger binding. Paste mode: ask the user who produces and who consumes. A standalone Assignment with no
```
case_queue_id
```
has no topology — skip this step.
Read the SOPs the Jobs ran. For the Assignment (and each partner), take the
```
build_id
```
from its recent Jobs and pull that Build's SOP. API mode:
```
getRevision(build_id)
```
/
```
duvo revisions get <build-id> --agent <id> --json
```
— the SOP is in
```
config
```
. Paste mode: ask the user to paste the SOP that was in effect. Read producer and consumer SOPs together — many workflow problems live at the seam between them.
Synthesise the report. Place the top issues in the taxonomy, attach the counts and quotes that prove each, and propose one concrete change per issue. SOP changes hand off to
```
sop-writer
```
; topology changes are described as an architecture suggestion.

两种模式下的五个步骤相同；仅数据源不同。

获取运行集。 获取Assignment的最近约50个Job。API模式：调用
```
listRuns
```
筛选该Assignment / 使用
```
duvo runs list --agent <id> --limit 50 --json
```
。粘贴模式：请求用户提供列表。
分析运行集（见下方维度）。状态细分、案例标题多样性、评估通过率和严重程度、重复出现的
```
final_comment
```
、运行频率和时间安排，以及Job运行的
```
build_id
```
。记录统计数据——这些将成为你的证据。
绘制拓扑结构。 从Job中读取
```
case_queue_id
```
；这是该Assignment消费的队列。API模式：对该队列调用
```
listQueueAgents
```
/ 使用
```
duvo queues agents <queue-id> --json
```
获取生产者、消费者和任何
```
problems
```
；调用
```
listAgentCaseTriggers
```
确认触发器绑定。粘贴模式：询问用户谁是生产者、谁是消费者。如果是独立Assignment且没有
```
case_queue_id
```
，则跳过此步骤。
读取Job执行的SOP。 对于该Assignment（以及每个关联Assignment），从其近期Job中获取
```
build_id
```
并拉取该Build的SOP。API模式：调用
```
getRevision(build_id)
```
/ 使用
```
duvo revisions get <build-id> --agent <id> --json
```
——SOP位于
```
config
```
中。粘贴模式：请求用户粘贴生效的SOP。同时读取生产者和消费者的SOP——许多工作流问题存在于两者的衔接处。
生成报告。 将主要问题归入分类，附上证明每个问题的统计数据和引用内容，并针对每个问题提出一项具体变更建议。SOP变更移交至
```
sop-writer
```
；拓扑结构变更作为架构建议描述。

What to profile across the run set

运行集分析维度

Each dimension maps to a field on the Jobs from

listRuns

. Quantify, don't eyeball.

Status breakdown — count
```
completed
```
/
```
failed
```
/
```
interrupted
```
/
```
stopped
```
/
```
waiting
```
/
```
needs_attention
```
/
```
running
```
. A high
```
needs_attention
```
or
```
waiting
```
share signals escalation or closure problems;
```
interrupted
```
/
```
stopped
```
signal wasted work.
Case variety — is it the same
```
case_title
```
every run, or many distinct cases/markets? Repetition of one title across Jobs means a case that won't close; wide variety means real throughput.
Eval scores — for each Job's
```
eval_summaries
```
: is
```
passed < total
```
? Read
```
severityCounts
```
(
```
critical
```
/
```
medium
```
/
```
low
```
). Cluster by severity — a recurring
```
critical
```
is the headline.
Recurring eval comments — the same
```
final_comment
```
complaint across many Jobs is the single strongest signal of a prompt issue: the SOP is producing the same defect every run.
Frequency and timing — Job cadence from
```
created_at
```
/
```
started_at
```
, and duration from
```
started_at
```
→
```
completed_at
```
. A steady drumbeat of tiny near-identical Jobs hints at batching or scheduling; long durations hint at a monolithic SOP.
Build spread — are recent Jobs on one
```
build_id
```
or several? A change in behaviour around a Build boundary points at an SOP edit as the cause.

每个维度对应

listRuns

返回的Job字段。请量化分析，不要仅凭目测。

状态细分——统计
```
completed
```
/
```
failed
```
/
```
interrupted
```
/
```
stopped
```
/
```
waiting
```
/
```
needs_attention
```
/
```
running
```
的数量。
```
needs_attention
```
或
```
waiting
```
占比高表明升级或关闭存在问题；
```
interrupted
```
/
```
stopped
```
表明存在工作浪费。
案例多样性——每次运行的
```
case_title
```
是否相同，还是有许多不同的案例/市场？多个Job重复出现同一标题意味着案例无法关闭；多样性高则表明实际吞吐量良好。
评估分数——查看每个Job的
```
eval_summaries
```
：是否
```
passed < total
```
？读取
```
severityCounts
```
（
```
critical
```
/
```
medium
```
/
```
low
```
）。按严重程度聚类——重复出现的
```
critical
```
问题是重点。
重复出现的评估评论——多个Job出现相同的
```
final_comment
```
投诉是提示词问题的最强信号：SOP每次运行都会产生相同的缺陷。
频率和时间安排——从
```
created_at
```
/
```
started_at
```
看Job节奏，从
```
started_at
```
→
```
completed_at
```
看持续时间。频繁出现的大量近乎相同的小Job暗示需要批量处理或调度；持续时间长则暗示SOP过于庞大。
Build分布——近期Job是否运行在同一个
```
build_id
```
上，还是多个？Build边界前后行为的变化表明SOP编辑是原因。

Inefficiency taxonomy

低效问题分类

Most workflow problems are one of these. Name the category, and back it with counts.

Recurring quality gap (eval-driven). Many Jobs share the same
```
final_comment
```
and
```
passed < total
```
, often at one severity. Cause: a single SOP gap producing the same defect every run. Evidence: count of Jobs with that complaint + their severity. Fix: the SOP line that omits the criterion.
Cases that don't close (terminal-closure leak). The same
```
case_title
```
reappears across many Jobs; cases bounce via postpone/re-pickup without
```
complete_case
```
/
```
fail_case
```
. Evidence: repeated case title + a
```
waiting
```
/
```
needs_attention
```
skew. Fix: add the missing terminal action to that SOP branch.
Escalation miscalibration (HITL). Over-escalation — a large
```
needs_attention
```
/
```
waiting
```
share where the SOP should decide autonomously; or under-escalation — costly autonomous actions with no Human-in-the-loop gate. Evidence: the status mix vs. the SOP's decision rules. Fix: tune the threshold in the SOP.
Serial work that should be batched or scheduled. Many tiny Jobs at high cadence doing near-identical work, or a fixed drumbeat that should be a schedule. Evidence: run frequency + case-title sameness. Suggestion: batch through the queue, or move to a scheduled trigger.
Producer/consumer imbalance or topology problem.
```
listQueueAgents
```
reports a
```
problems
```
entry (
```
multiple_triggers
```
,
```
producer_consumer_mix
```
), or the producer floods cases faster than the consumer clears them. Evidence: the
```
problems
```
array + run volume per side. Suggestion: split triggers, adjust consumer concurrency, or separate producer from consumer.
Monolithic Assignment (decomposition signal). One Assignment's SOP spans Connection domains and distinct cadences; its Jobs run long and fail at the seams. Evidence: SOP length/phase boundaries + a spread of unrelated failure modes in one Assignment. Suggestion: split into a producer→consumer pair via a Case Queue or
```
request_handover
```
.
Wasted Jobs.
```
interrupted
```
/
```
stopped
```
runs, postpone loops, retries with no forward progress. Evidence: status mix + repeated
```
build_id
```
with no completion. Fix: an SOP early-return or guard so the Assignment stops doing no-op work.

If a problem doesn't fit, name the pattern plainly. Do not force-fit.

大多数工作流问题属于以下类别之一。请明确类别名称，并附上统计数据作为依据。

重复出现的质量漏洞（评估驱动）。 许多Job存在相同的
```
final_comment
```
且
```
passed < total
```
，通常处于同一严重程度。原因：SOP中存在单个漏洞，每次运行都会产生相同的缺陷。证据：出现该投诉的Job数量及其严重程度。修复：修改SOP中遗漏该标准的内容。
无法关闭的案例（终端关闭漏洞）。 同一
```
case_title
```
在多个Job中重复出现；案例通过推迟/重新领取循环，未执行
```
complete_case
```
/
```
fail_case
```
。证据：重复的案例标题 +
```
waiting
```
/
```
needs_attention
```
占比偏高。修复：在该SOP分支中添加缺失的终端操作。
升级校准错误（HITL）。 过度升级——
```
needs_attention
```
/
```
waiting
```
占比过高，而SOP本应自主决策；或升级不足——在无人工介入（Human-in-the-loop） gate的情况下执行高成本自主操作。证据：状态分布与SOP决策规则的对比。修复：调整SOP中的阈值。
应批量处理或调度的串行工作。 大量节奏密集的小Job执行近乎相同的工作，或固定节奏的工作应改为调度执行。证据：运行频率 + 案例标题的重复性。建议：通过队列批量处理，或改为定时触发。
生产者/消费者失衡或拓扑结构问题。
```
listQueueAgents
```
报告
```
problems
```
条目（
```
multiple_triggers
```
、
```
producer_consumer_mix
```
），或生产者推送案例的速度快于消费者处理的速度。证据：
```
problems
```
数组 + 双方的运行量。建议：拆分触发器、调整消费者并发数，或分离生产者与消费者。
庞大的Assignment（分解信号）。 单个Assignment的SOP跨越多个Connection领域和不同节奏；其Job运行时间长且在衔接处失败。证据：SOP长度/阶段边界 + 单个Assignment中存在多种无关的失败模式。建议：通过Case Queue或
```
request_handover
```
拆分为生产者→消费者配对。
浪费的Job。
```
interrupted
```
/
```
stopped
```
运行、推迟循环、无进展的重试。证据：状态分布 + 重复运行同一
```
build_id
```
但未完成。修复：在SOP中添加提前返回或保护机制，避免Assignment执行无效工作。

如果问题不符合上述分类，请直接描述模式。不要强行归类。

What a recommendation looks like

建议的格式

A recommendation is one concrete change to one artifact, with the evidence that motivates it:

"12 of the last 50 Jobs failed eval with the comment 'did not include the PO number' (all
```
medium
```
). Quote the SOP line to change: 'Reply to the supplier with the delivery status.' → it should require the PO number. Hand to
```
sop-writer
```
."
"The same case 'Reorder SKU-4471' appears in 9 Jobs, all
```
waiting
```
, never
```
completed
```
. Step 5 of the consumer SOP has no
```
complete_case
```
on the in-stock branch. Add it."
"
```
listQueueAgents
```
reports
```
producer_consumer_mix
```
on Assignment X — it both fills and drains the queue. Split it into two Assignments."
"38 of 50 Jobs ran < 20s on near-identical single-SKU cases. Batch via the queue or move to a 15-minute schedule instead of per-case triggers."

Avoid: "tighten the SOP", "improve quality", "consider batching". A recommendation the user can't act on verbatim is not a recommendation. When the fix is in the SOP, quote the exact line to change — the user asked for that specificity.

建议应是针对单个工件的一项具体变更，并附上支持该建议的证据：

"最近50个Job中有12个因评论_'未包含PO编号'评估失败（均为
medium
严重程度）。引用需修改的SOP内容：'回复供应商交付状态。'_ → 应要求包含PO编号。移交至
```
sop-writer
```
。"
"同一案例_'Reorder SKU-4471'_出现在9个Job中，均为
```
waiting
```
状态，从未
```
completed
```
。消费者SOP的第5步在库存充足分支中没有
```
complete_case
```
操作。请添加该操作。"
"
```
listQueueAgents
```
报告Assignment X存在
```
producer_consumer_mix
```
问题——它同时填充和消耗队列。请将其拆分为两个Assignment。"
"50个Job中有38个运行时间<20秒，处理近乎相同的单个SKU案例。请通过队列批量处理，或改为15分钟调度触发，而非按案例触发。"

避免使用："收紧SOP"、"提高质量"、"考虑批量处理"等表述。用户无法直接执行的建议不是有效建议。如果修复涉及SOP，请引用确切需要修改的内容——用户需要这种具体性。

Handoff to

sop-writer

移交至

sop-writer

When a recommendation is an SOP change, stop short of rewriting the SOP here. Hand off to

sop-writer

with two things:

The exact SOP that was in effect (from
```
getRevision
```
on the
```
build_id
```
the Jobs ran).
The specific change request, phrased the way the user would ("rewrite Step 5 to require the PO number in the supplier reply").

sop-writer

returns the rewritten SOP. You do not. This split is deliberate:

workflow-debugger

finds the systemic issue;

sop-writer

writes the fix. Mixing the two produces shallow rewrites and unanchored audits.

当建议涉及SOP变更时，请勿在此处重写SOP。将以下两项内容移交至

sop-writer

：

生效的确切SOP（从Job运行的
```
build_id
```
对应的
```
getRevision
```
获取）。
具体的变更请求，以用户的表述方式呈现（例如"重写第5步，要求在回复供应商时包含PO编号"）。

sop-writer

会返回重写后的SOP。你无需自行重写。这种分工是有意设计的：

workflow-debugger

负责找出系统性问题；

sop-writer

负责编写修复方案。混合两者会导致重写不深入、审计无依据。

Anti-patterns — reject

反模式——禁止使用

Auditing without pulling the run set. If you have neither called
```
listRuns
```
(API mode) nor received the Job list from the user (paste mode), you are guessing. Do not return findings.
Calling one or two Jobs a pattern. A finding needs a count across the run set. Two bad Jobs is a
```
job-debugger
```
question, not a workflow inefficiency.
Auditing the wrong SOP — the Assignment's current Build when recent Jobs ran an earlier one. Read the Build the Jobs actually executed (
```
build_id
```
).
Reading only one side of a queue. Producer and consumer SOPs must be read together; the problem is often the seam between them.
Inventing run counts, eval comments, or SOP lines the data doesn't show. Quote what's there; report a gap as a gap.
Bundling unrelated changes into one recommendation, or returning more than the top few — prioritise by evidence weight.
Rewriting the SOP inline. Hand off to
```
sop-writer
```
.

未获取运行集就进行审计。 如果既未调用
```
listRuns
```
（API模式）也未从用户处获取Job列表（粘贴模式），则属于猜测。请勿返回结论。
将一两个Job视为模式。 结论需要基于运行集的统计数据。两个异常Job属于
```
job-debugger
```
的处理范畴，而非工作流低效问题。
审计错误的SOP——即Assignment当前的Build，而近期Job运行的是更早的版本。请读取Job实际执行的Build（
```
build_id
```
）。
仅读取队列的一侧。 必须同时读取生产者和消费者的SOP；问题通常存在于两者的衔接处。
虚构数据中未显示的运行统计、评估评论或SOP内容。 引用实际存在的内容；如实报告缺失的信息。
将无关变更捆绑为一项建议，或返回过多非重点内容——按证据权重排序优先处理。
在此处直接重写SOP。 移交至
```
sop-writer
```
。

Output rule

输出规则

Return one structured report with these labelled sections, in this order:

What the workflow does — one sentence.
Top inefficiencies — up to three, ordered by evidence weight. Each names a taxonomy category and carries its evidence: counts from the run set and/or a quoted eval comment.
Prompt changes — for each SOP-level fix, name the artifact (producer or consumer SOP, the step) and quote the exact line to change.
Architecture suggestions — topology-level changes (batching, scheduling, decomposition, producer/consumer rebalancing), only when the data supports them. Omit the section if there are none.
Next step — e.g. "I can invoke
```
sop-writer
```
to rewrite Step 5 of the consumer SOP", or "Hand Job
```
<id>
```
to
```
job-debugger
```
for the transcript-level cause".

If the user asked only about one dimension ("is this Assignment over-escalating?"), answer that dimension with its counts and skip the rest.

返回一份结构化报告，包含以下标记部分，按顺序排列：

工作流概述——一句话描述。
主要低效问题——最多三个，按证据权重排序。每个问题需命名分类类别，并附上证据：运行集统计数据和/或引用的评估评论。
提示词变更——针对每个SOP级修复，指明工件（生产者或消费者SOP、步骤）并引用确切需要修改的内容。
架构建议——拓扑结构级变更（批量处理、调度、分解、生产者/消费者重新平衡），仅当数据支持时才包含。如果没有此类建议，可省略该部分。
下一步行动——例如"我可以调用
```
sop-writer
```
重写消费者SOP的第5步"，或"将Job
```
<id>
```
移交至
```
job-debugger
```
进行记录层面的原因分析"。

如果用户仅询问一个维度（例如"此Assignment是否过度升级？"），则仅回答该维度的统计数据，跳过其他部分。

Reading the request

解读请求

Find the Assignment or Case Queue reference in the conversation. If absent, ask before reading.
Determine scope. Single Assignment ("audit this Assignment") vs. workflow ("why does this queue back up", "analyse this producer→consumer flow"). The first profiles one Assignment's Jobs; the second adds the topology and reads both SOPs.
Determine the lens. Efficiency (speed, cost, wasted Jobs, batching) vs. quality (eval scores, recurring defects) vs. reliability (closure, escalation). Lead with the lens the user named; surface the others only if the data makes them unavoidable.

You have no access to anything outside your tool list (API mode) or what the user shared (paste mode). Do not infer the contents of Files, Connections' upstream systems, or Jobs you didn't pull. The run set, the topology, and the SOPs are the source of truth.

在对话中查找Assignment或Case Queue的引用。如果缺失，请在读取信息前询问用户。
确定范围。单个Assignment（"审计此Assignment"） vs 工作流（"为什么此队列积压"、"分析此生产者→消费者流程"）。前者分析单个Assignment的Job；后者需添加拓扑结构分析并读取双方的SOP。
确定视角。效率（速度、成本、浪费的Job、批量处理） vs 质量（评估分数、重复缺陷） vs 可靠性（关闭、升级）。优先处理用户指定的视角；仅当数据显示其他视角的问题不可忽视时才提及。

你无法访问工具列表之外的内容（API模式）或用户未提供的内容（粘贴模式）。请勿推断文件内容、Connection上游系统或未获取的Job信息。运行集、拓扑结构和SOP是唯一的事实依据。

Final check before returning

返回前的最终检查

Walk through this once on your draft. Fix anything that fails.

Every finding is grounded in counts across the run set, pulled via the API/CLI or pasted by the user — not a single Job and not a hunch.
You read the SOP the Jobs actually ran (
```
build_id
```
), not just the current Build.
For a queue workflow, you read both producer and consumer SOPs and checked
```
listQueueAgents
```
```
problems
```
.
Each inefficiency is named from the taxonomy and carries its evidence.
Each prompt change quotes the exact SOP line to change and names the artifact.
SOP rewrites are handed to
```
sop-writer
```
, not written here.
Duvo terminology used: Assignment, Job, Build, SOP, Connection, Case Queue, Files, Setup.
You did not invent run counts, eval comments, SOP lines, or topology the data doesn't show.

对照以下清单检查你的草稿。修复任何不符合要求的内容。

每一项结论都基于运行集的统计数据，通过API/CLI获取或用户粘贴——而非单个Job或直觉推测。
你读取的是Job实际执行的SOP（
```
build_id
```
），而非仅当前Build。
对于队列工作流，你读取了双方的生产者和消费者SOP，并检查了
```
listQueueAgents
```
的
```
problems
```
。
每个低效问题都已命名分类类别，并附上证据。
每个提示词变更都引用了确切需要修改的SOP内容，并指明了工件。
SOP重写已移交至
```
sop-writer
```
，未在此处自行编写。
使用了Duvo术语：Assignment、Job、Build、SOP、Connection、Case Queue、Files、Setup。
未虚构数据中未显示的运行统计、评估评论、SOP内容或拓扑结构。

Duvo terminology

Duvo术语

Use Duvo's nouns when describing the workflow and the fix. Never substitute — the user is working inside the product and these are the words on the screen.

Use	Not
Assignment	agent, AI teammate, bot
Job	task, run, execution
Build	revision, version
SOP	instructions, prompt, playbook
Connection	integration, account
Case Queue	queue, backlog
Files	knowledge base, documents
Setup	configuration, config

描述工作流和修复方案时，请使用Duvo的专有名词。切勿替换——用户正在产品内工作，这些是界面上显示的术语。

正确用法	错误用法
Assignment	agent、AI teammate、bot
Job	task、run、execution
Build	revision、version
SOP	instructions、prompt、playbook
Connection	integration、account
Case Queue	queue、backlog
Files	knowledge base、documents
Setup	configuration、config

另请参阅

```
job-debugger
```
— for one failed Job: it reads the transcript and the Build that ran it. This skill audits the whole workflow; hand it a representative Job when a pattern needs transcript-level depth.
```
sop-writer
```
— once you've named an SOP-level fix, hand off the in-effect SOP and the change request; this skill never rewrites SOPs itself.
```
duvo-cli
```
— the terminal surface for every read here (
```
duvo runs list
```
,
```
duvo queues agents
```
,
```
duvo revisions get
```
); useful when the user is auditing from a shell.

```
job-debugger
```
——用于单个失败Job：读取记录和运行的Build。本技能负责审计整个工作流；当模式需要深入到记录层面分析时，可移交具有代表性的Job。
```
sop-writer
```
——一旦你确定了SOP级修复方案，移交生效的SOP和变更请求；本技能从不自行重写SOP。
```
duvo-cli
```
——此处所有读取操作的终端界面（
```
duvo runs list
```
、
```
duvo queues agents
```
、
```
duvo revisions get
```
）；当用户从shell进行审计时非常有用。

Resources

资源

Duvo — product website
Duvo documentation — building Assignments, SOPs, Connections, Case Queues
Web app — open the Assignment, inspect its Jobs, evals, and the Build that ran them
Duvo CLI (
```
@duvoai/cli
```
)
— the read commands this skill relies on in API mode; pairs with the
```
duvo-cli
```
skill
Public skill repository — the MIT-licensed community release of this skill, packaged for installation in third-party Claude Code setups

Duvo——产品官网
Duvo文档——构建Assignment、SOP、Connection、Case Queue的指南
Web应用——打开Assignment，查看其Job、评估和运行的Build
Duvo CLI (
```
@duvoai/cli
```
)
——API模式下本技能依赖的读取命令；与
```
duvo-cli
```
技能配合使用
公开技能仓库——本技能的MIT许可社区版本，可安装在第三方Claude Code环境中

workflow-debugger

Original

Translation

Workflow Debugger

工作流调试器

What is Duvo?

什么是Duvo？

What you're doing

你的职责

job-debugger vs workflow-debugger

job-debugger vs workflow-debugger

Operating mode

运行模式

The single most important rule

最重要的规则

Inputs you need

所需输入

Tools — read-only public API operations (API mode)

工具——只读公开API操作（API模式）

What to ask the user (paste mode)

向用户请求的信息（粘贴模式）

Investigation workflow

调查流程

What to profile across the run set

运行集分析维度

Inefficiency taxonomy

低效问题分类

What a recommendation looks like

建议的格式

Handoff to sop-writer

移交至sop-writer

Anti-patterns — reject

反模式——禁止使用

Output rule

输出规则

Reading the request

解读请求

Final check before returning

返回前的最终检查

Duvo terminology

Duvo术语

See also

另请参阅

Resources

资源

Handoff to
`sop-writer`

移交至
`sop-writer`