mirror-doctor

Mirror Pipeline Doctor

Diagnose and fix existing Mirror pipeline problems by running CLI commands, identifying root causes, and executing fixes.

Boundaries

Diagnose and fix EXISTING Mirror pipeline problems.
Do not build new pipelines: use `/mirror` for config reference or `/turbo-builder` for new Turbo pipelines.
Do not serve as a command reference: use `/mirror` for CLI syntax and flag lookups.
Do not handle Turbo pipelines: use `/turbo-doctor` for `goldsky turbo` problems.
Do not create secrets: use `/secrets` for credential management. But DO check whether secrets exist as part of diagnosis.

Diagnostic Workflow

Follow these steps in order. Each step builds on the previous one.

Step 1: Verify Authentication

Run `goldsky project list 2>&1` to confirm the user is logged in.

If logged in: Note the project name and continue.
If not logged in: Direct the user to `/auth-setup`. Do not proceed until auth works.
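
The check above can be wrapped in a small gate so that later steps never run unauthenticated. This is only a sketch: `check_auth` is a hypothetical helper, and it assumes that `goldsky project list` exits with a non-zero status when the user is not logged in, which should be confirmed against your CLI version.

```bash
# Gate for Step 1: report whether the CLI session looks authenticated.
# Assumption: `goldsky project list` returns non-zero when not logged in.
check_auth() {
  local out
  if out=$(goldsky project list 2>&1); then
    # First line is the verdict, the rest is the raw project list.
    printf 'logged-in\n%s\n' "$out"
  else
    printf 'not-logged-in\n'
    return 1
  fi
}
```

If the exit status turns out not to be reliable, inspect the captured output for the project name instead.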

Step 2: Identify the Pipeline

Run `goldsky pipeline list --include-runtime-details 2>&1` to list all Mirror pipelines with their status.

If the user already named a pipeline, confirm it exists in the list. If not, show the list and ask which pipeline they want to diagnose.

Note both the desired status (ACTIVE, INACTIVE, PAUSED) and the runtime status (STARTING, RUNNING, FAILING, TERMINATED) — Mirror pipelines have both, and the combination tells the story.

Step 3: Triage by Status

The desired + runtime status combination determines the diagnostic path:

| Desired | Runtime | Meaning | Action |
|---|---|---|---|
| ACTIVE | RUNNING | Healthy: pipeline is processing data | Ask user what symptom they're seeing. Proceed to Step 4. |
| ACTIVE | STARTING | Pipeline is initializing | Ask how long. If >10 min, proceed to Step 4. |
| ACTIVE | FAILING | Pipeline is encountering errors but hasn't terminated yet | Proceed to Step 4 immediately; this is time-sensitive. |
| ACTIVE | TERMINATED | Most common failure. Pipeline wanted to run but crashed. | Proceed to Step 4. |
| PAUSED | TERMINATED | User paused the pipeline (snapshot was taken). | Ask if they want to resume: `goldsky pipeline start <name> --from-snapshot last` |
| INACTIVE | TERMINATED | User stopped the pipeline (no snapshot). | Ask if they want to start: `goldsky pipeline start <name>` |

ACTIVE + TERMINATED is the most common case. The pipeline's desired status is ACTIVE (it should be running) but the runtime has terminated due to an error. Focus the diagnosis here.
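
The triage table can be expressed as a lookup, which may help when scripting the workflow. The sketch below is illustrative only: `triage_status` and its short action labels are hypothetical names, not part of the goldsky CLI.

```bash
# Map a desired/runtime status pair to the next action from the triage table.
# The action labels are made up for this sketch.
triage_status() {
  local desired="$1" runtime="$2"
  case "$desired+$runtime" in
    ACTIVE+RUNNING)      echo "ask-symptom" ;;    # healthy, ask what the user sees
    ACTIVE+STARTING)     echo "ask-duration" ;;   # go to Step 4 if >10 min
    ACTIVE+FAILING)      echo "diagnose-now" ;;   # time-sensitive
    ACTIVE+TERMINATED)   echo "diagnose" ;;       # most common failure
    PAUSED+TERMINATED)   echo "offer-resume" ;;   # start --from-snapshot last
    INACTIVE+TERMINATED) echo "offer-start" ;;    # plain start
    *)                   echo "unknown" ; return 1 ;;
  esac
}
```

For example, `triage_status ACTIVE TERMINATED` prints `diagnose`, while a pair that is not in the table prints `unknown` and returns a non-zero status.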

Step 4: Gather Diagnostic Data

Run these commands to understand what went wrong:

```bash
# Get error details and runtime metrics
goldsky pipeline monitor <name> 2>&1

# Check for in-flight requests blocking operations
goldsky pipeline monitor <name> --update-request 2>&1

# Get the pipeline definition to check for misconfig
goldsky pipeline get <name> --definition 2>&1

# Get pipeline info including version
goldsky pipeline info <name> 2>&1

# Check available snapshots
goldsky pipeline snapshots list <name> 2>&1
```

Run these in sequence and analyze the output before proceeding. The monitor output is the most important: it shows error messages, records received/written metrics, and runtime status transitions.
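
To keep the output of each command available for the later steps, the sequence can be captured to files. This is a rough sketch: `gather_diagnostics` and the file names are hypothetical, and it assumes each command exits on its own (if `monitor` keeps refreshing in your CLI version, interrupt it or capture its output manually).

```bash
# Run the Step 4 commands in order and keep each output in its own file.
# Only the goldsky commands come from this guide; everything else is illustrative.
gather_diagnostics() {
  local name="$1" outdir="${2:-./mirror-diagnostics}"
  mkdir -p "$outdir" || return 1
  goldsky pipeline monitor "$name" > "$outdir/monitor.txt" 2>&1
  goldsky pipeline monitor "$name" --update-request > "$outdir/update-request.txt" 2>&1
  goldsky pipeline get "$name" --definition > "$outdir/definition.txt" 2>&1
  goldsky pipeline info "$name" > "$outdir/info.txt" 2>&1
  goldsky pipeline snapshots list "$name" > "$outdir/snapshots.txt" 2>&1
  # Print the directory so callers can locate the captured files.
  echo "$outdir"
}
```

Calling `gather_diagnostics my-pipeline` would leave five text files under `./mirror-diagnostics`, with `monitor.txt` being the one to read first.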

Step 5: Match Error Patterns

Based on the diagnostic data, match against these known patterns:

Bad or Missing Secret

Symptoms: Pipeline terminates shortly after starting. Monitor shows credential or authentication errors.

Verify: Run `goldsky secret list 2>&1` and cross-reference the output with the `secret_name` values in the pipeline definition from Step 4.

Fix:

If the secret doesn't exist, direct the user to `/secrets` to create it.
If the secret exists but its credentials are wrong, create a new secret (secrets are immutable, so you create a replacement with the same name).
Restart: `goldsky pipeline restart <name> --from-snapshot last`
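
The cross-reference step can be done mechanically on saved command output. The sketch below makes two assumptions that may not hold for your CLI version: the saved definition contains lines of the form `secret_name: VALUE`, and the saved `goldsky secret list` output shows each secret name as a whole word. `missing_secrets` is a hypothetical helper.

```bash
# Print every secret_name referenced in a saved pipeline definition that does
# not appear in saved `goldsky secret list` output.
# Assumed input formats: `secret_name: VALUE` lines, and names as whole words.
missing_secrets() {
  local definition_file="$1" secret_list_file="$2" name
  # Extract the value after `secret_name:`, dropping quotes and trailing comments.
  sed -n 's/^[[:space:]]*secret_name:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$definition_file" |
    tr -d "\"'" | sort -u |
    while IFS= read -r name; do
      [ -n "$name" ] || continue
      # -F: literal match, -w: whole word, so PG_MAIN does not match PG_MAIN_2.
      grep -qwF -- "$name" "$secret_list_file" || echo "$name"
    done
}
```

An empty result suggests every referenced secret exists, in which case wrong credentials inside an existing secret become the more likely cause.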

Sink Unreachable

Symptoms: Connection timeout, connection refused, or network errors in the monitor output. Pipeline may cycle between FAILING and TERMINATED.

Common causes:

Firewall not allowing inbound from AWS us-west-2 (Mirror pipelines write from this region)
Database is down or restarted
Connection pool exhausted
Wrong port or host in the secret

Fix:

Verify the sink is reachable from us-west-2.
Check that the secret has the correct host, port, and credentials.

Once connectivity is restored, restart:

goldsky pipeline restart <name> --from-snapshot last

Resource Exhaustion

Symptoms: Pipeline runs for a while then terminates. Monitor may show high record counts or slow processing. Common during large backfills or pipelines with many sources/JOINs.

Fix:

Resize: `goldsky pipeline resize <name> <size>`. Sizes are `s`, `m`, `l`, `xl`, `xxl`.
Start small and go up. `s` handles most workloads (up to 300K records/sec, ~8 subgraph sources). Use `l` or larger for big chain backfills or heavy JOINs.
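
"Start small and go up" amounts to stepping through the size ladder one rung at a time. A minimal sketch of that stepping, where `next_size` is a hypothetical helper that only prints a size and never calls the CLI:

```bash
# Return the next resource size above the current one (s < m < l < xl < xxl).
next_size() {
  case "$1" in
    s)   echo m ;;
    m)   echo l ;;
    l)   echo xl ;;
    xl)  echo xxl ;;
    xxl) return 1 ;;   # already at the largest size, nothing to print
    *)   return 2 ;;   # not a known size
  esac
}
```

It could be combined with the resize command as `goldsky pipeline resize <name> "$(next_size m)"`, after confirming the change with the user.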

In-Flight Request Blocking

Symptoms: User tries to update, delete, or restart the pipeline but gets "Cannot process request, found existing request in-flight."

Diagnose: `goldsky pipeline monitor <name> --update-request` shows what operation is in progress (usually a snapshot).

Fix:

If the in-flight operation is a snapshot that's making progress, wait for it.
If it's stuck or unwanted, cancel it: `goldsky pipeline cancel-update <name>`
Then retry the original operation.

Stuck Snapshot

Symptoms: Pipeline can't be paused, updated, or restarted because a snapshot creation is taking too long or failing. The `--update-request` monitor shows snapshot progress stuck at a percentage.

Fix:

Cancel the stuck snapshot: `goldsky pipeline cancel-update <name>`
Restart without waiting for a new snapshot: `goldsky pipeline restart <name> --from-snapshot last`
If there's no usable snapshot: `goldsky pipeline restart <name> --from-snapshot none` (starts from scratch; warn the user that this reprocesses data)

Transform SQL Error

Symptoms: Pipeline terminates with SQL-related error messages. Could be syntax errors, referencing a non-existent column, or type mismatches.

Diagnose: Check the pipeline definition (`goldsky pipeline get <name> --definition`) and look at the `transforms` section.

Fix:

Identify the SQL error from the monitor output.
Fix the SQL in the pipeline YAML file.
Validate: `goldsky pipeline validate <file.yaml>`
Reapply: `goldsky pipeline apply <file.yaml> --status ACTIVE --from-snapshot last`

Use `/mirror` for SQL transform syntax reference if needed.

Pipeline in Restart Loop

Symptoms: Pipeline repeatedly cycles through STARTING → FAILING → TERMINATED. Monitor shows the same error recurring.

This is usually a symptom of another root cause — bad secret, sink unreachable, or resource issues. The pipeline keeps trying to start but hits the same wall.

Fix:

Identify the underlying error from the monitor (it's usually one of the patterns above).
Fix the root cause first.

Then restart:

goldsky pipeline restart <name> --from-snapshot last

Sink Downtime Cascade

Symptoms: Pipeline was running fine, then the sink (database) went down temporarily. Pipeline auto-retried, then restarted its writers, then eventually terminated.

This is expected behavior — Mirror handles transient sink errors automatically (retry batch → restart writers → fail after prolonged issues).

Fix:

Confirm the sink is back up and healthy.

Restart from the last snapshot:

goldsky pipeline restart <name> --from-snapshot last

The pipeline will resume from where it left off, not reprocess everything.
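
As a first pass over the patterns in this step, saved monitor output can be scanned for telltale phrases. The sketch below is a heuristic, not an authoritative classifier: only the in-flight message is quoted from this guide, every other keyword is an assumed example, and `classify_monitor_output` is a hypothetical name.

```bash
# Guess which Step 5 pattern a saved monitor output matches.
# Keywords other than the in-flight message are assumptions; adjust them to
# the errors you actually see.
classify_monitor_output() {
  local file="$1"
  if grep -qi 'found existing request in-flight' "$file"; then
    echo "in-flight-request"
  elif grep -qiE 'credential|authentication|unauthorized' "$file"; then
    echo "bad-or-missing-secret"
  elif grep -qiE 'connection (timed out|timeout|refused)|network' "$file"; then
    echo "sink-unreachable"
  elif grep -qiE 'sql|column|type mismatch' "$file"; then
    echo "transform-sql-error"
  else
    echo "unmatched"
  fi
}
```

A result of `unmatched` means the output should be read by hand; resource exhaustion and restart loops in particular depend on timing and metrics rather than on a single error string.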

Step 6: Present Diagnosis

After identifying the issue, present findings clearly:

Diagnosis

Pipeline: <name>
Status: <desired> + <runtime>
Issue: <one-line summary>

Root cause: <What's wrong and why>

Evidence:

<Error message or observation from monitor>
<Relevant detail from pipeline definition>

Recommended fix:

<Step 1>
<Step 2>

Prevention: <How to avoid this in the future, if applicable>
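
When the workflow is scripted, the report can be filled in from variables. The helper below is a sketch under that assumption: `print_diagnosis` and its argument order are hypothetical, and multi-line evidence or fix steps can be passed as strings containing newlines.

```bash
# Print a diagnosis report with the fields of the template above.
# Arguments: name desired runtime issue cause evidence fix [prevention]
print_diagnosis() {
  local name="$1" desired="$2" runtime="$3" issue="$4" cause="$5" evidence="$6" fix="$7"
  printf 'Diagnosis\n\n'
  printf 'Pipeline: %s\nStatus: %s + %s\nIssue: %s\n\n' "$name" "$desired" "$runtime" "$issue"
  printf 'Root cause: %s\n\n' "$cause"
  printf 'Evidence:\n\n%s\n\n' "$evidence"
  printf 'Recommended fix:\n\n%s\n' "$fix"
  # Prevention is optional, matching "if applicable" in the template.
  if [ -n "${8:-}" ]; then
    printf '\nPrevention: %s\n' "$8"
  fi
}
```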

Step 7: Execute Fix

Offer to run the fix commands directly. Always confirm with the user before executing:

Restart: `goldsky pipeline restart <name> --from-snapshot last`
Resize: `goldsky pipeline resize <name> <size>`
Cancel blocked operation: `goldsky pipeline cancel-update <name>`
Restart from scratch: `goldsky pipeline restart <name> --from-snapshot none` (warn: reprocesses data)
Reapply config: `goldsky pipeline apply <file.yaml> --status ACTIVE --from-snapshot last`
Delete and recreate: `goldsky pipeline delete <name> -f` then `goldsky pipeline apply <file.yaml> --status ACTIVE` (last resort)

After executing, verify recovery by running `goldsky pipeline monitor <name>` and watching for the STARTING → RUNNING transition.
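
The "confirm before executing" rule can be enforced by separating command construction from execution. In the sketch below, `fix_command` and `confirm` are hypothetical helpers and the fix labels are made up; the goldsky commands they print are the ones listed above.

```bash
# Print the goldsky command for a named fix so it can be shown to the user first.
fix_command() {
  local fix="$1" name="$2" arg="${3:-}"
  case "$fix" in
    restart)         echo "goldsky pipeline restart $name --from-snapshot last" ;;
    restart-scratch) echo "goldsky pipeline restart $name --from-snapshot none" ;;
    resize)          echo "goldsky pipeline resize $name $arg" ;;
    cancel)          echo "goldsky pipeline cancel-update $name" ;;
    *)               return 1 ;;
  esac
}

# Ask a yes/no question on stdin; only an explicit "y" counts as consent.
confirm() {
  local answer
  printf '%s [y/N] ' "$1"
  read -r answer
  [ "$answer" = "y" ]
}
```

A possible usage is `cmd=$(fix_command restart my-pipeline) && confirm "Run: $cmd?" && $cmd`, which runs nothing unless the user types `y`. The unquoted `$cmd` relies on pipeline names containing no spaces or shell metacharacters.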

Important Rules

Always gather data before diagnosing. Never guess at the problem.
Check both desired AND runtime status — the combination matters.
Confirm with the user before running any destructive commands (delete, restart from scratch).
`--from-snapshot last` preserves progress. `--from-snapshot none` starts over. Default to `last` unless there's a reason not to.
Transient errors are auto-retried for up to 6 hours. Non-transient errors terminate immediately. If the pipeline terminated quickly after starting, it's likely a config issue (bad secret, wrong SQL), not a transient network blip.
If the problem is beyond CLI diagnosis, suggest contacting support@goldsky.com with the pipeline name, error messages, and project ID.

When Bash is Not Available

If you don't have the Bash tool, output the diagnostic commands for the user to run, but structure them clearly:

Give one command at a time.
Explain what to look for in the output.
Based on their description of the output, proceed with the diagnosis.

This is the fallback path — always prefer running commands directly when Bash is available.
