n8n-error-handling

n8n Error Handling

Default n8n node behavior: error → workflow halts → caller gets nothing useful. For unattended workflows (webhook APIs, scheduled jobs, queue workers), that default is wrong. The symptom is "the integration just stopped working" with no log, no message, no clue.
This skill is about handling errors so that failures are loud, structured, and recoverable, or, in the best case, handled so that the workflow self-heals.

Non-negotiables

For any API-shaped workflow (webhook trigger paired with `Respond to Webhook`):
  1. Every fallible node's error output is wired, and both paths end at a `Respond to Webhook`. No hanging error branches, or the caller would see a timeout. "Fallible" = HTTP, DB, third-party API, file operation, anything that throws.
  2. Status code maps to cause. Caller's fault → 4xx, your fault → 5xx. A 200 default on an error path produces silent failure: caller thinks success, processes empty data.
For any unattended workflow (scheduled, cron, queue-driven, agent tool):
<!-- TEMPORARY: update when workflow settings are editable by mcp -->
  1. Configure a workflow-level error workflow. Catches what escapes per-node handling: timeouts, crashes between nodes, errors in unwired nodes. See `references/ERROR_WORKFLOWS.md`. Currently, only the user can set error workflows through the UI in the workflow settings.

Strong defaults

  • Error response bodies are structured. Not just "Internal Server Error". Use `{ "error": "<short identifier>", "message": "<human-readable>" }`. See `references/RESPONSE_SHAPES.md`.
  • Network-calling nodes have `retryOnFail` configured. Transient 429s and upstream blips get absorbed before reaching the error path. See "Self-healing on transient failures" below.

When error handling can be looser

Internal one-off workflows where you're the only user, you watch each run, and the cost of failure is "I notice and re-run". Default `onError: 'stopWorkflow'` is fine. The line: if anyone other than you sees the output (downstream system, end user, on-call), the non-negotiables apply.

API workflow shape

The canonical webhook-API workflow:
Webhook trigger
  ├── (success path)  → Process → Respond to Webhook (200, body)
  └── (any node's error output)
                       → Respond to Webhook (5xx, structured error body)
                       → Optional: log to error tracker / logger / notify channel
For a complete walkthrough including how to wire multiple fallible nodes to a single error responder, see `references/API_WORKFLOWS.md`.

Schema validator (Set IIFE)

For any webhook API doing input validation, lift the Set-based schema validator pattern into the endpoint instead of writing IF/Switch chains per field. The two example files are the source of truth:
  • `references/examples/validation-subworkflow.ts`: the bare pattern (Webhook → Set with the validation IIFE → Respond, expression-driven status code). Useful as a minimal demo.
  • `references/examples/validation-subworkflow-usage.ts`: the endpoint pattern (Webhook → Set → If valid → your business logic → 200 success / 400 with the standard `{error: "validation_error", message, details, request_schema}` body). Lift this into your endpoint and replace the NoOp placeholder with real logic.
The procedure for an agent using this:
  1. Lift the usage-example structure into the new endpoint. Webhook → Set (Validate Schema) → If Params Valid → your logic → success/400 Respond. Don't reinvent.
  2. Edit the IIFE for your schema. Update `REQUIRED_SCHEMA` and the per-field checks inside the Set node's expression for your endpoint's input shape. The pattern below the schema constant is mechanical: presence check, type check, constraint check, push to `errors[]`.
  3. Leave the output shape alone. `valid`, `validationError`, `details`, `requiredSchema` are the contract the Respond node consumes. Renaming them breaks the response body.
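The mechanical part of step 2 (presence check, type check, constraint check, push to `errors[]`) can be sketched as plain TypeScript. This is an illustration only, not the contents of the reference example files: the schema fields (`email`, `quantity`, `note`), the `FieldRule` type, the `maxLength` constraint, and the value types of the output keys are all assumptions. Inside n8n the same logic would live in an IIFE within the Set node's expression rather than in a named function.

```typescript
// Illustrative sketch: schema contents and helper names are hypothetical,
// not taken from references/examples/validation-subworkflow.ts.
type FieldType = "string" | "number" | "boolean";

interface FieldRule {
  type: FieldType;
  required: boolean;
  maxLength?: number; // example constraint, applies to strings only
}

const REQUIRED_SCHEMA: Record<string, FieldRule> = {
  email: { type: "string", required: true, maxLength: 254 },
  quantity: { type: "number", required: true },
  note: { type: "string", required: false, maxLength: 500 },
};

// Output keys match the contract named in step 3; their value types are assumed.
interface ValidationResult {
  valid: boolean;
  validationError: string | null;
  details: string[];
  requiredSchema: Record<string, FieldRule>;
}

function validateBody(body: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  for (const [field, rule] of Object.entries(REQUIRED_SCHEMA)) {
    const value = body[field];
    // 1. Presence check: missing optional fields are skipped.
    if (value === undefined || value === null) {
      if (rule.required) errors.push(`${field} is required`);
      continue;
    }
    // 2. Type check.
    if (typeof value !== rule.type) {
      errors.push(`${field} must be a ${rule.type}`);
      continue;
    }
    // 3. Constraint check.
    if (
      rule.maxLength !== undefined &&
      typeof value === "string" &&
      value.length > rule.maxLength
    ) {
      errors.push(`${field} must be at most ${rule.maxLength} characters`);
    }
  }
  return {
    valid: errors.length === 0,
    validationError:
      errors.length > 0 ? `${errors.length} validation error(s)` : null,
    details: errors,
    requiredSchema: REQUIRED_SCHEMA,
  };
}
```

Collecting every problem into `errors[]` instead of stopping at the first one lets the 400 response list all of them at once, so the caller can fix its request in a single round trip.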
The full procedure, supported constraint patterns, schema design rules, and the `={{ ... }}` wrapping gotchas live in `references/API_WORKFLOWS.md` "Schema validator (Set IIFE)".

Per-node error setup (recap)

Each fallible node needs two changes (per `n8n-connections` `ERROR_OUTPUTS.md`):
  1. Set `onError: 'continueErrorOutput'` on the node config.
  2. Wire `output(1)` to your error handler.
Both required. One without the other is the silent-failure mode.
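Because either half on its own fails silently, a small check over an exported workflow can catch the gap before deployment. The following is a minimal sketch, assuming the exported workflow JSON carries `onError` on each node and keys `connections` by node name with a `main` array indexed by output. It only covers nodes with a single regular output, where the error output is index 1. `findUnwiredErrorOutputs` is a hypothetical helper, not part of n8n.

```typescript
// Minimal subset of an exported workflow; only the fields this check reads.
interface WorkflowNode {
  name: string;
  onError?: "stopWorkflow" | "continueRegularOutput" | "continueErrorOutput";
}

interface Connection {
  node: string;
  type: string;
  index: number;
}

interface Workflow {
  nodes: WorkflowNode[];
  connections: Record<string, { main?: Array<Connection[] | null> }>;
}

// Returns the names of nodes that enable the error output but leave it unwired.
// Assumption: single-output nodes, so the error output sits at index 1.
function findUnwiredErrorOutputs(workflow: Workflow): string[] {
  const unwired: string[] = [];
  for (const node of workflow.nodes) {
    if (node.onError !== "continueErrorOutput") continue;
    const outputs = workflow.connections[node.name]?.main ?? [];
    const errorTargets = outputs[1]?.length ?? 0;
    if (errorTargets === 0) unwired.push(node.name);
  }
  return unwired;
}
```

Nodes with several regular outputs (IF, Switch) are deliberately out of scope here, since their index 1 is an ordinary branch.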

Self-healing on transient failures

Before wiring the error path, configure node-level retry on any node making a network call (HTTP Request, comms like Gmail/Slack/Discord, DB, AI, third-party API nodes). Transient 429s and brief upstream blips get absorbed, so the error output then only fires on real failures, and alerts and 5xx responses reflect actual problems instead of noise.
```ts
{
    retryOnFail: true,
    maxTries: 3,
    waitBetweenTries: 5000,    // ms; 5000 is the max and should be your default
}
```
Works on any node that calls a network service, not just HTTP Request. The engine retries on any error, with no per-status-code filter, and the engine caps `maxTries` at 5 and `waitBetweenTries` at 5000ms (`packages/core/src/execution-engine/workflow-execute.ts`). See `n8n-node-configuration` `HTTP_NODES.md` and `AI_NODES.md` for node-specific notes.
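To see what these settings cost in latency, the caps can be modeled as simple arithmetic. This is a sketch of the stated limits, not the engine's code: `effectiveRetry` is a hypothetical helper, and it assumes the wait happens only between attempts, so a node that fails on every try waits `tries - 1` times. Lower bounds used here (at least one try, no negative wait) are also assumptions.

```typescript
interface RetrySettings {
  retryOnFail: boolean;
  maxTries: number;
  waitBetweenTries: number; // milliseconds
}

const MAX_TRIES_CAP = 5; // cap stated in the text above
const MAX_WAIT_MS_CAP = 5000; // cap stated in the text above

// Returns the settings after capping, plus the waiting time a node adds
// when every attempt fails (assumed: one wait between consecutive attempts).
function effectiveRetry(settings: RetrySettings): {
  tries: number;
  waitMs: number;
  worstCaseDelayMs: number;
} {
  if (!settings.retryOnFail) {
    return { tries: 1, waitMs: 0, worstCaseDelayMs: 0 };
  }
  const tries = Math.min(Math.max(Math.floor(settings.maxTries), 1), MAX_TRIES_CAP);
  const waitMs = Math.min(Math.max(settings.waitBetweenTries, 0), MAX_WAIT_MS_CAP);
  return { tries, waitMs, worstCaseDelayMs: (tries - 1) * waitMs };
}
```

Under these assumptions, the recommended settings add roughly 10 seconds of waiting before the error output fires on a persistent failure, which is worth weighing against the caller's timeout on webhook APIs.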

Response shapes: map the cause to the status code

A 5xx response with `text/plain "Internal Server Error"` is technically 5xx but useless. And not every error is 5xx. Match the status code to why the request failed.
Common mistake: wiring every error path to a single `Respond to Webhook` returning 500 "internal_error". Every failure looks the same to the caller, even when they sent bad input. Breaks monitoring: you can't distinguish real outages from bad caller input.
Default mapping by cause:
| Cause | Status | Error code | Path |
|---|---|---|---|
| Required field missing or wrong type | 400 | `validation_error` | Validate up front with the Set-based schema validator (see `references/examples/validation-subworkflow.ts` for the bare pattern and `validation-subworkflow-usage.ts` for the endpoint template), or, for trivial cases, an inline IF/Switch + dedicated 400 Respond. Don't go through error outputs. Ideally echo the schema in the response so the caller can self-correct. |
| Auth missing or invalid | 401 | `unauthorized` | Same. Check up front, return 401 directly. |
| Authenticated but not allowed | 403 | `forbidden` | Same. |
| Resource ID exists in request, doesn't in your data | 404 | `not_found` | Branch off the lookup result, not the lookup error. |
| Operation conflicts with current state (duplicate, race) | 409 | `conflict` | Detect with logic, not error output. |
| Caller exceeded rate limit | 429 | `rate_limit_exceeded` | Set `Retry-After` header. |
| Node threw and you don't know why | 500 | `internal_error` | The error-output path. |
| Third-party API errored | 502 | `upstream_error` | Error output of the HTTP Request node. |
| Workflow can't currently process (downstream down, rate-limited upstream) | 503 | `service_unavailable` | Detect via specific error, return with hint. |
| Third-party API timed out | 504 | `upstream_timeout` | Error output filtered by error message. |
Two distinct flows:
  • Validation failures (4xx) are checked upstream of the work, via IF/Switch branches, not error outputs. Use a dedicated Respond per shape (400 missing field, 401 no auth, etc.).
  • Execution failures (5xx) come out of error outputs ("we tried, something broke"). A single error responder for all 5xx is fine. Differentiate the body's `error` code by inspecting the failed node where useful.
One Respond, expression-driven status code. When the error path differs only by status code and message text (same body shape, no header/content-type changes), don't fan out to N Respond nodes via a Switch. The Respond to Webhook node accepts expressions in its `Response Code` and body fields. Compute the code inline, so one Respond node carries the whole error path.
```ts
// Response Code on a single Respond to Webhook node:
{{ (() => {
    const msg = $json.error?.message || $json.message || ''
    if (msg.includes('INVALID_ID')) return 400
    if (/429|too many/i.test(msg)) return 429
    if (/openrouter|anthropic|llm/i.test(msg)) return 502
    return 500
})() }}
```
Switch + N Responds only earn their place when responses diverge structurally: different headers, different body shapes, redirects, different content types. Same shape with a different number is one expression-driven Respond.
For full conventions including correlation IDs, retryable-vs-fatal flags, validation details, and rate-limit shapes, see `references/RESPONSE_SHAPES.md`.
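The same classification can drive both the status code and the structured body, so the two never drift apart. Below is a minimal sketch in plain TypeScript that mirrors the message patterns from the expression above and pairs each status with the error code from the mapping table. `toErrorResponse`, `statusFor`, and the fallback message text are hypothetical names and values chosen for illustration; in a workflow this logic would sit in expressions on the Respond node.

```typescript
interface ErrorResponse {
  status: number;
  body: { error: string; message: string };
}

// Error codes taken from the mapping table, keyed by status.
const ERROR_CODES: Record<number, string> = {
  400: "validation_error",
  429: "rate_limit_exceeded",
  500: "internal_error",
  502: "upstream_error",
};

// Same message patterns as the Response Code expression.
function statusFor(msg: string): number {
  if (msg.includes("INVALID_ID")) return 400;
  if (/429|too many/i.test(msg)) return 429;
  if (/openrouter|anthropic|llm/i.test(msg)) return 502;
  return 500;
}

// `item` stands in for $json on the error path (shape assumed).
function toErrorResponse(item: {
  error?: { message?: string };
  message?: string;
}): ErrorResponse {
  const msg = item.error?.message || item.message || "";
  const status = statusFor(msg);
  return {
    status,
    body: {
      error: ERROR_CODES[status] ?? "internal_error",
      message: msg || "Unexpected error",
    },
  };
}
```

Matching on message text is fragile by nature, so the patterns are worth keeping few and specific, with 500 as the fallback.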

Workflow-level error workflows

For unattended workflows, configure the instance's error workflow (or per-workflow override) to point at a workflow that:
  1. Captures the failure (workflow name, execution ID, error, stack).
  2. Notifies someone (Slack, email, on-call).
  3. Optionally enqueues a retry (with backoff).
This catches what per-node handling misses: timeouts, crashes between nodes, errors in unwired nodes.
See `references/ERROR_WORKFLOWS.md`.
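Steps 1 and 2 amount to turning the failure data into a message someone can act on. The following sketch assumes the error workflow receives a payload with `workflow.name`, `execution.id`, `execution.lastNodeExecuted`, and `execution.error.message` / `stack`, which is modeled on the Error Trigger node's output; verify the field names against your n8n version. `formatAlert` is a hypothetical helper.

```typescript
// Assumed shape of the data handed to the error workflow; every field
// except workflow.name is treated as optional.
interface ErrorTriggerPayload {
  workflow: { id?: string; name: string };
  execution?: {
    id?: string;
    url?: string;
    lastNodeExecuted?: string;
    error?: { message?: string; stack?: string };
  };
}

// Builds a plain-text alert suitable for Slack or email.
function formatAlert(payload: ErrorTriggerPayload): string {
  const lines = [
    `Workflow failed: ${payload.workflow.name}`,
    `Execution: ${payload.execution?.id ?? "unknown"}`,
    `Node: ${payload.execution?.lastNodeExecuted ?? "unknown"}`,
    `Error: ${payload.execution?.error?.message ?? "no message"}`,
  ];
  const stack = payload.execution?.error?.stack;
  if (stack) {
    // Keep only the first three stack lines so the alert stays readable.
    lines.push(`Stack: ${stack.split("\n").slice(0, 3).join(" | ")}`);
  }
  return lines.join("\n");
}
```

Falling back to "unknown" instead of throwing matters here: the error workflow is the last line of defense, so it should still send something when fields are missing.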

Reference files

| File | Read when |
|---|---|
| `references/API_WORKFLOWS.md` | Building or reviewing a webhook-trigger / respond-to-webhook workflow |
| `references/ERROR_WORKFLOWS.md` | Setting up workflow-level error catching for production workflows |
| `references/RESPONSE_SHAPES.md` | Defining the response body conventions for your APIs |

Anti-patterns

| Anti-pattern | What goes wrong | Fix |
|---|---|---|
| Webhook → process → respond, no error branch | Caller gets timeout or empty 500 | Wire every fallible node's `output(1)` to a Respond to Webhook |
| Single Respond to Webhook for both paths | Body shape doesn't tell caller what happened | Two Respond nodes, one per path, with explicit codes and bodies |
| Error path returns 200 with `{ "error": ... }` body | Caller's HTTP client treats it as success, so error handling never fires | Always 4xx/5xx for error paths |
| Catching errors in Code node and returning them as data | Downstream sees error-shaped data, workflow continues | Let it throw, configure `onError: 'continueErrorOutput'` and wire the error path |
| Production workflow with no workflow-level error workflow | A genuine failure goes nowhere | Set up an error workflow. See `ERROR_WORKFLOWS.md` |
| Generic "Internal Server Error" on every failure | Can't distinguish caller bug from upstream from rate limit | Structured error codes. See `RESPONSE_SHAPES.md` |
| Production node calls a flaky or rate-limited API with no `retryOnFail` | Every transient 429 or upstream blip surfaces as a 5xx, and alerts fire on noise | Set `retryOnFail: true, maxTries: 3, waitBetweenTries: 5000` on the node. See "Self-healing on transient failures" |
| 500 for everything not a 200 | Caller can't separate their bad input from your outage, so their monitoring fires on your noise | Map cause → status code. Caller issues are 4xx. |
| Switch over the error message → N Respond nodes that differ only by status code | 5 nodes for what's one Respond with an expression-driven `Response Code` | Compute the code inline in a single Respond. See "One Respond, expression-driven status code" above. |