n8n-error-handling

n8n Error Handling

Default n8n node behavior: error → workflow halts → caller gets nothing useful. For unattended workflows (webhook APIs, scheduled jobs, queue workers), that default is wrong. The symptom is "the integration just stopped working" with no log, no message, no clue.

This skill is about handling errors so failures are loud, structured, and recoverable, or, in the best case, handled in a way that lets the workflow heal itself.

Non-negotiables

For any API-shaped workflow (webhook trigger paired with Respond to Webhook):

Every fallible node's error output is wired, and both paths end at a Respond to Webhook. No dangling error branches; otherwise the caller sees a timeout. "Fallible" = HTTP, DB, third-party API, file operation, anything that throws.
Status code maps to cause. Caller's fault → 4xx, your fault → 5xx. A 200 default on an error path produces a silent failure: the caller thinks the request succeeded and processes empty data.

For any unattended workflow (scheduled, cron, queue-driven, agent tool):

Configure a workflow-level error workflow. It catches what escapes per-node handling: timeouts, crashes between nodes, and errors in unwired nodes. See `references/ERROR_WORKFLOWS.md`. Currently, only the user can set an error workflow, through the workflow settings in the UI.

Strong defaults

When error handling can be looser

Internal one-off workflows where you're the only user, you watch each run, and the cost of failure is "I notice and re-run." For these, the default `onError: 'stopWorkflow'` is fine. Where to draw the line: if anyone other than you sees the output (downstream system, end user, on-call), the non-negotiables apply.

API workflow shape

The canonical webhook-API workflow:

Webhook trigger
  ├── (success path)  → Process → Respond to Webhook (200, body)
  └── (any node's error output)
                       → Respond to Webhook (5xx, structured error body)
                       → Optional: log to error tracker / logger / notify channel

For a complete walkthrough, including how to wire multiple fallible nodes to a single error responder, see `references/API_WORKFLOWS.md`.

Schema validator (Set IIFE)

For any webhook API doing input validation, lift the Set-based schema validator pattern into the endpoint instead of writing IF/Switch chains per field. The two example files are the source of truth:

`references/examples/validation-subworkflow.ts`: the bare pattern (Webhook → Set with the validation IIFE → Respond, with an expression-driven status code). Useful as a minimal demo.
`references/examples/validation-subworkflow-usage.ts`: the endpoint pattern (Webhook → Set → If valid → your business logic → 200 success / 400 with the standard `{error: "validation_error", message, details, request_schema}` body). Lift this into your endpoint and replace the NoOp placeholder with real logic.

The procedure for an agent using this:

Lift the usage-example structure into the new endpoint: Webhook → Set (Validate Schema) → If Params Valid → your logic → success/400 Respond. Don't reinvent it.
Edit the IIFE for your schema. Update `REQUIRED_SCHEMA` and the per-field checks inside the Set node's expression to match your endpoint's input shape. The code below the schema constant is mechanical: presence check, type check, constraint check, push to `errors[]`.
Leave the output shape alone. `valid`, `validationError`, `details`, and `requiredSchema` are the contract the Respond node consumes. Renaming them breaks the response body.
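
A minimal sketch of what the validation logic inside the Set node's IIFE might look like, written as a plain TypeScript function so it can run outside n8n. In a workflow, the same body would sit inside the Set node's `={{ (() => { ... })() }}` expression and read the request from `$json.body`. The schema fields (`email`, `quantity`, `plan`), the constraint keys (`minLength`, `min`, `enum`), the `validate` name, and the message stored in `validationError` are hypothetical. Only the four output keys come from the text above; the authoritative version is `references/examples/validation-subworkflow.ts`.

```typescript
// Hypothetical per-field spec: expected type plus optional constraints.
type FieldSpec = {
  type: "string" | "number" | "boolean";
  required: boolean;
  minLength?: number;
  min?: number;
  enum?: readonly string[];
};

// Hypothetical schema for an example endpoint; replace it with your own.
const REQUIRED_SCHEMA: Record<string, FieldSpec> = {
  email: { type: "string", required: true, minLength: 3 },
  quantity: { type: "number", required: true, min: 1 },
  plan: { type: "string", required: false, enum: ["free", "pro"] },
};

// Output contract consumed by the Respond node: do not rename these keys.
interface ValidationResult {
  valid: boolean;
  validationError: string | null;
  details: string[];
  requiredSchema: Record<string, FieldSpec>;
}

function validate(body: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  for (const [field, spec] of Object.entries(REQUIRED_SCHEMA)) {
    const value = body[field];
    // 1. Presence check (optional fields may be absent).
    if (value === undefined || value === null) {
      if (spec.required) errors.push(`${field} is required`);
      continue;
    }
    // 2. Type check; skip constraints if the type is already wrong.
    if (typeof value !== spec.type) {
      errors.push(`${field} must be a ${spec.type}`);
      continue;
    }
    // 3. Constraint checks, each pushing to errors[].
    if (spec.minLength !== undefined && typeof value === "string" && value.length < spec.minLength) {
      errors.push(`${field} must be at least ${spec.minLength} characters`);
    }
    if (spec.min !== undefined && typeof value === "number" && value < spec.min) {
      errors.push(`${field} must be >= ${spec.min}`);
    }
    if (spec.enum !== undefined && typeof value === "string" && !spec.enum.includes(value)) {
      errors.push(`${field} must be one of: ${spec.enum.join(", ")}`);
    }
  }
  return {
    valid: errors.length === 0,
    validationError: errors.length === 0 ? null : `${errors.length} field(s) failed validation`,
    details: errors,
    requiredSchema: REQUIRED_SCHEMA,
  };
}
```

For example, `validate({ quantity: 0 })` returns `valid: false` with two `details` entries: `email` is missing, and `quantity` is below 1. Echoing `requiredSchema` back lets the caller self-correct.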

The full procedure, supported constraint patterns, schema design rules, and the `={{ ... }}` wrapping gotchas live in `references/API_WORKFLOWS.md`, under "Schema validator (Set IIFE)".

Per-node error setup (recap)

Each fallible node needs two changes (see `ERROR_OUTPUTS.md` in `n8n-connections`):

Set `onError: 'continueErrorOutput'` on the node config.
Wire `output(1)` to your error handler.

Both are required. Either one without the other is the silent-failure mode.
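
The two-change rule can be checked mechanically against exported workflow JSON. There, `onError` sits on each node, and for a single-output node `output(1)` corresponds to index 1 of that node's `main` connections array. Below is a minimal TypeScript sketch. The `FALLIBLE_TYPES` list and the `findUnhandledErrors` name are assumptions, and the interfaces model only the fields this check reads.

```typescript
// Minimal shapes of n8n workflow JSON (only the fields this check reads).
interface WorkflowNode {
  name: string;
  type: string;
  onError?: "stopWorkflow" | "continueRegularOutput" | "continueErrorOutput";
}
interface ConnectionTarget { node: string; type: string; index: number; }
interface Workflow {
  nodes: WorkflowNode[];
  // connections[sourceName].main[outputIndex] = list of targets
  connections: Record<string, { main?: (ConnectionTarget[] | null)[] }>;
}

// Assumption: which node types count as "fallible" is a project decision.
const FALLIBLE_TYPES = new Set([
  "n8n-nodes-base.httpRequest",
  "n8n-nodes-base.postgres",
  "n8n-nodes-base.slack",
]);

// Returns one message per missing change on each fallible node.
function findUnhandledErrors(wf: Workflow): string[] {
  const problems: string[] = [];
  for (const node of wf.nodes) {
    if (!FALLIBLE_TYPES.has(node.type)) continue;
    // Error output of a single-output node is output index 1.
    const errorTargets = wf.connections[node.name]?.main?.[1] ?? [];
    if (node.onError !== "continueErrorOutput") {
      problems.push(`${node.name}: onError is not 'continueErrorOutput'`);
    }
    if (errorTargets.length === 0) {
      problems.push(`${node.name}: output(1) is not wired`);
    }
  }
  return problems;
}
```

A node that fails both checks yields two messages, which makes it obvious that fixing only one of them is not enough.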

Self-healing on transient failures

Before wiring the error path, configure node-level retry on any node that makes a network call (HTTP Request, comms nodes such as Gmail/Slack/Discord, DB, AI, and third-party API nodes). Transient 429s and brief upstream blips get absorbed, so the error output fires only on real failures, and alerts and 5xx responses reflect actual problems rather than noise.

```
{
    retryOnFail: true,
    maxTries: 3,
    waitBetweenTries: 5000,    // ms; 5000 is the max and should be your default
}
```

This works on any node that calls a network service, not just HTTP Request. The engine retries on any error, with no per-status-code filter, and caps `maxTries` at 5 and `waitBetweenTries` at 5000 ms (`packages/core/src/execution-engine/workflow-execute.ts`). See `HTTP_NODES.md` and `AI_NODES.md` in `n8n-node-configuration` for node-specific notes.
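
The sketch below is a small synchronous TypeScript simulation of these retry semantics. It is not n8n's engine code. The caps (5 tries, 5000 ms) come from the text above. The fallback values for omitted fields (3 tries, 1000 ms), the lower bounds, and the names `effectiveRetry` and `runWithRetry` are assumptions for illustration. `sleep` is injected so the sketch runs without real delays.

```typescript
interface RetrySettings {
  retryOnFail?: boolean;
  maxTries?: number;
  waitBetweenTries?: number; // ms
}

// Caps match the text (maxTries <= 5, waitBetweenTries <= 5000 ms);
// fallbacks and lower bounds are assumptions.
function effectiveRetry(s: RetrySettings): { tries: number; waitMs: number } {
  if (!s.retryOnFail) return { tries: 1, waitMs: 0 };
  const tries = Math.min(5, Math.max(1, s.maxTries ?? 3));
  const waitMs = Math.min(5000, Math.max(0, s.waitBetweenTries ?? 1000));
  return { tries, waitMs };
}

// Retries on ANY thrown error (no status-code filter), waiting waitMs
// between attempts; rethrows the last error once all tries are spent.
function runWithRetry<T>(
  fn: (attempt: number) => T,
  s: RetrySettings,
  sleep: (ms: number) => void,
): T {
  const { tries, waitMs } = effectiveRetry(s);
  let lastError: unknown;
  for (let attempt = 1; attempt <= tries; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < tries) sleep(waitMs);
    }
  }
  throw lastError;
}
```

Note what the missing status filter implies. A deterministic 400 is retried just like a 429, and its error surfaces only after every try is spent. Likewise, `maxTries: 10` silently becomes 5.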

Response shapes: map the cause to the status code

A 5xx response with a `text/plain` "Internal Server Error" body is technically a 5xx but useless. And not every error is a 5xx. Match the status code to why the request failed.

Common mistake: wiring every error path to a single Respond to Webhook that returns a hard-coded 500 "internal_error". Every failure looks the same to the caller, even when they sent bad input. It also breaks monitoring: you can't distinguish real outages from bad caller input.

Default mapping by cause:

Cause	Status	Error code	Path
Required field missing or wrong type	400	`validation_error`	Validate up front with the Set-based schema validator (see `references/examples/validation-subworkflow.ts` for the bare pattern and `validation-subworkflow-usage.ts` for the endpoint template), or, for trivial cases, an inline IF/Switch + dedicated 400 Respond. Don't go through error outputs. Ideally echo the schema in the response so the caller can self-correct.
Auth missing or invalid	401	`unauthorized`	Same. Check up front, return 401 directly.
Authenticated but not allowed	403	`forbidden`	Same.
Resource ID exists in request, doesn't in your data	404	`not_found`	Branch off the lookup result, not the lookup error.
Operation conflicts with current state (duplicate, race)	409	`conflict`	Detect with logic, not error output.
Caller exceeded rate limit	429	`rate_limit_exceeded`	Set `Retry-After` header.
Node threw and you don't know why	500	`internal_error`	The error-output path.
Third-party API errored	502	`upstream_error`	Error output of the HTTP Request node.
Workflow can't currently process (downstream down, rate-limited upstream)	503	`service_unavailable`	Detect via specific error, return with hint.
Third-party API timed out	504	`upstream_timeout`	Error output filtered by error message.
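
Keeping the table as data puts the mapping in one place. Below is a minimal TypeScript sketch. It assumes hypothetical cause labels (`validation`, `upstreamTimeout`, and so on) and a hypothetical `errorResponse` helper. The status codes, error codes, and the `Retry-After` rule come from the table. In an n8n workflow, the same lookup could live in a Code node or an expression feeding a Respond to Webhook node.

```typescript
// Cause labels are hypothetical; status and error codes follow the table.
const CAUSE_MAP = {
  validation: { status: 400, error: "validation_error" },
  unauthenticated: { status: 401, error: "unauthorized" },
  forbidden: { status: 403, error: "forbidden" },
  notFound: { status: 404, error: "not_found" },
  conflict: { status: 409, error: "conflict" },
  rateLimited: { status: 429, error: "rate_limit_exceeded" },
  unknown: { status: 500, error: "internal_error" },
  upstreamError: { status: 502, error: "upstream_error" },
  unavailable: { status: 503, error: "service_unavailable" },
  upstreamTimeout: { status: 504, error: "upstream_timeout" },
} as const;

type Cause = keyof typeof CAUSE_MAP;

interface ErrorResponse {
  status: number;
  headers: Record<string, string>;
  body: { error: string; message: string };
}

// Builds what one Respond to Webhook node would send for a given cause.
function errorResponse(cause: Cause, message: string, retryAfterSeconds?: number): ErrorResponse {
  const { status, error } = CAUSE_MAP[cause];
  const headers: Record<string, string> = {};
  // The table asks for a Retry-After header on 429 responses.
  if (cause === "rateLimited" && retryAfterSeconds !== undefined) {
    headers["Retry-After"] = String(retryAfterSeconds);
  }
  return { status, headers, body: { error, message } };
}
```

Because caller-fault causes all resolve to 4xx and your-fault causes to 5xx, monitoring can tell them apart by status class alone.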

Two distinct flows:

Validation failures (4xx) are checked upstream of the work, via IF/Switch branches, not error outputs. Use a dedicated Respond per shape (400 missing field, 401 no auth, etc.).
Execution failures (5xx) come out of error outputs ("we tried, something broke"). A single error responder for all 5xx is fine. Where useful, differentiate the body's `error` code by inspecting the failed node.

One Respond, expression-driven status code. When error paths differ only by status code and message text (same body shape, no header or content-type changes), don't fan out to N Respond nodes via a Switch. The Respond to Webhook node accepts expressions in its Response Code and body fields. Compute the code inline, so one Respond node carries the whole error path.

```
// Response Code on a single Respond to Webhook node:
{{ (() => {
    const msg = $json.error?.message || $json.message || ''
    if (msg.includes('INVALID_ID')) return 400
    if (/429|too many/i.test(msg)) return 429
    if (/openrouter|anthropic|llm/i.test(msg)) return 502
    return 500
})() }}
```

Switch + N Responds only earn their place when responses diverge structurally: different headers, different body shapes, redirects, different content types. The same shape with a different status code is one expression-driven Respond.

For full conventions, including correlation IDs, retryable-vs-fatal flags, validation details, and rate-limit shapes, see `references/RESPONSE_SHAPES.md`.
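
Below is a sketch of a structured error body that carries a correlation ID and a retryable flag. The field names (`correlation_id`, `retryable`), the `exec-` prefix, the `buildErrorBody` name, and the choice of which statuses count as retryable are assumptions for illustration. They are not the conventions defined in `references/RESPONSE_SHAPES.md`, so check that file before adopting them. In a workflow, `executionId` could come from n8n's `$execution.id` expression variable.

```typescript
// Hypothetical body shape; field names are illustrative only.
interface ErrorBody {
  error: string;          // machine-readable code, e.g. "upstream_error"
  message: string;        // human-readable summary
  correlation_id: string; // ties the response to the n8n execution and logs
  retryable: boolean;     // may the caller safely retry the same request?
}

// Assumption: 429, 502, 503 and 504 are transient; everything else is fatal.
const RETRYABLE_STATUSES = new Set([429, 502, 503, 504]);

function buildErrorBody(status: number, error: string, message: string, executionId: string): ErrorBody {
  return {
    error,
    message,
    correlation_id: `exec-${executionId}`,
    retryable: RETRYABLE_STATUSES.has(status),
  };
}
```

The retryable flag spares callers from hard-coding which statuses to retry. The correlation ID lets someone on-call jump from a caller's bug report straight to the failed execution.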

Workflow-level error workflows

For unattended workflows, configure the instance's error workflow (or a per-workflow override) to point at a workflow that:

Captures the failure (workflow name, execution ID, error, stack).
Notifies someone (Slack, email, on-call).
Optionally enqueues a retry (with backoff).

This catches what per-node handling misses: timeouts, crashes between nodes, and errors in unwired nodes.

See `references/ERROR_WORKFLOWS.md`.
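
The capture and notify steps can be sketched as a TypeScript function that turns the Error Trigger node's output into an alert message. The sketch also includes a backoff helper for the optional retry step. `ErrorTriggerItem` is a simplified, assumed view of that output, so verify it against your n8n version. Fields such as `execution.id` and `execution.url` may be absent, for example when the trigger node itself failed. `formatFailureAlert` and `backoffMs` are hypothetical names, and the 1 s base and 60 s cap are arbitrary defaults.

```typescript
// Simplified, assumed view of an Error Trigger output item.
interface ErrorTriggerItem {
  execution: {
    id?: string;
    url?: string;
    lastNodeExecuted?: string;
    error: { message: string; stack?: string };
  };
  workflow: { id: string; name: string };
}

// Capture + notify: one short, human-readable alert per failure.
function formatFailureAlert(item: ErrorTriggerItem): string {
  const { execution, workflow } = item;
  const lines = [
    `Workflow failed: ${workflow.name} (${workflow.id})`,
    `Execution: ${execution.id ?? "unknown"}${execution.url ? ` ${execution.url}` : ""}`,
    `Node: ${execution.lastNodeExecuted ?? "unknown"}`,
    `Error: ${execution.error.message}`,
  ];
  // Keep the alert short: include only the first three lines of the stack.
  if (execution.error.stack) {
    lines.push("Stack:", ...execution.error.stack.split("\n").slice(0, 3));
  }
  return lines.join("\n");
}

// Optional retry: exponential backoff (1 s, 2 s, 4 s, ...) capped at capMs.
function backoffMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}
```

The alert text could feed a Slack or email node. `backoffMs(attempt)` could set the delay before re-enqueueing the failed job.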

Reference files

File	Read when
`references/API_WORKFLOWS.md`	Building or reviewing a webhook-trigger / respond-to-webhook workflow
`references/ERROR_WORKFLOWS.md`	Setting up workflow-level error catching for production workflows
`references/RESPONSE_SHAPES.md`	Defining the response body conventions for your APIs

Anti-patterns

Anti-pattern	What goes wrong	Fix
Webhook → process → respond, no error branch	Caller gets timeout or empty 500	Wire every fallible node's `output(1)` to a Respond to Webhook
Single Respond to Webhook for both paths	Body shape doesn't tell caller what happened	Two Respond nodes, one per path, with explicit codes and bodies
Error path returns 200 with `{ "error": ... }` body	Caller's HTTP client treats it as success, so error handling never fires	Always 4xx/5xx for error paths
Catching errors in Code node and returning them as data	Downstream sees error-shaped data, workflow continues	Let it throw, configure `onError: 'continueErrorOutput'` and wire the error path
Production workflow with no workflow-level error workflow	A genuine failure goes nowhere	Set up an error workflow. See `ERROR_WORKFLOWS.md`
Generic "Internal Server Error" on every failure	Can't distinguish caller bug from upstream from rate limit	Structured error codes. See `RESPONSE_SHAPES.md`
Production node calls a flaky or rate-limited API with no `retryOnFail`	Every transient 429 or upstream blip surfaces as a 5xx, and alerts fire on noise	Set `retryOnFail: true, maxTries: 3, waitBetweenTries: 5000` on the node. See "Self-healing on transient failures"
500 for everything not a 200	Caller can't separate their bad input from your outage, so their monitoring fires on your noise	Map cause → status code. Caller issues are 4xx.
Switch over the error message → N Respond nodes that differ only by status code	5 nodes for what's one Respond with an expression-driven `Response Code`	Compute the code inline in a single Respond. See "One Respond, expression-driven status code" above.
