n8n Error Handling
Default n8n node behavior: error → workflow halts → caller gets nothing useful. For unattended workflows (webhook APIs, scheduled jobs, queue workers), that default is wrong. The symptom is "the integration just stopped working" with no log, no message, no clue.
This skill is about handling errors so failures are loud, structured, and recoverable. Or best case scenario, handled in a way where it self heals.
Non-negotiables
For any
API-shaped workflow (webhook trigger paired with
):
- Every fallible node's error output is wired, and both paths end at a . No hanging error branches, or the caller would see a timeout. "Fallible" = HTTP, DB, third-party API, file operation, anything that throws.
- Status code maps to cause. Caller's fault → 4xx, your fault → 5xx. A 200 default on an error path produces silent failure: caller thinks success, processes empty data.
For any unattended workflow (scheduled, cron, queue-driven, agent tool):
<!-- TEMPORARY: update when workflow settings are editable by mcp -->
- Configure a workflow-level error workflow. Catches what escapes per-node handling: timeouts, crashes between nodes, errors in unwired nodes. See
references/ERROR_WORKFLOWS.md
. Currently, only the user can set error workflows through the UI in the workflow settings.
Strong defaults
- Error response bodies are structured. Not just "Internal Server Error". Use
{ "error": "<short identifier>", "message": "<human-readable>" }
. See references/RESPONSE_SHAPES.md
.
- Network-calling nodes have configured. Transient 429s and upstream blips get absorbed before reaching the error path. See "Self-healing on transient failures" below.
When error handling can be looser
Internal one-off workflows where you're the only user, you watch each run, and the cost of failure is "I notice and re-run". Default
is fine. The line: if anyone other than you sees the output (downstream system, end user, on-call), the non-negotiables apply.
API workflow shape
The canonical webhook-API workflow:
Webhook trigger
├── (success path) → Process → Respond to Webhook (200, body)
└── (any node's error output)
→ Respond to Webhook (5xx, structured error body)
→ Optional: log to error tracker / logger / notify channel
For a complete walkthrough including how to wire multiple fallible nodes to a single error responder, see
references/API_WORKFLOWS.md
.
Schema validator (Set IIFE)
For any webhook API doing input validation, lift the Set-based schema validator pattern into the endpoint instead of writing IF/Switch chains per field. The two example files are the source of truth:
references/examples/validation-subworkflow.ts
: the bare pattern (Webhook → Set with the validation IIFE → Respond, expression-driven status code). Useful as a minimal demo.
references/examples/validation-subworkflow-usage.ts
: the endpoint pattern (Webhook → Set → If valid → your business logic → 200 success / 400 with the standard {error: "validation_error", message, details, request_schema}
body). Lift this into your endpoint and replace the NoOp placeholder with real logic.
The procedure for an agent using this:
- Lift the usage-example structure into the new endpoint. Webhook → Set (Validate Schema) → If Params Valid → your logic → success/400 Respond. Don't reinvent.
- Edit the IIFE for your schema. Update and the per-field checks inside the Set node's expression for your endpoint's input shape. The pattern below the schema constant is mechanical: presence check, type check, constraint check, push to .
- Leave the output shape alone. , , , are the contract the Respond node consumes. Renaming them breaks the response body.
The full procedure, supported constraint patterns, schema design rules, and the
wrapping gotchas live in
references/API_WORKFLOWS.md
"Schema validator (Set IIFE)".
Per-node error setup (recap)
Each fallible node needs two changes (per
):
- Set
onError: 'continueErrorOutput'
on the node config.
- Wire to your error handler.
Both required. One without the other is the silent-failure mode.
Self-healing on transient failures
Before wiring the error path, configure node-level retry on any node making a network call (HTTP Request, comms like Gmail/Slack/Discord, DB, AI, third-party API nodes). Transient 429s and brief upstream blips get absorbed, so the error output then only fires on real failures, and alerts and 5xx responses reflect actual problems instead of noise.
ts
{
retryOnFail: true,
maxTries: 3,
waitBetweenTries: 5000, // ms; 5000 is the max and should be your default
}
Works on any node that calls a network service, not just HTTP Request. The engine retries on
any error, with no per-status-code filter, and the engine caps
at 5 and
at 5000ms (
packages/core/src/execution-engine/workflow-execute.ts
). See
and
for node-specific notes.
Response shapes: map the cause to the status code
A 5xx response with
text/plain "Internal Server Error"
is technically 5xx but useless. And not every error is 5xx. Match the status code to
why the request failed.
Common mistake: wiring every error path to a single
returning 500 "internal_error". Every failure looks the same to the caller, even when they sent bad input. Breaks monitoring: you can't distinguish real outages from bad caller input.
Default mapping by cause:
| Cause | Status | Error code | Path |
|---|
| Required field missing or wrong type | 400 | | Validate up front with the Set-based schema validator (see references/examples/validation-subworkflow.ts
for the bare pattern and validation-subworkflow-usage.ts
for the endpoint template), or, for trivial cases, an inline IF/Switch + dedicated 400 Respond. Don't go through error outputs. Ideally echo the schema in the response so the caller can self-correct. |
| Auth missing or invalid | 401 | | Same. Check up front, return 401 directly. |
| Authenticated but not allowed | 403 | | Same. |
| Resource ID exists in request, doesn't in your data | 404 | | Branch off the lookup result, not the lookup error. |
| Operation conflicts with current state (duplicate, race) | 409 | | Detect with logic, not error output. |
| Caller exceeded rate limit | 429 | | Set header. |
| Node threw and you don't know why | 500 | | The error-output path. |
| Third-party API errored | 502 | | Error output of the HTTP Request node. |
| Workflow can't currently process (downstream down, rate-limited upstream) | 503 | | Detect via specific error, return with hint. |
| Third-party API timed out | 504 | | Error output filtered by error message. |
Two distinct flows:
- Validation failures (4xx) are checked upstream of the work, via IF/Switch branches, not error outputs. Use a dedicated Respond per shape (400 missing field, 401 no auth, etc.).
- Execution failures (5xx) come out of error outputs ("we tried, something broke"). A single error responder for all 5xx is fine. Differentiate the body's code by inspecting the failed node where useful.
One Respond, expression-driven status code. When the error path differs only by status code and message text (same body shape, no header/content-type changes), don't fan out to N Respond nodes via a Switch. The Respond to Webhook node accepts expressions in its
and body fields. Compute the code inline, so one Respond node carries the whole error path.
ts
// Response Code on a single Respond to Webhook node:
{{ (() => {
const msg = $json.error?.message || $json.message || ''
if (msg.includes('INVALID_ID')) return 400
if (/429|too many/i.test(msg)) return 429
if (/openrouter|anthropic|llm/i.test(msg)) return 502
return 500
})() }}
Switch + N Responds only earn their place when responses diverge structurally: different headers, different body shapes, redirects, different content types. Same shape with a different number is one expression-driven Respond.
For full conventions including correlation IDs, retryable-vs-fatal flags, validation details, and rate-limit shapes, see
references/RESPONSE_SHAPES.md
.
Workflow-level error workflows
For unattended workflows, configure the instance's error workflow (or per-workflow override) to point at a workflow that:
- Captures the failure (workflow name, execution ID, error, stack).
- Notifies someone (Slack, email, on-call).
- Optionally enqueues a retry (with backoff).
Catches what per-node handling misses: timeouts, crashes between nodes, errors in unwired nodes.
See
references/ERROR_WORKFLOWS.md
.
Reference files
| File | Read when |
|---|
references/API_WORKFLOWS.md
| Building or reviewing a webhook-trigger / respond-to-webhook workflow |
references/ERROR_WORKFLOWS.md
| Setting up workflow-level error catching for production workflows |
references/RESPONSE_SHAPES.md
| Defining the response body conventions for your APIs |
Anti-patterns
| Anti-pattern | What goes wrong | Fix |
|---|
| Webhook → process → respond, no error branch | Caller gets timeout or empty 500 | Wire every fallible node's to a Respond to Webhook |
| Single Respond to Webhook for both paths | Body shape doesn't tell caller what happened | Two Respond nodes, one per path, with explicit codes and bodies |
| Error path returns 200 with body | Caller's HTTP client treats it as success, so error handling never fires | Always 4xx/5xx for error paths |
| Catching errors in Code node and returning them as data | Downstream sees error-shaped data, workflow continues | Let it throw, configure onError: 'continueErrorOutput'
and wire the error path |
| Production workflow with no workflow-level error workflow | A genuine failure goes nowhere | Set up an error workflow. See |
| Generic "Internal Server Error" on every failure | Can't distinguish caller bug from upstream from rate limit | Structured error codes. See |
| Production node calls a flaky or rate-limited API with no | Every transient 429 or upstream blip surfaces as a 5xx, and alerts fire on noise | Set retryOnFail: true, maxTries: 3, waitBetweenTries: 5000
on the node. See "Self-healing on transient failures" |
| 500 for everything not a 200 | Caller can't separate their bad input from your outage, so their monitoring fires on your noise | Map cause → status code. Caller issues are 4xx. |
| Switch over the error message → N Respond nodes that differ only by status code | 5 nodes for what's one Respond with an expression-driven | Compute the code inline in a single Respond. See "One Respond, expression-driven status code" above. |