Global Error Handler

Imagine about forty N8N workflows running continuously: GitHub synchronisation, Docker updates, Telegram notifications, a content pipeline. When one of them crashes at 3 AM, how do you know what happened, whether it is serious, and what to do?

The Global Error Handler (GEH) is the answer: a centralised system that intercepts every error, classifies it automatically, stores it in a dedicated queue, and sends a Telegram notification with action buttons. A single entry point for every error in the N8N infrastructure.

| Workflow | Nodes | Role |
| --- | --- | --- |
| Global Error Handler | 19 | Capture, classification, notification |
| GEH Callback Actions | 30 | Retry, AI analysis, ignore, fix |
| GEH Fix Applier | 31 | Application and rollback of AI fixes |
| DLQ Weekly Digest | 8 | Weekly error summary |

[Diagram: ~42 N8N workflows feed the Global Error Handler (19 nodes) through the Error Trigger. The handler runs Extract Error (redact secrets), Check error_handling_config, DLQ Insert (err_), Classify (keyword rules), then Notification Hub (Telegram with buttons). GEH Callback Actions (30 nodes) handles the buttons: Retry (6 nodes), AI Details (9 nodes), Ignore (1 node), AI Fix (12 nodes). Also shown: GEH Fix Applier (31 nodes) and DLQ Weekly Digest (8 nodes).]

Before the GEH, N8N errors were silent. A workflow failed, N8N recorded it in its internal logs, and nobody knew until a malfunction became visible. There was no classification, no notification, and no visibility.

| Problem | Without GEH | With GEH |
| --- | --- | --- |
| Silent errors | Discovered by chance in logs | Immediate Telegram notification |
| No context | "Workflow failed" without details | Classification, failing node, stack trace |
| Manual retry | Open N8N, find the execution, restart | [Retry] button in Telegram |
| No history | Errors lost after N8N purge | Persistent Dead Letter Queue |
| Repetitive errors | Same alert in a loop | Deduplication + per-workflow config |

The GEH does not handle retries blindly. It relies on the native N8N retry at the node level:

| Level | Mechanism | When |
| --- | --- | --- |
| Node | Native N8N Retry on Fail | Transient errors (timeout, 503, rate limit) |
| Workflow | Retry button via GEH | When all node retries are exhausted |

When a workflow fails, here is what happens, step by step:

1. Capture — The Error Trigger intercepts the failure (including activation errors).

2. Extraction — A Code node normalises the context: source workflow, failing node, error message, stack trace. Sensitive data (tokens, passwords, connection strings) are automatically masked by a redact() function.

3. Configuration — The GEH consults the error_handling_config table to determine whether this workflow has specific rules (notifications disabled, temporary suppression, custom max retries).

4. DLQ insert — The error is stored in the error_dead_letter_queue table with a unique identifier (err_<16hex>).

5. Classification — Keyword rules analyse the error message to determine type and severity:

| Detected type | Keywords | Severity |
| --- | --- | --- |
| timeout | timeout, ETIMEDOUT, deadline | warning |
| network | ECONNREFUSED, ENOTFOUND, socket | warning |
| authentication | 401, 403, unauthorized | critical |
| rate_limit | 429, rate limit, quota | warning |
| data_validation | invalid, schema, parse error | info |
| resource | out of memory, disk full | critical |
| configuration | missing credential, not found | critical |

6. Notification — If notifications are enabled for that workflow, the GEH calls the Notification Hub with a formatted message and action buttons.
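Steps 2, 4 and 5 can be sketched in a few lines. This is a minimal Python illustration, not the actual Code node (which the source does not show). The keyword lists, the severities and the err_<16hex> format come from the description above. The redaction patterns, the function names new_error_id and classify, the first-match-wins rule order and the "warning" severity used for unknown errors are assumptions.

```python
import re
import secrets

# Hypothetical redaction rules; the real redact() patterns are not shown in the source.
SECRET_RULES = [
    # user:password pairs embedded in connection strings, e.g. postgres://app:pw@host
    (re.compile(r"(?i)\b([a-z][a-z0-9+.-]*://[^\s:/@]+:)[^\s@]+@"), r"\1[REDACTED]@"),
    # key=value or key: value pairs for common secret names
    (re.compile(r"(?i)\b(token|password|secret|api[_-]?key)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
]

# Keyword rules copied from the classification table, in table order.
CLASSIFICATION_RULES = [
    ("timeout", ["timeout", "ETIMEDOUT", "deadline"], "warning"),
    ("network", ["ECONNREFUSED", "ENOTFOUND", "socket"], "warning"),
    ("authentication", ["401", "403", "unauthorized"], "critical"),
    ("rate_limit", ["429", "rate limit", "quota"], "warning"),
    ("data_validation", ["invalid", "schema", "parse error"], "info"),
    ("resource", ["out of memory", "disk full"], "critical"),
    ("configuration", ["missing credential", "not found"], "critical"),
]


def redact(text: str) -> str:
    """Mask values that look like secrets before the error is stored or sent."""
    for pattern, replacement in SECRET_RULES:
        text = pattern.sub(replacement, text)
    return text


def new_error_id() -> str:
    """Build an identifier in the err_<16hex> format (8 random bytes)."""
    return "err_" + secrets.token_hex(8)


def classify(message: str) -> tuple[str, str]:
    """Return (type, severity) for the first rule with a keyword in the message."""
    lowered = message.lower()
    for error_type, keywords, severity in CLASSIFICATION_RULES:
        if any(keyword.lower() in lowered for keyword in keywords):
            return error_type, severity
    # Assumption: unmatched errors are typed "unknown"; their severity is not documented.
    return "unknown", "warning"
```

Plain substring matching keeps the rules readable, but a code such as 401 will also match inside longer numbers, which is one reason the on-demand AI analysis is a useful complement.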

When the notification arrives on Telegram, it offers four buttons:

[Retry] — Restarts the failed execution via the N8N API. Before restarting, the system checks that the error is eligible for retry (can_auto_retry). The retry counter is incremented in the DLQ.

[Details] — Requests an AI analysis from Claude. The first request generates the analysis (error type, probable cause, fix suggestion); subsequent ones use the cache stored in the DLQ. Useful to understand a complex error without opening N8N.

[Ignore] — Marks the error as resolved in the DLQ. Useful for false positives or ephemeral errors already fixed.

[Fix] — Advanced feature: Claude analyses the failing workflow and proposes an automatic fix. The fix is stored as a proposal, then a second workflow (GEH Fix Applier) handles application with backup, confirmation and rollback.
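The eligibility check behind [Retry] and the caching behind [Details] can be sketched as follows. This is a minimal Python illustration over an in-memory DLQ entry. Only can_auto_retry and the default of 3 retries come from the text; the field names retry_count, ai_analysis and error_message, the decision to cap retries at max_retries, and the analyse callable standing in for the request to Claude are assumptions.

```python
from typing import Callable


def handle_retry(entry: dict, max_retries: int = 3) -> bool:
    """Return True if the execution may be restarted, counting the attempt."""
    if not entry.get("can_auto_retry", False):
        return False
    # Assumption: the retry ceiling is enforced at click time.
    if entry.get("retry_count", 0) >= max_retries:
        return False
    entry["retry_count"] = entry.get("retry_count", 0) + 1
    return True


def handle_details(entry: dict, analyse: Callable[[str], str]) -> str:
    """Return the AI analysis, generating it only on the first request."""
    if entry.get("ai_analysis") is None:
        entry["ai_analysis"] = analyse(entry["error_message"])
    return entry["ai_analysis"]
```

Storing the analysis on the DLQ entry itself means a second click on [Details] costs nothing, whichever user triggers it.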

Each workflow can have its own rules in the error_handling_config table:

| Parameter | Default | Usage |
| --- | --- | --- |
| error_handling_enabled | true | Disable for a workflow under maintenance |
| max_retries | 3 | Override the retry count |
| notify_on_error | true | Mute notifications without disabling the DLQ |
| auto_retry_enabled | false | Automatic retry without intervention |
| suppress_until | null | Temporary suppression (ISO timestamp) |
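To illustrate how these parameters could combine, here is a minimal Python sketch of the notification decision. The parameter names and defaults come from the table. The helper names effective_config and should_notify are hypothetical, and the sketch assumes that suppress_until only mutes notifications and that timestamps carry an explicit UTC offset.

```python
from datetime import datetime, timezone
from typing import Optional

# Defaults taken from the parameter table.
DEFAULTS = {
    "error_handling_enabled": True,
    "max_retries": 3,
    "notify_on_error": True,
    "auto_retry_enabled": False,
    "suppress_until": None,
}


def effective_config(row: Optional[dict]) -> dict:
    """Merge a workflow's row (if any) over the defaults."""
    return {**DEFAULTS, **(row or {})}


def should_notify(config: dict, now: datetime) -> bool:
    """Decide whether an error for this workflow should reach Telegram."""
    if not config["error_handling_enabled"] or not config["notify_on_error"]:
        return False
    until = config["suppress_until"]
    # Assumption: suppress_until is an ISO 8601 string with an offset, e.g. +00:00.
    if until is not None and now < datetime.fromisoformat(until):
        return False
    return True
```

A workflow with no row in the table simply gets the defaults, so the handler still notifies for workflows nobody has configured.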

Every Sunday at 9 AM, the DLQ Weekly Digest workflow generates a summary of the week’s errors and sends it through the Notification Hub. This digest helps spot patterns: a workflow failing regularly, a recurring error type, retries that never solve the problem.

The digest includes:

  • Total error count by severity
  • Top failing workflows
  • Unresolved errors (retry_status = pending/exhausted)
  • Trends compared to the previous week
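The four parts of the digest map onto a small aggregation. The following Python sketch assumes the week's DLQ rows are already loaded as dictionaries. The retry_status values pending and exhausted come from the list above; the field names id, workflow and severity, the function name build_digest, and the use of a simple count difference as the "trend" are assumptions.

```python
from collections import Counter


def build_digest(errors: list[dict], previous_total: int, top_n: int = 3) -> dict:
    """Summarise one week of DLQ entries for the weekly notification."""
    by_severity = Counter(e["severity"] for e in errors)
    top_workflows = Counter(e["workflow"] for e in errors).most_common(top_n)
    unresolved = [
        e["id"] for e in errors if e["retry_status"] in ("pending", "exhausted")
    ]
    return {
        "total": len(errors),
        "by_severity": dict(by_severity),
        "top_workflows": top_workflows,
        "unresolved": unresolved,
        # Positive means more errors than the previous week.
        "delta_vs_previous_week": len(errors) - previous_total,
    }
```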

| Limit | Impact | Mitigation |
| --- | --- | --- |
| Rule-based classification | Unknown types classified as “unknown” | On-demand AI analysis via [Details] |
| No auto-retry | Each retry requires a click | auto_retry_enabled flag prepared but not deployed |
| Experimental AI Fix | Proposed fixes are not always applicable | Double confirmation before application + automatic rollback |
| No correlation | Related errors not grouped | Identifiable via the weekly digest |

If error volume grows:

  • Enable auto-retry for transient errors (timeout, rate limit)
  • Group similar errors in the digest
  • Add an alert threshold: “5 errors from the same workflow in 1h = escalate”
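The alert threshold in the last item is a sliding-window count per workflow. As a rough Python sketch of that idea (nothing like it exists in the GEH today, and the class and method names are hypothetical), assuming errors are recorded in chronological order:

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class EscalationTracker:
    """Flag a workflow once it reaches `threshold` errors within `window`."""

    def __init__(self, threshold: int = 5, window: timedelta = timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self._events = defaultdict(deque)  # workflow name -> recent error times

    def record(self, workflow: str, at: datetime) -> bool:
        """Record one error and return True if the workflow should escalate."""
        events = self._events[workflow]
        events.append(at)
        # Drop errors that have fallen out of the window.
        while events and at - events[0] > self.window:
            events.popleft()
        return len(events) >= self.threshold
```

In a real deployment the same count could be obtained with a query on the DLQ table rather than in-memory state, which would survive restarts.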

If cross-workflow correlation is needed:

  • Trace execution chains (workflow A calls B which calls C)
  • If C fails, show the full chain context
  • Allow retry from the parent workflow

If the team grows:

  • Assign errors by domain (Docker → ops, Odoo → business)
  • Escalate if not handled after a configurable delay
  • Grafana dashboard with DLQ metrics

  • Glossary — Dead Letter Queue, Error Trigger