---
title: Global Error Handler
url: https://blog.guigpap.com/en/workflows/error-handler/
url_md: https://blog.guigpap.com/en/workflows/error-handler.md
category: automation
date: '2026-03-28'
maturite: production
techno:
  - n8n
  - telegram
  - claude
application:
  - automation
  - operations
---

# Global Error Handler

> Centralised N8N error handling with smart classification, Dead Letter Queue and automatic retry

## 1. What? — Definition and context

Imagine some forty N8N workflows running around the clock: GitHub synchronisation, Docker updates, Telegram notifications, a content pipeline. When one of them crashes at 3 AM, how do you find out what happened, how serious it is, and what to do about it?

The **Global Error Handler** (GEH) is the answer: a centralised system that intercepts every error, classifies it automatically, stores it in a dedicated queue, and sends a Telegram notification with action buttons. A single entry point for every error in the N8N infrastructure.

> **Note - Dead Letter Queue**
>
> A **Dead Letter Queue** (DLQ) is a concept borrowed from messaging systems: when a message cannot be processed, instead of being lost, it is stored in a dedicated queue for later investigation. Here, every N8N error becomes a DLQ entry with its full context.

### The 4 workflows of the system

| Workflow | Nodes | Role |
|----------|-------|------|
| **Global Error Handler** | 19 | Capture, classification, notification |
| **GEH Callback Actions** | 30 | Retry, AI analysis, ignore, fix |
| **GEH Fix Applier** | 31 | Application and rollback of AI fixes |
| **DLQ Weekly Digest** | 8 | Weekly error summary |

### Architecture

```mermaid
flowchart TD
  WFs["~42 N8N workflows · Error Trigger"]

  subgraph GEH["Global Error Handler · 19 nodes"]
    direction TB
    Extract["Extract Error · redact secrets"]
    Config["Check error_handling_config"]
    DLQ["DLQ Insert · err_<id>"]
    Classify["Classify · keyword rules"]
  end

  Hub["Notification Hub · Telegram with buttons"]

  subgraph CB["GEH Callback Actions · 30 nodes"]
    direction TB
    Retry["Retry · 6n"]
    Details["AI Details · 9n"]
    Ignore["Ignore · 1n"]
    Fix["AI Fix · 12n"]
  end

  FixApplier["GEH Fix Applier · 31 nodes"]
  Digest["DLQ Weekly Digest · 8 nodes"]

  WFs --> Extract --> Config --> DLQ --> Classify --> Hub
  Hub --> CB
  Fix --> FixApplier
  DLQ --> Digest --> Hub
```

---

## 2. Why? — Stakes and motivations

Before the GEH, N8N errors went unnoticed. A workflow failed, N8N recorded it in its internal logs, and nobody found out until something downstream visibly broke. There was no classification, no notification and no visibility.

### Problems solved

| Problem | Without GEH | With GEH |
|---------|-------------|----------|
| **Silent errors** | Discovered by chance in logs | Immediate Telegram notification |
| **No context** | "Workflow failed" without details | Classification, failing node, stack trace |
| **Manual retry** | Open N8N, find the execution, restart | [Retry] button in Telegram |
| **No history** | Errors lost after N8N purge | Persistent Dead Letter Queue |
| **Repetitive errors** | Same alert in a loop | Deduplication + per-workflow config |

### Two-level retry strategy

The GEH does not retry blindly. The first line of defence is N8N's native retry at the node level. The GEH's Retry button only comes into play once those node-level retries are exhausted:

| Level | Mechanism | When |
|-------|-----------|------|
| **Node** | Native N8N Retry on Fail | Transient errors (timeout, 503, rate limit) |
| **Workflow** | Retry button via GEH | When all node retries are exhausted |

> **Tip - Native Retry on Fail**
>
> Every HTTP, Telegram or AI node is configured with 2-3 automatic retries and a 5-10 second delay between attempts. If a retry succeeds, the workflow simply carries on. The GEH only steps in once those retries are exhausted, so any error it receives is genuinely persistent.
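
The two levels can be sketched in plain Python. This is not how N8N implements retries internally. `run_node_with_retry` and `RetriesExhausted` are hypothetical names that stand in for a node's Retry on Fail setting. Exported workflow JSON typically stores that setting as `retryOnFail`, `maxTries` and `waitBetweenTries`, the last one in milliseconds. When the final attempt fails, the exception escapes the node and the workflow fails. At that point the Error Trigger, and therefore the GEH, takes over.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Node-level retries are used up: from here on, the GEH takes over."""

    def __init__(self, attempts: int, last_error: Exception) -> None:
        super().__init__(f"failed after {attempts} attempts: {last_error}")
        self.attempts = attempts
        self.last_error = last_error


def run_node_with_retry(
    operation: Callable[[], T],
    max_tries: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Level 1: mimic Retry on Fail (fixed number of tries, fixed wait between them)."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_tries + 1):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            if attempt < max_tries:
                sleep(wait_seconds)  # no wait after the final attempt
    assert last_error is not None
    # Level 2: the error leaves the node, the workflow fails and the Error Trigger fires.
    raise RetriesExhausted(max_tries, last_error)
```

Injecting `sleep` keeps the sketch testable. Passing `sleep=lambda s: None` skips the real delay.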

---

## 3. How? — Technical implementation

### An error's journey

When a workflow fails, here is what happens, step by step:

**1. Capture** — The Error Trigger intercepts the failure (including activation errors).

**2. Extraction** — A Code node normalises the context: source workflow, failing node, error message, stack trace. Sensitive data (tokens, passwords, connection strings) are automatically masked by a `redact()` function.
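
A rough Python equivalent of the extraction step follows. It assumes the item shape the Error Trigger node emits: an `execution` object with `id`, `lastNodeExecuted` and `error`, or a `trigger` object for activation errors, plus a `workflow` object. The regex patterns, the output field names and this particular `redact()` are illustrative assumptions. They are not the GEH's actual Code node.

```python
import re
from typing import Any

# Illustrative masking rules (hypothetical, not the GEH's real patterns).
_REDACTIONS = [
    # Authorization headers such as "Bearer eyJhbGci..."
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # key=value, key: value or JSON "key": "value" for common secret names
    (re.compile(r"(?i)(password|passwd|token|secret|api[_-]?key)([\"']?\s*[=:]\s*[\"']?)[^\s,;&\"']+"),
     r"\1\2[REDACTED]"),
    # Credentials embedded in connection strings: scheme://user:pass@host
    (re.compile(r"(?i)(\b[a-z][a-z0-9+.-]*://[^:/\s@]+:)[^@\s]+@"), r"\1[REDACTED]@"),
]


def redact(text: str) -> str:
    """Apply every masking rule in order."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def extract_error(item: dict[str, Any]) -> dict[str, Any]:
    """Normalise one Error Trigger item into the fields the GEH stores."""
    # Activation errors carry a "trigger" object instead of "execution".
    source = item.get("execution") or item.get("trigger") or {}
    error = source.get("error") or {}
    workflow = item.get("workflow") or {}
    return {
        "workflow_id": workflow.get("id"),
        "workflow_name": workflow.get("name"),
        "execution_id": source.get("id"),  # absent for activation errors
        "failed_node": source.get("lastNodeExecuted"),
        "is_activation_error": "execution" not in item,
        "message": redact(str(error.get("message", ""))),
        "stack": redact(str(error.get("stack", ""))),
    }
```

Pattern-based masking is best effort. It catches common shapes: bearer tokens, `key=value` secrets and credentials in connection URLs. It cannot recognise a bare secret with no surrounding hint.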

**3. Configuration** — The GEH consults the `error_handling_config` table to determine whether this workflow has specific rules (notifications disabled, temporary suppression, custom max retries).

**4. DLQ insert** — The error is stored in the `error_dead_letter_queue` table with a unique identifier (`err_<16hex>`).
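
The identifier format `err_<16hex>` maps naturally onto 8 random bytes rendered as hex. In the sketch below, Python's built-in `sqlite3` stands in for whatever database actually backs the DLQ. Every column except `retry_status` is a hypothetical name chosen for illustration, as is the `pending` default.

```python
import json
import secrets
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: only the table name and retry_status come from the article.
SCHEMA = """
CREATE TABLE IF NOT EXISTS error_dead_letter_queue (
    error_id     TEXT PRIMARY KEY,
    workflow_id  TEXT,
    execution_id TEXT,
    failed_node  TEXT,
    message      TEXT,
    context      TEXT,              -- full redacted context as JSON
    retry_status TEXT NOT NULL DEFAULT 'pending',
    retry_count  INTEGER NOT NULL DEFAULT 0,
    created_at   TEXT NOT NULL
)
"""


def new_error_id() -> str:
    """'err_' followed by 16 lowercase hex characters (8 random bytes)."""
    return "err_" + secrets.token_hex(8)


def insert_dlq(conn: sqlite3.Connection, error: dict) -> str:
    """Store one normalised error and return its DLQ identifier."""
    error_id = new_error_id()
    conn.execute(
        "INSERT INTO error_dead_letter_queue "
        "(error_id, workflow_id, execution_id, failed_node, message, context, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            error_id,
            error.get("workflow_id"),
            error.get("execution_id"),
            error.get("failed_node"),
            error.get("message"),
            json.dumps(error),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return error_id
```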

**5. Classification** — Keyword rules analyse the error message to determine type and severity:

| Detected type | Keywords | Severity |
|---------------|----------|----------|
| `timeout` | timeout, ETIMEDOUT, deadline | warning |
| `network` | ECONNREFUSED, ENOTFOUND, socket | warning |
| `authentication` | 401, 403, unauthorized | critical |
| `rate_limit` | 429, rate limit, quota | warning |
| `data_validation` | invalid, schema, parse error | info |
| `resource` | out of memory, disk full | critical |
| `configuration` | missing credential, not found | critical |
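
The keyword rules in the table can be expressed as an ordered list where the first match wins. Several details are assumptions of this sketch: the rule order, the `unknown` fallback severity, and the choice to match numeric codes such as `401` only as whole words. That last choice stops an execution ID like `14012` from looking like an authentication error.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    error_type: str
    keywords: tuple[str, ...]
    severity: str


# Keywords from the table above; first matching rule wins (order is an assumption).
RULES = (
    Rule("timeout", ("timeout", "etimedout", "deadline"), "warning"),
    Rule("network", ("econnrefused", "enotfound", "socket"), "warning"),
    Rule("authentication", ("401", "403", "unauthorized"), "critical"),
    Rule("rate_limit", ("429", "rate limit", "quota"), "warning"),
    Rule("data_validation", ("invalid", "schema", "parse error"), "info"),
    Rule("resource", ("out of memory", "disk full"), "critical"),
    Rule("configuration", ("missing credential", "not found"), "critical"),
)


def _matches(keyword: str, text: str) -> bool:
    # HTTP status codes must stand alone; other keywords match as substrings.
    if keyword.isdigit():
        return re.search(rf"\b{keyword}\b", text) is not None
    return keyword in text


def classify(message: str) -> tuple[str, str]:
    """Return (error_type, severity) for an error message."""
    text = message.lower()
    for rule in RULES:
        if any(_matches(keyword, text) for keyword in rule.keywords):
            return rule.error_type, rule.severity
    return "unknown", "warning"  # fallback severity is an assumption
```

Ordering matters. A message such as `401 invalid token` matches both `authentication` and `data_validation`, and the sketch resolves it in favour of whichever rule comes first.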

**6. Notification** — If notifications are enabled for that workflow, the GEH calls the [Notification Hub](/en/workflows/notification-hub/) with a formatted message and action buttons.
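
A notification with action buttons can be sketched as a Telegram `sendMessage`-style body that uses `reply_markup.inline_keyboard`. The article does not describe the Notification Hub's input format, so that shape is an assumption. The `geh:<action>:<error_id>` callback format and the message layout are also hypothetical. What is real is the Bot API's 64-byte limit on `callback_data`, which an `err_<16hex>` identifier fits comfortably.

```python
CALLBACK_DATA_MAX_BYTES = 64  # Telegram Bot API limit for callback_data
ACTIONS = ("retry", "details", "ignore", "fix")


def callback_data(action: str, error_id: str) -> str:
    data = f"geh:{action}:{error_id}"  # hypothetical format
    if len(data.encode("utf-8")) > CALLBACK_DATA_MAX_BYTES:
        raise ValueError(f"callback_data too long: {data!r}")
    return data


def parse_callback(data: str) -> tuple[str, str]:
    """Inverse of callback_data, as the callback receiver would use it."""
    prefix, action, error_id = data.split(":", 2)
    if prefix != "geh" or action not in ACTIONS:
        raise ValueError(f"not a GEH callback: {data!r}")
    return action, error_id


def build_error_message(error: dict, error_type: str, severity: str, error_id: str) -> dict:
    text = (
        f"[{severity.upper()}] {error['workflow_name']}\n"
        f"Node: {error['failed_node']}\n"
        f"Type: {error_type} ({error_id})\n"
        f"{error['message'][:300]}"  # stays well under Telegram's 4096-character limit
    )
    buttons = [{"text": a.capitalize(), "callback_data": callback_data(a, error_id)} for a in ACTIONS]
    # Two rows of two buttons, in Telegram's inline_keyboard structure.
    return {"text": text, "reply_markup": {"inline_keyboard": [buttons[:2], buttons[2:]]}}
```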

> **Caution - Loop prevention**
>
> The GEH itself has no Error Workflow configured, so a crash inside the GEH cannot trigger another GEH run. The Notification Hub, for its part, has a minimal fallback (a direct Telegram message) to avoid error cascades.
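
On the N8N side, loop prevention is mostly a matter of not assigning an Error Workflow to the GEH. The fallback principle in the Notification Hub can be sketched as follows. `notify_with_fallback` and the two sender callables are hypothetical. The key design choice is that the function never raises, because an exception at this point would have nowhere safe to go.

```python
import logging
from typing import Callable

log = logging.getLogger("notification_hub")


def notify_with_fallback(
    message: str,
    via_hub: Callable[[str], None],
    via_telegram: Callable[[str], None],
) -> str:
    """Try the normal route, then a direct Telegram send; never raise."""
    try:
        via_hub(message)
        return "hub"
    except Exception:
        log.warning("Notification Hub failed, falling back to direct Telegram", exc_info=True)
    try:
        via_telegram(message)
        return "direct"
    except Exception:
        # Raising here would only create another error with nowhere safe to report it.
        log.error("Direct Telegram fallback failed as well", exc_info=True)
        return "dropped"
```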

### The 4 Telegram actions

When the notification arrives on Telegram, it offers four buttons:

**[Retry]** — Restarts the failed execution via the N8N API. Before restarting, the system checks that the error is eligible for retry (`can_auto_retry`). The retry counter is incremented in the DLQ.
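
The checks behind [Retry] might look like this. `can_auto_retry`, the retry counter and the default of 3 from `max_retries` come from the article. The rest are assumptions: `DlqEntry`, the `resolved`/`ignored` status values, and the `restart` callable. `restart` stands in for the N8N API call and is simplified to report whether the retried run succeeded.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CLOSED_STATUSES = ("resolved", "ignored")  # hypothetical status values


@dataclass
class DlqEntry:
    error_id: str
    execution_id: Optional[str]  # None for activation errors: nothing to restart
    can_auto_retry: bool
    retry_count: int = 0
    retry_status: str = "pending"


def handle_retry(entry: DlqEntry, max_retries: int, restart: Callable[[str], bool]) -> str:
    """Eligibility checks, then restart; `restart` returns True if the retried run succeeded."""
    if entry.retry_status in CLOSED_STATUSES:
        return "already closed"
    if not entry.can_auto_retry or entry.execution_id is None:
        return "not eligible"
    if entry.retry_count >= max_retries:
        entry.retry_status = "exhausted"
        return "exhausted"
    entry.retry_count += 1  # counted before the attempt, so a crashing restart still counts
    if restart(entry.execution_id):
        entry.retry_status = "resolved"
        return "resolved"
    if entry.retry_count >= max_retries:
        entry.retry_status = "exhausted"
    return "failed"
```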

**[Details]** — Requests an AI analysis from Claude. The first request generates the analysis (error type, probable cause, fix suggestion); subsequent ones use the cache stored in the DLQ. Useful to understand a complex error without opening N8N.
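
The caching behaviour of [Details] reduces to a read-through cache keyed by error ID. In this sketch the DLQ is a plain dict and `ai_analysis` is a hypothetical column name. `analyse` is an injected function that stands in for the call to Claude, which keeps the example offline.

```python
from typing import Callable


def get_ai_details(dlq: dict[str, dict], error_id: str, analyse: Callable[[dict], str]) -> str:
    """Return the cached analysis if present, otherwise generate and store it."""
    entry = dlq[error_id]
    cached = entry.get("ai_analysis")
    if cached:
        return cached
    analysis = analyse(entry)  # stands in for the Claude call
    entry["ai_analysis"] = analysis  # persisted on the DLQ row for later clicks
    return analysis
```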

**[Ignore]** — Marks the error as resolved in the DLQ. Useful for false positives or ephemeral errors already fixed.

**[Fix]** — Advanced feature: Claude analyses the failing workflow and proposes an automatic fix. The fix is stored as a proposal, then a second workflow (GEH Fix Applier) handles application with backup, confirmation and rollback.
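
The backup, confirm, apply and rollback sequence handled by GEH Fix Applier can be sketched on a simplified workflow dict. Several pieces are hypothetical: the proposal format (one node name plus parameters to merge), the `save` and `smoke_test` callables, and the status strings. The point is the ordering. The backup is taken before anything is written and is restored if the check fails.

```python
import copy
from typing import Callable


def patch_workflow(workflow: dict, proposal: dict) -> dict:
    """Return a patched copy; the original workflow is never mutated."""
    patched = copy.deepcopy(workflow)
    for node in patched["nodes"]:
        if node["name"] == proposal["node"]:
            node.setdefault("parameters", {}).update(proposal["parameters"])
            return patched
    raise KeyError(f"node {proposal['node']!r} not found")


def apply_fix(
    workflow: dict,
    proposal: dict,
    confirmed: bool,
    save: Callable[[dict], None],
    smoke_test: Callable[[dict], bool],
) -> tuple[str, dict]:
    """Return (status, workflow now in place)."""
    if not confirmed:
        return "awaiting confirmation", workflow
    backup = copy.deepcopy(workflow)
    try:
        candidate = patch_workflow(workflow, proposal)
    except KeyError:
        return "rejected", workflow  # proposal does not fit this workflow; nothing saved
    save(candidate)
    if smoke_test(candidate):
        return "applied", candidate
    save(backup)  # automatic rollback to the pre-fix version
    return "rolled back", backup
```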

### Per-workflow configuration

Each workflow can have its own rules in the `error_handling_config` table:

| Parameter | Default | Usage |
|-----------|---------|-------|
| `error_handling_enabled` | true | Disable for a workflow under maintenance |
| `max_retries` | 3 | Override the retry count |
| `notify_on_error` | true | Mute notifications without disabling the DLQ |
| `auto_retry_enabled` | false | Automatic retry without intervention |
| `suppress_until` | null | Temporary suppression (ISO timestamp) |
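
Resolving a workflow's effective settings could look like the sketch below. A missing row or a NULL column falls back to the defaults in the table. Two behaviours are assumptions rather than documented facts. First, `error_handling_enabled = false` skips the DLQ as well as the notification. Second, `suppress_until` only mutes notifications.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ErrorHandlingConfig:
    """Defaults mirror the table above."""

    error_handling_enabled: bool = True
    max_retries: int = 3
    notify_on_error: bool = True
    auto_retry_enabled: bool = False
    suppress_until: Optional[datetime] = None


_FIELDS = {f.name for f in fields(ErrorHandlingConfig)}


def load_config(row: Optional[dict]) -> ErrorHandlingConfig:
    """Build the effective config from a (possibly missing) error_handling_config row."""
    if not row:
        return ErrorHandlingConfig()
    # Ignore unknown columns and NULLs so the dataclass defaults apply.
    values = {k: v for k, v in row.items() if k in _FIELDS and v is not None}
    until = values.get("suppress_until")
    if isinstance(until, str):
        until = datetime.fromisoformat(until)
    if isinstance(until, datetime) and until.tzinfo is None:
        until = until.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    if until is not None:
        values["suppress_until"] = until
    return ErrorHandlingConfig(**values)


def decide(config: ErrorHandlingConfig, now: datetime) -> dict:
    """What to do with a new error; `now` must be timezone-aware."""
    if not config.error_handling_enabled:
        return {"store_in_dlq": False, "notify": False}
    suppressed = config.suppress_until is not None and now < config.suppress_until
    return {"store_in_dlq": True, "notify": config.notify_on_error and not suppressed}
```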

> **Danger - Maintenance mode**
>
> When N8N updates itself (Docker self-update), a maintenance flag is written to `error_handling_config` to suppress notifications during the restart. Without it, every worker that goes down would raise a false alarm.

### DLQ Weekly Digest

Every Sunday at 9 AM, the DLQ Weekly Digest workflow generates a summary of the week's errors and sends it through the Notification Hub. This digest helps spot patterns: a workflow failing regularly, a recurring error type, retries that never solve the problem.

The digest includes:
- Total error count by severity
- Top failing workflows
- Unresolved errors (`retry_status` = `pending` or `exhausted`)
- Trends compared to the previous week
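
The digest figures are simple aggregations over DLQ rows. This sketch assumes each row carries `created_at`, `severity`, `workflow_name`, `retry_status` and `error_id`. Apart from `retry_status`, those field names are hypothetical. The trend is expressed as the week-over-week difference in error count, and unresolved errors are listed regardless of which week they occurred in.

```python
from collections import Counter
from datetime import datetime, timedelta


def weekly_digest(entries: list[dict], week_end: datetime, top_n: int = 3) -> dict:
    week_start = week_end - timedelta(days=7)
    prev_start = week_start - timedelta(days=7)
    this_week = [e for e in entries if week_start <= e["created_at"] < week_end]
    last_week = [e for e in entries if prev_start <= e["created_at"] < week_start]
    return {
        "by_severity": dict(Counter(e["severity"] for e in this_week)),
        "top_workflows": Counter(e["workflow_name"] for e in this_week).most_common(top_n),
        # Still-open errors from any week up to the digest date.
        "unresolved": [
            e["error_id"]
            for e in entries
            if e["created_at"] < week_end and e["retry_status"] in ("pending", "exhausted")
        ],
        "trend": len(this_week) - len(last_week),  # > 0 means more errors than last week
    }
```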

---

## 4. What if? — Outlook and limits

### Current limits

| Limit | Impact | Mitigation |
|-------|--------|------------|
| **Rule-based classification** | Errors matching no rule are classified as `unknown` | On-demand AI analysis via [Details] |
| **No auto-retry** | Each retry requires a click | `auto_retry_enabled` flag prepared but not deployed |
| **Experimental AI Fix** | Proposed fixes are not always applicable | Double confirmation before application + automatic rollback |
| **No correlation** | Related errors not grouped | Identifiable via the weekly digest |

### Evolution scenarios

**If error volume grows**:
- Enable auto-retry for transient errors (timeout, rate limit)
- Group similar errors in the digest
- Add an alert threshold: "5 errors from the same workflow in 1h = escalate"
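
The proposed escalation threshold is a per-workflow sliding window. The following is a minimal sketch that assumes errors are recorded in chronological order. `EscalationWindow` is a hypothetical helper and is not part of the current GEH.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class EscalationWindow:
    """Escalate when one workflow produces `threshold` errors within `window`."""

    def __init__(self, threshold: int = 5, window: timedelta = timedelta(hours=1)) -> None:
        self.threshold = threshold
        self.window = window
        self._events: dict[str, deque] = defaultdict(deque)

    def record(self, workflow_id: str, at: datetime) -> bool:
        events = self._events[workflow_id]
        events.append(at)
        # Drop errors that are a full window or more older than the newest one.
        while events and at - events[0] >= self.window:
            events.popleft()
        return len(events) >= self.threshold
```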

**If cross-workflow correlation is needed**:
- Trace execution chains (workflow A calls B which calls C)
- If C fails, show the full chain context
- Allow retry from the parent workflow
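
Tracing an execution chain amounts to following parent links back to the root. This sketch assumes a hypothetical `parent_of` mapping. Such a mapping could, for example, be populated if each parent passed its execution ID to the child it calls. The first element of the result is the execution that a parent-level retry would restart.

```python
def execution_chain(execution_id: str, parent_of: dict[str, str]) -> list[str]:
    """Return the chain root-first, stopping if a cycle is detected."""
    chain = [execution_id]
    seen = {execution_id}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            break  # defensive: malformed links must not loop forever
        chain.append(parent)
        seen.add(parent)
    return list(reversed(chain))
```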

**If the team grows**:
- Assign errors by domain (Docker → ops, Odoo → business)
- Escalate if not handled after a configurable delay
- Grafana dashboard with DLQ metrics

---

## Related pages

### Infrastructure
- [N8N in queue mode](/en/infrastructure/n8n-queue-mode/) — Backend running the workflows
- [Monitoring Stack](/en/infrastructure/monitoring-stack/) — Prometheus and Grafana

### Workflows
- [Notification Hub](/en/workflows/notification-hub/) — Error notification routing
- [Telegram Orchestrator](/en/workflows/telegram-orchestrator/) — Receiver for Retry/Details/Ignore callbacks
- [Docker Auto-Updates](/en/workflows/docker-updates/) — Maintenance mode during updates

### Reference
- [Glossary](/en/reference/glossary/) — Dead Letter Queue, Error Trigger

## Agent metadata

- This article comes from the GuiGPaP Lab blog.
- Global blog context: https://blog.guigpap.com/llms.txt
- Author contact: https://odoo.guigpap.com/mon-cv
- Licence: CC-BY-SA 4.0