Global Health Check

1. What? — Definition and context

The Global Health Check workflow monitors the health of Docker containers every 5 minutes. It detects unhealthy or unexpectedly stopped containers and notifies admins via Telegram.

Detection method

Method	Tool	Status
Filtered docker ps	SSH + CLI	Current (production)
Prometheus cAdvisor	`container_health_status`	Not working (Docker 29+)
HTTP endpoints	Curl to /healthz	Specified, not implemented

Monitored services

Service	Stack	Critical
Caddy	security	Yes
CrowdSec	security	Yes
N8N	n8n	Yes
N8N-Postgres	n8n	Yes
Redis	n8n	Yes
N8N Workers	n8n	No
Odoo	odoo	Yes
Odoo-Postgres	odoo	Yes
Qdrant	ai	No
Claude-Ollama	ai	No
Prometheus	monitoring	No
Grafana	monitoring	No
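
The table above can also be kept as data so that a workflow can derive an alert severity and an affected stack from container names. The following is a minimal sketch under stated assumptions: the container names used as keys, the replica-suffix rule, and the "warning" level are illustrative and not taken from the production workflow.

```javascript
// Service table transcribed from the document.
// ASSUMPTION: the keys are guessed container names; adjust to `docker ps` output.
const SERVICES = new Map([
  ['caddy', { stack: 'security', critical: true }],
  ['crowdsec', { stack: 'security', critical: true }],
  ['n8n', { stack: 'n8n', critical: true }],
  ['n8n-postgres', { stack: 'n8n', critical: true }],
  ['redis', { stack: 'n8n', critical: true }],
  ['n8n-worker', { stack: 'n8n', critical: false }],
  ['odoo', { stack: 'odoo', critical: true }],
  ['odoo-postgres', { stack: 'odoo', critical: true }],
  ['qdrant', { stack: 'ai', critical: false }],
  ['claude-ollama', { stack: 'ai', critical: false }],
  ['prometheus', { stack: 'monitoring', critical: false }],
  ['grafana', { stack: 'monitoring', critical: false }],
]);

// Look up a container, tolerating replica suffixes such as "n8n-worker-1".
function lookupService(containerName) {
  const base = containerName.replace(/-\d+$/, '');
  return SERVICES.get(containerName) ?? SERVICES.get(base) ?? null;
}

// "critical" if at least one affected container is marked critical,
// otherwise "warning" (hypothetical lower level).
function severityFor(containerNames) {
  const anyCritical = containerNames.some(
    (name) => lookupService(name)?.critical === true
  );
  return anyCritical ? 'critical' : 'warning';
}
```

For example, `severityFor(['n8n-worker-1', 'redis'])` yields `'critical'` because Redis is marked critical, while a worker failing alone would yield `'warning'`.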

2. Why? — Stakes and motivations

Problems solved

Problem	Without health check	With health check
Container crash	Discovered by a user	Alert within 5 minutes
Unhealthy service	No visibility	Automatic detection
Extended downtime	No notification	Quick intervention

Why SSH instead of Prometheus?

Method	Advantage	Drawback
SSH + docker ps	Always works	No Prometheus history
cAdvisor metrics	History, graphs	Docker 29+ bug

3. How? — Technical implementation

Current architecture

The "Docker Health Check with Retry" sub-workflow (jDN2QV3nEMGacrCvgEBBV) wraps this check in a retry policy: 2 attempts, 30 s apart. A notification is sent only if the problem is still present after the retry, which filters out false positives on containers in the middle of a restart.
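
The retry policy can be sketched as follows. This is an illustration of the idea, not the sub-workflow's actual implementation: `runCheck` is a hypothetical async function that returns the list of failing containers as `{ name }` objects, and the defaults mirror the 2 attempts and 30 s spacing described above.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Returns only the containers that are failing on every attempt.
// ASSUMPTION: runCheck() resolves to an array of objects with a `name` field.
async function checkWithRetry(runCheck, { attempts = 2, delayMs = 30000 } = {}) {
  let failing = await runCheck();
  for (let i = 1; i < attempts && failing.length > 0; i++) {
    await sleep(delayMs);
    const stillFailing = new Set((await runCheck()).map((c) => c.name));
    // Drop containers that recovered between attempts (e.g. mid-restart).
    failing = failing.filter((c) => stillFailing.has(c.name));
  }
  return failing;
}
```

A container that is reported on the first attempt but has recovered by the second is dropped from the result, so no notification is sent for it.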

Detection command

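docker ps -a --filter "health=unhealthy" --format json
docker ps -a --filter "status=exited" --format json

Docker combines filters on different keys with AND, so passing both filters to a single docker ps call matches only containers that are both exited and unhealthy. Running one call per filter returns the union:

Containers with a failed healthcheck (health=unhealthy)
Stopped containers, whether or not the stop was expected (status=exited)

N8N configuration

SSH Docker Health Node:

Type: SSH
Host: localhost
Command: docker ps -a --filter "health=unhealthy" --format json; docker ps -a --filter "status=exited" --format json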

Parse Results (Code):

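// stdout of the SSH node: one JSON object per line (docker ps --format json)
const output = $json.stdout;
if (!output || output.trim() === '') {
  // Nothing matched the filters: no container in trouble
  return [{ json: { count: 0, containers: [] } }];
}

const containers = output.trim().split('\n')
  .filter(line => line.trim() !== '')  // drop blank lines
  .map(line => JSON.parse(line))
  .map(c => ({
    name: c.Names,
    status: c.State,   // lifecycle state, e.g. "running" or "exited"
    health: c.Status   // human-readable status, e.g. "Up 2 hours (unhealthy)"
  }));

return [{
  json: {
    count: containers.length,
    containers: containers
  }
}];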
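
A slightly more defensive variant of this parsing step is sketched below as a plain function, so it can be tested outside n8n. It keeps the same output fields but skips lines that are not JSON (for example a shell warning on stdout) and removes duplicate container names, which can occur when the output of several docker ps calls is concatenated. The function name `parseDockerPs` is hypothetical.

```javascript
// Parse newline-delimited JSON from `docker ps --format json`.
// Returns { count, containers } with one entry per container name.
function parseDockerPs(stdout) {
  const byName = new Map();
  for (const rawLine of String(stdout ?? '').split('\n')) {
    const line = rawLine.trim();
    if (!line.startsWith('{')) continue; // blank line or non-JSON noise
    let c;
    try {
      c = JSON.parse(line);
    } catch {
      continue; // malformed JSON: ignore the line instead of failing the run
    }
    // Later duplicates overwrite earlier ones, keeping one entry per name.
    byName.set(c.Names, { name: c.Names, status: c.State, health: c.Status });
  }
  const containers = [...byName.values()];
  return { count: containers.length, containers };
}
```

Inside an n8n Code node, the result could be returned as `[{ json: parseDockerPs($json.stdout) }]`.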

Notification format

{
  "source": "health_check",
  "type": "health_issue",
  "severity": "critical",
  "title": "2 container(s) in trouble",
  "message": "Detected containers:\n- n8n-worker-1 (unhealthy)\n- redis (exited)",
  "container": "n8n-worker-1",
  "containers": ["n8n-worker-1", "redis"],
  "timestamp": "2026-01-20T10:00:00.000Z"
}
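
One way to build this payload from the parsed containers is sketched below. It assumes the `{ name, status, health }` shape produced by the Parse Results node; the function name `buildNotification` and the rule used to pick the label in parentheses are illustrative assumptions.

```javascript
// Build the notification payload from parsed containers.
// ASSUMPTION: `health` holds the docker ps Status string and `status` the State.
function buildNotification(containers, now = new Date(), severity = 'critical') {
  const names = containers.map((c) => c.name);
  const lines = containers.map((c) => {
    // Prefer "unhealthy" when the Status string reports it, else the state.
    const label = /unhealthy/.test(c.health || '') ? 'unhealthy' : c.status;
    return `- ${c.name} (${label})`;
  });
  return {
    source: 'health_check',
    type: 'health_issue',
    severity,
    title: `${names.length} container(s) in trouble`,
    message: `Detected containers:\n${lines.join('\n')}`,
    container: names[0], // first affected container
    containers: names,
    timestamp: now.toISOString(),
  };
}
```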

Sample Telegram notification

🚨 HEALTH CHECK ALERT

2 container(s) in trouble

Detected containers:
❌ n8n-worker-1 (unhealthy)
❌ redis (exited)

Affected stack: n8n-stack

[🔄 Restart] [📋 Logs] [🔇 Mute 1h]
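
The message above could be assembled from the notification payload roughly as follows. This is a sketch, not the Notification Hub's actual code: the `stack` argument and the `callback_data` strings are hypothetical, while `reply_markup.inline_keyboard` with `text` and `callback_data` follows the Telegram Bot API. A `chat_id` still has to be added before calling `sendMessage`.

```javascript
// Turn a notification payload into Telegram sendMessage parameters.
// ASSUMPTION: callback_data values are placeholders for the real handlers.
function buildTelegramMessage(payload, stack) {
  const text = [
    '🚨 HEALTH CHECK ALERT',
    '',
    payload.title,
    '',
    payload.message.replace(/^- /gm, '❌ '), // "- name" lines become "❌ name"
    '',
    `Affected stack: ${stack}`,
  ].join('\n');

  const target = payload.container;
  return {
    text,
    reply_markup: {
      inline_keyboard: [[
        { text: '🔄 Restart', callback_data: `restart:${target}` },
        { text: '📋 Logs', callback_data: `logs:${target}` },
        { text: '🔇 Mute 1h', callback_data: `mute:${target}:3600` },
      ]],
    },
  };
}
```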

Useful commands

# See unhealthy containers
docker ps -a --filter "health=unhealthy"

# See stopped containers
docker ps -a --filter "status=exited"

# Health of a specific container
docker inspect --format='{{.State.Health.Status}}' n8n

# Docker health-check logs
docker inspect --format='{{range .State.Health.Log}}{{.Output}}{{end}}' n8n

4. What if? — Outlook and limits

Current limits

Limit	Impact	Mitigation
No HTTP checks	Detects Docker state, not application state	Planned evolution
No daily report	No historical view	Grafana dashboard
5 min interval	Max detection latency 5 min	Acceptable for personal usage

Evolution scenarios

If HTTP checks are needed:

Add curl checks to each service’s /healthz
Differentiate Docker healthy vs HTTP responding
More precise alerts
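
A possible shape for such an HTTP check is sketched below, using `fetch` in place of curl. It assumes a runtime with global `fetch` and `AbortSignal.timeout` (Node 18+); the function names, the 3 s timeout, and the four result labels are illustrative assumptions.

```javascript
// Probe a service's /healthz endpoint; any error or non-2xx counts as failure.
// ASSUMPTION: fetchImpl defaults to the global fetch and can be swapped in tests.
async function probeHealthz(baseUrl, { timeoutMs = 3000, fetchImpl = fetch } = {}) {
  try {
    const res = await fetchImpl(new URL('/healthz', baseUrl), {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    return false; // timeout, DNS failure, connection refused, ...
  }
}

// Combine the Docker-level and HTTP-level results into one label.
function classify(dockerHealthy, httpOk) {
  if (dockerHealthy && httpOk) return 'ok';
  if (dockerHealthy) return 'app_unresponsive'; // container up, app not answering
  if (httpOk) return 'docker_unhealthy';        // app answers, healthcheck fails
  return 'down';
}
```

The `app_unresponsive` case is the one the current Docker-only check cannot see, which is what would make the alerts more precise.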

If a daily report is needed:

Aggregate incidents over 24h
Compute uptime per service
Send digest at 8 AM
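
The aggregation step of such a report might look like the sketch below. The incident shape `{ service, startMs, endMs }` is a hypothetical format (the document does not define how incidents are stored), and overlapping incidents for the same service would be double-counted by this simple version.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Compute downtime and uptime per service over the 24 h ending at windowEndMs.
// ASSUMPTION: an incident without endMs is still ongoing at windowEndMs.
function uptimeReport(incidents, windowEndMs) {
  const windowStartMs = windowEndMs - DAY_MS;
  const downtime = new Map();
  for (const { service, startMs, endMs } of incidents) {
    // Clip each incident to the reporting window.
    const start = Math.max(startMs, windowStartMs);
    const end = Math.min(endMs ?? windowEndMs, windowEndMs);
    if (end <= start) continue; // entirely outside the window
    downtime.set(service, (downtime.get(service) || 0) + (end - start));
  }
  return [...downtime].map(([service, ms]) => ({
    service,
    downtimeMinutes: Math.round(ms / 60000),
    uptimePercent: Number((100 * (1 - ms / DAY_MS)).toFixed(2)),
  }));
}
```

Services with no incident in the window do not appear in the result and can be reported as 100% by the caller.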

If cAdvisor becomes functional again:

Migrate to Prometheus metrics
Drop the SSH check
Native history and graphs

Troubleshooting

Problem	Check
False positives	Containers without healthcheck return “none”
SSH timeout	Valid SSH credential? Port 22 reachable?
No notification	Workflow active? Notification Hub OK?
Too many alerts	Adjust filters in docker ps

Workflows

Notification Hub — Notification routing
Docker Auto-Updates — Image updates

Infrastructure

Monitoring Stack — Prometheus & Grafana
Security Stack — Caddy & CrowdSec