
Global Health Check

The Global Health Check workflow monitors the health of Docker containers every 5 minutes. It detects unhealthy or unexpectedly stopped containers and notifies admins via Telegram.

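| Method | Tool | Status |
| --- | --- | --- |
| Filtered docker ps | SSH + CLI | Current (production) |
| Prometheus cAdvisor | container_health_status | Not working (Docker 29+) |
| HTTP endpoints | Curl to /healthz | Specified, not implemented |

| Service | Stack | Critical |
| --- | --- | --- |
| Caddy | security | Yes |
| CrowdSec | security | Yes |
| N8N | n8n | Yes |
| N8N-Postgres | n8n | Yes |
| Redis | n8n | Yes |
| N8N Workers | n8n | No |
| Odoo | odoo | Yes |
| Odoo-Postgres | odoo | Yes |
| Qdrant | ai | No |
| Claude-Ollama | ai | No |
| Prometheus | monitoring | No |
| Grafana | monitoring | No |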

| Problem | Without health check | With health check |
| --- | --- | --- |
| Container crash | Discovered by a user | Alert within 5 minutes |
| Unhealthy service | No visibility | Automatic detection |
| Extended downtime | No notification | Quick intervention |

| Method | Advantage | Drawback |
| --- | --- | --- |
| SSH + docker ps | Always works | No Prometheus history |
| cAdvisor metrics | History, graphs | Docker 29+ bug |

Workflow:
Schedule (every 5 min) -> SSH Docker Health (docker ps --filter) -> Parse Results -> count > 0 ?

  • No: Skip
  • Yes: Prepare Notification -> Notification Hub -> Telegram

A Docker Health Check with Retry sub-workflow (jDN2QV3nEMGacrCvgEBBV) wraps this check with a retry policy (2 attempts spaced 30s apart) before notifying, which filters out false positives on containers in the middle of a restart.
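
The retry policy can be illustrated with a small sketch. This is not the sub-workflow's actual implementation: `checkWithRetry`, its options, and the injected `sleep` are hypothetical names, and the only values taken from the text are the 2 attempts and the 30s spacing. It assumes `runCheck` resolves to the same `{ count, containers }` shape that Parse Results produces.

```javascript
// Hypothetical helper: re-run a check up to `attempts` times, waiting `delayMs`
// between runs, and report a problem only if the last attempt still sees one.
async function checkWithRetry(runCheck, { attempts = 2, delayMs = 30000, sleep } = {}) {
  // `sleep` can be injected (for tests); by default it waits with setTimeout.
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let last = { count: 0, containers: [] };
  for (let attempt = 1; attempt <= attempts; attempt++) {
    last = await runCheck();
    if (last.count === 0) {
      // Nothing in trouble (or the container finished restarting): no alert.
      return { ...last, attempts: attempt };
    }
    if (attempt < attempts) {
      await wait(delayMs);
    }
  }
  // Still in trouble after the final attempt: this result should be notified.
  return { ...last, attempts };
}
```

A container that is mid-restart on the first run and back up 30s later yields `count: 0` on the second attempt, so no notification is sent.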

docker ps -a \
--filter "health=unhealthy" \
--filter "status=exited" \
--format json

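This command targets two conditions:

  • Containers with a failed healthcheck (health=unhealthy)
  • Stopped containers (status=exited), which also includes containers that were stopped on purpose

Note that Docker combines filters on different keys with AND (only repeated filters on the same key are ORed), so matching either condition on its own requires one docker ps call per filter.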
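
One way to cover both conditions independently is to run one `docker ps -a --filter ... --format json` per filter and merge the outputs. The sketch below assumes that approach. `parseDockerPs` and `mergeDockerPsOutputs` are hypothetical helper names, and the two sample strings are made-up output lines used only for illustration.

```javascript
// Parse the JSON-lines output of `docker ps --format json` into objects.
function parseDockerPs(stdout) {
  return (stdout || '')
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => JSON.parse(line));
}

// Merge several outputs, keeping one entry per container ID.
function mergeDockerPsOutputs(...outputs) {
  const byId = new Map();
  for (const stdout of outputs) {
    for (const c of parseDockerPs(stdout)) {
      if (!byId.has(c.ID)) byId.set(c.ID, c);
    }
  }
  return [...byId.values()];
}

// Illustrative sample lines (not real output from this setup).
const unhealthyOut = '{"ID":"a1","Names":"n8n-worker-1","State":"running","Status":"Up 2 hours (unhealthy)"}\n';
const exitedOut = '{"ID":"b2","Names":"redis","State":"exited","Status":"Exited (1) 3 minutes ago"}\n';

const merged = mergeDockerPsOutputs(unhealthyOut, exitedOut, exitedOut);
console.log(merged.map((c) => c.Names)); // one entry per container, duplicates dropped
```

Deduplicating on `ID` matters because a container can satisfy both filters and would otherwise be counted twice.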

SSH Docker Health Node:

Type: SSH
Host: localhost
Command: docker ps -a --filter "health=unhealthy" --filter "status=exited" --format json

Parse Results (Code):

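// stdout of the SSH node: one JSON object per line (docker ps --format json)
const output = $json.stdout;

// Empty output: no container matched the filters
if (!output || output.trim() === '') {
  return [{ json: { count: 0, containers: [] } }];
}

const containers = output.trim().split('\n')
  .filter(line => line.trim())
  .map(line => JSON.parse(line))
  .map(c => ({
    name: c.Names,
    status: c.State,   // e.g. "running" or "exited"
    health: c.Status   // human-readable, e.g. "Up 2 hours (unhealthy)"
  }));

return [{
  json: {
    count: containers.length,
    containers: containers
  }
}];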

Prepare Notification (example payload):

{
  "source": "health_check",
  "type": "health_issue",
  "severity": "critical",
  "title": "2 container(s) in trouble",
  "message": "Detected containers:\n- n8n-worker-1 (unhealthy)\n- redis (exited)",
  "container": "n8n-worker-1",
  "containers": ["n8n-worker-1", "redis"],
  "timestamp": "2026-01-20T10:00:00.000Z"
}
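
The payload above could be produced from the Parse Results output along these lines. This is a sketch under assumptions, not the workflow's actual Prepare Notification code: `describeIssue` and `buildNotification` are hypothetical names, the severity is hard-coded to `critical` as in the example, and the first container is assumed to fill the `container` field.

```javascript
// Derive a short reason from the fields produced by Parse Results
// (status = Docker State, health = Docker Status string).
function describeIssue(c) {
  if (c.status === 'exited') return 'exited';
  if (/unhealthy/.test(c.health || '')) return 'unhealthy';
  return c.status || 'unknown';
}

// Build a payload with the same keys as the example payload.
function buildNotification(containers, now = new Date()) {
  const names = containers.map((c) => c.name);
  const lines = containers.map((c) => `- ${c.name} (${describeIssue(c)})`);
  return {
    source: 'health_check',
    type: 'health_issue',
    severity: 'critical', // assumption: fixed value, as in the example
    title: `${containers.length} container(s) in trouble`,
    message: `Detected containers:\n${lines.join('\n')}`,
    container: names[0], // assumption: first affected container
    containers: names,
    timestamp: now.toISOString(),
  };
}
```

In an n8n Code node the function would typically be called with `$json.containers` and its result returned as `[{ json: payload }]`.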

Telegram message (example):

🚨 HEALTH CHECK ALERT
2 container(s) in trouble
Detected containers:
❌ n8n-worker-1 (unhealthy)
❌ redis (exited)
Affected stack: n8n-stack
[🔄 Restart] [📋 Logs] [🔇 Mute 1h]

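
A message like the one above maps naturally onto a Telegram Bot API `sendMessage` body with an inline keyboard. The sketch below shows one possible shape. `buildTelegramMessage`, the `reason` field on each container, and the `callback_data` values are assumptions made for illustration; the Notification Hub may build its messages differently.

```javascript
// Build a sendMessage request body: text plus one row of inline buttons.
function buildTelegramMessage(chatId, title, containers, stack) {
  const lines = containers.map((c) => `❌ ${c.name} (${c.reason})`);
  return {
    chat_id: chatId,
    text: [
      '🚨 HEALTH CHECK ALERT',
      title,
      'Detected containers:',
      ...lines,
      `Affected stack: ${stack}`,
    ].join('\n'),
    reply_markup: {
      // callback_data values are hypothetical; Telegram limits them to 64 bytes.
      inline_keyboard: [[
        { text: '🔄 Restart', callback_data: `restart:${stack}` },
        { text: '📋 Logs', callback_data: `logs:${stack}` },
        { text: '🔇 Mute 1h', callback_data: `mute:${stack}:1h` },
      ]],
    },
  };
}
```

Each button press is delivered back to the bot as a callback query carrying the matching `callback_data`, which a separate handler would need to interpret.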
# See unhealthy containers
docker ps -a --filter "health=unhealthy"
# See stopped containers
docker ps -a --filter "status=exited"
# Health of a specific container
docker inspect --format='{{.State.Health.Status}}' n8n
# Docker health-check logs
docker inspect --format='{{range .State.Health.Log}}{{.Output}}{{end}}' n8n

| Limit | Impact | Mitigation |
| --- | --- | --- |
| No HTTP checks | Detects Docker state, not application state | Planned evolution |
| No daily report | No historical view | Grafana dashboard |
| 5 min interval | Max detection latency 5 min | Acceptable for personal usage |

If HTTP checks are needed:

  • Add curl checks to each service’s /healthz
  • Differentiate Docker healthy vs HTTP responding
  • More precise alerts
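
As a rough illustration of the HTTP check idea, the sketch below probes a `/healthz` endpoint and combines the result with the Docker state. It is not implemented in the workflow. `probeHealthz`, `classify`, the 5s timeout, and the label strings are all assumptions, and it uses `fetch` (available in Node 18+) rather than curl so that the example stays in JavaScript.

```javascript
// Probe <baseUrl>/healthz with a timeout; never throws, always returns a result.
async function probeHealthz(baseUrl, { fetchImpl = globalThis.fetch, timeoutMs = 5000 } = {}) {
  const url = new URL('/healthz', baseUrl).toString();
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return { url, httpOk: res.ok, status: res.status };
  } catch (err) {
    // Connection refused, DNS failure, or timeout all count as "not responding".
    return { url, httpOk: false, status: null };
  } finally {
    clearTimeout(timer);
  }
}

// Differentiate "Docker healthy" from "HTTP responding".
function classify(dockerHealthy, httpOk) {
  if (dockerHealthy && httpOk) return 'ok';
  if (dockerHealthy && !httpOk) return 'docker healthy, HTTP not responding';
  if (!dockerHealthy && httpOk) return 'docker unhealthy, HTTP responding';
  return 'down';
}
```

The second case is the one the current check cannot see: Docker reports the container as fine while the application no longer answers.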

If a daily report is needed:

  • Aggregate incidents over 24h
  • Compute uptime per service
  • Send digest at 8 AM
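
The uptime computation for such a report could look like the sketch below. The incident shape (`service`, `start`, `end` as epoch milliseconds) and the function name `uptimePerService` are hypothetical, and incidents for the same service are assumed not to overlap.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// incidents: [{ service, start, end }]; an incident without `end` is still ongoing.
// Returns uptime percentages over the 24h ending at `windowEnd`,
// for every service that had at least one incident in that window.
function uptimePerService(incidents, windowEnd) {
  const windowStart = windowEnd - DAY_MS;
  const downtime = new Map();
  for (const { service, start, end } of incidents) {
    // Clip each incident to the 24h window before counting it.
    const from = Math.max(start, windowStart);
    const to = Math.min(end ?? windowEnd, windowEnd);
    if (to > from) {
      downtime.set(service, (downtime.get(service) ?? 0) + (to - from));
    }
  }
  const report = {};
  for (const [service, ms] of downtime) {
    report[service] = Number((100 * (1 - ms / DAY_MS)).toFixed(2));
  }
  return report;
}
```

For example, a single 36-minute outage within the window corresponds to 97.5% uptime for that service, since 36 minutes is 2.5% of a day.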

If cAdvisor becomes functional again:

  • Migrate to Prometheus metrics
  • Drop the SSH check
  • Native history and graphs

| Problem | Check |
| --- | --- |
| False positives | Containers without healthcheck return “none” |
| SSH timeout | Valid SSH credential? Port 22 reachable? |
| No notification | Workflow active? Notification Hub OK? |
| Too many alerts | Adjust filters in docker ps |