---
title: Global Health Check
url: https://blog.guigpap.com/en/workflows/health-check/
url_md: https://blog.guigpap.com/en/workflows/health-check.md
category: automation
date: '2026-01-31'
maturite: production
techno:
  - docker
  - n8n
  - telegram
  - prometheus
application:
  - monitoring
  - operations
---

# Global Health Check

> Proactive Docker infrastructure monitoring with detection of failed containers

## 1. What? - Definition and context

The **Global Health Check** workflow monitors the health of Docker containers every 5 minutes. It detects unhealthy or unexpectedly stopped containers and notifies admins via Telegram.

> **Note - Health check**
>
> A **health check** is a periodic verification of a service's state. Docker can mark a container as "unhealthy" if its own checks fail (e.g., a web server that no longer responds).

### Detection method

| Method | Tool | Status |
|--------|------|--------|
| **Filtered docker ps** | SSH + CLI | Current (production) |
| Prometheus cAdvisor | `container_health_status` | Not working (Docker 29+) |
| HTTP endpoints | Curl to /healthz | Specified, not implemented |

### Monitored services

| Service | Stack | Critical |
|---------|-------|----------|
| Caddy | security | Yes |
| CrowdSec | security | Yes |
| N8N | n8n | Yes |
| N8N-Postgres | n8n | Yes |
| Redis | n8n | Yes |
| N8N Workers | n8n | No |
| Odoo | odoo | Yes |
| Odoo-Postgres | odoo | Yes |
| Qdrant | ai | No |
| Claude-Ollama | ai | No |
| Prometheus | monitoring | No |
| Grafana | monitoring | No |

---

## 2. Why? - Stakes and motivations

### Problems solved

| Problem | Without health check | With health check |
|---------|---------------------|-------------------|
| **Container crash** | Discovered by a user | Alert within 5 minutes |
| **Unhealthy service** | No visibility | Automatic detection |
| **Extended downtime** | No notification | Quick intervention |

### Why SSH instead of Prometheus?

> **Caution - cAdvisor bug**
>
> The `container_health_status` cAdvisor metric does not work on Docker 29+ with overlayfs. The metric always returns 0.

| Method | Advantage | Drawback |
|--------|-----------|----------|
| **SSH + docker ps** | Always works | No Prometheus history |
| cAdvisor metrics | History, graphs | Docker 29+ bug |

---

## 3. How? - Technical implementation

### Current architecture

```mermaid
flowchart TD
    Sched["Schedule · 5 min"]
    SSH["SSH Docker Health · docker ps --filter"]
    Parse["Parse Results"]
    HasIssues{"count > 0 ?"}
    Skip["Skip"]
    Prep["Prepare Notification"]
    Hub["Notification Hub"]
    TG["Telegram"]

    Sched --> SSH --> Parse --> HasIssues
    HasIssues -->|No| Skip
    HasIssues -->|Yes| Prep --> Hub --> TG
```

A `Docker Health Check with Retry` sub-workflow (`jDN2QV3nEMGacrCvgEBBV`) wraps this check with a retry policy (2 attempts spaced 30s apart) before notifying, which filters out false positives on containers in the middle of a restart.

### Detection command

```bash
# Docker ANDs filters that use different keys, so each condition gets its own call
docker ps -a --filter "health=unhealthy" --format json
docker ps -a --filter "status=exited" --format json
```

Together, these commands return:

- Containers with a failed healthcheck (`health=unhealthy`)
- Containers that have stopped (`status=exited`)

Passing both filters to a single `docker ps` call would only match containers that are exited and unhealthy at the same time, because `--filter` flags with different keys are combined with AND.

### N8N configuration

**SSH Docker Health Node:**

```yaml
Type: SSH
Host: localhost
Command: docker ps -a --filter "health=unhealthy" --format json; docker ps -a --filter "status=exited" --format json
```

**Parse Results (Code):**

```javascript
// stdout holds one JSON object per line (docker ps --format json)
const output = $json.stdout;

if (!output || output.trim() === '') {
  return [{ json: { count: 0, containers: [] } }];
}

// Parse each line and deduplicate by container ID, since a container
// can match both the unhealthy query and the exited query
const byId = new Map();
for (const line of output.trim().split('\n')) {
  if (!line.trim()) continue;
  const c = JSON.parse(line);
  byId.set(c.ID, {
    name: c.Names,
    status: c.State,  // e.g. "running", "exited"
    health: c.Status  // e.g. "Up 2 hours (unhealthy)"
  });
}
const containers = [...byId.values()];

return [{
  json: {
    count: containers.length,
    containers: containers
  }
}];
```

### Notification format

```json
{
  "source": "health_check",
  "type": "health_issue",
  "severity": "critical",
  "title": "2 container(s) in trouble",
  "message": "Detected containers:\n- n8n-worker-1 (unhealthy)\n- redis (exited)",
  "container": "n8n-worker-1",
  "containers": ["n8n-worker-1", "redis"],
  "timestamp": "2026-01-20T10:00:00.000Z"
}
```

### Sample Telegram notification

```
🚨 HEALTH CHECK ALERT

2 container(s) in trouble

Detected containers:
❌ n8n-worker-1 (unhealthy)
❌ redis (exited)

Affected stack: n8n-stack

[🔄 Restart] [📋 Logs] [🔇 Mute 1h]
```

### Useful commands

```bash
# See unhealthy containers
docker ps -a --filter "health=unhealthy"

# See stopped containers
docker ps -a --filter "status=exited"

# Health of a specific container
docker inspect --format='{{.State.Health.Status}}' n8n

# Docker health-check logs
docker inspect --format='{{range .State.Health.Log}}{{.Output}}{{end}}' n8n
```

---

## 4. What if? - Outlook and limits

### Current limits

| Limit | Impact | Mitigation |
|-------|--------|------------|
| **No HTTP checks** | Detects Docker state, not application state | Planned evolution |
| **No daily report** | No historical view | Grafana dashboard |
| **5 min interval** | Max detection latency 5 min | Acceptable for personal use |

### Evolution scenarios

**If HTTP checks are needed**:

- Add curl checks to each service's /healthz
- Differentiate Docker healthy vs HTTP responding
- More precise alerts

**If a daily report is needed**:

- Aggregate incidents over 24h
- Compute uptime per service
- Send digest at 8 AM

**If cAdvisor becomes functional again**:

- Migrate to Prometheus metrics
- Drop the SSH check
- Native history and graphs

### Troubleshooting

| Problem | Check |
|---------|-------|
| False positives | Containers without a healthcheck report health "none" |
| SSH timeout | Valid SSH credential? Port 22 reachable? |
| No notification | Workflow active? Notification Hub OK? |
| Too many alerts | Adjust filters in docker ps |

---

## Related pages

### Workflows

- [Notification Hub](/en/workflows/notification-hub/) - Notification routing
- [Docker Auto-Updates](/en/workflows/docker-updates/) - Image updates

### Infrastructure

- [Monitoring Stack](/en/infrastructure/monitoring-stack/) - Prometheus & Grafana
- [Security Stack](/en/infrastructure/security-stack/) - Caddy & CrowdSec

## Agent metadata

- This article comes from the GuiGPaP Lab blog.
- Global blog context: https://blog.guigpap.com/llms.txt
- Author contact: https://odoo.guigpap.com/mon-cv
- License: CC-BY-SA 4.0
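
As a complement to the notification format in section 3, here is a minimal sketch of what the Prepare Notification step could look like. The article does not show that node's code, so the function name `buildNotification` and the way the state label is derived are assumptions; only the output fields come from the documented payload.

```javascript
// Hypothetical helper (not taken from the workflow itself): turns the
// Parse Results output into the notification payload shown in section 3.
function buildNotification(parsed, now = new Date()) {
  const containers = parsed.containers || [];
  if (containers.length === 0) {
    return null; // nothing to report, mirrors the "Skip" branch
  }

  // Assumption: prefer "unhealthy" when the docker ps Status text mentions it,
  // otherwise fall back to the container State (e.g. "exited").
  const label = (c) =>
    /unhealthy/i.test(c.health || '') ? 'unhealthy' : (c.status || 'unknown');

  const lines = containers.map((c) => `- ${c.name} (${label(c)})`);

  return {
    source: 'health_check',
    type: 'health_issue',
    severity: 'critical',
    title: `${containers.length} container(s) in trouble`,
    message: `Detected containers:\n${lines.join('\n')}`,
    container: containers[0].name,
    containers: containers.map((c) => c.name),
    timestamp: now.toISOString(),
  };
}
```

In an n8n Code node, the result would typically be wrapped as `return [{ json: buildNotification($json) }];`, assuming the previous node emits the `count` and `containers` fields described above.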