Global Health Check
1. What? — Definition and context
The Global Health Check workflow monitors the health of Docker containers every 5 minutes. It detects unhealthy or unexpectedly stopped containers and notifies admins via Telegram.
Detection method
| Method | Tool | Status |
|---|---|---|
| Filtered docker ps | SSH + CLI | Current (production) |
| Prometheus cAdvisor | container_health_status | Not working (Docker 29+) |
| HTTP endpoints | Curl to /healthz | Specified, not implemented |
Monitored services
| Service | Stack | Critical |
|---|---|---|
| Caddy | security | Yes |
| CrowdSec | security | Yes |
| N8N | n8n | Yes |
| N8N-Postgres | n8n | Yes |
| Redis | n8n | Yes |
| N8N Workers | n8n | No |
| Odoo | odoo | Yes |
| Odoo-Postgres | odoo | Yes |
| Qdrant | ai | No |
| Claude-Ollama | ai | No |
| Prometheus | monitoring | No |
| Grafana | monitoring | No |
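
The Critical column lends itself to a simple lookup when deciding how loudly to alert. The sketch below is illustrative only: the lowercase container names used as keys, the `warning` level, and the rule "any critical container makes the alert critical" are assumptions, not behavior confirmed by the workflow.

```javascript
// Hypothetical registry mirroring the table above. The keys are assumed
// container names; the real names may differ.
const SERVICES = new Map([
  ['caddy', { stack: 'security', critical: true }],
  ['crowdsec', { stack: 'security', critical: true }],
  ['n8n', { stack: 'n8n', critical: true }],
  ['n8n-postgres', { stack: 'n8n', critical: true }],
  ['redis', { stack: 'n8n', critical: true }],
  ['n8n-worker-1', { stack: 'n8n', critical: false }],
  ['odoo', { stack: 'odoo', critical: true }],
  ['odoo-postgres', { stack: 'odoo', critical: true }],
  ['qdrant', { stack: 'ai', critical: false }],
  ['claude-ollama', { stack: 'ai', critical: false }],
  ['prometheus', { stack: 'monitoring', critical: false }],
  ['grafana', { stack: 'monitoring', critical: false }],
]);

// Returns 'critical' if any affected container is marked critical,
// otherwise 'warning'. Unknown containers count as critical so that a
// missing registry entry never downgrades an alert.
function severityFor(containerNames) {
  const anyCritical = containerNames.some((name) => {
    const service = SERVICES.get(name);
    return service === undefined || service.critical;
  });
  return anyCritical ? 'critical' : 'warning';
}
```

Treating unknown names as critical is a deliberately cautious default for this sketch.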
2. Why? — Stakes and motivations
Problems solved
| Problem | Without health check | With health check |
|---|---|---|
| Container crash | Discovered by a user | Alert within 5 minutes |
| Unhealthy service | No visibility | Automatic detection |
| Extended downtime | No notification | Quick intervention |
Why SSH instead of Prometheus?
| Method | Advantage | Drawback |
|---|---|---|
| SSH + docker ps | Always works | No Prometheus history |
| cAdvisor metrics | History, graphs | Docker 29+ bug |
3. How? — Technical implementation
Current architecture
A Docker Health Check with Retry sub-workflow (jDN2QV3nEMGacrCvgEBBV) wraps this check with a retry policy (2 attempts spaced 30s apart) before notifying, which filters out false positives on containers in the middle of a restart.
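
The retry policy can be pictured with a short sketch. It assumes (the source does not spell this out) that a container is reported only if it is still failing on every attempt. The names `check`, `checkWithRetry`, and `sleep` are hypothetical and do not correspond to actual nodes in the sub-workflow.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `check` up to `attempts` times, waiting `delayMs` between runs.
// `check` is a hypothetical async function returning the names of the
// containers that are currently unhealthy or exited.
// Only containers reported by every attempt are returned, so a container
// that recovers between attempts (for example, mid-restart) is dropped.
async function checkWithRetry(check, attempts = 2, delayMs = 30_000) {
  let failing = new Set(await check());
  for (let i = 1; i < attempts && failing.size > 0; i += 1) {
    await sleep(delayMs);
    const current = new Set(await check());
    failing = new Set([...failing].filter((name) => current.has(name)));
  }
  return [...failing];
}
```

If the first attempt finds nothing, the function returns immediately without waiting, so a healthy run costs a single check.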
Detection command
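```shell
# Docker ANDs filters that use different keys, so each condition needs its own call
docker ps -a --filter "health=unhealthy" --format json
docker ps -a --filter "status=exited" --format json
```
Together, these commands return:
- Containers with a failed healthcheck (`health=unhealthy`)
- Containers stopped unexpectedly (`status=exited`)
N8N configuration
SSH Docker Health Node:
```yaml
Type: SSH
Host: localhost
Command: docker ps -a --filter "health=unhealthy" --format json; docker ps -a --filter "status=exited" --format json
```
Parse Results (Code):
```javascript
// stdout of the SSH node: one JSON object per line, empty when nothing is wrong
const output = ($json.stdout || '').trim();
if (output === '') {
  return [{ json: { count: 0, containers: [] } }];
}

// Key by name so a container matched by both calls is listed only once
const byName = new Map();
for (const line of output.split('\n')) {
  if (!line.trim()) continue; // skip blank lines
  const c = JSON.parse(line);
  // State is e.g. "exited"; Status is the human-readable text,
  // e.g. "Up 2 hours (unhealthy)"
  byName.set(c.Names, { name: c.Names, status: c.State, health: c.Status });
}
const containers = [...byName.values()];

return [{ json: { count: containers.length, containers } }];
```
Notification format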
```json
{
  "source": "health_check",
  "type": "health_issue",
  "severity": "critical",
  "title": "2 container(s) in trouble",
  "message": "Detected containers:\n- n8n-worker-1 (unhealthy)\n- redis (exited)",
  "container": "n8n-worker-1",
  "containers": ["n8n-worker-1", "redis"],
  "timestamp": "2026-01-20T10:00:00.000Z"
}
```
Sample Telegram notification
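
As a rough sketch, message text like the sample in this section could be rendered from the detected containers as follows. The function name `formatAlert`, the `{ name, state }` input shape, and the optional `stack` argument are assumptions made for illustration. The button row is not covered here.

```javascript
// Renders the alert text from a list of failing containers.
// `containers` is assumed to be an array of { name, state } objects and
// `stack` an optional label such as 'n8n-stack'.
function formatAlert(containers, stack) {
  const lines = [
    '🚨 HEALTH CHECK ALERT',
    '',
    `${containers.length} container(s) in trouble`,
    '',
    'Detected containers:',
    ...containers.map((c) => `❌ ${c.name} (${c.state})`),
  ];
  if (stack) {
    lines.push('', `Affected stack: ${stack}`);
  }
  return lines.join('\n');
}
```

Called with the two containers from the payload above and `'n8n-stack'`, it produces text of the same shape as the sample that follows.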
```
🚨 HEALTH CHECK ALERT

2 container(s) in trouble

Detected containers:
❌ n8n-worker-1 (unhealthy)
❌ redis (exited)

Affected stack: n8n-stack

[🔄 Restart] [📋 Logs] [🔇 Mute 1h]
```
Useful commands
```shell
# See unhealthy containers
docker ps -a --filter "health=unhealthy"

# See stopped containers
docker ps -a --filter "status=exited"

# Health of a specific container
docker inspect --format='{{.State.Health.Status}}' n8n

# Docker health-check logs
docker inspect --format='{{range .State.Health.Log}}{{.Output}}{{end}}' n8n
```
4. What if? — Outlook and limits
Current limits
| Limit | Impact | Mitigation |
|---|---|---|
| No HTTP checks | Detects Docker state, not application state | Planned evolution |
| No daily report | No historical view | Grafana dashboard |
| 5 min interval | Max detection latency 5 min | Acceptable for personal usage |
Evolution scenarios
If HTTP checks are needed:
- Add curl checks to each service’s /healthz
- Differentiate Docker healthy vs HTTP responding
- More precise alerts
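
A minimal sketch of what such an HTTP check might look like, assuming each service exposes `/healthz` under a known base URL. The function names, the verdict labels, and the 5-second timeout are illustrative assumptions. `fetchFn` is injectable so the probe can be exercised without a network and defaults to the global `fetch` available in Node.js 18+.

```javascript
// Probes `${baseUrl}/healthz` and reports whether the service answered
// with a 2xx status within `timeoutMs`.
async function httpResponding(baseUrl, timeoutMs = 5000, fetchFn = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(`${baseUrl}/healthz`, { signal: controller.signal });
    return res.ok;
  } catch {
    return false; // timeout, DNS failure, connection refused, ...
  } finally {
    clearTimeout(timer);
  }
}

// Combines the Docker-level state with the HTTP probe into one verdict,
// which is what allows alerts to distinguish the two failure modes.
function classify(dockerHealthy, httpOk) {
  if (dockerHealthy && httpOk) return 'ok';
  if (dockerHealthy) return 'docker-healthy-but-not-responding';
  if (httpOk) return 'responding-but-docker-unhealthy';
  return 'down';
}
```

The `docker-healthy-but-not-responding` case is the one the current Docker-only check cannot see.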
If a daily report is needed:
- Aggregate incidents over 24h
- Compute uptime per service
- Send digest at 8 AM
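
The aggregation step could be sketched as below. The incident shape (`{ service, start, end }` with millisecond timestamps) and the function name `dailyUptime` are assumptions, since the document does not define how incidents would be stored.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Computes the uptime percentage per service over the 24 hours ending at
// `now`. Incidents are clipped to that window; an incident without `end`
// is treated as still ongoing. Incidents for the same service are assumed
// not to overlap.
function dailyUptime(incidents, now) {
  const windowStart = now - DAY_MS;
  const downtime = new Map();
  for (const { service, start, end } of incidents) {
    const clippedStart = Math.max(start, windowStart);
    const clippedEnd = Math.min(end ?? now, now);
    if (clippedEnd <= clippedStart) continue; // entirely outside the window
    downtime.set(service, (downtime.get(service) ?? 0) + (clippedEnd - clippedStart));
  }
  const report = {};
  for (const [service, ms] of downtime) {
    report[service] = Number((100 * (1 - ms / DAY_MS)).toFixed(2));
  }
  return report;
}
```

Services with no incident in the window do not appear in the result and can be read as 100% up. For the 8 AM digest, a schedule using the cron expression `0 8 * * *` would fit.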
If cAdvisor becomes functional again:
- Migrate to Prometheus metrics
- Drop the SSH check
- Native history and graphs
Troubleshooting
| Problem | Check |
|---|---|
| False positives | Containers without healthcheck return “none” |
| SSH timeout | Valid SSH credential? Port 22 reachable? |
| No notification | Workflow active? Notification Hub OK? |
| Too many alerts | Adjust filters in docker ps |
Related pages
Workflows
- Notification Hub — Notification routing
- Docker Auto-Updates — Image updates
Infrastructure
- Monitoring Stack — Prometheus & Grafana
- Security Stack — Caddy & CrowdSec