---
title: Health Check Global
url: https://blog.guigpap.com/fr/workflows/health-check/
url_md: https://blog.guigpap.com/fr/workflows/health-check.md
category: automation
date: '2026-01-31'
maturite: production
techno:
  - docker
  - n8n
  - telegram
  - prometheus
application:
  - monitoring
  - operations
---

# Health Check Global

> Proactive monitoring of the Docker infrastructure, with detection of failing containers

## 1. What? Definition and context

The **Health Check Global** workflow checks the health of the Docker containers every 5 minutes. It detects containers that are unhealthy or that have stopped unexpectedly, and notifies the admins via Telegram.

> **Note - Health check**
>
> A **health check** is a periodic verification of a service's state. Docker can mark a container as "unhealthy" when the container's own checks fail (e.g. a web server that no longer responds).

### Detection method

| Method | Tool | Status |
|--------|------|--------|
| **Filtered docker ps** | SSH + CLI | Current (production) |
| Prometheus cAdvisor | `container_health_status` | Not working (Docker 29+) |
| HTTP endpoints | Curl to /healthz | Specified, not implemented |

### Monitored services

| Service | Stack | Critical |
|---------|-------|----------|
| Caddy | security | Yes |
| CrowdSec | security | Yes |
| N8N | n8n | Yes |
| N8N-Postgres | n8n | Yes |
| Redis | n8n | Yes |
| N8N Workers | n8n | No |
| Odoo | odoo | Yes |
| Odoo-Postgres | odoo | Yes |
| Qdrant | ai | No |
| Claude-Ollama | ai | No |
| Prometheus | monitoring | No |
| Grafana | monitoring | No |
---

## 2. Why? Stakes and motivations

### Problems solved

| Problem | Without health check | With health check |
|---------|----------------------|-------------------|
| **Container crash** | Discovered by a user | Alert within 5 minutes |
| **Unhealthy service** | No visibility | Automatic detection |
| **Prolonged downtime** | No notification | Fast intervention |

### Why SSH instead of Prometheus?

> **Caution - cAdvisor bug**
>
> The cAdvisor `container_health_status` metric does not work on Docker 29+ with overlayfs. The metric always returns 0.

| Method | Advantage | Drawback |
|--------|-----------|----------|
| **SSH + docker ps** | Always works | No Prometheus history |
| cAdvisor metrics | History, graphs | Docker 29+ bug |

---

## 3. How? Technical implementation

### Current architecture

```mermaid
flowchart TD
  Sched["Schedule · 5 min"]
  SSH["SSH Docker Health · docker ps --filter"]
  Parse["Parse Results"]
  HasIssues{"count > 0 ?"}
  Skip["Skip"]
  Prep["Prepare Notification"]
  Hub["Notification Hub"]
  TG["Telegram"]

  Sched --> SSH --> Parse --> HasIssues
  HasIssues -->|No| Skip
  HasIssues -->|Yes| Prep --> Hub --> TG
```

A sub-workflow, `Docker Health Check with Retry` (`jDN2QV3nEMGacrCvgEBBV`), wraps this check in a retry policy (2 attempts, 30 s apart) before notifying. This filters out false positives from containers that are in the middle of a restart.
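
The retry policy can be sketched as follows. This is not the sub-workflow itself: it assumes the policy means "notify only if every attempt still reports failing containers", and `check` and `sleep` are hypothetical injected functions.

```javascript
// Default wait helper; can be replaced by a stub in tests.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs `check` up to `attempts` times, waiting `delayMs` between tries.
// Returns [] as soon as one attempt is clean, otherwise the failing
// containers reported by the last attempt.
async function checkWithRetry(check, { attempts = 2, delayMs = 30_000, sleep = defaultSleep } = {}) {
  let failing = [];
  for (let i = 0; i < attempts; i++) {
    if (i > 0) await sleep(delayMs);
    failing = await check();
    if (failing.length === 0) return [];
  }
  return failing;
}
```

A container that is unhealthy on the first pass but healthy 30 s later produces an empty result, so no notification is sent.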

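### Detection command

```bash
# Docker ANDs filters that use different keys, so each condition gets its own call.
# sort -u drops the duplicate line when a container matches both conditions.
{
  docker ps -a --filter "health=unhealthy" --format json
  docker ps -a --filter "status=exited" --format json
} | sort -u
```

Passing both filters to a single `docker ps` call would only match containers that are exited and unhealthy at the same time, because Docker combines filters on different keys with AND.

The command returns:
- Containers whose healthcheck is failing (`health=unhealthy`)
- Containers in the `exited` state (`status=exited`), whether they crashed or were stopped deliberately

### N8N configuration

**SSH Docker Health node:**

```yaml
Type: SSH
Host: localhost
Command: '{ docker ps -a --filter "health=unhealthy" --format json; docker ps -a --filter "status=exited" --format json; } | sort -u'
```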

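**Parse Results (Code):**

```javascript
// stdout of the SSH node: one JSON object per line, or empty when nothing matches
const output = $json.stdout;
if (!output || output.trim() === '') {
  return [{ json: { count: 0, containers: [] } }];
}

const containers = output.trim().split('\n')
  .filter(line => line.trim())   // skip blank lines
  .map(line => JSON.parse(line))
  .map(c => ({
    name: c.Names,
    status: c.State,   // machine-readable state, e.g. "running" or "exited"
    health: c.Status   // human-readable status, e.g. "Up 2 hours (unhealthy)"
  }));

return [{
  json: {
    count: containers.length,
    containers: containers
  }
}];
```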
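
To try the parsing logic outside N8N, the same steps can be wrapped in a plain function. This is a sketch, not the production node. The sample lines are made up for illustration. It also skips lines that are not valid JSON and removes duplicate container names, two safeguards the node above does not have.

```javascript
// Parses `docker ps --format json` output (one JSON object per line).
function parseDockerPs(stdout) {
  const seen = new Set();
  const containers = [];
  for (const line of (stdout || '').split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let row;
    try {
      row = JSON.parse(trimmed);
    } catch {
      continue; // ignore anything that is not a JSON line
    }
    if (seen.has(row.Names)) continue; // same container listed twice
    seen.add(row.Names);
    containers.push({ name: row.Names, status: row.State, health: row.Status });
  }
  return { count: containers.length, containers };
}

// Illustrative sample, not real output from the monitored host.
const sample = [
  '{"Names":"n8n-worker-1","State":"running","Status":"Up 2 hours (unhealthy)"}',
  '{"Names":"redis","State":"exited","Status":"Exited (137) 3 minutes ago"}',
  '',
].join('\n');

console.log(parseDockerPs(sample));
```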

### Notification format

```json
{
  "source": "health_check",
  "type": "health_issue",
  "severity": "critical",
  "title": "2 container(s) with issues",
  "message": "Containers detected:\n- n8n-worker-1 (unhealthy)\n- redis (exited)",
  "container": "n8n-worker-1",
  "containers": ["n8n-worker-1", "redis"],
  "timestamp": "2026-01-20T10:00:00.000Z"
}
```
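
A payload with the same fields can be built from the parsed containers. The sketch below makes three assumptions that the source does not confirm: the reason is derived from the `status` field, `container` holds the first failing container, and the severity is fixed to `critical` as in the example.

```javascript
// Builds a notification payload from the parsed containers.
// `containers` items are expected to look like { name, status }.
function buildNotification(containers, now = new Date()) {
  const names = containers.map((c) => c.name);
  const lines = containers.map((c) => {
    // Assumption: anything returned by the query that is not exited is unhealthy.
    const reason = c.status === 'exited' ? 'exited' : 'unhealthy';
    return `- ${c.name} (${reason})`;
  });
  return {
    source: 'health_check',
    type: 'health_issue',
    severity: 'critical',
    title: `${containers.length} container(s) with issues`,
    message: `Containers detected:\n${lines.join('\n')}`,
    container: names[0],
    containers: names,
    timestamp: now.toISOString(),
  };
}

console.log(buildNotification([
  { name: 'n8n-worker-1', status: 'running' },
  { name: 'redis', status: 'exited' },
]));
```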

### Example Telegram notification

```
🚨 HEALTH CHECK ALERT

2 container(s) with issues

Containers detected:
❌ n8n-worker-1 (unhealthy)
❌ redis (exited)

Affected stack: n8n-stack

[🔄 Restart] [📋 Logs] [🔇 Mute 1h]
```
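
The source does not show how the Notification Hub renders this message. As a rough sketch, the body of a Telegram Bot API `sendMessage` call with an inline keyboard could be assembled like this. The function name, the input shape and the `callback_data` values are hypothetical.

```javascript
// Builds a Telegram Bot API sendMessage body with one row of inline buttons.
// callback_data values are hypothetical; Telegram limits them to 64 bytes.
function buildTelegramAlert(chatId, issues, stack) {
  const lines = issues.map((i) => `❌ ${i.name} (${i.reason})`);
  const text = [
    '🚨 HEALTH CHECK ALERT',
    '',
    `${issues.length} container(s) with issues`,
    '',
    'Containers detected:',
    ...lines,
    '',
    `Affected stack: ${stack}`,
  ].join('\n');

  return {
    chat_id: chatId,
    text,
    reply_markup: {
      inline_keyboard: [[
        { text: '🔄 Restart', callback_data: `restart:${stack}` },
        { text: '📋 Logs', callback_data: `logs:${stack}` },
        { text: '🔇 Mute 1h', callback_data: `mute:${stack}:3600` },
      ]],
    },
  };
}

const body = buildTelegramAlert(
  123456,
  [{ name: 'n8n-worker-1', reason: 'unhealthy' }, { name: 'redis', reason: 'exited' }],
  'n8n-stack',
);
console.log(body.text);
```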

### Useful commands

```bash
# List unhealthy containers
docker ps -a --filter "health=unhealthy"

# List exited containers
docker ps -a --filter "status=exited"

# Health status of a specific container
docker inspect --format='{{.State.Health.Status}}' n8n

# Output of the container's recent Docker health checks
docker inspect --format='{{range .State.Health.Log}}{{.Output}}{{end}}' n8n
```

---

## 4. What if? Outlook and limits

### Current limits

| Limit | Impact | Mitigation |
|-------|--------|------------|
| **No HTTP checks** | Detects the Docker state, not the application state | Planned as a future evolution |
| **No daily report** | No historical view | Grafana dashboard |
| **5-minute interval** | Detection latency of up to 5 minutes | Acceptable for personal use |

### Evolution scenarios

**If HTTP checks are needed**:
- Add curl checks against the /healthz endpoint of each service
- Distinguish "Docker healthy" from "HTTP responding"
- More precise alerts

**If a daily report is needed**:
- Aggregate incidents over 24 hours
- Compute uptime per service
- Send a digest at 8:00

**If cAdvisor becomes functional again**:
- Migrate to Prometheus metrics
- Remove the SSH check
- Native history and graphs
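
Since this evolution is not implemented yet, the following is only a sketch of how an HTTP probe and the Docker state could be combined. It assumes Node 18+ (global `fetch` and `AbortSignal.timeout`). The function names, the 3 s timeout and the state labels are hypothetical.

```javascript
// Probes a health endpoint; any network error or timeout counts as "not OK".
// `fetchImpl` can be swapped for a stub in tests.
async function httpHealthy(url, { timeoutMs = 3000, fetchImpl = fetch } = {}) {
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok; // true for 2xx responses
  } catch {
    return false;
  }
}

// Combines the Docker health state with the HTTP probe result.
function classify(dockerHealthy, httpOk) {
  if (dockerHealthy && httpOk) return 'ok';
  if (dockerHealthy) return 'docker_healthy_http_down';
  if (httpOk) return 'docker_unhealthy_http_up';
  return 'down';
}

console.log(classify(true, false));
```

The `docker_healthy_http_down` case is the one the current Docker-only check cannot see.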
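
The uptime figure for such a digest can be estimated from the number of failed checks. This is a sketch under a simplifying assumption: each failed check counts as a full 5-minute interval of downtime, which gives 288 checks per day. The input shape is hypothetical.

```javascript
const CHECK_INTERVAL_MIN = 5;
const CHECKS_PER_DAY = (24 * 60) / CHECK_INTERVAL_MIN; // 288 checks in 24 h

// failedChecksByService: { serviceName: number of failed checks in the last 24 h }
// Returns the uptime percentage per service, rounded to 2 decimals.
function dailyUptime(failedChecksByService, services) {
  const report = {};
  for (const service of services) {
    const failed = Math.min(failedChecksByService[service] ?? 0, CHECKS_PER_DAY);
    report[service] = Number((100 * (1 - failed / CHECKS_PER_DAY)).toFixed(2));
  }
  return report;
}

console.log(dailyUptime({ redis: 3 }, ['redis', 'grafana']));
// → { redis: 98.96, grafana: 100 }
```

Three failed checks correspond to at most 15 minutes of downtime, hence 98.96 % for the day.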

### Troubleshooting

| Problem | What to check |
|---------|---------------|
| False positives | Containers without a healthcheck report `none` and never match `health=unhealthy`; look at `status=exited` matches instead, such as deliberately stopped containers |
| SSH timeout | Is the SSH credential valid? Is port 22 reachable? |
| No notification | Is the workflow active? Is the Notification Hub working? |
| Too many alerts | Adjust the filters in the docker ps command |

---

## Related pages

### Workflows
- [Notification Hub](/fr/workflows/notification-hub/): notification routing
- [Docker Auto-Updates](/fr/workflows/docker-updates/): image updates

### Infrastructure
- [Monitoring Stack](/fr/infrastructure/monitoring-stack/): Prometheus & Grafana
- [Security Stack](/fr/infrastructure/security-stack/): Caddy & CrowdSec

## Agent metadata

- This article comes from the GuiGPaP Lab blog.
- Global context of the blog: https://blog.guigpap.com/llms.txt
- Author contact: https://odoo.guigpap.com/mon-cv
- License: CC-BY-SA 4.0