Docker Auto-Updates

1. What? — Definition and context

The Docker Auto-Updates workflow automates Docker image updates on the VPS. DIUN (Docker Image Update Notifier) detects new image versions, the N8N hub workflow classifies each image (critical / base / app), and a sub-workflow runs the full cycle: backup, pull/build, health check, rollback if needed, notification.

Refactored architecture (#275)

Workflow	ID	Nodes	Role
Docker DIUN (parent)	`WdSepRkceMzI0QQ5`	36	Triggers, per-category routing, classification
DIUN Update Executor (SW-1)	`VSFJv2DbdkvQ2cl1`	25	Shared lifecycle: backup, update, health check, rollback, notify
DIUN Queue Processor (SW-2)	`Eb3QIk8DnEQIXeSl`	6	03h-05h queue processing loop
DIUN Approval Handler (SW-3)	`JLsR7JSMAQDwzqEX`	20	Classification, approval, rejection, deferred Telegram report

Before #275, this was a single monolithic workflow of 101 nodes. Splitting it into a hub plus 3 sub-workflows lets each cycle be tested in isolation and lets the DIUN Update Executor be reused from other triggers (manual rebuild, file provider).

Architecture diagram

Image categories

Category	Behaviour	Examples
critical	Immediate update + health check	caddy, crowdsec, security-stack
base	Queued for the 03h-05h window	postgres, redis, prometheus
app	Admin approval required via Telegram	n8n, odoo, grafana, ai-stack
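
The routing implied by the table can be sketched as a small lookup. This is only an illustration in Python, not the hub's actual node logic: the `Action` enum and the `route` function are hypothetical names, and falling back to approval for an unknown category is an assumption.

```python
from enum import Enum


class Action(Enum):
    UPDATE_NOW = "update_now"              # critical: immediate update + health check
    QUEUE_FOR_WINDOW = "queue_for_window"  # base: wait for the 03h-05h window
    ASK_APPROVAL = "ask_approval"          # app: admin approval via Telegram


# Category names come from the table above; the mapping itself is illustrative.
ROUTES = {
    "critical": Action.UPDATE_NOW,
    "base": Action.QUEUE_FOR_WINDOW,
    "app": Action.ASK_APPROVAL,
}


def route(category: str) -> Action:
    """Return the action for a category, defaulting to the most cautious one."""
    # Assumption: an unknown category falls back to human approval.
    return ROUTES.get(category.strip().lower(), Action.ASK_APPROVAL)
```

Defaulting to approval means a missing or mistyped policy never results in an unattended update.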

2. Why? — Stakes and motivations

Problems solved

Problem	Without this workflow	With this workflow
Outdated images	Unpatched vulnerabilities	Policy-driven automatic updates
User downtime	Updates during business hours	Nightly maintenance window
Risky updates	No prior validation	Approval required for business-critical apps (app category)
Manual rollback	Late intervention	Health check + automatic rollback
No version visibility	“Something changed”	Before/after version tracking in the notification

Why three categories?

Category	Justification
critical	Security priority, immediate update (Caddy = entry point)
base	Stable infrastructure, nightly update to minimise impact
app	Business-critical, human validation required

Why a shared Update Executor sub-workflow?

The update cycle (backup → pull/build → up → health check → rollback / notify) is identical regardless of trigger. Extracting it into SW-1 enables:

Benefit	Detail
Reuse	Same guarantees for DIUN, file provider, manual rebuild
Isolated tests	Can be tested with mocks, independently of the trigger
Maintenance	Single place to change the health-check strategy

3. How? — Technical implementation

Data Tables

`image_policies` — Per-image policy

Column	Type	Description
`image_key`	Text	Unique key: `project/service`
`category`	Text	critical / base / app
`backup`	Boolean	Backup required before update
`custom_build`	Boolean	Custom image (build vs pull)
`github_repo`	Text	owner/repo for changelog
`compose_dir`	Text	Absolute path of the docker-compose
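
To make the row shape concrete, here is a minimal Python sketch of one `image_policies` row and a lookup by `project/service`. The field names mirror the columns above; the `ImagePolicy` class, the `find_policy` helper and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagePolicy:
    image_key: str     # unique key, "project/service"
    category: str      # "critical", "base" or "app"
    backup: bool       # backup required before update
    custom_build: bool # build instead of pull
    github_repo: str   # "owner/repo", used for the changelog
    compose_dir: str   # absolute path of the docker-compose

    @property
    def project(self) -> str:
        return self.image_key.split("/", 1)[0]

    @property
    def service(self) -> str:
        return self.image_key.split("/", 1)[1]


def find_policy(policies, project, service):
    """Return the policy whose image_key is 'project/service', or None."""
    key = f"{project}/{service}"
    return next((p for p in policies if p.image_key == key), None)


# Hypothetical row; only the key and category appear elsewhere on this page.
example = ImagePolicy(
    "security-stack/caddy", "critical", True, True, "owner/repo", "/path/to/stack"
)
```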

`pending_updates` — 03h-05h queue

Column	Type	Description
`image_key`	Text	Unique key
`image`	Text	Full image name
`project`	Text	Project name
`service`	Text	Service name
`custom_build`	Boolean	Build instead of pull
`created_at`	Text	ISO timestamp
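
A queue processor of this kind needs a window test and an ordering rule. The Python sketch below assumes the window is evaluated in the server's local time, that 05:00 itself is excluded, and that entries are processed oldest first; none of these details are stated above, and the function names are hypothetical.

```python
from datetime import datetime, time

WINDOW_START = time(3, 0)  # 03h
WINDOW_END = time(5, 0)    # 05h (assumed exclusive)


def in_maintenance_window(now: datetime) -> bool:
    """True when `now` falls inside the 03h-05h maintenance window."""
    return WINDOW_START <= now.time() < WINDOW_END


def due_updates(rows, now: datetime):
    """Return queued rows to process now, oldest first, or [] outside the window."""
    if not in_maintenance_window(now):
        return []
    # created_at is stored as ISO text, so string order matches time order
    # as long as every row uses the same format and timezone.
    return sorted(rows, key=lambda row: row["created_at"])
```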

`pending_updates_approvals` — Pending approvals

App approvals waiting for a Telegram answer. Entries have a TTL of 7 days; once an entry has expired, the approval is requested again automatically at the next scan.
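
The 7-day expiry can be expressed as a simple timestamp comparison. This Python sketch assumes each approval row stores an ISO `created_at` timestamp, like `pending_updates` does (the columns of this table are not listed here), and that naive timestamps are in UTC; `is_expired` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

APPROVAL_TTL = timedelta(days=7)


def is_expired(created_at: str, now: datetime) -> bool:
    """True when an approval requested at `created_at` is older than the TTL."""
    created = datetime.fromisoformat(created_at)
    if created.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        created = created.replace(tzinfo=timezone.utc)
    return now - created > APPROVAL_TTL
```

`now` must be timezone-aware, for example `datetime.now(timezone.utc)`. An approval that is exactly 7 days old is still considered valid in this sketch.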

Generated Docker commands

# Standard image
docker compose -f /path/to/stack/docker-compose.yaml pull
docker compose -f /path/to/stack/docker-compose.yaml up -d

# Custom image (custom_build = true)
docker compose -f /path/to/stack/docker-compose.yaml pull --ignore-buildable
docker compose -f /path/to/stack/docker-compose.yaml build --no-cache
docker compose -f /path/to/stack/docker-compose.yaml up -d
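
One way such command lists could be generated from a policy row is sketched below in Python. The function name `update_commands` is hypothetical, and the sketch takes the full compose file path as input; how the real workflow derives that path from `compose_dir` is not specified here.

```python
from __future__ import annotations


def update_commands(compose_file: str, custom_build: bool) -> list[list[str]]:
    """Return the docker compose commands to run, in order, as argument lists."""
    base = ["docker", "compose", "-f", compose_file]
    if custom_build:
        return [
            base + ["pull", "--ignore-buildable"],  # refresh images that are not built locally
            base + ["build", "--no-cache"],         # rebuild the custom image from scratch
            base + ["up", "-d"],
        ]
    return [base + ["pull"], base + ["up", "-d"]]
```

Returning argument lists rather than one shell string avoids quoting problems if the commands are later passed to something like `subprocess.run`.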

Approval flow (app images)

Post-update health check

After every update, SW-1 verifies for 2 minutes that containers are healthy:

docker compose -f /path/to/stack/docker-compose.yaml ps --format json

If a container is still unhealthy after 120 s, a rollback is triggered: the previous image (identified by its digest) is pulled again, then the stack is restarted with docker compose up -d. A critical notification with the diagnostic details is sent to the Hub.
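
The polling loop behind such a check might look like the Python sketch below. It is not SW-1's actual logic. It assumes the JSON objects carry `Name`, `State` and `Health` fields and that, depending on the Compose version, the output is either a JSON array or one JSON object per line. The 10 s polling interval and all function names are assumptions.

```python
import json
import time


def parse_ps_output(raw: str) -> list:
    """Parse `docker compose ps --format json` output (array or one object per line)."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):
        return json.loads(raw)
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def not_ready(containers) -> list:
    """Names of containers that are not running or whose health check is not passing."""
    pending = []
    for c in containers:
        running = c.get("State") == "running"
        # An empty Health value means the container defines no health check.
        healthy = c.get("Health", "") in ("", "healthy")
        if not (running and healthy):
            pending.append(c.get("Name", "?"))
    return pending


def wait_until_healthy(fetch, timeout_s=120, interval_s=10,
                       sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll `fetch()` until every container is ready; False means roll back."""
    deadline = clock() + timeout_s
    while True:
        if not not_ready(parse_ps_output(fetch())):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

`fetch` is any callable returning the raw command output, which keeps the loop testable without Docker; `sleep` and `clock` are injectable for the same reason.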

Docker self-update (n8n-stack)

Updating N8N itself is tricky: the update workflow runs in the container it updates. The self-restart pattern in use:

Step	Action
1	Insert maintenance flag in `error_handling_config` (suppress notifications)
2	Trigger an external script via `nohup` that waits 10 s, then runs `docker compose up -d --force-recreate`
3	The N8N workflow stops (the container restarts)
4	On restart, a cleanup workflow removes the maintenance flag and notifies success
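
Step 2 depends on the restart command outliving the process that launched it. The Python sketch below only builds such a detached command line; it is an assumption about the shape of the external script, not its real content. `detached_restart_command` is a hypothetical name, and the command is assumed to run on the host rather than inside the N8N container.

```python
import shlex


def detached_restart_command(compose_file: str, delay_s: int = 10) -> str:
    """Build a shell line that returns immediately and recreates the stack after a delay."""
    inner = (
        f"sleep {int(delay_s)} && "
        f"docker compose -f {shlex.quote(compose_file)} up -d --force-recreate"
    )
    # nohup plus "&" detaches the job, and the redirections keep it from
    # holding the caller's output streams open.
    return f"nohup sh -c {shlex.quote(inner)} >/dev/null 2>&1 &"
```

The delay gives the workflow time to finish its current node and persist the maintenance flag before the container is replaced.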

Version tracking

Every update captures the before and after versions for traceability. Markers in the SSH output are used to extract the versions from the Dockerfile or from the image:tag, and the result is reported in this format:

n8n-custom:latest 2.4.8 → 2.5.0
caddy-crowdsec:latest 2.8.4 → 2.8.5

Telegram notifications display the version transition, so the nature of the change (patch, minor, major) is visible at a glance.
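
Classifying a transition as patch, minor or major could be automated along these lines. This Python sketch assumes versions follow a `MAJOR.MINOR.PATCH` pattern and that lines use the `name before → after` format shown above; the function names are hypothetical.

```python
import re


def parse_version(text: str) -> tuple:
    """Extract the first MAJOR.MINOR.PATCH triple from a string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no semantic version in {text!r}")
    return tuple(int(part) for part in match.groups())


def bump_type(before: str, after: str) -> str:
    """Classify a version transition."""
    old, new = parse_version(before), parse_version(after)
    if new == old:
        return "none"
    if new < old:
        return "downgrade"
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "patch"


def describe(line: str) -> tuple:
    """Split 'image:tag 1.2.3 → 1.2.4' into (image, bump type)."""
    name, versions = line.split(" ", 1)
    before, after = (v.strip() for v in versions.split("→"))
    return name, bump_type(before, after)
```

Applied to the two sample lines above, `describe` classifies the n8n-custom transition as minor and the caddy-crowdsec transition as patch.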

File Provider Auto-Rebuild

When a base image changes (node:20-slim, caddy:builder…), custom images that depend on it must be rebuilt. DIUN watches those images via its file provider (reading base-images.yml), and the workflow detects the dependency.

Base image	Custom service	Category
`caddy:builder`, `caddy:latest`	security-stack/caddy	critical (immediate rebuild)
`n8nio/n8n:latest`	n8n-stack/n8n	app (approval required)
`node:20-slim`, `ghcr.io/astral-sh/uv`	ai-stack/cli-ollama	app (approval required)
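
The dependency detection amounts to a reverse lookup from base image to custom service. The Python sketch below hard-codes the table above for illustration; in practice the mapping would presumably come from configuration, and `services_to_rebuild` is a hypothetical name.

```python
# Base image -> custom services built on top of it (copied from the table above).
DEPENDENTS = {
    "caddy:builder": ["security-stack/caddy"],
    "caddy:latest": ["security-stack/caddy"],
    "n8nio/n8n:latest": ["n8n-stack/n8n"],
    "node:20-slim": ["ai-stack/cli-ollama"],
    "ghcr.io/astral-sh/uv": ["ai-stack/cli-ollama"],
}


def services_to_rebuild(changed_images):
    """Return each dependent service once, in first-seen order."""
    result = []
    for image in changed_images:
        for service in DEPENDENTS.get(image, []):
            if service not in result:
                result.append(service)
    return result
```

Deduplicating matters because several base images can feed the same service: a single scan reporting both `caddy:builder` and `caddy:latest` should trigger one rebuild, not two.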

Incident Response (AI-Assisted)

When Prometheus Alertmanager signals a problem (container down, CPU spike, disk full), an Incident Response workflow triggers a Claude diagnostic:

Severity	Behaviour
TRIVIAL	Auto-remediation (restart) + health check + notification
MODERATE	Claude proposes and applies the fix, then monitors for 5 min
COMPLEX	Claude produces a plan, waits for human approval (30-min timeout)

For COMPLEX cases, Claude produces a detailed plan sent over Telegram with [Execute] [Edit] [Ignore] buttons. This follows the same Plan Engine pattern as the Conversational system.
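
The three severity levels can be read as a small policy table. The Python sketch below encodes the durations given above; the `ResponsePolicy` class, its field names and the choice to treat an unknown severity as COMPLEX are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResponsePolicy:
    auto_apply: bool                         # fix applied without human input
    needs_approval: bool                     # plan waits for a Telegram decision
    monitor_minutes: Optional[int]           # post-fix monitoring duration
    approval_timeout_minutes: Optional[int]  # how long to wait for approval


POLICIES = {
    "TRIVIAL": ResponsePolicy(True, False, None, None),
    "MODERATE": ResponsePolicy(True, False, 5, None),
    "COMPLEX": ResponsePolicy(False, True, None, 30),
}


def policy_for(severity: str) -> ResponsePolicy:
    """Look up the policy; unknown severities are handled as COMPLEX (assumption)."""
    return POLICIES.get(severity.strip().upper(), POLICIES["COMPLEX"])
```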

Useful commands

# Force a DIUN check
docker exec diun diun --test

# DIUN logs
docker logs diun --tail 50

# View configured policies
# N8N → Data → Tables → image_policies (26 current entries)

# View pending updates
# N8N → Data → Tables → pending_updates

4. What if? — Outlook and limits

Current limits

Limit	Impact	Mitigation
No automated tests	Risk of undetected regressions	Basic health check only
Manual rollback may be needed	If the automatic rollback fails, manual intervention is required	Immediate notification with context
DIUN dependency	No detection if DIUN is down	Container monitoring via Health Check
No canary	Direct update, no progressive rollout	Acceptable on a single-VPS setup

Evolution scenarios

If an update causes a regression:

Health check detects the problem
Automatic rollback to the previous image
Notification with details for investigation

If updates are too frequent:

Tune the DIUN polling (currently 6h)
Create a “stable” category with monthly updates
Filter by semantic versioning (major only)

If deeper testing is needed:

Post-update smoke tests via the monitoring-stack
Observation delay before success notification
Dedicated staging environment for prior tests (extra VPS cost)

If a tighter CVE coupling is wanted:

Reject an update that would introduce a new unpatched CRITICAL CVE
Use the Security CVE Watch feed as gating
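
Such a gate could compare scan findings before and after the candidate update. The Python sketch below is purely illustrative of this possible evolution: the finding format (dictionaries with `id` and `severity` keys) is an assumption, since the output of Security CVE Watch is not described here, and the CVE identifiers in the usage example are placeholders.

```python
def new_critical_cves(before, after) -> set:
    """IDs of CRITICAL findings present after the update but not before."""
    def critical_ids(findings):
        return {f["id"] for f in findings if f["severity"].upper() == "CRITICAL"}
    return critical_ids(after) - critical_ids(before)


def update_allowed(before, after) -> bool:
    """Allow the update only if it introduces no new CRITICAL finding."""
    return not new_critical_cves(before, after)


# Placeholder identifiers, not real CVEs.
current = [{"id": "CVE-0000-0001", "severity": "critical"}]
candidate = current + [{"id": "CVE-0000-0002", "severity": "CRITICAL"}]
```

With these inputs, `update_allowed(current, candidate)` is False because the candidate adds one CRITICAL finding. An update that keeps or removes existing findings passes the gate.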

Infrastructure

VPS Architecture — Big picture
Notify Stack — DIUN that triggers updates
Security Stack — Caddy protected against accidental restart

Workflows

Telegram Orchestrator — Receives approvals
Notification Hub — Success/failure routing
Security CVE Watch — CVE scan coupled to pre-update gating
Error Handler — DIUN error capture

Reference

Glossary — DIUN, Health Check, Self-restart, File Provider