Monitoring Stack: Prometheus + Grafana

The Monitoring Stack delivers observability for the whole infrastructure. It collects system and application metrics, visualises them through dashboards, and triggers alerts when anomalies appear.

| Service | Port | Memory limit | Role |
| --- | --- | --- | --- |
| Prometheus | 9090 | 2 GB | Metric collection and storage (pull mode) |
| Grafana | 3000 | 1 GB | Visualisation and dashboards |
| Alertmanager | 9093 | 512 MB | Alert routing and grouping |
| Node Exporter | 9100 | 256 MB | System metrics (CPU, RAM, disk) |
| Docker Exporter | 9487 | 256 MB | Per-container metrics (CPU, RAM, state) |
| OTEL Collector | 4317/4318 (in), 8889 (scrape) | 512 MB | Claude Code telemetry ingestion |

Architecture (diagram summary):

  • Scrape targets: Node Exporter (host:9100), Docker Exporter (host:9487), OTEL Collector (host:8889), Docker Engine (host:9323)
  • Prometheus (:9090): scrape every 15s, evaluate alerting rules, store 15 days / 5 GB max
  • Grafana (:3000): dashboards for Linux System, Docker Containers, Claude Code
  • Alertmanager (:9093): group alerts, route → N8N
  • N8N webhook (/webhook/prometheus/alert) → Notification Hub → Telegram


| Problem | Without monitoring | With monitoring |
| --- | --- | --- |
| Container crash | Reported by a user | Immediate alert |
| Disk full | Service unreachable | Caught before saturation |
| Memory leak | Random OOM kills | Trend visible, preventive action |
| Claude costs | End-of-month surprise | Real-time tracking |

| Alert | Triggered by | Observed value |
| --- | --- | --- |
| ContainerDown | Service crash | Quick detection, manual or auto restart |
| Claude Code telemetry | Claude sessions | Track time spent and tokens used |
| DiskSpaceLow | Disk space < 15% | Prevention before incidents |
| HighMemoryUsage | RAM > 85% | Not yet triggered (sufficient headroom) |

prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'docker-exporter'
    static_configs:
      - targets: ['docker-exporter:9487']
  - job_name: 'docker-engine'
    static_configs:
      - targets: ['host.docker.internal:9323']
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']

# In docker-compose.yaml, Prometheus command
command:
  - '--storage.tsdb.retention.time=15d'
  - '--storage.tsdb.retention.size=5GB'

prometheus/alerts.yml
groups:
  - name: infrastructure
    rules:
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"
          description: "Memory usage: {{ $value | printf \"%.1f\" }}%"
      - alert: HighCPUUsage
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: critical
      # Note: absent() fires only when no matching series exists at all,
      # so this rule does not detect a single container stopping.
      - alert: ContainerDown
        expr: absent(container_last_seen{name!=""})
        for: 1m
        labels:
          severity: critical
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical

alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
route:
  receiver: 'n8n'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: 'n8n'
    webhook_configs:
      - url: 'http://n8n:5678/webhook/prometheus/alert'
        send_resolved: true
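To make the group_by setting concrete, here is a minimal Python sketch of how alerts that share the same alertname and severity end up in one notification group. It illustrates the bucketing idea only and is not Alertmanager's implementation; the `group_alerts` helper and the sample alerts are hypothetical.

```python
from collections import defaultdict

GROUP_BY = ("alertname", "severity")  # mirrors group_by in alertmanager.yml


def group_alerts(alerts, group_by=GROUP_BY):
    """Bucket alerts by the values of the group_by labels (illustrative only)."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # Assumption: a missing label is treated as an empty string.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical sample: two instances firing the same rule, plus one other alert.
sample = [
    {"labels": {"alertname": "ServiceDown", "severity": "critical", "instance": "node-exporter:9100"}},
    {"labels": {"alertname": "ServiceDown", "severity": "critical", "instance": "otel-collector:8889"}},
    {"labels": {"alertname": "HighMemoryUsage", "severity": "warning", "instance": "node-exporter:9100"}},
]

for key, members in group_alerts(sample).items():
    print(key, len(members))
```

With the timings configured above, Alertmanager waits group_wait (30s) before the first notification for a new group, group_interval (5m) before notifying about alerts added to an existing group, and repeat_interval (4h) before re-sending a notification that is still firing.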

The Notification Hub inspects severity to route alerts: criticals → instant Telegram, warnings → grouped.
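The severity-based routing described above can be sketched as follows. This is a minimal illustration, assuming the hub receives the standard Alertmanager webhook payload (an `alerts` array whose items carry `labels`). The `route_by_severity` function and the `instant`/`grouped` bucket names are hypothetical and do not reflect the actual N8N workflow.

```python
def route_by_severity(payload):
    """Split alerts from an Alertmanager webhook payload into delivery buckets."""
    routed = {"instant": [], "grouped": []}
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        # Assumption: anything that is not critical is handled like a warning.
        bucket = "instant" if labels.get("severity") == "critical" else "grouped"
        routed[bucket].append(labels.get("alertname", "unknown"))
    return routed


# Hypothetical payload, trimmed to the fields the sketch reads.
sample_payload = {
    "status": "firing",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "ContainerDown", "severity": "critical"}},
        {"status": "firing", "labels": {"alertname": "HighCPUUsage", "severity": "warning"}},
    ],
}

print(route_by_severity(sample_payload))
```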

Claude Code configuration to export telemetry:

~/.claude/settings.json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
    "OTEL_SERVICE_NAME": "claude-code"
  }
}
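If ~/.claude/settings.json already holds other settings, the telemetry variables can be merged in rather than overwriting the file. A minimal sketch, assuming the file is plain JSON with an optional top-level `env` object; `merge_telemetry_env` and `update_settings_file` are hypothetical helper names, and the values are copied from the configuration above.

```python
import json
from pathlib import Path

# Values copied from the settings.json shown above.
TELEMETRY_ENV = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
    "OTEL_SERVICE_NAME": "claude-code",
}


def merge_telemetry_env(settings):
    """Return a copy of settings with the telemetry variables added to env."""
    merged = dict(settings)
    # Existing env entries are kept; telemetry keys win on conflict.
    merged["env"] = {**settings.get("env", {}), **TELEMETRY_ENV}
    return merged


def update_settings_file(path=Path.home() / ".claude" / "settings.json"):
    """Read, merge, and rewrite the settings file (creates it if missing)."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_telemetry_env(current), indent=2) + "\n")
```

Calling `update_settings_file()` applies the merge in place; the function is not invoked here so the sketch has no side effects when imported.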
| Metric | Description |
| --- | --- |
| claude_code_token_usage_tokens_total | Tokens per model and type |
| claude_code_cost_usage_USD_total | Cumulative cost in USD |
| claude_code_active_time_seconds_total | Active time |
| claude_code_lines_of_code_count_total | Lines changed |

# CPU usage percentage
100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# Disk usage percentage
(1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes)) * 100
# Container memory usage (top 5)
topk(5, container_memory_usage_bytes{name!=""})
# Claude Code tokens total by model
sum(claude_code_token_usage_tokens_total) by (model)
# Claude Code cost USD
sum(claude_code_cost_usage_USD_total)

| Dashboard | Metrics |
| --- | --- |
| Linux System | CPU, RAM, disk, network, load average |
| Docker Containers | CPU/RAM per container, I/O, restarts |
| Claude Code | Tokens, costs, active time, lines of code |
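The PromQL expressions listed above can also be evaluated outside Grafana through the Prometheus HTTP API (GET /api/v1/query). A minimal sketch using only the Python standard library, assuming Prometheus is reachable at http://localhost:9090; the helper names are hypothetical, and the parser handles instant-vector results only.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: Prometheus published on the host


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_instant_vector(body):
    """Return (labels, value) pairs from an instant-vector query response."""
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    # Each series carries "value": [unix_timestamp, "<number as string>"].
    return [
        (series["metric"], float(series["value"][1]))
        for series in body["data"]["result"]
    ]


def query(promql, base_url=PROMETHEUS_URL, timeout=5):
    """Run an instant query against Prometheus and parse the result."""
    url = build_query_url(promql, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, `query("sum(claude_code_token_usage_tokens_total) by (model)")` would return one pair per model, provided the OTEL Collector target is being scraped.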

Claude Code telemetry → Odoo integration

The full pipeline goes beyond raw observability: a SessionEnd hook on the dev machine sends session metadata to N8N, which queries Prometheus for the metrics (tokens, cost, active time) and updates the matching Odoo task via XML-RPC.

~/.claude (SessionEnd hook)
    │
    ▼ POST /webhook/telemetry/session-end
N8N Telemetry workflow
    │ Query Prometheus for the session
    ▼ XML-RPC to Odoo
project.task (x_claude_time_total, x_claude_cost_total, …)

See Claude Code Telemetry for the workflow side.
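The final step of the pipeline, writing metrics to the Odoo task, can be sketched in Python for readers who want to test it outside N8N. This is an illustration under stated assumptions, not the actual workflow: it uses Odoo's external XML-RPC endpoint (/xmlrpc/2/object with execute_kw), assumes x_claude_time_total stores hours, and overwrites the fields rather than accumulating. The function names and parameters are hypothetical.

```python
import xmlrpc.client


def build_task_values(active_seconds, cost_usd):
    """Map Prometheus results to the custom Odoo fields named above."""
    return {
        # Assumption: the field stores hours; adjust if it stores seconds.
        "x_claude_time_total": round(active_seconds / 3600, 2),
        "x_claude_cost_total": round(cost_usd, 4),
    }


def update_task(odoo_url, db, uid, password, task_id, values):
    """Write values to one project.task record over XML-RPC."""
    models = xmlrpc.client.ServerProxy(f"{odoo_url}/xmlrpc/2/object")
    # execute_kw(db, uid, password, model, method, positional_args)
    return models.execute_kw(
        db, uid, password, "project.task", "write", [[task_id], values]
    )


# Hypothetical session: 90 minutes of active time.
print(build_task_values(active_seconds=5400, cost_usd=1.23456))
```

`update_task` needs a reachable Odoo instance and an authenticated `uid`, so it is defined but not called here.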

| Limit | Impact | Mitigation |
| --- | --- | --- |
| 15-day retention | No long-term history | Export to S3/Thanos if needed |
| No tracing | Limited workflow debugging | Consider Jaeger if needed |
| Single OTEL Collector | SPOF for telemetry | Acceptable for personal use |
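Because both a time limit (15 days) and a size limit (5 GB) are configured, whichever is reached first causes the oldest data to be dropped. A rough capacity estimate helps judge when the size cap would start cutting history shorter than 15 days. The sketch below applies the sizing rule of thumb from the Prometheus documentation (disk ≈ retention seconds × samples per second × bytes per sample). The 2 bytes per sample figure is an assumed average, and `max_active_series` is a hypothetical helper.

```python
RETENTION_SECONDS = 15 * 24 * 3600   # --storage.tsdb.retention.time=15d
RETENTION_BYTES = 5 * 1024**3        # --storage.tsdb.retention.size=5GB (binary units)
SCRAPE_INTERVAL_SECONDS = 15         # scrape_interval in prometheus.yml
BYTES_PER_SAMPLE = 2.0               # assumption: rough average after compression


def max_active_series(
    size_bytes=RETENTION_BYTES,
    retention_seconds=RETENTION_SECONDS,
    scrape_interval=SCRAPE_INTERVAL_SECONDS,
    bytes_per_sample=BYTES_PER_SAMPLE,
):
    """Estimate how many series fit in the size cap for the full retention time."""
    samples_per_second = size_bytes / (retention_seconds * bytes_per_sample)
    # Each series contributes one sample per scrape interval.
    return int(samples_per_second * scrape_interval)


print(max_active_series())
```

Under these assumptions the budget is on the order of 30,000 active series; beyond that, the 5 GB cap rather than the 15-day limit would determine how much history is kept.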

If history > 15 days is required:

  • Deploy Thanos for long-term storage
  • Or export snapshots to S3

If N8N workflow tracing is needed:

  • Add Jaeger or Tempo
  • Instrument N8N with OTEL traces

If metric volume explodes:

  • Raise the Prometheus retention size limit (5 GB today)
  • Consider VictoriaMetrics (more efficient)

Terminal window
# Inspect Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
# Test exporter connectivity
docker exec prometheus wget -qO- http://node-exporter:9100/metrics | head
# Check active alerts (Alertmanager API v2; the v1 API was removed in Alertmanager 0.27)
curl -s http://localhost:9093/api/v2/alerts
# Test the N8N webhook (the n8n hostname resolves only inside the Docker network)
curl -X POST http://n8n:5678/webhook/prometheus/alert \
  -H "Content-Type: application/json" \
  -d '{"alerts":[{"labels":{"alertname":"test"}}]}'
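The target health check can also be scripted. A minimal sketch that mirrors the jq filter above, assuming the response shape of GET /api/v1/targets (a `data.activeTargets` array whose items carry `labels.job` and `health`); `summarize_targets` and `unhealthy_jobs` are hypothetical helper names, and the sample payload is made up for illustration.

```python
def summarize_targets(payload):
    """Reduce a /api/v1/targets response to job and health, like the jq filter."""
    return [
        {"job": target["labels"]["job"], "health": target["health"]}
        for target in payload["data"]["activeTargets"]
    ]


def unhealthy_jobs(payload):
    """Return the sorted names of jobs with at least one target that is not up."""
    return sorted(
        {t["job"] for t in summarize_targets(payload) if t["health"] != "up"}
    )


# Hypothetical response, trimmed to the fields the sketch reads.
sample_targets = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node-exporter"}, "health": "up"},
            {"labels": {"job": "docker-engine"}, "health": "down"},
        ]
    },
}

print(unhealthy_jobs(sample_targets))
```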

  • Glossary — Prometheus, PromQL, OTEL, scrape