Conversational System
1. What? — Definition and context
Section titled “1. What? — Definition and context”Beyond simple commands (/docker status, /n8n workflows), some requests require a real conversation: “find the Dupont contact in Odoo, then show me his unpaid invoices, and create a follow-up project”. This requires context, memory between messages, and the ability to execute multi-step actions.
The Conversational System turns the Telegram bot into an AI assistant capable of holding multi-turn conversations, calling tools (Docker, Odoo, N8N, web search), executing multi-step plans, and dynamically selecting MCP tools — all with persistent memory through Redis.
The three layers of the system
Section titled “The three layers of the system”| Layer | Components | Role |
|---|---|---|
| Conversations | Agent (20n), Manager (20n), Command Handlers (35n), Callback Handler (94n), Summarizer (12n) | Conversation lifecycle, routing, memory |
| Tools | Tool Router (10n), Web Search (6n), Lead Intelligence (11n) | Action execution and enrichment |
| Plans & MCP | Plan Engine (55n), MCP Menu, MCP Confirmation (3n) | Multi-step actions and advanced tools |
Overall architecture
Section titled “Overall architecture”2. Why? — Stakes and motivations
Section titled “2. Why? — Stakes and motivations”Before the conversational system, every interaction with the bot was atomic. You sent a command, you received a reply, and the bot forgot everything. For complex tasks, you had to chain commands manually by copying the output of one step into the next.
Problems solved
Section titled “Problems solved”| Problem | Without conversations | With conversations |
|---|---|---|
| No memory | Bot forgets between messages | Persistent Redis session |
| Atomic actions | One command = one action | Automatic multi-step plans |
| No context | ”show his invoices” → whose? | The AI keeps the thread of the discussion |
| Limited tools | Fixed commands /docker, /n8n | Natural language + dynamic tool calls |
| No search | No web access | Gemini grounding + lead intelligence |
Architecture choices
Section titled “Architecture choices”Why a 2-pass LLM flow rather than a single call?
| Approach | Advantage | Drawback |
|---|---|---|
| 1 pass | Faster | The AI can only talk, not act |
| 2 passes | Tool detection → execution → summary | 2x LLM latency on tool calls |
The 2nd pass is essential: it lets Claude reformulate technical results (Docker JSON, Odoo XML-RPC) into a readable human reply.
3. How? — Technical implementation
Section titled “3. How? — Technical implementation”Lifecycle management
Section titled “Lifecycle management”Seven commands manage conversations:
| Command | Action |
|---|---|
/new | Creates a conversation (archives the previous one if active) |
/conv | Lists conversations with pagination |
/endconv | Archives the active conversation |
/model | Changes the LLM model (Codex Max/Standard/Mini, Gemini Pro/Flash) |
/plan | Shows the status of the current plan |
/templates | Lists the 5 available templates |
/mcp | Configures MCP tools for this conversation |
Two Data Tables manage the state:
- conversations — Title, model, template, status, turn counter, MCP config, associated plan
- active_conversations — One row per user, points to the current conversation
When a text message arrives and a conversation is active, the Orchestrator short-circuits the normal routing (AI Router, Command Router) and sends the message directly to the Conversation Agent.
A message’s journey
Section titled “A message’s journey”When you write “what are the active projects in Odoo?” in an active conversation:
1. Interception — The Orchestrator detects the active conversation and routes to the Conversation Agent.
2. Retry check — The system checks whether a plan is awaiting correction (escalation). If so, the message is treated as a correction, not as a new message.
3. Prompt build — The system prompt is built according to the active template, with descriptions of available tools and MCP instructions if active.
4. First Claude call — The LLM receives the message with the full session context (via Redis). It can answer in plain text, or generate a tool block:
| Block | Routing |
|---|---|
```tool | Docker, Odoo, N8N, web search call |
```plan | Multi-step plan creation |
```mcp | Direct N8N MCP tool call |
5. Execution — Depending on the detected type, the system routes to the Tool Router, the Plan Engine, or the MCP gateway.
6. Second Claude call — The raw results are reinjected into the context, and Claude generates a readable response. A restrictive prompt prevents new tool blocks at this stage.
7. Send — The reply is sent to Telegram (truncated to 4000 characters if needed).
8. Compression — If the turn counter exceeds 40, the Conversation Summarizer automatically compresses history: old messages are summarised in one paragraph, and the last 10 messages are kept intact.
The Tool Router
Section titled “The Tool Router”When Claude generates a tool block, the Conversation Tool Router dispatches to the right service:
| Service | Handler | Sample actions |
|---|---|---|
docker | Service Handler Docker | status, restart, logs, update |
odoo | Service Handler Odoo | search_contact, search_invoice, list_projects |
n8n | Service Handler N8N | list_workflows, list_executions, toggle |
web_search | Conversation Web Search | Gemini search with sources |
lead_intelligence | Conversation Lead Intelligence | Contact enrichment (Odoo + web) |
The response is normalised with multi-signal failure detection (error field, success === false, HTTP status >= 400, empty text).
The Plan Engine
Section titled “The Plan Engine”The Plan Engine handles requests requiring multiple coordinated steps. When Claude detects that a task is too complex for a single tool call, it generates a plan:
User: "Check the status of all Docker stacks, and restart the ones that are down"
Claude generates a plan: Step 1: docker status (all stacks) Step 2: analyse results → identify the down ones Step 3: docker restart (down stacks)
The plan is presented with buttons:[Execute] [Modify] [Cancel]Each step can reference earlier results through variables ({{step_0_result}}). If a step fails, the system escalates with three options:
| Action | Behaviour |
|---|---|
| [Skip] | Skip the step + transitive cascade (cancels dependencies) |
| [Modify] | Asks for a textual correction, then resumes |
| [Cancel] | Cancels all remaining steps |
The MCP system
Section titled “The MCP system”The /mcp command opens a menu allowing you to enable or disable MCP (Model Context Protocol) tools for the current conversation. Twenty N8N tools are available, split into two categories:
| Category | Count | Behaviour |
|---|---|---|
| Read-only | 12 | Direct execution (list_workflows, get_workflow, search_nodes…) |
| Write | 8 | Mandatory Telegram confirmation before execution |
The menu shows each server with its ON/OFF status and the count of active tools. Tools can be enabled/disabled at the granularity of an individual tool or in bulk per server.
Four security layers protect operations:
- User whitelist — Only tools explicitly enabled in the MCP config are accessible
- mcp_enabled flag — When disabled, the CLI runs without any MCP server
- allowed_tools filter — The cli-ollama gateway blocks unauthorised calls (403)
- Telegram confirmation — Write tools trigger a confirmation workflow with [Approve] / [Reject] buttons and a 90-second timeout
Memory and compression
Section titled “Memory and compression”Conversation memory is managed by Redis through the memory_service of CLI Ollama. Each conversation has a unique session_id that serves as the Redis key.
When the turn count exceeds 40 (~20 exchanges), the Conversation Summarizer kicks in:
- Fetches all session messages via the REST API
- Separates the last 10 messages (to keep intact)
- Summarises the older messages into one paragraph (Claude, 200 words max)
- Empties the Redis session and reinjects the summary + recent messages
The user sees nothing — the compression is transparent and allows conversations that last hours without quality degradation.
4. What if? — Outlook and limits
Section titled “4. What if? — Outlook and limits”Current limits
Section titled “Current limits”| Limit | Impact | Mitigation |
|---|---|---|
| 2-pass latency | 5-10s per message with tool | Acceptable for personal usage |
| No streaming | Reply arrives in one block | ”typing…” indicator during processing |
| Linear plans | No parallel steps | Sufficient for current use cases |
| Compression at 40 turns | Loss of older details | Faithful summary, last 10 messages intact |
Evolution scenarios
Section titled “Evolution scenarios”If multi-user conversations are needed:
- Shared conversations with permissions
- Action history visible to the team
- Plan assignment to specific members
If more tools are needed:
- Add a service to the Tool Router (a new case in the Switch)
- Or enable an additional MCP server
- The system is extensible by design
If more complex plans are needed:
- Add support for parallel steps (concurrent execution)
- Allow nested sub-plans
- Integrate a per-step approval system for critical actions
Related pages
Section titled “Related pages”Infrastructure
Section titled “Infrastructure”- AI Stack — CLI Ollama, Redis, session memory
- N8N in queue mode — Backend workers
Workflows
Section titled “Workflows”- Telegram Orchestrator — Message and callback routing
- Voice Transcription — Voice messages in conversations
- Global Error Handler — Workflow error handling
Reference
Section titled “Reference”- Glossary — MCP, Session, Multi-turn