
Conversational System

Beyond simple commands (/docker status, /n8n workflows), some requests require a real conversation: “find the Dupont contact in Odoo, then show me his unpaid invoices, and create a follow-up project”. This requires context, memory between messages, and the ability to execute multi-step actions.

The Conversational System turns the Telegram bot into an AI assistant capable of holding multi-turn conversations, calling tools (Docker, Odoo, N8N, web search), executing multi-step plans, and dynamically selecting MCP tools — all with persistent memory through Redis.

| Layer | Components | Role |
| --- | --- | --- |
| Conversations | Agent (20n), Manager (20n), Command Handlers (35n), Callback Handler (94n), Summarizer (12n) | Conversation lifecycle, routing, memory |
| Tools | Tool Router (10n), Web Search (6n), Lead Intelligence (11n) | Action execution and enrichment |
| Plans & MCP | Plan Engine (55n), MCP Menu, MCP Confirmation (3n) | Multi-step actions and advanced tools |

Diagram: Conversation Agent (20 nodes). Flow: Text message (active conversation) → Build System Prompt → Call Claude (1st pass) → Response type → Claude 2nd pass (summarise) → Telegram send. Response type branches: plain text; tool block (Tool Router, 10n); plan block (Plan Engine, 55n); mcp block (cli-ollama).


Before the conversational system, every interaction with the bot was atomic. You sent a command, you received a reply, and the bot forgot everything. For complex tasks, you had to chain commands manually by copying the output of one step into the next.

| Problem | Without conversations | With conversations |
| --- | --- | --- |
| No memory | Bot forgets between messages | Persistent Redis session |
| Atomic actions | One command = one action | Automatic multi-step plans |
| No context | "show his invoices" → whose? | The AI keeps the thread of the discussion |
| Limited tools | Fixed commands /docker, /n8n | Natural language + dynamic tool calls |
| No search | No web access | Gemini grounding + lead intelligence |

Why a 2-pass LLM flow rather than a single call?

| Approach | Advantage | Drawback |
| --- | --- | --- |
| 1 pass | Faster | The AI can only talk, not act |
| 2 passes | Tool detection → execution → summary | 2x LLM latency on tool calls |

The 2nd pass is essential: it lets Claude reformulate technical results (Docker JSON, Odoo XML-RPC) into a readable human reply.
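As a rough illustration of the trade-off in the table above, here is a minimal sketch of a two-pass flow. The names `answer`, `call_llm` and `run_tool` are hypothetical and do not come from the actual workflows; the sketch only shows that the second LLM call happens when, and only when, the first reply contains a tool block.

```python
from typing import Callable

FENCE = "`" * 3            # three backticks, built at runtime
TOOL_OPEN = FENCE + "tool"  # assumed opening marker of a tool block


def answer(message: str,
           call_llm: Callable[[str], str],
           run_tool: Callable[[str], str]) -> str:
    """Return the final reply, using a second LLM call only after a tool run."""
    first = call_llm(message)
    if TOOL_OPEN not in first:
        return first  # plain text: a single pass is enough

    # Extract the body between the opening marker and the closing fence.
    body = first.split(TOOL_OPEN, 1)[1].split(FENCE, 1)[0].strip()
    raw = run_tool(body)

    # Second pass: restrictive prompt so the model only rephrases the result.
    prompt = ("Rewrite this raw tool output as a short reply. "
              "Do not emit tool blocks.\n" + raw)
    return call_llm(prompt)
```

With a plain-text first reply the function makes one LLM call; with a tool block it makes two, which is where the doubled latency comes from.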


Seven commands manage conversations:

| Command | Action |
| --- | --- |
| /new | Creates a conversation (archives the previous one if active) |
| /conv | Lists conversations with pagination |
| /endconv | Archives the active conversation |
| /model | Changes the LLM model (Codex Max/Standard/Mini, Gemini Pro/Flash) |
| /plan | Shows the status of the current plan |
| /templates | Lists the 5 available templates |
| /mcp | Configures MCP tools for this conversation |

Two Data Tables manage the state:

  • conversations — Title, model, template, status, turn counter, MCP config, associated plan
  • active_conversations — One row per user, points to the current conversation

When a text message arrives and a conversation is active, the Orchestrator short-circuits the normal routing (AI Router, Command Router) and sends the message directly to the Conversation Agent.
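The short-circuit can be pictured as a lookup in `active_conversations` before any other routing. The sketch below is an assumption-laden illustration: the function name `route`, the returned labels, and the choice to keep sending slash commands to the Command Router even during a conversation are all hypothetical, not taken from the Orchestrator workflow.

```python
def route(text: str, user_id: int, active_conversations: dict[int, str]) -> str:
    """Return a label for the workflow that should handle the message."""
    is_command = text.startswith("/")

    # Assumption: commands such as /endconv must stay reachable mid-conversation.
    if is_command:
        return "command_router"

    # One row per user: its presence means a conversation is active.
    if user_id in active_conversations:
        return "conversation_agent"

    return "ai_router"
```

For example, `route("hello", 1, {1: "conv-42"})` selects the Conversation Agent, while the same text from a user without an active row falls through to the AI Router.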

When you write “what are the active projects in Odoo?” in an active conversation:

1. Interception — The Orchestrator detects the active conversation and routes to the Conversation Agent.

2. Retry check — The system checks whether a plan is awaiting correction (escalation). If so, the message is treated as a correction, not as a new message.

3. Prompt build — The system prompt is built according to the active template, with descriptions of available tools and MCP instructions if active.

4. First Claude call — The LLM receives the message with the full session context (via Redis). It can answer in plain text, or generate a tool block:

| Block | Routing |
| --- | --- |
| ```tool | Docker, Odoo, N8N, web search call |
| ```plan | Multi-step plan creation |
| ```mcp | Direct N8N MCP tool call |

5. Execution — Depending on the detected type, the system routes to the Tool Router, the Plan Engine, or the MCP gateway.

6. Second Claude call — The raw results are reinjected into the context, and Claude generates a readable response. A restrictive prompt prevents new tool blocks at this stage.

7. Send — The reply is sent to Telegram (truncated to 4000 characters if needed).

8. Compression — If the turn counter exceeds 40, the Conversation Summarizer automatically compresses history: old messages are summarised in one paragraph, and the last 10 messages are kept intact.
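Steps 4, 5 and 7 can be sketched as a classifier over the first-pass reply plus a truncation helper. This is an illustration only: the regular expression, the route labels and the function names are assumptions, and the real workflow may detect blocks differently. The 4000-character limit is the one given in step 7.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime
BLOCK = re.compile(FENCE + r"(tool|plan|mcp)[ \t]*\n(.*?)" + FENCE, re.DOTALL)

# Hypothetical labels for the three execution targets named in step 5.
ROUTES = {"tool": "tool_router", "plan": "plan_engine", "mcp": "mcp_gateway"}
TELEGRAM_LIMIT = 4000


def classify(reply: str) -> tuple[str, str]:
    """Return (route, payload); a reply without a block goes straight to Telegram."""
    match = BLOCK.search(reply)
    if match is None:
        return "telegram", reply
    return ROUTES[match.group(1)], match.group(2).strip()


def truncate(text: str, limit: int = TELEGRAM_LIMIT) -> str:
    """Cut the reply to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."
```

Only the first block in a reply is considered here; handling several blocks in one reply would be a separate design decision.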

When Claude generates a tool block, the Conversation Tool Router dispatches to the right service:

| Service | Handler | Sample actions |
| --- | --- | --- |
| docker | Service Handler Docker | status, restart, logs, update |
| odoo | Service Handler Odoo | search_contact, search_invoice, list_projects |
| n8n | Service Handler N8N | list_workflows, list_executions, toggle |
| web_search | Conversation Web Search | Gemini search with sources |
| lead_intelligence | Conversation Lead Intelligence | Contact enrichment (Odoo + web) |

The response is normalised with multi-signal failure detection (error field, success === false, HTTP status >= 400, empty text).
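A minimal sketch of what multi-signal failure detection could look like, assuming each handler returns a dictionary. The key names `status` and `text`, the output shape, and the rule that any single signal marks the call as failed are assumptions made for the example.

```python
from typing import Any


def normalise(response: dict[str, Any]) -> dict[str, Any]:
    """Collapse a handler response into {ok, text, reasons}."""
    text = str(response.get("text") or "").strip()
    reasons: list[str] = []

    if response.get("error"):
        reasons.append("error field")
    if response.get("success") is False:  # explicit False only, not a missing key
        reasons.append("success is false")

    status = response.get("status")
    if isinstance(status, int) and status >= 400:
        reasons.append(f"HTTP {status}")

    if not text:
        reasons.append("empty text")

    return {"ok": not reasons, "text": text, "reasons": reasons}
```

Collecting every matching signal, rather than stopping at the first, gives the second LLM pass more to work with when it explains the failure to the user.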

The Plan Engine handles requests requiring multiple coordinated steps. When Claude detects that a task is too complex for a single tool call, it generates a plan:

User: "Check the status of all Docker stacks, and restart the ones that are down"

Claude generates a plan:
  Step 1: docker status (all stacks)
  Step 2: analyse results → identify the down ones
  Step 3: docker restart (down stacks)

The plan is presented with buttons:
  [Execute] [Modify] [Cancel]

Each step can reference earlier results through variables ({{step_0_result}}). If a step fails, the system escalates with three options:

| Action | Behaviour |
| --- | --- |
| [Skip] | Skips the step and, by transitive cascade, cancels every step that depends on it |
| [Modify] | Asks for a textual correction, then resumes |
| [Cancel] | Cancels all remaining steps |
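The variable references and the [Skip] cascade can be sketched together. The example assumes that each step is a plain string template and that a step only references earlier steps, which matches a linear plan; the function names are hypothetical.

```python
import re

# Matches the {{step_N_result}} placeholders described above.
VAR = re.compile(r"\{\{step_(\d+)_result\}\}")


def render(template: str, results: dict[int, str]) -> str:
    """Replace each placeholder with the stored result of that step.

    Raises KeyError if a referenced step has no result yet.
    """
    return VAR.sub(lambda m: results[int(m.group(1))], template)


def dependencies(template: str) -> set[int]:
    """Indices of the steps this template reads from."""
    return {int(n) for n in VAR.findall(template)}


def skip_cascade(steps: list[str], skipped: int) -> set[int]:
    """Return the skipped step plus every later step that depends on it."""
    cancelled = {skipped}
    # One forward pass suffices because references only point backwards.
    for index in range(skipped + 1, len(steps)):
        if dependencies(steps[index]) & cancelled:
            cancelled.add(index)
    return cancelled
```

Skipping step 0 of a plan where step 1 reads `{{step_0_result}}` and step 2 reads `{{step_1_result}}` cancels all three, while an unrelated later step keeps running.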

The /mcp command opens a menu allowing you to enable or disable MCP (Model Context Protocol) tools for the current conversation. Twenty N8N tools are available, split into two categories:

| Category | Count | Behaviour |
| --- | --- | --- |
| Read-only | 12 | Direct execution (list_workflows, get_workflow, search_nodes…) |
| Write | 8 | Mandatory Telegram confirmation before execution |

The menu shows each server with its ON/OFF status and the count of active tools. Tools can be enabled/disabled at the granularity of an individual tool or in bulk per server.

Four security layers protect operations:

  1. User whitelist — Only tools explicitly enabled in the MCP config are accessible
  2. mcp_enabled flag — When disabled, the CLI runs without any MCP server
  3. allowed_tools filter — The cli-ollama gateway blocks unauthorised calls (403)
  4. Telegram confirmation — Write tools trigger a confirmation workflow with [Approve] / [Reject] buttons and a 90-second timeout
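The four layers can be read as successive checks on a single tool call. The sketch below is illustrative: `McpConfig`, `authorise`, `resolve_confirmation` and the returned labels are hypothetical, the write tool `update_workflow` is an invented name, and treating a timeout as a rejection is an assumption. In the real system the `allowed_tools` check is enforced by the cli-ollama gateway with a 403.

```python
from dataclasses import dataclass, field
from typing import Optional

CONFIRM_TIMEOUT_S = 90  # confirmation window stated above


@dataclass
class McpConfig:
    mcp_enabled: bool = False
    allowed_tools: set[str] = field(default_factory=set)


def authorise(tool: str, config: McpConfig, write_tools: set[str]) -> str:
    """Return 'blocked', 'confirm' or 'execute' for one MCP tool call."""
    if not config.mcp_enabled:
        return "blocked"   # no MCP server is attached at all
    if tool not in config.allowed_tools:
        return "blocked"   # the gateway would answer 403 here
    if tool in write_tools:
        return "confirm"   # wait for [Approve] / [Reject] in Telegram
    return "execute"       # read-only tools run directly


def resolve_confirmation(choice: Optional[str], elapsed_s: float) -> str:
    """Approve only on an explicit 'approve' inside the timeout window."""
    if choice == "approve" and elapsed_s <= CONFIRM_TIMEOUT_S:
        return "approved"
    return "rejected"      # assumption: timeout and silence count as rejection
```

Defaulting to "blocked" and "rejected" keeps the failure mode conservative: a missing flag or a late button press never results in a write.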

Conversation memory is managed by Redis through the memory_service of CLI Ollama. Each conversation has a unique session_id that serves as the Redis key.

When the turn count exceeds 40 (~20 exchanges), the Conversation Summarizer kicks in:

  1. Fetches all session messages via the REST API
  2. Separates the last 10 messages (to keep intact)
  3. Summarises the older messages into one paragraph (Claude, 200 words max)
  4. Empties the Redis session and reinjects the summary + recent messages

The compression is transparent to the user and lets a conversation run for hours within the context limit, at the cost of older details being reduced to the summary.


| Limit | Impact | Mitigation |
| --- | --- | --- |
| 2-pass latency | 5-10s per message with tool | Acceptable for personal usage |
| No streaming | Reply arrives in one block | "typing…" indicator during processing |
| Linear plans | No parallel steps | Sufficient for current use cases |
| Compression at 40 turns | Loss of older details | Faithful summary, last 10 messages intact |

If multi-user conversations are needed:

  • Shared conversations with permissions
  • Action history visible to the team
  • Plan assignment to specific members

If more tools are needed:

  • Add a service to the Tool Router (a new case in the Switch)
  • Or enable an additional MCP server
  • The system is extensible by design

If more complex plans are needed:

  • Add support for parallel steps (concurrent execution)
  • Allow nested sub-plans
  • Integrate a per-step approval system for critical actions