Conversational System

1. What? — Definition and context

Beyond simple commands (/docker status, /n8n workflows), some requests require a real conversation: “find the Dupont contact in Odoo, then show me his unpaid invoices, and create a follow-up project”. This requires context, memory between messages, and the ability to execute multi-step actions.

The Conversational System turns the Telegram bot into an AI assistant capable of holding multi-turn conversations, calling tools (Docker, Odoo, N8N, web search), executing multi-step plans, and dynamically selecting MCP tools — all with persistent memory through Redis.

The three layers of the system

Layer	Components	Role
Conversations	Agent (33n), Manager (20n), Command Handlers (35n), Callback Handler (94n), Summarizer (12n)	Conversation lifecycle, routing, memory
Tools	Tool Router (10n), Web Search (6n), Lead Intelligence (11n)	Action execution and enrichment
Plans & MCP	Plan Engine (55n), MCP Menu, MCP Confirmation (3n)	Multi-step actions and advanced tools
Telegram rendering	Codex Progress Handler (11n), Data Table `codex_progress_buffer`	Streaming edit-in-place while Codex runs

Overall architecture

2. Why? — Stakes and motivations

Before the conversational system, every interaction with the bot was atomic. You sent a command, you received a reply, and the bot forgot everything. For complex tasks, you had to chain commands manually by copying the output of one step into the next.

Problems solved

Problem	Without conversations	With conversations
No memory	Bot forgets between messages	Persistent Redis session
Atomic actions	One command = one action	Automatic multi-step plans
No context	”show his invoices” → whose?	The AI keeps the thread of the discussion
Limited tools	Fixed commands /docker, /n8n	Natural language + dynamic tool calls
No search	No web access	Gemini grounding + lead intelligence

Architecture choices

Why a 2-pass LLM flow rather than a single call?

Approach	Advantage	Drawback
1 pass	Faster	The AI can only talk, not act
2 passes	Tool detection → execution → summary	2x LLM latency on tool calls

The 2nd pass is essential: it lets Claude reformulate technical results (Docker JSON, Odoo XML-RPC) into a readable human reply.

3. How? — Technical implementation

Lifecycle management

Seven commands manage conversations:

Command	Action
`/new`	Creates a conversation (archives the previous one if active)
`/conv`	Lists conversations with pagination
`/endconv`	Archives the active conversation
`/model`	Changes the LLM model (Codex Max/Standard/Mini, Gemini Pro/Flash)
`/plan`	Shows the status of the current plan
`/templates`	Lists the 5 available templates
`/mcp`	Configures MCP tools for this conversation

Two Data Tables manage the state:

conversations — Title, model, template, status, turn counter, MCP config, associated plan
active_conversations — One row per user, points to the current conversation

When a text message arrives and a conversation is active, the Orchestrator short-circuits the normal routing (AI Router, Command Router) and sends the message directly to the Conversation Agent.

A message’s journey

When you write “what are the active projects in Odoo?” in an active conversation:

1. Interception — The Orchestrator detects the active conversation and routes to the Conversation Agent.

2. Retry check — The system checks whether a plan is awaiting correction (escalation). If so, the message is treated as a correction, not as a new message.

3. Prompt build — The system prompt is built according to the active template, with descriptions of available tools and MCP instructions if active.

4. First Claude call — The LLM receives the message with the full session context (via Redis). It can answer in plain text, or generate a tool block:

Block	Routing
```tool	Docker, Odoo, N8N, web search call
```plan	Multi-step plan creation
```mcp	Direct N8N MCP tool call

5. Execution — Depending on the detected type, the system routes to the Tool Router, the Plan Engine, or the MCP gateway.

6. Second Claude call — The raw results are reinjected into the context, and Claude generates a readable response. A restrictive prompt prevents new tool blocks at this stage.

7. Send — The reply is sent to Telegram (truncated to 4000 characters if needed).

8. Compression — If the turn counter exceeds 40, the Conversation Summarizer automatically compresses history: old messages are summarised in one paragraph, and the last 10 messages are kept intact.

The Tool Router

When Claude generates a tool block, the Conversation Tool Router dispatches to the right service:

Service	Handler	Sample actions
`docker`	Service Handler Docker	status, restart, logs, update
`odoo`	Service Handler Odoo	search_contact, search_invoice, list_projects
`n8n`	Service Handler N8N	list_workflows, list_executions, toggle
`web_search`	Conversation Web Search	Gemini search with sources
`lead_intelligence`	Conversation Lead Intelligence	Contact enrichment (Odoo + web)

The response is normalised with multi-signal failure detection (error field, success === false, HTTP status >= 400, empty text).

The Plan Engine

The Plan Engine handles requests requiring multiple coordinated steps. When Claude detects that a task is too complex for a single tool call, it generates a plan:

User: "Check the status of all Docker stacks,
       and restart the ones that are down"

Claude generates a plan:
  Step 1: docker status (all stacks)
  Step 2: analyse results → identify the down ones
  Step 3: docker restart (down stacks)

The plan is presented with buttons:
[Execute] [Modify] [Cancel]

Each step can reference earlier results through variables ({{step_0_result}}). If a step fails, the system escalates with three options:

Action	Behaviour
[Skip]	Skip the step + transitive cascade (cancels dependencies)
[Modify]	Asks for a textual correction, then resumes
[Cancel]	Cancels all remaining steps

The MCP system

The /mcp command opens a menu allowing you to enable or disable MCP (Model Context Protocol) tools for the current conversation. Twenty N8N tools are available, split into two categories:

Category	Count	Behaviour
Read-only	12	Direct execution (list_workflows, get_workflow, search_nodes…)
Write	8	Mandatory Telegram confirmation before execution

The menu shows each server with its ON/OFF status and the count of active tools. Tools can be enabled/disabled at the granularity of an individual tool or in bulk per server.

Four security layers protect operations:

User whitelist — Only tools explicitly enabled in the MCP config are accessible
mcp_enabled flag — When disabled, the CLI runs without any MCP server
allowed_tools filter — The cli-ollama gateway blocks unauthorised calls (403)
Telegram confirmation — Write tools trigger a confirmation workflow with [Approve] / [Reject] buttons and a 90-second timeout

Memory and compression

Conversation memory is managed by Redis through the memory_service of CLI Ollama. Each conversation has a unique session_id that serves as the Redis key.

When the turn count exceeds 40 (~20 exchanges), the Conversation Summarizer kicks in:

Fetches all session messages via the REST API
Separates the last 10 messages (to keep intact)
Summarises the older messages into one paragraph (Claude, 200 words max)
Empties the Redis session and reinjects the summary + recent messages

The user sees nothing — the compression is transparent and allows conversations that last hours without quality degradation.

Real-time streaming

Codex emits intermediate events (reasoning, command calls, agent messages) before producing its final reply. Previously, each event was sent as a brand new Telegram message, which spammed the chat with 2 to 5 successive messages prefixed by .... The new pipeline collapses that into a single message that edits progressively, from the initial placeholder all the way to the formatted final reply.

The pipeline relies on three components:

Component	Role
`_execute_oneshot` (cli-ollama)	Reads Codex JSONL stdout line by line via `asyncio.readline()` and emits a `progress_webhook` POST on every `item.completed` during execution (instead of flushing everything at the end)
Data Table `codex_progress_buffer`	One row per active chat: `chat_id`, placeholder `message_id`, `accumulated_text`, `last_edit_ts`. Acts as the shared buffer between Conversation Agent and Codex Progress Handler
Codex Progress Handler	N8N workflow (11 nodes) that receives each progress event, loads the buffer, accumulates the chunk, throttles to 1 edit/second, and calls `editMessageText` on the placeholder

When the accumulation exceeds 3800 characters (margin under Telegram’s 4096 limit), the handler automatically sends a new message prefixed with (suite) and resets the buffer for that new message. Subsequent chunks edit the new message.

Once the full reply is ready, the Conversation Agent turns Send Response into an editMessageText on the same placeholder — the formatted final version (with the 💬 <title> • <model> header) overwrites all the intermediate accumulation. The row in the Data Table is removed by DT Delete Buffer so we start clean on the next message.

4. What if? — Outlook and limits

Current limits

Limit	Impact	Mitigation
2-pass latency	5-10s per message with tool	Acceptable for personal usage
1s streaming throttle	Codex chunks that arrive back-to-back may not all be rendered visually	Content stays accumulated in the buffer; the next non-throttled edit includes everything
Linear plans	No parallel steps	Sufficient for current use cases
Compression at 40 turns	Loss of older details	Faithful summary, last 10 messages intact

Evolution scenarios

If multi-user conversations are needed:

Shared conversations with permissions
Action history visible to the team
Plan assignment to specific members

If more tools are needed:

Add a service to the Tool Router (a new case in the Switch)
Or enable an additional MCP server
The system is extensible by design

If more complex plans are needed:

Add support for parallel steps (concurrent execution)
Allow nested sub-plans
Integrate a per-step approval system for critical actions

Infrastructure

AI Stack — CLI Ollama, Redis, session memory
N8N in queue mode — Backend workers

Workflows

Telegram Orchestrator — Message and callback routing
Voice Transcription — Voice messages in conversations
Global Error Handler — Workflow error handling

Reference

Glossary — MCP, Session, Multi-turn