Question Hub & Vision

1. What? — Definition and context

When Claude Code (or Codex, or Gemini) needs to ask a multiple-choice question — “Which framework should we use? React, Vue, or Svelte?” — it cannot do so ergonomically inside a terminal. The Question Hub displays that question in Telegram with interactive buttons, supports multi-select and pagination, and returns the answer to the CLI.

In addition, the Vision OCR system analyses photos sent on Telegram. Business cards, invoices, screenshots, and handwritten notes each go through an extraction tailored to their document type.

Two complementary systems

System	Nodes	Trigger	Role
Question Hub	~35 (parent + callback)	CLI webhook	Interactive Telegram questions
Vision OCR	14	Sub-workflow (photo)	Document extraction

2. Why? — Stakes and motivations

Problems solved

Problem	Without these workflows	With these workflows
Unreadable CLI questions	Numbered list inside the terminal	Telegram buttons with emojis
No multi-select	Type the numbers one by one	✅ toggle and confirmation
Useless photos	Photo = binary file, no info	Type-specific structured extraction
Generic OCR	Same processing for everything	Invoice ≠ business card ≠ screenshot

Six recognised document types

The Vision OCR system classifies each photo before extracting its content:

Type	Extracted fields
business_card	Name, job title, company, email, phone
invoice	Vendor, invoice number, line items, total, date
screenshot	Visible text, identified UI
handwritten_note	Transcription, confidence, language
general_document	Structured raw text
not_document	(Not a document — photo, landscape, etc.)
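
The table above can be read as a per-type field checklist. The sketch below is only an illustration in Python (the real extraction runs inside the n8n workflow); the snake_case key names are hypothetical, since the actual JSON keys returned by each branch are not documented here.

```python
# Hypothetical key names derived from the table of document types;
# the real workflow's JSON keys may differ.
EXPECTED_FIELDS = {
    "business_card": ["name", "job_title", "company", "email", "phone"],
    "invoice": ["vendor", "number", "lines", "total", "date"],
    "screenshot": ["visible_text", "identified_ui"],
    "handwritten_note": ["transcription", "confidence", "language"],
    "general_document": ["raw_text"],
    "not_document": [],
}


def missing_fields(doc_type, extracted):
    """Return the expected fields that are absent or empty in an extraction."""
    if doc_type not in EXPECTED_FIELDS:
        raise ValueError(f"unknown document type: {doc_type}")
    return [
        name
        for name in EXPECTED_FIELDS[doc_type]
        if extracted.get(name) in (None, "", [])
    ]
```

A check like this could flag an incomplete business card (for instance, one with no phone number) before the result is sent back to Telegram.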

3. How? — Technical implementation

Question Hub: a question’s journey

1. Reception — The CLI sends a webhook with the options, the question type (single/multi-select), and a timeout (300s by default).

2. Formatting — n8n builds an inline keyboard sized to the option count. Beyond 4 options, pagination kicks in automatically (4 options per page with ◀️ ▶️ arrows).

3. Interaction — The user clicks options. In multi-select, every click toggles the ✅ and updates the keyboard in real time (via editMessageReplyMarkup). Selections are persisted in a Data Table to survive page changes.

4. Confirmation — A [Confirm] button validates the choices. The answer is returned to the CLI through the callback.

5. Free text — An optional [Other] button enables Telegram’s ForceReply mode to capture a free-form answer.
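
Steps 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not the n8n implementation: the Question fields, the callback_data strings (opt:, page:, confirm, other) and the in-memory selected set (standing in for the Data Table) are all assumptions. Only the 4-options-per-page rule, the 300-second default, and the Telegram inline_keyboard shape come from the description above.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4  # options per page, as described in step 2


@dataclass
class Question:
    text: str
    options: list
    multi: bool = False          # single-select unless stated otherwise
    timeout_s: int = 300         # default timeout from step 1
    allow_other: bool = False    # show the optional [Other] button
    selected: set = field(default_factory=set)  # stands in for the Data Table


def toggle(q, index):
    """Flip one option; single-select keeps at most one choice."""
    if not q.multi:
        q.selected = {index}
    elif index in q.selected:
        q.selected.remove(index)
    else:
        q.selected.add(index)


def build_keyboard(q, page=0):
    """Return a Telegram InlineKeyboardMarkup dict for one page of options."""
    pages = max(1, -(-len(q.options) // PAGE_SIZE))  # ceiling division
    page = max(0, min(page, pages - 1))              # clamp out-of-range pages
    start = page * PAGE_SIZE
    rows = []
    for i in range(start, min(start + PAGE_SIZE, len(q.options))):
        mark = "✅ " if i in q.selected else ""
        rows.append([{"text": mark + q.options[i], "callback_data": f"opt:{i}"}])
    if pages > 1:
        nav = []
        if page > 0:
            nav.append({"text": "◀️", "callback_data": f"page:{page - 1}"})
        if page < pages - 1:
            nav.append({"text": "▶️", "callback_data": f"page:{page + 1}"})
        rows.append(nav)
    footer = []
    if q.allow_other:
        footer.append({"text": "Other", "callback_data": "other"})
    footer.append({"text": "Confirm", "callback_data": "confirm"})
    rows.append(footer)
    return {"inline_keyboard": rows}
```

Because the ticked indices live on the question rather than on the page, rebuilding the keyboard for another page keeps every ✅ in place, which is the role the Data Table plays in the workflow.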

Everything happens inside a single Telegram message — no spam of one message per interaction.
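
The single-message behaviour relies on editing the keyboard of the message that is already on screen instead of sending a new one. As a rough sketch, the underlying Bot API call looks like the request built below. The helper name and the placeholder token are hypothetical, and the workflow itself presumably goes through n8n's Telegram integration rather than hand-built HTTP requests.

```python
import json


def edit_markup_request(bot_token, chat_id, message_id, keyboard):
    """Build (url, json_body) for a Telegram editMessageReplyMarkup call.

    The same message_id is reused on every click, so the chat only ever
    contains one message per question.
    """
    url = f"https://api.telegram.org/bot{bot_token}/editMessageReplyMarkup"
    body = {
        "chat_id": chat_id,
        "message_id": message_id,
        "reply_markup": keyboard,  # an InlineKeyboardMarkup object
    }
    return url, json.dumps(body, ensure_ascii=False)
```

The body would be sent as an HTTP POST with a Content-Type of application/json.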

Vision OCR: a photo’s journey

1. Classification — Gemini Flash analyses the base64-encoded image and returns a document type with a confidence score.

2. Specialised extraction — Depending on the detected type, a specific prompt is sent to Gemini Vision. Each branch extracts different fields, listed in the table of recognised document types above.

3. Normalisation — The reply is formatted as sanitised HTML for Telegram with a uniform contract: {status, docType, extracted, text}.
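
The normalisation step can be illustrated with a short Python sketch. The four keys of the contract come from the text above; the status values ("ok", "skipped") and the message layout are assumptions. Escaping matters because Telegram's HTML parse mode requires &, < and > in user-supplied text to be replaced by entities.

```python
import html


def normalise(doc_type, extracted):
    """Wrap an extraction in the {status, docType, extracted, text} contract."""
    if doc_type == "not_document":
        # Assumed behaviour: nothing to extract, so report a skip.
        return {
            "status": "skipped",
            "docType": doc_type,
            "extracted": {},
            "text": "No document detected.",
        }
    lines = [f"<b>{html.escape(doc_type, quote=False)}</b>"]
    for key, value in extracted.items():
        safe_key = html.escape(str(key), quote=False)
        safe_value = html.escape(str(value), quote=False)
        lines.append(f"<b>{safe_key}</b>: {safe_value}")
    return {
        "status": "ok",
        "docType": doc_type,
        "extracted": extracted,
        "text": "\n".join(lines),
    }
```

Whatever the branch, the caller receives the same four keys, so the Telegram side never needs to know which extraction prompt was used.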

CLI Ollama profiles

The profile system lets you configure different AI personas with specific knowledge and tools. Each profile is a YAML file in /workspace/profiles/ that defines:

A specialised system prompt
A knowledge base (Markdown files injected into the context)
A list of allowed MCP tools (ceiling semantics: the list is an upper bound on what the profile may use)
Tools requiring Telegram approval

Two profiles are deployed: error-analyst (DLQ analysis, 5 read tools) and n8n-admin (workflow administration, 5 read + 2 write with confirmation).
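
One way to picture the ceiling and approval rules is the Python sketch below. It is an interpretation, not the CLI's actual code: the Profile class, the resolve_tools helper and the tool names used in the usage lines are hypothetical, and the real profiles are YAML files whose schema is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    allowed_tools: set  # ceiling: nothing outside this set is ever exposed
    approval_tools: set = field(default_factory=set)  # need Telegram approval


def resolve_tools(profile, requested):
    """Split requested tools into (auto, needs_approval, denied) lists."""
    auto, needs_approval, denied = [], [], []
    for tool in requested:
        if tool not in profile.allowed_tools:
            denied.append(tool)          # above the ceiling
        elif tool in profile.approval_tools:
            needs_approval.append(tool)  # allowed, but gated by a confirmation
        else:
            auto.append(tool)
    return auto, needs_approval, denied


# Hypothetical tool names, for illustration only.
admin = Profile(
    name="n8n-admin",
    allowed_tools={"list_workflows", "get_workflow", "update_workflow"},
    approval_tools={"update_workflow"},
)
auto, gated, denied = resolve_tools(
    admin, ["get_workflow", "update_workflow", "delete_workflow"]
)
```

In this reading, a tool that is missing from the allowed list is refused outright, while a write tool on the list still waits for a Telegram confirmation.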

4. What if? — Perspectives and limits

Current limits

Limit	Impact	Mitigation
5-min timeout	Question expires without an answer	Sufficient for interactive use
4 options/page	Lots of pages with 20+ options	Pagination preserving selections
OCR depends on Gemini	No local fallback	Fast and reliable in practice

Evolution scenarios

If more accurate OCR is needed:

Add specialised models (Tesseract for standard fonts)
Post-process invoices with total validation
Direct integration with Odoo accounting
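
As a sketch of what total validation might look like, the Python helper below compares the sum of the extracted invoice lines with the stated total. The "amount" key and the tolerance are assumptions, and taxes or discounts are deliberately ignored, so a real check would need to account for them.

```python
from decimal import Decimal


def total_matches(lines, total, tolerance="0.01"):
    """Check that the line amounts add up to the stated invoice total."""
    line_sum = sum(
        (Decimal(str(line["amount"])) for line in lines),
        Decimal("0"),
    )
    return abs(line_sum - Decimal(str(total))) <= Decimal(tolerance)
```

Decimal is used instead of float so that amounts such as 19.99 are compared exactly rather than through binary rounding.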

If multi-user support is needed:

Map CLI sessions to Telegram users
Queue if several questions arrive at once

Workflows

Telegram Orchestrator — Photo and callback routing
Voice transcription — Another input modality

Infrastructure

AI Stack — CLI Ollama and Gemini Vision