Question Hub & Vision
1. What? — Definition and context
When Claude Code (or Codex, or Gemini) needs to ask a multiple-choice question — “Which framework should we use? React, Vue, or Svelte?” — it cannot do so ergonomically inside a terminal. The Question Hub displays that question in Telegram with interactive buttons, supports multi-select and pagination, and returns the answer to the CLI.
In addition, the Vision OCR system analyses photos sent on Telegram. Business cards, invoices, screenshots, and handwritten notes each go through a specialised extraction.
Two complementary systems
| System | Nodes | Trigger | Role |
|---|---|---|---|
| Question Hub | ~35 (parent + callback) | CLI webhook | Interactive Telegram questions |
| Vision OCR | 14 | Sub-workflow (photo) | Document extraction |
2. Why? — Stakes and motivations
Problems solved
| Problem | Without these workflows | With these workflows |
|---|---|---|
| Unreadable CLI questions | Numbered list inside the terminal | Telegram buttons with emojis |
| No multi-select | Type the numbers one by one | ✅ toggle and confirmation |
| Useless photos | Photo = binary file, no info | Type-specific structured extraction |
| Generic OCR | Same processing for everything | Invoice ≠ business card ≠ screenshot |
Six recognised document types
The Vision OCR classifies each photo before extracting it:
| Type | Extracted fields |
|---|---|
| business_card | Name, job title, company, email, phone |
| invoice | Vendor, number, lines, total, date |
| screenshot | Visible text, identified UI |
| handwritten_note | Transcription, confidence, language |
| general_document | Structured raw text |
| not_document | (Not a document — photo, landscape, etc.) |
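
The table can also be read as a lookup from document type to expected fields. The sketch below is illustrative only: the snake_case field keys and the `missing_fields` helper are assumptions made for this example, not names taken from the workflow.

```python
# Hypothetical field keys per document type, mirroring the table above.
EXPECTED_FIELDS: dict[str, tuple[str, ...]] = {
    "business_card": ("name", "job_title", "company", "email", "phone"),
    "invoice": ("vendor", "number", "lines", "total", "date"),
    "screenshot": ("visible_text", "identified_ui"),
    "handwritten_note": ("transcription", "confidence", "language"),
    "general_document": ("raw_text",),
    "not_document": (),  # nothing to extract
}


def missing_fields(doc_type: str, extracted: dict) -> list[str]:
    """Return the expected fields that an extraction result did not provide."""
    expected = EXPECTED_FIELDS.get(doc_type)
    if expected is None:
        raise ValueError(f"unknown document type: {doc_type}")
    return [name for name in expected if name not in extracted]
```

A check of this kind could flag incomplete extractions before the result is shown to the user.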
3. How? — Technical implementation
Question Hub: a question’s journey
1. Reception — The CLI sends a webhook with the options, the question type (single/multi-select), and a timeout (300s by default).
2. Formatting — n8n builds an inline keyboard sized to the option count. With more than 4 options, pagination kicks in automatically (4 options per page with ◀️ ▶️ arrows).
3. Interaction — The user clicks options. In multi-select, every click toggles the ✅ and updates the keyboard in real time (via editMessageReplyMarkup). Selections are persisted in a Data Table to survive page changes.
4. Confirmation — A [Confirm] button validates the choices. The answer is returned to the CLI through the callback.
5. Free text — An optional [Other] button enables Telegram’s ForceReply mode to capture a free-form answer.
Everything happens inside a single Telegram message — no spam of one message per interaction.
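
Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration, not the workflow itself: the function names, the `question_id:action:value` callback format, and the in-memory `selected` set (standing in for the Data Table) are assumptions. Only the `inline_keyboard` structure follows the Telegram Bot API shape for reply markup.

```python
import math

PAGE_SIZE = 4  # options per page, as described above


def page_count(n_options: int) -> int:
    """Number of pages needed to show all options (at least one)."""
    return max(1, math.ceil(n_options / PAGE_SIZE))


def build_keyboard(options: list[str], selected: set[int], page: int, question_id: str) -> dict:
    """Build the reply markup for one page of a multi-select question."""
    pages = page_count(len(options))
    page = max(0, min(page, pages - 1))  # clamp out-of-range page requests
    start = page * PAGE_SIZE
    rows = []
    for index in range(start, min(start + PAGE_SIZE, len(options))):
        mark = "✅ " if index in selected else ""
        rows.append([{
            "text": f"{mark}{options[index]}",
            "callback_data": f"{question_id}:toggle:{index}",  # hypothetical format
        }])
    nav = []
    if page > 0:
        nav.append({"text": "◀️", "callback_data": f"{question_id}:page:{page - 1}"})
    if page < pages - 1:
        nav.append({"text": "▶️", "callback_data": f"{question_id}:page:{page + 1}"})
    if nav:
        rows.append(nav)
    rows.append([{"text": "Confirm", "callback_data": f"{question_id}:confirm"}])
    return {"inline_keyboard": rows}


def toggle(selected: set[int], index: int) -> set[int]:
    """Return a new selection with `index` flipped on or off."""
    return selected ^ {index}
```

On each click, the keyboard would be rebuilt from the stored selection and sent with editMessageReplyMarkup, so the existing message is edited in place. Telegram limits `callback_data` to 64 bytes, which is why the sketch carries option indexes rather than labels.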
Vision OCR: a photo’s journey
1. Classification — Gemini Flash analyses the base64 image and returns a document type with a confidence score.
2. Specialised extraction — Depending on the detected type, a specific prompt is sent to Gemini Vision. Each branch extracts different fields, as listed under “Six recognised document types”.
3. Normalisation — The reply is formatted as sanitised HTML for Telegram with a uniform contract: {status, docType, extracted, text}.
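
The normalisation step can be sketched as below. Only the four contract keys come from the description above; the `normalise` function name, the `"ok"` and `"skipped"` status values, and the one-line-per-field layout are assumptions. Escaping `<`, `>` and `&` is what Telegram's HTML parse mode requires for text that is not meant as markup.

```python
import html


def normalise(doc_type: str, extracted: dict) -> dict:
    """Wrap an extraction result in the {status, docType, extracted, text} contract."""
    if doc_type == "not_document":
        # Assumed behaviour: nothing is extracted from non-documents.
        return {
            "status": "skipped",
            "docType": doc_type,
            "extracted": {},
            "text": "No document detected.",
        }
    lines = [f"<b>{html.escape(doc_type, quote=False)}</b>"]
    for key, value in extracted.items():
        safe_key = html.escape(str(key), quote=False)
        safe_value = html.escape(str(value), quote=False)
        lines.append(f"<b>{safe_key}:</b> {safe_value}")
    return {
        "status": "ok",
        "docType": doc_type,
        "extracted": extracted,
        "text": "\n".join(lines),
    }
```

Because every branch returns the same four keys, the caller can send `text` to Telegram without knowing which document type was detected.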
CLI Ollama profiles
The profile system lets you configure different AI personas with specific knowledge and tools. Each profile is a YAML file in /workspace/profiles/ that defines:
- A specialised system prompt
- A knowledge base (Markdown files injected into the context)
- A list of allowed MCP tools (ceiling semantics: the profile can never use a tool outside this list)
- Tools requiring Telegram approval
Two profiles are deployed: error-analyst (DLQ analysis, 5 read tools) and n8n-admin (workflow administration, 5 read tools plus 2 write tools that require confirmation).
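
The tool rules of a profile can be modelled as a small permission check. The sketch below is an assumption-laden illustration: the `Profile` fields, the `check_tool` function, and every tool and file name are hypothetical, and the real YAML schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    system_prompt: str
    knowledge_files: tuple[str, ...] = ()
    allowed_tools: frozenset[str] = frozenset()      # the ceiling
    approval_required: frozenset[str] = frozenset()  # need Telegram approval


def check_tool(profile: Profile, tool: str) -> str:
    """Return 'deny', 'ask' or 'allow' for a tool call under this profile."""
    if tool not in profile.allowed_tools:
        return "deny"  # outside the ceiling, whatever else is configured
    if tool in profile.approval_required:
        return "ask"   # allowed, but only after approval on Telegram
    return "allow"


# Hypothetical tool names; the deployed profiles list their own.
n8n_admin = Profile(
    name="n8n-admin",
    system_prompt="You administer n8n workflows.",
    knowledge_files=("n8n-basics.md",),
    allowed_tools=frozenset({"workflow_list", "workflow_get", "workflow_update"}),
    approval_required=frozenset({"workflow_update"}),
)
```

With this profile, `check_tool(n8n_admin, "workflow_update")` yields `"ask"`, while any tool missing from `allowed_tools` is denied outright.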
4. What if? — Perspectives and limits
Current limits
| Limit | Impact | Mitigation |
|---|---|---|
| 5-min timeout | Question expires without an answer | Sufficient for interactive use |
| 4 options/page | Lots of pages with 20+ options | Pagination preserving selections |
| OCR depends on Gemini | No local fallback | Fast and reliable in practice |
Evolution scenarios
If more accurate OCR is needed:
- Add specialised models (Tesseract for standard fonts)
- Post-process invoices with total validation
- Direct integration with Odoo accounting
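
As an illustration of the total validation idea, a post-processing step could compare the sum of the extracted lines with the stated total. This is a sketch of one possible check, not an existing part of the workflow: the `amount` key, the string inputs, and the 0.01 tolerance are assumptions, and taxes or discounts listed outside the lines are ignored.

```python
from decimal import Decimal


def total_matches(lines: list[dict], total: str, tolerance: str = "0.01") -> bool:
    """Check that the line amounts add up to the stated total, within a tolerance."""
    computed = sum((Decimal(str(line["amount"])) for line in lines), Decimal("0"))
    return abs(computed - Decimal(total)) <= Decimal(tolerance)
```

`Decimal` is used instead of `float` so that amounts such as 19.99 are added exactly.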
If multi-user use is needed:
- Map CLI sessions to Telegram users
- Queue if several questions arrive at once
Related pages
Workflows
- Telegram Orchestrator — Photo and callback routing
- Voice transcription — Another input modality
Infrastructure
- AI Stack — CLI Ollama and Gemini Vision