---
title: Voice Transcription
url: https://blog.guigpap.com/en/workflows/voice-transcription/
url_md: https://blog.guigpap.com/en/workflows/voice-transcription.md
category: automation
date: '2026-01-31'
maturite: production
techno:
  - n8n
  - telegram
application:
  - automation
  - content
---

# Voice Transcription

> Automatic Telegram voice-message transcription with smart service selection

## 1. What? — Definition and context

The **Voice Transcription** workflow automatically transcribes Telegram voice messages into text. A smart routing system picks the optimal transcription service based on the message duration.

> **Note - Whisper**
>
> **Whisper** is an AI model developed by OpenAI that converts speech to text. Groq offers an ultra-fast and free version, perfect for short messages. ElevenLabs Scribe handles long files with diarisation (speaker identification).

### Services used

| Service | Usage | Advantage |
|---------|-------|-----------|
| **Groq Whisper** | Messages ≤ 30s | Free, fast (1-2s) |
| **ElevenLabs Scribe** | Messages > 30s | Diarisation, long files |

### Routing by duration

| Duration | Service | Reason |
|----------|---------|--------|
| ≤ 30 seconds | Groq Whisper | Fast, free |
| > 30 seconds | ElevenLabs Scribe | Diarisation, long files |

---

## 2. Why? — Stakes and motivations

### Problems solved

| Problem | Without transcription | With transcription |
|---------|----------------------|--------------------|
| **Mandatory listening** | Replay to understand | Readable text instantly |
| **No search** | No ctrl+F on audio | Indexable text |
| **Hard to share** | Send the audio file | Copy-paste the text |
| **Accessibility** | Not accessible to deaf users | Universal text |

### Why two services?

| Criterion | Groq Whisper | ElevenLabs Scribe |
|-----------|--------------|-------------------|
| **Cost** | Free | Paid (per hour) |
| **Speed** | ~1-2s | ~10-30s |
| **File limit** | 25 MB | 3 GB |
| **Diarisation** | No | Yes |
| **Best for** | Short messages | Meetings, podcasts |

> **Tip - Why 30 seconds?**
>
> Groq Whisper is optimised for short files and returns the transcription in under 2 seconds. For longer files, ElevenLabs offers better quality with speaker identification.

---

## 3. How? — Technical implementation

### Architecture

```mermaid
flowchart TD
    Trigger["Execute Workflow Trigger · passthrough"]
    Download["Telegram Get File · download voice"]
    IfDuration{"Duration ≤ 30s ?"}
    Groq["Groq Whisper · whisper-large-v3-turbo"]
    ElevenLabs["ElevenLabs Scribe v1"]
    Format["Format Response"]
    Return["Return to Orchestrator"]

    Trigger --> Download --> IfDuration
    IfDuration -->|Yes| Groq
    IfDuration -->|No| ElevenLabs
    Groq --> Format
    ElevenLabs --> Format
    Format --> Return
```

### Workflow input

```json
{
  "message": {
    "voice": {
      "file_id": "AwACAgIAAxkB...",
      "duration": 15,
      "mime_type": "audio/ogg"
    },
    "from": { "id": 123456789, "first_name": "Guillaume" },
    "chat": { "id": 123456789 }
  }
}
```

### Groq Whisper configuration (≤30s)

**Community Node:** `n8n-nodes-groq`

| Parameter | Value |
|-----------|-------|
| Credential | `Groq account - N8N` |
| Operation | Transcribe |
| Model | `whisper-large-v3-turbo` |
| Input Data Field | `data` |
| Language | `fr` (optional) |
| Response Format | `json` |

### ElevenLabs Scribe configuration (>30s)

**HTTP Request Node**

| Parameter | Value |
|-----------|-------|
| Method | POST |
| URL | `https://api.elevenlabs.io/v1/speech-to-text` |
| Authentication | Header Auth → `ElevenLabs API` |
| Body Content Type | Form-Data |

**Form Parameters:**

| Name | Type | Value |
|------|------|-------|
| file | Binary | `{{ $binary.data }}` |
| model_id | String | `scribe_v1` |
| language_code | String | `fr` |

### Output

```json
{
  "success": true,
  "text": "Remind me to call Jean tomorrow",
  "duration": 15,
  "service": "groq"
}
```

### Integration with the orchestrator

The [Telegram Orchestrator](/en/workflows/telegram-orchestrator/) detects voice notes and calls this sub-workflow:

```text
IF message.voice exists:
  Execute Workflow: Voice Transcription
  Input: $json (contains message.voice)

  IF response.success:
    IF active conversation exists (#231/#232):
      Send Transcript Preview + route text to Conversation Agent
    ELSE:
      Send message: "🎤 {response.text}"
  ELSE:
    Send message: "❌ Transcription failed"
```

Since Phase 5 (#231/#232), if a conversation is active when a voice note arrives, the transcribed text is injected as a message into the conversation instead of being returned as-is to the user. This enables a voice discussion with the bot.

### Post-transcription callbacks

| Callback | Action |
|----------|--------|
| `voice_retry_{msg_id}` | Retry transcription |
| `voice_process_{msg_id}` | Process with Claude (summary, extraction) |
| `voice_save_{msg_id}` | Save as a note |

---

## 4. What if? — Outlook and limits

### Limits and costs

| Service | File limit | Cost | Speed |
|---------|------------|------|-------|
| Groq Whisper | 25 MB | Free | ~1-2s |
| ElevenLabs Scribe | 3 GB | Paid (per hour) | ~10-30s |

> **Caution - Groq quotas**
>
> Groq has per-minute request limits. For usage spikes, the workflow automatically falls back to ElevenLabs.

### Current limits

| Limit | Impact | Mitigation |
|-------|--------|------------|
| **Groq quota** | Possible rate limiting | ElevenLabs fallback |
| **OGG format** | Telegram voice notes always arrive as OGG (Opus) | Natively supported by both APIs |
| **No diarisation ≤ 30s** | No speaker identification | Acceptable for short messages |

### Evolution scenarios

**If the Groq rate limit is hit**:

- Temporarily change the routing condition to `duration ≤ 0` so that every message goes to ElevenLabs
- Or add OpenAI Whisper as an intermediate fallback

**If systematic diarisation is needed**:

- Route every message to ElevenLabs
- Or use a local model with speaker detection

**If multi-language support is needed**:

- Auto-detect the language
- Adapt the parameters to the detected language

### Troubleshooting

| Problem | Check |
|---------|-------|
| Empty transcription | Does the audio file actually contain speech? |
| ElevenLabs timeout | For files longer than 5 min, increase the request timeout (180s) |
| Groq rate limit | Check quotas, fall back to ElevenLabs |
| Unsupported format | Telegram sends .ogg (Opus), which both services support natively |

---

## Related pages

### Workflows

- [Telegram Orchestrator](/en/workflows/telegram-orchestrator/) — Central hub
- [Notification Hub](/en/workflows/notification-hub/) — Notification routing

### Infrastructure

- [AI Stack](/en/infrastructure/ai-stack/) — Claude Ollama for post-processing

### External references

- [Groq Speech-to-Text](https://console.groq.com/docs/speech-to-text)
- [ElevenLabs STT API](https://elevenlabs.io/docs/api-reference/speech-to-text/convert)

## Agent metadata

- This article comes from the GuiGPaP Lab blog.
- Blog-wide context: https://blog.guigpap.com/llms.txt
- Author contact: https://odoo.guigpap.com/mon-cv
- Licence: CC-BY-SA 4.0