The Voice Transcription workflow automatically transcribes Telegram voice messages into text. A smart routing system picks the optimal transcription service based on the message duration.

| Service | Usage | Advantage |
|---|---|---|
| Groq Whisper | Messages ≤ 30s | Free, fast (1-2s) |
| ElevenLabs Scribe | Messages > 30s | Diarisation, long files |

| Duration | Service | Reason |
|---|---|---|
| ≤ 30 seconds | Groq Whisper | Fast, free |
| > 30 seconds | ElevenLabs Scribe | Diarisation, long files |
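
The routing rule above can be sketched as a single comparison. This is a minimal illustration, not the workflow's actual node configuration: the function name, the constant, and the returned service labels are hypothetical, and only the 30-second threshold comes from the table.

```python
# Threshold taken from the routing table; the names below are illustrative.
GROQ_MAX_SECONDS = 30


def pick_service(duration_seconds: int) -> str:
    """Return the transcription service for a voice message of the given length."""
    if duration_seconds <= GROQ_MAX_SECONDS:
        return "groq_whisper"       # short message: fast and free
    return "elevenlabs_scribe"      # long message: diarisation, large files


print(pick_service(12))
print(pick_service(95))
```

Keeping the threshold in one named constant makes it easy to adjust temporarily, for instance to push all traffic to one service.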

| Problem | Without transcription | With transcription |
|---|---|---|
| Mandatory listening | Replay to understand | Readable text instantly |
| No search | No ctrl+F on audio | Indexable text |
| Hard to share | Send the audio file | Copy-paste the text |
| Accessibility | Not accessible to deaf users | Universal text |

| Criterion | Groq Whisper | ElevenLabs Scribe |
|---|---|---|
| Cost | Free | Paid (per hour) |
| Speed | ~1-2s | ~10-30s |
| File limit | 25 MB | 3 GB |
| Diarisation | No | Yes |
| Best for | Short messages | Meetings, podcasts |

Execute Workflow Trigger · passthrough
Telegram Get File · download voice
Groq Whisper · whisper-large-v3-turbo
"file_id": "AwACAgIAAxkB...",
"first_name": "Guillaume"
Community Node: n8n-nodes-groq

| Parameter | Value |
|---|---|
| Credential | Groq account - N8N |
| Operation | Transcribe |
| Model | whisper-large-v3-turbo |
| Input Data Field | data |
| Language | fr (optional) |
| Response Format | json |

HTTP Request Node

| Parameter | Value |
|---|---|
| Method | POST |
| URL | https://api.elevenlabs.io/v1/speech-to-text |
| Authentication | Header Auth → ElevenLabs API |
| Body Content Type | Form-Data |

Form Parameters:

| Name | Type | Value |
|---|---|---|
| file | Binary | {{ $binary.data }} |
| model_id | String | scribe_v1 |
| language_code | String | fr |
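
For readers who want to see what the HTTP Request node sends, the sketch below builds an equivalent multipart request with the Python standard library. It is an illustration under assumptions, not the node's implementation. The function and helper names are hypothetical. The `xi-api-key` header is what ElevenLabs expects to my knowledge; the source only says "Header Auth". The filename and the `audio/ogg` content type are assumed from the Telegram voice format. Nothing is sent: the function only constructs the request.

```python
import urllib.request
import uuid

ELEVENLABS_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def _text_part(boundary: str, name: str, value: str) -> bytes:
    """Encode one plain form field as a multipart section."""
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
    ).encode("utf-8")


def build_scribe_request(audio: bytes, api_key: str,
                         language_code: str = "fr") -> urllib.request.Request:
    """Build (but do not send) the POST request described in the tables above."""
    boundary = uuid.uuid4().hex
    file_header = (
        f"--{boundary}\r\n"
        # Filename and content type are assumptions for a Telegram voice note.
        'Content-Disposition: form-data; name="file"; filename="voice.ogg"\r\n'
        "Content-Type: audio/ogg\r\n\r\n"
    ).encode("utf-8")
    body = b"".join([
        _text_part(boundary, "model_id", "scribe_v1"),
        _text_part(boundary, "language_code", language_code),
        file_header + audio + b"\r\n",
        f"--{boundary}--\r\n".encode("utf-8"),
    ])
    headers = {
        "xi-api-key": api_key,  # assumed header name for the Header Auth credential
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(ELEVENLABS_URL, data=body,
                                  headers=headers, method="POST")
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=180)`. The 180-second value mirrors the timeout suggested for long files in the troubleshooting table.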
"text": "Remind me to call Jean tomorrow",
The Telegram Orchestrator detects voice notes and calls this sub-workflow:
Execute Workflow: Voice Transcription
    Input: $json (contains message.voice)
On success:
    IF an active conversation exists (#231/#232):
        Send Transcript Preview + route text to Conversation Agent
    ELSE:
        Send message: "🎤 {response.text}"
On failure:
    Send message: "❌ Transcription failed"
Since Phase 5 (#231/#232), if a conversation is active when a voice note arrives, the transcribed text is injected as a message into the conversation instead of being returned as-is to the user. This enables a voice discussion with the bot.
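
The orchestrator's branching can be sketched as a small pure function. This is an assumption-laden illustration: the function name, the returned dictionary shape, and the action labels are hypothetical, and treating an empty or missing transcript as a failure is an assumption the source does not spell out. Only the two message templates and the conversation rule come from the text above.

```python
from typing import Optional


def dispatch_transcript(text: Optional[str], conversation_active: bool) -> dict:
    """Decide what the orchestrator does with a transcription result."""
    if not text:
        # Assumed failure signal: no text came back from the sub-workflow.
        return {"action": "send_message", "text": "❌ Transcription failed"}
    if conversation_active:
        # Phase 5 behaviour: show a preview and inject the text into the conversation.
        return {"action": "route_to_conversation_agent", "preview": True, "text": text}
    # Default behaviour: return the transcript directly to the user.
    return {"action": "send_message", "text": f"🎤 {text}"}


print(dispatch_transcript("Remind me to call Jean tomorrow", conversation_active=False))
```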

| Callback | Action |
|---|---|
| voice_retry_{msg_id} | Retry transcription |
| voice_process_{msg_id} | Process with Claude (summary, extraction) |
| voice_save_{msg_id} | Save as a note |
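
A handler for these callbacks has to split the action from the message ID. The sketch below shows one way to do that. The function name is hypothetical, and it assumes `msg_id` is a plain decimal integer, which matches Telegram message IDs but is not stated in the source.

```python
from typing import Optional, Tuple

# Action names taken from the callback table.
VOICE_ACTIONS = {"retry", "process", "save"}


def parse_voice_callback(data: str) -> Optional[Tuple[str, int]]:
    """Split 'voice_<action>_<msg_id>' into (action, msg_id), or return None."""
    parts = data.split("_", 2)
    if len(parts) != 3 or parts[0] != "voice" or parts[1] not in VOICE_ACTIONS:
        return None
    if not parts[2].isdecimal():
        return None  # assumed: msg_id must be a decimal integer
    return parts[1], int(parts[2])


print(parse_voice_callback("voice_retry_42"))
```

Returning `None` for anything unrecognised lets the caller ignore callbacks that belong to other features.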

| Service | File limit | Cost | Speed |
|---|---|---|---|
| Groq Whisper | 25 MB | Free | ~1-2s |
| ElevenLabs Scribe | 3 GB | Paid (per hour) | ~10-30s |
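
A size check before upload avoids sending a file a service will reject. This is a minimal sketch: the names are hypothetical, and interpreting "MB" and "GB" as binary units (1024-based) is an assumption, since the source does not say which convention the providers use.

```python
# Limits from the table; binary units are an assumption.
FILE_LIMIT_BYTES = {
    "groq_whisper": 25 * 1024**2,
    "elevenlabs_scribe": 3 * 1024**3,
}


def fits_file_limit(service: str, size_bytes: int) -> bool:
    """Return True if a file of size_bytes is within the service's upload limit."""
    return size_bytes <= FILE_LIMIT_BYTES[service]


print(fits_file_limit("groq_whisper", 40 * 1024**2))
```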

| Limit | Impact | Mitigation |
|---|---|---|
| Groq quota | Possible rate limiting | ElevenLabs fallback |
| OGG format | Telegram sends voice notes only as OGG | Natively supported by the APIs |
| No diarisation ≤ 30s | No speaker identification | Acceptable for short messages |

If the Groq rate limit is hit:
- Temporarily change the routing condition to duration ≤ 0 so that every message goes to ElevenLabs
- Or add OpenAI Whisper as an intermediate fallback

If systematic diarisation is needed:
- Route every message to ElevenLabs
- Or use a local model with speaker detection

If multi-language support is needed:
- Auto-detect the language
- Adapt the parameters to the detected language
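
The rate-limit fallback can also be automated rather than switched by hand. The sketch below tries services in order and moves on when one reports a rate limit. Everything here is hypothetical scaffolding: the `RateLimited` exception, the function name, and the idea of passing each service as a callable are illustration choices, not part of the workflow.

```python
from typing import Callable, Optional, Sequence, Tuple


class RateLimited(Exception):
    """Raised by a service wrapper when the provider rejects the call for quota reasons."""


def transcribe_with_fallback(
    audio: bytes,
    services: Sequence[Tuple[str, Callable[[bytes], str]]],
) -> Tuple[str, str]:
    """Try each (name, transcribe) pair in order; return (name, text) from the first that works."""
    last_error: Optional[Exception] = None
    for name, transcribe in services:
        try:
            return name, transcribe(audio)
        except RateLimited as exc:
            last_error = exc  # remember the error and try the next service
    raise RuntimeError("all transcription services were rate limited") from last_error
```

A caller would pass the chain in priority order, for example Groq first, then an optional intermediate service, then ElevenLabs. Only rate-limit errors trigger the fallback; any other exception propagates so that real failures are not masked.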

| Problem | Check |
|---|---|
| Empty transcription | Does the audio file actually contain speech? |
| ElevenLabs timeout | Files > 5min: increase timeout (180s) |
| Groq rate limit | Check quotas, fall back to ElevenLabs |
| Unsupported format | Telegram sends .ogg (Opus), which is natively supported |

AI Stack — Claude Ollama for post-processing