Voice Transcription

The Voice Transcription workflow automatically transcribes Telegram voice messages into text. A smart routing system picks the optimal transcription service based on the message duration.

| Service | Usage | Advantage |
| --- | --- | --- |
| Groq Whisper | Messages ≤ 30s | Free, fast (1-2s) |
| ElevenLabs Scribe | Messages > 30s | Diarisation, long files |

| Duration | Service | Reason |
| --- | --- | --- |
| ≤ 30 seconds | Groq Whisper | Fast, free |
| > 30 seconds | ElevenLabs Scribe | Diarisation, long files |
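The routing rule in the tables above comes down to a single comparison. A minimal sketch, assuming the duration is the `duration` field (in seconds) of the Telegram voice object; the function name, the constant name and the `"elevenlabs"` label are illustrative and not taken from the workflow:

```python
# Threshold from the routing table above; the constant name is illustrative.
SHORT_MESSAGE_MAX_SECONDS = 30


def pick_service(duration_seconds: int, threshold: int = SHORT_MESSAGE_MAX_SECONDS) -> str:
    """Return "groq" for messages up to the threshold, "elevenlabs" otherwise."""
    return "groq" if duration_seconds <= threshold else "elevenlabs"
```

A message of exactly 30 seconds still goes to Groq, because the comparison is inclusive.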

| Problem | Without transcription | With transcription |
| --- | --- | --- |
| Mandatory listening | Replay to understand | Readable text instantly |
| No search | No ctrl+F on audio | Indexable text |
| Hard to share | Send the audio file | Copy-paste the text |
| Accessibility | Not accessible to deaf users | Universal text |

| Criterion | Groq Whisper | ElevenLabs Scribe |
| --- | --- | --- |
| Cost | Free | Paid (per hour) |
| Speed | ~1-2s | ~10-30s |
| File limit | 25 MB | 3 GB |
| Diarisation | No | Yes |
| Best for | Short messages | Meetings, podcasts |

Workflow diagram: Execute Workflow Trigger (passthrough) → Telegram Get File (download voice) → Duration ≤ 30s?

  • Yes → Groq Whisper (whisper-large-v3-turbo)
  • No → ElevenLabs Scribe v1

Both branches → Format Response → Return to Orchestrator

{
  "message": {
    "voice": {
      "file_id": "AwACAgIAAxkB...",
      "duration": 15,
      "mime_type": "audio/ogg"
    },
    "from": {
      "id": 123456789,
      "first_name": "Guillaume"
    },
    "chat": {
      "id": 123456789
    }
  }
}
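To show how this input is consumed, here is a hedged sketch of what the Telegram Get File step does conceptually. The n8n Telegram node handles this internally; the helper names are hypothetical. The two URLs follow the Telegram Bot API pattern: `getFile` returns a `file_path`, which is then fetched from the `/file/` endpoint.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def extract_voice(update: dict) -> Optional[dict]:
    """Pull the fields the workflow needs out of the input payload."""
    message = update.get("message") or {}
    voice = message.get("voice")
    if not voice:
        return None  # not a voice note
    return {
        "file_id": voice["file_id"],
        "duration": int(voice.get("duration", 0)),
        "mime_type": voice.get("mime_type", "audio/ogg"),
        "chat_id": message["chat"]["id"],
    }


def get_file_url(bot_token: str, file_id: str) -> str:
    """URL of the Bot API getFile call, which returns the file_path."""
    return f"https://api.telegram.org/bot{bot_token}/getFile?{urlencode({'file_id': file_id})}"


def download_url(bot_token: str, file_path: str) -> str:
    """URL from which the audio bytes can be downloaded."""
    return f"https://api.telegram.org/file/bot{bot_token}/{quote(file_path)}"
```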

Community Node: n8n-nodes-groq

| Parameter | Value |
| --- | --- |
| Credential | Groq account - N8N |
| Operation | Transcribe |
| Model | whisper-large-v3-turbo |
| Input Data Field | data |
| Language | fr (optional) |
| Response Format | json |
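For readers who want to reproduce the call outside n8n, the node settings above map roughly onto a direct request to Groq's OpenAI-compatible transcription endpoint. This is a sketch, not the community node's actual implementation: the function name, the `voice.ogg` filename and the 30-second timeout are assumptions.

```python
from typing import Optional

GROQ_TRANSCRIPTION_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_groq_request(api_key: str, audio: bytes, language: Optional[str] = "fr") -> dict:
    """Describe the multipart POST; send it with requests.post(**request)."""
    data = {"model": "whisper-large-v3-turbo", "response_format": "json"}
    if language:
        # Language is optional; when omitted the model detects it.
        data["language"] = language
    return {
        "url": GROQ_TRANSCRIPTION_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": data,
        "files": {"file": ("voice.ogg", audio, "audio/ogg")},
        "timeout": 30,  # assumed value, not from the workflow
    }
```

Building the request separately from sending it keeps the sketch testable without network access or a real API key.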

HTTP Request Node

| Parameter | Value |
| --- | --- |
| Method | POST |
| URL | https://api.elevenlabs.io/v1/speech-to-text |
| Authentication | Header Auth → ElevenLabs API |
| Body Content Type | Form-Data |

Form Parameters:

| Name | Type | Value |
| --- | --- | --- |
| file | Binary | {{ $binary.data }} |
| model_id | String | scribe_v1 |
| language_code | String | fr |

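The HTTP Request node settings and form parameters above correspond to a multipart POST along these lines. This is a sketch under assumptions: the header name `xi-api-key` is the one ElevenLabs uses for API keys, while the function name, the filename and the default timeout are illustrative.

```python
ELEVENLABS_STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_elevenlabs_request(
    api_key: str,
    audio: bytes,
    language_code: str = "fr",
    timeout_seconds: int = 60,  # assumed default, not from the workflow
) -> dict:
    """Describe the multipart POST; send it with requests.post(**request)."""
    return {
        "url": ELEVENLABS_STT_URL,
        "headers": {"xi-api-key": api_key},
        "data": {"model_id": "scribe_v1", "language_code": language_code},
        "files": {"file": ("voice.ogg", audio, "audio/ogg")},
        "timeout": timeout_seconds,
    }
```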
{
  "success": true,
  "text": "Remind me to call Jean tomorrow",
  "duration": 15,
  "service": "groq"
}
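The Format Response step presumably normalises both services' replies into this shape. A minimal sketch, assuming each service returns its transcript under a `text` key and that an empty transcript counts as a failure (both are assumptions, as is the function name):

```python
def format_response(raw: dict, duration: int, service: str) -> dict:
    """Normalise a raw transcription reply into the workflow's output shape."""
    text = (raw.get("text") or "").strip()
    return {
        "success": bool(text),  # assumption: empty text is treated as failure
        "text": text,
        "duration": duration,
        "service": service,
    }
```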

The Telegram Orchestrator detects voice notes and calls this sub-workflow:

IF message.voice exists:
    Execute Workflow: Voice Transcription
        Input: $json (contains message.voice)
    IF response.success:
        IF active conversation exists (#231/#232):
            Send Transcript Preview + route text to Conversation Agent
        ELSE:
            Send message: "🎤 {response.text}"
    ELSE:
        Send message: "❌ Transcription failed"
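The same decision logic can be written as a small pure function, which makes the three outcomes easy to test. This is an illustrative sketch: the action names and the `transcribe` callable are hypothetical stand-ins for the n8n nodes.

```python
from typing import Callable, List, Tuple

Action = Tuple[str, str]  # (action name, payload text)


def handle_update(
    update: dict,
    transcribe: Callable[[dict], dict],
    conversation_active: bool,
) -> List[Action]:
    """Return the actions the orchestrator would take for one update."""
    if "voice" not in update.get("message", {}):
        return []  # not a voice note, nothing to do here
    response = transcribe(update)
    if not response.get("success"):
        return [("send_message", "❌ Transcription failed")]
    text = response["text"]
    if conversation_active:
        return [
            ("send_transcript_preview", text),
            ("route_to_conversation_agent", text),
        ]
    return [("send_message", f"🎤 {text}")]
```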

Since Phase 5 (#231/#232), if a conversation is active when a voice note arrives, the transcribed text is injected as a message into the conversation instead of being returned as-is to the user. This enables a voice discussion with the bot.

| Callback | Action |
| --- | --- |
| voice_retry_{msg_id} | Retry transcription |
| voice_process_{msg_id} | Process with Claude (summary, extraction) |
| voice_save_{msg_id} | Save as a note |
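Callback data in this format can be split into an action and a message ID with one regular expression. A sketch, assuming `msg_id` is a numeric Telegram message ID; the function name is illustrative:

```python
import re
from typing import Optional, Tuple

_CALLBACK_RE = re.compile(r"voice_(retry|process|save)_(\d+)")


def parse_voice_callback(data: str) -> Optional[Tuple[str, int]]:
    """Return (action, msg_id), or None if the data is not a voice callback."""
    match = _CALLBACK_RE.fullmatch(data)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```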

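| Service | File limit | Cost | Speed |
| --- | --- | --- | --- |
| Groq Whisper | 25 MB | Free | ~1-2s |
| ElevenLabs Scribe | 3 GB | Paid (per hour) | ~10-30s |

| Limit | Impact | Mitigation |
| --- | --- | --- |
| Groq quota | Possible rate limiting | ElevenLabs fallback |
| OGG format | Telegram sends voice notes only as OGG | Natively supported by both APIs |
| No diarisation ≤ 30s | No speaker identification | Acceptable for short messages |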
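The file limits above can be checked before upload. A minimal sketch, assuming the limits are binary megabytes and gigabytes (the source does not say); the names are illustrative:

```python
# Limits from the table above; treating MB/GB as binary units is an assumption.
FILE_LIMIT_BYTES = {
    "groq": 25 * 1024**2,
    "elevenlabs": 3 * 1024**3,
}


def fits_limit(service: str, size_bytes: int) -> bool:
    """True if a file of this size is accepted by the given service."""
    return size_bytes <= FILE_LIMIT_BYTES[service]


def service_for_size(preferred: str, size_bytes: int) -> str:
    """Keep the preferred service if the file fits, else pick one that accepts it."""
    if fits_limit(preferred, size_bytes):
        return preferred
    for name in FILE_LIMIT_BYTES:
        if fits_limit(name, size_bytes):
            return name
    raise ValueError(f"file of {size_bytes} bytes exceeds every service limit")
```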

If Groq rate limit is hit:

  • Temporarily change the routing condition to duration ≤ 0 so that every message goes to ElevenLabs
  • Or add OpenAI Whisper as an intermediate fallback
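The fallback idea can be sketched as an ordered chain of transcribers that is walked until one succeeds. The `RateLimited` exception and the callables are hypothetical; in practice each callable would wrap one service and raise on HTTP 429.

```python
from typing import Callable, Optional, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error a transcriber raises when its quota is exhausted."""


def transcribe_with_fallback(
    audio: bytes,
    chain: Sequence[Tuple[str, Callable[[bytes], str]]],
) -> Tuple[str, str]:
    """Try each (name, transcriber) in order; return (service name, text)."""
    last_error: Optional[Exception] = None
    for name, transcriber in chain:
        try:
            return name, transcriber(audio)
        except RateLimited as err:
            last_error = err  # move on to the next service in the chain
    raise RuntimeError("all transcription services are rate limited") from last_error
```

An intermediate fallback such as OpenAI Whisper would simply be another entry placed between Groq and ElevenLabs in the chain.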

If systematic diarisation is needed:

  • Route every message to ElevenLabs
  • Or use a local model with speaker detection

If multi-language is needed:

  • Auto-detect the language
  • Adapt parameters according to detected language
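One way to adapt the parameters is to send a language hint only when a language is known and otherwise leave it out, so the service detects it. A sketch, assuming Groq takes the hint as `language` and ElevenLabs as `language_code` (the field names used in the configurations on this page); the function name is illustrative:

```python
from typing import Optional


def language_params(service: str, detected_language: Optional[str] = None) -> dict:
    """Return the form fields carrying the language hint for a service."""
    if not detected_language:
        return {}  # no hint: rely on the service's own language detection
    key = "language" if service == "groq" else "language_code"
    return {key: detected_language}
```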

| Problem | Check |
| --- | --- |
| Empty transcription | Does the audio file actually contain speech? |
| ElevenLabs timeout | Files > 5min: increase timeout (180s) |
| Groq rate limit | Check quotas, fall back to ElevenLabs |
| Unsupported format | Telegram sends .ogg (Opus), which is natively supported |
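The timeout advice for long files can be applied automatically from the voice duration. A minimal sketch: the 5-minute cutoff and the 180-second value come from the table above, while the 60-second default and the function name are assumptions.

```python
LONG_FILE_SECONDS = 5 * 60   # "Files > 5min" from the table above
LONG_FILE_TIMEOUT = 180      # recommended timeout for long files


def request_timeout(duration_seconds: int, default: int = 60) -> int:
    """Pick the HTTP timeout (in seconds) based on the audio duration."""
    return LONG_FILE_TIMEOUT if duration_seconds > LONG_FILE_SECONDS else default
```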

  • AI Stack: Claude and Ollama for post-processing