---
title: Voice Transcription
url: https://blog.guigpap.com/en/workflows/voice-transcription/
url_md: https://blog.guigpap.com/en/workflows/voice-transcription.md
category: automation
date: '2026-01-31'
maturite: production
techno:
  - n8n
  - telegram
application:
  - automation
  - content
---

# Voice Transcription

> Automatic Telegram voice-message transcription with smart service selection

## 1. What? — Definition and context

The **Voice Transcription** workflow automatically transcribes Telegram voice messages into text. A duration-based router sends each message to the better-suited transcription service: Groq Whisper for short notes, ElevenLabs Scribe for longer ones.

> **Note - Whisper**
>
> **Whisper** is a speech-to-text model developed by OpenAI. Groq hosts it on its own inference platform, which is very fast and has a free tier, making it a good fit for short messages. ElevenLabs Scribe is ElevenLabs' own speech-to-text model; it handles long files and supports diarisation (speaker identification).

### Services used

| Service | Usage | Advantage |
|---------|-------|-----------|
| **Groq Whisper** | Messages ≤ 30s | Free, fast (1-2s) |
| **ElevenLabs Scribe** | Messages > 30s | Diarisation, long files |

### Routing by duration

| Duration | Service | Reason |
|----------|---------|--------|
| ≤ 30 seconds | Groq Whisper | Fast, free |
| > 30 seconds | ElevenLabs Scribe | Diarisation, long files |
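The routing rule in the tables above fits in a few lines. Below is a minimal Python sketch (Python is one of the languages n8n's Code node accepts) of what the IF node decides, assuming the Telegram `duration` field is an integer number of seconds. `pick_service`, `GROQ_MAX_SECONDS` and the returned labels are hypothetical names, chosen to match the `service` field of the workflow output.

```python
GROQ_MAX_SECONDS = 30  # threshold used by the "Duration ≤ 30s" IF node


def pick_service(duration_s: int, threshold_s: int = GROQ_MAX_SECONDS) -> str:
    """Return the backend label for a voice note lasting `duration_s` seconds."""
    if duration_s < 0:
        raise ValueError("duration cannot be negative")
    # Inclusive bound: a 30 s note still goes to Groq, a 31 s note to ElevenLabs.
    return "groq" if duration_s <= threshold_s else "elevenlabs"


if __name__ == "__main__":
    for d in (5, 30, 31, 120):
        print(d, pick_service(d))
```

Exposing the threshold as a parameter means the "force ElevenLabs" scenario in section 4 becomes a one-value change.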

---

## 2. Why? — Stakes and motivations

### Problems solved

| Problem | Without transcription | With transcription |
|---------|----------------------|--------------------|
| **Mandatory listening** | Replay to understand | Readable text instantly |
| **No search** | No ctrl+F on audio | Indexable text |
| **Hard to share** | Send the audio file | Copy-paste the text |
| **Accessibility** | Inaccessible to deaf or hard-of-hearing users | Text readable by everyone |

### Why two services?

| Criterion | Groq Whisper | ElevenLabs Scribe |
|-----------|--------------|-------------------|
| **Cost** | Free | Paid (per hour) |
| **Speed** | ~1-2s | ~10-30s |
| **File limit** | 25 MB | 3 GB |
| **Diarisation** | No | Yes |
| **Best for** | Short messages | Meetings, podcasts |

> **Tip - Why 30 seconds?**
>
> Groq Whisper is optimised for short files and returns the transcription in under 2 seconds. For longer files, ElevenLabs offers better quality with speaker identification.

---

## 3. How? — Technical implementation

### Architecture

```mermaid
flowchart TD
  Trigger["Execute Workflow Trigger · passthrough"]
  Download["Telegram Get File · download voice"]
  IfDuration{"Duration ≤ 30s?"}
  Groq["Groq Whisper · whisper-large-v3-turbo"]
  ElevenLabs["ElevenLabs Scribe v1"]
  Format["Format Response"]
  Return["Return to Orchestrator"]

  Trigger --> Download --> IfDuration
  IfDuration -->|Yes| Groq
  IfDuration -->|No| ElevenLabs
  Groq --> Format
  ElevenLabs --> Format
  Format --> Return
```
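The "Telegram Get File" step wraps two Bot API calls. `getFile` resolves the `file_id` into a `file_path`, and the audio is then downloaded from a separate `/file/bot<token>/` URL (the Bot API caps these downloads at 20 MB). The Python sketch below only builds the two URLs. The helper names are hypothetical, the token is a placeholder, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.telegram.org"


def get_file_url(token: str, file_id: str) -> str:
    """Bot API call whose JSON reply contains result.file_path."""
    return f"{API_ROOT}/bot{token}/getFile?{urlencode({'file_id': file_id})}"


def download_url(token: str, file_path: str) -> str:
    """URL of the raw audio once file_path is known."""
    return f"{API_ROOT}/file/bot{token}/{quote(file_path)}"


if __name__ == "__main__":
    token = "123456:PLACEHOLDER"  # placeholder; never hard-code a real bot token
    print(get_file_url(token, "example_file_id"))
    print(download_url(token, "voice/file_42.oga"))
```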

### Workflow input

```json
{
  "message": {
    "voice": {
      "file_id": "AwACAgIAAxkB...",
      "duration": 15,
      "mime_type": "audio/ogg"
    },
    "from": {
      "id": 123456789,
      "first_name": "Guillaume"
    },
    "chat": {
      "id": 123456789
    }
  }
}
```
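Only a handful of fields in this payload matter to the sub-workflow. The Python sketch below shows one way to extract them and fail early on a non-voice message. `VoiceInput` and `parse_voice_input` are hypothetical names. Inside n8n, the same values are reached with expressions such as `{{ $json.message.voice.duration }}`.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class VoiceInput:
    file_id: str
    duration: int  # seconds, as reported by Telegram
    mime_type: str
    chat_id: int
    user_id: int


def parse_voice_input(payload: Dict[str, Any]) -> VoiceInput:
    """Pull the fields the sub-workflow needs out of the orchestrator payload."""
    message = payload["message"]
    voice = message.get("voice")
    if voice is None:
        raise ValueError("payload does not contain message.voice")
    return VoiceInput(
        file_id=voice["file_id"],
        duration=int(voice.get("duration", 0)),
        # Assumption: default to OGG when Telegram omits the optional mime_type.
        mime_type=voice.get("mime_type", "audio/ogg"),
        chat_id=message["chat"]["id"],
        user_id=message["from"]["id"],
    )
```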

### Groq Whisper configuration (≤30s)

**Community Node:** `n8n-nodes-groq`

| Parameter | Value |
|-----------|-------|
| Credential | `Groq account - N8N` |
| Operation | Transcribe |
| Model | `whisper-large-v3-turbo` |
| Input Data Field | `data` |
| Language | `fr` (optional) |
| Response Format | `json` |

### ElevenLabs Scribe configuration (>30s)

**HTTP Request Node**

| Parameter | Value |
|-----------|-------|
| Method | POST |
| URL | `https://api.elevenlabs.io/v1/speech-to-text` |
| Authentication | Header Auth (`xi-api-key` header) → `ElevenLabs API` |
| Body Content Type | Form-Data |

**Form Parameters:**

| Name | Type | Value |
|------|------|-------|
| file | n8n Binary File | Input Data Field Name: `data` |
| model_id | String | `scribe_v1` |
| language_code | String | `fr` |

### Output

```json
{
  "success": true,
  "text": "Remind me to call Jean tomorrow",
  "duration": 15,
  "service": "groq"
}
```
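The "Format Response" node has a simple job: both Groq (with `response_format: json`) and ElevenLabs return the transcript in a top-level `text` field, so a single function can normalise either reply into the shape above. The Python sketch below assumes that an empty transcript counts as a failure. That rule and the name `format_response` are hypothetical.

```python
from typing import Any, Dict


def format_response(service: str, raw: Dict[str, Any], duration: int) -> Dict[str, Any]:
    """Normalise a provider reply into the sub-workflow output shape."""
    # Both providers put the transcript under "text"; guard against None or missing.
    text = str(raw.get("text") or "").strip()
    return {
        "success": bool(text),  # assumption: empty transcript is treated as failure
        "text": text,
        "duration": duration,
        "service": service,
    }
```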

### Integration with the orchestrator

The [Telegram Orchestrator](/en/workflows/telegram-orchestrator/) detects voice notes and calls this sub-workflow:

```text
IF message.voice exists:
  Execute Workflow: Voice Transcription
  Input: $json (contains message.voice)

  IF response.success:
    IF active conversation exists (#231/#232):
      Send Transcript Preview + route text to Conversation Agent
    ELSE:
      Send message: "🎤 {response.text}"
  ELSE:
    Send message: "❌ Transcription failed"
```
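The pseudocode above translates directly into a function that returns the actions to perform. The Python sketch below is hypothetical: the action names stand in for the orchestrator's actual Telegram and agent nodes.

```python
from typing import Any, Dict, List, Tuple

Action = Tuple[str, str]  # (hypothetical action name, payload)


def route_transcription(response: Dict[str, Any], conversation_active: bool) -> List[Action]:
    """Decide what the orchestrator does with the sub-workflow's reply."""
    if not response.get("success"):
        return [("send_message", "❌ Transcription failed")]
    text = response["text"]
    if conversation_active:
        # Phase 5 behaviour: show a preview, then hand the text to the agent.
        return [("send_preview", text), ("to_conversation_agent", text)]
    return [("send_message", f"🎤 {text}")]
```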

Since Phase 5 (#231/#232), when a conversation is already active, the orchestrator sends a transcript preview and injects the transcribed text into the conversation as a new message, instead of simply echoing it back to the user. This makes it possible to hold a spoken conversation with the bot.

### Post-transcription callbacks

| Callback | Action |
|----------|--------|
| `voice_retry_{msg_id}` | Retry transcription |
| `voice_process_{msg_id}` | Process with Claude (summary, extraction) |
| `voice_save_{msg_id}` | Save as a note |

---

## 4. What if? — Outlook and limits

### Limits and costs

| Service | File limit | Cost | Speed |
|---------|------------|------|-------|
| Groq Whisper | 25 MB | Free | ~1-2s |
| ElevenLabs Scribe | 3 GB | Paid (per hour) | ~10-30s |

> **Caution - Groq quotas**
>
> Groq applies rate limits, such as a maximum number of requests per minute. When a usage spike hits them, the workflow falls back to ElevenLabs automatically.
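The architecture diagram does not draw the fallback edge, so the exact wiring is not shown here. One way to express "route by duration, but retry on ElevenLabs when Groq is rate-limited" is sketched below in Python. `RateLimited`, `transcribe` and the injected callables are hypothetical; they stand in for the two HTTP calls, where a Groq transcriber would raise `RateLimited` on an HTTP 429 reply.

```python
from typing import Callable, Tuple

Transcriber = Callable[[bytes], str]


class RateLimited(Exception):
    """Raised by a transcriber when the provider answers HTTP 429."""


def transcribe(audio: bytes, duration_s: int, groq: Transcriber,
               elevenlabs: Transcriber, threshold_s: int = 30) -> Tuple[str, str]:
    """Route by duration; on a Groq rate limit, send the same audio to ElevenLabs."""
    if duration_s <= threshold_s:
        try:
            return "groq", groq(audio)
        except RateLimited:
            pass  # quota exhausted: fall through to the paid service
    return "elevenlabs", elevenlabs(audio)
```

Injecting the transcribers keeps the routing logic testable without network access.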

### Current limits

| Limit | Impact | Mitigation |
|-------|--------|------------|
| **Groq quota** | Possible rate limiting | ElevenLabs fallback |
| **OGG/Opus format** | Telegram voice notes are sent as OGG/Opus | Both APIs accept OGG natively |
| **No diarisation for notes ≤ 30s** | No speaker identification | Acceptable for short messages |

### Evolution scenarios

**If the Groq rate limit is hit**:
- Temporarily change the IF threshold to 0 (`duration ≤ 0`) so that every message goes to ElevenLabs
- Or add OpenAI Whisper as an intermediate fallback

**If systematic diarisation is needed**:
- Route every message to ElevenLabs
- Or use a local model with speaker detection

**If multi-language support is needed**:
- Auto-detect the language
- Adapt the parameters according to the detected language
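Both services already support auto-detection: Groq's `language` and ElevenLabs' `language_code` are optional, and ElevenLabs reports the detected language in the `language_code` field of its response. The Python sketch below switches between a fixed language and auto-detection. `language_params` is a hypothetical helper.

```python
from typing import Dict, Optional


def language_params(service: str, language: Optional[str]) -> Dict[str, str]:
    """Form fields controlling the language; an empty dict means auto-detection."""
    if not language:
        return {}  # both APIs treat the language field as optional
    key = "language" if service == "groq" else "language_code"
    return {key: language}
```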

### Troubleshooting

| Problem | Check |
|---------|-------|
| Empty transcription | Does the audio file actually contain speech? |
| ElevenLabs timeout | Files > 5 min: increase the HTTP Request node timeout (e.g. 180 s) |
| Groq rate limit | Check quotas, fall back to ElevenLabs |
| Unsupported format | Telegram sends .ogg (Opus), which both APIs accept natively |

---

## Related pages

### Workflows
- [Telegram Orchestrator](/en/workflows/telegram-orchestrator/) — Central hub
- [Notification Hub](/en/workflows/notification-hub/) — Notification routing

### Infrastructure
- [AI Stack](/en/infrastructure/ai-stack/): Claude and Ollama for post-processing

### External references
- [Groq Speech-to-Text](https://console.groq.com/docs/speech-to-text)
- [ElevenLabs STT API](https://elevenlabs.io/docs/api-reference/speech-to-text/convert)

## Agent metadata

- This article comes from the GuiGPaP Lab blog.
- Global blog context: https://blog.guigpap.com/llms.txt
- Author contact: https://odoo.guigpap.com/mon-cv
- License: CC-BY-SA 4.0