---
title: Voice Transcription
url: https://blog.guigpap.com/fr/workflows/voice-transcription/
url_md: https://blog.guigpap.com/fr/workflows/voice-transcription.md
category: automation
date: '2026-01-31'
maturite: production
techno:
  - n8n
  - telegram
application:
  - automation
  - content
---

# Voice Transcription

> Automatic transcription of Telegram voice messages, with duration-based selection of the transcription service

## 1. What? Definition and context

The **Voice Transcription** workflow automatically transcribes Telegram voice messages to text. A routing step picks the transcription service from the message duration: Groq Whisper up to 30 seconds, ElevenLabs Scribe beyond that.

> **Note - Whisper**
>
> **Whisper** is a speech-to-text AI model developed by OpenAI. Groq hosts a very fast version that is free to use, which suits short messages. ElevenLabs Scribe is a separate speech-to-text model that handles long files and supports diarization (speaker identification).

### Services used

| Service | Used for | Advantage |
|---------|----------|-----------|
| **Groq Whisper** | Messages ≤ 30s | Free, fast (1-2s) |
| **ElevenLabs Scribe** | Messages > 30s | Diarization, long files |

### Routing by duration

| Duration | Service | Reason |
|----------|---------|--------|
| ≤ 30 seconds | Groq Whisper | Fast, free |
| > 30 seconds | ElevenLabs Scribe | Diarization, long files |
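The routing rule is small enough to express as a single function. The sketch below is only an illustration of the table; the names `GROQ_MAX_DURATION_S` and `pick_service` are hypothetical and do not come from the workflow, where the same test lives in an n8n IF node.

```python
# Threshold taken from the routing table (seconds).
GROQ_MAX_DURATION_S = 30  # hypothetical constant name


def pick_service(duration_s: int) -> str:
    """Return the service label for a voice message of the given duration."""
    # The boundary is inclusive: exactly 30 seconds still goes to Groq.
    if duration_s <= GROQ_MAX_DURATION_S:
        return "groq"
    return "elevenlabs"
```

The inclusive boundary matters: a 30-second message stays on the fast path, and only 31 seconds and above reach ElevenLabs.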

---

## 2. Why? Stakes and motivations

### Problems solved

| Problem | Without transcription | With transcription |
|---------|-----------------------|--------------------|
| **Mandatory listening** | Replay the audio to understand it | Text readable at a glance |
| **No search** | No Ctrl+F on audio | Indexable text |
| **Hard to share** | Forward the audio file | Copy and paste the text |
| **Accessibility** | Not accessible to deaf or hard-of-hearing users | Text anyone can read |

### Why two services?

| Criterion | Groq Whisper | ElevenLabs Scribe |
|-----------|--------------|-------------------|
| **Cost** | Free | Paid (per hour) |
| **Speed** | ~1-2s | ~10-30s |
| **File size limit** | 25 MB | 3 GB |
| **Diarization** | No | Yes |
| **Best for** | Short messages | Meetings, podcasts |

> **Tip - Why 30 seconds?**
>
> Groq Whisper is optimized for short files and returns the transcript in under 2 seconds. For longer files, ElevenLabs offers better quality along with speaker identification.

---

## 3. How? Technical implementation

### Architecture

```mermaid
flowchart TD
  Trigger["Execute Workflow Trigger · passthrough"]
  Download["Telegram Get File · download voice"]
  IfDuration{"Duration ≤ 30s ?"}
  Groq["Groq Whisper · whisper-large-v3-turbo"]
  ElevenLabs["ElevenLabs Scribe v1"]
  Format["Format Response"]
  Return["Return to Orchestrator"]

  Trigger --> Download --> IfDuration
  IfDuration -->|Yes| Groq
  IfDuration -->|No| ElevenLabs
  Groq --> Format
  ElevenLabs --> Format
  Format --> Return
```

### Workflow input

```json
{
  "message": {
    "voice": {
      "file_id": "AwACAgIAAxkB...",
      "duration": 15,
      "mime_type": "audio/ogg"
    },
    "from": {
      "id": 123456789,
      "first_name": "Guillaume"
    },
    "chat": {
      "id": 123456789
    }
  }
}
```
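To make the shape of this payload concrete, here is a minimal sketch of how the fields the workflow needs could be pulled out of it. `VoiceNote` and `parse_voice_note` are hypothetical names, and defaulting a missing `mime_type` to `audio/ogg` is an assumption, not something the workflow is known to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceNote:
    """Fields needed downstream (hypothetical container)."""
    file_id: str
    duration: int
    mime_type: str
    chat_id: int


def parse_voice_note(payload: dict) -> Optional[VoiceNote]:
    """Extract the voice note from a Telegram-style payload, or None if absent."""
    message = payload.get("message", {})
    voice = message.get("voice")
    if voice is None:
        # Not a voice message: the orchestrator would not call this sub-workflow.
        return None
    return VoiceNote(
        file_id=voice["file_id"],
        duration=voice["duration"],
        # Assumption: fall back to audio/ogg when the field is missing.
        mime_type=voice.get("mime_type", "audio/ogg"),
        chat_id=message["chat"]["id"],
    )
```

`duration` drives the routing decision, while `file_id` is what the Telegram Get File step needs to download the audio.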

### Groq Whisper configuration (≤ 30s)

**Community Node:** `n8n-nodes-groq`

| Parameter | Value |
|-----------|-------|
| Credential | `Groq account - N8N` |
| Operation | Transcribe |
| Model | `whisper-large-v3-turbo` |
| Input Data Field | `data` |
| Language | `fr` (optional) |
| Response Format | `json` |
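For readers who want to see what these settings amount to outside n8n, the sketch below builds the equivalent HTTP request against Groq's OpenAI-compatible transcription endpoint. It only assembles the request and sends nothing. `build_groq_request` and the `voice.ogg` filename are hypothetical, and how the community node calls the API internally is not documented here.

```python
from typing import Optional

GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_groq_request(audio: bytes, api_key: str,
                       language: Optional[str] = "fr") -> dict:
    """Assemble a multipart request mirroring the node settings above."""
    data = {
        "model": "whisper-large-v3-turbo",
        "response_format": "json",
    }
    # The language hint is optional, as in the table.
    if language:
        data["language"] = language
    return {
        "url": GROQ_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": data,
        # (filename, content, MIME type); the filename is illustrative only.
        "files": {"file": ("voice.ogg", audio, "audio/ogg")},
    }
```

With the third-party `requests` library installed, `requests.post(**build_groq_request(audio, key), timeout=30)` would send the request; the keys of the returned dictionary were chosen to match that call.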

### ElevenLabs Scribe configuration (> 30s)

**HTTP Request Node**

| Parameter | Value |
|-----------|-------|
| Method | POST |
| URL | `https://api.elevenlabs.io/v1/speech-to-text` |
| Authentication | Header Auth → `ElevenLabs API` |
| Body Content Type | Form-Data |

**Form Parameters:**

| Name | Type | Value |
|------|------|-------|
| file | Binary | `{{ $binary.data }}` |
| model_id | String | `scribe_v1` |
| language_code | String | `fr` |
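The same request can be sketched in Python to show how the header auth and form fields fit together. Two things here are assumptions rather than part of the configuration above: the `diarize` form field is not listed in the table and is included only because the article relies on diarization for long files, and `build_scribe_request` is a hypothetical helper name.

```python
ELEVENLABS_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_scribe_request(audio: bytes, api_key: str,
                         language_code: str = "fr",
                         diarize: bool = True) -> dict:
    """Assemble the multipart request described by the tables above."""
    return {
        "url": ELEVENLABS_URL,
        # ElevenLabs authenticates with an API key header.
        "headers": {"xi-api-key": api_key},
        "data": {
            "model_id": "scribe_v1",
            "language_code": language_code,
            # Assumption: not in the table; form fields are sent as strings.
            "diarize": "true" if diarize else "false",
        },
        # The filename is illustrative only.
        "files": {"file": ("voice.ogg", audio, "audio/ogg")},
    }
```

If sent with the third-party `requests` library, a generous timeout such as `requests.post(**req, timeout=180)` matches the value suggested in the troubleshooting table for long files.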

### Output

```json
{
  "success": true,
  "text": "Rappelle-moi d'appeler Jean demain",
  "duration": 15,
  "service": "groq"
}
```
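Both services return a JSON body with a `text` field, so the Format Response step can be pictured as a small normalizer. This is a sketch under assumptions: `format_response` is a hypothetical name, and treating an empty transcript as `success: false` is a guess at the intended behavior, not something the workflow is known to do.

```python
def format_response(raw: dict, duration: int, service: str) -> dict:
    """Normalize a provider response into the output shape shown above."""
    # Whisper-style transcripts often start with a space, so strip it.
    text = (raw.get("text") or "").strip()
    return {
        # Assumption: an empty transcript counts as a failure.
        "success": bool(text),
        "text": text,
        "duration": duration,
        "service": service,
    }
```

Keeping one output shape for both branches means the orchestrator only has to check `success` and read `text`, whichever service produced the transcript.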

### Integration with the orchestrator

The [Telegram Orchestrator](/fr/workflows/telegram-orchestrator/) detects voice notes and calls this sub-workflow:

```text
IF message.voice exists:
  Execute Workflow: Voice Transcription
  Input: $json (contains message.voice)

  IF response.success:
    IF active conversation exists (#231/#232):
      Send Transcript Preview + route text to Conversation Agent
    ELSE:
      Send message: "🎤 {response.text}"
  ELSE:
    Send message: "❌ Transcription échouée"
```

Since Phase 5 (#231/#232), when a voice note arrives while a conversation is active, the transcribed text is injected into that conversation as a message instead of being echoed back to the user. This makes it possible to hold a spoken conversation with the bot.

### Post-transcription callbacks

| Callback | Action |
|----------|--------|
| `voice_retry_{msg_id}` | Retry the transcription |
| `voice_process_{msg_id}` | Process with Claude (summary, extraction) |
| `voice_save_{msg_id}` | Save as a note |
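The callback names follow a `voice_{action}_{msg_id}` pattern, which can be decoded as sketched below. The function name `parse_voice_callback` is hypothetical, and requiring `msg_id` to be a plain integer is an assumption based on Telegram message IDs being numeric.

```python
from typing import Optional, Tuple

# The three actions listed in the callback table.
VOICE_ACTIONS = {"retry", "process", "save"}


def parse_voice_callback(data: str) -> Optional[Tuple[str, int]]:
    """Split 'voice_<action>_<msg_id>' into (action, msg_id), or return None."""
    parts = data.split("_", 2)
    if len(parts) != 3 or parts[0] != "voice" or parts[1] not in VOICE_ACTIONS:
        return None
    action, msg_id = parts[1], parts[2]
    # Assumption: message IDs are plain ASCII digits.
    if not (msg_id.isascii() and msg_id.isdigit()):
        return None
    return action, int(msg_id)
```

Returning `None` for anything that does not match lets the caller ignore callbacks that belong to other workflows.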

---

## 4. What if? Outlook and limits

### Limits and costs

| Service | File size limit | Cost | Speed |
|---------|-----------------|------|-------|
| Groq Whisper | 25 MB | Free | ~1-2s |
| ElevenLabs Scribe | 3 GB | Paid (per hour) | ~10-30s |

> **Caution - Groq quotas**
>
> Groq enforces per-minute request limits. During usage spikes, the workflow falls back to ElevenLabs automatically.
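One way to picture this fallback is an ordered list of providers that is walked until one succeeds. The sketch below uses stub providers and a hypothetical `RateLimited` exception standing in for an HTTP 429 response; it illustrates the idea only and does not describe how the n8n workflow wires its error branch.

```python
from typing import Callable, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error a provider raises when rate limited (e.g. HTTP 429)."""


Provider = Tuple[str, Callable[[bytes], str]]


def transcribe_with_fallback(audio: bytes, providers: Sequence[Provider]) -> dict:
    """Try each provider in order, moving on only when one is rate limited."""
    for name, transcribe in providers:
        try:
            return {"success": True, "text": transcribe(audio), "service": name}
        except RateLimited:
            continue  # quota hit: try the next provider in the list
    return {"success": False, "text": "", "service": None}


# Stub providers for illustration; real ones would call the APIs.
def groq_stub(audio: bytes) -> str:
    raise RateLimited("simulated quota exhaustion")


def scribe_stub(audio: bytes) -> str:
    return "bonjour"


result = transcribe_with_fallback(b"...", [("groq", groq_stub),
                                           ("elevenlabs", scribe_stub)])
```

Only the rate-limit error triggers the fallback; any other exception propagates, so real failures are not silently hidden behind a second paid request.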

### Current limitations

| Limitation | Impact | Mitigation |
|------------|--------|------------|
| **Groq quota** | Possible rate limiting | Fall back to ElevenLabs |
| **OGG format** | The only format Telegram sends for voice messages | Both APIs support it natively |
| **No diarization under 30s** | No speaker identification | Acceptable for short messages |

### Evolution scenarios

**If the Groq rate limit is reached**:
- Temporarily change the routing condition to `duration ≤ 0` so that every message goes to ElevenLabs
- Or add OpenAI Whisper as an intermediate fallback

**If diarization is needed on every message**:
- Route all messages to ElevenLabs
- Or use a local model with speaker detection

**If multiple languages are needed**:
- Detect the language automatically
- Adapt the parameters to the detected language

### Troubleshooting

| Problem | Check |
|---------|-------|
| Empty transcript | Does the audio file actually contain speech? |
| ElevenLabs timeout | Files > 5 min: raise the timeout (180s) |
| Groq rate limit | Check quotas, switch to ElevenLabs |
| Unsupported format | Telegram sends .ogg (Opus), which is supported natively |

---

## Related pages

### Workflows
- [Telegram Orchestrator](/fr/workflows/telegram-orchestrator/): Central hub
- [Notification Hub](/fr/workflows/notification-hub/): Notification routing

### Infrastructure
- [AI Stack](/fr/infrastructure/ai-stack/): Claude and Ollama for post-processing

### External references
- [Groq Speech-to-Text](https://console.groq.com/docs/speech-to-text)
- [ElevenLabs STT API](https://elevenlabs.io/docs/api-reference/speech-to-text/convert)

## Agent metadata

- This article comes from the GuiGPaP Lab blog.
- Global blog context: https://blog.guigpap.com/llms.txt
- Author contact: https://odoo.guigpap.com/mon-cv
- License: CC-BY-SA 4.0