Whisper Large v2

Multilingual speech recognition and translation with word-level timestamps

OpenAI's Whisper Large-v2 transcribes audio with high accuracy across 99+ languages and can translate speech directly to English. It can be run with the base weights or with a fine-tuned checkpoint.

When to use:

  • Transcribing meetings, interviews, or voice recordings
  • Multilingual content requiring automatic language detection
  • Translating non-English audio to English text
  • Generating subtitles with word-level timestamps

Input: Audio file (WAV, MP3, etc.) + optional fine-tuned checkpoint
Output: Transcribed text and word-level timestamps
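
To illustrate how the word-level timestamps can feed the subtitle use case, here is a minimal sketch that groups timed words into SRT cues. The input shape, a list of (word, start_seconds, end_seconds) tuples, and the function names format_srt_time and words_to_srt are assumptions made for this example; the actual output structure of this tool is not specified here and may differ.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group (word, start, end) tuples into numbered SRT cues.

    The tuple layout is hypothetical; adapt it to the real output format.
    Each cue spans from the start of its first word to the end of its last.
    """
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        text = " ".join(word.strip() for word, _, _ in group)
        start = format_srt_time(group[0][1])
        end = format_srt_time(group[-1][2])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9), ("again", 1.0, 1.5)]
    print(words_to_srt(demo, max_words=2))
```

Grouping by a fixed word count keeps the sketch short; splitting cues on pauses or punctuation would likely give more readable subtitles.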

Model Settings

Sampling Rate (default: 16000) Audio sampling rate in Hz. Must match the audio file's actual sampling rate.

  • 16000: Standard for speech — required by Whisper
  • Resample audio before inference if it differs from 16000 Hz
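
The resampling step can be sketched as follows. This is a deliberately naive, dependency-free illustration using linear interpolation on a mono signal held in a Python list; the function name resample_linear is an assumption for this example. It applies no anti-aliasing filter, so for real audio a proper resampler (for example the one in ffmpeg or librosa) is likely the better choice.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal to dst_rate by linear interpolation.

    Naive sketch: no low-pass filtering is applied before downsampling,
    so high-frequency content may alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring source samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    one_second_at_44k = [0.0] * 44100
    print(len(resample_linear(one_second_at_44k, 44100)))
```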

Inference Settings

Language (default: en) Language code for transcription (e.g., en, fr, de, zh, es).

  • Set to the audio's language for best accuracy
  • Leave as en if the audio is English

Task (default: transcribe, options: transcribe / translate) What to do with the audio.

  • transcribe: Output text in the original language of the audio
  • translate: Translate the audio to English regardless of source language
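
As a rough illustration of how the Language and Task settings map onto a Whisper call, the sketch below validates the two values and passes them to the Hugging Face transformers speech recognition pipeline. This page does not state which runtime executes the model, so the use of transformers is an assumption, and build_generate_kwargs and run_whisper are hypothetical helper names. Running run_whisper requires transformers, a PyTorch install, ffmpeg for decoding audio files, and a download of the model weights.

```python
VALID_TASKS = ("transcribe", "translate")


def build_generate_kwargs(language="en", task="transcribe"):
    """Validate the Language and Task settings and return them as a dict."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    if not language:
        raise ValueError("language must be a non-empty code such as 'en' or 'fr'")
    return {"language": language.lower(), "task": task}


def run_whisper(audio_path, language="en", task="transcribe"):
    """Run Whisper Large-v2 via the transformers pipeline (assumed backend)."""
    # Imported lazily so the validation helper works without transformers.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v2",
        chunk_length_s=30,  # process long recordings in 30-second chunks
    )
    return asr(
        audio_path,
        return_timestamps="word",
        generate_kwargs=build_generate_kwargs(language, task),
    )


if __name__ == "__main__":
    print(build_generate_kwargs("fr", "translate"))
```

With this backend, the returned dictionary holds the full transcript under "text" and the timed words under "chunks".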
