BGE-M3

Multilingual dense and sparse embeddings for retrieval and hybrid search

BGE-M3 from BAAI supports 100+ languages with three output types: dense vectors, sparse token-weight representations, and hybrid combinations. Ideal for retrieval pipelines that need both semantic and keyword matching.

When to use:

  • Multilingual document retrieval across 100+ languages
  • Hybrid search combining dense (semantic) and sparse (keyword) signals
  • Reranking pipelines using dense-sparse fusion

Input: Text string + optional fine-tuned checkpoint
Output: Dense embedding vector (1024-dim) and/or sparse token-weight dictionary
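The dense output is typically consumed via cosine similarity. A minimal sketch of that scoring step, using random stand-ins for the 1024-dim vectors the model would return (variable names here are illustrative, not part of the model's API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two dense embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for 1024-dim dense embeddings returned by the model.
rng = np.random.default_rng(0)
query_vec = rng.standard_normal(1024)
doc_vec = rng.standard_normal(1024)

score = cosine_similarity(query_vec, doc_vec)  # in [-1, 1]
```

In a retrieval pipeline, documents are ranked by this score against the query embedding, usually via an approximate-nearest-neighbor index rather than a pairwise loop.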

Model Settings

Embedding Type (required; default: dense; options: dense / sparse / hybrid) — selects which embedding representation to produce.

  • dense: Standard 1024-dim vector — use for semantic similarity search
  • sparse: Token-weight dictionary (like BM25 but neural) — use for keyword-aware retrieval
  • hybrid: Both dense and sparse — best retrieval accuracy when combined with a reranker
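The three options above can be sketched in scoring terms, assuming sparse outputs are token-weight dicts and hybrid fusion is a simple weighted sum (the fusion weight `alpha` and all variable names are illustrative assumptions, not the model's API):

```python
def sparse_score(q: dict[str, float], d: dict[str, float]) -> float:
    """Dot product over shared tokens of two token-weight dicts
    (the neural analogue of a BM25 term-overlap score)."""
    return sum(w * d[t] for t, w in q.items() if t in d)

def hybrid_score(dense: float, sparse: float, alpha: float = 0.7) -> float:
    """Weighted fusion of dense and sparse scores; alpha is a tuning knob."""
    return alpha * dense + (1 - alpha) * sparse

# Illustrative token weights (keys would be tokenizer ids in practice).
query = {"solar": 0.8, "panel": 0.6, "cost": 0.3}
doc = {"solar": 0.7, "energy": 0.5, "cost": 0.4}

s = sparse_score(query, doc)           # 0.8*0.7 + 0.3*0.4 = 0.68
h = hybrid_score(dense=0.9, sparse=s)  # 0.7*0.9 + 0.3*0.68 = 0.834
```

In hybrid mode, the fused score is usually only a first-stage ranking; the candidates it surfaces are then passed to a reranker for the final ordering.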
