BGE-M3
Multilingual dense and sparse embeddings for retrieval and hybrid search
BGE-M3 from BAAI supports 100+ languages with three output types: dense vectors, sparse token-weight representations, and a hybrid of the two. Ideal for retrieval pipelines that need both semantic and keyword matching.
When to use:
- Multilingual document retrieval across 100+ languages
- Hybrid search combining dense (semantic) and sparse (keyword) signals
- Reranking pipelines using dense-sparse fusion
Input: Text string + optional fine-tuned checkpoint
Output: Dense embedding vector (1024-dim) and/or sparse token-weight dictionary
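To make the two output types concrete, here is a minimal sketch with toy data. The vectors, token weights, and the `lexical_score` helper are illustrative assumptions, not real model output; the real dense vector has 1024 dimensions.

```python
# Toy illustration of BGE-M3's two output types (values are made up,
# not produced by the model).

# Dense: a fixed-length float vector (1024-dim in the real model; 4-dim here).
dense_query = [0.1, 0.3, 0.0, 0.9]

# Sparse: a token -> weight dictionary, similar in spirit to BM25
# but with learned weights.
sparse_query = {"solar": 1.4, "panel": 1.1, "cost": 0.6}
sparse_doc = {"solar": 1.2, "panel": 0.9, "price": 0.7}

def lexical_score(q, d):
    """Sparse relevance: sum of weight products over shared tokens."""
    return sum(w * d[t] for t, w in q.items() if t in d)

# Only "solar" and "panel" overlap: 1.4*1.2 + 1.1*0.9 = 2.67
print(round(lexical_score(sparse_query, sparse_doc), 2))  # → 2.67
```

Tokens that appear in only one side contribute nothing, which is what gives the sparse representation its keyword-matching character.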
Model Settings
Embedding Type (required; default: dense; options: dense / sparse / hybrid)
Which embedding representation to produce.
- dense: Standard 1024-dim vector — use for semantic similarity search
- sparse: Token-weight dictionary (like BM25 but neural) — use for keyword-aware retrieval
- hybrid: Both dense and sparse — best retrieval accuracy when combined with a reranker
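A common way to use the hybrid output is weighted score fusion: cosine similarity on the dense vectors plus a lexical score on the sparse dictionaries. The sketch below assumes toy data and a tunable weight `w`; the fusion weight is a pipeline choice, not a fixed part of BGE-M3.

```python
import math

def cosine(a, b):
    """Dense semantic similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def lexical_score(q, d):
    """Sparse keyword similarity: weight products over shared tokens."""
    return sum(w * d[t] for t, w in q.items() if t in d)

def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, w=0.7):
    """Weighted fusion; w balances semantic vs. keyword signal."""
    return w * cosine(dense_q, dense_d) + (1 - w) * lexical_score(sparse_q, sparse_d)

# Toy 4-dim vectors stand in for the model's 1024-dim output.
dq, dd = [0.1, 0.3, 0.0, 0.9], [0.2, 0.1, 0.0, 0.8]
sq, sd = {"solar": 1.4, "panel": 1.1}, {"solar": 1.2, "cost": 0.5}
print(hybrid_score(dq, dd, sq, sd))
```

In a reranking pipeline, candidates retrieved by either signal alone can be re-scored with `hybrid_score` before a cross-encoder reranker sees them.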