Qwen-VL-2 Embedding
Multilingual multimodal embeddings with strong OCR and visual understanding
Qwen-VL-2 generates 3584-dimensional joint embeddings from images and text across 32+ languages. It performs well on OCR-heavy documents, charts, and other visual content.
When to use:
- Multilingual image-text retrieval (32+ languages)
- Documents with embedded text, charts, or OCR content
- Cross-modal similarity search in multilingual settings
Input:
- Image (required): Image to encode
- Text (optional): Text to pair with the image
Output: 3584-dimensional multimodal embedding vector
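As a minimal sketch of how the output vectors might be used for similarity search, the snippet below ranks stored embeddings against a query by cosine similarity. The function names `cosine_similarity` and `rank_by_similarity`, the document IDs, and the toy 4-dimensional vectors are illustrative assumptions, not part of any Qwen-VL-2 API; real vectors would have 3584 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus, top_k=3):
    """Return up to top_k (doc_id, score) pairs, highest score first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for real embeddings (hypothetical values, 4 dims for readability).
corpus = {
    "invoice_scan": [1.0, 0.0, 0.0, 0.0],
    "bar_chart": [0.0, 1.0, 0.0, 0.0],
    "receipt_photo": [0.9, 0.1, 0.0, 0.0],
}
query = [1.0, 0.0, 0.0, 0.0]
results = rank_by_similarity(query, corpus, top_k=2)
```

For larger collections, a vector index or a vectorized library would likely replace the linear scan; the ranking logic stays the same.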
Inference Settings
There are no inference-time settings. Embeddings are computed deterministically, so the same input always produces the same vector.
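Since identical inputs yield identical embeddings, results can be cached without risk of staleness. The sketch below keys a cache on a SHA-256 hash of the image bytes and optional text. `EmbeddingCache`, `embed_fn`, and `fake_embed` are hypothetical names for illustration; `embed_fn` stands in for whatever call actually produces the embedding.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache keyed by a hash of the (image, text) input pair."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # hypothetical callable: (bytes, str | None) -> list
        self._store = {}
        self.misses = 0

    @staticmethod
    def _key(image_bytes, text):
        h = hashlib.sha256()
        # Length prefix keeps the image/text boundary unambiguous.
        h.update(len(image_bytes).to_bytes(8, "big"))
        h.update(image_bytes)
        # Distinguish "no text" from an empty string.
        if text is None:
            h.update(b"\x00")
        else:
            h.update(b"\x01")
            h.update(text.encode("utf-8"))
        return h.hexdigest()

    def get(self, image_bytes, text=None):
        key = self._key(image_bytes, text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(image_bytes, text)
        return self._store[key]


def fake_embed(image_bytes, text):
    # Placeholder for the real model call; returns a tiny dummy vector.
    return [float(len(image_bytes)), float(len(text or ""))]


cache = EmbeddingCache(fake_embed)
first = cache.get(b"\x89PNG...", "quarterly revenue chart")
second = cache.get(b"\x89PNG...", "quarterly revenue chart")  # served from cache
third = cache.get(b"\x89PNG...")  # no text: different key, new computation
```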