Qwen-VL-2 Embedding

Qwen-VL-2 generates 3584-dimensional joint embeddings from images and text in 32+ languages. It performs well on OCR-heavy documents, charts, and other visual content.

When to use:

Multilingual image-text retrieval (32+ languages)
Documents with embedded text, charts, or OCR content
Cross-modal similarity search in multilingual settings

Input:

Image (required): Image to encode
Text (optional): Text to pair with the image

Output: 3584-dimensional multimodal embedding vector
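For similarity search, embeddings of this kind are commonly compared with cosine similarity. The page does not say which metric is recommended or whether the vectors are normalized, so the sketch below normalizes explicitly. It assumes the embeddings are already available as lists of 3584 floats; the random vectors are placeholders rather than real model output, and `EMBEDDING_DIM`, `cosine_similarity`, and `rank_by_similarity` are illustrative names, not part of any documented API.

```python
import math
import random

EMBEDDING_DIM = 3584  # output size stated in this page


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of the expected size."""
    if len(a) != EMBEDDING_DIM or len(b) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dimensional vectors")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for real embeddings (assumption).
rng = random.Random(0)
corpus = {
    f"doc-{i}": [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for i in range(3)
}
query = list(corpus["doc-1"])  # a query identical to doc-1's embedding

ranking = rank_by_similarity(query, corpus)
print(ranking[0][0])
```

For larger collections, a vector index would normally replace the linear scan; the scoring logic stays the same.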

Inference Settings

No inference-time settings. Embeddings are computed deterministically.
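Because the same input is stated to produce the same embedding, results can be cached by a hash of the input without risk of going stale. The following is a minimal sketch under that assumption: `embed_fn` is a hypothetical callable standing in for whatever client call produces the embedding, and `fake_embed` exists only so the example runs without the model.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache keyed by a hash of the image bytes and optional text."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # hypothetical: (image_bytes, text) -> list of floats
        self._store = {}

    @staticmethod
    def _key(image_bytes, text):
        h = hashlib.sha256()
        # Length prefix keeps the image/text boundary unambiguous.
        h.update(len(image_bytes).to_bytes(8, "big"))
        h.update(image_bytes)
        if text is None:
            h.update(b"\x00")  # marks "no text supplied"
        else:
            h.update(b"\x01" + text.encode("utf-8"))
        return h.hexdigest()

    def get(self, image_bytes, text=None):
        key = self._key(image_bytes, text)
        if key not in self._store:
            self._store[key] = self._embed_fn(image_bytes, text)
        return self._store[key]


calls = []


def fake_embed(image_bytes, text):
    # Stand-in for the real model call (assumption); returns a dummy vector.
    calls.append((image_bytes, text))
    return [0.0] * 3584


cache = EmbeddingCache(fake_embed)
first = cache.get(b"fake-image-bytes", "invoice total")
second = cache.get(b"fake-image-bytes", "invoice total")  # served from cache
third = cache.get(b"fake-image-bytes")  # no text: different key, new call
print(len(calls))
```

The text is part of the key because the image alone and the image paired with text are distinct inputs.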
