SigLIP SO400M
State-of-the-art image embeddings using sigmoid loss for cross-modal retrieval
SigLIP SO400M from Google pairs a shape-optimized 400M-parameter vision transformer with sigmoid-loss training, which scores each image–text pair independently rather than normalizing over the whole batch as CLIP's softmax loss does. It outperforms CLIP on many retrieval and zero-shot benchmarks while handling high-resolution inputs natively.
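As a rough illustration (not Google's implementation), the sigmoid loss treats every image–text pair in a batch as an independent binary classification: matching pairs are positives, all other combinations negatives. The temperature `t` and bias `b` below are learnable in the real model and fixed here for the sketch:

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over a batch of image/text embeddings.
    Each (image, text) pair is scored independently; there is no
    batch-wide softmax normalization as in CLIP."""
    # Normalize so the logits are scaled cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = t * img @ txt.T + b
    # Labels: +1 on the diagonal (matching pairs), -1 elsewhere.
    n = len(img)
    labels = 2 * np.eye(n) - 1
    # -log sigmoid(label * logit) == logaddexp(0, -label * logit),
    # summed over all pairs and averaged over the batch.
    return np.logaddexp(0.0, -labels * logits).sum() / n

rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))
aligned = siglip_loss(imgs, imgs)                       # matched pairs
random = siglip_loss(imgs, rng.normal(size=(4, 8)))     # mismatched pairs
print(aligned < random)
```

Because every pair is an independent term, the loss scales to very large batches without the numerical coupling of a softmax over the whole batch.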
When to use:
- High-accuracy image-text similarity search
- Product image retrieval for e-commerce
- Visual search where fine-grained similarity matters
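For the retrieval use cases above, search reduces to cosine similarity over L2-normalized embeddings. A minimal NumPy sketch, where the 1152-dim vectors stand in for SigLIP outputs:

```python
import numpy as np

def top_k(query, corpus, k=3):
    """Return indices of the k corpus vectors most similar to query.
    query: (d,) embedding; corpus: (n, d) embeddings."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarities, shape (n,)
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(1)
corpus = rng.normal(size=(100, 1152))    # e.g. product image embeddings
query = corpus[42] + 0.01 * rng.normal(size=1152)  # near-duplicate image
print(top_k(query, corpus)[0])           # → 42
```

At production scale the brute-force matrix product is typically replaced by an approximate nearest-neighbor index, but the similarity measure stays the same.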
Input: Image file + optional fine-tuned checkpoint
Output: 1152-dimensional embedding vector
Inference Settings
No inference-time settings. SigLIP encodes images deterministically, so the same image and checkpoint always yield the same embedding.