Image Embeddings
Convert images to dense vector representations for search and similarity
Image embedding models map images into a fixed-size vector space in which visually similar images produce nearby vectors. Use them for image similarity search, clustering, anomaly detection, and zero-shot classification.
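Similarity search over embeddings typically reduces to cosine similarity between vectors. As a minimal sketch (the `top_k` helper and the synthetic 768-dimensional vectors are illustrative, not part of any library here), a nearest-neighbor lookup can be done with plain NumPy:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k rows of `index` most similar to `query`."""
    # Normalize rows so a plain dot product equals cosine similarity.
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = index_n @ query_n
    return np.argsort(-scores)[:k]

# Toy index: 5 random 768-dim vectors standing in for real image embeddings.
rng = np.random.default_rng(0)
index = rng.normal(size=(5, 768))
query = index[2] + 0.01 * rng.normal(size=768)  # near-duplicate of row 2
print(top_k(query, index, k=1))
```

At production scale the brute-force matrix product would be replaced by an approximate nearest-neighbor index, but the scoring function stays the same.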
Available Models
- CLIP ViT-L/14 – Joint vision-language embeddings, enables text-to-image search (768 dimensions)
- SigLIP SO400M – State-of-the-art image embeddings with better cross-modal alignment than CLIP (1152 dimensions)
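Because both models above embed text and images into the same space, zero-shot classification reduces to comparing an image embedding against embeddings of label prompts. A minimal sketch, assuming the embeddings have already been computed by one of the models (the synthetic vectors and the `zero_shot_classify` helper below are illustrative stand-ins):

```python
import numpy as np

def zero_shot_classify(image_emb: np.ndarray,
                       text_embs: np.ndarray,
                       labels: list[str]) -> str:
    """Pick the label whose text embedding is most similar to the image embedding."""
    text_n = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    image_n = image_emb / np.linalg.norm(image_emb)
    scores = text_n @ image_n  # cosine similarities, one per label
    return labels[int(np.argmax(scores))]

# Synthetic stand-ins for real model outputs (768-dim, as in CLIP ViT-L/14).
rng = np.random.default_rng(1)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = rng.normal(size=(3, 768))
image_emb = text_embs[1] + 0.05 * rng.normal(size=768)  # closest to the second prompt
print(zero_shot_classify(image_emb, text_embs, labels))
```

Prompt phrasing such as "a photo of a ..." follows the convention popularized by CLIP; with SigLIP the same comparison applies, just with 1152-dimensional vectors.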