Multimodal Reranking
Score image-text or image-image similarity for retrieval and recommendation
Multimodal reranking models score relevance between images and text (or images and images). Use them as a second-stage ranker: embedding-based retrieval first produces a candidate set, and the reranker then rescores each query-candidate pair to improve the final ordering.
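The two-stage flow described above can be sketched as follows. This is a minimal illustration in plain Python, not the API of any model listed on this page: `retrieve`, `rerank`, and `toy_score` are hypothetical names, the embeddings are made-up toy vectors, and `toy_score` is a stand-in for a real cross-encoder call that would score a (query, image) pair.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, index: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: cheap similarity search over precomputed embeddings."""
    ranked = sorted(
        index,
        key=lambda item_id: cosine(query_vec, index[item_id]),
        reverse=True,
    )
    return ranked[:k]


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Stage 2: rescore each (query, candidate) pair and sort by that score."""
    scored = [(item_id, score_fn(query, item_id)) for item_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy image embeddings (assumed values, for illustration only).
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]  # assumed embedding of the text query


def toy_score(query: str, image_id: str) -> float:
    # Hypothetical placeholder: a real system would call a cross-encoder here.
    return {"cat.jpg": 0.35, "dog.jpg": 0.92, "car.jpg": 0.05}[image_id]


candidates = retrieve(query_vec, index, k=2)
results = rerank("a photo of a dog", candidates, toy_score, top_k=2)
print(candidates)
print(results)
```

In this toy run the first stage keeps the two nearest images and the second stage reorders them by the stand-in relevance score. The split matters because the reranker scores every pair individually, so it is usually applied only to the short candidate list rather than the whole collection.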
Available Models
- SigLIP Cross-Encoder – Image-text cross-encoder based on SigLIP for visual-text relevance scoring
- CLIP Cross-Encoder – CLIP-based cross-encoder for image-text and image-image similarity scoring