SigLIP Cross-Encoder
Cross-modal reranker based on SigLIP for image-text relevance scoring
SigLIP Cross-Encoder scores how relevant each candidate image is to a text or image query, using SigLIP's cross-modal alignment. Use it as a second-stage reranker after embedding-based retrieval.
When to use:
- Second-stage reranking in image-text retrieval pipelines
- Product recommendation: find images most relevant to a text query
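The second-stage pattern can be sketched as follows. This is a minimal illustration, not this component's API: `rerank_top_k`, `score_fn`, and the toy scores are hypothetical names and values, with `score_fn` standing in for whatever call returns the reranker's relevance score for a candidate image.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rerank_top_k(
    first_stage_hits: Sequence[Tuple[str, float]],
    score_fn: Callable[[str], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Rescore the k best first-stage hits and return them best-first."""
    # Keep only the k candidates the cheap first stage ranked highest.
    shortlist = sorted(first_stage_hits, key=lambda hit: hit[1], reverse=True)[:k]
    # Rescore each shortlisted candidate with the more expensive reranker.
    rescored = [(image_id, score_fn(image_id)) for image_id, _ in shortlist]
    # Order by the reranker's score, highest first.
    return sorted(rescored, key=lambda hit: hit[1], reverse=True)


# Toy data: both the first-stage and the reranker scores are made up.
first_stage = [("img_a", 0.71), ("img_b", 0.69), ("img_c", 0.40), ("img_d", 0.65)]
toy_reranker_scores: Dict[str, float] = {
    "img_a": 0.2,
    "img_b": 0.9,
    "img_c": 0.99,
    "img_d": 0.5,
}

reranked = rerank_top_k(first_stage, toy_reranker_scores.__getitem__, k=3)
print(reranked)
```

Note that `img_c` never reaches the reranker because the first stage dropped it, so the shortlist size `k` bounds both the reranking cost and the recall of the pipeline.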
Input:
- Query Text (optional): Text query for image matching
- Query Image (optional): Image to match against candidates
- Candidate Images (required): List of images to rank
Output:
- Scores: Relevance score per candidate image
- Ranking: Candidate indices sorted by similarity
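To illustrate how the two outputs relate, here is a rough sketch that derives a ranking from per-candidate scores. It assumes the score is a cosine similarity between a query embedding and each candidate embedding, and that the ranking lists the most similar candidate first; neither detail is confirmed by this page. The function names and the two-dimensional toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_and_rank(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int]]:
    """Return one score per candidate and candidate indices, best first."""
    scores = [cosine(query, candidate) for candidate in candidates]
    # Assumption: higher score means more relevant, so sort descending.
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return scores, ranking


# Toy embeddings, made up for illustration only.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

scores, ranking = score_and_rank(query, candidates)
print(scores)
print(ranking)
```

The scores list keeps the input order of the candidates, while the ranking holds indices into that list, so `scores[ranking[0]]` is the best score.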
Inference Settings
No inference-time settings. Scoring is deterministic.