ViT Base
Vision Transformer Base model for image classification
Vision Transformer Base (ViT-Base-Patch16-224) splits a 224×224 input image into a grid of 16×16-pixel patches, embeds each patch as a token, and applies transformer self-attention over the resulting sequence to classify the image. It achieves strong accuracy on standard benchmarks and benefits significantly from fine-tuning on domain-specific images.
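The patch-splitting step above can be sketched with plain array reshapes. This is an illustrative example, not the model's actual implementation: a 224×224 RGB image yields a 14×14 grid of 16×16 patches, i.e. 196 tokens of 16×16×3 = 768 raw pixel values each.

```python
import numpy as np

# Illustrative sketch of ViT-Base-Patch16-224 patch splitting:
# 224 / 16 = 14 patches per side -> 14 * 14 = 196 patch tokens.
IMAGE_SIZE, PATCH_SIZE, CHANNELS = 224, 16, 3
GRID = IMAGE_SIZE // PATCH_SIZE  # 14

image = np.random.rand(IMAGE_SIZE, IMAGE_SIZE, CHANNELS)  # stand-in input

# Split height and width into (grid, patch) blocks, then flatten each patch.
patches = (
    image.reshape(GRID, PATCH_SIZE, GRID, PATCH_SIZE, CHANNELS)
    .transpose(0, 2, 1, 3, 4)
    .reshape(GRID * GRID, PATCH_SIZE * PATCH_SIZE * CHANNELS)
)
print(patches.shape)  # (196, 768)
```

Each flattened patch is then linearly projected to the model's hidden dimension before self-attention is applied; the code above stops at the raw patch sequence.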
When to use:
- Custom image category classification after fine-tuning
- Medical imaging, product categorization, quality inspection
Input: Image file (PNG, JPG) + optional fine-tuned checkpoint
Output: Predicted class label and confidence scores
Inference Settings
No dedicated inference-time settings. Classification is deterministic given the loaded checkpoint; the set of output class labels is fixed by the categories the model was fine-tuned on.
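How the output is produced can be sketched as follows. This is a hypothetical example (the label names and logit values are invented, and the real label mapping comes from the fine-tuned checkpoint's configuration): the classification head emits one logit per category, confidence scores are the softmax of those logits, and the predicted label is the argmax.

```python
import numpy as np

# Hypothetical fine-tuned categories and head output (for illustration only).
id2label = {0: "defect", 1: "ok", 2: "rework"}
logits = np.array([2.1, 0.3, -1.2])

# Softmax (shifted by the max for numerical stability) gives confidences.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

predicted = id2label[int(np.argmax(probs))]
confidences = {id2label[i]: float(p) for i, p in enumerate(probs)}
print(predicted)  # "defect"
```

Because there is no sampling step, the same image and checkpoint always yield the same label and scores.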