
SAM (Segment Anything Model)

Foundation model for promptable instance segmentation with points, boxes, or masks

SAM (Segment Anything Model) is a foundation model that can segment any object in an image from a variety of prompts: point clicks, bounding boxes, or rough masks. Unlike traditional segmentation models, which must be retrained for each new class, SAM's zero-shot capability lets it segment arbitrary objects interactively without additional training - making it especially valuable for annotation tools and flexible segmentation tasks.
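
To make the workflow concrete, here is a minimal sketch of promptable inference. It assumes Meta AI's segment-anything Python package and the officially released ViT-H checkpoint; this system wraps the same ideas behind its own interface, so treat the code as illustrative rather than as this system's API.

    # Minimal sketch of promptable SAM inference (assumes Meta AI's
    # segment-anything package and the official ViT-H checkpoint).
    import cv2
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    sam.to(device="cuda")
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)  # computes the image embedding once

    # One positive click roughly at the object's center.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),  # 1 = foreground
        multimask_output=True,
    )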

When to Use SAM

SAM is ideal for:

  • Interactive segmentation with user prompts
  • Zero-shot segmentation without training data for specific classes
  • Annotation tools for creating training data
  • Flexible segmentation where object classes aren't predefined
  • Research and prototyping requiring quick segmentation

Note: SAM is inference-only in this system - training is not supported, but fine-tuned checkpoints can be loaded for inference.

Strengths

  • Promptable: Segment anything by pointing or boxing
  • Zero-shot: Works on novel objects without training
  • Interactive: Real-time feedback for user-guided segmentation
  • Versatile: Multiple prompt types (points, boxes, masks)
  • Foundation model: Pre-trained on 1 billion+ masks
  • Multi-mask output: Generates multiple plausible segmentations

Weaknesses

  • Inference only: Cannot be trained in this system
  • Semantic labels not provided (just masks)
  • Requires user interaction for each segmentation
  • Not optimized for fully automatic batch processing
  • Large model size (~2.4 GB for the ViT-H variant)

Parameters

Inference Configuration

  • Input Image: Image to segment
  • Finetuned Checkpoint (Optional): Fine-tuned SAM weights
  • Prompt Points (Optional): List of (x, y) coordinates with labels (foreground/background)
  • Prompt Boxes (Optional): Bounding box coordinates (x1, y1, x2, y2)

Multimask Output (Default: true)

  • Generate multiple masks with different levels of granularity
  • Recommended to keep true for flexibility
  • Model automatically ranks masks by quality score

Mask Threshold (Default: 0.0)

  • Threshold for converting soft masks to binary
  • 0.0 uses model's default (adaptive)
  • Increase (e.g., 0.5) for tighter masks
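
As a rough illustration of these parameters, the sketch below loads weights and raises the mask threshold. The path my_finetuned_sam.pth is a hypothetical placeholder, and the attribute names follow Meta AI's segment-anything package; a fine-tuned checkpoint can be passed in place of the base one as long as it matches the chosen variant's architecture.

    # Sketch: optional fine-tuned weights plus a stricter mask threshold.
    from segment_anything import sam_model_registry, SamPredictor

    # "my_finetuned_sam.pth" is a hypothetical fine-tuned checkpoint;
    # its state dict must match the chosen variant (here ViT-H).
    sam = sam_model_registry["vit_h"](checkpoint="my_finetuned_sam.pth")

    sam.mask_threshold = 0.5  # default 0.0; higher values give tighter masks
    predictor = SamPredictor(sam)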

Usage Patterns

Point Prompts

Click on objects to segment them. Use positive points (foreground) and negative points (background) to refine.

Example: Click center of object (positive), click background areas (negative) to exclude unwanted regions
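A sketch of this refinement pattern, assuming a SamPredictor whose set_image() has already been called (the coordinates are made up for illustration):

    # Positive (1) and negative (0) clicks refine one object's mask.
    import numpy as np

    points = np.array([[400, 300],   # object center (foreground)
                       [150, 100],   # background area to exclude
                       [620, 420]])  # overlapping region to exclude
    labels = np.array([1, 0, 0])

    masks, scores, _ = predictor.predict(
        point_coords=points,
        point_labels=labels,
        multimask_output=True,
    )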

Box Prompts

Draw bounding box around object for quick segmentation.

Example: Drag box around person - SAM segments precise boundaries
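In code this is a single box argument in (x1, y1, x2, y2) pixel coordinates; again assuming an already-prepared SamPredictor and illustrative coordinates:

    # A box prompt usually pins down one object, so a single mask suffices.
    import numpy as np

    box = np.array([120, 80, 560, 470])  # (x1, y1, x2, y2)
    masks, scores, _ = predictor.predict(
        box=box,
        multimask_output=False,
    )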

Combining Prompts

Use both points and boxes for maximum control.

Example: Box around object + negative points to exclude overlapping objects
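A sketch of the combined pattern, under the same assumptions as the sketches above:

    # Box prompt plus a negative point to carve out an overlapping object.
    import numpy as np

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[350, 260]]),
        point_labels=np.array([0]),          # 0 = exclude this region
        box=np.array([120, 80, 560, 470]),
        multimask_output=False,
    )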

Configuration Tips

Best Practices

  • Start with single positive point on object center
  • Add negative points to refine boundaries
  • Use boxes for quick rough segmentation
  • Combine prompts for complex scenarios
  • Keep multimask_output=true to see alternative masks

When to Use SAM

Interactive Annotation: Creating training data for other models - SAM accelerates manual annotation

Zero-shot Tasks: Need to segment objects without training data - SAM works immediately

Flexible Applications: Object classes change frequently - no retraining needed

Prototyping: Quick experimentation with segmentation - iterate without training

When NOT to Use SAM

Fully Automatic: Need batch processing without interaction - use trained segmentation models instead

Semantic Labels: Need class labels not just masks - SAM doesn't classify, only segments

Real-time Automatic: Need automatic detection + segmentation - use Mask R-CNN or DETR Segmentation

Output

  • Segmentation Masks: NumPy arrays of binary masks
  • Mask Image: Visualization of the masks overlaid on the input image
  • Scores: Quality/confidence scores for each predicted mask (when multimask_output=true)
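
With multimask output, the shapes follow segment-anything's conventions: masks is a (3, H, W) boolean array and scores holds one predicted-quality value per mask. A small sketch for picking the best candidate, assuming the predictor from the earlier sketches:

    # Select the highest-scoring of the three candidate masks.
    import numpy as np

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
    best = masks[np.argmax(scores)]
    print(f"best mask: {best.sum()} pixels, score {scores.max():.3f}")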

Example Use Cases

Creating Training Data

Scenario: Need to annotate 1,000 images for custom segmentation task

Why SAM: Dramatically faster than manual pixel-level annotation. Click object, review mask, accept/refine. Can create training set in hours instead of days.
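
A hypothetical skeleton of such an annotation loop: one positive click per object, keep the best mask, and write it out as a binary label image. The helper name annotate and the output naming scheme are invented for illustration.

    # Hypothetical annotation helper: one foreground click per object.
    import cv2
    import numpy as np

    def annotate(predictor, image, clicks, out_prefix):
        """clicks: list of (x, y) foreground points, one per object."""
        predictor.set_image(image)
        for i, (x, y) in enumerate(clicks):
            masks, scores, _ = predictor.predict(
                point_coords=np.array([[x, y]]),
                point_labels=np.array([1]),
                multimask_output=True,
            )
            best = masks[np.argmax(scores)].astype(np.uint8) * 255
            cv2.imwrite(f"{out_prefix}_{i}.png", best)  # binary label image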

Research Prototyping

Scenario: Testing segmentation idea on new object types

Why SAM: Zero-shot capability means immediate results without collecting and annotating training data.

Interactive Photo Editing

Scenario: Consumer app for selecting and editing objects in photos

Why SAM: Users click objects, get instant precise selections without technical knowledge.

Flexible Segmentation System

Scenario: Segmentation needs change based on user requirements

Why SAM: Can segment any object on-demand without model retraining for each new class.

Comparison with Alternatives

SAM vs Mask R-CNN

Choose SAM when:

  • Interactive/promptable segmentation needed
  • Zero-shot on novel objects
  • Creating annotation tools
  • Object classes undefined or changing

Choose Mask R-CNN when:

  • Fully automatic segmentation required
  • Fixed set of known classes
  • Batch processing thousands of images
  • Need semantic class labels
  • Training data available

SAM vs DETR Segmentation

Choose SAM when:

  • Promptable interaction needed
  • No training data available
  • Quick prototyping
  • Flexible, undefined object classes

Choose DETR Segmentation when:

  • Automatic panoptic segmentation
  • Specific trained classes
  • Batch inference
  • Unified detection + segmentation
  • Can train custom model

SAM vs SegFormer

Choose SAM when:

  • Instance segmentation (separate objects)
  • Interactive prompting
  • Zero-shot capability needed

Choose SegFormer when:

  • Semantic segmentation (pixel classes)
  • Fully automatic processing
  • Dense scene labeling
  • Can train on custom data

Technical Notes

  • Model Variants: SAM comes in ViT-B, ViT-L, ViT-H (Huge is default, best quality)
  • Inference Speed: 50-200ms per image depending on prompt complexity and GPU
  • Memory: ~2-4GB GPU memory for inference
  • Fine-tuning: Possible outside this system; load fine-tuned checkpoints for specialized domains
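
For reference, the three released variants map to registry keys in Meta AI's segment-anything package; the checkpoint filenames below are the officially published ones.

    # Variant selection; smaller variants trade quality for speed and memory.
    from segment_anything import sam_model_registry

    checkpoints = {
        "vit_b": "sam_vit_b_01ec64.pth",  # smallest and fastest
        "vit_l": "sam_vit_l_0b3195.pth",
        "vit_h": "sam_vit_h_4b8939.pth",  # default here, best quality
    }
    sam = sam_model_registry["vit_b"](checkpoint=checkpoints["vit_b"])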
