
Image Classification

Train models to categorize images into predefined classes

Image classification is the task of assigning a label or category to an entire image. This is one of the most fundamental computer vision tasks, with applications ranging from medical diagnosis to product recognition and content moderation.

Learn About Image Classification

New to image classification? Visit our Image Classification Concepts Guide to learn how these models work, common architectures, and best practices for data preparation.

Available Models

Vision Transformer (ViT) Models

Vision Transformers apply the transformer architecture to image classification by splitting images into patches and processing them as sequences.

  • ViT Base - Balanced model with 86M parameters, good for most use cases
  • ViT Large - Larger model with 304M parameters, higher accuracy but slower
  • ViT Small MSN - Smaller variant with masked self-supervised learning, efficient and accurate

ResNet Models

Residual Networks use skip connections to enable training of very deep networks, providing excellent accuracy-to-efficiency ratios.

  • ResNet-18 - Lightweight 18-layer model, fastest training and inference
  • ResNet-50 - 50-layer model, excellent balance of speed and accuracy
  • ResNet-101 - 101-layer model, highest accuracy in ResNet family

Efficient Models

Models optimized for speed, size, or mobile deployment while maintaining competitive accuracy.

  • EfficientNet-B0 - Compound scaling for optimal efficiency, great accuracy with fewer parameters
  • MobileNetV3-Small - Optimized for mobile and edge devices, minimal latency

Common Configuration

Training Images Folder Structure

All image classification models expect training images organized in class subfolders:

train_images/
├── class1/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── class2/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── class3/
    ├── image1.jpg
    └── ...
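The layout above can be read with a few lines of Python: each subfolder name becomes a class label, and the files inside it are that class's examples. A minimal sketch (the `class1`/`class2`/`class3` names are the placeholders from the tree above):

```python
from pathlib import Path
import tempfile

# Build a tiny copy of the expected layout in a temp directory.
root = Path(tempfile.mkdtemp()) / "train_images"
for cls in ["class1", "class2", "class3"]:
    (root / cls).mkdir(parents=True)
    (root / cls / "image1.jpg").touch()

# Each subfolder of train_images/ is one class; map names to label indices.
classes = sorted(p.name for p in root.iterdir() if p.is_dir())
labels = {name: idx for idx, name in enumerate(classes)}
print(classes)  # ['class1', 'class2', 'class3']
print(labels)   # {'class1': 0, 'class2': 1, 'class3': 2}
```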

Key Training Parameters

Batch Size: Number of images processed together

  • Larger batches: Faster training, more GPU memory
  • Smaller batches: Less memory, potentially better generalization
  • Typical values: 4-32 depending on model size and GPU

Epochs: Number of complete passes through the training data

  • Too few: Underfitting, poor accuracy
  • Too many: Overfitting, poor generalization
  • Start with 1-10 epochs, adjust based on validation metrics

Learning Rate: Step size for model parameter updates

  • Too high: Training instability, divergence
  • Too low: Slow convergence, risk of stalling in poor local minima
  • Typical range: 1e-5 to 5e-4 for fine-tuning
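The effect of the learning rate is easy to see on a toy problem. The sketch below runs plain gradient descent on f(x) = x² (gradient: 2x); the step sizes are illustrative only, not recommendations for real models:

```python
def descend(lr, steps=50, x=1.0):
    """Gradient descent on f(x) = x^2, whose gradient is 2x."""
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(abs(descend(lr=0.1)))    # near zero: converges
print(abs(descend(lr=0.001)))  # barely moved: too low
print(abs(descend(lr=1.1)))    # huge: diverges, too high
```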

Eval Steps: Frequency of validation evaluations

  • Set to 1 to evaluate after each epoch
  • Higher values for large datasets to reduce overhead
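Collecting the parameters above into one config makes typical starting values concrete. All values here are illustrative fine-tuning defaults drawn from the ranges above, not tuned settings:

```python
# Illustrative fine-tuning defaults; adjust per model and GPU.
config = {
    "batch_size": 16,       # typical range 4-32, limited by GPU memory
    "epochs": 5,            # start with 1-10 for fine-tuning
    "learning_rate": 5e-5,  # typical fine-tuning range: 1e-5 to 5e-4
    "eval_steps": 1,        # 1 = evaluate after each epoch
}
print(config)
```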

Fine-tuning vs Training from Scratch

Fine-tuning (Recommended)

  • Uses pre-trained weights from ImageNet or similar datasets
  • Requires less data (hundreds to thousands of images)
  • Faster convergence (1-10 epochs typically sufficient)
  • Better for most practical applications

Training from Scratch

  • Starts with random initialization
  • Requires large datasets (tens of thousands of images)
  • Takes many more epochs to converge
  • Only recommended when you have abundant data
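A toy one-parameter fit illustrates why pretrained initialization converges faster: fine-tuning starts near a good solution, training from scratch starts far away. The numbers below are purely illustrative, not a real training setup:

```python
def steps_to_fit(w_start, target=2.0, lr=0.1, tol=1e-3):
    """Count gradient-descent steps on loss (w - target)^2 until convergence."""
    w, steps = w_start, 0
    while abs(w - target) >= tol:
        w -= lr * 2 * (w - target)
        steps += 1
    return steps

print(steps_to_fit(1.9))  # "fine-tuning": starts close to the target
print(steps_to_fit(0.0))  # "from scratch": starts far away, needs more steps
```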

Understanding Metrics

Accuracy: Percentage of correct predictions

  • Primary metric for balanced datasets
  • Can be misleading for imbalanced classes
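Accuracy is straightforward to compute from predicted and true labels, and a quick example shows why it misleads on imbalanced data (labels here are made up):

```python
def accuracy(preds, labels):
    """Fraction of predictions matching the true labels."""
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)

# On a 9:1 imbalanced set, always predicting the majority class looks good:
labels = ["cat"] * 9 + ["dog"]
print(accuracy(["cat"] * 10, labels))  # 0.9 despite never predicting "dog"
```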

Loss: Measures how wrong the predictions are

  • Should decrease over training
  • Sudden increases often indicate the learning rate is too high
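For classification, the loss is typically cross-entropy: the negative log of the probability the model assigns to the correct class. The probabilities below are made up to show the shape of the curve:

```python
import math

def cross_entropy(p_correct):
    """Negative log probability assigned to the true class."""
    return -math.log(p_correct)

# Loss falls as the model grows more confident in the right answer.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"p={p}: loss={cross_entropy(p):.3f}")
```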

Confusion Matrix: Shows per-class performance

  • Identifies which classes are confused with each other
  • Helps diagnose dataset quality issues
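A confusion matrix is just a table counting (true class, predicted class) pairs; off-diagonal entries are the mistakes. A pure-Python sketch with made-up labels:

```python
from collections import Counter

def confusion_matrix(true, pred, classes):
    """Rows = true class, columns = predicted class."""
    counts = Counter(zip(true, pred))
    return [[counts[(t, p)] for p in classes] for t in classes]

true = ["cat", "cat", "dog", "dog", "bird"]
pred = ["cat", "dog", "dog", "dog", "bird"]
m = confusion_matrix(true, pred, ["cat", "dog", "bird"])
for row in m:
    print(row)
# The off-diagonal 1 in the first row shows one "cat" predicted as "dog".
```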

Choosing the Right Model

By Priority

Maximum Accuracy

  1. ViT Large (best overall, but slowest)
  2. ResNet-101 (excellent CNN alternative)
  3. EfficientNet-B0 (best parameter efficiency)

Fastest Training

  1. ResNet-18 (quickest to fine-tune)
  2. MobileNetV3-Small (fast and lightweight)
  3. ViT Small MSN (efficient transformer)

Smallest Model Size

  1. MobileNetV3-Small (~5MB)
  2. EfficientNet-B0 (~20MB)
  3. ResNet-18 (~45MB)

Best for Mobile/Edge

  1. MobileNetV3-Small (designed for mobile)
  2. EfficientNet-B0 (excellent efficiency)
  3. ResNet-18 (lightweight and fast)

By Use Case

Medical Imaging

  • ViT Large or ResNet-101 for maximum accuracy
  • Use higher resolution images if possible
  • Ensure balanced training data across classes

Product Recognition

  • EfficientNet-B0 for good accuracy with reasonable speed
  • ResNet-50 for production deployments
  • Focus on data augmentation for variety

Real-time Applications

  • MobileNetV3-Small for edge devices
  • ResNet-18 for server-side real-time
  • Consider quantization for further speedup

General Purpose

  • ResNet-50 for most use cases
  • ViT Base when you have sufficient data
  • EfficientNet-B0 for cloud deployments

Best Practices

Data Preparation

  1. Balance your dataset: Ensure similar numbers of images per class
  2. Image quality: Use consistent image sizes and quality
  3. Data augmentation: Helps prevent overfitting (rotation, flipping, color jitter)
  4. Validation split: Hold out 10-20% of data for validation
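The validation split in step 4 can be done with a seeded shuffle so the split is reproducible across runs; the file names below are placeholders:

```python
import random

def train_val_split(items, val_fraction=0.2, seed=42):
    """Deterministically shuffle, then hold out val_fraction for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_fraction)
    return items[n_val:], items[:n_val]

files = [f"image{i}.jpg" for i in range(100)]
train, val = train_val_split(files)
print(len(train), len(val))  # 80 20
```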

Training Strategy

  1. Start with low learning rate: 1e-5 to 5e-5 for fine-tuning
  2. Monitor training loss: Should decrease steadily
  3. Check for overfitting: Validation accuracy should track training accuracy; a widening gap signals overfitting
  4. Use early stopping: Stop if validation accuracy plateaus or decreases
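Early stopping (step 4) can be a small helper that tracks the best validation accuracy and stops after `patience` evaluations without improvement. A minimal sketch with made-up validation accuracies:

```python
class EarlyStopping:
    """Stop training after `patience` evaluations with no improvement."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_evals = 0

    def should_stop(self, val_accuracy):
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

stopper = EarlyStopping(patience=2)
history = [0.70, 0.75, 0.74, 0.73, 0.72]  # illustrative accuracies
stopped_at = None
for epoch, acc in enumerate(history):
    if stopper.should_stop(acc):
        stopped_at = epoch
        break
print(stopped_at)  # stops once two evaluations fail to beat 0.75
```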

GPU Considerations

  • ResNet models: Can train on CPU for small datasets, GPU recommended
  • ViT models: GPU strongly recommended due to transformer architecture
  • Batch size: Reduce if you encounter out-of-memory errors
  • Mixed precision: Enable for faster training on modern GPUs

Dataset Size Guidelines

Small Dataset (<1,000 images)

  • Use ResNet-18 or MobileNetV3-Small
  • Lower learning rate (1e-5)
  • More epochs (10-20)
  • Heavy data augmentation

Medium Dataset (1,000-10,000 images)

  • ResNet-50 or EfficientNet-B0 recommended
  • Standard learning rate (5e-5)
  • Moderate epochs (5-10)
  • Standard augmentation

Large Dataset (>10,000 images)

  • Any model works well
  • ViT models particularly effective
  • Can use higher learning rates
  • Less aggressive augmentation needed
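The guidelines above collapse into a simple lookup. The thresholds and values below mirror the text and are starting points, not rules; epoch counts sit inside the suggested ranges:

```python
def suggest_config(n_images):
    """Starting hyperparameters by dataset size, per the guidelines above."""
    if n_images < 1_000:
        return {"model": "ResNet-18", "learning_rate": 1e-5, "epochs": 15}
    if n_images <= 10_000:
        return {"model": "ResNet-50", "learning_rate": 5e-5, "epochs": 8}
    return {"model": "ViT Base", "learning_rate": 1e-4, "epochs": 5}

print(suggest_config(500)["model"])     # ResNet-18
print(suggest_config(50_000)["model"])  # ViT Base
```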

Common Pitfalls

Out of Memory Errors

Solution: Reduce batch size, use a smaller model, or enable gradient accumulation

Model Not Learning (Loss Not Decreasing)

Solution: Increase learning rate, check data preprocessing, ensure labels are correct

Overfitting (Training Accuracy High, Validation Low)

Solution: Add data augmentation, reduce model size, collect more training data, add regularization

Poor Accuracy on Certain Classes

Solution: Add more training examples for those classes, check for label errors, adjust class weights

Training Too Slow

Solution: Use a smaller model, increase batch size, use GPU, reduce image resolution

Predictions All the Same Class

Solution: Check class balance, reduce learning rate, verify data loading is working correctly
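Several of these pitfalls trace back to class imbalance, which is easy to check by counting files per class folder. A sketch that builds a deliberately imbalanced toy dataset (class names and the 3x threshold are illustrative, not standards):

```python
from collections import Counter
from pathlib import Path
import tempfile

# Build a deliberately imbalanced toy dataset in a temp directory.
root = Path(tempfile.mkdtemp()) / "train_images"
for cls, n in [("cat", 90), ("dog", 10)]:
    (root / cls).mkdir(parents=True)
    for i in range(n):
        (root / cls / f"image{i}.jpg").touch()

counts = Counter({d.name: sum(1 for _ in d.iterdir()) for d in root.iterdir()})
largest, smallest = max(counts.values()), min(counts.values())
print(counts)
if largest > 3 * smallest:  # heuristic imbalance threshold
    print("warning: imbalanced classes - consider more data or class weights")
```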

