Image Classification

Assigning category labels to images based on their visual content

Image classification is the task of assigning one or more labels to an entire image from a predefined set of categories. It answers the question "What is in this image?" and is one of the most fundamental tasks in computer vision.

📚 Training Image Classification Models

Looking to train image classification models? Check out our comprehensive Image Classification Training Guide with detailed parameter documentation for all available models.

What is Image Classification?

Image classification takes an image as input and outputs one or more class labels with associated confidence scores. For example:

  • A medical imaging system classifying X-rays as "normal" or "abnormal"
  • A photo app categorizing images into "landscape," "portrait," "food," etc.
  • Quality control systems identifying defective products on assembly lines

The task can be:

  • Single-label: Each image belongs to exactly one category (e.g., dog breed classification)
  • Multi-label: Each image can belong to multiple categories (e.g., tagging images with "outdoor," "daytime," "people")

Key Concepts

Classes and Labels

Classes are the predefined categories your model learns to recognize. The number and choice of classes depends on your specific application:

  • Binary classification: 2 classes (e.g., cat vs. dog)
  • Multiclass classification: 3+ mutually exclusive classes (e.g., 1000 ImageNet categories)
  • Multilabel classification: Multiple non-exclusive labels per image

Confidence Scores

Models output probability distributions over classes, indicating confidence in each prediction:

  • Values range from 0 to 1
  • Sum to 1.0 for single-label tasks
  • Enable threshold-based decision making
  • Useful for uncertainty estimation
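The properties above can be sketched with a softmax over raw model outputs; the logits and class names below are purely illustrative:

```python
import numpy as np

def softmax(logits):
    """Convert raw model outputs (logits) into a probability distribution."""
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exp / exp.sum()

# Hypothetical logits for classes ["cat", "dog", "bird"]
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)           # each value in [0, 1]
print(probs.sum())     # sums to 1.0 for single-label tasks
print(probs.argmax())  # index of the highest-confidence class
```

Thresholding `probs.max()` against a cutoff (say 0.8) is one way to implement the threshold-based decision making mentioned above.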

Features and Representations

Deep learning models learn hierarchical features:

  • Low-level features: Edges, colors, textures (early layers)
  • Mid-level features: Shapes, parts, patterns (middle layers)
  • High-level features: Object parts, semantic concepts (deep layers)

Approaches and Architectures

Convolutional Neural Networks (CNNs)

CNNs are the foundation of modern image classification, using convolutional layers to learn spatial hierarchies of features:

Classic architectures:

  • AlexNet (2012): Breakthrough deep CNN that won ILSVRC 2012, 8 layers
  • VGG (2014): Deeper networks (16-19 layers) with small 3×3 filters
  • ResNet (2015): Residual connections enabling 50-200+ layer networks
  • Inception/GoogLeNet: Multi-scale feature extraction with parallel pathways

Modern efficient architectures:

  • EfficientNet: Compound scaling of depth, width, and resolution
  • MobileNet: Lightweight models for mobile and edge devices
  • SqueezeNet: Aggressive compression while maintaining accuracy

Vision Transformers (ViT)

Transformers adapted for vision by treating images as sequences of patches:

  • Split images into fixed-size patches (e.g., 16×16 pixels)
  • Apply self-attention to model relationships between patches
  • Often require more data than CNNs but can achieve superior performance
  • Variants: DeiT, Swin Transformer, BEiT
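The first step above, splitting an image into fixed-size patches, can be sketched with plain NumPy reshapes (the 224×224 input size is the common ViT default, used here for illustration):

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an (H, W, C) image into a sequence of flattened fixed-size
    patches, as in the input stage of a Vision Transformer."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # group by patch-grid position
    return patches.reshape(-1, patch_size * patch_size * c)

seq = patchify(np.zeros((224, 224, 3)))
print(seq.shape)  # (196, 768): a 14x14 grid of patches, each 16*16*3 values
```

Each row of the result is then linearly projected to an embedding before self-attention is applied.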

Transfer Learning vs. Training from Scratch

Transfer Learning (recommended for most cases):

  • Start with weights pretrained on large datasets (ImageNet, JFT-300M)
  • Fine-tune on your specific task with less data
  • Faster training and better performance with limited data
  • Lower computational requirements

Training from Scratch:

  • Initialize weights randomly
  • Requires large datasets (typically 100K+ images)
  • More computational resources needed
  • Useful when target domain differs significantly from pretraining data

Evaluation Metrics

Accuracy

The most straightforward metric is the fraction of correct predictions:

\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}

While intuitive, accuracy can be misleading with imbalanced datasets. A model that always predicts the majority class might achieve high accuracy but be useless.
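The majority-class failure mode is easy to demonstrate with made-up numbers; here a 95:5 imbalance lets a useless classifier score 95% accuracy:

```python
import numpy as np

# 95 "normal" (0) vs. 5 "abnormal" (1) samples: always predicting the
# majority class reaches high accuracy but never detects an abnormal case.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # always predict the majority class

accuracy = (y_true == y_pred).mean()
recall_abnormal = (y_pred[y_true == 1] == 1).mean()
print(accuracy)          # 0.95
print(recall_abnormal)   # 0.0
```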

Precision, Recall, and F1-Score

These metrics provide deeper insights, especially for imbalanced datasets:

Precision: Of all images predicted as class C, what fraction truly belongs to class C?

\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}

Recall: Of all images that truly belong to class C, what fraction did we identify?

\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}

F1-Score: Harmonic mean balancing precision and recall:

\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
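All three metrics follow directly from the per-class error counts; a small sketch with illustrative numbers:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 for one class from its
    true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 80 true positives, 20 false positives, 40 false negatives
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(p)   # 0.8
print(r)   # ~0.667
print(f1)  # ~0.727
```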

Top-K Accuracy

For tasks with many classes, top-K accuracy considers a prediction correct if the true label is among the K highest-confidence predictions:

\text{Top-K Accuracy} = \frac{\text{Samples with correct label in top K predictions}}{\text{Total samples}}

Top-5 accuracy is commonly reported for ImageNet (1000 classes).
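A minimal NumPy sketch of the definition, using a tiny made-up batch of two samples over three classes:

```python
import numpy as np

def top_k_accuracy(probs, labels, k=5):
    """Fraction of samples whose true label appears among the k
    highest-confidence predictions."""
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest scores
    hits = [label in row for row, label in zip(top_k, labels)]
    return np.mean(hits)

probs = np.array([[0.1, 0.6, 0.3],    # true label 2 only makes the top 2
                  [0.7, 0.2, 0.1]])   # true label 1 only makes the top 2
labels = np.array([2, 1])
print(top_k_accuracy(probs, labels, k=2))  # 1.0
print(top_k_accuracy(probs, labels, k=1))  # 0.0 (top-1 picks classes 1 and 0)
```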

Confusion Matrix

A matrix showing actual vs. predicted classes for all samples:

  • Diagonal elements show correct predictions
  • Off-diagonal elements reveal common misclassifications
  • Helps identify which classes are confused with each other
  • Essential for understanding model behavior
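Building the matrix is a single pass over the predictions; the six samples below are invented for illustration:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)
# Diagonal entries are correct predictions; the off-diagonal entries here
# reveal one 0->1 and one 2->0 misclassification.
```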

Area Under ROC Curve (AUC-ROC)

For binary and probabilistic classification:

  • Plots True Positive Rate vs. False Positive Rate at various thresholds
  • AUC = 1.0: Perfect classifier
  • AUC = 0.5: Random classifier
  • Threshold-independent performance measure
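An equivalent, threshold-free way to compute AUC is as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one; a brute-force sketch over hypothetical scores:

```python
import numpy as np

def auc_roc(scores, labels):
    """AUC via the rank interpretation: the probability that a random
    positive outscores a random negative (ties count as 0.5)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.4, 0.3, 0.2])
labels = np.array([1, 1, 0, 1, 0])
print(auc_roc(scores, labels))  # 5/6: one positive-negative pair is misordered
```

The pairwise loop is O(P·N); production implementations use a sort-based formulation instead, but the result is the same.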

Data Requirements and Preparation

Dataset Size

Required data varies significantly:

  • Transfer learning: 100s to 1000s of images per class (minimum)
  • Training from scratch: 10,000s to millions of images
  • Fine-tuning: Even 10-50 images per class can work with aggressive data augmentation

Data Quality

Quality matters more than quantity:

  • Clear labels: Accurate, consistent annotations
  • Diverse examples: Various angles, lighting, backgrounds
  • Representative distribution: Training data should match deployment conditions
  • Balanced classes: Roughly equal samples per class (or use weighted losses)

Data Augmentation

Artificially expand datasets by applying transformations:

  • Geometric: Rotation, flipping, cropping, scaling
  • Color: Brightness, contrast, saturation adjustments
  • Advanced: Cutout, mixup, CutMix, AutoAugment
  • Helps prevent overfitting and improves generalization
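The basic geometric and color transforms can be sketched with plain array operations; the flip probability, brightness range, and 8-pixel crop margin below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Apply simple random augmentations to an (H, W, C) float image in [0, 1]."""
    if rng.random() < 0.5:               # geometric: random horizontal flip
        image = image[:, ::-1, :]
    factor = rng.uniform(0.8, 1.2)       # color: random brightness adjustment
    image = np.clip(image * factor, 0.0, 1.0)
    h, w, _ = image.shape                # geometric: random 8-pixel crop
    top, left = rng.integers(0, 9), rng.integers(0, 9)
    return image[top:top + h - 8, left:left + w - 8, :]

out = augment(np.full((64, 64, 3), 0.5))
print(out.shape)  # (56, 56, 3)
```

Libraries such as torchvision or Albumentations provide these transforms (and the advanced ones like mixup and CutMix) ready-made.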

Train/Validation/Test Split

Standard practice:

  • Training set (70-80%): Model learns from this data
  • Validation set (10-15%): Tune hyperparameters and monitor overfitting
  • Test set (10-15%): Final evaluation on unseen data
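A shuffled index split implementing the 70/15/15 scheme above (the fractions and seed are configurable):

```python
import numpy as np

def train_val_test_split(n_samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle indices once, then carve out test and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

train, val, test = train_val_test_split(1000)
print(len(train), len(val), len(test))  # 700 150 150
```

For imbalanced datasets, prefer a stratified split so each subset preserves the class proportions.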

Common Challenges

Class Imbalance

When some classes have far more examples than others:

  • Solutions: Class weighting, oversampling minority classes, focal loss
  • Metrics: Use precision, recall, F1 instead of raw accuracy
  • Evaluation: Report per-class metrics, not just overall accuracy
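Class weighting is often done with inverse-frequency weights, which are then passed to a weighted loss; a small sketch using the common balanced-weight formula:

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency weights: rare classes get larger weights, so their
    errors contribute more to a weighted loss."""
    counts = np.bincount(labels, minlength=n_classes)
    return len(labels) / (n_classes * counts)

labels = np.array([0] * 90 + [1] * 10)  # 9:1 imbalance
w = class_weights(labels, n_classes=2)
print(w)  # the minority class weight is 9x the majority class weight
```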

Overfitting

Model memorizes training data rather than learning generalizable patterns:

  • Symptoms: High training accuracy, poor validation accuracy
  • Solutions: More data, data augmentation, regularization (dropout, weight decay), early stopping
  • Prevention: Monitor validation metrics during training

Dataset Size Limitations

Insufficient training data leads to poor generalization:

  • Solutions: Transfer learning, data augmentation, synthetic data generation
  • Alternative: Few-shot learning approaches
  • Consider: Whether your task really requires custom training

Domain Shift

Performance drops when deployment conditions differ from training:

  • Example: Model trained on professional photos fails on smartphone images
  • Solutions: Include diverse training data, domain adaptation techniques
  • Testing: Evaluate on data representative of real-world conditions

Computational Constraints

Training large models requires significant resources:

  • Solutions: Use smaller architectures (MobileNet, EfficientNet), knowledge distillation
  • Cloud options: Cloud GPU services for training
  • Edge deployment: Model quantization and pruning for inference

Fine-Grained Classification

Distinguishing between very similar classes (e.g., dog breeds, bird species):

  • Challenges: Subtle visual differences, high inter-class similarity
  • Solutions: Higher resolution images, attention mechanisms, part-based models
  • Data: Requires expert annotations and more examples

Practical Applications

Medical Imaging

  • Disease detection from X-rays, CT scans, MRIs
  • Skin lesion classification for melanoma detection
  • Diabetic retinopathy screening
  • Cell classification in pathology

Autonomous Vehicles

  • Traffic sign recognition
  • Road scene classification
  • Weather condition detection
  • Lane type identification

E-commerce and Retail

  • Product categorization
  • Visual search
  • Quality control and defect detection
  • Inventory management

Agriculture

  • Crop disease identification
  • Plant species recognition
  • Pest detection
  • Ripeness assessment

Content Moderation

  • NSFW content detection
  • Spam image identification
  • Trademark violation detection

Wildlife Conservation

  • Animal species identification from camera traps
  • Endangered species monitoring
  • Biodiversity assessment

Choosing an Approach

Consider these factors:

For limited data (< 1000 images per class):

  • Use transfer learning with a pretrained model
  • Apply aggressive data augmentation
  • Consider few-shot learning methods

For real-time inference:

  • Choose efficient architectures (MobileNet, EfficientNet-B0)
  • Consider model quantization
  • Profile inference speed on target hardware

For highest accuracy:

  • Use state-of-the-art architectures (EfficientNet, Vision Transformers)
  • Ensemble multiple models
  • Accept longer training and inference times

For interpretability:

  • Simpler models may be more interpretable
  • Use attention visualization techniques
  • Consider gradient-based explanation methods (Grad-CAM)

Next Steps

Ready to train your own image classification models? Our Image Classification Training Guide provides comprehensive documentation on:

  • Available architectures and their trade-offs
  • Hyperparameter tuning
  • Training strategies and best practices
  • Model evaluation and deployment

For understanding the broader context of computer vision tasks, see our Computer Vision overview.

