
Object Detection

Train models to locate and classify multiple objects within images

Object detection combines classification and localization to identify where objects are in an image and what they are. Unlike image classification, which assigns a single label to an entire image, object detection outputs a bounding box and class label for each detected object. This task is fundamental to applications such as autonomous driving, surveillance, robotics, and visual inspection.

Learn About Object Detection

New to object detection? Visit our Object Detection Concepts Guide to learn about bounding boxes, IoU metrics, anchor-free vs anchor-based methods, and annotation formats like COCO.

Available Models

DETR (Detection Transformer) Family

DETR revolutionized object detection by eliminating hand-crafted components like anchor boxes and non-maximum suppression through a transformer-based approach.

Advanced DETR Variants

Improvements on the DETR architecture addressing convergence speed and accuracy.

  • Deformable DETR - Deformable attention for faster convergence and better small object detection
  • Conditional DETR - Conditional spatial queries for faster training

YOLO Family

You Only Look Once (YOLO) models prioritize real-time detection speed while maintaining competitive accuracy.

  • YOLOv8-Nano - Ultra-fast and lightweight for edge devices and real-time applications

Common Configuration

Data Requirements

Training Images: Directory containing your object images

Annotations: JSON file in COCO format containing:

  • Image information (filename, dimensions)
  • Bounding boxes (x, y, width, height)
  • Object categories/classes
  • Instance IDs

COCO Annotation Format Example:

{
  "images": [
    {"id": 1, "file_name": "image1.jpg", "height": 480, "width": 640}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1,
     "bbox": [100, 150, 200, 180], "area": 36000}
  ],
  "categories": [
    {"id": 1, "name": "car"},
    {"id": 2, "name": "person"}
  ]
}
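Before training, it is worth sanity-checking that every annotation references a known image and category and that no box is degenerate. A minimal sketch using only the standard library (the `validate_coco` function and its checks are illustrative, not part of any specific API):

```python
import json

def validate_coco(coco: dict) -> list[str]:
    """Return a list of problems found in a COCO-style annotation dict."""
    problems = []
    image_ids = {img["id"] for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        w, h = ann["bbox"][2], ann["bbox"][3]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: degenerate box {ann['bbox']}")
    return problems

# The example annotation file from above, as a string:
ANNOTATIONS = """
{
  "images": [{"id": 1, "file_name": "image1.jpg", "height": 480, "width": 640}],
  "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                   "bbox": [100, 150, 200, 180], "area": 36000}],
  "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}]
}
"""

coco = json.loads(ANNOTATIONS)  # in practice: json.load(open("annotations.json"))
print(validate_coco(coco))      # []
```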

Key Training Parameters

Batch Size: Number of images processed together

  • DETR models: 2-8 (transformer overhead)
  • YOLO models: 8-32 (more efficient architecture)
  • Reduce if out-of-memory errors occur

Epochs: Complete passes through training data

  • 1-5 epochs typical for fine-tuning
  • More epochs for training from scratch or small datasets
  • Object detection generally needs fewer epochs than classification

Learning Rate: Optimizer step size

  • 5e-5 typical for DETR models
  • Higher rates possible for YOLO (1e-3 to 1e-4)
  • Lower rates for small datasets or when fine-tuning

Eval Steps: Evaluation frequency during training
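Taken together, a fine-tuning configuration might look like the sketch below. The key names are hypothetical and not tied to any specific training API; the values follow the guidance above.

```python
# Illustrative fine-tuning configurations; key names are hypothetical.
detr_config = {
    "batch_size": 4,        # DETR range 2-8; reduce on out-of-memory errors
    "epochs": 3,            # 1-5 typical for fine-tuning
    "learning_rate": 5e-5,  # typical for DETR models
    "eval_steps": 500,      # run validation every 500 training steps
}

yolo_config = {
    "batch_size": 16,       # YOLO tolerates larger batches (8-32)
    "epochs": 3,
    "learning_rate": 1e-4,  # YOLO tolerates higher rates (1e-3 to 1e-4)
    "eval_steps": 500,
}
```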

Understanding Metrics

mAP (mean Average Precision): Primary metric for object detection

  • mAP@0.5: Average Precision at IoU threshold 0.5 (lenient)
  • mAP@0.5:0.95: Average over IoU thresholds 0.5 to 0.95 (strict, COCO standard)
  • Higher is better, ranges from 0 to 1 (or 0% to 100%)

IoU (Intersection over Union): Overlap between predicted and ground truth boxes

  • IoU > 0.5: Generally considered a correct detection
  • IoU > 0.75: High-quality detection
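IoU can be computed directly from two COCO-style `[x, y, width, height]` boxes; a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes in COCO [x, y, width, height] format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle
    ix = max(ax, bx)
    iy = max(ay, by)
    iw = min(ax + aw, bx + bw) - ix
    ih = min(ay + ah, by + bh) - iy
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union

# Two 100x100 boxes offset by 50px: overlap 2500, union 17500
print(iou([0, 0, 100, 100], [50, 50, 100, 100]))  # 0.14285714285714285
```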

Precision: Fraction of detections that are correct

  • High precision: Few false positives

Recall: Fraction of ground truth objects that are detected

  • High recall: Few missed objects
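Precision and recall follow from matching predictions to ground-truth boxes at an IoU threshold. Below is a deliberately simplified greedy-matching sketch; real evaluators such as the COCO toolkit additionally sort predictions by confidence and match per class.

```python
def iou(a, b):
    """IoU for COCO [x, y, w, h] boxes."""
    ix, iy = max(a[0], b[0]), max(a[1], b[1])
    iw = min(a[0] + a[2], b[0] + b[2]) - ix
    ih = min(a[1] + a[3], b[1] + b[3]) - iy
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match each prediction to an unused ground-truth box."""
    unmatched_gt = list(ground_truth)
    tp = 0
    for pred in predictions:
        best = max(unmatched_gt, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched_gt.remove(best)  # each ground-truth box matches at most once
            tp += 1
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

preds = [[0, 0, 100, 100], [300, 300, 50, 50]]  # second box is a false positive
gts = [[5, 5, 100, 100], [200, 0, 80, 80]]      # second object is missed
print(precision_recall(preds, gts))  # (0.5, 0.5): 1 TP, 1 FP, 1 FN
```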

Loss Components:

  • Classification loss: How well classes are predicted
  • Bounding box regression loss: How accurately boxes are localized
  • Should both decrease during training

Choosing the Right Model

By Priority

Maximum Accuracy

  1. DETR ResNet-101 DC5 (best overall)
  2. Deformable DETR (great for small objects)
  3. DETR ResNet-101

Fastest Training

  1. YOLOv8-Nano (quickest to converge)
  2. Conditional DETR (improved DETR convergence)
  3. DETR ResNet-50

Fastest Inference

  1. YOLOv8-Nano (real-time capable)
  2. DETR ResNet-50
  3. Conditional DETR

Best for Small Objects

  1. Deformable DETR (designed for this)
  2. DETR ResNet-50/101 DC5 (dilated convolutions help)
  3. YOLOv8-Nano (with appropriate input size)

Edge Deployment

  1. YOLOv8-Nano (only practical option)
  2. Consider quantization for other models

By Use Case

Autonomous Vehicles

  • Deformable DETR or YOLOv8-Nano
  • Need real-time performance and small object detection
  • Large, well-annotated datasets available

Security/Surveillance

  • DETR ResNet-101 DC5 for maximum accuracy
  • YOLOv8-Nano if real-time processing required
  • Often dealing with small, distant objects

Manufacturing Quality Control

  • DETR ResNet-50 for balanced performance
  • Controlled environment, good lighting
  • Precision important, speed often secondary

Retail Analytics

  • YOLOv8-Nano for real-time people counting
  • Deformable DETR for product detection
  • Need balance of speed and accuracy

Wildlife Monitoring

  • DETR ResNet-101 or Deformable DETR
  • Animals often small in frame
  • Accuracy more important than speed

Best Practices

Data Preparation

  1. Annotation Quality: Accurate bounding boxes are critical

    • Tight boxes around objects (no excessive padding)
    • Consistent annotation guidelines
    • Include partially visible objects if relevant
  2. Dataset Balance:

    • Aim for balanced instances across classes
    • At least 100 instances per class minimum
    • More instances for difficult classes
  3. Image Diversity:

    • Various lighting conditions
    • Different angles and scales
    • Diverse backgrounds
    • Include edge cases
  4. Validation Split:

    • 10-20% of data for validation
    • Ensure validation set represents real-world distribution
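The split should be done per image, not per annotation, so that no image contributes boxes to both sets. A minimal sketch:

```python
import random

def split_images(image_ids, val_fraction=0.2, seed=42):
    """Shuffle image IDs and hold out a fraction for validation."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_val = max(1, int(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]   # (train, val)

train_ids, val_ids = split_images(range(1, 101), val_fraction=0.2)
print(len(train_ids), len(val_ids))  # 80 20
```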

Training Strategy

  1. Start with Default Config: Use default learning rates and batch sizes initially

  2. Monitor Training:

    • Loss should decrease steadily
    • Both classification and localization losses important
    • Check mAP on validation set
  3. Adjust Learning Rate:

    • Reduce if loss oscillates or increases
    • Increase if convergence very slow
    • Consider learning rate scheduling
  4. Augmentation:

    • Less aggressive than classification (preserve spatial information)
    • Common: horizontal flip, brightness/contrast adjustment
    • Avoid: heavy rotation or cropping that cuts objects
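A horizontal flip must update boxes as well as pixels: for a COCO `[x, y, w, h]` box in an image of width W, the flipped x becomes W - x - w. A minimal sketch of the box arithmetic (libraries such as Albumentations handle this together with the pixel transform):

```python
def hflip_bboxes(bboxes, image_width):
    """Mirror COCO [x, y, w, h] boxes around the vertical image axis."""
    return [[image_width - x - w, y, w, h] for x, y, w, h in bboxes]

# A 200-wide box at x=100 in a 640-wide image lands at x = 640 - 100 - 200 = 340.
print(hflip_bboxes([[100, 150, 200, 180]], image_width=640))  # [[340, 150, 200, 180]]
```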

Common Pitfalls

Small Objects Not Detected

  • Use Deformable DETR or models with DC5
  • Increase input resolution if possible
  • Ensure small objects well-annotated in training data

Many False Positives

  • Raise the confidence threshold at inference
  • Train longer for better classification
  • Check if similar-looking objects confuse model
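The confidence threshold directly trades precision against recall: a higher threshold keeps fewer, more confident detections (fewer false positives), while a lower one keeps more (higher recall). Thresholding is a simple post-processing step; a sketch, assuming detections carry a `score` field:

```python
def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence score meets the threshold."""
    return [d for d in detections if d["score"] >= threshold]

detections = [
    {"bbox": [10, 10, 50, 50], "score": 0.92, "label": "car"},
    {"bbox": [200, 40, 30, 60], "score": 0.31, "label": "person"},
]
print(len(filter_detections(detections, threshold=0.5)))  # 1
```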

Poor Localization (Low IoU)

  • Focus on bounding box loss during training
  • Verify annotation quality and consistency
  • May need more training data

Slow Convergence

  • DETR models converge slower than YOLO
  • Consider Conditional DETR or Deformable DETR
  • Increase learning rate cautiously

Class Imbalance Issues

  • Ensure adequate examples of rare classes
  • Consider weighted sampling or loss reweighting
  • May need to collect more data for rare classes
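Loss reweighting commonly uses inverse-frequency class weights; a minimal sketch (the normalization choice varies between implementations):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its instance count, normalized to mean 1."""
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {cls: w / mean for cls, w in raw.items()}

labels = ["car"] * 900 + ["person"] * 90 + ["bicycle"] * 10
weights = inverse_frequency_weights(labels)
# Rare classes get proportionally larger weights:
print({cls: round(w, 2) for cls, w in sorted(weights.items())})
# {'bicycle': 2.67, 'car': 0.03, 'person': 0.3}
```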

GPU Requirements

Memory Guidelines

DETR Models:

  • 8GB minimum (batch_size=2)
  • 12-16GB recommended (batch_size=4-8)
  • Transformers memory-intensive

YOLO Models:

  • 4-8GB sufficient
  • More efficient architecture
  • Can use larger batch sizes

Training Time Estimates

Small Dataset (1,000 images):

  • DETR models: 30-60 minutes per epoch
  • YOLO models: 10-20 minutes per epoch

Medium Dataset (5,000 images):

  • DETR models: 2-5 hours per epoch
  • YOLO models: 30-90 minutes per epoch

Large Dataset (20,000+ images):

  • DETR models: 8+ hours per epoch
  • YOLO models: 2-4 hours per epoch

Times assume modern GPU (RTX 3080/4080 or better)

Dataset Size Guidelines

  • Minimum: 500 annotated images with 50+ instances per class
  • Good: 2,000-5,000 images with 200+ instances per class
  • Excellent: 10,000+ images with 1,000+ instances per class

Object detection typically requires more data than classification due to the additional complexity of localization.

