
ResNet-101

Deep 101-layer Residual Network for maximum CNN accuracy

ResNet-101 is one of the deepest standard variants of the Residual Network architecture, featuring 101 layers built from bottleneck blocks. With 44.5 million parameters, it sits near the point where deeper CNNs yield diminishing returns in image classification accuracy. Pre-trained on ImageNet-1k, ResNet-101 delivers the highest accuracy among the ResNet variants documented here while remaining more efficient at inference than transformer-based alternatives.

When to Use ResNet-101

ResNet-101 is optimal for:

  • Maximum CNN accuracy when transformers are not suitable
  • Large datasets (5,000+ images) that can leverage the additional capacity
  • Complex or fine-grained classification requiring deep feature hierarchies
  • Production systems where CNN inference speed advantage matters
  • When transformers overfit but you need more capacity than ResNet-50

Choose ResNet-101 when you need the best possible CNN-based accuracy and have sufficient data to train the deeper network.

Strengths

  • Highest ResNet accuracy: Best performance in the ResNet family
  • Deep feature hierarchies: 101 layers capture complex visual patterns
  • Strong transfer learning: Rich pre-trained features generalize well
  • Faster than transformers: 2-3x faster inference than ViT Base
  • Mature architecture: Well-understood with extensive documentation
  • CNN advantages: Translation equivariance and locality beneficial for many tasks

Weaknesses

  • Slower than lighter models: 2x training time of ResNet-50, 4x of ResNet-18
  • Higher memory requirements: Needs 10-12GB GPU for comfortable training
  • Overfitting risk on small data: Too much capacity for datasets <5,000 images
  • Not state-of-the-art: ViT Large outperforms on very large datasets
  • Diminishing returns: Only marginally better than ResNet-50 in many cases

Architecture Overview

Deep Bottleneck Network

ResNet-101 extends ResNet-50 with more residual blocks:

Residual Stages: 4 stages with [3, 4, 23, 3] bottleneck blocks

  • Stage 1: 64 -> 256 filters (3 blocks)
  • Stage 2: 128 -> 512 filters (4 blocks)
  • Stage 3: 256 -> 1024 filters (23 blocks) <- Much deeper
  • Stage 4: 512 -> 2048 filters (3 blocks)

Specifications:

  • Layers: 101
  • Parameters: ~44.5M
  • Input: 224x224 RGB
  • FLOPs: ~7.8 billion
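The ~44.5M parameter figure can be verified analytically from the [3, 4, 23, 3] block configuration above. The following sketch (plain Python, no deep learning framework required) sums the parameters of each bottleneck block, including batch normalization and the projection shortcuts:

```python
def bottleneck_params(in_ch, mid_ch, out_ch, downsample):
    """Parameters of one bottleneck block: 1x1 -> 3x3 -> 1x1 convolutions,
    each followed by BatchNorm (2 parameters per channel)."""
    p = in_ch * mid_ch + 2 * mid_ch        # 1x1 reduce conv + BN
    p += 9 * mid_ch * mid_ch + 2 * mid_ch  # 3x3 conv + BN
    p += mid_ch * out_ch + 2 * out_ch      # 1x1 expand conv + BN
    if downsample:                         # 1x1 projection shortcut + BN
        p += in_ch * out_ch + 2 * out_ch
    return p

def resnet101_params(num_classes=1000):
    total = 3 * 64 * 49 + 2 * 64  # 7x7 stem conv + BN
    in_ch = 64
    # (mid filters, output filters, number of blocks) per stage
    stages = [(64, 256, 3), (128, 512, 4), (256, 1024, 23), (512, 2048, 3)]
    for mid, out, blocks in stages:
        for b in range(blocks):
            # the first block of each stage projects its shortcut
            total += bottleneck_params(in_ch if b == 0 else out, mid, out, b == 0)
        in_ch = out
    total += 2048 * num_classes + num_classes  # final fully connected layer
    return total

print(resnet101_params())  # 44549160, i.e. ~44.5M
```

The count matches the reference torchvision implementation exactly, with Stage 3's 23 blocks contributing well over half of the total.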

Parameters

Training Configuration

Training Images

  • Type: Folder
  • Description: Directory containing training images organized in class subfolders
  • Required: Yes
  • Minimum: 2,000 images (overfitting likely below this)
  • Optimal: 5,000+ images

Batch Size (Default: 4)

  • Range: 2-32
  • Recommendation:
    • 4-8 for 8-12GB GPU
    • 8-16 for 16GB GPU
    • 16-32 for 24GB+ GPU
  • Impact: Constrained by model size

Epochs (Default: 1)

  • Range: 1-20
  • Recommendation:
    • 1-3 epochs for large datasets (>20k images)
    • 3-8 epochs for medium datasets (5k-20k images)
    • 8-15 epochs for small datasets (2k-5k images)
  • Impact: Deeper model takes longer to converge

Learning Rate (Default: 5e-5)

  • Range: 1e-5 to 1e-4
  • Recommendation:
    • 5e-5 for standard fine-tuning
    • 1e-4 for large datasets
    • 2e-5 for datasets near minimum size
  • Impact: Deep network needs careful tuning

Eval Steps (Default: 1)

  • Description: Evaluation frequency
  • Recommendation: 1 for careful monitoring
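The defaults above can be collected into a single configuration sketch. The key names here are illustrative, not the tool's actual schema; adapt them to your training setup:

```python
# Illustrative fine-tuning defaults for ResNet-101, mirroring the
# recommendations above. Key names are hypothetical.
RESNET101_DEFAULTS = {
    "model": "resnet101",
    "batch_size": 4,        # raise to 8-16 on a 16GB GPU, 16-32 on 24GB+
    "epochs": 1,            # 1-3 for >20k images, up to 15 for small datasets
    "learning_rate": 5e-5,  # 1e-4 for large datasets, 2e-5 near minimum size
    "eval_steps": 1,        # evaluate frequently for careful monitoring
}
```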

Configuration Tips

Dataset Size Recommendations

Small Datasets (2,000-5,000 images)

  • Use cautiously - consider ResNet-50 instead
  • Configuration: learning_rate=2e-5, epochs=10-15, batch_size=8
  • Heavy augmentation essential
  • Monitor closely for overfitting

Medium Datasets (5,000-20,000 images)

  • Excellent choice - optimal range
  • Configuration: learning_rate=5e-5, epochs=5-8, batch_size=16
  • Standard augmentation
  • Expect 2-3% improvement over ResNet-50

Large Datasets (20,000-100,000 images)

  • Great choice - ResNet-101 excels here
  • Configuration: learning_rate=1e-4, epochs=3-5, batch_size=16-32
  • Light augmentation
  • Strong performance vs transformers with faster inference

Very Large Datasets (>100,000 images)

  • Good but consider ViT Large for maximum accuracy
  • ResNet-101 still valuable for faster inference
  • May be 1-2% behind transformers
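The four tiers above can be encoded as a small lookup helper. This is a sketch; the thresholds and values are taken directly from the recommendations in this section:

```python
def recommend_config(num_images):
    """Map dataset size to the hyperparameters suggested above.
    Returns (learning_rate, epochs_range, batch_size_range)."""
    if num_images < 2000:
        raise ValueError("below recommended minimum; consider ResNet-50")
    if num_images < 5000:
        return 2e-5, (10, 15), (8, 8)    # small: heavy augmentation, watch overfitting
    if num_images < 20000:
        return 5e-5, (5, 8), (16, 16)    # medium: optimal range for ResNet-101
    if num_images <= 100000:
        return 1e-4, (3, 5), (16, 32)    # large: light augmentation
    return 1e-4, (3, 5), (16, 32)        # very large: also consider ViT Large
```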

Fine-tuning Best Practices

  1. Start Conservative: Use learning_rate=5e-5, epochs=5
  2. Monitor Memory: Deeper network uses more VRAM
  3. Patience: Takes longer to converge than ResNet-50
  4. Check Overfitting: Deep model sensitive to small datasets
  5. Batch Size: Use largest possible for stable training

Hardware Requirements

Minimum Configuration

  • GPU: 10GB VRAM (RTX 3080 or better)
  • RAM: 16GB system memory
  • Storage: 175MB model + dataset

Recommended Configuration

  • GPU: 12-16GB VRAM (RTX 3090/4090 or better)
  • RAM: 32GB system memory
  • Storage: SSD strongly recommended

CPU Training

  • Not recommended - extremely slow
  • Would take days for single epoch
  • GPU required for practical use

Common Issues and Solutions

Overfitting

Problem: Large gap between training and validation accuracy

Solutions:

  1. Reduce to ResNet-50 (common solution)
  2. Collect more training data
  3. Increase data augmentation intensity
  4. Reduce epochs significantly
  5. Lower learning rate to 1e-5

Out of Memory

Problem: CUDA out of memory errors

Solutions:

  1. Reduce batch_size (try 4 or 2)
  2. Lower image resolution if possible
  3. Enable gradient checkpointing
  4. Use mixed precision training
  5. Consider ResNet-50 if memory critical
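If reducing `batch_size` alone is not enough, gradient accumulation keeps the effective batch size while lowering peak memory: run several small forward/backward passes before each optimizer step. A minimal, framework-agnostic sketch of the bookkeeping:

```python
def accumulation_steps(target_batch, micro_batch):
    """Number of forward/backward passes to accumulate before one
    optimizer step, so that micro_batch * steps >= target_batch."""
    if target_batch <= 0 or micro_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return -(-target_batch // micro_batch)  # ceiling division

# e.g. an effective batch of 16 using only 4 images per pass:
print(accumulation_steps(16, 4))  # 4 micro-batches per optimizer step
```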

Slow Convergence

Problem: Model takes many epochs to learn

Solutions:

  1. Increase learning rate to 1e-4
  2. Use larger batch size
  3. Check data loading pipeline
  4. Verify sufficient data for deep network
  5. Consider learning rate warmup
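For the warmup suggestion in step 5, a linear ramp to the base learning rate can be sketched as follows. The 500-step warmup length is an illustrative choice, not a recommendation from the tool:

```python
def lr_with_warmup(step, base_lr=5e-5, warmup_steps=500):
    """Linearly ramp the learning rate from near 0 to base_lr over
    warmup_steps, then hold it constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

Warmup lets the deep network's batch norm statistics and residual branches settle before full-size updates, which often smooths early training.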

Marginal Improvement Over ResNet-50

Problem: ResNet-101 not much better than ResNet-50

Solutions:

  1. Recognize that this is expected for many datasets
  2. Ensure dataset is large/complex enough
  3. Train longer (more epochs)
  4. Try higher learning rate
  5. Consider if ResNet-50 sufficient for your needs

Example Use Cases

Fine-Grained Bird Classification

Scenario: 200 bird species, 150 images per species

Configuration:

Model: ResNet-101
Batch Size: 12
Epochs: 10
Learning Rate: 5e-5
Images: 30,000 total (150 per species)

Why ResNet-101: Fine-grained differences, large dataset, need deep features, CNN locality helpful

Expected Results: 78-84% accuracy on challenging fine-grained task

Industrial Defect Detection

Scenario: 15 defect types with subtle differences

Configuration:

Model: ResNet-101
Batch Size: 8
Epochs: 12
Learning Rate: 3e-5
Images: 10,000 defect images (650+ per type)

Why ResNet-101: Subtle visual differences, sufficient data, production deployment needs reliability

Expected Results: 88-93% accuracy with quality labeled data

Medical Imaging Multi-class

Scenario: 10 disease categories from medical scans

Configuration:

Model: ResNet-101
Batch Size: 8
Epochs: 8
Learning Rate: 5e-5
Images: 15,000 scans (1,500 per disease)

Why ResNet-101: Critical accuracy, complex medical patterns, substantial dataset, deep hierarchical features

Expected Results: 89-94% accuracy

Comparison with Alternatives

ResNet-101 vs ResNet-50

Choose ResNet-101 when:

  • Dataset >5,000 images
  • Maximum CNN accuracy needed
  • Complex or fine-grained classification
  • Have 10GB+ GPU
  • 2x training time acceptable

Choose ResNet-50 when:

  • Dataset <5,000 images
  • Training speed important
  • Good accuracy sufficient
  • Limited GPU memory
  • Standard classification task

ResNet-101 vs ViT Base

Choose ResNet-101 when:

  • Need faster inference (2-3x)
  • Dataset 2,000-10,000 images
  • CNN inductive bias beneficial
  • Lower memory requirements
  • Production latency constraints

Choose ViT Base when:

  • Dataset >10,000 images
  • Maximum accuracy priority
  • Global context important
  • Have 12GB+ GPU
  • Training time not critical

ResNet-101 vs ViT Large

Choose ResNet-101 when:

  • Faster inference critical
  • Dataset <50,000 images
  • GPU memory limited (<16GB)
  • CNN advantages desired
  • Cost-effective solution needed

Choose ViT Large when:

  • Dataset >50,000 images
  • Absolute maximum accuracy
  • Have 16GB+ GPU
  • Can afford slower inference
  • State-of-the-art performance required
