Prototypical Network

Few-shot image classification through learned prototypical representations in embedding space

Prototypical Networks revolutionized few-shot learning by introducing an elegant approach based on metric learning. The core idea is simple yet powerful: learn an embedding space where each class is represented by a prototype (the mean of its support examples), and classify query images based on their distance to these prototypes. This architecture enables the model to recognize entirely novel classes from just a handful of examples, making it ideal for scenarios where traditional supervised learning is impractical due to data scarcity or rapidly evolving taxonomies.

When to Use Prototypical Network

Prototypical Network is ideal for:

  • Few-shot classification requiring recognition of new classes from 1-10 examples
  • Rare or emerging categories where collecting large datasets is impractical
  • Long-tail distributions with many classes having limited examples
  • Dynamic taxonomies where new classes appear frequently
  • Personalized classification where each user defines custom categories
  • Domain adaptation requiring quick adjustment to new visual domains
  • Research and experimentation with few-shot learning techniques
  • Cost-sensitive applications where data labeling is expensive

Prototypical Networks excel when you need flexibility to add new classes without retraining, minimal data requirements, and reasonable accuracy on novel categories.

Strengths

  • Minimal data requirements: Works with 1-10 examples per novel class
  • No retraining needed: Add new classes at inference time
  • Interpretable: Distance-based classification easy to understand
  • Efficient training: Episodic meta-training converges relatively quickly
  • Flexible: Handles variable number of classes and shots
  • Robust: Stable performance across different few-shot scenarios
  • Embedding reusability: Learned embeddings transfer to new domains
  • Scalable: Can handle many novel classes without architecture changes
  • Fast inference: Simple distance computation for classification

Weaknesses

  • Lower accuracy: Not as accurate as fully supervised methods with abundant data
  • Requires diverse base classes: Meta-training needs variety for generalization
  • Support set quality critical: Poor support examples significantly hurt performance
  • Struggles with fine-grained distinctions: Very similar classes challenging
  • Domain shift sensitive: Novel classes very different from base classes problematic
  • Episode design important: Training configuration affects final performance
  • Limited to metric learning: May miss complex decision boundaries
  • Prototype assumption: Assumes each class is well represented by a single centroid; multi-modal classes break this assumption

Architecture Overview

Metric Learning for Few-Shot Classification

Prototypical Networks learn through episodic meta-training:

  1. Embedding Network: CNN backbone maps images to embedding space

    • Typically ResNet, VGG, or similar architecture
    • Learns to cluster similar images, separate dissimilar ones
    • Fixed-dimensional output (e.g., 512-D or 1024-D)
  2. Episode Construction: Meta-training through N-way K-shot episodes

    • Sample N classes (5-way typical)
    • K support examples per class (1-shot, 5-shot, etc.)
    • Q query examples to classify
    • Thousands of episodes during training
  3. Prototype Computation: Per-class representative in embedding space

    • Prototype = mean of support example embeddings
    • One prototype per class in current episode
    • Simple averaging provides class centroid
  4. Distance-Based Classification: Classify by nearest prototype

    • Euclidean distance typically used
    • Query embedding compared to all prototypes
    • Softmax over negative distances gives probabilities
  5. Loss and Optimization: Learn embeddings that minimize classification error

    • Prototypical loss: Cross-entropy on distance-based predictions
    • Backpropagate through entire network
    • Optimize for good few-shot generalization

Key Innovation: By meta-learning to classify in episodes, the model learns to extract features that generalize to novel classes, not just memorize base classes.

Mathematical Formulation:

  • Embedding: f(x) maps image x to d-dimensional space
  • Prototype for class c: p_c = mean(f(x) for x in support set of class c)
  • Distance: d(f(query), p_c) = ||f(query) - p_c||₂
  • Probability: P(y=c|query) = exp(-d(f(query), p_c)) / Σ_c' exp(-d(f(query), p_c')), i.e. a softmax over negative distances
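As a concrete sketch, the prototype and probability computations above can be written with NumPy (array shapes and function names are illustrative, not tied to any particular implementation):

```python
import numpy as np

def compute_prototypes(support_emb, support_labels, n_classes):
    """Prototype p_c = mean of the support embeddings of class c.

    support_emb: (n_support, d) array of embedded support images
    support_labels: (n_support,) integer labels in 0..n_classes-1
    """
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify_queries(query_emb, prototypes):
    """Softmax over negative squared Euclidean distances to prototypes."""
    # pairwise squared distances, shape (n_query, n_classes)
    d = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)
```

During meta-training, the cross-entropy between these probabilities and the query labels is the prototypical loss that is backpropagated through the embedding network.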

Parameters

Training Configuration

Training Images

  • Type: Folder
  • Description: Directory containing base class images organized in class subfolders
  • Required: Yes
  • Minimum: 15-20 classes with 20+ images each for meta-training
  • Format: Standard image formats (PNG, JPG, JPEG)
  • Organization: Each subfolder is a class, containing all images for that class
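A layout matching this description (class and file names purely illustrative):

```
training_images/
├── golden_retriever/
│   ├── img_001.jpg
│   ├── img_002.jpg
│   └── ...
├── siamese_cat/
│   └── ...
└── macaw/
    └── ...
```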

Epochs (Default: 1)

  • Range: 1-10
  • Description: Number of meta-training epochs
  • Recommendation:
    • 1-3 epochs for fine-tuning pre-trained embeddings
    • 3-5 epochs for standard meta-training
    • 5-10 epochs for training from scratch or difficult domains
  • Note: Each epoch contains many episodes (e.g., 1,000-10,000)
  • Impact: More epochs improve generalization but risk overfitting to base classes

Learning Rate (Default: 0.001)

  • Range: 0.0001-0.01
  • Description: Optimizer step size for embedding learning
  • Recommendation:
    • 0.001 standard starting point
    • 0.0001 for fine-tuning existing embeddings
    • 0.01 for training from scratch (use caution)
  • Impact: Metric learning sensitive to learning rate; too high causes instability

Eval Steps (Default: 1)

  • Range: 1-10
  • Description: Evaluation frequency during training (in epochs)
  • Recommendation: 1 for epoch-level evaluation on validation set
  • Impact: More frequent evaluation provides better monitoring but adds overhead

Inference Configuration

Finetuned Checkpoint (Optional)

  • Type: Artifact
  • Description: Path to fine-tuned model checkpoint
  • Required: No (uses base pre-trained model if not provided)
  • Use Case: Load your trained embedding network for inference
  • Format: PyTorch model weights (.pth file)

Input Image (Required)

  • Type: Image file
  • Description: Query image to classify into novel classes
  • Required: Yes
  • Format: PNG, JPG, JPEG
  • Preprocessing: Automatically resized and normalized

Support Set (Provided at Inference)

  • Description: Few examples per novel class (not a config parameter but critical)
  • Structure: Images organized by novel class folders
  • Shots: 1-10 examples per class
  • Classes: Any number of novel classes (N-way)

Model-Specific Parameters

Episode Configuration (Training-time, not exposed as user config)

  • N-way: Number of classes per episode (typically 5)
  • K-shot: Support examples per class (typically 1-5)
  • Query: Query examples per episode (typically 5-15)
  • Note: These are usually fixed in the implementation
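A minimal episode sampler matching this configuration might look like the following (the dict-of-paths dataset format is an assumption for illustration):

```python
import random

def sample_episode(dataset, n_way=5, k_shot=5, n_query=15):
    """Draw one N-way K-shot episode.

    dataset: dict mapping class name -> list of image paths.
    Returns (support, query), each a list of (path, episode_label) pairs,
    where episode_label is 0..n_way-1 and is re-assigned per episode.
    """
    classes = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(dataset[cls], k_shot + n_query)
        support += [(p, label) for p in picks[:k_shot]]
        query += [(p, label) for p in picks[k_shot:]]
    return support, query
```

Sampling without replacement within each class guarantees that no image appears in both the support and query sets of the same episode.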

Embedding Dimension (Architecture-dependent)

  • Typically 512 or 1024 dimensions
  • Larger dimensions more expressive but higher memory
  • Determined by embedding network architecture

Distance Metric

  • Euclidean distance standard for Prototypical Networks
  • Some variants use cosine similarity
  • On L2-normalized embeddings, squared Euclidean distance equals 2 − 2·(cosine similarity), so the two metrics rank classes identically
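The relationship between the two metrics can be checked numerically; on unit-norm vectors, squared Euclidean distance is 2 − 2·(cosine similarity), so both produce the same nearest-prototype ranking:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a, b = rng.normal(size=(2, 16))
a /= np.linalg.norm(a)  # project both embeddings onto the unit sphere
b /= np.linalg.norm(b)

cosine = a @ b
sq_euclidean = np.sum((a - b) ** 2)

# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 cos  (on the unit sphere)
assert np.isclose(sq_euclidean, 2 - 2 * cosine)
```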

Configuration Tips

By Use Case

Product Recognition in E-Commerce

  • Configuration: epochs=3-5, learning_rate=0.001
  • Base Classes: 50+ diverse product categories
  • Novel Classes: New products as they're added to catalog
  • Shots: 5-10 product images per new category
  • Expected Performance: 75-85% accuracy (5-way 5-shot)

Medical Image Classification

  • Configuration: epochs=5-8, learning_rate=0.0001 (careful tuning)
  • Base Classes: 20-30 common conditions with rich data
  • Novel Classes: Rare diseases or emerging conditions
  • Shots: 3-5 examples (data collection expensive)
  • Expected Performance: 60-75% accuracy (depends on domain)

Wildlife Species Recognition

  • Configuration: epochs=3-5, learning_rate=0.001
  • Base Classes: 100+ common species
  • Novel Classes: Rare or newly discovered species
  • Shots: 5-10 field photos per rare species
  • Expected Performance: 70-80% accuracy (5-way 5-shot)

Personal Photo Organization

  • Configuration: epochs=3, learning_rate=0.001
  • Base Classes: 30-50 general visual categories
  • Novel Classes: User-defined categories (places, people, events)
  • Shots: 3-5 examples per user category
  • Expected Performance: 70-80% accuracy (user-specific)

Fashion Trend Classification

  • Configuration: epochs=3-5, learning_rate=0.001
  • Base Classes: 50+ established fashion categories
  • Novel Classes: Emerging trends and styles
  • Shots: 5-10 examples from fashion shows/social media
  • Expected Performance: 75-85% accuracy (5-way 5-shot)

Scientific Research (Novel Materials, Phenomena)

  • Configuration: epochs=5-10, learning_rate=0.0001
  • Base Classes: Well-characterized examples
  • Novel Classes: New discoveries or experimental conditions
  • Shots: 1-5 examples (experiments expensive)
  • Expected Performance: 55-70% accuracy (highly specialized)

Dataset Size Recommendations

Small Base Dataset (15-20 classes, 20-50 images/class)

  • Viability: Minimal viable meta-training
  • Configuration: epochs=5-8, learning_rate=0.001
  • Tips: Use pre-trained embeddings if possible, careful validation
  • Expected Results: 55-65% accuracy on novel classes (5-way 5-shot)
  • Limitations: Poor generalization if novel classes very different

Medium Base Dataset (30-50 classes, 50-100 images/class)

  • Viability: Good meta-training foundation
  • Configuration: epochs=3-5, learning_rate=0.001
  • Tips: Ensure base class diversity, monitor validation performance
  • Expected Results: 65-75% accuracy on novel classes (5-way 5-shot)
  • Sweet spot: Balanced effort and performance

Large Base Dataset (50-100+ classes, 100+ images/class)

  • Viability: Excellent meta-training, strong generalization
  • Configuration: epochs=3-5, learning_rate=0.001
  • Tips: Can handle challenging few-shot scenarios (1-shot)
  • Expected Results: 70-85% accuracy on novel classes (5-way 5-shot)
  • Optimal: Best few-shot learning performance

Very Large Base Dataset (100+ classes, 500+ images/class)

  • Viability: Maximum meta-learning potential
  • Configuration: epochs=2-3, learning_rate=0.001
  • Tips: Excellent transfer to diverse novel classes
  • Expected Results: 75-90% accuracy on novel classes (5-way 5-shot)
  • Research-grade: State-of-the-art few-shot performance

Fine-tuning Best Practices

  1. Use Pre-trained Embeddings: If available from similar domain

    • ImageNet pre-trained weights common starting point
    • Domain-specific pre-training even better
    • Dramatically improves few-shot performance
    • Reduces meta-training time
  2. Validate on Novel Classes: Critical for honest evaluation

    • Hold out 20-30% of base classes as "novel" for validation
    • Never let validation classes leak into meta-training
    • Simulates true few-shot scenario
    • Prevents overfitting to base classes
  3. Balance Episode Difficulty: Match training to deployment

    • If deploying 5-way 1-shot, meta-train on 5-way 1-shot episodes
    • Can train on harder scenarios (more ways, fewer shots) for robustness
    • Consider curriculum learning (easy → hard episodes)
  4. Monitor Embedding Quality: Visualize learned representations

    • t-SNE or UMAP of embeddings
    • Check if classes cluster well
    • Verify separation between classes
    • Debug poor performance
  5. Augmentation Strategy: Moderate augmentation recommended

    • Random crop, flip, rotation (±15°)
    • Color jitter, brightness adjustment
    • Avoid changing semantic content
    • Helps generalization to novel classes
  6. Learning Rate Scheduling: Consider decay over time

    • Start with 0.001, reduce to 0.0001 in later epochs
    • Cosine annealing or step decay
    • Fine-tunes embeddings without disrupting learned structure
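As an illustration, the suggested decay from 0.001 toward 0.0001 can be expressed as a cosine-annealed schedule (PyTorch's `CosineAnnealingLR` implements the same curve; the standalone function below just makes the shape explicit):

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=1e-3, lr_min=1e-4):
    """Cosine decay from lr_max at step 0 to lr_min at the last step."""
    t = step / max(1, total_steps - 1)  # progress through training, 0..1
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

The curve decays slowly at first, preserving the learned embedding structure, then flattens out near lr_min for the final fine-tuning phase.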

Hardware Requirements

Minimum Configuration

  • GPU: 4GB VRAM (GTX 1650 or better)
  • RAM: 8GB system memory
  • Storage: ~50MB model + dataset
  • Suitable for: Small-scale experimentation

Recommended Configuration

  • GPU: 6-8GB VRAM (RTX 2060/3060)
  • RAM: 16GB system memory
  • Storage: ~100MB model + dataset
  • Suitable for: Standard few-shot learning projects

High-Performance Configuration

  • GPU: 8-12GB VRAM (RTX 3070/4070)
  • RAM: 32GB system memory
  • Storage: ~200MB model + dataset
  • Suitable for: Large-scale meta-training, research

CPU Training

  • Viable for small datasets but slow
  • 10-20x slower than GPU
  • Recommended only if GPU unavailable
  • Meta-training less demanding than standard training

Common Issues and Solutions

Poor Generalization to Novel Classes

Problem: High accuracy on base classes during meta-training, but poor on novel classes

Solutions:

  1. Root cause: Overfitting to base classes instead of learning transferable features
  2. Ensure validation classes truly novel (held-out from training)
  3. Increase base class diversity
  4. Reduce number of epochs (try 3 instead of 10)
  5. Add regularization (dropout, weight decay)
  6. Use more diverse data augmentation
  7. Check if novel classes too different from base classes

High Variability in Performance

Problem: Accuracy varies significantly between different novel class sets

Solutions:

  1. Root cause: Inconsistent episode difficulty or support set quality
  2. Ensure sufficient base class diversity during meta-training
  3. Train longer (more epochs or episodes per epoch)
  4. Use larger K (more shots) at inference for stability
  5. Average predictions over multiple support set samples
  6. Check if some novel classes inherently harder (fine-grained)
  7. Validate on multiple novel class sets to assess robustness

Confusion Between Similar Novel Classes

Problem: Model struggles to distinguish visually similar novel classes

Solutions:

  1. Root cause: Embedding space not discriminative enough
  2. Include more fine-grained base classes during meta-training
  3. Increase embedding dimensionality if possible
  4. Use more shots (5-10 instead of 1-3) for better prototypes
  5. Carefully curate support set (choose maximally discriminative examples)
  6. Consider metric learning losses beyond standard prototypical loss
  7. May need supervised fine-tuning for very fine-grained tasks

Slow Convergence or Training Instability

Problem: Loss not decreasing or oscillating wildly

Solutions:

  1. Root cause: Learning rate too high or training dynamics unstable
  2. Reduce learning rate (try 0.0001 instead of 0.001)
  3. Use learning rate warmup (gradually increase from 0)
  4. Check for data loading issues or corrupted images
  5. Normalize images consistently (mean/std from ImageNet if pre-trained)
  6. Reduce batch size or number of ways if memory issues
  7. Verify episode construction correct (balanced sampling)

Poor 1-Shot Performance

Problem: Acceptable 5-shot accuracy but terrible 1-shot accuracy

Solutions:

  1. Root cause: Prototypes based on single example unreliable
  2. Meta-train explicitly on 1-shot episodes (not just 5-shot)
  3. This is inherently hardest few-shot scenario
  4. Use more robust embedding (larger network, more training)
  5. At inference, choose most representative support example
  6. Consider multiple prototypes per class (sub-class structure)
  7. Accept that 1-shot has lower ceiling than 5-shot

Domain Shift to Novel Classes

Problem: Novel classes from different visual domain than base classes

Solutions:

  1. Root cause: Distribution mismatch between meta-training and deployment
  2. Include base classes covering diverse visual domains
  3. Pre-train on large-scale diverse dataset (e.g., ImageNet)
  4. Meta-fine-tune on few examples from target domain
  5. Use domain adaptation techniques
  6. Collect base classes more similar to expected novel classes
  7. Consider multi-modal approaches (vision + text)

Out of Memory Errors

Problem: CUDA out of memory during meta-training

Solutions:

  1. Reduce number of ways (N) in episodes
  2. Reduce number of query examples per episode
  3. Use smaller batch size (number of episodes per batch)
  4. Reduce image resolution
  5. Use smaller embedding network (e.g., ResNet-18 instead of ResNet-50)
  6. Enable gradient checkpointing if available
  7. Close other GPU applications

Example Use Cases

E-Commerce: Recognizing New Product Categories

Scenario: Online marketplace adding new product categories daily; traditional classification requires 1,000+ images per category and retraining

Configuration:

Model: Prototypical Network
Epochs: 5
Learning Rate: 0.001
Base Classes: 80 diverse product categories
Base Images: 100-200 per category
Novel Classes: New products (fashion items, electronics, home goods)
Shots: 5-10 product photos per new category

Implementation:

  1. Meta-train on 80 established product categories
  2. When new category appears, collect 5-10 product images
  3. Use as support set for few-shot inference
  4. Immediately classify user queries into new category
  5. No retraining required

Expected Results:

  • 5-way 5-shot: 80-85% accuracy
  • 10-way 5-shot: 75-80% accuracy
  • Rapid deployment (minutes vs days)

Business Impact:

  • Reduce time-to-market for new products
  • Lower data collection costs (90% fewer labels)
  • Scale to thousands of categories dynamically

Wildlife Conservation: Identifying Rare Species

Scenario: Conservation project monitoring endangered species with limited photographs; traditional methods impractical due to data scarcity

Configuration:

Model: Prototypical Network
Epochs: 5
Learning Rate: 0.001
Base Classes: 100 common species (birds, mammals)
Base Images: 50-100 per species
Novel Classes: Rare or endangered species
Shots: 3-5 camera trap images per rare species

Implementation:

  1. Meta-train on common species with abundant data
  2. For rare species, collect 3-5 field photos
  3. Classify camera trap footage into rare species
  4. Update support set as more images collected

Expected Results:

  • 5-way 3-shot: 70-75% accuracy
  • Sufficient for conservation monitoring
  • Handles challenging field conditions

Impact:

  • Monitor rare species without extensive labeling
  • Rapid adaptation to newly discovered species
  • Cost-effective biodiversity assessment

Medical Imaging: Classifying Rare Conditions

Scenario: Hospital encountering rare diseases with limited case studies; collecting 1,000+ examples per rare condition impossible

Configuration:

Model: Prototypical Network
Epochs: 8
Learning Rate: 0.0001 (careful tuning for medical)
Base Classes: 30 common conditions with extensive data
Base Images: 500-1,000 per condition
Novel Classes: Rare diseases, new presentations
Shots: 3-5 clinical images per rare condition

Implementation:

  1. Meta-train on common conditions (well-characterized)
  2. For rare condition, use 3-5 documented cases
  3. Assist radiologists in identifying similar presentations
  4. Update support set with confirmed cases

Expected Results:

  • 5-way 5-shot: 65-75% accuracy (clinical-grade challenging)
  • Decision support for rare diagnoses
  • Reduces diagnostic time

Considerations:

  • Medical accuracy critical - use as assistant, not replacement
  • Validate thoroughly before clinical deployment
  • Explainability important for physician trust

Comparison with Alternatives

Prototypical Network vs Matching Networks

Choose Prototypical Network when:

  • Simple, interpretable approach preferred
  • Distance-based classification intuitive for users
  • Prototypes as class representatives desired
  • Standard metric learning sufficient
  • Computational efficiency important

Choose Matching Networks when:

  • Attention mechanism over support set desired
  • More complex support set encoding needed
  • Have sufficient computational resources
  • Research exploring attention-based few-shot

Prototypical Network vs MAML (Model-Agnostic Meta-Learning)

Choose Prototypical Network when:

  • Fast inference critical (no gradient steps at inference)
  • Simpler training procedure preferred
  • Metric learning approach suitable
  • Many novel classes expected
  • Limited computational resources

Choose MAML when:

  • Maximum accuracy priority (with sufficient data)
  • Can afford few gradient steps at inference
  • Task diversity very high
  • Have substantial computational resources
  • Research-oriented project

Prototypical Network vs Siamese Networks

Choose Prototypical Network when:

  • Multi-class few-shot classification (N-way K-shot)
  • Prototypes as explicit class representations
  • More than 2 classes at inference
  • Meta-learning approach preferred

Choose Siamese Networks when:

  • Binary verification tasks (same/different)
  • Pairwise similarity primary goal
  • One-shot learning with single reference
  • Simpler architecture desired

Prototypical Network vs Fine-tuning Supervised Model

Choose Prototypical Network when:

  • Few examples per class (<50)
  • Novel classes appear frequently
  • No retraining acceptable
  • Rapid deployment critical
  • Many classes with limited data

Choose Fine-tuning Supervised Model when:

  • Abundant data available (>100 examples/class)
  • Fixed set of classes
  • Maximum accuracy critical (few-shot has lower ceiling)
  • Retraining infrastructure in place
  • Well-established pipelines exist

When NOT to Use Prototypical Network

Consider alternatives if:

  • Abundant labeled data available (>100 examples/class): Use standard supervised learning
  • Maximum accuracy non-negotiable: Supervised methods outperform with sufficient data
  • Fine-grained recognition critical: May need specialized architectures
  • Real-time inference with many classes: Distance computation scales with number of classes
  • 3D or multi-modal data: May need specialized few-shot architectures
  • Very few base classes (<10): Insufficient diversity for meta-learning
  • Zero-shot without examples: Need vision-language models (CLIP, etc.)
