Computer Vision
AI tasks involving images, videos, and spatial understanding
Computer vision tasks take images, videos, or 3D data as input, produce them as output, or both.


Classification Tasks
- Image Classification: Assign labels or categories to images
- Video Classification: Classify actions or scenes in video content
- Zero-Shot Image Classification: Classify images without task-specific training
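One common way to classify images without task-specific training is to embed the image and a text prompt for each candidate label into a shared space, then pick the label whose embedding is most similar to the image's. The sketch below assumes CLIP-style embeddings are already available; the 3-dimensional vectors, the prompts, and the scale factor of 100 are hypothetical stand-ins for real encoder outputs.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(
    image_embedding: Sequence[float],
    label_embeddings: Dict[str, Sequence[float]],
    scale: float = 100.0,  # assumed similarity scale, not a fixed constant
) -> Dict[str, float]:
    """Turn image/label similarities into a probability per candidate label.

    Assumes both embeddings come from encoders trained into a shared space,
    so no training on these particular labels is needed.
    """
    labels = list(label_embeddings)
    sims = [scale * cosine_similarity(image_embedding, label_embeddings[label]) for label in labels]
    return dict(zip(labels, softmax(sims)))


# Hypothetical embeddings standing in for real image/text encoder outputs.
image_vec = [0.9, 0.1, 0.0]
text_vecs = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}
probs = zero_shot_classify(image_vec, text_vecs)
best = max(probs, key=probs.get)
print(best)  # a photo of a cat
```

Because the candidate labels are just text prompts, swapping in a new set of categories requires no retraining, only new text embeddings.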
Detection and Localization
- Object Detection: Detect and localize objects within images
- Zero-Shot Object Detection: Detect and localize object categories not seen during training
- Keypoint Detection: Locate keypoints such as body joints (for pose estimation) or facial landmarks
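Many object detectors emit several overlapping candidate boxes for the same object, and a common post-processing step is greedy non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below assumes boxes in `(x_min, y_min, x_max, y_max)` form and an IoU threshold of 0.5; both are illustrative choices, and not every detector needs this step.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    by at least iou_threshold, and repeat. Returns kept indices, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# Two near-duplicate detections of one object, plus a separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 150)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box does not overlap either and is kept.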
Segmentation
- Image Segmentation: Pixel-level labeling for object or region boundaries
- Mask Generation: Automatically generate segmentation masks for the objects or regions in an image
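A semantic segmentation model often outputs a score for every class at every pixel; a per-pixel label map then follows from taking the highest-scoring class at each pixel, and a binary mask picks out one class. The sketch below assumes scores laid out as `scores[class][row][col]` in plain nested lists; the class indices and score values are hypothetical.

```python
from typing import List

# scores[c][y][x] = score of class c at pixel (y, x); shape C x H x W.
Scores = List[List[List[float]]]


def scores_to_label_map(scores: Scores) -> List[List[int]]:
    """Assign each pixel the index of its highest-scoring class (argmax)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def binary_mask(label_map: List[List[int]], class_id: int) -> List[List[bool]]:
    """Boolean mask that is True where a pixel is labeled class_id."""
    return [[label == class_id for label in row] for row in label_map]


# Hypothetical scores for a 2x3 image: class 0 = background, class 1 = object.
scores = [
    [[0.9, 0.2, 0.1],
     [0.8, 0.3, 0.6]],  # class 0
    [[0.1, 0.8, 0.9],
     [0.2, 0.7, 0.4]],  # class 1
]
label_map = scores_to_label_map(scores)
mask = binary_mask(label_map, class_id=1)
print(label_map)  # [[0, 1, 1], [0, 1, 0]]
print(mask)       # [[False, True, True], [False, True, False]]
```

Instance segmentation and mask generation models typically return a set of such binary masks directly rather than a single label map.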
Generation Tasks
- Text-to-Image: Generate images from text prompts
- Text-to-Video: Generate videos from text descriptions
- Image-to-Image: Transform or restyle an input image, optionally guided by a text prompt or a reference image
- Image-to-Video: Generate videos based on input images
- Video-to-Video: Transform or modify video content
- Unconditional Image Generation: Generate images without any prompt or condition
3D Tasks
- Text-to-3D: Generate 3D models from text descriptions
- Image-to-3D: Reconstruct 3D shapes from images
Other Vision Tasks
- Depth Estimation: Predict a per-pixel depth map from images
- Image-to-Text: Convert images into natural language descriptions
- Image Feature Extraction: Generate embeddings or semantic features from images
- OCR: Extract text from images and documents
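A predicted per-pixel depth map can be lifted into 3D by back-projecting each pixel through a pinhole camera model. The sketch below assumes metric depth and known intrinsics (`fx`, `fy`, `cx`, `cy`); many monocular depth models instead predict relative or inverse depth, which would need rescaling first. The depth values and intrinsics are hypothetical.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def depth_to_points(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point3D]:
    """Back-project a depth map into camera-space 3D points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, where
    (u, v) is the pixel column/row and Z is the depth at that pixel.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Hypothetical 2x2 depth map (meters) with one invalid pixel.
depth = [[2.0, 2.0], [0.0, 4.0]]
pts = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts)  # [(-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (2.0, 2.0, 4.0)]
```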