UMAP
Uniform Manifold Approximation and Projection: a modern manifold-learning technique that preserves both local and global structure.
When to use:
- Need fast dimensionality reduction
- Need to project new, unseen data into an existing embedding
- Need to preserve global structure
- Need embeddings of any dimensionality (not just 2D/3D)
- Often a better choice than t-SNE
Strengths: Much faster than t-SNE, can embed new data after fitting, preserves more global structure, scales to larger datasets, more robust
Weaknesses: More hyperparameters to tune, less mature than PCA/t-SNE
Model Parameters
N Components (default: 2, required) Number of dimensions in the target embedding.
- 2-3: Visualization
- Higher: Feature extraction for downstream tasks
N Neighbors (default: 15) Number of nearest neighbors used to build each point's local neighborhood; controls the balance between local and global structure.
- Small (2-5): Emphasizes very local structure, fine details
- Medium (15-50): Balanced (default)
- Large (50-200): More global structure
- Larger datasets can use larger values
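A minimal sketch of what N Neighbors feeds into, following the construction described in the UMAP paper: for each point, rho is the distance to its nearest neighbor, and sigma is chosen by bisection so that the fuzzy weights exp(-max(0, d - rho) / sigma) over its k neighbors sum to log2(k). Here k counts neighbors other than the point itself, neighbors are found by brute force (umap-learn uses approximate nearest-neighbor descent instead), and the function names are hypothetical. A larger k spreads nonzero weight over more distant points, which is why larger values capture more global structure.

```python
import math


def nearest_neighbors(points, i, k):
    """Brute-force k nearest neighbors of points[i], excluding the point itself.

    Returns (index, distance) pairs sorted by increasing distance.
    """
    pairs = [(j, math.dist(points[i], p)) for j, p in enumerate(points) if j != i]
    pairs.sort(key=lambda pair: pair[1])
    return pairs[:k]


def fuzzy_memberships(dists, tol=1e-6, max_iter=100):
    """Membership weights exp(-max(0, d - rho) / sigma) for sorted neighbor distances.

    rho is the nearest-neighbor distance, so that neighbor always gets weight 1.0.
    sigma is found by bisection so the weights sum to log2(k).
    """
    k = len(dists)
    rho = dists[0]
    target = math.log2(k)

    def total(sigma):
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in dists)

    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        s = total(sigma)
        if abs(s - target) < tol:
            break
        if s > target:   # too much total weight: shrink sigma
            hi = sigma
            sigma = (lo + hi) / 2
        else:            # too little total weight: grow sigma
            lo = sigma
            sigma = sigma * 2 if math.isinf(hi) else (lo + hi) / 2
    return [math.exp(-max(0.0, d - rho) / sigma) for d in dists]


# Points on a line: a tight group near the origin and a distant pair.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
for k in (2, 4):
    dists = [d for _, d in nearest_neighbors(points, 0, k)]
    print(k, [round(w, 3) for w in fuzzy_memberships(dists)])
```

With k=2 the second neighbor's weight is driven toward zero, so the point effectively "sees" only its closest neighbor; with k=4 the weight is shared across the group, including the distant point at x=10.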
Min Distance (default: 0.1) Minimum distance between points in the embedding, i.e. how tightly points may be packed together.
- Small (0.0-0.1): Clumpy embeddings, emphasizes clusters
- Medium (0.1-0.5): Balanced (default)
- Large (0.5-0.99): More spread out, emphasizes overall structure
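In umap-learn, Min Distance is not enforced as a hard limit. A sketch, assuming umap-learn's default spread of 1.0: similarity in the embedding is measured as 1 / (1 + a * d^(2b)), and a and b are chosen so this curve approximates a target that stays at 1 up to min_dist and then decays exponentially. umap-learn fits a and b with scipy's curve_fit; the coarse least-squares grid search below is a stdlib stand-in, and the helper names are hypothetical.

```python
import math


def target_similarity(d, min_dist, spread=1.0):
    """Curve UMAP aims for: flat at 1.0 below min_dist, then exponential decay."""
    return 1.0 if d < min_dist else math.exp(-(d - min_dist) / spread)


def embedding_similarity(d, a, b):
    """Smooth low-dimensional similarity: 1 / (1 + a * d^(2b))."""
    return 1.0 / (1.0 + a * d ** (2 * b))


def fit_ab(min_dist, spread=1.0, n_samples=100):
    """Grid-search (a, b) minimizing squared error against target_similarity on [0, 3 * spread]."""
    xs = [3.0 * spread * i / (n_samples - 1) for i in range(n_samples)]
    ys = [target_similarity(x, min_dist, spread) for x in xs]
    best_err, best_a, best_b = math.inf, None, None
    for ai in range(1, 151):        # a = 0.02 .. 3.00 in steps of 0.02
        a = ai / 50
        for bi in range(25, 101):   # b = 0.50 .. 2.00 in steps of 0.02
            b = bi / 50
            err = sum((embedding_similarity(x, a, b) - y) ** 2 for x, y in zip(xs, ys))
            if err < best_err:
                best_err, best_a, best_b = err, a, b
    return best_a, best_b


fits = {md: fit_ab(md) for md in (0.1, 0.5)}
for md, (a, b) in fits.items():
    sim = embedding_similarity(0.5, a, b)
    print(f"min_dist={md}: a={a:.2f}, b={b:.2f}, similarity at d=0.5: {sim:.2f}")
```

A larger min_dist keeps the fitted similarity close to 1 over a wider range of distances, so the optimizer has less incentive to pull neighbors tightly together, which is what produces the more spread-out embeddings described above.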
Metric (default: "euclidean") Distance metric:
- euclidean: Standard (default)
- manhattan: L1 distance
- cosine: Angle between vectors, ignoring magnitude (good for text)
- correlation: 1 minus the Pearson correlation
- hamming: For binary data
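Each metric option corresponds to a simple formula. These pure-Python versions are illustrative only (umap-learn ships its own compiled implementations) and assume non-zero, non-constant vectors where cosine and correlation divide by a norm; hamming is shown as the fraction of positions that differ.

```python
import math


def euclidean(x, y):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine(x, y):
    """1 - cosine similarity; depends only on direction. Assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def correlation(x, y):
    """1 - Pearson r, i.e. cosine distance after centering each vector. Assumes non-constant vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


def hamming(x, y):
    """Fraction of positions at which the two vectors differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


METRICS = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine,
           "correlation": correlation, "hamming": hamming}

x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # y is x scaled by 2
for name in ("euclidean", "manhattan", "cosine", "correlation"):
    print(name, round(METRICS[name](x, y), 4))
print("hamming", hamming([1, 0, 1, 1], [1, 1, 0, 1]))
```

Because y is just x scaled up, euclidean and manhattan report a nonzero distance while cosine and correlation report (essentially) zero, which is why cosine suits text vectors whose length mostly reflects document size.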
Random State (default: 42) Seed for reproducibility.
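To tie the parameters together, here is a hypothetical settings object whose defaults match this page and whose field names match the keyword arguments of umap-learn's `umap.UMAP` (n_components, n_neighbors, min_dist, metric, random_state). The validation ranges and the restricted metric list are assumptions drawn from the options above, not umap-learn's own rules.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Assumption: only the metrics listed on this page are accepted.
SUPPORTED_METRICS = ("euclidean", "manhattan", "cosine", "correlation", "hamming")


@dataclass(frozen=True)
class UMAPParams:
    """Hypothetical settings object mirroring the parameters documented above."""

    n_components: int = 2
    n_neighbors: int = 15
    min_dist: float = 0.1
    metric: str = "euclidean"
    random_state: Optional[int] = 42

    def __post_init__(self):
        # Range checks are assumptions based on the ranges described on this page.
        if self.n_components < 1:
            raise ValueError("n_components must be at least 1")
        if self.n_neighbors < 2:
            raise ValueError("n_neighbors must be at least 2")
        if not 0.0 <= self.min_dist <= 1.0:
            raise ValueError("min_dist must be between 0.0 and 1.0")
        if self.metric not in SUPPORTED_METRICS:
            raise ValueError(f"unsupported metric: {self.metric!r}")

    def to_kwargs(self):
        """Return the settings as a plain dict of keyword arguments."""
        return asdict(self)


# Example (requires umap-learn, not imported here):
#   import umap
#   embedding = umap.UMAP(**UMAPParams(metric="cosine").to_kwargs()).fit_transform(X)
print(UMAPParams(n_components=10, metric="cosine").to_kwargs())
```

In recent umap-learn versions, fixing random_state also disables some parallelism, so passing random_state=None can be faster when exact reproducibility is not required.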