UMAP
Nonlinear manifold projection preserving local and global data structure
UMAP (Uniform Manifold Approximation and Projection) learns a low-dimensional representation of the data that aims to preserve local neighborhood structure while retaining much of the global structure. It is typically faster than t-SNE on large datasets and, unlike standard t-SNE, can project new data with the fitted model.
When to use:
- Visualizing clusters in 2D or 3D with preserved global structure
- Nonlinear feature extraction as preprocessing for downstream models
- Exploratory data analysis on complex high-dimensional datasets
Input: Tabular data with the feature columns defined during training
Output: Projected coordinates in the reduced-dimensional space
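Because the input must carry the same feature columns that were defined during training, it can help to align incoming rows to that column order before projection. The following is a minimal sketch, not part of any documented API: the helper name align_rows, the dict-per-row input shape, and the choice to ignore extra columns are all assumptions made for illustration.

```python
def align_rows(rows, feature_columns):
    """Return rows as lists of floats ordered like feature_columns.

    rows: iterable of dicts mapping column name -> value (assumed shape).
    feature_columns: column names in the order used during training.
    """
    aligned = []
    for i, row in enumerate(rows):
        # Every training-time feature column must be present at inference.
        missing = [c for c in feature_columns if c not in row]
        if missing:
            raise KeyError(f"row {i} is missing feature columns: {missing}")
        # Columns not seen during training are ignored (an assumption).
        aligned.append([float(row[c]) for c in feature_columns])
    return aligned
```

For example, align_rows([{"b": 2, "a": 1, "extra": 9}], ["a", "b"]) returns [[1.0, 2.0]], while a row without "b" raises a KeyError.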
Model Settings (set during training, used at inference)
N Components (default: 2) Dimensionality of the embedding space.
N Neighbors (default: 15, range: 2–200) Number of neighbors used in local manifold approximation. Smaller values capture fine local structure; larger values capture broader global structure.
Min Distance (default: 0.1, range: 0.0–0.99) Minimum distance between embedded points. Smaller values pack points together more tightly; larger values allow more spread.
Metric (default: euclidean) Distance metric for computing neighborhoods in the original space.
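The settings above can be captured in a small validated container. This is a sketch under stated assumptions: the class name UmapSettings and the method to_umap_kwargs are hypothetical, the lower bound of 1 on N Components is assumed because no range is documented for it, and the keyword names in the returned mapping follow the open-source umap-learn package, which may differ from the backend actually used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UmapSettings:
    """Training-time UMAP settings with the documented defaults."""

    n_components: int = 2
    n_neighbors: int = 15
    min_distance: float = 0.1
    metric: str = "euclidean"

    def __post_init__(self):
        # Assumed lower bound; the documentation lists no range here.
        if self.n_components < 1:
            raise ValueError("n_components must be at least 1")
        # Documented range for N Neighbors: 2 to 200.
        if not 2 <= self.n_neighbors <= 200:
            raise ValueError("n_neighbors must be between 2 and 200")
        # Documented range for Min Distance: 0.0 to 0.99.
        if not 0.0 <= self.min_distance <= 0.99:
            raise ValueError("min_distance must be between 0.0 and 0.99")

    def to_umap_kwargs(self):
        """Map the settings to umap-learn style keyword names (assumed)."""
        return {
            "n_components": self.n_components,
            "n_neighbors": self.n_neighbors,
            "min_dist": self.min_distance,
            "metric": self.metric,
        }
```

Validating at construction time means an out-of-range value such as UmapSettings(n_neighbors=1) fails before any training starts.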
Inference Settings
There are no dedicated inference-time settings. New points are projected using the graph and embedding learned during training.
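The train-then-project flow can be sketched with the open-source umap-learn package. Whether this product uses that package internally is not stated, so treat the library choice, the helper name fit_and_project, and the fixed seed as assumptions for illustration only.

```python
def fit_and_project(train_rows, new_rows, n_components=2, n_neighbors=15,
                    min_dist=0.1, metric="euclidean", seed=42):
    """Fit UMAP on train_rows, then project new_rows with the fitted model.

    Returns (training_coordinates, new_point_coordinates).
    """
    # Imported here because umap-learn is an assumed, optional dependency
    # and importing it is slow (it pulls in numba).
    import numpy as np
    import umap

    reducer = umap.UMAP(
        n_components=n_components,
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        metric=metric,
        random_state=seed,  # fixed seed so repeated runs are comparable
    )
    # All settings are consumed here, during training.
    reducer.fit(np.asarray(train_rows, dtype=np.float32))
    # transform() positions new points relative to the training embedding;
    # it takes no additional settings.
    new_coords = reducer.transform(np.asarray(new_rows, dtype=np.float32))
    return reducer.embedding_, new_coords
```

Called with 200 training rows and 5 new rows of 10 features each, the function returns arrays with 200 and 5 rows respectively, each with n_components columns. The training set needs more rows than n_neighbors for the neighbor graph to be built.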