Mean Shift
Centroid-based algorithm that automatically discovers clusters by finding density peaks in the feature space.
When to use:
- Don't know number of clusters in advance
- Clusters have different shapes
- Want to find natural cluster centers
- Have moderate-sized datasets
Strengths: Automatically finds number of clusters, handles arbitrary shapes, single parameter (bandwidth)
Weaknesses: Very slow on large datasets, sensitive to bandwidth parameter, computationally expensive
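To make the density-peak idea concrete, here is a minimal pure-Python sketch of mean shift with a flat (uniform) kernel. It is an illustration under simplifying assumptions, not the implementation behind this tool: the function name mean_shift, the tol threshold, and the rule for merging nearby modes are all hypothetical choices made for the example.

```python
import math

def mean_shift(points, bandwidth, max_iter=300, tol=1e-3):
    """Flat-kernel mean shift sketch. Returns (centers, labels)."""
    modes = []
    for seed in points:
        center = list(seed)
        for _ in range(max_iter):
            # Points inside the search window around the current center
            neighbors = [p for p in points if math.dist(p, center) <= bandwidth]
            # Shift the center to the mean of those points
            new_center = [sum(coord) / len(neighbors) for coord in zip(*neighbors)]
            shift = math.dist(new_center, center)
            center = new_center
            if shift < tol:  # converged on a density peak
                break
        modes.append(center)

    # Merge modes that ended up closer than one bandwidth apart (assumed rule)
    centers = []
    for m in modes:
        if all(math.dist(m, c) >= bandwidth for c in centers):
            centers.append(m)

    # Assign every point to its nearest center
    labels = [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]
    return centers, labels

# Two tight groups; the number of clusters is not passed in anywhere
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers, labels = mean_shift(points, bandwidth=1.0)
print(len(centers), labels)
```

Every point acts as a seed and every iteration scans all points, which is why the cost grows quickly with dataset size.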
Model Parameters
Bandwidth (optional) Radius of the search window around each candidate center. The most influential parameter: it largely determines how many clusters are found.
- null: Automatically estimated from data (recommended)
- Low values: Many small clusters
- High values: Few large clusters
- Use estimate_bandwidth() for data-driven selection
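The effect of bandwidth can be checked with a short experiment. The sketch below assumes the parameters on this page map onto scikit-learn's MeanShift and estimate_bandwidth (the reference to estimate_bandwidth() suggests this, but it is an assumption). The synthetic data, the helper count_clusters, and the specific bandwidth and quantile values are illustrative only.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Three well-separated synthetic blobs (illustrative data)
X, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [4, 4], [0, 4]],
    cluster_std=0.4,
    random_state=42,
)

def count_clusters(bandwidth):
    """Fit mean shift with a fixed bandwidth and return the number of centers."""
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    return len(ms.cluster_centers_)

n_small = count_clusters(0.3)    # narrow window: tends to fragment the blobs
n_large = count_clusters(10.0)   # window wider than the data: everything merges

# Data-driven choice: quantile controls how local the estimate is
bw_auto = estimate_bandwidth(X, quantile=0.3)
n_auto = count_clusters(bw_auto)

print(n_small, n_auto, n_large)
```

Lowering quantile yields a smaller estimated bandwidth and therefore more clusters, so it is worth trying a few values when the automatic estimate looks too coarse or too fine.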
Bin Seeding (default: false) If true, seeds are taken from a grid of bins whose size follows the bandwidth instead of from every data point, which reduces the number of seeds and speeds up fitting.
- false: Seed from all points (slower, more accurate)
- true: Seed from grid (faster, approximate)
Cluster All (default: true) Whether to cluster all points or leave orphans as outliers.
- true: Assign all points to nearest cluster
- false: Points far from cluster centers remain unassigned
Max Iterations (default: 300) Maximum number of iterations per seed point. A seed that has not converged by this limit stops updating.
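As a rough illustration of how the four parameters fit together, the sketch below again assumes they correspond to scikit-learn's MeanShift arguments bandwidth, bin_seeding, cluster_all, and max_iter. The dataset is synthetic, and in scikit-learn unassigned points are reported with the label -1.

```python
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Two synthetic blobs (illustrative data)
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [6, 6]],
    cluster_std=0.5,
    random_state=0,
)

# bandwidth=None lets the estimator pick a bandwidth from the data
ms_all = MeanShift(bandwidth=None, bin_seeding=True, cluster_all=True, max_iter=300)
labels_all = ms_all.fit_predict(X)

# Same settings, but points outside every kernel are left unassigned
ms_some = MeanShift(bandwidth=None, bin_seeding=True, cluster_all=False, max_iter=300)
labels_some = ms_some.fit_predict(X)

n_clusters = len(ms_all.cluster_centers_)
n_orphans = int((labels_some == -1).sum())  # may be 0 on clean data
print(n_clusters, n_orphans)
```

With Cluster All enabled every point receives a non-negative label. With it disabled, the orphan count gives a quick sense of how many points sit far from any density peak.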