Mini-Batch K-Means
Scalable K-Means variant using random mini-batches for faster training
Mini-Batch K-Means trains K-Means using small random batches of data per iteration instead of the full dataset. This significantly reduces training time with only a minor loss in cluster quality.
When to use:
- Very large datasets where standard K-Means is too slow
- Online or streaming settings where data arrives in batches
- When approximate cluster centroids are acceptable for significant speed gains
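To make the training procedure concrete, here is a minimal NumPy sketch of mini-batch K-Means with per-centroid decaying step sizes. It uses random initialization instead of k-means++ for brevity, and the function name is illustrative, not part of this product's API:

```python
import numpy as np

def minibatch_kmeans(X, k, batch_size=1024, n_iter=100, seed=0):
    """Illustrative mini-batch K-Means training sketch (not the product API)."""
    rng = np.random.default_rng(seed)
    # Random initialization for brevity; k-means++ would spread centroids apart.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)  # per-centroid sample counts, used to decay the step size
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=min(batch_size, len(X)), replace=False)]
        # Assign each batch point to its nearest centroid (squared Euclidean distance).
        dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each assigned centroid toward its point with step size 1/count,
        # so each centroid tracks a running mean of the points assigned to it.
        for x, c in zip(batch, labels):
            counts[c] += 1
            centers[c] += (x - centers[c]) / counts[c]
    return centers
```

Because only a batch of points is touched per iteration, each pass is cheap regardless of dataset size; the 1/count step size makes early updates large and later updates small, which stabilizes the centroids.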
Input: Tabular data with the feature columns defined during training
Output: Cluster label (0 to K-1) for each row
Model Settings (set during training, used at inference)
N Clusters (default: 8) Number of clusters (K) to form.
Init (default: k-means++) Centroid initialization method. k-means++ spreads the initial centroids apart, which typically converges faster and more reliably than random initialization.
Max Iter (default: 100) Maximum passes over the complete dataset.
Batch Size (default: 1024) Number of samples per mini-batch. Larger batches improve centroid quality at the cost of speed.
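These settings and their defaults mirror scikit-learn's MiniBatchKMeans constructor arguments. Assuming a scikit-learn-style backend (an assumption, not confirmed by this page), a training run with these settings might look like this:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans  # assumed backend with matching defaults

# Synthetic two-blob data standing in for the tabular training features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (500, 2)),
               rng.normal(5.0, 0.1, (500, 2))])

model = MiniBatchKMeans(
    n_clusters=8,        # N Clusters
    init="k-means++",    # Init
    max_iter=100,        # Max Iter (passes over the complete dataset)
    batch_size=1024,     # Batch Size (samples per mini-batch)
    random_state=0,
)
labels = model.fit_predict(X)  # cluster label 0..K-1 for each row
```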
Inference Settings
No dedicated inference-time settings. Each point is assigned to its nearest trained centroid.
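Inference is plain nearest-centroid assignment, sketched below in NumPy; the centroid values and query rows are made up for illustration:

```python
import numpy as np

def assign_clusters(X, centers):
    """Label each row of X with the index of its nearest centroid (squared Euclidean)."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Illustrative trained centroids (K=3) and two new rows to label.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_new = np.array([[4.8, 5.2], [0.1, -0.2]])
labels = assign_clusters(X_new, centers)  # -> array([1, 0])
```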