DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds arbitrarily shaped clusters and labels outliers as noise points.

When to use:

You don't know the number of clusters in advance
Clusters have arbitrary shapes (not just spherical)
You need to identify outliers/anomalies
Clusters have roughly similar densities (a single global eps handles widely varying densities poorly)

Strengths: Finds arbitrary shapes, detects outliers, no need to specify k, robust to noise

Weaknesses: Sensitive to parameters (eps, min_samples), struggles with varying densities, not scalable to very large datasets
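
As a minimal sketch of these trade-offs, the example below clusters two interleaving half-moons and two far-away points. It assumes the parameters on this page map onto scikit-learn's `DBSCAN`. The page does not name the underlying library, so that mapping, the dataset, and the chosen values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: a non-spherical shape.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Append two isolated points that should be reported as noise.
X = np.vstack([X, [[3.0, 3.0], [-3.0, -3.0]]])

# eps and min_samples are illustrative values, not tuned recommendations.
model = DBSCAN(eps=0.2, min_samples=5)
labels = model.fit_predict(X)

# Label -1 marks noise; every other label is a cluster id.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

Note that the number of clusters is never passed in. It falls out of eps and min_samples.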

Model Parameters

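Eps (default: 0.5, required) Maximum distance between two samples for them to be considered neighbors. This is the parameter that most affects the result, and a good value depends on the scale of your features.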

Too low: Many small clusters and noise points
Too high: Merges distinct clusters
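Use a k-distance plot to choose eps: sort each point's distance to its k-th nearest neighbor and look for the elbow where the curve bends sharply upward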
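
The k-distance curve can be computed without any plotting library. The sketch below uses only the Python standard library and a brute-force search, so it suits small datasets only. The helper name `k_distances`, the sample data, and the choice of k are hypothetical.

```python
import math
import random

def k_distances(points, k):
    """Return each point's distance to its k-th nearest neighbor, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Illustrative data: one tight blob plus a few scattered outliers.
rng = random.Random(0)
blob = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
outliers = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(5)]
points = blob + outliers

k = 4  # hypothetical choice; often set near min_samples
curve = k_distances(points, k)

# The curve is typically flat for points in dense regions and rises
# steeply for isolated points. A candidate eps sits just below the rise.
for rank in (0, len(curve) // 2, len(curve) - 6, len(curve) - 1):
    print(f"rank {rank:3d}: k-distance = {curve[rank]:.3f}")
```

Plotting `curve` against its index gives the k-distance plot. The printed ranks are only a quick way to see where the values start to climb.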

Min Samples (default: 5, required) Minimum number of points within eps of a point for that point to count as a core point, the seed of a dense region.

3-5: Sensitive, more clusters
5-10: Good default
10+: Conservative, fewer clusters, more noise
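
How Eps and Min Samples interact can be sketched with a small from-scratch DBSCAN. This is a textbook-style O(n²) illustration using only the standard library, not the implementation behind this page. It assumes a point counts itself toward Min Samples (scikit-learn's convention), and all names and data are hypothetical.

```python
import math

NOISE = -1
UNVISITED = None

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels

# Two 4-point squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

loose = dbscan(points, eps=1.5, min_samples=3)
strict = dbscan(points, eps=1.5, min_samples=5)
print(loose)   # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(strict)  # [-1, -1, -1, -1, -1, -1, -1, -1, -1]
```

Each square corner has four points within eps, itself included. Requiring three neighbors yields two clusters, while requiring five turns every point into noise, which is the "fewer clusters, more noise" effect of larger values.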

Metric (default: "euclidean") Distance metric:

euclidean: Standard distance (default)
manhattan: City-block distance
chebyshev: Maximum coordinate difference
Others: cosine, minkowski, etc.

Algorithm (default: "auto") Algorithm for nearest neighbors:

auto: Automatically choose based on the data (default)
ball_tree: Tree-based search that copes better than kd_tree as dimensionality grows
kd_tree: Fast for low dimensions
brute: Computes all pairwise distances; slow on large datasets, fine for small ones
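
The Algorithm setting changes how neighbors are searched, not which neighbors are found. The sketch below checks this on a small dataset, assuming the options correspond to the `algorithm` argument of scikit-learn's `DBSCAN`. That correspondence and the data are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Two tight, well-separated blobs (illustrative data).
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [10, 10]],
                  cluster_std=0.3, random_state=0)

results = {}
for algorithm in ("auto", "ball_tree", "kd_tree", "brute"):
    results[algorithm] = DBSCAN(eps=1.0, min_samples=5,
                                algorithm=algorithm).fit_predict(X)

# All options perform an exact neighbor search, so on this data the
# labels are expected to agree; runtime and memory use are what differ.
for algorithm, labels in results.items():
    same = np.array_equal(labels, results["brute"])
    print(f"{algorithm:9s} matches brute: {same}")
```

In practice the choice matters mainly for speed on larger or higher-dimensional data, so timing each option on a sample of your own data is a reasonable way to decide.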

P (optional) Power parameter for Minkowski metric (2 = Euclidean, 1 = Manhattan).
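
The relationship between P and the named metrics can be checked by hand. The sketch below uses only the standard library; the helper `minkowski` and the sample vectors are illustrative.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)

manhattan = sum(abs(x - y) for x, y in zip(a, b))   # 7.0
euclidean = math.dist(a, b)                         # 5.0
chebyshev = max(abs(x - y) for x, y in zip(a, b))   # 4.0

print(minkowski(a, b, 1), manhattan)   # p = 1 reproduces Manhattan
print(minkowski(a, b, 2), euclidean)   # p = 2 reproduces Euclidean
print(minkowski(a, b, 50), chebyshev)  # large p approaches Chebyshev
```

Because the metric defines what "within eps" means, changing P or Metric usually requires choosing eps again.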
