Clustering Algorithms

In the real world, data does not always come with labels. In such cases we have to create our own clusters (labels) in order to find groupings in the data, a task known as unsupervised learning. The following algorithms are used to do this; the table below summarizes them, and short code sketches follow it.

Clustering Algorithms and Their Features

| Algorithm | Category | Features |
| --- | --- | --- |
| K-Means | Partitioning | Simple and fast; works well with large datasets; assumes clusters are spherical and balanced |
| Hierarchical Clustering | Hierarchical | Does not require the number of clusters to be specified; can be visualized using dendrograms |
| DBSCAN (Density-Based Spatial Clustering of Applications with Noise) | Density-Based | Can find arbitrarily shaped clusters; good for data with noise and outliers |
| Mean Shift | Centroid-Based | Does not require the number of clusters to be specified; can find clusters of any shape |
| OPTICS (Ordering Points To Identify the Clustering Structure) | Density-Based | Generalizes DBSCAN by handling clusters of varying density; provides a reachability plot of the cluster hierarchy |
| Spectral Clustering | Graph-Based | Good for non-convex clusters; uses a graph of pairwise affinities to cluster points |
| Affinity Propagation | Graph-Based | Does not require the number of clusters to be specified; works by sending messages between pairs of samples |
| Agglomerative Clustering | Hierarchical | The bottom-up variant of hierarchical clustering: each point starts as its own cluster, and the closest pairs are merged repeatedly |
| BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) | Hierarchical | Designed for very large datasets; builds a tree-like structure to cluster incrementally |
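
As a quick comparison, here is a minimal sketch, assuming scikit-learn is installed, that fits most of the algorithms in the table to the same synthetic dataset and reports how many clusters each one finds. The three-blob dataset and all parameter values are illustrative choices, not recommendations:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import (
    KMeans, DBSCAN, MeanShift, OPTICS,
    SpectralClustering, AffinityPropagation,
    AgglomerativeClustering, Birch,
)

# 300 points drawn from 3 Gaussian blobs; the labels are never used for fitting.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

models = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "Mean Shift": MeanShift(),  # estimates the cluster count itself
    "OPTICS": OPTICS(min_samples=5),
    "Spectral": SpectralClustering(n_clusters=3, random_state=42),
    "Affinity Propagation": AffinityPropagation(random_state=42),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
    "BIRCH": Birch(n_clusters=3),
}

for name, model in models.items():
    labels = model.fit_predict(X)
    # DBSCAN and OPTICS mark noise points with the label -1; exclude them.
    n_clusters = len(set(labels) - {-1})
    print(f"{name:>20}: {n_clusters} clusters")
```

Note how K-Means, Spectral Clustering, Agglomerative Clustering, and BIRCH are told the cluster count up front, while Mean Shift, DBSCAN, OPTICS, and Affinity Propagation infer it from the data.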
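
Because hierarchical clustering can be visualized with dendrograms, the following sketch (assuming SciPy and Matplotlib are installed) builds a bottom-up agglomerative merge tree with Ward linkage and plots it:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

# Bottom-up (agglomerative) linkage with Ward's criterion, which at each
# step merges the pair of clusters that least increases within-cluster variance.
Z = linkage(X, method="ward")

dendrogram(Z)
plt.title("Hierarchical clustering dendrogram")
plt.xlabel("Sample index")
plt.ylabel("Merge distance")
plt.show()
```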
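
Similarly, the reachability plot that OPTICS provides can be drawn from the fitted estimator's `reachability_` and `ordering_` attributes. A minimal sketch, again assuming scikit-learn and Matplotlib:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

optics = OPTICS(min_samples=10).fit(X)

# Reachability distances in cluster order: valleys correspond to clusters,
# and deeper valleys indicate denser clusters.
reachability = optics.reachability_[optics.ordering_]
plt.plot(reachability)
plt.title("OPTICS reachability plot")
plt.xlabel("Points (cluster order)")
plt.ylabel("Reachability distance")
plt.show()
```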

Note that this is not an exhaustive list, and there are many other clustering algorithms and variations thereof. Each algorithm has its own set of parameters and assumptions that can affect its performance on different datasets.