Clustering Algorithms
In the real world, data does not always come with labels. In such cases, we need to create our own clusters (and hence labels) to find groupings in the data; this is known as unsupervised learning. The following algorithms are commonly used for this:
Clustering Algorithms and Their Features
| Algorithm | Category | Features |
|---|---|---|
| K-Means | Partitioning | Simple and fast; works well with large datasets; assumes clusters are spherical and roughly equal in size |
| Hierarchical Clustering | Hierarchical | Does not require the number of clusters to be specified; can be visualized using dendrograms |
| DBSCAN (Density-Based Spatial Clustering of Applications with Noise) | Density-Based | Can find arbitrarily shaped clusters; robust to noise and outliers |
| Mean Shift | Centroid-Based | Does not require the number of clusters to be specified; can find clusters of any shape |
| OPTICS (Ordering Points To Identify the Clustering Structure) | Density-Based | Generalizes DBSCAN to handle clusters of varying density; provides a reachability plot for the cluster hierarchy |
| Spectral Clustering | Graph-Based | Good for non-convex clusters; uses the eigenvectors of a similarity graph to cluster points |
| Affinity Propagation | Graph-Based | Does not require the number of clusters to be specified; exchanges messages between pairs of samples to select exemplars |
| Agglomerative Clustering | Hierarchical | The bottom-up form of hierarchical clustering; repeatedly merges the closest pair of clusters |
| BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) | Hierarchical | Designed for very large datasets; incrementally builds a tree-like structure (the CF tree) to cluster |
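To make the trade-offs in the table concrete, here is a minimal sketch, assuming scikit-learn is installed; the dataset (two interleaving half-moons) and all parameter values are illustrative choices, not prescriptions. It runs three of the algorithms above on the same non-spherical data:

```python
# Compare three algorithms from the table on the same non-spherical
# dataset. K-Means assumes spherical clusters, so it tends to split
# the moons incorrectly; DBSCAN recovers the arbitrary shapes.
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_moons

# Two interleaving half-moon clusters (illustrative synthetic data).
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

models = {
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=42),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
    "Agglomerative": AgglomerativeClustering(n_clusters=2),
}

for name, model in models.items():
    labels = model.fit_predict(X)
    # For DBSCAN, a label of -1 marks points treated as noise.
    print(name, "assigned labels:", sorted(set(labels)))
```

Note that K-Means and Agglomerative Clustering require `n_clusters` up front, while DBSCAN infers the number of clusters from its density parameters instead.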
Note that this is not an exhaustive list, and there are many other clustering algorithms and variations thereof. Each algorithm has its own set of parameters and assumptions that can affect its performance on different datasets.
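To illustrate how much a single parameter can matter, the sketch below (again assuming scikit-learn; the `eps` values are arbitrary examples) sweeps DBSCAN's neighborhood radius on the same data and reports how the clustering changes:

```python
# Parameter sensitivity: varying DBSCAN's eps on identical data changes
# both the number of clusters found and how many points end up as noise.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three well-separated blobs (illustrative synthetic data).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

for eps in (0.2, 0.5, 1.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    # Label -1 marks noise points; exclude it from the cluster count.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

A too-small `eps` fragments the data and labels most points as noise, while a too-large one merges distinct blobs, which is why such parameters usually need tuning per dataset.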