Clustering types and their usage areas are explained with python implementation

Ibrahim Kovan

Unlabeled datasets can be grouped by considering their similar properties with the unsupervised learning technique. However, the point of view of these similar features is different in each algorithm. Unsupervised learning provides detailed information about the dataset as well as labeling the data. With this acquired information, the dataset can be rearranged and made more understandable. In this way, unsupervised learning is used in customer segmentation, image segmentation, genetics (clustering DNA patterns), recommender systems (grouping together users with similar viewing patterns), anomaly detection, and many more. New and concise components are obtained according to the statistical properties of the dataset with the PCA that is one of the most frequently used dimensionality reduction techniques and mentioned in the previous article. This article explains clustering types, using clustering for image segmentation, data preprocessing with clustering, and Gaussian mixtures method in detail. All explanations are supported with python implementation.

Photo by Valentin Salja on Unsplash

2.1. K-Means


K-means clustering is one of the frequently used clustering algorithms. The underlying idea is to place the samples according to the distance from the center of the clusters in the number determined by the user. The code block below explains how the k-means cluster is built from scratch.


The centers of the determined number of clusters are randomly placed in the dataset. All samples are assigned to the closest center, this closeness is calculated with Euclidian Distance in the code block above, but different calculation methods can also be used such as Manhattan Distance. Centers to which samples are assigned update their location (average) according to their population. Stage 2, that is, the distances of the samples from the center are recalculated and the assignment to the nearest center takes place, and it is repeated. Stage 3, each cluster center recenters itself according to the dataset….

Continue reading:—-7f60cf5620c9—4