K-Means Algorithm

K-Means is an unsupervised learning algorithm for finding clusters in data. It groups data points so that points within a cluster are as similar as possible to each other and as different as possible from points in other clusters. Each point is assigned to the nearest cluster center (centroid) by distance, and each centroid is then recomputed as the average of the points assigned to it; the two steps repeat until the groups stop changing. K-Means is used in market segmentation, computer vision, and astronomy.
Although K-Means is unsupervised, the user chooses the attributes used to compare data points, as well as the number of clusters, k. The algorithm was developed in the 1950s and 1960s, originally for signal processing.
- GIF: https://commons.wikimedia.org/wiki/File:K-means_convergence.gif
- AWS: https://docs.aws.amazon.com/sagemaker/latest/dg/k-means.html
- Wikipedia: https://en.wikipedia.org/wiki/K-means_clustering
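The assign-then-average loop described in the introduction can be sketched in a few lines of plain Python. This is a minimal illustration of the textbook procedure (often called Lloyd's algorithm), not the SageMaker implementation; the function name `kmeans`, its parameters, and the sample points are all hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `k` groups."""
    rng = random.Random(seed)
    # Start from k data points chosen at random (a simple, common choice).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Made-up 2-D sample: three points near (1, 1) and three near (8, 8).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

The result depends on the random starting centroids, so different seeds can yield different label numbering and, on harder data, different clusters. Practical implementations usually add smarter initialization and several restarts.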
Attributes
| Problem attribute | Description |
| --- | --- |
| Data types and format | Tabular |
| Learning paradigm or domain | Unsupervised Learning |
| Problem type | Clustering or grouping |
| Use case examples | Group similar objects/data together |
Training
Training data can be supplied either as CSV or as recordIO-wrapped protobuf.
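As a rough sketch of the CSV option, the snippet below serializes a small feature matrix as header-less CSV text using only the standard library. The rows and the helper name `to_training_csv` are hypothetical. It also assumes the convention described in the AWS documentation that unlabeled CSV input carries no header row and is sent with the content type `text/csv;label_size=0`; check this against the current documentation before relying on it.

```python
import csv
import io

# Hypothetical feature rows: one observation per row, numeric columns only.
rows = [
    [5.1, 3.5, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
]


def to_training_csv(rows):
    """Serialize rows as CSV text with no header and no label column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_training_csv(rows)
print(csv_text)
```

The resulting text can be written to a file and uploaded to the storage location that the training job reads from.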
Model artifacts and inference
| Inference attribute | Description |
| --- | --- |
| Learning paradigm | Unsupervised Learning |
| Request format | CSV, JSON, or x-recordio-protobuf |
| Result | CSV, JSON, JSON Lines, or RecordIO |
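To make the JSON request and result formats more concrete, here is a hedged sketch of building a request body and reading a response. The field names `instances`, `features`, `predictions`, `closest_cluster`, and `distance_to_cluster` are assumptions based on the AWS documentation for built-in algorithms and should be verified there. The helper names and the sample response string are invented for illustration and do not come from a real endpoint.

```python
import json


def build_request(rows):
    """Wrap feature rows in the assumed JSON request structure."""
    return json.dumps({"instances": [{"features": list(r)} for r in rows]})


def parse_response(body):
    """Extract (cluster, distance) pairs from the assumed JSON response."""
    predictions = json.loads(body)["predictions"]
    return [
        (int(p["closest_cluster"]), float(p["distance_to_cluster"]))
        for p in predictions
    ]


# Illustrative response only; a real endpoint returns its own values.
sample_response = (
    '{"predictions": [{"closest_cluster": 1.0, "distance_to_cluster": 3.2}]}'
)

print(build_request([[1.5, 16.0, 14.0]]))
print(parse_response(sample_response))
```

Each prediction pairs the index of the nearest centroid with the distance from the observation to that centroid.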
Processing environment
AWS recommends CPU instances for training.
Video: Amazon SageMaker’s Built-in Algorithm Webinar Series: Clustering with K Means
This video from AWS runs about 59 minutes.