The K-Means Algorithm is an Unsupervised Learning algorithm used to find clusters. The clusters are formed by grouping data points that are as similar as possible to each other and as different as possible from the points in other groups. Each data point is assigned to the nearest of k cluster centres (centroids) by distance, and each centroid is then recalculated as the average of the points assigned to it; these two steps repeat until the groups stop changing. K-Means is used for market segmentation, computer vision, and astronomy.
Whilst the K-Means Algorithm is an Unsupervised Learning algorithm, the attributes that are used for comparison are defined by the user. K-Means was developed in the 1950s and 60s for signal analysis.
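The assign-and-average loop can be sketched in a few lines of plain Python. This is a minimal illustration of the standard K-Means procedure, not the SageMaker implementation; the function name `kmeans`, its parameters, and the sample points are all made up for the example, and random starting centroids are an assumed initialisation strategy.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Assumption: start from k data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical two-attribute data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The attributes in each tuple are the ones the user chooses to compare, so changing which columns go into `points` changes which data points count as similar.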
- GIF: https://commons.wikimedia.org/wiki/File:K-means_convergence.gif
- AWS: https://docs.aws.amazon.com/sagemaker/latest/dg/k-means.html
- Wikipedia: https://en.wikipedia.org/wiki/K-means_clustering
|Data types and format
|Learning paradigm or domain
|Clustering or grouping
|Use case examples
|Group similar objects/data together
For training, the data can be formatted as either CSV or recordIO-wrapped protobuf.
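As a rough illustration of the CSV option, the sketch below serialises a few feature rows with Python's standard `csv` module. The header-less, numeric-only layout is an assumption that should be checked against the AWS documentation, and the helper `rows_to_csv` and the customer features are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise numeric feature rows as CSV text without a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical customer attributes: (annual_spend, visits_per_month)
features = [(120.5, 3), (80.0, 1), (950.25, 12)]
csv_text = rows_to_csv(features)
print(csv_text)
```

Each line holds one data point and each column one attribute; there is no label column because the algorithm is unsupervised.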
Model artifacts and inference
|CSV, JSON or x-recordio-protobuf
|CSV, JSON, JSON Lines, recordIO
For training, CPU instances are recommended.
Video: Amazon SageMaker’s Built-in Algorithm Webinar Series: Clustering with K Means
This is a 58.51-minute video from AWS.