K-Means Algorithm

A GIF to show how the SageMaker K-Means algorithm works - Chire, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons

The K-Means Algorithm is an Unsupervised Learning algorithm used to find clusters in data. It groups data points so that points in the same cluster are as similar as possible to each other and as different as possible from points in other clusters. Each point is assigned to the cluster whose centre (centroid) is nearest, and each centroid is then recomputed as the average of the points assigned to it. These two steps repeat until the clusters stop changing. K-Means is used for market segmentation, computer vision, and astronomy.
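To make the assign-then-average loop concrete, here is a minimal pure-Python sketch of the textbook version of K-Means (Lloyd's iteration). It is only an illustration: the function names are hypothetical, the random initialisation is an assumption, and SageMaker's built-in algorithm uses its own scalable variant rather than this code.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's iteration: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct data points picked at random.
    centroids: List[Point] = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(mean(members) if members else centroids[c])
        # Stop once neither the assignments nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    centres, assignment = kmeans(data, k=2)
    print(centres, assignment)
```

On the six sample points the three points near (1, 1) end up in one cluster and the three near (8.5, 8.5) in the other. The cluster numbers themselves are arbitrary and depend on the initial centroids.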

Whilst the K-Means Algorithm is an Unsupervised Learning algorithm, the attributes used for comparison are chosen by the user. K-Means was developed in the 1950s and 1960s for signal analysis.

Attributes

| Problem attribute | Description |
|---|---|
| Data types and format | Tabular |
| Learning paradigm or domain | Unsupervised Learning |
| Problem type | Clustering or grouping |
| Use case examples | Group similar objects/data together |

Training

For training, the data can be supplied in either CSV or recordIO-wrapped protobuf format.
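As a sketch of the CSV option, the helper below writes feature rows as plain CSV with no header row and no label column. That is the layout we understand SageMaker's built-in algorithms expect. The content type `text/csv;label_size=0`, which signals that there is no target column, is our reading of the AWS data-format documentation and should be checked there. The helper name is hypothetical.

```python
import csv
import io
from typing import Iterable, Sequence

# Assumed content type for unlabelled CSV input: label_size=0 means "no target column".
TRAINING_CONTENT_TYPE = "text/csv;label_size=0"


def to_headerless_csv(rows: Iterable[Sequence[float]]) -> str:
    """Serialise feature rows as CSV with no header row and no label column."""
    buffer = io.StringIO()
    # Use "\n" line endings instead of the csv module's default "\r\n".
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    features = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0]]
    print(to_headerless_csv(features), end="")
```

The resulting text would then be uploaded to S3 as the training channel. The upload and the training job itself are not shown here.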

Model artifacts and inference

| Description | Artifacts |
|---|---|
| Learning paradigm | Unsupervised Learning |
| Request format | CSV, JSON or x-recordio-protobuf |
| Result | CSV, JSON, JSON Lines, recordIO |
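The following is a hedged sketch of the request and result formats in practice. It builds a CSV request body with one record per line and parses a JSON result. The `predictions`, `closest_cluster` and `distance_to_cluster` field names reflect our understanding of the K-Means JSON response format and should be verified against the AWS documentation. The sample body is illustrative, not real endpoint output. Sending the request (for example with boto3's SageMaker Runtime `invoke_endpoint`) is not shown.

```python
import json
from typing import List, Sequence, Tuple


def build_csv_request(records: Sequence[Sequence[float]]) -> str:
    """One record per line, features comma-separated (request type text/csv)."""
    return "\n".join(",".join(str(v) for v in rec) for rec in records)


def parse_json_response(body: str) -> List[Tuple[int, float]]:
    """Extract (closest_cluster, distance_to_cluster) pairs from a JSON result.

    The field names are assumptions based on the documented K-Means JSON format.
    """
    payload = json.loads(body)
    return [
        (int(p["closest_cluster"]), float(p["distance_to_cluster"]))
        for p in payload["predictions"]
    ]


if __name__ == "__main__":
    print(build_csv_request([[1.0, 2.0], [8.5, 9.0]]))
    # Illustrative response body, not output from a real endpoint.
    sample = '{"predictions": [{"closest_cluster": 1.0, "distance_to_cluster": 3.0}]}'
    print(parse_json_response(sample))
```

The cluster index comes back as a float in JSON, so the sketch converts it to `int` for use as a label.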

Processing environment

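CPU instances are recommended for training. GPU instances can also be used, but the algorithm uses only one GPU per instance, so GPU training should be limited to single-GPU instances.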

Video: Amazon SageMaker’s Built-in Algorithm Webinar Series: Clustering with K Means

This is a 58.51-minute video from AWS.
