SageMaker unsupervised algorithms
SageMaker provides five built-in unsupervised algorithms that process tabular data. Unsupervised Learning algorithms process data that has not been labeled. IP Insights is an anomaly detection algorithm that detects problems and threats in an IP network. K-Means is a clustering algorithm. Object2Vec translates input data into vectors. The Principal Component Analysis (PCA) algorithm is used in Feature Engineering to reduce the number of features in data. Random Cut Forest (RCF) is a general-purpose anomaly detection algorithm.

IP Insights Algorithm
The SageMaker IP Insights Algorithm detects anomalies in network traffic. It is an unsupervised learning algorithm that is trained on historical data to learn the patterns of normal network usage. In production it can detect anomalies in network usage that may indicate changes in user behaviour, changes in network performance, or malicious activity. The IP…
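The train-on-history, flag-in-production idea can be illustrated with a toy baseline. This is a minimal sketch, not the IP Insights model: it replaces the learned model with a plain lookup of previously seen pairs. The function names, the user names and the choice of the first three octets as a "network" are all assumptions made for illustration, and the addresses come from the reserved documentation ranges.

```python
from collections import defaultdict

def network_prefix(ip):
    """Return the first three octets of an IPv4 address (an assumed grouping)."""
    return ".".join(ip.split(".")[:3])

def learn_usage(history):
    """Record which network prefixes each user has been seen on."""
    seen = defaultdict(set)
    for user, ip in history:
        seen[user].add(network_prefix(ip))
    return seen

def is_anomalous(seen, user, ip):
    """Flag an event whose (user, prefix) pair never appeared in the history."""
    return network_prefix(ip) not in seen.get(user, set())

# Hypothetical historical events: (user, IPv4 address).
history = [
    ("alice", "192.0.2.10"),
    ("alice", "192.0.2.44"),
    ("bob", "198.51.100.5"),
]
seen = learn_usage(history)

print(is_anomalous(seen, "alice", "192.0.2.99"))   # same network as before
print(is_anomalous(seen, "alice", "203.0.113.7"))  # never seen for alice
```

A lookup like this can only say "seen" or "not seen", whereas a trained model can return a graded score for pairs it has never observed.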

Random Cut Forest Algorithm
The Random Cut Forest Algorithm (RCF) is an unsupervised algorithm that is used to identify anomalies in data. An anomaly is a data point that differs significantly from the bulk of the data. The Random Cut Forest Algorithm provides a score for each data point. A low score indicates the data point is similar to the…
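To give a feel for how random cuts can produce an anomaly score, here is a simplified sketch in plain Python. It is not the SageMaker implementation: it scores a point by how few random cuts are needed to separate it from a random sample of the data, a depth-based simplification. The function names, parameters (`num_trees`, `sample_size`) and the scoring formula `1 / (1 + average depth)` are assumptions chosen for illustration.

```python
import random

def isolation_depth(point, sample, rng, depth=0, max_depth=20):
    """Count the random cuts needed to separate `point` from every sample point."""
    if not sample or depth >= max_depth:
        return depth
    dims = list(range(len(point)))
    # Bounding box of the remaining sample plus the point being scored.
    lows = [min(min(p[d] for p in sample), point[d]) for d in dims]
    highs = [max(max(p[d] for p in sample), point[d]) for d in dims]
    ranges = [high - low for low, high in zip(lows, highs)]
    if sum(ranges) == 0:
        return depth  # only exact duplicates of the point remain
    # Pick the cut dimension with probability proportional to its range,
    # then pick the cut position uniformly inside that range.
    d = rng.choices(dims, weights=ranges)[0]
    cut = rng.uniform(lows[d], highs[d])
    same_side = [p for p in sample if (p[d] <= cut) == (point[d] <= cut)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def anomaly_score(point, data, num_trees=50, sample_size=16, seed=0):
    """Higher score means the point was isolated after fewer cuts on average."""
    rng = random.Random(seed)
    depths = []
    for _ in range(num_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        depths.append(isolation_depth(point, sample, rng))
    average_depth = sum(depths) / len(depths)
    return 1.0 / (1.0 + average_depth)

# A tight grid of "normal" points around the origin.
data = [(x * 0.1, y * 0.1) for x in range(-3, 4) for y in range(-3, 4)]
typical_score = anomaly_score((0.0, 0.0), data)
outlier_score = anomaly_score((5.0, 5.0), data)
print(typical_score, outlier_score)
```

The point far from the grid is usually cut off by the very first cut, so it receives the higher score, while the point in the middle of the grid needs many cuts and scores low.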

Object2Vec Algorithm
The Object2Vec Algorithm is an Unsupervised Learning algorithm. The algorithm compares pairs of data points and preserves the semantics of the relationship between the pairs. It creates low-dimensional dense embeddings of high-dimensional objects, and these embeddings can be used by other algorithms downstream. Object2Vec can be used for product search, item matching and…
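As an illustration of how embeddings might be used downstream for item matching, the sketch below compares vectors with cosine similarity. It does not train or call Object2Vec: the three-dimensional vectors and product names are hypothetical values invented for the example, and `cosine_similarity` and `most_similar` are illustrative helper names.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query, embeddings):
    """Return the (name, similarity) of the item closest to `query`."""
    query_vector = embeddings[query]
    candidates = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in embeddings.items()
        if name != query
    ]
    return max(candidates, key=lambda item: item[1])

# Hypothetical embeddings; real ones would come from a trained model
# and would normally have many more dimensions.
embeddings = {
    "running shoes": (0.9, 0.1, 0.2),
    "trail sneakers": (0.8, 0.2, 0.3),
    "coffee grinder": (0.1, 0.9, 0.4),
}

match, similarity = most_similar("running shoes", embeddings)
print(match, round(similarity, 3))
```

Items whose vectors point in nearly the same direction are treated as related, which is what makes a dense embedding useful as input to search or matching.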

K-Means Algorithm
The K-Means Algorithm is an Unsupervised Learning algorithm used to find clusters. The clusters are formed by grouping data points that are as similar as possible to each other and different from other data points. The distances between data points are calculated and averaged to form groups. K-Means is used for market segmentation, computer vision,…
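The distance-and-average loop can be sketched in a few lines of plain Python. This is the textbook form of k-means (often called Lloyd's algorithm), not SageMaker's implementation. Starting from the first `k` points as centroids is a simplification chosen to keep the example deterministic, and the sample points are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, iterations=20):
    """Alternate between assigning points and averaging until stable."""
    centroids = list(points[:k])  # simplistic start: the first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return centroids, assignments

# Two made-up groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

Even though both starting centroids sit in the same group here, the repeated assign-and-average steps pull one of them across to the other group.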

Summary
These SageMaker built-in algorithms all use Unsupervised Learning and so process unlabelled data. Both IP Insights and Random Cut Forest (RCF) are used for anomaly detection. Object2Vec translates data into vectors to be used by downstream processing. K-Means identifies clusters in data. PCA reduces the number of features in high-dimensional data as part of Feature Engineering.