SageMaker unsupervised algorithms
There are five SageMaker unsupervised algorithms that process tabular data. Unsupervised Learning algorithms process data that has not been labelled. IP Insights is an anomaly detection algorithm that detects problems and threats in an IP network. K-Means is a clustering algorithm. Object2Vec translates input data into vectors. The Principal Component Analysis (PCA) algorithm is used in Feature Engineering to reduce the number of features in data. Random Cut Forest (RCF) is a general-purpose anomaly detection algorithm.
K-Means Algorithm
The K-Means Algorithm is an Unsupervised Learning algorithm used to find clusters. Clusters are formed by grouping data points that are as similar as possible to each other and different from the data points in other clusters. The distances between data points are calculated and averaged to form groups. K-Means is used for market segmentation, computer vision,…
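To make the grouping idea concrete, here is a minimal sketch of the classic assign-and-average loop (Lloyd's algorithm) in plain Python. It is an illustration only and is not how SageMaker implements K-Means; the function name `kmeans`, the sample points and the naive choice of the first `k` points as starting centroids are all assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    # Naive initialisation (an assumption for this sketch): use the first k
    # points. Production implementations use smarter seeding.
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignments


# Two visibly separate groups of 2-D points (made-up data).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, assignments = kmeans(points, k=2)
```

With this data the first three points end up in one cluster and the last three in the other, even though both starting centroids sit in the same group.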

Principal Component Analysis Algorithm
Sometimes data has so many features that further processing or inference is hampered. When this occurs, the Principal Component Analysis (PCA) algorithm, an Unsupervised Learning algorithm, is used to reduce the number of features whilst retaining as much information as possible. This is Feature Engineering. PCA has two modes: Regular and…
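As a rough illustration of what "reducing features whilst retaining information" means, the sketch below finds the single direction of greatest variance in a small two-feature data set and projects each row onto it, turning two features into one. It uses power iteration on the covariance matrix, which is a simple textbook technique chosen for this example and not a description of either SageMaker PCA mode. The function name and the sample rows are assumptions.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (component, projections) for the direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    # Centre each feature so variance is measured around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    # Assumes the data is not constant, otherwise the norm would be zero.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centred row onto the component: d features become 1.
    projections = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projections


# Made-up rows where the second feature is roughly twice the first.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
component, projections = first_principal_component(rows)
```

Because the two features are almost perfectly correlated, the component points along roughly (1, 2) and the single projected value per row keeps nearly all of the variation in the original pair.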

Random Cut Forest Algorithm
The Random Cut Forest Algorithm (RCF) is an unsupervised algorithm used to identify anomalies in data. An anomaly is a data point that differs significantly from the bulk of the data. The Random Cut Forest Algorithm provides a score for each data point. A low score indicates the data point is similar to the…
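The intuition behind the score can be sketched with a heavily simplified stand-in: cut the data at random positions and count how many cuts it takes to separate a point from everything else. Points far from the bulk are isolated after very few cuts, so a score that shrinks with depth is high for anomalies and low for ordinary points. This is a toy approximation and not the SageMaker RCF implementation; the function names, the `1 / (1 + depth)` score, the tree and sample sizes and the synthetic data are all assumptions.

```python
import random


def isolation_depth(point, sample, rng, max_depth=32):
    """Count random cuts needed to separate `point` from the sampled points."""
    others = list(sample)
    depth = 0
    dims = range(len(point))
    while others and depth < max_depth:
        # Bounding box of the remaining points together with `point`.
        lows = [min(point[j], min(p[j] for p in others)) for j in dims]
        highs = [max(point[j], max(p[j] for p in others)) for j in dims]
        spans = [hi - lo for lo, hi in zip(lows, highs)]
        if sum(spans) == 0:  # point coincides with every remaining point
            break
        # Pick a dimension with probability proportional to its span,
        # then pick the cut position uniformly inside that span.
        j = rng.choices(dims, weights=spans)[0]
        cut = rng.uniform(lows[j], highs[j])
        # Keep only the points on the same side of the cut as `point`.
        others = [p for p in others if (p[j] <= cut) == (point[j] <= cut)]
        depth += 1
    return depth


def anomaly_score(point, data, trees=50, sample_size=16, seed=0):
    """Higher score means the point is isolated after fewer random cuts."""
    rng = random.Random(seed)
    depths = []
    for _ in range(trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        depths.append(isolation_depth(point, sample, rng))
    return 1.0 / (1.0 + sum(depths) / len(depths))


# Synthetic data: 200 points scattered around the origin.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
inlier_score = anomaly_score((0.0, 0.0), data)
outlier_score = anomaly_score((50.0, 50.0), data)
```

Here the point at (50, 50) receives a noticeably higher score than the point at the origin, because almost any first cut already separates it from the cluster.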

IP Insights Algorithm
The SageMaker IP Insights Algorithm is used for detecting anomalies in network traffic. It is an unsupervised learning algorithm that is trained on historical data to learn the patterns of normal network usage. In production it can detect anomalies in network usage that may indicate changes in user behaviour, network performance or malicious activity. The IP…
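The train-on-history, flag-the-unusual workflow can be illustrated with a deliberately crude stand-in: remember which network prefixes each entity has used before, and flag any event from a prefix that entity has never used. This lookup table is not the IP Insights algorithm, which learns a statistical model instead of memorising pairs. The function names, the entity names, the /24 prefix rule and the addresses (taken from the reserved documentation ranges) are all assumptions for this sketch.

```python
from collections import defaultdict


def train(history):
    """Record which /24 prefixes each entity has used in the historical data."""
    seen = defaultdict(set)
    for entity, ip in history:
        # Drop the last octet so nearby addresses count as the same network.
        seen[entity].add(ip.rsplit(".", 1)[0])
    return seen


def is_anomalous(model, entity, ip):
    """Flag an event when the entity has never been seen on this prefix."""
    return ip.rsplit(".", 1)[0] not in model.get(entity, set())


# Hypothetical historical (entity, IPv4 address) pairs.
history = [
    ("alice", "192.0.2.10"),
    ("alice", "192.0.2.44"),
    ("bob", "198.51.100.7"),
]
model = train(history)

familiar = is_anomalous(model, "alice", "192.0.2.99")    # same prefix as before
unfamiliar = is_anomalous(model, "alice", "203.0.113.5")  # never seen for alice
```

A learned model generalises where this table cannot: it can judge an address it has never seen by how similar it is to addresses the entity normally uses, instead of flagging everything new.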
Summary
These SageMaker built-in algorithms all use Unsupervised Learning and so process unlabelled data. Both IP Insights and Random Cut Forest (RCF) are used for anomaly detection. Object2Vec translates data into vectors to be used by downstream processing. K-Means identifies clusters in data. PCA reduces the number of features in high-dimensional data as part of Feature Engineering.
Credits
- Network leads photo by Jordan Harrison on Unsplash
- Woman with shopping photo by freestocks on Unsplash
- Filter coffee photo by Najib Kalil on Unsplash
- Cut down trees photo by Olia Gozha on Unsplash