A photograph showing gingerbread men being cut out with a cookie cutter to symbolize selecting a SageMaker built-in algorithm for an appropriate problem

How to select a model for a given machine learning problem

To select a model for a given Machine Learning problem, we use the information and conclusions from Framing the Problem. A Machine Learning problem can be described by four aspects:

  1. Data types and format
  2. Learning paradigm or domain
  3. Problem type
  4. Use case examples

The first aspect concerns the format and structure of the data, which could be numeric, images or text. Numeric data is often tabular. The second aspect is the learning paradigm or domain, which includes supervised learning, unsupervised learning, textual analysis and image processing. The third aspect is the type of problem, for example classification, clustering, or topic modeling. The final aspect is the use case; AWS provides sixteen use case examples, which apply to many Machine Learning problems.

This information allows us to narrow down the choice of algorithms, sometimes to a single algorithm. However, other factors may also influence the choice. For example, some algorithms do not perform well with sparse data. These factors and nuances are discussed in the individual algorithm pages.

These revision notes are part of subdomain 3.2 Select the appropriate model(s) for a given machine learning problem of the exam syllabus.

Scroll to the bottom of the page for questions and answers.

More questions for SageMaker built-in algorithms and their uses are in this article: 35 Q & A for SageMaker built-in algorithms

Video: Built-in Machine Learning Algorithms with Amazon SageMaker – a Deep Dive

A 15.37 minute Video by Emily Webber from AWS.

What are the built-in algorithms

SageMaker has seventeen built-in algorithms to choose from when selecting a model. These are optimised versions of common open source algorithms. Here they are listed in alphabetical order:

  1. BlazingText algorithm
  2. DeepAR Forecasting Algorithm
  3. Factorization Machines Algorithm
  4. Image Classification Algorithm
  5. IP Insights
  6. K-Means Algorithm
  7. K-Nearest Neighbors (K-NN) Algorithm
  8. Latent Dirichlet Allocation (LDA) Algorithm
  9. Linear Learner Algorithm
  10. Neural Topic Model (NTM) Algorithm
  11. Object Detection Algorithm
  12. Object2Vec Algorithm
  13. Principal Component Analysis (PCA) Algorithm
  14. Random Cut Forest (RCF) Algorithm
  15. Semantic Segmentation Algorithm
  16. Sequence-to-Sequence Algorithm
  17. XGBoost Algorithm

Video: AWS re:Invent 2020: Choose the right machine learning algorithm in Amazon SageMaker

This is a 29.53 minute video from AWS by Denis Batalov and Alberto Danese. The timestamps are:

  • 0 – introduction
  • 1.30 – 17 built-in algorithms in Amazon SageMaker
  • 5.13 – Image classification demo
  • 10.35 – Guide for classification / regression algorithm
  • 11.30 – Amazon blog post on Linear Learner
  • 11.46 – K-Nearest Neighbor (K-NN)
  • 13.22 – Amazon blog post on K-NN
  • 13.50 – XGBoost, how it works
  • 16.00 – Getting a grasp on how XGBoost works
  • 24.00 – XGBoost as a built-in algorithm
  • 25.18 – XGBoost in Nexi
  • 28.19 – Popular frameworks
  • 28.47 – AWS Marketplace
  • 29.20 – Amazon resources for SageMaker built-in algorithms

What is the definition of a model and an algorithm

When you take one of the SageMaker built-in algorithms and train it with data you create a model, therefore:

Model = Training (an Algorithm + Data)
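
The relationship above can be sketched in a few lines of plain Python. This is a toy illustration only, not SageMaker code: `least_squares`, `LinearModel` and `train` are hypothetical names chosen for this example, and a one-feature least squares fit stands in for a real built-in algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class LinearModel:
    """The model: parameters produced by training, ready to predict."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> LinearModel:
    """The algorithm: ordinary least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return LinearModel(slope=slope, intercept=mean_y - slope * mean_x)


def train(algorithm: Callable[..., LinearModel],
          xs: Sequence[float], ys: Sequence[float]) -> LinearModel:
    """Model = Training(Algorithm + Data)."""
    return algorithm(xs, ys)


# The data: points that lie on the line y = 2x + 1.
model = train(least_squares, xs=[1.0, 2.0, 3.0, 4.0], ys=[3.0, 5.0, 7.0, 9.0])
print(model)
print(model.predict(5.0))
```

The same algorithm trained on different data would give a different model, which is why the two terms are kept separate.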

The four aspects of a problem used for model selection

  1. Data types and format
  2. Learning paradigm or domain
  3. Problem type
  4. Use case examples

Data types and format

SageMaker algorithms have very specific requirements for the data you train them with, so we can select a model based on the type and format of the data the algorithm processes. This aspect splits the SageMaker built-in algorithms into three groups: Tabular, Text and Image.

SageMaker algorithm | Data types and format
DeepAR Forecasting Algorithm | Tabular
Factorization Machines Algorithm | Tabular
IP Insights | Tabular
K-Means Algorithm | Tabular
K-Nearest Neighbors (K-NN) Algorithm | Tabular
Linear Learner Algorithm | Tabular
Object2Vec Algorithm | Tabular
Principal Component Analysis (PCA) Algorithm | Tabular
Random Cut Forest (RCF) Algorithm | Tabular
XGBoost Algorithm | Tabular
BlazingText algorithm | Text
Latent Dirichlet Allocation (LDA) Algorithm | Text
Neural Topic Model (NTM) Algorithm | Text
Sequence-to-Sequence Algorithm | Text
Image Classification Algorithm | Image
Object Detection Algorithm | Image
Semantic Segmentation Algorithm | Image
An infographic that groups the SageMaker built-in algorithms by their data types and domains

Learning paradigm or domain

The learning paradigm or domain includes:

  1. Supervised learning
  2. Unsupervised learning
  3. Textual analysis
  4. Image processing

The input data domain can be used to select a model by identifying a subset of the algorithms. If the input data domain is text or images, the choice is confined to four and three algorithms respectively. The learning paradigm also narrows the search to smaller groups of algorithms. The key factor here is whether the data is labelled, for Supervised Learning, or unlabelled, for Unsupervised Learning.

SageMaker algorithm | Learning paradigm or domain
DeepAR Forecasting Algorithm | Supervised Learning
Factorization Machines Algorithm | Supervised Learning
K-Nearest Neighbors (K-NN) Algorithm | Supervised Learning
Linear Learner Algorithm | Supervised Learning
XGBoost Algorithm | Supervised Learning
IP Insights | Unsupervised Learning
K-Means Algorithm | Unsupervised Learning
Object2Vec Algorithm | Unsupervised Learning
Principal Component Analysis (PCA) Algorithm | Unsupervised Learning
Random Cut Forest (RCF) Algorithm | Unsupervised Learning
BlazingText algorithm | Textual Analysis
Latent Dirichlet Allocation (LDA) Algorithm | Textual Analysis
Neural Topic Model (NTM) Algorithm | Textual Analysis
Sequence-to-Sequence Algorithm | Textual Analysis
Image Classification Algorithm | Image Processing
Object Detection Algorithm | Image Processing
Semantic Segmentation Algorithm | Image Processing
An infographic that groups the SageMaker built-in algorithms by learning paradigm or domain
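
The decision described in this section can be written as a small function. This is a minimal sketch that follows this article's grouping only; `learning_paradigm` and its parameters are hypothetical names, not part of any AWS SDK.

```python
def learning_paradigm(data_domain: str, has_labels: bool) -> str:
    """Return the learning paradigm or domain for a dataset.

    Text and image data map straight to their own domain. For tabular
    data the deciding factor is whether the training data is labelled.
    """
    domain = data_domain.strip().lower()
    if domain == "text":
        return "Textual Analysis"
    if domain == "image":
        return "Image Processing"
    if domain == "tabular":
        return "Supervised Learning" if has_labels else "Unsupervised Learning"
    raise ValueError(f"Unknown data domain: {data_domain!r}")


print(learning_paradigm("tabular", has_labels=True))
print(learning_paradigm("tabular", has_labels=False))
print(learning_paradigm("image", has_labels=True))
```

The result selects one of the four groups in the table above, leaving between three and five candidate algorithms.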

Problem type

The problem type describes what the model is asked to do with the data. This aspect includes:

  • Classification
  • Regression
  • Time-series forecasting
  • Clustering
  • Topic modeling
  • Dimensionality reduction
  • Anomaly detection
  • IP anomaly detection
  • Embeddings
  • Text classification
  • Machine translation
  • Text summarization
  • Speech-to-text
  • Image and multi-label classification
  • Object detection and classification
  • Computer vision
SageMaker Algorithm | Problem type
BlazingText algorithm | Text classification and embedding
DeepAR Forecasting Algorithm | Time-series forecasting
Factorization Machines Algorithm | Binary/multi-class classification, Regression
Image Classification Algorithm | Image and multi-label classification
IP Insights | IP anomaly detection
K-Means Algorithm | Clustering or grouping
K-Nearest Neighbors (K-NN) Algorithm | Binary/multi-class classification, Regression
Latent Dirichlet Allocation (LDA) Algorithm | Topic modeling
Linear Learner Algorithm | Binary/multi-class classification, Regression
Neural Topic Model (NTM) Algorithm | Topic modeling
Object Detection Algorithm | Object detection and classification
Object2Vec Algorithm | Embeddings
Principal Component Analysis (PCA) Algorithm | Feature engineering: dimensionality reduction
Random Cut Forest (RCF) Algorithm | Anomaly detection
Semantic Segmentation Algorithm | Computer vision
Sequence-to-Sequence Algorithm | Machine translation
XGBoost Algorithm | Binary/multi-class classification, Regression
An infographic to show how the SageMaker built-in algorithms can be grouped depending on the problem types they solve
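
When revising, it can help to read the table above in the other direction, from problem type to algorithm. The sketch below inverts the table with plain Python. The dictionary is transcribed from this article, with names shortened and "Binary/multi-class classification" abbreviated to "Classification"; `PROBLEM_TYPES` and `by_problem` are hypothetical names for illustration.

```python
from collections import defaultdict

# Algorithm -> problem types, transcribed from the table in this article.
PROBLEM_TYPES = {
    "BlazingText": ["Text classification", "Embeddings"],
    "DeepAR Forecasting": ["Time-series forecasting"],
    "Factorization Machines": ["Classification", "Regression"],
    "Image Classification": ["Image and multi-label classification"],
    "IP Insights": ["IP anomaly detection"],
    "K-Means": ["Clustering"],
    "K-Nearest Neighbors (K-NN)": ["Classification", "Regression"],
    "Latent Dirichlet Allocation (LDA)": ["Topic modeling"],
    "Linear Learner": ["Classification", "Regression"],
    "Neural Topic Model (NTM)": ["Topic modeling"],
    "Object Detection": ["Object detection and classification"],
    "Object2Vec": ["Embeddings"],
    "Principal Component Analysis (PCA)": ["Dimensionality reduction"],
    "Random Cut Forest (RCF)": ["Anomaly detection"],
    "Semantic Segmentation": ["Computer vision"],
    "Sequence-to-Sequence": ["Machine translation"],
    "XGBoost": ["Classification", "Regression"],
}

# Invert the mapping: problem type -> algorithms that address it.
by_problem = defaultdict(list)
for algorithm, problem_types in PROBLEM_TYPES.items():
    for problem_type in problem_types:
        by_problem[problem_type].append(algorithm)

for problem_type in ("Classification", "Topic modeling", "Time-series forecasting"):
    print(f"{problem_type}: {by_problem[problem_type]}")
```

Some problem types, such as time-series forecasting, point to a single algorithm, while classification and regression still leave four to choose from.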

Use case examples

SageMaker Algorithm | Use case
BlazingText algorithm | Assign predefined categories to documents in a corpus of text
DeepAR Forecasting Algorithm | Based on historical data for a behavior, predict future behavior
Factorization Machines Algorithm | Predict a numeric/continuous value; Predict if an item belongs to a category
Image Classification Algorithm | Label/tag an image based on the content of the image
IP Insights | Protect your application from suspicious users
K-Means Algorithm | Group similar objects/data together
K-Nearest Neighbors (K-NN) Algorithm | Predict a numeric/continuous value; Predict if an item belongs to a category
Latent Dirichlet Allocation (LDA) Algorithm | Organize a set of documents into topics (not known in advance)
Linear Learner Algorithm | Predict a numeric/continuous value; Predict if an item belongs to a category
Neural Topic Model (NTM) Algorithm | Organize a set of documents into topics (not known in advance)
Object Detection Algorithm | Detect people and objects in an image
Object2Vec Algorithm | Improve the data embeddings of the high-dimensional objects
Principal Component Analysis (PCA) Algorithm | Drop those columns from a dataset that have a weak relation with the label/target variable. This reduces the number of features to be analyzed.
Random Cut Forest (RCF) Algorithm | Detect abnormal behavior in an application
Semantic Segmentation Algorithm | Tag every pixel of an image individually with a category
Sequence-to-Sequence Algorithm | Convert audio files to text; Summarize a long text corpus; Convert text from one language to another
XGBoost Algorithm | Predict a numeric/continuous value; Predict if an item belongs to a category

Classifying algorithms with Learning Paradigm and Data Type

The first two problem aspects we discussed have the fewest options: four for learning paradigms and three for data types. These aspects are also the easiest to identify in Problem Framing, since they are based on easily observable characteristics of the data. From the table below it can be seen that identifying whether the data requires Supervised or Unsupervised learning significantly reduces the number of suitable algorithms. However, you will still have five algorithms to choose from in each group.

Data types and format / Learning paradigm or domain | Supervised | Unsupervised | Text | Image
Tabular | DeepAR Forecasting, Factorization Machines, K-Nearest Neighbors, Linear Learner, XGBoost | IP Insights, K-Means, PCA, Random Cut Forest, Object2Vec | - | -
Text | - | - | BlazingText, LDA, NTM, Sequence-to-Sequence | -
Image | - | - | - | Image Classification, Object Detection, Semantic Segmentation
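
The grouping by these two aspects can be reproduced with a short script. This is a sketch for revision purposes: the catalogue is transcribed from the tables in this article, and `CATALOG`, `groups` and `shortlist` are hypothetical names, not AWS identifiers.

```python
from collections import defaultdict

# (algorithm, data type, learning paradigm or domain), as grouped in this article.
CATALOG = [
    ("DeepAR Forecasting", "Tabular", "Supervised"),
    ("Factorization Machines", "Tabular", "Supervised"),
    ("K-Nearest Neighbors (K-NN)", "Tabular", "Supervised"),
    ("Linear Learner", "Tabular", "Supervised"),
    ("XGBoost", "Tabular", "Supervised"),
    ("IP Insights", "Tabular", "Unsupervised"),
    ("K-Means", "Tabular", "Unsupervised"),
    ("Object2Vec", "Tabular", "Unsupervised"),
    ("Principal Component Analysis (PCA)", "Tabular", "Unsupervised"),
    ("Random Cut Forest (RCF)", "Tabular", "Unsupervised"),
    ("BlazingText", "Text", "Text"),
    ("Latent Dirichlet Allocation (LDA)", "Text", "Text"),
    ("Neural Topic Model (NTM)", "Text", "Text"),
    ("Sequence-to-Sequence", "Text", "Text"),
    ("Image Classification", "Image", "Image"),
    ("Object Detection", "Image", "Image"),
    ("Semantic Segmentation", "Image", "Image"),
]

# Group the algorithms by the pair (data type, learning paradigm or domain).
groups = defaultdict(list)
for name, data_type, paradigm in CATALOG:
    groups[(data_type, paradigm)].append(name)


def shortlist(data_type: str, paradigm: str) -> list:
    """Return the algorithms that match both aspects (empty list if none)."""
    return list(groups.get((data_type, paradigm), []))


for (data_type, paradigm), names in groups.items():
    print(f"{data_type} / {paradigm}: {len(names)} algorithms")

print(shortlist("Tabular", "Unsupervised"))
```

The two tabular groups each hold five algorithms, so a further aspect such as the problem type is needed to narrow the choice.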

Summary

SageMaker has seventeen built-in algorithms that can be used to build Machine Learning models. Four aspects can be used to select a model: Data types and format; Learning paradigm or domain; Problem type; Use case examples. Using these aspects to select appropriate algorithms reduces the choice to a small group, and often to a single algorithm.

Credits

Photo by Dari lli on Unsplash



Questions and answers

Created by Michael Stainsbury

3.2 How to select a model for a given machine learning problem

Five questions from a test bank of 10 questions about sub-domain 3.2 Select the appropriate model(s) for a given machine learning problem of the Modeling knowledge domain.

1 / 5

What are the aspects you can use to choose a SageMaker built-in algorithm?

3 / 5

The Image processing algorithms are:

  1. Image Classification
  2. Object Detection
  3. <–?–>

5 / 5

What are the SageMaker built-in Text processing algorithms?

