plastic model of the human brain to symbolize Machine Learning modeling

When people talk about Machine Learning they are mostly thinking about Modeling. Modeling is selecting and testing the algorithms to process data to find the information of value. This domain comprises 36% of the exam marks and has five subdomains:

Problem Framing (subdomain 3.1) is a method used to understand, define and prioritize business problems. It determines whether the subsequent work will be seen as useful and will provide business value. Framing identifies what will be observed, what will be predicted and the metrics that need to be optimised to monitor performance. Framing also leads to the choice of Machine Learning approach, which will be either Supervised Learning or Unsupervised Learning.

For Supervised Learning you need labeled training data. In Supervised Learning we provide data that has already been identified, and therefore labeled, as being what we are looking for. Unsupervised Learning is used to infer patterns in unlabeled datasets. The algorithms can detect hidden patterns and groupings in data without humans having to label it. Unsupervised learning is ideal for exploring raw and unknown data.

Many models (subdomain 3.2) are available through AWS Machine Learning services. Each model has its own use cases and requirements. Once the model has been chosen, an iterative process of training, tuning and evaluation is undertaken.

Model training (subdomain 3.3) is the process of providing a model with data to learn from. During model training the data is split into three parts. Most (70% to 80%) is used as training data with the remainder used for validation and testing.
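
The three-way split described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name split_dataset, the 70/15/15 fractions and the fixed seed are assumptions made for the example, not values prescribed by AWS or the exam.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into training, validation and test lists.

    The fractions are illustrative; whatever is left after the training
    and validation portions becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave some data for the test set")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Shuffling before splitting avoids any ordering in the source data leaking into one of the three parts. For time series data a chronological split would normally be used instead.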

Model tuning (subdomain 3.4) is also known as hyperparameter optimization. Hyperparameters are settings in SageMaker that are chosen before training and do not change during training. They can be tuned manually, with search methods, or automatically using SageMaker guided search. Model tuning also includes additional feature engineering and experimenting with new algorithms.
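
As a rough illustration of tuning with a search method, the sketch below runs a simple random search. It does not use the SageMaker API: validation_error is a hypothetical stand-in for "train a model with these hyperparameters and measure its validation error", and the hyperparameter names and ranges are assumptions made for the example.

```python
import random


def validation_error(learning_rate, depth):
    """Hypothetical stand-in for training a model and scoring it.

    A real tuning job would train on the training data and return a
    metric measured on the validation data.
    """
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def random_search(n_trials=50, seed=0):
    """Try random hyperparameter combinations and keep the best one."""
    rng = random.Random(seed)
    best_error, best_params = None, None
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "depth": rng.randint(2, 10),
        }
        error = validation_error(**params)
        if best_error is None or error < best_error:
            best_error, best_params = error, params
    return best_error, best_params


best_error, best_params = random_search()
print(best_params, best_error)
```

The hyperparameters stay fixed within each trial and only change between trials, which is the distinction between hyperparameters and the parameters a model learns during training.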

Model evaluation (subdomain 3.5) is used to find out how well a model will do in predicting the desired outcome. This is done using metrics to measure the performance of the model. Metrics measure accuracy, precision and other characteristics of the model by comparing the results from the model with the known labels of the data held back for validation and testing.
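
Two of the metrics mentioned, accuracy and precision, can be computed by comparing predictions with known labels. The sketch below is a minimal example for a binary classifier; the label lists are made-up illustrative data, not results from a real model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the known labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the items predicted positive, the fraction that really are positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if true_pos + false_pos == 0:
        return 0.0  # nothing was predicted positive
    return true_pos / (true_pos + false_pos)


# Made-up known labels and model predictions for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))   # 6 of 8 predictions are correct
print(precision(y_true, y_pred))  # 3 of 4 positive predictions are correct
```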

Your model is now ready to be used with real data. But before it can be let loose on your corporate data, it has to be deployed into the production environment.


AWS Certified Machine Learning Study Guide: Specialty (MLS-C01) Exam

This study guide provides the domain-by-domain specific knowledge you need to build, train, tune, and deploy machine learning models with the AWS Cloud. The online resources that accompany this Study Guide include practice exams and assessments, electronic flashcards, and supplementary online resources. It is available in both paper and Kindle versions for immediate access. (Visit Amazon books)


Sample Modeling questions

This test is 5 questions randomly taken from the 25 questions in the tests of the five sub-domains.

1 / 5

Unsupervised Learning is the machine learning task of inferring a function to describe hidden structure from <–?–> data.

2 / 5

What are the advantages of using recordIO protobuf?

  1. It can be compressed. This reduces storage and speeds up data transfer.
  2. It enables you to stream data into the algorithm using pipe mode directly from S3.

3 / 5

What training data format can be compressed and streamed into the algorithm using pipe mode directly from S3?

4 / 5

What is the main aim of Model Tuning?

5 / 5

In Model training what is another name for parameters, the internal values being used to process the data?

Study guides for Modeling:

Two newspapers, one in French and one in English, to symbolize the SageMaker text processing algorithm Sequence-to-sequence, which performs machine translation of languages
Modeling (Domain 3)

Sequence-to-Sequence Algorithm

SageMaker Sequence-to-Sequence algorithm is used for machine translation of languages. The algorithm takes the input sequence of tokens, for example French words, and outputs the translation as a sequence of English words. As well as translation, Sequence-to-Sequence can be used to summarize a document and convert speech to text. Sequence-to-Sequence is a Supervised Learning algorithm….

Photograph of a senior lady reading a book to two young boys to symbolize Supervised Learning for Machine Learning
Modeling (Domain 3)

Supervised Learning for Machine Learning

What is Supervised Learning? For Supervised Learning you need labeled training data. In Supervised Learning we provide data that has already been identified, and therefore labeled, as being what we are looking for. Once the Machine Learning model has been trained it can then be presented with real unknown data to which the Machine Learning…

a photograph of a burning book held in a hand to symbolize the SageMaker built-in algorithm BlazingText
Modeling (Domain 3)

BlazingText Algorithm

BlazingText is the name AWS gives its SageMaker built-in algorithm that can identify relationships between words in text documents. These relationships, which are also called embeddings, are expressed as vectors. The semantic relationship between words is preserved by the vectors which cluster words with similar semantics together. This conversion of words to meaningful numeric vectors…

A photograph of a washing line with pegs to symbolize the SageMaker Linear Learner algorithm
Modeling (Domain 3)

Linear Learner Algorithm

Linear Learner Algorithm is a Supervised Learning algorithm that can be used to solve three types of problems: Binary classification; Multi-class classification; and Regression. The algorithm is trained with lists of data comprising a high dimensional vector x and a label y to learn the equation of the line. The Linear Learner Algorithm uses Stochastic…

Machine Learning books on bookshelf
Modeling (Domain 3)

SageMaker text processing algorithms

There are four SageMaker text processing algorithms: BlazingText, LDA, NTM and Sequence-to-sequence. BlazingText converts text to numeric vectors. LDA and NTM identify topics in text documents and Sequence-to-sequence provides machine translation of languages. Each algorithm has its own section and embedded video. These revision notes are part of subdomain 3.2 Select the appropriate model(s) for…

Modeling (Domain 3)

K-Means Algorithm

The K-Means Algorithm is an Unsupervised Learning algorithm used to find clusters. The clusters are formed by grouping data points that are as similar as possible to each other and different from other data points. The distances between data points are calculated and averaged to form groups. K-Means is used for market segmentation, computer vision,…

Photo of lady with shopping approaching a car symbolizing Object2Vec Algorithm
Modeling (Domain 3)

Object2Vec Algorithm

Object2Vec Algorithm is an Unsupervised Learning algorithm. The algorithm compares pairs of data points and preserves the semantics of the relationship between the pairs. The algorithm creates embeddings that can be used by other algorithms downstream. The embeddings are low-dimensional dense embeddings of high-dimensional objects. Object2Vec can be used for product search, item matching and…

Image of a child under three years old reading a fruit alphabet book to symbolize Unsupervised Learning
Modeling (Domain 3)

SageMaker unsupervised algorithms

There are five SageMaker unsupervised algorithms that process tabular data. Unsupervised Learning algorithms process data that has not been labeled. IP Insights is an anomaly detection algorithm to detect problems and threats in an IP network. K-Means is a clustering algorithm. Object2Vec translates input data to vectors. Principal Component Analysis (PCA) algorithm is used in…

Image of a child under three years old reading a fruit alphabet book to symbolize Unsupervised Learning
Modeling (Domain 3)

Unsupervised Learning for Machine Learning

What is Unsupervised Learning? Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Unsupervised Learning is used to infer patterns in unlabeled datasets. The algorithms can detect hidden patterns and data groupings in data without help from humans through labeling. Unsupervised learning is ideal for exploring…

A photograph of IT network cables and sockets to symbolize the SageMaker built in algorithm IP Insights
Modeling (Domain 3)

IP Insights Algorithm

SageMaker IP Insights Algorithm is used for detecting anomalies in network traffic. It is an unsupervised learning algorithm that is trained on historical data to learn the patterns of normal network usage. In production it can detect anomalies in network usage that may indicate changes in user behaviour, network performance or malicious activity. The IP…

A photograph of hands on a table to symbolize the SageMaker K-Nearest Neighbor algorithm
Modeling (Domain 3)

K-Nearest Neighbors Algorithm

The K-Nearest Neighbors Algorithm is used to place data into a category, for example in recommendation applications used for recommending products on Amazon, articles on Medium, movies on Netflix, or videos on YouTube. It returns results based on the nearest training data points to the sample data point, also called nearest neighbors. The K-Nearest Neighbors algorithm…

A photograph of a curving library bookshelf to symbolize the SageMaker text processing algorithm LDA
Modeling (Domain 3)

Latent Dirichlet Allocation Algorithm

SageMaker Latent Dirichlet Allocation algorithm (LDA) is an Unsupervised Learning algorithm that groups words in a document into topics. The topics are found by a probability distribution of all the words in a document. LDA can be used to discover topics shared by documents within a text corpus. The number of topics is specified by…

Amazon Study Guide for the AWS Machine Learning Specialty exam
Reviews

Amazon Study Guide review – AWS Certified Machine Learning Specialty

This Amazon Study Guide review is a review of the official Amazon study guide to accompany the exam. The study guide provides the domain-by-domain specific knowledge you need to build, train, tune, and deploy machine learning models with the AWS Cloud. The online resources that accompany this Study Guide include practice exams and assessments, electronic…

A photograph of a fruit stall in a market with a woman buying fruit. This image is used later in the article to symbolize and explain SageMaker image processing.
Modeling (Domain 3)

SageMaker image processing algorithms

There are three built-in SageMaker image processing algorithms. They are all Supervised Learning algorithms and so have to be trained using labelled data. Each one analyzes images in a different way and returns different inference data for downstream processing. SageMaker’s three built-in image processing algorithms each have their own way of visualizing real-world objects….

A photo of coffee being dripped into a flask from a paper filter symbolising PCA Principal Component Analysis Algorithm
Modeling (Domain 3)

Principal Component Analysis Algorithm

Sometimes data can have a large number of features, so many that further processing or inference can be hampered. When this occurs Principal Component Analysis Algorithm (PCA), an Unsupervised Learning algorithm, is used to reduce the number of features whilst retaining as much information as possible. This is Feature Engineering. PCA has two modes: Regular and…

A photograph showing gingerbread men being cut out with a cookie cutter to symbolize selecting a SageMaker built-in algorithm for an appropriate problem
Modeling (Domain 3)

How to select a model for a given machine learning problem

To select a model for a given Machine Learning problem we use the information and conclusions from Framing the Problem. A Machine Learning problem can be described with four aspects: The first aspect concerns the format and structure of the data, which could be numeric, images or text. Numeric data is often tabular. The second…

A photograph of a pasta machine making spaghetti symbolizing how SageMaker unsupervised learning algorithms process tabular data
Modeling (Domain 3)

SageMaker supervised algorithms

There are five SageMaker supervised algorithms for tabular data. DeepAR Forecasting uses Deep Learning for financial forecasting. Linear Learner is good for regression problems. Factorization Machines can be used for the same purpose, but can handle data with gaps and holes better. K-Nearest Neighbor is good at categorising data. XGBoost can predict if an item…

Whizlabs AWS certified machine learning course with a robot hand
Reviews

Whizlabs review – AWS Certified Machine Learning Specialty

Need more practice with the exams? Check out the free Whizlabs test with 15 questions. They also have three practice tests (65 questions each) and five section tests (10-15 questions each). Money-off promo codes are below. For the AWS Certified Machine Learning Specialty, Whizlabs provides practice tests, a video course and hands-on labs. These…

Pluralsight AWS Certified Machine Learning web page screen shot
Reviews

Pluralsight review – AWS Certified Machine Learning Specialty

Contains affiliate links. If you go to the Whizlabs website and make a purchase I may receive a small payment. The purchase price to you will be unchanged. Thank you for your support. The AWS Certified Machine Learning Specialty learning path from Pluralsight has six high quality video courses taught by expert instructors. Two are introductory…

A photograph of boys playing Rugby and being lifted up in the air to symbolize the SageMaker XGBoost algorithm
Modeling (Domain 3)

XGBoost Algorithm

XGBoost stands for eXtreme Gradient Boosting. XGBoost uses boosting, which is a form of ensemble learning. The results of multiple models are combined to produce a better fit to the training data. Each decision tree model is added using the prediction errors of previous models to improve the fit to the training data. XGBoost…

A photograph of a fruit store with a woman selecting fruit. The woman is bounded in a white box to show how the SageMaker Object Detection algorithm works.
Modeling (Domain 3)

Object Detection Algorithm

The SageMaker Object Detection algorithm identifies and classifies objects in images. The identified object is placed in a class with a numerical measure of confidence. The location in the image is identified by a bounding box around the object. Object Detection is a Supervised Learning algorithm trained on a corpus of labeled images. Because the…

A graph of the Dow Jones index to symbolize the SageMaker DeepAR Forecasting algorithm
Modeling (Domain 3)

DeepAR Forecasting Algorithm

The SageMaker DeepAR Forecasting Algorithm forecasts how the target time series will evolve based on past performance. AR, which stands for AutoRegression, is a statistical method that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. The forecast is a one-dimensional time…

A photograph of cheese with holes to symbolize data with gaps and holes that can be processed by the SageMaker Factorization Machines algorithm
Modeling (Domain 3)

Factorization Machines Algorithm

The Factorization Machines Algorithm has two modes: Classification and Regression. Classification is a binary method that returns either one or zero and a label which is a number. The Regression mode returns the predicted value. Factorization Machines are a good choice for high dimensional, sparse datasets. Common uses are web page click prediction and item…

A photograph of a woman reading a newspaper to symbolize the SageMaker text processing Neural Topic Model (NTM) Algorithm
Modeling (Domain 3)

Neural Topic Model Algorithm

The Neural Topic Model Algorithm (NTM) is used to identify topics in a corpus of documents. NTM uses statistics to group words. The groups are termed Latent Representations because they are identified via word distributions in the documents. The Latent Representations reveal the semantics of the documents and so outperform analysis using the word form…

A photograph showing gingerbread men being cut out with a cookie cutter to symbolize selecting a SageMaker built-in algorithm for an appropriate problem
Modeling (Domain 3)

35 Q & A for SageMaker built-in algorithms

The AWS Machine Learning – Specialty certification exam (MLS-C01) tests your ability to select the correct answer to real-life scenarios. 36% of the questions in the MLS-C01 exam will be from Domain 3. These SageMaker built-in algorithms are part of Sub-domain 3.2, Select the appropriate models for a given Machine Learning problem. Sub-domain 3.2…

A photograph of a fruit stall in a market with a woman buying fruit. Items in the image have a tag label next to them with the fruit or object name. This symbolizes how the SageMaker image classification algorithm works.
Modeling (Domain 3)

Image Classification Algorithm

The SageMaker Image Classification algorithm can apply multiple labels to an image depending on what objects are identified. Objects are either identified or not; there are no probability scores.

Attributes (problem attribute: description)

  Data types and format: Image
  Learning paradigm or domain: Image Processing, Supervised
  Problem type: Image and multi-label classification
  Use case examples: Label/tag…

A photograph of the flight instruments of an aeroplane to symbolize model hyperparameters
Modeling (Domain 3)

Model tuning

Hyperparameters can be thought of as the external controls that influence how the model operates, just as flight instruments control how an aeroplane flies. These values are external to the model and are controlled by the user. They can influence how an algorithm is trained and the structure of the final model. The optimized settings…

Credits

Photo by Robina Weermeijer on Unsplash