
When people talk about Machine Learning, they are mostly thinking about Modeling. Modeling is the process of selecting, training and testing the algorithms that process data to extract valuable information.
This domain comprises 36% of the exam marks and has five subdomains:
- 3.1 Frame the business problem
  - Problem Framing for Machine Learning (5 questions)
  - Supervised Learning for Machine Learning (5 questions)
  - Unsupervised Learning for Machine Learning (5 questions)
- 3.2 Select the appropriate models
  - How to select a model for a given machine learning problem (10 questions)
- 3.3 Train the models
  - Training Machine Learning models (10 questions)
- 3.4 Tune the models
  - Model tuning (10 questions)
- 3.5 Evaluate the models
  - How to evaluate Machine Learning models (10 questions)
Problem Framing (subdomain 3.1) is a method used to understand, define and prioritize business problems. It determines whether all the subsequent work will be perceived as useful and will provide business value. Framing identifies what will be observed, what will be predicted and which metrics need to be optimised to monitor performance. The framing then guides the choice of Machine Learning approach: Supervised Learning or Unsupervised Learning.
Supervised Learning requires labeled training data: we provide examples that have already been identified, and therefore labeled, as what we are looking for. Unsupervised Learning is used to infer patterns in unlabeled datasets. The algorithms can detect hidden patterns and data groupings without humans having to label the data first. Unsupervised Learning is ideal for exploring raw and unknown data.
Many models (subdomain 3.2) are available through AWS Machine Learning services, and each has its own use cases and requirements. Once a model has been chosen, an iterative process of training, tuning and evaluation is undertaken.
Model training (subdomain 3.3) is the process of providing a model with data to learn from. Before training, the data is split into three parts: most of it (70% to 80%) is used as training data, and the remainder is used for validation and testing.
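A minimal sketch of that three-way split in plain Python, assuming a 70/15/15 split and a fixed random seed so the split is reproducible. The function name `train_val_test_split` and its defaults are hypothetical choices for illustration.

```python
import random

def train_val_test_split(records, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle records, then split them into training, validation and test sets."""
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 - train_frac):
        raise ValueError("fractions must leave some data for the test set")
    shuffled = list(records)                 # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed -> reproducible split
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # the remainder is the test set
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling first avoids, for example, a test set made up only of the last class in a file sorted by label. For time-series data you would instead split by time, so the model is never tested on data older than what it trained on.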
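Model tuning (subdomain 3.4) is also known as hyperparameter optimization. Hyperparameters are settings, such as the learning rate, that are chosen before training and do not change while the model trains; unlike the model's parameters, they are not learned from the data. They can be tuned manually, with search methods such as grid search or random search, or automatically using SageMaker's guided search (automatic model tuning). Model tuning also includes additional feature engineering and experimenting with new algorithms.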
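To illustrate one of those search methods, here is a minimal random-search sketch. The `validation_loss` function is a hypothetical stand-in for "train a model with these hyperparameters and return its validation loss", and the hyperparameter names `learning_rate` and `max_depth`, along with their ranges, are illustrative assumptions. SageMaker's automatic model tuning runs this kind of loop for you and also offers smarter strategies such as Bayesian search.

```python
import random

def validation_loss(learning_rate, max_depth):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    # A toy bowl-shaped surface with its minimum at learning_rate=0.1, max_depth=6
    return 100 * (learning_rate - 0.1) ** 2 + (max_depth - 6) ** 2

def random_search(n_trials=50, seed=0):
    """Try random hyperparameter combinations and keep the best one."""
    rng = random.Random(seed)
    best_loss, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform between 0.001 and 1
            "max_depth": rng.randint(2, 10),            # integer between 2 and 10
        }
        loss = validation_loss(**params)
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_loss, best_params

best_loss, best_params = random_search()
print(best_params, round(best_loss, 4))
```

Sampling the learning rate on a log scale spends as many trials between 0.001 and 0.01 as between 0.1 and 1, which suits hyperparameters whose useful values span orders of magnitude.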
Model evaluation (subdomain 3.5) measures how well a model is likely to predict the desired outcome. Metrics such as accuracy and precision are calculated by comparing the model's predictions with the known labels of data held back from training (the validation and test sets). Measuring against the training data itself would overstate performance, because the model has already seen that data.
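As a small illustration, the sketch below compares hypothetical predictions from a binary classifier with the known labels of a held-out set and computes accuracy, precision, recall and F1. The labels are invented for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compare predictions with known labels from held-out data."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # known labels (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions (hypothetical)
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Precision asks "of the items flagged positive, how many were right?", while recall asks "of the real positives, how many were found?". Which one matters more depends on the business problem identified during framing.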
Your model is now ready to be used with real data. But before it can be let loose on your corporate data it has to be deployed into the production environment.
- For a description of the exam structure, see this article: AWS Machine Learning exam syllabus
- The AWS exam guide pdf can be downloaded from: https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Exam-Guide.pdf
Contains affiliate links. If you go to the Whizlabs website and make a purchase, I may receive a small payment. The purchase price to you will be unchanged. Thank you for your support.
Whizlabs AWS Certified Machine Learning Specialty
Practice Exams with 271 questions, Video Lectures and Hands-on Labs from Whizlabs
Whizlabs' AWS Certified Machine Learning Specialty practice tests are designed by experts to simulate the real exam. The questions are based on the exam syllabus outlined in the official documentation. The practice tests help candidates gain confidence in their exam preparation and evaluate themselves against the exam content.
Practice test content
- Free Practice test – 15 questions
- Practice test 1 – 65 questions
- Practice test 2 – 65 questions
- Practice test 3 – 65 questions
Study guides for Modeling:
Semantic Segmentation Algorithm
The Semantic Segmentation algorithm processes images by tagging every single pixel in the image. This fine-grained approach enables information about the shapes and edges of objects to be gathered. A common use case is computer vision. The output is a Segmentation Mask, which is an RGB or grayscale PNG…
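Because every pixel receives a class label, a segmentation mask can be thought of as a grid of class IDs. The sketch below uses tiny hypothetical 3 x 4 masks to compute two common ways of scoring a predicted mask against a labeled one: pixel accuracy and mean intersection-over-union (mIoU). The function name is hypothetical.

```python
def segmentation_metrics(truth, pred, num_classes):
    """Pixel accuracy and mean IoU for two same-sized masks of class IDs."""
    pixels = [(t, p) for t_row, p_row in zip(truth, pred) for t, p in zip(t_row, p_row)]
    pixel_accuracy = sum(t == p for t, p in pixels) / len(pixels)
    ious = []
    for c in range(num_classes):
        intersection = sum(t == c and p == c for t, p in pixels)
        union = sum(t == c or p == c for t, p in pixels)
        if union:                      # skip classes absent from both masks
            ious.append(intersection / union)
    return pixel_accuracy, sum(ious) / len(ious)

# 0 = background, 1 = object; each number is the class of one pixel
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 0]]
pred  = [[0, 0, 1, 1],
         [0, 1, 1, 1],                 # one background pixel wrongly tagged as object
         [0, 0, 0, 0]]
acc, miou = segmentation_metrics(truth, pred, num_classes=2)
print(round(acc, 4), round(miou, 4))  # 0.9167 0.8375
```

Pixel accuracy is dominated by large classes such as background, so mIoU, which averages per-class overlap, is usually the more informative of the two.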

Unsupervised Learning for Machine Learning
What is Unsupervised Learning? Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Unsupervised Learning is used to infer patterns in unlabeled datasets. The algorithms can detect hidden patterns and data groupings in data without help from humans through labeling. Unsupervised learning is ideal for exploring…

SageMaker image processing algorithms
There are three built-in SageMaker image processing algorithms. They are all Supervised Learning algorithms and so have to be trained using labelled data. Each one analyzes images in a different way and returns different inference data for downstream processing. SageMaker's three built-in image processing algorithms each have their own way of visualizing real-world objects…

Principal Component Analysis Algorithm
Sometimes data can have a large number of features, so many that further processing or inference is hampered. When this occurs, the Principal Component Analysis (PCA) algorithm, an Unsupervised Learning algorithm, is used to reduce the number of features whilst retaining as much information as possible. This is a form of Feature Engineering. PCA has two modes: Regular and…
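To show the idea on a scale that fits on a page, here is a sketch of PCA reducing 2-D points to a single feature. It finds the direction of greatest variance (the first principal component) from the covariance matrix and projects the points onto it. It relies on the closed-form eigen-decomposition of a 2 x 2 matrix, so it only handles two features; it illustrates the technique and is not how SageMaker's PCA is implemented. The function name and sample points are hypothetical.

```python
import math

def pca_2d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)
    # Eigenvalues of a symmetric 2 x 2 matrix, largest first
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Unit eigenvector for lam1: the direction of greatest variance
    if b != 0:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    scores = [x * vx + y * vy for x, y in centred]   # the single new feature
    explained = lam1 / (lam1 + lam2) if lam1 + lam2 else 1.0
    return scores, (vx, vy), explained

points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
scores, direction, explained = pca_2d(points)
print([round(v, 3) for v in direction], round(explained, 3))
```

Here one component keeps roughly 96% of the variance, so dropping the second feature loses little information: the same trade-off PCA makes on datasets with hundreds of features.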

How to select a model for a given machine learning problem
To select a model for a given Machine Learning problem, we use the information and conclusions from Framing the Problem. A Machine Learning problem can be described by four aspects. The first aspect concerns the format and structure of the data, which could be numeric, images or text. Numeric data is often tabular. The second…
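One way to make this mapping concrete as a revision aid is a lookup from data format and problem type to candidate built-in algorithms. The table below is a hypothetical helper assembled only from the algorithm summaries on this page; it is not an official AWS mapping, and the names `CANDIDATES` and `suggest` are made up for illustration.

```python
# Hypothetical revision aid: (data format, problem type) -> SageMaker built-in algorithms,
# assembled from the algorithm summaries on this page.
CANDIDATES = {
    ("tabular", "binary classification"): [
        "Linear Learner", "XGBoost", "Factorization Machines", "K-Nearest Neighbors"],
    ("tabular", "multi-class classification"): [
        "Linear Learner", "XGBoost", "K-Nearest Neighbors"],
    ("tabular", "regression"): [
        "Linear Learner", "XGBoost", "Factorization Machines", "K-Nearest Neighbors"],
    ("tabular", "clustering"): ["K-Means"],
    ("tabular", "dimensionality reduction"): ["Principal Component Analysis"],
    ("tabular", "anomaly detection"): ["Random Cut Forest", "IP Insights"],
    ("time series", "forecasting"): ["DeepAR"],
    ("image", "classification"): ["Image Classification"],
    ("image", "object detection"): ["Object Detection"],
    ("image", "segmentation"): ["Semantic Segmentation"],
    ("text", "topic modeling"): ["Latent Dirichlet Allocation", "Neural Topic Model"],
    ("text", "word embeddings"): ["BlazingText"],
    ("text", "translation"): ["Sequence-to-Sequence"],
}

def suggest(data_format, problem_type):
    """Return candidate algorithms, or an empty list if the combination is not listed."""
    return CANDIDATES.get((data_format.lower(), problem_type.lower()), [])

print(suggest("Tabular", "Clustering"))  # ['K-Means']
```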

Linear Learner Algorithm
Linear Learner Algorithm is a Supervised Learning algorithm that can be used to solve three types of problems: Binary classification, Multi-class classification and Regression. The algorithm is trained on records, each comprising a high-dimensional vector x and a label y, from which it learns the equation of the line. The Linear Learner Algorithm uses Stochastic…
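A minimal sketch of learning a linear model for regression with stochastic gradient descent (SGD): each epoch makes one pass over the shuffled data, nudging the weights after every single example. It is a toy illustration of the technique, far simpler than the built-in algorithm; the function name, learning rate, epoch count and data are hypothetical.

```python
import random

def train_linear_sgd(data, lr=0.01, epochs=200, seed=0):
    """Fit y ≈ w·x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    samples = list(data)
    for _ in range(epochs):             # one epoch = one pass over the data
        rng.shuffle(samples)            # visit examples in a random order
        for x, y in samples:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y              # gradient of 0.5 * err**2 w.r.t. pred
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Noise-free hypothetical data generated from y = 3*x1 - 2*x2 + 1
data = [((x1, x2), 3 * x1 - 2 * x2 + 1) for x1 in range(-2, 3) for x2 in range(-2, 3)]
w, b = train_linear_sgd(data)
print([round(v, 2) for v in w], round(b, 2))  # [3.0, -2.0] 1.0
```

Updating after each example, rather than after a full pass, is what makes the descent "stochastic"; it scales to datasets too large to process in one batch.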

XGBoost Algorithm
XGBoost stands for eXtreme Gradient Boosting. XGBoost uses boosting, a form of ensemble learning in which the results of multiple models are combined to produce a better fit to the training data. Each new decision tree is added to correct the prediction errors of the previous trees, improving the fit to the training data. XGBoost…
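The "each tree corrects the previous errors" idea can be sketched as plain gradient boosting with one-split trees ("stumps") on a single feature and squared-error loss, where each new stump is fitted to the current residuals. XGBoost adds regularization, second-order gradient information and many engineering optimisations that are not shown here. The function names and data are hypothetical.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree (a 'stump') to 1-D inputs by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:          # candidate split points
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Squared-error gradient boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)                        # start from the mean prediction
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]  # errors of the ensemble so far
        trees.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = list(range(1, 11))
ys = [x * x for x in xs]                            # hypothetical target: y = x squared
print(round(mse(gradient_boost(xs, ys, n_trees=5), xs, ys), 2),
      round(mse(gradient_boost(xs, ys, n_trees=50), xs, ys), 2))
```

The learning rate shrinks each tree's contribution, so many small corrections are needed; this usually generalises better than letting a few trees fit the data at once.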

Object Detection Algorithm
The SageMaker Object Detection algorithm identifies and classifies objects in images. The identified object is placed in a class with a numerical measure of confidence. The location in the image is identified by a bounding box around the object. Object Detection is a Supervised Learning algorithm trained on a corpus of labeled images. Because the…
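Downstream code typically keeps only the detections whose confidence clears a threshold and converts each bounding box to pixel coordinates. The sketch below assumes each prediction is a list `[class_index, confidence, xmin, ymin, xmax, ymax]` with corners normalised to the 0 to 1 range; check the response format of your endpoint before relying on that layout. The function name, threshold and sample values are hypothetical.

```python
def confident_detections(predictions, image_width, image_height, threshold=0.5):
    """Keep detections above a confidence threshold, with boxes in pixel coordinates.

    Assumes each row is [class_index, confidence, xmin, ymin, xmax, ymax]
    with box corners normalised to the 0-1 range.
    """
    kept = []
    for class_index, confidence, xmin, ymin, xmax, ymax in predictions:
        if confidence < threshold:
            continue                                # discard low-confidence objects
        box = (round(xmin * image_width), round(ymin * image_height),
               round(xmax * image_width), round(ymax * image_height))
        kept.append({"class": int(class_index), "confidence": confidence, "box": box})
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)  # most confident first

predictions = [                                     # hypothetical model output
    [0.0, 0.92, 0.10, 0.20, 0.40, 0.80],
    [2.0, 0.35, 0.50, 0.50, 0.60, 0.70],
    [1.0, 0.71, 0.55, 0.10, 0.95, 0.45],
]
for detection in confident_detections(predictions, 640, 480):
    print(detection)
```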

Supervised Learning for Machine Learning
What is Supervised Learning? Supervised Learning requires labeled training data: we provide data that has already been identified, and therefore labeled, as what we are looking for. Once the Machine Learning model has been trained it can then be presented with real unknown data to which the Machine Learning…

35 Q & A for SageMaker built-in algorithms
The AWS Certified Machine Learning – Specialty exam (MLS-C01) tests your ability to select the correct answer to real-life scenarios. 36% of the questions in the MLS-C01 exam come from Domain 3. These SageMaker built-in algorithms are part of Sub-domain 3.2, Select the appropriate models for a given Machine Learning problem. Sub-domain 3.2…

SageMaker text processing algorithms
There are four SageMaker text processing algorithms: BlazingText, LDA, NTM and Sequence-to-sequence. BlazingText converts text to numeric vectors. LDA and NTM identify topics in text documents, and Sequence-to-sequence provides machine translation of languages. Each algorithm has its own section and embedded video. These revision notes are part of subdomain 3.2 Select the appropriate model(s) for…

Latent Dirichlet Allocation Algorithm
SageMaker Latent Dirichlet Allocation (LDA) is an Unsupervised Learning algorithm that groups the words in documents into topics. Each topic is described by a probability distribution over words, and each document is a mixture of topics. LDA can be used to discover topics shared by documents within a text corpus. The number of topics is specified by…
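The "topics are word distributions, documents are topic mixtures" idea can be written as one equation. Writing θ_d for the topic proportions of document d and φ_k for the word distribution of topic k (LDA places Dirichlet priors on both, hence the name), the probability of word w appearing in document d with K topics is shown below, followed by a worked example with hypothetical numbers: two topics, a document that is 70% topic 1 and 30% topic 2, and the word "goal" having probability 0.05 under topic 1 and 0.001 under topic 2.

```latex
p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d) = \sum_{k=1}^{K} \phi_{k,w}\, \theta_{d,k}

p(\text{goal} \mid d) = 0.7 \times 0.05 + 0.3 \times 0.001 = 0.035 + 0.0003 = 0.0353
```

Training runs this in reverse: given only the observed words, LDA estimates the θ and φ values that best explain them.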

K-Means Algorithm
The K-Means Algorithm is an Unsupervised Learning algorithm used to find clusters. Clusters are formed by grouping data points that are as similar as possible to each other and as different as possible from the points in other clusters. Each point is assigned to the nearest cluster centre, and each centre is the average of the points assigned to it. K-Means is used for market segmentation, computer vision,…
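A minimal sketch of the classic (Lloyd's) K-Means loop on 2-D points: assign each point to its nearest centre, move each centre to the mean of its points, and repeat until nothing moves. SageMaker's implementation is built for large datasets and differs in detail; the function name and data here are hypothetical.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Lloyd's K-Means: assign points to the nearest centre, move centres to the mean."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)                  # start from k distinct data points
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                             # assignment step
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        new_centres = [                              # update step: mean of each cluster
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centres == centres:                   # converged: no centre moved
            break
        centres = new_centres
    return centres

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]   # hypothetical data
print(sorted(k_means(points, k=2)))
```

The number of clusters k must be chosen up front; in practice several values are tried and compared.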

SageMaker supervised algorithms
There are five SageMaker supervised algorithms for tabular data. DeepAR Forecasting uses Deep Learning for time-series forecasting, such as financial forecasting. Linear Learner is good for regression problems. Factorization Machines can be used for the same purpose, but handle data with gaps and holes better. K-Nearest Neighbor is good at categorising data. XGBoost can predict if an item…

Training Machine Learning models
Before a Machine Learning Model can be deployed to the production environment it has to be trained. Training Machine Learning Models allows the algorithm to learn from the training data how to make a generalized prediction. This is an iterative process where the training data is processed multiple times as the algorithm learns from previous…

Factorization Machines Algorithm
The Factorization Machines Algorithm has two modes: Classification and Regression. Classification mode is binary: it returns a score and a predicted label of either one or zero. Regression mode returns the predicted value. Factorization Machines are a good choice for high-dimensional, sparse datasets. Common uses are web page click prediction and item…
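The Factorization Machine model scores an input by adding a bias, a linear term and a pairwise interaction term. Each feature i has a small latent vector V[i], and the interaction weight between features i and j is the dot product of their vectors. This is what lets the model learn from sparse data where most feature pairs never appear together. The sketch below computes that score using the standard O(k·n) rewrite of the pairwise sum. The weights and the one-hot input (for example one user feature and one item feature in click prediction) are hypothetical; for binary classification the raw score is typically passed through a sigmoid to give a probability.

```python
def fm_score(x, w0, w, V):
    """Factorization Machine output for one feature vector x.

    score = w0 + sum_i w[i]*x[i] + sum_{i<j} dot(V[i], V[j]) * x[i]*x[j]
    The pairwise sum is computed in O(k*n) time using
    0.5 * sum_f [(sum_i V[i][f]*x[i])**2 - sum_i (V[i][f]*x[i])**2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):                  # loop over the k latent factors
        terms = [V[i][f] * x[i] for i in range(len(x))]
        pairwise += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return linear + pairwise

# Hypothetical model: 5 features, k = 2 latent factors
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3, 0.0]
V = [[0.1, 0.2], [0.0, 0.3], [0.4, -0.1], [0.2, 0.1], [-0.3, 0.5]]
x = [1, 0, 0, 1, 0]                             # sparse one-hot input
print(round(fm_score(x, w0, w, V), 4))  # 0.64
```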

How to evaluate Machine Learning models
Evaluating Machine Learning models is the last stage before deploying a model to production. We evaluate Machine Learning models to confirm that they are performing as expected and that they are good enough for the task they were created for. The evaluation stage is performed after model training is finished. Different techniques are used depending…

Model tuning
Hyperparameters can be thought of as the external controls that influence how the model operates, just as the flight controls determine how an aeroplane flies. These values are external to the model and are controlled by the user. They can influence how an algorithm is trained and the structure of the final model. The optimized settings…

DeepAR Forecasting Algorithm
The SageMaker DeepAR Forecasting Algorithm forecasts how the target time series will evolve based on past performance. AR, which stands for AutoRegression, is a statistical method that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. The forecast is a one dimensional time…
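The autoregressive idea can be shown with the simplest case, an AR(1) model y[t] = c + phi * y[t-1], fitted by least squares and then rolled forward so each forecast becomes the input for the next step. DeepAR itself is a neural-network model trained across many related series; this sketch only illustrates the AR part of the name. The function names and series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x = series[:-1]                      # previous values
    y = series[1:]                       # next values
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = my - phi * mx
    return c, phi

def forecast(series, c, phi, steps):
    """Roll the model forward: each prediction feeds the next step."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds

# Hypothetical series generated exactly by y[t] = 2 + 0.5 * y[t-1]
series = [10.0]
for _ in range(9):
    series.append(2 + 0.5 * series[-1])
c, phi = fit_ar1(series)
print(round(c, 3), round(phi, 3))                                # 2.0 0.5
print([round(v, 3) for v in forecast(series, c, phi, 3)])        # [4.006, 4.003, 4.001]
```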

Image Classification Algorithm
The SageMaker Image Classification algorithm can apply multiple labels to an image depending on what objects are identified. For each image, the model returns a probability score for every class.
Attributes:
- Data types and format: Image
- Learning paradigm or domain: Image Processing, Supervised
- Problem type: Image and multi-label classification
- Use case examples: Label/tag…

BlazingText Algorithm
BlazingText is the name AWS gives to its SageMaker built-in algorithm that can identify relationships between words in text documents. These relationships are expressed as vectors, also called embeddings. The vectors preserve the semantic relationships between words by clustering words with similar meanings together. This conversion of words to meaningful numeric vectors…
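Once words are vectors, "similar meaning" becomes a geometric question, commonly answered with cosine similarity (the cosine of the angle between two vectors). The 3-dimensional embeddings below are hypothetical; real embeddings have far more dimensions. The helper names are also hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

embeddings = {                     # hypothetical word vectors
    "king":  [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.05, 0.10, 0.90],
}

def most_similar(word):
    """Return the other word whose vector points in the closest direction."""
    return max((w for w in embeddings if w != word),
               key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))

print(most_similar("king"))  # queen
```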

K-Nearest Neighbors Algorithm
The K-Nearest Neighbors Algorithm is used to place data into a category, for example in recommendation applications that recommend products on Amazon, articles on Medium, movies on Netflix or videos on YouTube. It returns results based on the training data points nearest to the sample data point, also called its nearest neighbors. The K-Nearest Neighbors algorithm…
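A minimal sketch of K-Nearest Neighbors classification: measure the distance from the sample to every training point, take the k closest and let them vote on the label. The training points, labels and function name are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label a query point by majority vote of its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),      # hypothetical labeled data
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_classify(train, (2, 2)))   # A
print(knn_classify(train, (5, 6)))   # B
```

There is no real training step: the model is the stored data, so the cost moves to inference time, when distances to the training points must be computed.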

Random Cut Forest Algorithm
The Random Cut Forest Algorithm (RCF) is an unsupervised algorithm which is used to identify anomalies in data. An anomaly is a data point that differs significantly from the bulk of the data. The Random Cut Forest Algorithm provides a score for each data point. A low score indicates the datapoint is similar to the…
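The forest itself is too involved for a short example, but the step that follows scoring can be sketched: deciding which scores count as anomalies. A common heuristic, treated here as an assumption, is to flag scores more than three standard deviations above the mean score. The function name and scores are hypothetical.

```python
import statistics

def flag_anomalies(scores, n_std=3.0):
    """Return indices of scores more than n_std standard deviations above the mean."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    cutoff = mean + n_std * std
    return [i for i, s in enumerate(scores) if s > cutoff], cutoff

# Hypothetical anomaly scores, one per data point
scores = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1.1, 0.9, 1.0, 1.2, 0.8, 4.5]
flagged, cutoff = flag_anomalies(scores)
print(flagged, round(cutoff, 2))
```

Because the anomalies themselves inflate the mean and standard deviation, a cutoff computed from scores on known-normal data may be more robust.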

Neural Topic Model Algorithm
The Neural Topic Model Algorithm (NTM) is used to identify topics in a corpus of documents. NTM uses statistics to group words. The groups are termed Latent Representations because they are identified via word distributions in the documents. The Latent Representations reveal the semantics of the documents and so outperform analysis using the word form…

Whizlabs review – AWS Certified Machine Learning Specialty
Need more practice with the exams? Check out the free Whizlabs test with 15 questions. They also have three practice tests (65 questions each) and five section tests (10-15 questions each). Money off promo codes are below. For the AWS Certified Machine Learning Specialty, Whizlabs provides practice tests, a video course and hands-on labs. These…

Problem Framing for Machine Learning
Problem Framing is a method used to understand, define and prioritize business problems. It is one of the most important phases in Machine Learning that will determine if all the work that is to be done subsequently is perceived to be of use and provides business value. Framing determines what will be observed and what…

Object2Vec Algorithm
Object2Vec Algorithm is an Unsupervised Learning algorithm. The algorithm compares pairs of data points and preserves the semantics of the relationship between the pairs. The algorithm creates embeddings that can be used by other algorithms downstream. The embeddings are low-dimensional dense embeddings of high-dimensional objects. Object2Vec can be used for product search, item matching and…

Sequence-to-Sequence Algorithm
SageMaker Sequence-to-Sequence algorithm is used for machine translation of languages. The algorithm takes the input sequence of tokens, for example French words, and outputs the translation as a sequence of English words. As well as translation, Sequence-to-Sequence can be used to summarize a document and convert speech to text. Sequence-to-Sequence is a Supervised Learning algorithm….

SageMaker unsupervised algorithms
There are five SageMaker unsupervised algorithms that process tabular data. Unsupervised Learning algorithms process data that has not been labeled. IP Insights is an anomaly detection algorithm that detects problems and threats in an IP network. K-Means is a clustering algorithm. Object2Vec translates input data to vectors. The Principal Component Analysis (PCA) algorithm is used in…

IP Insights Algorithm
SageMaker IP Insights Algorithm is used for detecting anomalies in network traffic. It is an unsupervised learning algorithm that is trained on historical data to learn the patterns of normal network usage. In production it can detect anomalies in network usage that may indicate changes in user behaviour, network performance or malicious activity. The IP…
Credits
Photo by Robina Weermeijer on Unsplash