
When people talk about Machine Learning they are mostly thinking about Modeling. Modeling is selecting and testing the algorithms that process data to extract valuable information.
This domain comprises 36% of the exam marks and has five subdomains:
- 3.1 Frame the business problem
  - Problem Framing for Machine Learning (5 questions)
  - Supervised Learning for Machine Learning (5 questions)
  - Unsupervised Learning for Machine Learning (5 questions)
- 3.2 Select the appropriate models
  - How to select a model for a given machine learning problem (10 questions)
- 3.3 Train the models
  - Training Machine Learning models (10 questions)
- 3.4 Tune the models
  - Model tuning (10 questions)
- 3.5 Evaluate the models
  - How to evaluate Machine Learning models (10 questions)
Problem Framing (subdomain 3.1) is a method used to understand, define and prioritize business problems. It determines whether all the work that follows is perceived to be useful and provides business value. Framing identifies what will be observed, what will be predicted, and which metrics need to be optimised to monitor performance. Framing then leads to the choice of Machine Learning approach: Supervised Learning or Unsupervised Learning.
For Supervised Learning you need labeled training data: data that has already been identified, and therefore labeled, as being what we are looking for. Unsupervised Learning is used to infer patterns in unlabeled datasets. The algorithms can detect hidden patterns and groupings in data without humans having to label it. Unsupervised learning is ideal for exploring raw and unknown data.
Many models (subdomain 3.2) are available through AWS Machine Learning services. Each model has its own use cases and requirements. Once the model has been chosen, an iterative process of training, tuning and evaluation is undertaken.
Model training (subdomain 3.3) is the process of providing a model with data to learn from. Before training, the data is split into three parts: most of it (70% to 80%) is used as training data, with the remainder used for validation and testing.
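The three-way split can be sketched in a few lines of plain Python. This is a minimal illustration, assuming an 80/10/10 ratio and a fixed random seed so the split is reproducible; `train_val_test_split` is a hypothetical helper, not a SageMaker API.

```python
import random


def train_val_test_split(records, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle records and split them into train, validation and test lists.

    Hypothetical helper: whatever is left after the train and validation
    parts becomes the test set.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 - train_frac:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Shuffling first avoids, for example, every record of one class landing in the test set when the data arrives sorted by class.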
Model tuning (subdomain 3.4) is also known as hyperparameter optimization. Hyperparameters are settings, such as the learning rate or the number of training epochs, that are chosen before training and do not change during it. They can be tuned manually, with search methods such as grid search and random search, or automatically using SageMaker automatic model tuning, which guides the search. Model tuning also includes additional feature engineering and experimenting with new algorithms.
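To show the difference between grid search and random search, here is a sketch in plain Python. It assumes a hypothetical `validation_score` function standing in for "train a model with these hyperparameters and score it on validation data"; in practice each call would be a full training job. The hyperparameter names and ranges are illustrative assumptions.

```python
import itertools
import random


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for 'train with these hyperparameters, then score
    the model on validation data'. This toy surface peaks at (0.1, 6)."""
    return -100 * (learning_rate - 0.1) ** 2 - (max_depth - 6) ** 2


def grid_search(grid):
    """Try every combination in the grid; return the best (score, params)."""
    best = None
    for lr, depth in itertools.product(grid["learning_rate"], grid["max_depth"]):
        score = validation_score(lr, depth)
        if best is None or score > best[0]:
            best = (score, {"learning_rate": lr, "max_depth": depth})
    return best


def random_search(n_trials, seed=0):
    """Sample n_trials random combinations; return the best (score, params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-3, 0)  # sample the learning rate on a log scale
        depth = rng.randint(2, 10)     # integer depth between 2 and 10 inclusive
        score = validation_score(lr, depth)
        if best is None or score > best[0]:
            best = (score, {"learning_rate": lr, "max_depth": depth})
    return best


grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [3, 6, 9]}
print(grid_search(grid)[1])    # {'learning_rate': 0.1, 'max_depth': 6}
print(random_search(20)[1])
```

Grid search cost grows multiplicatively with every hyperparameter added, while random search lets you fix the number of trials up front.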
Model evaluation (subdomain 3.5) is used to find out how well a model will predict the desired outcome. This is done using metrics that measure the performance of the model, such as accuracy and precision, by comparing the model's predictions with the known labels of validation or test data that was held back from training.
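Accuracy and precision can be computed from the counts of true and false positives and negatives. A minimal sketch for a binary problem, assuming labels 0 and 1 with 1 as the positive class; recall is included as a common companion metric, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    # Of the items predicted positive, how many really were positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Of the items that really were positive, how many the model found
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Known labels from held-out test data vs. the model's predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```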
Your model is now ready to be used with real data. But before it can be let loose on your corporate data, it has to be deployed into the production environment.
- For a description of the exam structure see this article: AWS Machine Learning exam syllabus
- The AWS exam guide pdf can be downloaded from: https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Exam-Guide.pdf
AWS Certified Machine Learning Study Guide: Specialty (MLS-C01) Exam
This study guide provides the domain-by-domain specific knowledge you need to build, train, tune, and deploy machine learning models with the AWS Cloud. The online resources that accompany this Study Guide include practice exams and assessments, electronic flashcards, and supplementary online resources. It is available in both paperback and Kindle editions, with the Kindle edition giving immediate access.
Study guides for Modeling:

SageMaker image processing algorithms
There are three built-in SageMaker image processing algorithms. They are all Supervised Learning algorithms and so have to be trained using labelled data. Each one analyzes images in a different way and returns different inference data for downstream processing. SageMaker's three built-in image processing algorithms each have their own way of visualizing real-world objects…

Latent Dirichlet Allocation Algorithm
SageMaker Latent Dirichlet Allocation algorithm (LDA) is an Unsupervised Learning algorithm that groups the words in documents into topics. Each topic is a probability distribution over words, and each document is treated as a mixture of topics. LDA can be used to discover topics shared by documents within a text corpus. The number of topics is specified by…

Image Classification Algorithm
The SageMaker Image Classification algorithm can apply multiple labels to an image depending on what objects are identified. Objects are not located with bounding boxes; inference returns a probability score for each class.
Attributes:
- Data types and format: Image
- Learning paradigm or domain: Image Processing, Supervised
- Problem type: Image and multi-label classification
- Use case examples: Label/tag…

IP Insights Algorithm
SageMaker IP Insights Algorithm is used for detecting anomalies in network traffic. It is an unsupervised learning algorithm that is trained on historical data to learn the patterns of normal network usage. In production it can detect anomalies in network usage that may indicate changes in user behaviour, network performance or malicious activity. The IP…

Amazon Study Guide review – AWS Certified Machine Learning Specialty
This is a review of the official Amazon study guide that accompanies the exam. The study guide provides the domain-by-domain specific knowledge you need to build, train, tune, and deploy machine learning models with the AWS Cloud. The online resources that accompany this Study Guide include practice exams and assessments, electronic…

Object Detection Algorithm
The SageMaker Object Detection algorithm identifies and classifies objects in images. The identified object is placed in a class with a numerical measure of confidence. The location in the image is identified by a bounding box around the object. Object Detection is a Supervised Learning algorithm trained on a corpus of labeled images. Because the…

SageMaker text processing algorithms
There are four SageMaker text processing algorithms: BlazingText, LDA, NTM and Sequence-to-sequence. BlazingText converts text to numeric vectors. LDA and NTM identify topics in text documents and Sequence-to-sequence provides machine translation of languages. Each algorithm has its own section and embedded video. These revision notes are part of subdomain 3.2 Select the appropriate model(s) for…

Problem Framing for Machine Learning
Problem Framing is a method used to understand, define and prioritize business problems. It is one of the most important phases in Machine Learning that will determine if all the work that is to be done subsequently is perceived to be of use and provides business value. Framing determines what will be observed and what…

SageMaker unsupervised algorithms
There are five SageMaker unsupervised algorithms that process tabular data. Unsupervised Learning algorithms process data that has not been labeled. IP Insights is an anomaly detection algorithm to detect problems and threats in an IP network. K-Means is a clustering algorithm. Object2Vec translates input data to vectors. Principal Component Analysis (PCA) algorithm is used in…

Training Machine Learning models
Before a Machine Learning Model can be deployed to the production environment it has to be trained. Training Machine Learning Models allows the algorithm to learn from the training data how to make a generalized prediction. This is an iterative process where the training data is processed multiple times as the algorithm learns from previous…

Unsupervised Learning for Machine Learning
What is Unsupervised Learning? Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Unsupervised Learning is used to infer patterns in unlabeled datasets. The algorithms can detect hidden patterns and data groupings in data without help from humans through labeling. Unsupervised learning is ideal for exploring…

Whizlabs review – AWS Certified Machine Learning Specialty
Need more practice with the exams? Check out Whizlabs' free test with 15 questions. They also have three practice tests (65 questions each) and five section tests (10-15 questions each). For the AWS Certified Machine Learning Specialty, Whizlabs provides practice tests, a video course and hands-on labs. These…

XGBoost Algorithm
XGBoost stands for eXtreme Gradient Boosting. XGBoost uses boosting, a form of ensemble learning in which the results of multiple models are combined to produce a better fit to the training data. Each new decision tree is added to correct the prediction errors of the previous trees, improving the fit to the training data. XGBoost…
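The idea of adding trees that correct the errors of earlier trees can be sketched as plain gradient boosting for squared error, where the errors to correct are simply the residuals and each new tree is a depth-1 "stump". This is not XGBoost itself, which adds regularisation, second-order gradient information and many engineering optimisations; `fit_stump` and `boost` are hypothetical names and the toy data is made up for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one threshold on x, a constant each side.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared error: each new stump is fitted to the
    residuals (prediction errors) of the ensemble built so far."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    trees = []

    def predict(x):
        return base + sum(learning_rate * tree(x) for tree in trees)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"training MSE: {mse:.4f}")
```

The small `learning_rate` shrinks each tree's contribution, so many modest corrections are combined rather than trusting any single tree.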

K-Nearest Neighbors Algorithm
The K-Nearest Neighbors Algorithm is used to place data into a category, for example in recommendation applications that recommend products on Amazon, articles on Medium, movies on Netflix, or videos on YouTube. It returns results based on the training data points closest to the sample data point, called its nearest neighbors. The K-Nearest Neighbors algorithm…
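A minimal k-nearest-neighbours classifier in plain Python shows the idea: measure the distance from the sample to every labelled training point and take a majority vote among the k closest. Euclidean distance, k=3 and the toy points are assumptions for illustration; this is not the SageMaker implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote of its k nearest training points."""
    # Euclidean distance from the query to every labelled training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Most common label among the k neighbours (ties go to the first seen)
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2, 2)))  # A
print(knn_predict(points, labels, (7, 8)))  # B
```

There is no real training step: the labelled points themselves are the model, which is why prediction cost grows with the size of the training set.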
K-Means Algorithm
The K-Means Algorithm is an Unsupervised Learning algorithm used to find clusters. The clusters are formed by grouping data points that are as similar as possible to each other and as different as possible from the data points in other clusters. Each data point is assigned to its nearest cluster centre, and each centre is then recalculated as the average of the points assigned to it. K-Means is used for market segmentation, computer vision,…
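The assign-then-average loop can be sketched as Lloyd's algorithm, the standard K-Means procedure. This plain-Python version is an illustration under stated assumptions (Euclidean distance, initial centres drawn from the data with a fixed seed, toy points); it is not how SageMaker implements K-Means at scale. Note that no labels are supplied.

```python
import math
import random


def k_means(points, k, n_iters=100, seed=0):
    """Cluster points into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the average of its points
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids (and so assignments) no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # no labels
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))  # roughly (1.33, 1.33) and (8.33, 8.33)
```

The result can depend on the starting centres, which is why real implementations typically use careful initialisation or several restarts.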

How to evaluate Machine Learning models
Evaluating Machine Learning models is the last stage before deploying a model to production. We evaluate Machine Learning models to confirm that they are performing as expected and that they are good enough for the task they were created for. The evaluation stage is performed after model training is finished. Different techniques are used depending…

Factorization Machines Algorithm
The Factorization Machines Algorithm has two modes: Classification and Regression. Classification mode is binary: it returns a score and a predicted label, which is either 0 or 1, where the score indicates how strongly the model believes the label should be 1. The Regression mode returns the predicted value. Factorization Machines are a good choice for high dimensional, sparse datasets. Common uses are web page click prediction and item…
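Factorization Machines cope well with sparse data because every pairwise feature interaction is modelled as the dot product of two learned latent vectors, so interactions can be estimated even for feature pairs that rarely appear together. The sketch below evaluates the standard degree-2 factorization machine equation for one input, both with the efficient O(n·k) identity and naively over all pairs. The parameter values are hypothetical (not trained), and the sigmoid that turns the raw output into a score between 0 and 1 is an assumption for illustration, not a statement about SageMaker's internals.

```python
import math


def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine output for one feature vector x.

    y = w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i] * x[j]
    computed with the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i][f]*x_i)^2 - sum_i (V[i][f]*x_i)^2 ]
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model computed directly over all pairs i < j."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            y += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return y


# Hypothetical parameters for 4 features and k=2 latent factors
w0, w = 0.1, [0.2, -0.1, 0.0, 0.3]
V = [[0.5, 0.1], [0.4, -0.2], [0.0, 0.3], [0.2, 0.2]]
x = [1.0, 0.0, 0.0, 1.0]  # sparse input, e.g. one-hot user and item
y = fm_predict(x, w0, w, V)
score = 1 / (1 + math.exp(-y))  # assumed sigmoid for a 0-to-1 score
print(round(y, 4), round(score, 4))
```

With a sparse input only the non-zero features contribute, so a pair such as (feature 0, feature 3) gets an interaction weight of `<V[0], V[3]>` without ever needing its own free parameter.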

35 Q & A for SageMaker built-in algorithms
The AWS Certified Machine Learning Specialty exam (MLS-C01) tests your ability to select the correct answer for real-life scenarios. 36% of the questions in the MLS-C01 exam come from Domain 3. These SageMaker built-in algorithms are part of Sub-domain 3.2, Select the appropriate models for a given Machine Learning problem. Sub-domain 3.2…

Linear Learner Algorithm
Linear Learner Algorithm is a Supervised Learning algorithm that can be used to solve three types of problems: Binary classification; Multi-class classification; and Regression. The algorithm is trained on examples, each consisting of a high dimensional vector x and a label y, to learn the equation of the line. The Linear Learner Algorithm uses Stochastic…
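Fitting a linear model by stochastic gradient descent can be sketched in plain Python: after each training example the weights are nudged against the gradient of the squared error for that one example. This is a minimal regression-only illustration on made-up, noise-free data, assuming a fixed learning rate and epoch count; it is not SageMaker's Linear Learner implementation.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]  # prediction minus label
            # Gradient of 0.5 * error**2: d/dw = error * x, d/db = error
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = sgd_linear_regression(xs, ys)
print(round(w, 2), round(b, 2))  # 2.0 1.0
```

The learning rate must be small enough for the largest inputs (here x = 4) or the updates overshoot and diverge, which is one reason feature scaling matters for linear models.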

SageMaker supervised algorithms
There are five SageMaker supervised algorithms for tabular data. DeepAR Forecasting uses Deep Learning for financial forecasting. Linear Learner is good for regression problems. Factorization Machines can be used for the same purpose, but can handle data with gaps and holes better. K-Nearest Neighbor is good at categorising data. XGBoost can predict if an item…

CV Library
If you want to land your dream AWS job, you have to do more than just dream about it: you need a CV. Agents may call, email or text, and job ads pop up on every site you visit, but the first thing they will ask for is a copy of your CV. A CV…

Model tuning
Hyperparameters can be thought of as the external controls that influence how the model operates, just as the flight controls determine how an aeroplane flies. These values are external to the model and are set by the user. They can influence how an algorithm is trained and the structure of the final model. The optimized settings…

Neural Topic Model Algorithm
The Neural Topic Model Algorithm (NTM) is used to identify topics in a corpus of documents. NTM uses statistics to group words. The groups are termed Latent Representations because they are identified via word distributions in the documents. The Latent Representations reveal the semantics of the documents and so outperform analysis using the word form…

Pluralsight review – AWS Certified Machine Learning Specialty
Contains affiliate links. If you go to Pluralsight’s website and make a purchase I may receive a small payment. The purchase price to you will be unchanged. Thank you for your support. The AWS Certified Machine Learning Specialty learning path from Pluralsight has six high quality video courses taught by expert instructors. Two are introductory…

DeepAR Forecasting Algorithm
The SageMaker DeepAR Forecasting Algorithm forecasts how the target time series will evolve based on past performance. AR, which stands for AutoRegression, is a statistical method that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. The forecast is a one dimensional time…
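The AutoRegression idea described above can be shown with the simplest case, AR(1), where the next value is predicted from the previous one as x[t] = c + phi * x[t-1]. This sketch fits c and phi by ordinary least squares on a made-up series and then feeds each forecast back in to predict further ahead. It illustrates classic AR only; DeepAR itself trains a recurrent neural network across many related time series.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, c, phi, steps):
    """Roll the AR(1) equation forward, feeding each prediction back in."""
    history = list(series)
    for _ in range(steps):
        history.append(c + phi * history[-1])
    return history[len(series):]


series = [0.0]
for _ in range(20):
    series.append(2.0 + 0.8 * series[-1])  # generated by x[t] = 2 + 0.8 * x[t-1]
c, phi = fit_ar1(series)
print(round(c, 3), round(phi, 3))  # 2.0 0.8
print(forecast(series, c, phi, steps=3))
```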

Principal Component Analysis Algorithm
Sometimes data has so many features that further processing or inference is hampered. When this occurs, the Principal Component Analysis (PCA) algorithm, an Unsupervised Learning algorithm, is used to reduce the number of features whilst retaining as much information as possible. This is Feature Engineering. PCA has two modes: Regular and…
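Dimensionality reduction with PCA can be sketched for the first principal component, the direction along which the data varies most. This dependency-free version computes the sample covariance matrix, finds its top eigenvector by power iteration, and projects each row onto it, turning two features into one. The power-iteration approach, the toy data and the function names are assumptions for illustration, not SageMaker's implementation.

```python
import math


def first_principal_component(rows, n_iters=200):
    """Return (mean, unit direction) of the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    # Assumes the start vector is not orthogonal to the top eigenvector.
    v = [1.0] * d
    for _ in range(n_iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Reduce each row to one feature: its coordinate along the component."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(r))) for r in rows]


rows = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]  # roughly y = 2x
mean, direction = first_principal_component(rows)
print([round(c, 3) for c in direction])
print([round(s, 2) for s in project(rows, mean, direction)])
```

Because the two toy features are almost perfectly correlated, the single projected feature keeps nearly all of the variation in the original two.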

Object2Vec Algorithm
Object2Vec Algorithm is an Unsupervised Learning algorithm. The algorithm compares pairs of data points and preserves the semantics of the relationship between the pairs. The algorithm creates embeddings that can be used by other algorithms downstream. The embeddings are low-dimensional dense embeddings of high-dimensional objects. Object2Vec can be used for product search, item matching and…

Supervised Learning for Machine Learning
What is Supervised Learning? For Supervised Learning you need labeled training data. In Supervised Learning we provide data that has already been identified and therefore labeled, as being what we are looking for. Once the Machine Learning model has been trained it can then be presented with real unknown data to which the Machine Learning…
Semantic Segmentation Algorithm
The Semantic Segmentation algorithm processes images by tagging every single pixel in the image. This fine-grained approach enables information about the shapes and edges of objects to be gathered. A common use case is computer vision. The output is a Segmentation Mask, which is an RGB or grayscale PNG…

Random Cut Forest Algorithm
The Random Cut Forest Algorithm (RCF) is an unsupervised algorithm which is used to identify anomalies in data. An anomaly is a data point that differs significantly from the bulk of the data. The Random Cut Forest Algorithm provides a score for each data point. A low score indicates the datapoint is similar to the…
Credits
Photo by Robina Weermeijer on Unsplash