How to use this website
For most people, effective learning is active learning. Frequent, targeted testing can help to keep interest, measure achievements and identify weaknesses requiring further study. This site has 25+ tests with 278 questions and answers to help you pass the AWS Certified Machine Learning Specialty exam.
- Free Practice exam
- 25+ Tests
- 278 Questions and Answers
- 27 Study Guides
- 40+ curated videos
Suggested exam preparation strategy
- Go through each knowledge domain in order.
- In each article, have a go at the questions in the test app.
- Read and understand the material in the article.
- Answer the questions in the test app at the end of the article. Keep repeating the test until you get them all correct.
The test app
The test app gives you access to over 25 tests containing 278 questions. Each app is one test with 5 questions drawn from a test bank. You answer all the questions and your results are displayed at the end. Repeat the test to answer different questions randomly chosen from the test bank. Try to answer these questions about Data Repositories, which is part of the Data Engineering knowledge domain. Don’t worry if you do not get many correct.

Now read through this article: Machine Learning data repositories compared, and answer the questions at the bottom of the page. Keep repeating the test at the end of the article, referring to the subject matter in the article, until you get 100%. Remember, the questions are not just there to test you; they are also teaching you.
When you have completed all the Study Guides you are ready to take the Free Practice Exam which has AWS exam style questions.
About the exam

The AWS Certified Machine Learning Specialty exam is a Specialty level certification exam from AWS. The exam is taken by experienced developers, engineers and data scientists who either wish to learn how AWS does Machine Learning or wish to validate their existing knowledge. The exam is a multiple choice test taken at approved AWS test centres. The MLS-C01 exam is divided into four knowledge domains and sixteen sub-domains. This website has study guides organised in the same domain and sub-domain structure, providing a learning path for the AWS ML exam. Each sub-domain has its own study guide with questions and answers so you can test your knowledge and validate your progress.
Machine Learning exam details
- Exam title: AWS Certified Machine Learning – Specialty
- Exam code: MLS-C01
- Exam cost: $300 USD
- Number of questions in exam: 65 multiple choice
- Score range: 100 – 1000 (scaled score)
- Exam pass mark: 750 (pass or fail, no grades)
- Time allowed in exam: 180 minutes
- How to book the exam: Register with AWS training and book on-line. https://www.aws.training/Dashboard
Exam content structure
The exam content is divided into four knowledge domains, each contributing a different proportion of the exam marks. The exam content is explained in more detail in the Exam Guide from AWS.
- Data Engineering – 20%
- Exploratory Data Analysis – 24%
- Modeling – 36%
- Machine Learning Implementation and Operations – 20%

Resources
Test index
Frequent, targeted testing can help to keep interest, measure achievements and identify weaknesses requiring further study. Over 25 tests are embedded in the Study Guides. The Test Index has links to take you to each test.
Curated videos
Infographics
Infographics bring together information and display it in an easy to remember way. The Infographics Index lists links to all the infographics so you can add them to your Pinterest account.
Other resources
This article describes free questions and answers available on the internet: Free questions for the AWS Machine Learning exam
New Questions and Answers
Study articles

Batch processing for Machine Learning
For Machine Learning, AWS Glue and AWS Database Migration Service are used to ingest data. Batch processing refers to processing that is usually performed on a specific schedule. Before a batch process starts, data is waiting, and any new data often has to wait for the next batch run to be processed. In AWS any compute…

Amazon Forecast
Amazon Forecast uses historical time series data, combined with user-provided parameter data, to generate predictions. The service requires time series data as an input, which can be augmented with local weather data. The desired quantile or mean forecast can be selected, and the forecast is output as a CSV file. The output is…

BlazingText Algorithm
BlazingText is the name AWS gives its SageMaker built-in algorithm that can identify relationships between words in text documents. These relationships are expressed as vectors, which are also called embeddings. The vectors preserve the semantic relationships between words by clustering words with similar semantics together. This conversion of words to meaningful numeric vectors…
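To make "words with similar semantics cluster together" concrete, here is a minimal sketch of how closeness between embeddings is commonly measured, using cosine similarity. The three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are hypothetical illustrations; real BlazingText vectors are learned from a corpus and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(word, embeddings):
    """Return the other word whose vector points in the closest direction."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(target, embeddings[w]))


# Hypothetical toy embeddings, hand-written for illustration only.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

most_similar("king", embeddings)  # returns "queen"
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing word embeddings.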

Random Cut Forest Algorithm
The Random Cut Forest Algorithm (RCF) is an unsupervised algorithm used to identify anomalies in data. An anomaly is a data point that differs significantly from the bulk of the data. The Random Cut Forest Algorithm provides a score for each data point. A low score indicates the data point is similar to the…

Kinesis KPL vs API
The Kinesis Producer Library (KPL) and the Kinesis API can both be used to send data to Kinesis Data Streams. The advantage of the KPL is that it provides many added features, such as built-in handling of failed transmissions. If you use the Kinesis API you have to code these features yourself. The advantages…
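As an illustration of what "code these features yourself" can mean, below is a minimal sketch of hand-written retry handling around the Kinesis `PutRecords` API. It assumes a boto3 Kinesis client, whose `put_records` response includes `FailedRecordCount` and a per-record result list in which failed entries carry an `ErrorCode`. The function name `put_with_retries` and the backoff policy are hypothetical choices, not part of any AWS library.

```python
import time


def put_with_retries(client, stream_name, records, max_attempts=5, base_delay=0.1):
    """Send records with PutRecords, resending only the entries that failed.

    Assumes `client` behaves like boto3.client("kinesis") and that each
    record is a dict with "Data" (bytes) and "PartitionKey" (str) keys.
    Returns the records that still failed after `max_attempts` calls.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        if not pending:
            return []
        if attempt > 0:
            # Hypothetical exponential backoff before resending.
            time.sleep(base_delay * 2 ** (attempt - 1))
        response = client.put_records(StreamName=stream_name, Records=pending)
        if response["FailedRecordCount"] == 0:
            return []
        # Results come back in request order; a failed entry has an
        # "ErrorCode" key, a successful one has a "SequenceNumber".
        pending = [
            record
            for record, result in zip(pending, response["Records"])
            if "ErrorCode" in result
        ]
    return pending
```

A call might look like `put_with_retries(boto3.client("kinesis"), "example-stream", [{"Data": b"hello", "PartitionKey": "user-1"}])`, where the stream name is hypothetical. `PutRecords` can partially succeed, so only the failed subset is resent; with the KPL this bookkeeping is done for you.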

Amazon Personalize
Amazon Personalize draws on features that Amazon incorporates into its own retail website. These include personalization experiences such as specific product recommendations, personalized product re-ranking and customized direct marketing. The Amazon Personalize AI service provides personalisation with AWS doing all the heavy lifting of providing the Machine Learning infrastructure to train and deploy the model….

Amazon Transcribe
Amazon Transcribe converts speech to text using Automatic Speech Recognition (ASR) technology, the same underlying technology used by Amazon Alexa. Transcribe can work with multiple languages and speakers and can incorporate custom vocabulary provided by the user. Transcribe can be configured to remove sensitive text, such as PII or profanity. Video:…

Identify data sources
Obtaining large specialised datasets is essential for experimenting with and training Machine Learning models so they can recognise patterns in real-world data and infer a prediction. Datasets can also be used as a source of labeled data to train models that generalise to unlabeled real-world data. Fortunately there are many data sources for datasets…

Semantic Segmentation Algorithm
The Semantic Segmentation algorithm processes images by tagging every single pixel in the image. This fine-grained approach enables information about the shapes and edges of objects to be gathered. A common use case is computer vision. The output of training is a Segmentation Mask, which is an RGB or grayscale PNG…

Principal Component Analysis Algorithm
Sometimes data can have a large number of features, so many that further processing or inference is hampered. When this occurs the Principal Component Analysis Algorithm (PCA), an Unsupervised Learning algorithm, is used to reduce the number of features whilst retaining as much information as possible. This is Feature Engineering. PCA has two modes: Regular and…
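The idea of "fewer features, same information" can be sketched from scratch. The example below is a hedged illustration, not how SageMaker PCA is implemented: it centres the data, builds the sample covariance matrix, finds the direction of greatest variance (the first principal component) by power iteration, and projects each row onto it. The function names are hypothetical, and the sketch assumes the data is not constant and that the all-ones starting vector is not orthogonal to the top component.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (column means, unit vector of the first principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiplying by the covariance matrix pulls
    # the vector towards the eigenvector with the largest eigenvalue, which
    # is the direction of greatest variance.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, direction):
    """Reduce each row to one feature: its coordinate along `direction`."""
    return [
        sum((row[j] - means[j]) * direction[j] for j in range(len(direction)))
        for row in rows
    ]
```

For example, with `rows = [[0, 0], [1, 2], [2, 4], [3, 6]]` the second feature is always twice the first, so projecting onto the first component replaces two features with one while keeping all of the variance.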

Latent Dirichlet Allocation Algorithm
SageMaker Latent Dirichlet Allocation algorithm (LDA) is an Unsupervised Learning algorithm that groups words in a document into topics. Each topic is described by a probability distribution over words, and each document is treated as a mixture of topics. LDA can be used to discover topics shared by documents within a text corpus. The number of topics is specified by…

Factorization Machines Algorithm
The Factorization Machines Algorithm has two modes: Classification and Regression. Classification mode performs binary classification, returning a score and a predicted label of either one or zero. Regression mode returns the predicted value. Factorization Machines are a good choice for high dimensional, sparse datasets. Common uses are web page click prediction and item…
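A sketch of the standard Factorization Machine model equation may help show why it suits sparse data: every feature gets a bias weight plus a small latent vector, and pairwise interactions are modelled as dot products of those vectors, so interactions can be estimated even for feature pairs rarely seen together. The weights below would normally be learned during training; here `fm_score`, `fm_classify` and any values you pass in are hypothetical, and the sigmoid-plus-threshold step is an assumed illustration of how a score becomes a 0/1 label.

```python
import math


def fm_score(x, w0, w, V):
    """Factorization Machine output for one feature vector x.

    w0 is a global bias, w holds one weight per feature and V holds one
    k-dimensional latent vector per feature. The pairwise term
    sum_{i<j} <V[i], V[j]> * x[i] * x[j] is computed with the identity
    0.5 * sum_f [(sum_i V[i][f] * x[i])**2 - sum_i (V[i][f] * x[i])**2],
    which costs O(k * n) instead of O(k * n**2).
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_classify(x, w0, w, V, threshold=0.5):
    """Binary classification: squash the score into (0, 1), then threshold it."""
    probability = 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))
    return probability, int(probability >= threshold)
```

In Regression mode the raw `fm_score` is the prediction; in Classification mode the score is mapped to a probability and a one-or-zero label.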

Machine Learning data repositories compared
These revision notes describe the AWS services available for storing data in data repositories for use in Machine Learning.

SageMaker unsupervised algorithms
There are five SageMaker unsupervised algorithms that process tabular data. Unsupervised Learning algorithms process data that has not been labeled. IP Insights is an anomaly detection algorithm to detect problems and threats in an IP network. K-Means is a clustering algorithm. Object2Vec translates input data to vectors. Principal Component Analysis (PCA) algorithm is used in…

Supervised Learning for Machine Learning
What is Supervised Learning? For Supervised Learning you need labeled training data. In Supervised Learning we provide data that has already been identified, and therefore labeled, as being what we are looking for. Once the Machine Learning model has been trained it can then be presented with real unknown data to which the Machine Learning…

Ingesting data for Machine Learning
These revision notes describe the AWS services used to ingest streaming data for Machine Learning.

Streaming data for Machine Learning
Streaming data processing is used when data is continuously being generated and needs to be processed as it arrives. The AWS service for streaming data processing is Kinesis. Kinesis comprises four services, each with different capabilities, and some can be used together. As well as Kinesis there is another AWS service that can…

IP Insights Algorithm
SageMaker IP Insights Algorithm is used for detecting anomalies in network traffic. It is an unsupervised learning algorithm that is trained on historical data to learn the patterns of normal network usage. In production it can detect anomalies in network usage that may indicate changes in user behaviour, network performance or malicious activity. The IP…

K-Means Algorithm
The K-Means Algorithm is an Unsupervised Learning algorithm used to find clusters. The clusters are formed by grouping data points that are as similar as possible to each other and different from the data points in other clusters. Each data point is assigned to its nearest cluster centre, and each centre is then recalculated as the average of the points assigned to it. K-Means is used for market segmentation, computer vision,…
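The assign-then-average loop behind K-Means can be sketched in a few lines. This is a from-scratch illustration of the classic Lloyd's algorithm, not SageMaker's own implementation, which is optimised for large-scale training. The function names and the hand-picked starting centres are hypothetical; in practice the initial centres are chosen automatically.

```python
import math


def nearest_centre(point, centres):
    """Index of the centre closest to `point` (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


def kmeans(points, initial_centres, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    centres = [list(c) for c in initial_centres]
    for _ in range(max_iterations):
        # Assignment step: every point joins its nearest centre.
        labels = [nearest_centre(p, centres) for p in points]
        # Update step: move each centre to the mean of its assigned points.
        new_centres = []
        for i, centre in enumerate(centres):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centres.append([sum(axis) / len(members) for axis in zip(*members)])
            else:
                new_centres.append(centre)  # leave an empty cluster's centre in place
        if new_centres == centres:
            break  # no centre moved, so the clustering has converged
        centres = new_centres
    return centres, [nearest_centre(p, centres) for p in points]
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])` settles on centres `[0.0, 0.5]` and `[10.0, 10.5]`, with the first two points in cluster 0 and the last two in cluster 1.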

How to select a model for a given machine learning problem
To select a model for a given Machine Learning problem we use the information and conclusions from Framing the Problem. A Machine Learning problem can be described with four aspects: The first aspect concerns the format and structure of the data, which could be numeric, images or text. Numeric data is often tabular. The second…

Data cleansing and preparation for modeling
Understanding data, cleansing data and dataset generation are important first steps in exploratory data analysis. Every other phase in the Machine Learning process relies on the data being cleaned and prepared. This Study Guide starts with statistical techniques used to help understand the data. Once data is understood it has to be cleaned up so…

35 Q & A for SageMaker built-in algorithms
The AWS Certified Machine Learning – Specialty certification exam (MLS-C01) tests your ability to select the correct answer for real-life scenarios. 36% of the questions in the MLS-C01 exam will be from Domain 3. These SageMaker built-in algorithms are part of Sub-domain 3.2, Select the appropriate models for a given Machine Learning problem. Sub-domain 3.2…

Amazon Textract
Amazon Textract is used to convert scanned documents to text. This includes text in tables and handwritten forms. When text is extracted it is returned with coordinates that identify a box-shaped area on the document. This allows for auditing later, since the text can be traced back to a specific area in…

The Machine Learning Production Environment
When you launch a Machine Learning solution in production it needs to perform well to provide the business benefit it was designed for. There are two types of performance: This Study Guide focuses on the production environment. The production environment can be assessed using five measures: There are three curated videos in this Study Guide:…
Credits
- Lighthouse photo by Rodrigo Soares on Unsplash, Brain stars by GDJ – Gordon Johnson from Pixabay
- Infographic
- tools by Luis Prado from the Noun Project
- binocular by Eucalyp from the Noun Project
- Factory by iconsphere from the Noun Project
- AWS icons by Amazon Web Services