MLS-C01 AWS Machine Learning practice exam

This page contains an AWS Machine Learning practice exam. Like the real MLS-C01 exam, it has 65 multiple choice questions and must be completed in 180 minutes. For each question you should pick the best answer; a few questions allow more than one answer to be chosen. Once the exam is completed, the answers are displayed as a PDF report which you can download and keep.

Question distribution

The exam specification, or syllabus, describes how the exam is split between four knowledge domains, which are further divided into sub-domains. For more information about the exam content see this article:

Each knowledge domain has a percentage weight and, since we know the whole exam contains 65 questions, we can estimate how many questions come from each domain. The exam specification does not tell us the weighting for each sub-domain, so I have assumed an equal distribution of the questions. The results are shown in the table below. Each of the domain descriptions links to its accompanying category page, which lists the study guides for that category.

Domain  Description                                     % of exam  Questions
1       Data Engineering                                20         13
2       Exploratory Data Analysis                       24         15
4       Machine Learning Implementation and Operations  20         13
AWS Machine Learning – Specialty exam content domains and question weightings
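
The estimate described above can be sketched in a few lines of Python. This is a minimal sketch rather than the author's own calculation: the names TOTAL_QUESTIONS and estimate_questions are made up for illustration, only the weightings visible in the table are used, and rounding down is an assumption that happens to reproduce the table's figures.

```python
import math

TOTAL_QUESTIONS = 65  # stated size of the MLS-C01 exam

# Weightings (% of exam) taken from the rows of the table above.
domain_weights = {
    "Data Engineering": 20,
    "Exploratory Data Analysis": 24,
    "Machine Learning Implementation and Operations": 20,
}


def estimate_questions(weight_percent, total=TOTAL_QUESTIONS):
    """Estimate a domain's question count, rounding down (an assumption)."""
    return math.floor(total * weight_percent / 100)


estimates = {name: estimate_questions(w) for name, w in domain_weights.items()}

for name, count in estimates.items():
    print(f"{name}: {count}")
```

A 24% weighting works out at 15.6 questions, so the choice between rounding down and rounding to nearest changes the estimate for that domain by one question.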

The domain questions can be distributed across the sub-domains equally as in the following table. Where sharing the questions between sub-domains leaves a remainder I have chosen the most important sub-domain to have the extra question. Each of the sub-domain descriptions links to their accompanying study guide.

Sub-domain  Questions  Description
Data Engineering
1.1         4          Create data repositories for machine learning
1.2         4          Identify and implement a data-ingestion solution
1.3         5          Identify and implement a data-transformation solution
Exploratory Data Analysis
2.1         5          Sanitize and prepare data for modelling
2.2         5          Perform feature engineering
2.3         5          Analyze and visualize data for machine learning
Modeling
3.1         4          Frame business problems as machine learning problems
3.2         6          Select the appropriate model(s) for a given machine learning problem
3.3         4          Train machine learning models
3.4         4          Perform hyperparameter optimization
3.5         4          Evaluate machine learning models
Machine Learning Implementation and Operations
4.1         3          Build machine learning solutions for performance, availability, scalability, resiliency and fault tolerance
4.2         4          Recommend and implement the appropriate machine learning services and features for a given problem
4.3         3          Apply basic AWS security practices to machine learning solutions
4.4         3          Deploy and operationalize machine learning solutions
AWS Machine Learning – Specialty exam content sub-domains and numbers of questions
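
The sharing rule described above (equal shares, with any remainder going to one chosen sub-domain) can be sketched as follows. The function name share_questions and the priority_index parameter are hypothetical; which sub-domain counts as the most important is the author's judgement and is simply passed in here.

```python
def share_questions(total, sub_domains, priority_index):
    """Split `total` questions equally; give any remainder to one sub-domain."""
    base, remainder = divmod(total, len(sub_domains))
    counts = [base] * len(sub_domains)
    counts[priority_index] += remainder
    return dict(zip(sub_domains, counts))


data_engineering = share_questions(13, ["1.1", "1.2", "1.3"], priority_index=2)
exploratory = share_questions(15, ["2.1", "2.2", "2.3"], priority_index=0)
operations = share_questions(13, ["4.1", "4.2", "4.3", "4.4"], priority_index=1)

print(data_engineering)  # {'1.1': 4, '1.2': 4, '1.3': 5}
```

The same call with the 22 questions listed against the five 3.x sub-domains and priority_index=1 gives 4, 6, 4, 4, 4, which matches those rows of the table.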

Exam questions


This exam has 65 questions to be answered in 180 minutes. You will be warned when 90% of the time has elapsed. You can answer the questions in any order and can navigate backwards and forwards through them. At the end of the exam a PDF report will be displayed with the questions and answers for you to download and keep.
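
As a rough pacing aid, the timing figures above work out as follows. This is only arithmetic on the numbers stated in this section; the variable names are illustrative.

```python
EXAM_MINUTES = 180
QUESTION_COUNT = 65
WARNING_FRACTION = 0.90  # the warning appears when 90% of the time has elapsed

warning_minute = EXAM_MINUTES * WARNING_FRACTION
minutes_left_at_warning = EXAM_MINUTES - warning_minute
minutes_per_question = EXAM_MINUTES / QUESTION_COUNT

print(f"Warning after {warning_minute:.0f} minutes")
print(f"{minutes_left_at_warning:.0f} minutes remain at the warning")
print(f"About {minutes_per_question:.1f} minutes per question")
```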

Good luck!


MLS-C01 exam A

1. The SageMaker built-in algorithm Object Detection is used to identify fungal infections in fruit. The results are displayed in this Confusion Matrix:

           Predicted NO   Predicted YES
Actual NO       83              3
Actual YES       2             85

When the Actual YES & Predicted YES value is added to the Actual NO & Predicted NO value and divided by the total of all values we get:

(TP + TN) / Total = (85 + 83) / 173 = 168 / 173 = 0.97 or 97%

What is the name of this metric?
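
The arithmetic in this question can be reproduced with a short Python sketch. The variable names are illustrative, and the metric is deliberately left unnamed so that the question still stands.

```python
# Cell values from the confusion matrix above.
true_negative = 83   # Actual NO,  Predicted NO
false_positive = 3   # Actual NO,  Predicted YES
false_negative = 2   # Actual YES, Predicted NO
true_positive = 85   # Actual YES, Predicted YES

total = true_negative + false_positive + false_negative + true_positive
ratio = (true_positive + true_negative) / total

print(f"{true_positive + true_negative} / {total} = {ratio:.2f}")
```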

2. You work for an AI consultancy advising an on-line retail store about implementing a Personalization feature to increase sales value per customer. Your customer is very concerned about the possibility of introducing a model variant that may not perform as well as a previous version.

What method would you recommend to mitigate this risk?

3. You work for a Machine Learning Consultancy. Your client, an electric supply company, wishes to provide customers with more help when they provide meter readings of their energy consumption. You advise them that installing a chatbot on their website will improve the customer experience and provide targeted help for their immediate problems.

Which SageMaker AI services would you recommend?

4. You work for an online fashion clothing store. Machine Learning is used for personalization, showing customers items they are likely to buy. The models are re-trained multiple times per day as items, availability and customer purchasing habits change. The speed with which a new model is available for inferencing is critical to optimise sales. Training data storage in S3 has been identified as a bottleneck.

What data storage solution could be used to reduce data loading time and increase data throughput?

5. You work for a fruit growing association on a research project to detect disease and pests in fruit from image data using machine learning models. From your results you create this Confusion Matrix:

What is the precision of the model?

[Image: confusion matrix for this question]

6. You work for a women’s online fashion retailer. The marketing department has noticed that more men are buying from the site, probably as gifts for female friends and family. They want to perform customer segmentation on unlabeled historical data to see if gift givers form a distinct group.

What probabilistic approach would you recommend?

7. You work for a merchant bank that needs to store inferences from the SageMaker built-in algorithm as JSON objects that can be rapidly accessed. The access is at the item level within each JSON object.

Which data repository would be suitable for fast access of the JSON data?

8. You work for a SaaS company providing single sign-on to other third-party apps. This makes you a rich target for hackers, because a breach would give them access to many other websites and applications. One attack method that is often used is Credential Stuffing, where a hacker uses a large number of password guesses to gain access. This may come from the same compromised user’s account, a proxy service, or several compromised or falsified users. The rapid multiple login attempts may be detectable. You are asked to use a SageMaker built-in algorithm to search login attempt records to identify these attacks.

Which SageMaker built-in algorithm would you use?

9. A fruit importer needs to rapidly review changing price information for many fruit types and packaging types. They want a visualization method that will clearly summarise large amounts of data and display key price indicators such as maximum and minimum price and outliers that can skew results.

What visualization method would you recommend?

10. You work for an insurance company that needs to store training data for regulatory compliance and to answer infrequent customer complaints about suspected unfairness (bias) in insurance premiums. Research has shown that when a customer complains, a very quick response can lead to a positive outcome and the policy being purchased.

What is the lowest cost storage that allows immediate access to the data?

11. Your AI Consultancy has been contacted by a soft drinks manufacturer. They want to save on energy costs by freezing their fruit juice concentrates overnight using cheap off-peak electricity. However, the concentrate must return to the chilled liquid form in time for use in production. Each concentrate has different freeze / thaw properties and the production schedule is always changing. This task has been performed by factory staff, but it is time-consuming and it takes time for an individual staff member to get good at it.

Which unsupervised learning technique or feature would you recommend to maximise the cost savings?

12. An insurance company has set up a data lake to store historical data about its customers, policies and claims. Data is sent from the main insurance application in a variety of formats. This data needs to be processed into a common format ready for querying and processing for ML models.

What is the simplest way to achieve this?

13. You are working at a market research company processing questionnaire data harvested from websites and face-to-face interviews. Not all questions are answered, leaving some fields with missing data. One such field is annual income.

What is the best way to replace the missing income data with an estimated value?

14. You are responsible for a group of models that are used to decide if a customer can have credit, or has to pay in cash. The sales staff believe the models are biased towards older customers.

How can you reduce the bias in the models?

15. A real estate company needs to provide fast property price estimates based on recent sales, historical data, tax and economic information. Millions of predictions are processed quickly using Apache Spark on EMR. The recent data needs to be stored for repeated re-use for a couple of days. The data is also required longer term for historical analysis.

Which services can be used to achieve this?

16. You work for a home appliance maintenance company that is installing a new SageMaker Machine Learning system to send the engineer most likely to fix a problem first time. The CTO wants to know how much the ML model is used.

What is the simplest way to provide this information?

17. You work for a marketing company and have been asked to perform Brand Awareness analysis on some text-based feeds, for example Twitter.

How would you extract word embeddings for use in downstream Natural Language Processing (NLP)?

18. You are working in a bioscience company trying to predict the presence of cancer in patients from markers in their blood sample. The data is close to being regarded as sparse and contains outliers. You need to use a method to prevent overfitting that does not reduce the data density further.

Which technique should you use?

19. A financial services company wishes to create a database to supply historical data to ML models to predict future trends in the financial markets. Its analysts have established expertise in standard ANSI SQL. The data will have to be cleansed before it can be transformed for the ML models to use. What is the best AWS service for the analysts to use?

20. A wildlife conservation charity has asked you for advice on how to identify frog species in an Asian jungle. They have many recordings of frog vocalisations for 15 species. The charity wants to develop a mobile device that will identify and count the species and number of frogs in an area by their vocalisations.

What SageMaker built-in algorithm would you suggest they use?

21. You work for a stock broker. During trading hours the machine learning models are heavily used, but outside this time usage declines rapidly. It has been estimated that 1 instance is required at minimum and 6 at peak usage. The billing department notices that the AWS costs remain unchanged both during and after trading. You investigate and find the SageMaker Endpoint is misconfigured.

How do you correctly configure the endpoint to minimise cost whilst meeting peak demands and safety?

22. You work for a financial services company that is implementing machine learning models to predict future share price movements. The training data is taken from stock markets around the world and is combined together. Each record has three fields: company code; timestamp; price. You examine some price data: $13.96, $2.28, $15-21, £6.39.

What problems does this data have?

23. You work for a hotel booking company that uses Personalization to offer customers additional services once they have booked a room. Testing has shown that the additional services do not always display quickly enough. The ML model is returning inferences too slowly, especially at peak demand.

What actions do you take? (choose 2)

24. You work for a city public authority with responsibility for provisioning community sports facilities, for example: basketball courts; tennis courts; football pitches. Because no entrance fee is charged it is difficult to track the usage of these facilities. You suggest using images from the security cameras to determine usage via the SageMaker Object Detection algorithm. You obtain images with various amounts of people and usage to use as training data.

How can you label this data with the least amount of effort for the authority?

25. You work for a small insurance company. For regulatory compliance you must demonstrate separation of duties. This means that different teams and users have separate non-overlapping duties and matching security privileges. One separation is between Insurance Staff and Data Scientists when using SageMaker.

How would you enforce separation of duties using IAM?

26. You work for an AI Consultancy. Your client uses AWS and SageMaker exclusively. They have recently taken over another business that uses machine learning models that are developed in-house using custom code and algorithms. Your client wants to incorporate these models into their AWS SageMaker environment.

How would you advise your client to do this as easily as possible, with the least risk to the performance of the models?

27. What are the key features of Amazon SageMaker Reinforcement Learning? (choose two)

28. You work for a parcel delivery firm which receives data from its fleet of vehicles including location and driving events. The data arrives throughout the day all year round. The data needs to be stored in S3 for preprocessing before being presented to ML models.

Which AWS service can be used to store data in S3 as simply as possible?

29. You work for a cinema chain that uses XGBoost in regression mode to predict future audience sizes. Hyperparameter tuning is performed manually. To speed up the training process you have been asked to implement SageMaker automatic hyperparameter tuning.

Which of the following information would you have to provide to the training job?

30. You have been asked to analyze some COVID vaccine take-up data. The data includes these fields, amongst others:

  • Age: 18 to 69 and 3 people over 100!
  • Annual income: $2,000 to $50,000
  • Home town
  • Covid vaccinations: Yes / No

You decide to use the K-Means SageMaker built-in algorithm. What feature scaling technique or techniques would you use?

31. You work for an oil exploration company that needs to improve performance monitoring of its oil rigs. The company plans to equip its rigs with many IoT devices monitoring the performance of machinery and then use a machine learning model to predict machinery breakdowns. The stream of data from the IoT devices will be collected by Kinesis Data Streams, from which it will be unloaded to S3 storage by Docker containers in ECS. It is important to ensure all the data is retrieved.

32. You work for the mortgage department of a major retail bank. Inferences from the Machine Learning models have to be stored for seven years for regulatory reasons. At the end of seven years they must be deleted because they contain PII. The data may be accessed in the first eighteen months, but is seldom accessed after eighteen months. It is intensively used for the first two months.

How would you use S3 lifecycle management to ensure data is stored in the most appropriate data repository?

33. You have been contracted by a health organization to support a medical trial of a new cancer detection blood test. There will be 140,000 patients in the trial who will provide personally identifiable information (PII).

What security features of S3 would you use to protect the data from unauthorized users?

34. You are working for a government health department and have been asked to process some nutrition data from four areas of your country. In each area the quantity of food eaten in grams is recorded for fifteen food types. You have been asked to identify areas that group together and have similar diets and outliers which require further investigations. What would you use to do this?

35. You are working for a healthcare organization to optimize their appointment scheduling system. You have been asked to analyze the problem of patients missing appointments by using historical data to identify patients likely to miss their appointments. The data you are given contains three fields:

  • scheduled_day: date and time
  • appointment_day: date
  • missed: yes/no

You have been told there is likely to be a relationship between missed appointments and the time between the scheduled day and the appointment day. How would you feature engineer the dates to test this hypothesis as simply as possible?

36. You work for a financial services company analyzing stock market prices. You use gradient descent to find the minimum which informs buying and selling decisions. The data is streamed in throughout the trading day and the algorithms are updated multiple times per day. Speed of completion is paramount.

What type of gradient descent would you use?

37. You work for an on-line fashion store. The store wants to increase add-on purchases of accessories by specific categories of customers, for example young unmarried women or middle-aged married men. This will be achieved by using past purchases of accessories by similar customers in the same categories to forecast the accessories the customer may want to purchase.

Which SageMaker AI service would you recommend?

38. You are advising a major health organization with many hospitals to optimise their on-call rota for senior clinicians. You have been asked to analyze the cost of senior clinician support for the common procedure of intubation, in which a breathing tube is inserted into a patient’s throat. Some patients have challenging throat anatomy that can make intubation difficult, requiring support from senior clinicians.

How can the types of different airway anatomies be classified in order of difficulty for intubation for further analysis?

39. How can SageMaker machine learning models be deployed in ECS with the least effort?

40. You work for a home loan company that uses machine learning to infer the sales price of a house. It is vital that changes to the model can be backed out quickly because even an out of date prediction is better than a poor new prediction. It can take some time before poor performance is identified.

What deployment method would you recommend?

41. You are working for an AI consultancy that has been asked by a client to set up a machine learning application that can be used as a template for future applications. They have many good ideas for ML apps so you decide to hold a workshop on Problem Framing so they can understand, define and prioritize business problems.

Which of these best practices would you include in the workshop? (select 2 answers)

42. You work for a car hire company. Large quantities of data messages concerning engine performance, location and driving events are received from the car fleet every hour, 24 hours a day. This data has to be received and pre-processed prior to being fed to ML models for marketing and vehicle maintenance. The pre-processing is performed by an ECS container fleet. If a failure occurs, the containers will be automatically redeployed and will need to be given the data again.

Which service will satisfy these needs?

43. You work for a fintech company that makes predictive models for credit scoring. The predictive models are based on logistic regression. To validate the effectiveness of the credit score it is necessary to calculate the loss, the difference between predicted and actual outcomes. This is usually done by taking historical data, inferring the predicted outcomes and then comparing these with what actually happened, e.g. credit repayment defaults. The variables usually show a Gaussian distribution.

What type of loss function would you use?

44. You work for a bioinformatics company. You have access to many biological datasets from similar research areas. You believe there may be overlaps of information that could lead to new synergies and discoveries. The data can be processed into biological objects analogous to the document-subject-word hierarchy.

Which SageMaker built-in algorithms would you use? (choose two)

45. You work for an office supply company that has a very large supplier base. Recently a small fraud was uncovered where a fictitious company was submitting plausible invoices and was receiving money. You have been asked if machine learning can detect anomalies in the invoice data that will flag the invoice for further investigation and block its payment.

Which SageMaker built-in algorithm would you recommend?

46. Which of the following are model tuning best practices? (choose 2)

47. You have been asked to evaluate two model variants that use the Linear Learner SageMaker built-in algorithm. You have data for predicted and actual values.

How will you evaluate the model variants to determine which one performs better?

48. You need to compare a new Deep Learning model variant with the current production variant to confirm the new variant will not underperform when in production.

What is the best method to evaluate the models?

49. You work for an insurance company that uses a call centre to manage claims and sales. The company wants to improve the effectiveness of their call centre staff by directing customer calls to the specialist teams as quickly as possible. The customers will be asked some questions and they will then be automatically re-directed to the correct team that can help them. These questions and answers are needed for both voice and text communications.

Which SageMaker AI service or services would you use?

50. A power supply company has an existing infrastructure assets database for its items of equipment and installations and a separate data feed of customer usage information. It wishes to combine this information and use an ML model to help organize the planned maintenance schedule.

How can this be achieved with a single data transformation service from AWS?

51. A retail bank has a data lake to store data about customers and their interactions. Data from other banking systems is copied to an S3 bucket.

What is the simplest way to make this data available for querying with Athena?

52. You are working at an online used car sales company. Users can post pictures and details of their cars for sale. Most people use genuine pictures but some use stock images or images taken from the internet. To stop this practice the company wants to identify images that are not genuine. You have a large corpus of images for training data that have to be labeled genuine or false.

What is the most effective way to rapidly label the data?

53. You work for a clothing retail company. You have been asked to predict future sales volumes based on medium range weather forecasts. You have historical data for items sold and rainfall on the same day. You want to show the relationship or correlation between rainfall and sales for shoes, coats and umbrellas.

What type of data visualization could you use?


54. You work for a major car dealership franchise and you have been asked to investigate the relationship between car ownership and income in the franchise’s sales area. What type of graph would you select to display the relationship?

Select 2 answers

55. You work for a building security consultancy. Your clients have a problem with monitoring security cameras with as few staff as possible. It is very easy for a fatigued operator to miss a potential risk to the building or its occupants. They want a warning to be sounded when a camera shows a person with a weapon, or tools to break into the building. Which SageMaker built-in algorithm would you recommend?

56. You work for a chain of sports centers. The business forecasts equipment usage rates using historical data. The ML algorithm used is XGBoost in regression mode. The hyperparameters have been guessed based on intuition. You have been asked to tune the hyperparameters. Speed is important because the business wants to update the forecast multiple times per day.

What hyperparameter tuning method would you choose?

57. You work for a large bank that is trying to leverage the customer data held in many systems all supporting different trading units, for example insurance, customer accounts, home loans etc. By bringing all this data together to form a single customer view, the bank hopes to use machine learning to identify new products to sell to each customer. You decide to study two pieces of data to get a feel for the data quality: date of birth; home address. You find that a customer’s date of birth is recorded differently for 22% of customers. You compare the zip code (post code) with an address code look up service and find that it is incorrect or missing in 16% of customers.

How would you describe the data quality?

58. You work for an AI Consultancy advising a government client on transportation policy. Your client wants to reduce congestion in the road network without building more roads. They believe that reducing the incidents of vehicles breaking down on major roads may improve traffic flows. They want you to use machine learning to predict if a vehicle is likely to break down whilst driving. You are given a vast amount of historical data. Once your model is trained it will have to be tested to show its effectiveness.

How will you split the data to show its effectiveness and avoid bias?

59. You work for an on-line fashion retailer. The company wants to present users with a list of items at the bottom of the web page that they are likely to buy. This list has to be based on the previous viewing and buying behaviour of the user.

What type of AI system would you suggest they use?

60. You are working for a bioscience company researching gene therapy. Machine Learning is being used to infer the results of gene expression based on previously collected data. Each sample has twenty genes and each test batch may contain up to six samples. You need to provide a graphical summary that can be understood at a glance and can show clustering.

Which data visualization method would you select?

61. You are helping a tropical fish company optimize its sales. The data includes a field called fish_type. This is often a Latin name, but may be a common name. Two or more fish may be the same fish type whilst having different names. How would you encode fish_type for Machine Learning?

62. You work for a home loans company that uses Machine Learning models as part of its decision-making process. Regulatory requirements mandate creating an audit log of all calls to the model for inference and retaining the information for seven years.

Which AWS service would you use to log this information?

63. You are assisting in a medical study of the relationship between blood pressure and age and how environmental factors affect this. There is a problem with overfitting. The data contains outliers that can have a substantial effect on the model when overfitting occurs. You recommend using regularization.

How does Regularization mitigate overfitting?

64. Creating a training job in SageMaker requires many parameters, for example:

  1. URL of the S3 bucket containing training data
  2. URL of the S3 bucket for the output

What other parameters are needed? (choose two)

65. A truck manufacturer stores data from its manufacturing processes in an Amazon EMR Hadoop cluster. The data is received in many formats.

What is the simplest way to clean and convert the data to a common format ready for processing by Machine Learning models?

The average score is 32%


Good luck in your exam!


Photo by Joshua Earle on Unsplash

Whizlabs AWS Certified Machine Learning Specialty practice exams

Practice Exams with 271 questions, Video Lectures and Hands-on Labs from Whizlabs

The Whizlabs AWS Certified Machine Learning Specialty practice tests are designed by experts to simulate the real exam scenario. The questions are based on the exam syllabus outlined in the official documentation. These practice tests help candidates gain confidence in their exam preparation and evaluate themselves against the exam content.

Practice test content

  • Free Practice test – 15 questions
  • Practice test 1 – 65 questions
  • Practice test 2 – 65 questions
  • Practice test 3 – 65 questions

Section test content

  • Core ML Concepts – 10 questions
  • Data Engineering – 11 questions
  • Exploratory Data Analysis – 13 questions
  • Modeling – 15 questions
  • Machine Learning Implementation and Operations – 12 questions
