Data Engineering

The Data Engineering domain of the AWS Certified Machine Learning Specialty exam covers obtaining data, transforming it and transferring it to a repository. This knowledge domain accounts for twenty percent of the exam marks and is divided into three subdomains:
- 1.1 Create data repositories for machine learning.
- 1.2 Identify and implement a data-ingestion solution.
- 1.3 Identify and implement a data-transformation solution.
The data repository (subdomain 1.1) is where the raw and processed data is stored. S3 is the repository of choice for Machine Learning in AWS, although some other data stores are also mentioned. The data-ingestion subdomain (1.2) is concerned with getting the raw data into the repository, either by batch processing or by streaming. With batch processing, data is collected and grouped at a point in time and then passed to the data store. With streaming, data is collected continuously and fed into the data store as it arrives. The third subdomain (1.3) focuses on how raw data is transformed into data that can be used for ML processing. The transformation process changes the data structure. The data may also need to be cleaned and de-duplicated, incomplete data may need to be managed, and its attributes may need to be standardised.
Once these data engineering processes are complete, the data is ready for further pre-processing before being fed into a Machine Learning algorithm. This pre-processing is covered by the second knowledge domain, Exploratory Data Analysis.
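The transformation steps named above (cleaning, de-duplication, managing incomplete data and standardising attributes) can be illustrated with a minimal sketch in plain Python. This is not an AWS API and not how any particular AWS service implements transformation. The record layout (`id`, `city`, `age`), the `transform` function, the choice of mean imputation and the min-max scaling step are all assumptions made for illustration, and the sketch assumes at least one record has a known age.

```python
from statistics import mean


def transform(records):
    """Clean, de-duplicate, impute and standardise a batch of raw records.

    Hypothetical example: the field names and the imputation and scaling
    choices are illustrative assumptions, not part of any AWS service.
    """
    # 1. Clean: trim whitespace and use consistent capitalisation for text.
    cleaned = [
        {"id": r["id"], "city": r["city"].strip().title(), "age": r["age"]}
        for r in records
    ]

    # 2. De-duplicate: keep the first record seen for each id.
    unique = {}
    for r in cleaned:
        unique.setdefault(r["id"], r)
    rows = list(unique.values())

    # 3. Manage incomplete data: fill a missing age with the mean of known ages.
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = mean(known)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill

    # 4. Standardise: rescale age to the range [0, 1] (min-max scaling).
    lo = min(r["age"] for r in rows)
    hi = max(r["age"] for r in rows)
    for r in rows:
        r["age_scaled"] = (r["age"] - lo) / (hi - lo) if hi > lo else 0.0

    return rows


raw = [
    {"id": 1, "city": " london ", "age": 30},
    {"id": 2, "city": "PARIS", "age": None},
    {"id": 1, "city": "London", "age": 30},   # duplicate of id 1
    {"id": 3, "city": "berlin", "age": 50},
]
result = transform(raw)
for row in result:
    print(row)
```

In this sample the duplicate record for `id` 1 is dropped, the missing age is filled with the mean of the two known ages, and every record gains an `age_scaled` attribute. On AWS the same kind of logic would normally run inside a managed transformation service rather than a local script.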
- For a description of the exam structure see this article: https://www.mlexam.com/aws-machine-learning-exam-syllabus/.
- The AWS exam guide pdf can be downloaded from: https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Exam-Guide.pdf
AWS Certified Machine Learning Study Guide: Specialty (MLS-C01) Exam
This study guide provides the domain-by-domain knowledge you need to build, train, tune, and deploy machine learning models with the AWS Cloud. The online resources that accompany this Study Guide include practice exams and assessments, electronic flashcards, and supplementary online resources. It is available in both paper and Kindle versions for immediate access. (Visit Amazon books)
Sample Data Engineering questions
This test consists of five questions drawn at random from a pool of 17 questions covering the three subdomains.
Data engineering study guides

Whizlabs review – AWS Certified Machine Learning Specialty
Need more practice with the exams? Check out Whizlabs' free test with 15 questions. They also have three practice tests (65 questions each) and five section tests (10-15 questions each). Money-off promo codes are below. For the AWS Certified Machine Learning Specialty, Whizlabs provides practice tests, a video course and hands-on labs. These…

Pluralsight review – AWS Certified Machine Learning Specialty
Contains affiliate links. If you go to Pluralsight’s website and make a purchase, I may receive a small payment. The purchase price to you will be unchanged. Thank you for your support. The AWS Certified Machine Learning Specialty learning path from Pluralsight has six high-quality video courses taught by expert instructors. Two are introductory…

Ingesting data for Machine Learning
These revision notes describe the AWS services used to ingest streaming data for Machine Learning.

Power Machine Learning at Scale – Summary
This is a summary of the AWS Power Machine Learning at Scale White Paper, a 15-page PDF document focusing on High Performance Computing (HPC) in AWS. It can be downloaded from here: The list of White Papers for Machine Learning is on the Prepare for Your AWS Certification Exam web page: AWS…

Batch processing for Machine Learning
For Machine Learning, AWS Glue and AWS Database Migration Service are used to ingest data. Batch processing refers to processing that is usually performed on a set schedule. Before a batch run starts, the data waits to be processed, and any new data that arrives often has to wait for the next batch run. In AWS any compute…

CV Library
If you want to land your dream AWS job, you have to do more than just dream about it: you need a CV. Agents may call, email or text, and job ads pop up on every site you visit, but the first thing they will ask for is a copy of your CV. A CV…