AWS Machine Learning exam syllabus

What is in the AWS Machine Learning Specialist Certificate exam?

Perhaps exam syllabus is an old-fashioned term, which is why AWS do not use it. Specification and blueprint are more current terms, but AWS don’t call it that either. AWS call the description of their course the Exam Guide. This is where the course contents are listed, split into four domains and fifteen sub-domains. Each sub-domain has a single sentence that may leave you little wiser as to what that section is about. If you already know what “perform hyperparameter optimization” is, perhaps you don’t need to do this exam. This article describes each sub-domain in enough detail for the complete newbie to get a good idea of what it covers. This will give you an overview of what you are getting yourself into.

Last updated: 24 May 2021

Domain 1: Data Engineering

This Domain is about getting the data, transforming it and putting it in a repository. It comprises 20% of the exam marks. There are three sub-domains that can be summarised as:

The data repository (sub-domain 1.1) is where you store the raw and processed data. S3 is the repository of choice for Machine Learning in AWS, although some other data stores are also mentioned. The data ingestion sub-domain (1.2) is concerned with getting the raw data into the repository. This can be via batch processing or streaming data. With batch processing, data is collected and grouped at a point in time and passed to the data store. Streaming data is constantly being collected and fed into the data store. The third sub-domain (1.3) focuses on how raw data is transformed into data that can be used for ML processing. The transformation process changes the data structure. The data may also need to be cleaned up and de-duplicated, incomplete records handled and its attributes standardised.
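The clean-up steps in sub-domain 1.3 can be sketched in a few lines of plain Python. This is only an illustration: the `clean_records` function, the field names and the rule of dropping incomplete rows are assumptions made for the example, not anything prescribed by the Exam Guide or an AWS service.

```python
def clean_records(records):
    """De-duplicate, drop incomplete rows and standardise attributes."""
    seen = set()
    cleaned = []
    for rec in records:
        # Standardise attributes: trim whitespace and upper-case the country code.
        name = (rec.get("name") or "").strip()
        country = (rec.get("country") or "").strip().upper()
        age = rec.get("age")
        # Manage incomplete data: here we simply skip rows missing a required field.
        if not name or age is None:
            continue
        # De-duplicate on a case-insensitive (name, country) key.
        key = (name.lower(), country)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "country": country, "age": int(age)})
    return cleaned


raw = [
    {"name": " Alice ", "country": "gb", "age": "34"},
    {"name": "alice", "country": "GB", "age": 34},    # duplicate of the first row
    {"name": "Bob", "country": "us", "age": None},    # incomplete
    {"name": "Carol", "country": " us", "age": 41},
]
cleaned = clean_records(raw)
print(cleaned)
```

In a real pipeline the same steps would more likely run in a managed transformation service over data held in S3, but the logic is the same.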

Once these data engineering processes are complete the data is ready for further pre-processing prior to being fed into a Machine Learning algorithm. This pre-processing is covered by the second knowledge domain Exploratory Data Analysis.

Domain 2: Exploratory Data Analysis

In this domain the data is analysed so it can be understood and cleaned up. It comprises 24% of the exam marks. There are three sub-domains:

Analyzing and visualizing the data (sub-domain 2.3) overlaps with the other two sub-domains, which both use these techniques. The techniques include graphs, charts and matrices. Before you can sanitize and prepare data (sub-domain 2.1) you have to understand the data. This is done using statistics that focus on specific aspects of the data, together with graphs and charts that allow relationships and distributions to be seen. The data can then be cleaned up using techniques to remove distortions and fill in gaps. Feature Engineering (sub-domain 2.2) is about creating new features from existing ones to make the ML algorithms more powerful. It also includes techniques to reduce the number of features and categorise the data.
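As a rough sketch of what "understand, fill in gaps, engineer features" can look like, the example below uses only the Python standard library. The helper names (`summarise`, `impute_median`, `one_hot`) and the sample values are made up for illustration. Median imputation and one-hot encoding are just two common choices among many.

```python
import statistics


def summarise(values):
    """Basic descriptive statistics, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


def impute_median(values):
    """Fill gaps with the median of the values that are present."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def one_hot(categories):
    """Turn one categorical feature into one 0/1 feature per category."""
    levels = sorted(set(categories))
    return [{f"is_{lvl}": int(c == lvl) for lvl in levels} for c in categories]


ages = [34, None, 41, 29, None, 52]
summary = summarise(ages)
imputed = impute_median(ages)
encoded = one_hot(["red", "blue", "red"])

print(summary)
print(imputed)
print(encoded)
```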

You now understand your data and have cleaned it up ready for the next stage, modeling.

Domain 3: Modeling

When people talk about Machine Learning they are mostly thinking about Modeling. Modeling is selecting and testing the algorithms to process data to find the information of value. It comprises 36% of the exam marks. This domain has five sub-domains:

Firstly, decide if ML is appropriate for the problem (sub-domain 3.1). ML is good for data-driven problems involving large amounts of data where the rules cannot easily be coded. The business problem can probably be framed in many ways, and the framing determines what kind of ML problem is being solved. For example, the business problem could be framed to require a yes/no answer, as in fraud detection, or a value, as in a share price.

Many models (sub-domain 3.2) are available through AWS Machine Learning services. Each model has its own use cases and requirements. Once the model has been chosen, an iterative process of training, tuning and evaluation is undertaken.

Model training (sub-domain 3.3) is the process of providing a model with data to learn from. During model training the data is split into three parts. Most (70% to 80%) is used as training data with the remainder used for validation and testing.
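A three-way split like the one described can be sketched as follows. The 70/15/15 proportions, the fixed seed and the function name `train_val_test_split` are assumptions chosen for the example. The right proportions depend on how much data you have.

```python
import random


def train_val_test_split(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle the rows, then cut them into training, validation and test sets."""
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Shuffling before splitting matters when the raw data is ordered, for example by date or by class, because otherwise the three sets would not be drawn from the same distribution.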

Model tuning (sub-domain 3.4) is also known as hyperparameter optimisation. Hyperparameters are settings that are chosen before training starts and are not learned from the data during training. They can be tuned manually, by using search methods, or automatically by using SageMaker guided search. Model tuning also includes additional feature engineering and experimenting with new algorithms.
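To make the idea of "search methods" concrete, here is a minimal sketch of grid search and random search in plain Python. The `validation_loss` function is a hypothetical stand-in for training a model and scoring it on validation data, and the hyperparameter names and ranges are invented for the example. It does not show how SageMaker's own tuning works.

```python
import itertools
import random


def validation_loss(learning_rate, max_depth):
    # Hypothetical stand-in for "train a model, then score it on validation data".
    # It is lowest at learning_rate=0.1 and max_depth=6.
    return (learning_rate - 0.1) ** 2 + (max_depth - 6) ** 2 / 100


def grid_search(grid):
    """Try every combination of the listed values and keep the best."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random from their ranges and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "max_depth": rng.randint(2, 10),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best


grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [4, 6, 8]}
best_params, best_loss = grid_search(grid)
random_params, random_loss = random_search(n_trials=20)
print(best_params, best_loss)
print(random_params, random_loss)
```

Grid search is exhaustive but grows quickly with the number of hyperparameters. Random search covers wide ranges with a fixed budget of trials. Guided approaches use the results of earlier trials to choose the next ones.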

Model evaluation (sub-domain 3.5) is used to find out how well a model will do in predicting the desired outcome. This is done using metrics to measure the performance of the model. Metrics measure accuracy, precision and other properties of the model by comparing the results from the model with the known answers in data that was held back from training.
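For a yes/no (binary) problem, the most common metrics can be computed directly from the counts of correct and incorrect predictions. The sketch below is a plain-Python illustration with made-up labels. The function names are chosen for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    # Share of all predictions that were correct.
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    # Of everything predicted positive, how much really was positive.
    return tp / (tp + fp)


def recall(tp, fn):
    # Of everything that really was positive, how much was found.
    return tp / (tp + fn)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # known labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]  # model predictions

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```

The three numbers differ here, which is the point: a model can look reasonable on accuracy while still raising many false alarms (low precision) or missing real cases (low recall).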

Your model is now ready to be used with real data. But before it can be let loose on your corporate data it has to be deployed into the production environment.

Domain 4: Machine Learning Implementation and Operations

This is about productionisation and related DevOps skills to make everything work in production. It comprises 20% of the exam marks. There are four sub-domains:

Building highly available, fault-tolerant systems relies on separating the components of a system into a loosely coupled distributed system. This ensures that a failure in one part of the system is less able to affect other parts of the system. AWS services and features that enable decoupling are SQS, CloudWatch, CloudTrail and SageMaker Notebook endpoints.
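The value of putting a queue between components can be shown with a small in-memory sketch. Everything here is a stand-in: `InMemoryQueue` is not the SQS API, and `flaky_predict` is a made-up handler that fails once to simulate a transient fault. The point is only that a failed message goes back on the queue and is retried, so a failure in the consumer does not lose the producer's request.

```python
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a message queue; not the SQS API."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        self._messages.append(body)

    def receive(self):
        return self._messages.popleft() if self._messages else None


def drain(queue, handler, max_attempts=3):
    """Process messages until the queue is empty, re-queueing failures."""
    processed, failed, attempts = [], [], {}
    while True:
        msg = queue.receive()
        if msg is None:
            break
        attempts[msg] = attempts.get(msg, 0) + 1
        try:
            processed.append(handler(msg))
        except Exception:
            if attempts[msg] < max_attempts:
                queue.send(msg)      # put it back so it can be retried later
            else:
                failed.append(msg)   # give up after max_attempts
    return processed, failed


calls = {"count": 0}


def flaky_predict(msg):
    # Hypothetical handler: the second call fails to simulate a transient fault.
    calls["count"] += 1
    if calls["count"] == 2:
        raise RuntimeError("transient failure")
    return f"prediction for {msg}"


queue = InMemoryQueue()
for request in ["request-1", "request-2", "request-3"]:
    queue.send(request)

processed, failed = drain(queue, flaky_predict)
print(processed, failed)
```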

Scalability is the ability of a system to automatically provision more resources when needed and to scale back those resources to reduce waste when demand is low. AWS services and features that enable scalability are Auto Scaling and containerised ML models, which are Docker images.
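The scale-out and scale-in behaviour can be illustrated with a simplified proportional rule: pick the number of instances that would bring the load per instance back to a target value. This is a sketch under that assumption, not AWS's actual scaling algorithm. The function name, the metric and the limits are invented for the example.

```python
import math


def desired_instances(current, load_per_instance, target_per_instance,
                      min_instances=1, max_instances=8):
    """Proportional scaling rule, clamped to a minimum and maximum fleet size."""
    total_load = current * load_per_instance
    wanted = math.ceil(total_load / target_per_instance)
    return max(min_instances, min(max_instances, wanted))


# Busy period: 2 instances each handling 900 requests/minute, target is 500.
scale_out = desired_instances(2, 900, 500)

# Quiet period: 4 instances each handling 100 requests/minute.
scale_in = desired_instances(4, 100, 500)

# Extreme spike: the maximum caps how far the fleet can grow.
capped = desired_instances(8, 2000, 500)

print(scale_out, scale_in, capped)
```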

Conclusion

The AWS Certified Machine Learning – Specialty exam readiness course prepares you for the exam. It is good for outlining the breadth of the course and how it is divided up into four domains and fifteen sub-domains. Whilst it lists and mentions many subjects, only a few are described in any detail, and even those are covered lightly. I suggest this is the first thing you should study when you start preparing for the exam.

Credits

Photo by Daniel Gonzalez on Unsplash
