Using SageMaker AI services is like visiting a well-equipped gym: you just have to choose the right equipment for your goals. AWS has a wide range of Machine Learning services and capabilities, each with its own advantages and disadvantages. Understanding your use case is key to selecting the most appropriate service.
These revision notes cover sub-domain 4.2, Recommend and implement the appropriate machine learning services and features for a given problem, of the AWS Machine Learning Specialty exam. A description of all the knowledge domains in the exam is in these revision notes: AWS Machine Learning exam syllabus
The three layers of ML technologies and services
AWS describes its Machine Learning technologies and services in terms of three layers. Each layer builds on the layer beneath it, incorporating its features and abstracting them so that users do not have to develop expertise in the underlying technologies.
- AI services
- Machine Learning Services
- Frameworks and Infrastructure
These revision notes contain four videos:
- An Overview of AI and Machine Learning Services From AWS
- Build Intelligent Apps Using AI Services
- Why TensorFlow?
- Deep learning with Apache MXNet
Video: An Overview of AI and Machine Learning Services From AWS
Amazon AI Services
AI services are AWS’s pre-packaged algorithms as a service. They are easy to use, either to enhance existing systems or as the basis of completely new ones. Because they are services, AWS does all the heavy lifting, leaving the user to interact with them without having to set up infrastructure or other supporting services.
Machine Learning Services
The Machine Learning services layer is focused on Amazon SageMaker and its associated services. If you want AWS to do all the heavy lifting but do not have a use case that is satisfied by any of the AI services in the top layer, then SageMaker is your new friend.
SageMaker is Amazon’s managed service for Machine Learning. SageMaker has services and features to support all stages of Model development and production.
Model training can be performed using API calls that set up, run and tear down a high performance compute cluster managed by SageMaker. You can control the configuration of the cluster by selecting the EC2 instance type, size and number of instances. To help with analyzing the results of training SageMaker Debugger provides real-time insight into the training process by automating the capture and analysis of data.
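As a rough illustration of the cluster settings described above, the sketch below assembles a request in the shape accepted by the `create_training_job` call of the boto3 SageMaker client. The job name, image URI, role ARN and S3 locations are hypothetical placeholders, and nothing is sent to AWS; the function only builds a dictionary.

```python
def build_training_job_request(job_name, instance_type, instance_count,
                               volume_gb=50, max_runtime_s=3600):
    """Assemble a request dictionary in the shape used by create_training_job.

    All account-specific values below are hypothetical placeholders.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # Hypothetical container image holding the training code.
            "TrainingImage": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-algo:latest",
            "TrainingInputMode": "File",
        },
        # Hypothetical IAM role that SageMaker assumes to run the job.
        "RoleArn": "arn:aws:iam::123456789012:role/MySageMakerRole",
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/train/",  # hypothetical bucket
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
        # The cluster: EC2 instance type, number of instances and disk size.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": volume_gb,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }


request = build_training_job_request("demo-job", "ml.m5.xlarge", 2)
print(request["ResourceConfig"])
```

With AWS credentials configured, a request like this could be passed to `boto3.client("sagemaker").create_training_job(**request)`; SageMaker then provisions the instances, runs the job and tears the cluster down.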
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html
- Machine Learning Mastery: https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/
A model is optimized by adjusting its hyperparameters.
“A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data.” (Machine Learning Mastery).
Amazon SageMaker has an automatic model optimizing feature known as hyperparameter tuning. You choose a metric and SageMaker runs multiple training jobs with different combinations of hyperparameters, then reports the combination that gave the best value of the metric. For successful hyperparameter tuning you need:
- A prepared dataset
- A training job (model) that has run successfully before with the data
- An understanding of the Machine Learning algorithm you have selected
- A clear understanding of what a successful training run looks like
There is a choice of strategies for finding the optimized hyperparameters:
- Random search
- Bayesian search
- Custom search
Note that hyperparameter tuning can be unsuccessful and may not return an optimized set of hyperparameters.
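The random search strategy can be sketched in a few lines of plain Python. This is a minimal illustration and not SageMaker's implementation: `toy_objective` is a made-up stand-in for the metric reported by a training job, and the hyperparameter names and ranges are assumptions chosen for the example.

```python
import random


def random_search(objective, search_space, max_jobs, seed=0):
    """Try max_jobs random hyperparameter combinations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(max_jobs):
        # Sample each hyperparameter uniformly from its (low, high) range.
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in search_space.items()}
        score = objective(params)  # stands in for running one training job
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Made-up metric that peaks at learning_rate=0.1 and dropout=0.3."""
    return -((params["learning_rate"] - 0.1) ** 2) - (params["dropout"] - 0.3) ** 2


space = {"learning_rate": (0.001, 0.5), "dropout": (0.0, 0.6)}
best_params, best_score = random_search(toy_objective, space, max_jobs=50)
print(best_params, best_score)
```

The search stops after a fixed budget of jobs, so the result is the best combination seen, not a guaranteed optimum. Bayesian search differs in that it uses the results of earlier jobs to decide which combination to try next.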
Using SageMaker endpoints, multiple production model variants can be deployed at the same time. Traffic can be rapidly switched between them. This feature enables risk limiting deployment options such as:
- Blue/Green deployment
- Canary release
- A/B testing
This article compares these methods: Canary deployment, Blue/Green deployment and A/B testing compared
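SageMaker shares traffic between the production variants on an endpoint in proportion to each variant's weight. The sketch below simulates that split for a canary-style release; the variant names and weights are hypothetical and no AWS calls are made.

```python
import random
from collections import Counter


def traffic_shares(variant_weights):
    """Fraction of traffic per variant: its weight divided by the sum of weights."""
    total = sum(variant_weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in variant_weights.items()}


def route_requests(variant_weights, n_requests, seed=0):
    """Simulate routing n_requests across the variants according to their weights."""
    rng = random.Random(seed)
    names = list(variant_weights)
    weights = [variant_weights[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_requests))


# Canary-style split: most traffic stays on the current model.
weights = {"current-model": 9, "candidate-model": 1}
print(traffic_shares(weights))
print(route_requests(weights, n_requests=1000))
```

Changing the weights to 1 and 1 would give an even A/B split, and moving all of the weight to the candidate completes a cut-over.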
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html#how-it-works-hosting
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html
Within SageMaker there are two ways to host your model for production:
- SageMaker endpoints
- SageMaker batch transform
SageMaker Ground Truth
Amazon SageMaker Ground Truth is a service you can use to manually label data. This provides high quality labelled data in the preprocessing stage to be used to train Supervised Learning models. Read more in Amazon SageMaker Ground Truth
SageMaker Notebooks are Jupyter Notebooks hosted on managed SageMaker EC2 instances. A Jupyter Notebook is a web application that allows you to create documents that contain code, equations, visualizations and narrative text. It can be described as an Integrated Development Environment (IDE) in a web browser. This feature enables the text of a tutorial to be interlaced with executable code, so you can read the narrative and then execute the code to study the output. Amazon provides example notebooks that use this feature.
Typically each user has their own Notebook instance. Unlike Glue Notebooks, SageMaker notebooks do not have a permanent cluster spun up to support them, so they are much more cost effective to use.
Algorithms and Marketplace
AWS Marketplace is a curated catalog of algorithms and model packages that have been built by third party suppliers. Users can purchase the algorithms and model packages to use in their systems.
- SageMaker Algorithms are the model artifacts that can be trained to produce production models.
- SageMaker Model packages are the complete trained model ready to be used in production.
Amazon SageMaker RL
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/reinforcement-learning.html
- AWS blog: https://aws.amazon.com/blogs/aws/amazon-sagemaker-rl-managed-reinforcement-learning-with-amazon-sagemaker/
Reinforcement Learning (RL) is something we do with pets and children. When they do something right we praise them, or give them a treat. When they screw up we scold them, or withdraw a privilege and blame an innocent third party or society in general. For Machine Learning models we use Markov Decision Processes (MDPs). An MDP is played out over a number of Episodes, each comprising a series of Time Steps. At each Time Step the agent observes the current state of its environment, takes an action and receives a reward.
The model attempts to find a strategy that optimizes the cumulative reward over the long term. This strategy is called a Policy.
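To make episodes, time steps, rewards and policies concrete, here is a minimal sketch using a made-up corridor environment. The environment, its reward values and both policies are assumptions invented for this example; they are not part of SageMaker RL or any RL toolkit.

```python
import random

GOAL = 4  # the agent starts at position 0 and must reach position 4


def step(state, action):
    """One time step: apply the action, return (next_state, reward, done)."""
    next_state = max(0, min(GOAL, state + action))
    done = next_state == GOAL
    reward = 10 if done else -1  # small penalty per step, bonus at the goal
    return next_state, reward, done


def run_episode(policy, max_steps=20):
    """Run one episode and return the cumulative reward the policy collected."""
    state, total_reward = 0, 0
    for _ in range(max_steps):
        action = policy(state)  # the policy maps a state to an action
        state, reward, done = step(state, action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    """A sensible policy: always move towards the goal."""
    return 1


def make_random_policy(seed=0):
    """A poor policy: move left or right at random."""
    rng = random.Random(seed)
    return lambda state: rng.choice([-1, 1])


print(run_episode(always_right))          # 7
print(run_episode(make_random_policy()))  # never more than 7
```

Training an RL model amounts to searching for the policy with the highest cumulative reward, which in this toy case is `always_right`.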
Amazon SageMaker RL key features are:
- A deep learning framework: TensorFlow, Apache MXNet.
- RL tool kit: Intel Coach, Ray RLlib.
- RL environment: OpenAI Gym, EnergyPlus and RoboSchool.
Frameworks and Infrastructure
This is the lowest Machine Learning layer. Here you are doing most, if not all, of the heavy lifting yourself. You may not even be deploying in the cloud! Using a managed service such as SageMaker simplifies Machine Learning by abstracting away the complexities of the underlying infrastructure. However, this abstraction comes at a cost: the details of the implementation are hidden from you, and SageMaker may simply not allow you to do what you want to do. Working with the lower level frameworks and interfaces gives you the freedom to interact more directly with the Machine Learning algorithms. The full Machine Learning pipeline is now exposed to you.
Working at this level also opens up the world of Deep Learning. Deep Learning is a subset of Machine Learning that focuses on Neural Networks. The performance of most Machine Learning algorithms tends to plateau as more training data is supplied, whereas Neural Networks tend to keep improving as more data is processed.
TensorFlow is an open source project released under the Apache 2.0 license. It provides an end to end framework with tools and libraries that make it easy to build and deploy Machine Learning applications. TensorFlow was originally developed by Google and was open sourced in 2015. Google continues to support it and contribute to the underlying code. TensorFlow uses Python as the orchestration language, but the processing libraries are written in super fast C++.
Using TensorFlow, users can create processing pipelines called dataflow graphs that contain processing nodes. Each node is a mathematical operation, and each connection between the nodes is a data array, or tensor, which is where the framework gets its name.
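The dataflow graph idea can be illustrated without TensorFlow itself. The sketch below is a toy graph in plain Python, where plain lists stand in for tensors; the `Placeholder` and `Node` classes are hypothetical names invented for this example and are not TensorFlow APIs.

```python
class Placeholder:
    """A graph input whose value is supplied when the graph is evaluated."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class Node:
    """A processing node: one mathematical operation applied to its inputs."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self, feed):
        # The values flowing along the incoming connections are the "tensors".
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(*args)


def multiply(a, b):
    return [x * y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


# Build the graph for y = x * w + b. Nothing is computed at this point.
x, w, b = Placeholder("x"), Placeholder("w"), Placeholder("b")
y = Node(add, Node(multiply, x, w), b)

# Computation happens only when data is fed through the graph.
result = y.evaluate({"x": [1.0, 2.0, 3.0], "w": [2.0, 2.0, 2.0], "b": [0.5, 0.5, 0.5]})
print(result)  # [2.5, 4.5, 6.5]
```

Separating the description of the computation from its execution is what allows a framework to optimise the graph and run it on fast back-end libraries.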
This is a 2.22 minute video from TensorFlow.org. It gives a brief overview of TensorFlow.
Apache MXNet is an open source framework backed by AWS and supported by Microsoft. It uses C++ on the backend for high performance and a range of languages, including Python, to interface with users. MXNet models are portable and able to fit into small amounts of memory. They are also scalable using multiple GPU instances. MXNet has been Amazon’s Deep Learning framework of choice since November 2016.
Video: Deep learning with Apache MXNet
This is a 24.13 minute video from AWS introducing MXNet, presented by Nathalie Rauschmayr. The timestamps are:
- 0 Introduction to Deep Learning
- 3.20 Introduction to MXNet
- 4.05 History of MXNet
- 6.53 MXNet Ecosystem
- 7.08 Multiple Language support
- 7.49 Ecosystem – Gluon toolkits, Model zoo, MXBoard, Spark, TensorRT, TVM, ONNX, Keras (fork)
- 13.36 Gluon API
- 14.09 Imperative vs symbolic
- 17.35 Hybrid programming
- 18.44 Distributed training
- 20.12 Deep learning acceleration
- 21.05 ML Perf benchmark
- 21.44 MXNet community
- 24.13 End
PyTorch is an open source deep learning framework developed by Facebook in partnership with AWS. It has Python and C++ interfaces and is optimised for high performance GPU processing. PyTorch has a versatile collection of tools including:
- torchtext – NLP
- torchvision – Computer vision
- torchaudio – Speech processing
PyTorch has tensor-like structures that are GPU compatible for performance. It follows an imperative paradigm: each line of code adds a little to the processing graph as it runs, which aids debugging.
Gluon is an open source deep learning library jointly created by AWS and Microsoft. Gluon acts as an interface between the user, coding in Python, and the Apache MXNet framework. This interface greatly simplifies the process of creating deep learning models without sacrificing training speed.
Keras is an open source interface. It is independent of the major IT vendors, but the main contributor is a Google engineer. Keras APIs make TensorFlow easier to work with. They are simple and consistent, minimising the number of user actions for common use cases. This makes development iterations faster, so final solutions are reached more quickly.
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/cmn-info-instance-types.html
- AWS docs: https://docs.aws.amazon.com/dlami/latest/devguide/cpu.html
- AWS instance specifications: https://aws.amazon.com/ec2/instance-types/
All EC2 choices are a trade-off between computational power and cost. More expensive EC2 types may work out cheaper because the higher hourly rate is offset by the instance being provisioned for a shorter length of time. Spot instances provide the opportunity of having the power you want at a saving of up to 90% compared with on-demand prices. However, the downside is that AWS can reclaim the instances at short notice, even in the middle of your processing. This means spot instances are only useful where your work can be interrupted, for example training cycles that can be delayed for a while.
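The cost trade-off is simple arithmetic: hourly rate multiplied by the hours the instance is provisioned, less any spot discount. The figures below are invented for illustration and are not real AWS prices or discounts.

```python
def job_cost(hourly_rate, hours, spot_discount=0.0):
    """Cost of one job: hourly rate x hours provisioned, less any spot discount."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in the range [0, 1)")
    return hourly_rate * hours * (1.0 - spot_discount)


# Hypothetical figures: a cheap instance that needs 20 hours
# versus an expensive one that finishes the same job in 2 hours.
cheap_slow = job_cost(hourly_rate=0.50, hours=20)
expensive_fast = job_cost(hourly_rate=4.00, hours=2)
expensive_fast_spot = job_cost(hourly_rate=4.00, hours=2, spot_discount=0.60)

print(f"cheap, slow instance:     ${cheap_slow:.2f}")
print(f"expensive, fast instance: ${expensive_fast:.2f}")
print(f"same, on spot pricing:    ${expensive_fast_spot:.2f}")
```

In this made-up case the instance with the higher hourly rate is the cheaper option overall, and spot pricing lowers the bill further provided the job can tolerate interruption.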
The main choices for EC2 types are:
- CPUs
- GPUs
- SageMaker ML instance types
- AWS Inferentia
CPUs are regular compute virtual processors. For Machine Learning you should choose CPUs that are compute optimized. The instances in this family are prefixed by the letter “C”.
GPUs get their name from Graphics Processing Units. Originally this type of processor was designed for the high levels of processing required by highly graphical applications such as computer games. Compared to a CPU, a GPU has many more, smaller logical cores. A core comprises arithmetic logic units (ALU), control units and memory cache. This architecture is suitable for processing a set of similar, simpler computations in parallel, which is a typical workload for Machine Learning applications. GPUs cost more but complete processing more quickly, and so can work out more cost effective.
SageMaker managed EC2 instances are prefixed ml.m for standard instances, ml.c for compute optimised and ml.p for accelerated computing.
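The naming convention can be read mechanically from the instance type string. The helper below is a hypothetical sketch that covers only the three prefixes mentioned above; the example instance types are used purely for illustration.

```python
# Only the families named in the text above are covered here.
FAMILY_BY_LETTER = {
    "m": "standard",
    "c": "compute optimised",
    "p": "accelerated computing",
}


def describe_instance(instance_type):
    """Map a SageMaker instance type such as 'ml.p3.2xlarge' to its family."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml" or not parts[1]:
        raise ValueError(f"not a SageMaker ML instance type: {instance_type!r}")
    family_letter = parts[1][0]  # 'p3' -> 'p'
    return FAMILY_BY_LETTER.get(family_letter, "family not covered in this sketch")


for name in ("ml.m5.large", "ml.c5.xlarge", "ml.p3.2xlarge"):
    print(name, "->", describe_instance(name))
```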
AWS Inferentia is a custom designed chip optimised for inferencing in the cloud. AWS states that this optimisation can drive down the cost of cloud based Machine Learning by as much as 45% per inference. Up to 16 Inferentia chips can be configured in a single Inf1 EC2 instance for maximum power and throughput. Enhanced 100 Gbps networking improves throughput further by preventing network bottlenecks.
AWS has provided an SDK called AWS Neuron to make the best use of Inferentia instances. With Inferentia and Neuron, Machine Learning frameworks such as TensorFlow, PyTorch and MXNet can use high performance, low latency EC2 instances to power Neural Network processing.
Deep Learning (DL) containers and AMI
AWS Deep Learning containers are Docker containers pre-loaded with Machine Learning frameworks and libraries needed to start Machine Learning straight away. AWS DL container images can be obtained from the Elastic Container Registry and AWS Marketplace at no additional cost.
AWS Deep Learning AMIs (Amazon Machine Images) have popular Machine Learning frameworks pre-loaded. There are Base images, ready for you to load and configure your own tools, and images with Conda pre-installed.
Amazon Elastic Kubernetes Service (EKS)
Kubernetes is an open source orchestration system for Docker containers. EKS is Kubernetes with all the heavy lifting done by AWS. You can use EKS to run SageMaker Deep Learning containers, or your own non-SageMaker containers. EKS can also be installed on-prem by using the Amazon EKS Anywhere distro:
Amazon EKS Explained
AWS IoT Greengrass
Greengrass helps you to build, maintain and deploy software on devices as part of an Internet of Things (IoT) system. With Greengrass you can program devices to act locally on the data they generate and to execute Machine Learning model inferencing. Only information that has to be returned is transmitted back home. Greengrass also helps to maintain the software versions on the devices to keep them up to date.
Connected Devices at the Edge using AWS IoT Greengrass
SageMaker Spark containers
SageMaker Spark containers are used for data processing or feature engineering workloads. This brings the tremendous power and scalability of Apache Spark to bear on these resource intensive tasks. It makes sense to use Spark containers when these pre-processing tasks are intermittent and would not use a dedicated Spark cluster enough to make the administration of the cluster worthwhile.
SageMaker build your own containers
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers-adapt-your-own.html
- AWS ML training workshop: https://www.getstartedonsagemaker.com/workshop-studio-training/trainingbyoc/
With SageMaker you can bring your own container. Because SageMaker uses containers for its own processing, you can take a container that has a model developed outside SageMaker and adapt it to work inside the SageMaker environment. There are two toolkits that will enable you to adapt your existing containerised model to work in SageMaker. If you are developing a new model, there are toolkits for each of the major frameworks that you can download from GitHub.
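Adapting a container largely means following SageMaker's file layout: during training SageMaker places hyperparameters and input data under `/opt/ml` and collects the model from `/opt/ml/model`. The sketch below shows a training entry point written against that layout. The `train` function, the `epochs` hyperparameter, the `train` channel name and the `model.json` artifact are assumptions for illustration, and the usage lines run it against a temporary directory, not a real container.

```python
import json
import tempfile
from pathlib import Path


def train(prefix="/opt/ml"):
    """Toy training entry point that follows the SageMaker container layout."""
    prefix = Path(prefix)

    # Hyperparameters arrive as a JSON file; the values are strings.
    hp_path = prefix / "input" / "config" / "hyperparameters.json"
    hyperparameters = json.loads(hp_path.read_text()) if hp_path.exists() else {}
    epochs = int(hyperparameters.get("epochs", "1"))  # hypothetical hyperparameter

    # Each input channel is a directory; "train" is an assumed channel name.
    train_dir = prefix / "input" / "data" / "train"
    n_files = len(list(train_dir.glob("*"))) if train_dir.exists() else 0

    # Whatever is written to the model directory is saved by SageMaker.
    model_dir = prefix / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    artifact = model_dir / "model.json"
    artifact.write_text(json.dumps({"epochs": epochs, "training_files": n_files}))
    return artifact


# Local dry run against a temporary directory standing in for /opt/ml.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp) / "input" / "config"
    config_dir.mkdir(parents=True)
    (config_dir / "hyperparameters.json").write_text(json.dumps({"epochs": "5"}))
    print(train(prefix=tmp).read_text())
```

Keeping the prefix as a parameter makes the same script testable on a laptop before it is packaged into a container image.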
The ML Pipeline course does not mention this sub-domain at all. The AWS White Paper The Machine Learning Lens describes some services and provides a couple of reference architectures. The AWS Exam Readiness course describes this sub-domain in terms of three tiers and then lists loads of AWS services. From this sparse guidance it appears AWS wants us to have an overview of their services for Machine Learning.
These revision notes cover sub-domain 4.2 of the Machine Learning Implementation and Operations knowledge domain (domain 4). The four sub-domains are:
- 4.1 Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.
- 4.2 Recommend and implement the appropriate machine learning services and features for a given problem.
- 4.3 Apply basic AWS security practices to machine learning solutions.
- 4.4 Deploy and operationalize machine learning solutions.
If you are progressing through the exam structure in order, the next revision notes to study are those for sub-domain 4.3 which is about security in AWS.
- Photo by CHUTTERSNAP on Unsplash
- Dog with flower photo by Richard Brutyo on Unsplash
- TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
- The Apache Software Foundation Apache MXNet, MXNet, Apache, the Apache feather, and the Apache MXNet project logo are either registered trademarks or trademarks of the Apache Software Foundation.
- PyTorch, the PyTorch logo and any related marks are trademarks of Facebook, Inc.