The Machine Learning Production Environment
When you launch a Machine Learning solution in production, it needs to perform well to provide the business benefit it was designed for. There are two types of performance:
- The performance of the Model. Are the predictions good enough to provide business benefit?
- The technical performance of the Model in the production environment. How well does the production environment support the operation of the Model?
This Study Guide focuses on the production environment. The production environment can be assessed using five measures:
- performance
- availability
- scalability
- resiliency
- fault tolerance
There are three curated videos in this Study Guide:
- Deploy Your ML Models to Production at Scale with Amazon SageMaker
- Amazon ECS: Autoscaling for Containers
- Amazon EMR Notebooks
Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance is sub-domain 4.1 of the Machine Learning Implementation and Operations knowledge domain. For more information about the exam structure see: AWS Machine Learning exam syllabus
Questions
To confirm your understanding, scroll to the bottom of the page for 10 questions and answers.
Assessing the production environment
Performance in the production environment
The performance referred to here is the speed of the application, not the predictive performance of the model. It is measured as workload processed over time, for example:
- Megabytes per second
- Records per minute
- Predictions returned per minute
- Batch processing time in hours
The performance of the application depends on the size of the workload and the power of the resources available to process it. There may also be a set-up time if the resources are not immediately available.
Availability in the production environment
Availability is the capability of a Machine Learning application to keep working and be available for work even if there is an infrastructure failure.
A Machine Learning application can be regarded as available even if it is not working at full efficiency. Availability is measured over time, expressed as a percentage, and often called system uptime. To achieve this, components of the system are distributed across different AWS Availability Zones, usually referred to as AZs. These are separate data centers, physically isolated from each other, so that a problem at one does not affect the others. Each AWS Region has at least two, and usually three or more, AZs. When you use SageMaker and specify two or more instances, they are automatically placed in different AZs. If you choose a different deployment option, you are responsible for doing this yourself.
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/best-practices.html
- AWS Well-Architected Framework: https://wa.aws.amazon.com/wat.concept.availability.en.html
Scalability in the production environment
Scalability refers to the ability of a Machine Learning application to handle increases in workload without decreasing performance or output quality.
SageMaker automatically provisions more instances as demand increases and removes instances when the workload decreases. This is known as scaling out and scaling in, or horizontal scaling, and is controlled by parameters you can set. If you use another deployment option you are responsible for providing the infrastructure to achieve this.
- AWS Machine Learning Blog: Configuring autoscaling inference endpoints in Amazon SageMaker
Scaling strategies
- Simple scaling or TargetTrackingScaling: Use this option when you want to scale based on a specific Amazon CloudWatch metric, for example average CPUUtilization or SageMakerVariantInvocationsPerInstance (see the sketch after this list).
- Step scaling: Define additional policies to adjust the number of instances dynamically, giving a more aggressive response when demand reaches a certain level.
- Scheduled scaling: Used when the demand follows a particular schedule in the day, week, month, or year.
- On-demand scaling: When you want to increase or decrease the number of instances manually.
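As an illustration, a target-tracking policy can be attached to a SageMaker endpoint variant through the Application Auto Scaling API. This is a minimal sketch using boto3; the endpoint and variant names are hypothetical, and the capacity and cooldown values are examples only.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Hypothetical endpoint and variant names.
resource_id = "endpoint/my-endpoint/variant/AllTraffic"

# Register the endpoint variant as a scalable target with min/max capacity.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target-tracking policy: keep average invocations per instance near 70/minute.
autoscaling.put_scaling_policy(
    PolicyName="InvocationsTargetTracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,   # wait 60 s between scale-out actions
        "ScaleInCooldown": 300,   # wait 300 s between scale-in actions
    },
)
```

Step scaling uses the same client with PolicyType="StepScaling", and scheduled scaling uses put_scheduled_action.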
Resiliency in the production environment
Resiliency is the ability of a Machine Learning application to recover from problems and self-heal.
A resilient system will maintain the same level of performance and quality even if some of the underlying infrastructure has failed.
- AWS Well-Architected Framework: https://wa.aws.amazon.com/wat.concept.resiliency.en.html
- AWS: https://d1.awsstatic.com/whitepapers/architecture/AWS-Reliability-Pillar.pdf
- SageMaker docs: https://docs.aws.amazon.com/sagemaker/latest/dg/disaster-recovery-resiliency.html
Fault tolerance
Fault tolerance is the ability of a Machine Learning application to remain in operation even if some of the components used to build the system fail.
This is often achieved by providing redundant features that can be mobilised to continue the application's processing should the primary feature fail. In AWS, many of these redundant features do not have to be provisioned until a fault is identified.
- AWS Fault Tolerance white paper: https://docs.aws.amazon.com/whitepapers/latest/fault-tolerant-components/fault-tolerant-components.pdf
Deployment Options
Amazon SageMaker

- AWS docs: https://aws.amazon.com/sagemaker/
- AWS FAQs: https://aws.amazon.com/sagemaker/faqs/
SageMaker is a managed AWS service where AWS takes most of the responsibility for managing the Machine Learning components and the infrastructure on which they run. Models that run inside SageMaker in production are usually deployed as SageMaker endpoints.
A SageMaker endpoint is a REST interface behind which a model is hosted in Docker containers running on SageMaker managed EC2 instances.
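Once deployed, the endpoint is invoked through the SageMaker runtime API rather than a public URL. A minimal invocation sketch with boto3 follows; the endpoint name and CSV payload are hypothetical.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name and CSV payload.
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="text/csv",
    Body="4.6,3.1,1.5,0.2",
)

# The prediction is returned in the response body.
print(response["Body"].read().decode("utf-8"))
```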
Video – Deploy Your ML Models to Production at Scale with Amazon SageMaker
Performance
Processing in SageMaker is performed by Docker containers. These containers run on SageMaker instances that are specified by you and managed by SageMaker. There is probably one container per instance, although this is hidden from us. The instances will not appear in the EC2 console because they are managed by SageMaker, which means that not all the EC2 configuration options are exposed. However, you can choose the instance type, which determines the number of CPUs and the amount of memory the instance has. The instance family can also be chosen; each family is tuned for specific characteristics, for example compute-intensive or memory-oriented processing.
Availability
SageMaker provides high availability by spreading SageMaker instances across all available AZs in a Region. This can only happen if SageMaker is configured to have two or more SageMaker instances.
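A minimal sketch of an endpoint configuration that requests two instances of a chosen type, allowing SageMaker to spread them across AZs; the model, config, and endpoint names are hypothetical, and the model is assumed to exist already.

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical names; "my-model" must already exist in SageMaker.
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-model",
            "InstanceType": "ml.m5.xlarge",  # sets CPUs and memory
            "InitialInstanceCount": 2,       # two instances -> separate AZs
        }
    ],
)

sm.create_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config",
)
```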
Scalability
SageMaker Auto Scaling is a feature that enables SageMaker to provision more resources to handle increases in demand. The configurable parameters that control the provision of instances for an endpoint (set through the Application Auto Scaling API, as sketched under Scaling strategies above) are:
- Minimum number of instances
- Maximum number of instances
- Target value: the average number of invocations per instance per minute
- Scale-in cooldown, in seconds
- Scale-out cooldown, in seconds
Gotcha: scaling does not occur when there is zero traffic: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-prerequisites.html
AWS auto scaling: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html
AWS instance types: https://aws.amazon.com/ec2/instance-types/
Scaling using burstable instances
Some instance types can automatically increase their performance during periods of high demand. These instance families are referred to as burstable. An example is the T2 family, which increases the CPU available for a period of time when it is needed.
Resiliency
Because SageMaker is an AWS managed service, it inherits the benefits of running on resilient AWS infrastructure with its baked-in resiliency. In addition, SageMaker's reliance on Docker containers provides a self-healing capability, as non-responding containers can be terminated and replaced by SageMaker.
https://docs.aws.amazon.com/sagemaker/latest/dg/disaster-recovery-resiliency.html
Fault tolerance
The SageMaker service stack is distributed across multiple AZs on AWS’s fault tolerant infrastructure.
Amazon ECS

- AWS docs: https://aws.amazon.com/ecs/
- AWS FAQs: https://aws.amazon.com/ecs/faqs/
- AWS docs: https://aws.amazon.com/machine-learning/containers/
The Elastic Container Service (ECS) can be used to host Docker containers that package model artifacts in an inference container image. This may provide a cost advantage, but it also means the user is responsible for managing the system.
Performance
Being able to specify the number of containers and the size of the EC2 instances on which they run allows the user to engineer the performance they need.
Availability
The EC2 instances on which the containers run can be hosted in different Availability Zones to ensure availability.
Scalability
Auto-scaling can be configured by the user to increase the number of containers, and the EC2 instances they run on, when demand is high and to decrease them when demand falls, as sketched below.
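ECS service auto scaling is also driven through the Application Auto Scaling API. A minimal sketch with boto3; the cluster and service names are hypothetical, and the capacities and target value are examples only.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Hypothetical cluster and service names.
resource_id = "service/my-cluster/my-service"

# Register the service's desired task count as the scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Target-tracking policy: hold average CPU utilisation near 60%.
autoscaling.put_scaling_policy(
    PolicyName="CpuTargetTracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```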
Video – Amazon ECS: Autoscaling for Containers
This is a short video (3 minutes 33 seconds) about autoscaling with ECS by Nathan Peck. The timestamps are:
- 0:00 Define problem: scaling
- 0:39 Capturing statistics for autoscaling
- 1:17 Dashboard graphs of load over time
- 1:58 Different services have different demands
- 2:27 Auto-scaling integration
- 2:50 Summary of ECS auto-scaling
Resiliency
Auto-scaling allows the cluster to be resilient, as unhealthy EC2 instances can be terminated and replaced.
Fault tolerance
ECS inherits fault tolerance from the AWS architecture on which it is hosted.
Amazon EC2
Amazon provides Amazon Machine Images (AMIs) with deep learning frameworks pre-loaded. These can be combined with model artifacts to provide EC2-based Machine Learning. Powerful compute and GPU instances are available.
Performance
Performance is dependent on the power of the EC2 instance.
Availability
There is only a single EC2 instance, so availability depends on the instance and the AWS environment in the AZ.
Scalability
The size of the EC2 instance can be increased. This is vertical scaling, as sketched below.
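A minimal sketch of vertical scaling with boto3; the instance ID and target type are hypothetical, and the instance must be EBS-backed because it is stopped during the resize.

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # hypothetical instance ID

# The instance must be stopped before its type can be changed.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Resize to a larger instance type, then restart.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m5.2xlarge"},
)
ec2.start_instances(InstanceIds=[instance_id])
```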
Resiliency
Resiliency depends on snapshots (backups) of the EC2 instance's EBS storage.
Fault tolerance
There will be some fault tolerance in the AWS environment.
Amazon EMR
EMR features a performance-optimized runtime environment for Apache Spark. This can be over three times faster than standard Spark while remaining fully compatible with it, so your workloads run faster. Spark stores data in memory for increased performance. EMR can support Docker containers via Apache Hadoop, and provides fully managed Auto Scaling to dynamically add and remove capacity.
The aws-sagemaker-spark-sdk component is installed along with Spark. This allows SageMaker features to be used in the EMR Spark environment, including:
- Amazon-provided ML algorithms
- SageMaker endpoints
- user-provided ML algorithms built into SageMaker-compatible Docker containers
Amazon SageMaker Spark can be used to construct Spark Machine Learning pipelines using Amazon SageMaker stages.
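A minimal sketch of a SageMaker stage used from EMR Spark, based on the SageMaker Spark README; the IAM role ARN is hypothetical, and training_df / test_df are assumed to be existing DataFrames.

```python
from pyspark.sql import SparkSession
import sagemaker_pyspark
from sagemaker_pyspark import IAMRole
from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

# Put the SageMaker Spark jars on the driver classpath.
classpath = ":".join(sagemaker_pyspark.classpath_jars())
spark = (SparkSession.builder
         .config("spark.driver.extraClassPath", classpath)
         .getOrCreate())

# Hypothetical IAM role ARN; training runs on SageMaker, not on the cluster.
estimator = KMeansSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/SageMakerRole"),
    trainingInstanceType="ml.m4.xlarge",
    trainingInstanceCount=1,
    endpointInstanceType="ml.m4.xlarge",
    endpointInitialInstanceCount=1,
)
estimator.setK(10).setFeatureDim(784)

# fit() trains on SageMaker and deploys the model behind an endpoint;
# transform() then calls that endpoint for inference.
# training_df and test_df are assumed to be existing Spark DataFrames.
model = estimator.fit(training_df)
predictions = model.transform(test_df)
```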
- AWS docs: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-sagemaker.html
- AWS docs: https://aws.amazon.com/emr/features/spark/
- Github SageMaker Spark: https://github.com/aws/sagemaker-spark/blob/master/README.md
Video – Amazon EMR Notebooks
This is a 10-minute video by Amir Basirat from AWS.
Performance
EMR features a performance-optimized runtime environment for Apache Spark, which can be over three times faster than standard Spark installations.
Availability
EMR is highly available, with built-in automatic failover if any component stops operating.
Scalability
EMR has user-configured scaling policies to control the automatic scaling of nodes, as sketched after the link below.
AWS docs: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-automatic-scaling.html
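A minimal sketch of attaching an automatic scaling policy to an EMR instance group with boto3; the cluster and instance-group IDs are hypothetical, and the rule values are examples only.

```python
import boto3

emr = boto3.client("emr")

# Hypothetical cluster and instance-group IDs.
emr.put_auto_scaling_policy(
    ClusterId="j-1ABCDEFGHIJKL",
    InstanceGroupId="ig-ABCDEFGHIJKL",
    AutoScalingPolicy={
        "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},
        "Rules": [
            {
                "Name": "ScaleOutOnLowYARNMemory",
                "Description": "Add a node when available YARN memory is low",
                "Action": {
                    "SimpleScalingPolicyConfiguration": {
                        "AdjustmentType": "CHANGE_IN_CAPACITY",
                        "ScalingAdjustment": 1,
                        "CoolDown": 300,
                    }
                },
                "Trigger": {
                    "CloudWatchAlarmDefinition": {
                        "ComparisonOperator": "LESS_THAN",
                        "EvaluationPeriods": 1,
                        "MetricName": "YARNMemoryAvailablePercentage",
                        "Period": 300,
                        "Threshold": 15.0,
                        "Statistic": "AVERAGE",
                        "Unit": "PERCENT",
                    }
                },
            }
        ],
    },
)
```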
Resiliency
EMR is highly available, with built-in automatic failover to rapidly replace any component that stops operating.
Fault tolerance
EMR inherits fault tolerance from the AWS architecture on which it is hosted. EMR can also shut down unhealthy nodes and start new ones.
On premises
It is possible to run SageMaker model artifacts on premises with the TensorFlow or MXNet Machine Learning frameworks. In this scenario the user is responsible for everything: every aspect of the system (performance, availability, scalability, resiliency, and fault tolerance) depends on the hardware, software, and management systems the user has decided to install.
Summary
How well the production environment supports a Model in production can be assessed using five measures: performance, availability, scalability, resiliency, and fault tolerance. There are five deployment options: SageMaker endpoints, ECS, EC2, EMR, and on premises.
These revision notes cover sub-domain 4.1 of the Machine Learning Implementation and Operations knowledge domain (domain 4). The four sub-domains are:
- 4.1 Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.
- 4.2 Recommend and implement the appropriate machine learning services and features for a given problem.
- 4.3 Apply basic AWS security practices to machine learning solutions.
- 4.4 Deploy and operationalize machine learning solutions.
If you are progressing through the exam structure in order, the next sub-domain to be studied is sub-domain 4.2 which is about AWS Machine Learning services and their use cases.
Credits
Photo by Mateusz Stępień on Unsplash
AWS Certified Machine Learning Study Guide: Specialty (MLS-C01) Exam
This study guide provides the domain-by-domain specific knowledge you need to build, train, tune, and deploy machine learning models with the AWS Cloud. The online resources that accompany this Study Guide include practice exams and assessments, electronic flashcards, and supplementary online resources. It is available in both paper and Kindle versions for immediate access. (Visit Amazon books)
10 questions and answers
