A bird's nest with an egg, symbolising the production environment

The Machine Learning Production Environment

When you launch a Machine Learning solution in production it needs to perform well to provide the business benefit it was designed for. There are two types of performance:

  1. The performance of the Model. Are the predictions good enough to provide the business benefit?
  2. The technical performance of the Model in the production environment. How well does the production environment support the operation of the Model?

This Study Guide focuses on the production environment. The production environment can be assessed using five measures:

  1. performance
  2. availability
  3. scalability
  4. resiliency
  5. fault tolerance

There are three curated videos in this Study Guide.

“Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance” is sub-domain 4.1 of the Machine Learning Implementation and Operations knowledge domain. For more information about the exam structure see: AWS Machine Learning exam syllabus

Questions

To confirm your understanding scroll to the bottom of the page for 10 questions and answers.

Assessing the production environment

Performance in the production environment

The performance referred to here is the speed of the application, not the predictive performance of the model. It can be measured as workload processed over time.

For example:

  • Megabytes per second
  • Records per minute
  • Predictions returned per minute
  • Batch processing time in hours

The performance of the application depends on the size of the workload and the power of the resources available to process it. There may also be a setup time if the resources are not immediately available.

Availability in the production environment

Availability is the capability of a Machine Learning application to keep working and be available for work even if there is an infrastructure failure.

A Machine Learning application can be regarded as available even if it is not working at full efficiency. Availability is measured over time, expressed as a percentage, and is often called system uptime.

To achieve high availability, components of the system are distributed across different AWS Availability Zones, usually referred to as AZs. These are separate data centers, physically isolated from each other, so that a problem at one does not affect the others. Each AWS Region has at least two, and usually three or more, AZs. When you use SageMaker and specify two or more instances, they are automatically placed in different AZs. If you choose a different deployment option, you are responsible for doing this yourself.

Scalability in the production environment

Scalability refers to the ability of a Machine Learning application to handle increases in workload without decreasing performance or output quality.

SageMaker automatically provisions more instances as demand increases and removes instances when the workload decreases. This is known as scaling out and scaling in, or horizontal scaling, and is controlled by parameters you can set. If you use another deployment option, you are responsible for providing the infrastructure to achieve this.

Scaling strategies

  • Target-tracking scaling (TargetTrackingScaling): use this option when you want to scale based on a specific Amazon CloudWatch metric, for example average CPUUtilization or SageMakerVariantInvocationsPerInstance.
  • Step scaling: define additional policies that adjust the number of instances dynamically, for example a more aggressive response when demand passes a certain level.
  • Scheduled scaling: use this when demand follows a predictable schedule in the day, week, month, or year.
  • On-demand scaling: use this when you want to increase or decrease the number of instances manually.
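As a hedged sketch of the scheduled-scaling strategy above, the snippet below builds a `put_scheduled_action` request for the Application Auto Scaling API. The endpoint name, variant name, and cron expression are hypothetical placeholders, not values from this guide.

```python
# Sketch: scheduled scaling for a SageMaker endpoint variant via
# Application Auto Scaling. Endpoint/variant names are hypothetical.

def scheduled_action_request(endpoint_name, variant_name,
                             min_capacity, max_capacity, cron):
    """Build the put_scheduled_action request for a recurring capacity change."""
    return {
        "ServiceNamespace": "sagemaker",
        "ScheduledActionName": f"{endpoint_name}-peak-hours",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "Schedule": f"cron({cron})",  # e.g. 08:00 UTC every weekday
        "ScalableTargetAction": {
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
        },
    }

# To apply it (requires AWS credentials):
# import boto3
# request = scheduled_action_request("my-endpoint", "AllTraffic", 3, 6,
#                                    "0 8 ? * MON-FRI *")
# boto3.client("application-autoscaling").put_scheduled_action(**request)
```

The builder function only assembles the request parameters; the commented-out call shows where the real API would be invoked.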

Resiliency in the production environment

Resiliency is the ability of a Machine Learning application to recover from problems and self-heal.

A resilient system will maintain the same level of performance and quality even if some of the underlying infrastructure has failed.

Fault tolerance

Fault tolerance is the ability of a Machine Learning application to remain in operation even if some of the components used to build the system fail.

This is often achieved by providing redundant features that can be mobilised to continue with the application’s processing should the primary feature fail. In AWS many of these redundant features do not have to be provisioned until a fault is identified.

Deployment Options

Amazon SageMaker

SageMaker is a managed AWS service where AWS takes most of the responsibility for managing the Machine Learning components and the infrastructure on which they run. Models that run inside SageMaker in production are usually deployed as SageMaker endpoints.

A SageMaker endpoint is a REST interface behind which a model is hosted in Docker containers running on SageMaker managed EC2 instances.
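As a hedged sketch of what sits behind such an endpoint (all model, config, and endpoint names below are hypothetical placeholders), the endpoint configuration pairs a model with a fleet of hosting instances. Requesting two or more instances lets SageMaker spread them across AZs:

```python
# Sketch: the endpoint-config request behind a SageMaker endpoint.
# Names and instance type are hypothetical placeholders.

def endpoint_config_request(config_name, model_name,
                            instance_type="ml.m5.large", instance_count=2):
    """Build create_endpoint_config parameters; two or more instances
    allows SageMaker to place them in different Availability Zones."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": 1.0,
        }],
    }

# To deploy (requires AWS credentials and an existing model):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(**endpoint_config_request("my-config", "my-model"))
# sm.create_endpoint(EndpointName="my-endpoint",
#                    EndpointConfigName="my-config")
```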

Video – Deploy Your ML Models to Production at Scale with Amazon SageMaker
This video from AWS is 7.52 minutes long.

Performance

Processing in SageMaker is performed by Docker containers. These containers run on SageMaker instances that you specify and SageMaker manages; typically there is one container per instance, although this detail is hidden from you. The instances will not appear in the EC2 console because they are managed by SageMaker, which also means that not all of the EC2 configuration options are exposed. However, you can choose the instance type, which determines the number of CPUs and the amount of memory available, and the instance family, each of which is tuned for specific characteristics, for example compute-intensive or memory-intensive processing.

Availability

SageMaker provides high availability by spreading SageMaker instances across the available AZs in a Region. This can only happen if SageMaker is configured to have two or more instances.

AWS FAQs: https://aws.amazon.com/sagemaker/faqs/

Scalability

SageMaker Auto Scaling is a feature that enables SageMaker to provision more resources to handle increases in demand. The configurable parameters that control the provision of EC2 instances for an endpoint are:

  • Minimum number of instances
  • Maximum number of instances
  • Target value: the number of invocations per minute, per instance
  • Scale-in cool down, in seconds
  • Scale-out cool down, in seconds

Gotcha: automatic scaling does not scale in when there is zero traffic: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-prerequisites.html
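The parameters listed above map directly onto a target-tracking scaling policy. Below is a hedged sketch (endpoint and variant names are hypothetical placeholders) of the request that attaches such a policy to an endpoint variant:

```python
# Sketch: a target-tracking scaling policy for a SageMaker endpoint
# variant, tracking invocations per minute per instance.

def target_tracking_policy(endpoint_name, variant_name,
                           target_invocations=70,
                           scale_in_cooldown=300, scale_out_cooldown=60):
    """Build the put_scaling_policy request for Application Auto Scaling."""
    return {
        "PolicyName": f"{endpoint_name}-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target value: invocations per minute, per instance
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": scale_in_cooldown,    # seconds
            "ScaleOutCooldown": scale_out_cooldown,  # seconds
        },
    }

# To apply (requires AWS credentials): register the min/max first,
# then attach the policy.
# import boto3
# aas = boto3.client("application-autoscaling")
# aas.register_scalable_target(
#     ServiceNamespace="sagemaker",
#     ResourceId="endpoint/my-endpoint/variant/AllTraffic",
#     ScalableDimension="sagemaker:variant:DesiredInstanceCount",
#     MinCapacity=1, MaxCapacity=4)
# aas.put_scaling_policy(**target_tracking_policy("my-endpoint", "AllTraffic"))
```

Minimum and maximum instance counts live on the scalable target; the target value and the two cool-down periods live on the policy itself.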

AWS auto scaling: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html

AWS instance types: https://aws.amazon.com/ec2/instance-types/

https://www.freecodecamp.org/news/what-we-learned-by-serving-machine-learning-models-at-scale-using-amazon-sagemaker-ad1d974d8dca/

Scaling using burstable instances

Some instance types can automatically increase their performance during periods of high demand. These instance families are referred to as burstable. An example is the T2 family, which increases the CPU available for a period of time when it is needed.
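Burstable instances spend CPU credits while bursting, and the remaining balance is visible in CloudWatch. A hedged sketch of the metric query (the instance id is a hypothetical placeholder):

```python
# Sketch: querying a burstable (T-family) instance's remaining CPU
# credit balance from CloudWatch. Instance id is hypothetical.

def credit_balance_query(instance_id, start, end):
    """Build the get_metric_statistics request for CPUCreditBalance."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": 300,             # five-minute datapoints
        "Statistics": ["Average"],
    }

# To run (requires AWS credentials):
# import boto3
# from datetime import datetime, timedelta
# now = datetime.utcnow()
# query = credit_balance_query("i-0123456789abcdef0",
#                              now - timedelta(hours=1), now)
# boto3.client("cloudwatch").get_metric_statistics(**query)
```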

Resiliency

Because SageMaker is an AWS managed service, it inherits the benefits of running on resilient AWS infrastructure with its baked-in resiliency. In addition, SageMaker's reliance on Docker containers provides a self-healing capability, as non-responding containers can be terminated and replaced by SageMaker.

https://docs.aws.amazon.com/sagemaker/latest/dg/disaster-recovery-resiliency.html

Fault tolerance

The SageMaker service stack is distributed across multiple AZs on AWS’s fault tolerant infrastructure.

AWS FAQs: https://aws.amazon.com/sagemaker/faqs/

Amazon ECS

The Elastic Container Service (ECS) can be used to host Docker containers that package model artifacts in an inference container image. This may provide a cost advantage, but it also means the user is responsible for managing the system.

Performance

Being able to specify the number of containers and the size of the EC2 instances on which they run allows the user to engineer the performance they need.

Availability

The EC2 instances on which the containers run can be hosted in different Availability Zones to ensure availability.

Scalability

Auto-scaling can be configured by the user to increase the number of containers, and the EC2 instances they run on, when demand is high and to decrease them when demand falls.
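ECS services scale through the same Application Auto Scaling API as SageMaker endpoints, just with a different resource and dimension. A hedged sketch (cluster and service names are hypothetical placeholders):

```python
# Sketch: target-tracking auto scaling for an ECS service, tracking
# average CPU utilisation. Cluster/service names are hypothetical.

def ecs_scaling_policy(cluster, service, target_cpu=60.0):
    """Build the put_scaling_policy request for an ECS service."""
    return {
        "PolicyName": f"{service}-cpu-tracking",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,  # hold average CPU near this percentage
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
        },
    }

# To apply (requires AWS credentials and a registered scalable target):
# import boto3
# boto3.client("application-autoscaling").put_scaling_policy(
#     **ecs_scaling_policy("inference-cluster", "model-service"))
```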

Video – Amazon ECS: Autoscaling for Containers

This is a short (3 minutes 33 seconds) video about autoscaling with ECS by Nathan Peck of AWS. The timestamps are:

  • 0:00 Define problem: scaling
  • 0:39 Capturing statistics for autoscaling
  • 1:17 Dashboard graphs of load over time
  • 1:58 Different services have different demands
  • 2:27 Auto-scaling integration
  • 2:50 Summary of ECS auto-scaling

Resiliency

Auto-scaling allows the cluster to be resilient as unhealthy EC2 instances can be terminated and replaced.

Fault tolerance

ECS inherits fault tolerance from the AWS architecture on which it is hosted.

Amazon EC2

Amazon provides Amazon Machine Images (AMIs) with deep learning frameworks pre-loaded. These can be combined with model artifacts to provide EC2-based Machine Learning. Powerful compute and GPU instances are available.

Performance

Performance is dependent on the power of the EC2 instance.

Availability

There is only a single EC2 instance, so availability depends on that instance and the AWS environment in its AZ.

Scalability

The size of the EC2 instance can be increased. This is vertical scaling.
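Vertical scaling of an EC2 instance means stopping it, changing its instance type, and starting it again. A hedged sketch (the instance id and target type are hypothetical placeholders):

```python
# Sketch: vertical scaling of a single EC2 instance by changing its
# instance type. The instance must be stopped before the type change.

def resize_instance(ec2, instance_id, new_type):
    """Stop the instance, change its type, then start it again."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )
    ec2.start_instances(InstanceIds=[instance_id])

# To run (requires AWS credentials; ids are hypothetical):
# import boto3
# resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "p3.2xlarge")
```

Note that this involves downtime, which is one reason horizontal scaling is usually preferred where the workload allows it.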

Resiliency

Resiliency depends on snapshots (backups) of the EC2 instance's EBS storage.
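A hedged sketch of taking such a snapshot (the volume id and tag values are hypothetical placeholders):

```python
# Sketch: snapshotting the EBS volume behind an EC2 inference host.
# The volume id and model name are hypothetical placeholders.

def snapshot_request(volume_id, model_name):
    """Build the create_snapshot request with a descriptive tag."""
    return {
        "VolumeId": volume_id,
        "Description": f"Backup of {model_name} inference host",
        "TagSpecifications": [{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "model", "Value": model_name}],
        }],
    }

# To run (requires AWS credentials):
# import boto3
# boto3.client("ec2").create_snapshot(
#     **snapshot_request("vol-0123456789abcdef0", "churn-model"))
```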

Fault tolerance

There will be some fault tolerance in the AWS environment.

Amazon EMR

EMR features a performance-optimized runtime environment for Apache Spark. This can be over three times faster than standard Spark while remaining fully compatible with it, so your workloads run faster. Spark stores data in memory for increased performance. EMR can support Docker containers via Apache Hadoop, and provides fully managed Auto Scaling to dynamically add and remove capacity.

The aws-sagemaker-spark-sdk component is installed along with Spark. This allows SageMaker features to be used in the EMR Spark environment including using:

  • Amazon-provided ML algorithms
  • SageMaker endpoints
  • user-provided ML algorithms built into SageMaker-compatible Docker containers

Amazon SageMaker Spark can be used to construct Spark Machine Learning pipelines using Amazon SageMaker stages.

Video – Amazon EMR Notebooks

This is a 10 minute video by Amir Basirat from AWS.

Performance

EMR features a performance-optimized runtime environment for Apache Spark which can be over three times faster than a standard Spark installation.

Availability

EMR is highly available with built in automatic failover if any component stops operating.

AWS docs: https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-emr-announces-support-for-multiple-master-nodes-to-enable-high-availability-for-EMR-applications/

Scalability

EMR has user configured scaling policies to control automatic scaling of nodes.
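As a hedged sketch of such a policy (the threshold, limits, and ids are hypothetical placeholders), an EMR automatic scaling rule pairs a CloudWatch alarm with a capacity adjustment:

```python
# Sketch: an EMR automatic scaling policy that adds a node when
# available YARN memory drops low. All values are hypothetical.

def emr_scaling_policy(min_nodes=2, max_nodes=10):
    """Build the AutoScalingPolicy structure for put_auto_scaling_policy."""
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [{
            "Name": "ScaleOutOnLowMemory",
            "Action": {
                "SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": 1,   # add one node at a time
                    "CoolDown": 300,
                },
            },
            "Trigger": {
                "CloudWatchAlarmDefinition": {
                    "ComparisonOperator": "LESS_THAN",
                    "EvaluationPeriods": 1,
                    "MetricName": "YARNMemoryAvailablePercentage",
                    "Namespace": "AWS/ElasticMapReduce",
                    "Period": 300,
                    "Threshold": 15.0,
                    "Unit": "PERCENT",
                },
            },
        }],
    }

# To apply (requires AWS credentials; cluster/group ids are placeholders):
# import boto3
# boto3.client("emr").put_auto_scaling_policy(
#     ClusterId="j-XXXXXXXXXXXXX", InstanceGroupId="ig-XXXXXXXXXXXX",
#     AutoScalingPolicy=emr_scaling_policy())
```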

AWS docs: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-automatic-scaling.html

Resiliency

EMR is highly available with built in automatic failover to rapidly replace any component that stops operating.

Fault tolerance

EMR inherits fault tolerance from the AWS architecture on which it is hosted. EMR can also shut down unhealthy nodes and start new ones.

On premises

It is possible to run SageMaker model artifacts on premises using the TensorFlow or MXNet Machine Learning frameworks. In this scenario the user is responsible for everything. Every aspect of the system (performance, availability, scalability, resiliency, and fault tolerance) depends on the hardware, software, and management systems the user has decided to install.

Summary

How the production environment supports a Model in production can be assessed using five measures: performance, availability, scalability, resiliency, and fault tolerance. There are five deployment options: SageMaker endpoints, ECS, EC2, EMR, and on premises.

These revision notes cover sub-domain 4.1 of the Machine Learning Implementation and Operations knowledge domain (domain 4).

If you are progressing through the exam structure in order, the next sub-domain to be studied is sub-domain 4.2 which is about AWS Machine Learning services and their use cases.

Credits

Photo by Mateusz Stępień on Unsplash


AWS Certified Machine Learning Study Guide: Specialty (MLS-C01) Exam

This study guide provides the domain-by-domain specific knowledge you need to build, train, tune, and deploy machine learning models with the AWS Cloud. The online resources that accompany this Study Guide include practice exams and assessments, electronic flashcards, and supplementary online resources. It is available in both paper and Kindle versions for immediate access. (Visit Amazon books)


10 questions and answers

Created by Michael Stainsbury

4.1 The Machine Learning Production Environment (full)

10 quiz style questions covering subdomain 4.1, Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.

1 / 10

How does Amazon enable hosting on (non-SageMaker) EC2 instances?

2 / 10

How are SageMaker features used in the EMR Spark environment?

3 / 10

What does SageMaker Auto Scaling do?

4 / 10

5 / 10

What are models that run inside SageMaker deployed on?

6 / 10

How does SageMaker provide high availability?

7 / 10

System uptime is another name for <–?–>.

8 / 10

What are the measures for assessing the production environment?

  1. performance
  2. availability
  3. scalability
  4. resiliency
  5. fault tolerance

9 / 10

<–?–> is the capability of a Machine Learning application to keep working and be ready for work even if there is an infrastructure failure.

10 / 10

<–?–> is the ability for a Machine Learning application to remain in operation even if some of the components used to build the system fail.





