Deploy and operationalize machine learning solutions
This Study Guide describes how to deploy a Machine Learning Model into a production environment and how to monitor it once it is deployed. The foundations of a reliable production environment are good Software Management and Software Engineering. The emerging job role of ML Ops, which is derived from DevOps, is focused on delivering the operational architecture for Machine Learning. The AWS White Paper Machine Learning Lens has a section on Operational Excellence on pages 35 – 45:
- AWS White Paper: Machine Learning Lens
Scroll to the bottom of the page for the questions and answers test app.
Curated videos
- Video: An introduction to MLOps on Google Cloud
- Video: Deploy Your ML Models to Production at Scale with Amazon SageMaker
- Video: Inawisdom: Machine Learning and Automated Model Retraining with SageMaker
- Video: Introducing Amazon SageMaker Clarify, part 1 – Bias detection – AWS re:Invent 2020
This Study Guide covers sub-domain 4.4, Deploy and operationalize machine learning solutions. A description of all the knowledge domains in the exam is in this article: AWS Machine Learning exam syllabus
Video: An introduction to MLOps on Google Cloud
This is a 23:55 minute video from Google Cloud Platform (GCP). The first 17:30 covers conceptual MLOps subjects without referring to GCP services. This video is a good introduction to MLOps and covers many of the subjects expanded on below.
Software Engineering and Management for Machine Learning
A production Machine Learning Model needs to be managed in the same way as any other software system. The Machine Learning function may be part of a larger system and will need to integrate with pre-existing systems and business processes such as:
- Security
- Logging and Monitoring
- API versioning
Security
Security concerns both the security of the Machine Learning system and the data. If the data is anonymised it may need no more security than that provided by the system in which it is used. However, if the data contains Personally Identifiable Information (PII) or financial data it may require higher levels of security. Security for Machine Learning environments is discussed in a separate Study Guide.
Logging and Monitoring
Logging and Monitoring in the AWS environment is achieved by CloudWatch and CloudTrail. Both are discussed later in these revision notes.
Task management
With large, complex systems and teams of engineers you need to track the tasks and changes made to the system. This enables Change Management and promotes engineering accountability for the changes made. There are many commercial Task Management systems on the market. One popular choice is JIRA from Atlassian, which has a free version that can be used for small projects.
Version control
Machine Learning models will need to be improved, replaced or retrained. Multiple versions of models and configurations will accumulate, and you will need a repository to manage version control. The most popular repositories are based on Git. AWS CodeCommit is the AWS Git-based service for version control. SageMaker model artifacts can be stored in a CodeCommit repository. Atlassian's repository, Bitbucket, is another popular choice. Bitbucket can be installed locally on AWS or used as a cloud-based service.
AWS CodeCommit

- AWS docs: https://aws.amazon.com/codecommit
- AWS FAQs: https://aws.amazon.com/codecommit/faqs/
Atlassian Bitbucket

- Atlassian docs: https://bitbucket.org/product
- Atlassian overview: https://bitbucket.org/product/guides/getting-started/overview#a-brief-overview-of-bitbucket
Testing
End-to-end testing
End-to-end testing runs the system's entire workflow from beginning to end, aiming to reproduce expected scenarios and exercise all of the systems and services that the Machine Learning system integrates with.
A/B testing
- AWS docs: https://aws.amazon.com/blogs/machine-learning/a-b-testing-ml-models-in-production-using-amazon-sagemaker/
- Canary deployment, Blue/Green deployment and A/B testing compared
In Machine Learning, A/B testing is used to compare the performance of a new model variant with the current one. In SageMaker, the proportion of traffic routed to each model variant is configured using the variant weight. Using this feature, more than two variants can be tested at the same time if desired. See the Deployment section below.
API versioning
SageMaker endpoints allow multiple production model variants to be deployed. Traffic is shared between the variants according to a configurable weighting parameter. This capability allows A/B testing, Blue/Green and Canary deployments to be performed.
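As an illustration, here is a minimal boto3 sketch of an endpoint configuration that hosts two production variants and splits traffic 90/10 between them. The model names, instance types and weights are placeholders rather than real resources.

```python
# Minimal sketch: an endpoint configuration with two production variants.
# Traffic share = variant weight / sum of all weights, so 0.9 and 0.1
# give a 90/10 split. All names here are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="churn-ab-test-config",         # hypothetical name
    ProductionVariants=[
        {
            "VariantName": "current-model",
            "ModelName": "churn-model-v1",              # existing SageMaker model
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 2,                  # two instances span two AZs
            "InitialVariantWeight": 0.9,                # ~90% of traffic
        },
        {
            "VariantName": "new-model",
            "ModelName": "churn-model-v2",              # candidate model
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 2,
            "InitialVariantWeight": 0.1,                # ~10% of traffic
        },
    ],
)
```

The weights can later be changed without downtime using UpdateEndpointWeightsAndCapacities, which is the basis of the Blue/Green and Canary strategies described later.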
Reliability and failover
Reliability is the ability of a system to recover from disruption. For Machine Learning systems inside the SageMaker managed environment much of the work is already done for you, with just a little tweaking needed to maximise reliability. For example, if you specify a minimum of two SageMaker managed instances, SageMaker will automatically provision the instances in different Availability Zones (AZs). This means that if there is a problem in one AZ the instance in the second AZ will still operate. When SageMaker detects that an instance is unhealthy it provisions a new one in a different AZ. This is called failover.
If you are deploying your Machine Learning Model outside SageMaker you will have to take on more responsibility for Reliability yourself.
ML Ops
- Wikipedia definition: https://en.wikipedia.org/wiki/MLOps
- AWS White Paper, pages 39-40: Machine Learning Lens
Machine Learning Operations (ML Ops) is a relatively new discipline descended from DevOps and Model Ops. It seeks to improve the quality of Production Machine Learning management by establishing an operational architecture to manage Machine Learning Model deployment, updates and operation.
ML Ops is concerned with:
- Model generation, SDLC, CI/CD
- Orchestration
- Deployment
- System health
- Diagnostics
- Governance
- Business metrics
The foundation of the Machine Learning Software Development Life Cycle (SDLC) is a repository to store and version control:
- Model and configuration source code
- Model data
- Model artefacts
The repository provides the opportunity to enforce Change Control. This gives traceability for debugging investigations and the ability to roll back to a previous version if needed.
The Machine Learning SDLC is summarised in a diagram in the AWS White Paper, Machine Learning Lens, on page 40:
- AWS White paper: https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf
The AWS tools that assist in this process are:
CodeBuild

- AWS docs: https://aws.amazon.com/codebuild/
- AWS FAQs: https://aws.amazon.com/codebuild/faqs
AWS CodeBuild is a managed build service that compiles source code, runs tests and produces deployment packages, for example for Lambda functions.
CodePipeline

- AWS docs: https://aws.amazon.com/codepipeline/
- AWS FAQs: https://aws.amazon.com/codepipeline/faqs
AWS CodePipeline orchestrates the movement of workloads through the Machine Learning system. It provides a continuous delivery service that can trigger each stage of the Machine Learning pipeline in the correct order.
CodeCommit

- AWS docs: https://aws.amazon.com/codecommit/
- AWS FAQs: https://aws.amazon.com/codecommit/faqs/
AWS CodeCommit is a managed source control service that hosts Git-based repositories.
AWS CloudFormation
AWS CloudFormation automates Infrastructure as Code and Configuration as Code to provide:
- Consistency
- Repeatable processes across environments
- An automated mechanism to orchestrate workload movements
SageMaker deployment

- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html
- AWS: https://aws.amazon.com/blogs/machine-learning/creating-a-machine-learning-powered-rest-api-with-amazon-api-gateway-mapping-templates-and-amazon-sagemaker/
- Inawisdom: https://www.inawisdom.com/machine-learning/amazon-sagemaker-endpoints-inference
When a model is ready to go into production there are a number of deployment options, one of which is to use Amazon SageMaker as the target deployment environment. SageMaker has a rich set of features to support Machine Learning models in production. An Amazon SageMaker endpoint is a fully managed service that allows you to make real-time inferences via a REST API. Behind the scenes, SageMaker provisions EC2 instances that run Docker containers hosting your model. You can configure the type and number of instances. So when we talk about a SageMaker endpoint we are often referring to the substantial compute capacity that SageMaker has created for us, and the Model that is running on it.
There are three steps to creating a SageMaker Endpoint:
- Create the model
- Create the endpoint config
- Create HTTPS endpoint
- AWS White Paper, pages 29-30: https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf
Create a model
To create the Model you provide the following information, some of which is referenced again in the subsequent steps:
- The Amazon S3 path where the model artifacts are stored. This must be in the same AWS Region as the SageMaker service you are deploying to.
- The Docker registry path for the image that contains the inference code.
- A name that you can use as an identifier.
Create the endpoint configuration
When creating the Endpoint configuration you will specify:
- The Model to be hosted. You can specify more than one Model and identify the specific variants.
- The SageMaker managed EC2 instances that will run the Docker containers hosting your Model. You can specify the EC2 instance type and the number of instances.
Create HTTPS endpoint
Using the endpoint configuration SageMaker provisions the SageMaker EC2 instances and all the necessary infrastructure to expose an HTTPS REST API. Your Model is now ready for inferencing.
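The three steps map directly onto three API calls. Below is a minimal boto3 sketch; the S3 path, container image URI, IAM role and resource names are placeholders.

```python
# Minimal sketch of the three SageMaker deployment steps. All ARNs,
# names and paths are placeholders.
import boto3

sm = boto3.client("sagemaker")

# 1. Create the model: S3 model artifacts plus the inference container image.
sm.create_model(
    ModelName="my-model",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest",
        "ModelDataUrl": "s3://my-bucket/model/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)

# 2. Create the endpoint configuration: which model(s) to host and on what instances.
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 2,   # two or more instances are spread across AZs
        "InitialVariantWeight": 1.0,
    }],
)

# 3. Create the HTTPS endpoint; SageMaker provisions the instances and containers.
sm.create_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config",
)
sm.get_waiter("endpoint_in_service").wait(EndpointName="my-endpoint")

# Invoke the endpoint for a real-time inference (CSV payload as an example).
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="text/csv",
    Body="0.5,1.2,3.4",
)
print(response["Body"].read())
```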
Video: Deploy Your ML Models to Production at Scale with Amazon SageMaker
This is a 7:52 minute video from AWS by Emily Webber. The first two and a half minutes are the most relevant to SageMaker endpoints. The timestamps are:
- 0:00 – 2:31: SageMaker endpoint
- 2:32 – 5:26: Example using the BlazingText algorithm in a Jupyter Notebook
- 5:26 – 7:52: Pro tips
Autoscaling
The SageMaker endpoint you have just created can also scale with the workload. As more production inference requests are received, SageMaker provisions more instances and deploys more copies of the Model to process them. The parameters for autoscaling, such as the maximum number of instances, can be configured, and scaling is driven by CloudWatch metrics that trigger the scaling action.
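Below is a hedged sketch of configuring autoscaling through the Application Auto Scaling API: the production variant is registered as a scalable target and a target-tracking policy is attached to the built-in SageMakerVariantInvocationsPerInstance metric. The endpoint name, variant name, capacities and target value are illustrative.

```python
# Sketch: target-tracking autoscaling for a SageMaker endpoint variant.
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "endpoint/my-endpoint/variant/AllTraffic"   # placeholder endpoint/variant

# Register the variant as a scalable target with min/max instance counts.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Scale out/in to keep invocations per instance near the target value.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,   # example: invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```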
Monitoring
Video: Inawisdom: Machine Learning and Automated Model Retraining with SageMaker
This is a 7:56 minute video from AWS with Robin Meehan and Shafreen Sayyed.
Why monitor models?
The performance of Models can change over time due to:
- a change in the data – data drift
- a change in the context of the target variable – concept drift
- unexpected factors
Monitoring is cumbersome but critical; the outcome is model re-training.
What are concept and data drift?
- https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf
- https://neptune.ai/blog/concept-drift-best-practices
- https://en.wikipedia.org/wiki/Concept_drift
When the data in general changes, the problem is called data drift, whereas a change in the context of the target variable is called concept drift.
With Concept Drift, the statistical properties of the target variable change over time in ways that could not be predicted when the model was designed. Data Drift occurs when the data has changed over time but the statistical properties of the target variable are unchanged. Data Drift leads to degrading model performance that may be corrected by re-training the model. For Concept Drift, simple re-training may be insufficient and it may be necessary to revisit preparation steps such as Feature Engineering.
What should be monitored?
- https://christophergs.com/machine learning/2020/03/14/how-to-monitor-machine-learning-models
- https://towardsdatascience.com/monitoring-your-machine-learning-model-6cf98c106e99
Defining what needs to be monitored is still an evolving area in Data Science. The following monitoring techniques were mentioned in the AWS Exam Readiness course.
Data change tracking
Tracking the input data can identify when it has changed significantly compared to the data on which the Model was trained.
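As a simple illustration of the idea (one possible approach, not an AWS feature), a two-sample Kolmogorov–Smirnov test can compare the distribution of a numeric feature in recent production traffic against the training data. The data and threshold below are made up.

```python
# Sketch: flag possible data drift on one numeric feature with a KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)     # baseline sample
production_feature = rng.normal(loc=0.3, scale=1.0, size=1000)   # recent, drifted sample

statistic, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.01:   # arbitrary significance threshold
    print(f"Possible data drift detected (KS={statistic:.3f}, p={p_value:.4f})")
```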
Error rate
An increase in the error rate can indicate Data Drift or Concept Drift. Errors in both the input data and the output inferences have to be defined and monitored.
Error class proportion
Whilst the error rate may be unchanged, the composition of the errors may have shifted. Data and Concept Drift can be detected by comparing changes in the proportions of the error classes.
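One way to check this (again an illustration, not an AWS feature) is a chi-square test comparing error class counts in a baseline window with those in a recent window. The counts below are invented.

```python
# Sketch: has the mix of error classes shifted between two time windows?
from scipy.stats import chi2_contingency

# Rows are windows, columns are counts per error class (e.g. timeout, 4xx, bad input).
baseline_counts = [120, 45, 30]
recent_counts = [60, 110, 25]

chi2, p_value, dof, _ = chi2_contingency([baseline_counts, recent_counts])
if p_value < 0.01:   # arbitrary significance threshold
    print(f"Error class proportions have shifted (chi2={chi2:.1f}, p={p_value:.4f})")
```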
How are Models monitored?
The key AWS services for monitoring Machine Learning Models are CloudWatch and CloudTrail. SageMaker also has a built-in feature to assist with monitoring: Amazon SageMaker Model Monitor.
CloudWatch

- AWS docs: https://aws.amazon.com/cloudwatch/
- AWS docs: Monitor Amazon SageMaker with Amazon CloudWatch
- AWS FAQs: https://aws.amazon.com/cloudwatch/faqs/
Four important features of Amazon CloudWatch are: Logs, Metrics, Events and Alarms.
CloudWatch Logs
CloudWatch Logs enables you to monitor, store and access logs ordered by time. Sources generate Log Events, which are presented as a Log Stream in time order. A Log Stream contains Log Events that originated from the same Source. Log Streams are grouped together into Log Groups, which enables the Log Streams in a group to share the same:
- Retention
- Monitoring
- Access control
Sources can be:
- EC2, using the CloudWatch log agent
- AWS services; some will need a CloudWatch Logs resource policy
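A minimal boto3 sketch of this structure is shown below: it creates a Log Group with a retention policy, adds a Log Stream and writes a Log Event. The group and stream names are placeholders.

```python
# Sketch: Log Group -> Log Stream -> Log Events, with retention set on the group.
import time
import boto3

logs = boto3.client("logs")

logs.create_log_group(logGroupName="/my-app/inference")               # placeholder name
logs.put_retention_policy(logGroupName="/my-app/inference",
                          retentionInDays=30)                         # shared retention
logs.create_log_stream(logGroupName="/my-app/inference",
                       logStreamName="instance-1")                    # one source, one stream

logs.put_log_events(
    logGroupName="/my-app/inference",
    logStreamName="instance-1",
    logEvents=[{
        "timestamp": int(time.time() * 1000),   # milliseconds since the epoch
        "message": "prediction latency_ms=42 status=OK",
    }],
)
```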
CloudWatch Metrics
CloudWatch Metrics can be published directly by AWS services or created from Metric Filters that match Log Events in a Log Stream. CloudWatch Metrics can be displayed in a graph or used by a CloudWatch Alarm.
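Continuing the sketch above, a Metric Filter can turn matching Log Events into a custom CloudWatch Metric; the filter pattern and metric names are illustrative.

```python
# Sketch: count log lines containing "status=ERROR" as a custom metric.
import boto3

logs = boto3.client("logs")

logs.put_metric_filter(
    logGroupName="/my-app/inference",          # log group from the previous sketch
    filterName="prediction-errors",
    filterPattern='"status=ERROR"',            # match events containing this term
    metricTransformations=[{
        "metricName": "PredictionErrors",
        "metricNamespace": "MyApp/Inference",
        "metricValue": "1",                    # each matching event adds 1
        "defaultValue": 0.0,
    }],
)
```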
CloudWatch Events and EventBridge
CloudWatch Events is being superseded by Amazon EventBridge. Currently the functionality is the same, so you can use either. CloudWatch Events comprises three features:
- Events are produced by Supported Services. Unsupported Services can be tracked via CloudTrail.
- Rules route events from Sources to Targets.
- Targets process incoming events which are formatted as JSON documents.
CloudWatch Alarms
There are two types of CloudWatch Alarms:
- Metric Alarm which watches CloudWatch Metrics
- Composite Alarm which watches other Alarms
CloudWatch Alarms are triggered by considering three properties:
- Period is the length of time over which the metric is aggregated to create each data point.
- Evaluation periods is the number of most recent periods (data points) to check when deciding whether the alarm criteria are met.
- Data points to alarm is the number of those data points that must be breaching to trigger the alarm.
Missing data
Alarms can be configured to treat missing data in one of four ways:
- Not breaching – good
- Breaching – bad
- Ignore – current state maintained
- Missing – Insufficient Data state adopted
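The sketch below ties these properties together: a boto3 put_metric_alarm call on the custom metric created earlier, showing Period, EvaluationPeriods, DatapointsToAlarm and the treatment of missing data. The threshold and the SNS topic ARN are placeholders.

```python
# Sketch: alarm when prediction errors breach a threshold in 3 of the last 5 minutes.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="prediction-error-rate-high",
    Namespace="MyApp/Inference",
    MetricName="PredictionErrors",             # metric from the Metric Filter sketch
    Statistic="Sum",
    Period=60,                                 # each data point covers 60 seconds
    EvaluationPeriods=5,                       # look at the last 5 data points
    DatapointsToAlarm=3,                       # 3 of those 5 must breach to alarm
    Threshold=10.0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",           # missing data treated as good
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-ops-alerts"],  # placeholder SNS topic
)
```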
Maths expressions
CloudWatch Alarms can evaluate mathematical expressions that use up to ten metrics. If you need to test more metrics you could try a Composite Alarm (an alarm of alarms) or use a Lambda function.
Percentile and low data alarms
When a percentile alarm does not receive enough data, it can be configured either to evaluate anyway or to wait until enough data arrives before evaluating.
CloudTrail

- AWS docs: https://aws.amazon.com/cloudtrail/
- AWS FAQs: https://aws.amazon.com/cloudtrail/faqs/
CloudTrail collects data about AWS API calls, which can then be stored in an S3 bucket for analysis. CloudTrail can also be monitored from CloudWatch by configuring the Trail to send its data to a CloudWatch Log Group.
SageMaker Model Monitor

- AWS docs: https://aws.amazon.com/sagemaker/model-monitor/
- AWS developer guide: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html
By using SageMaker Model Monitor you can automatically monitor your model in production and be alerted when Data Drift or Concept Drift occurs. The alerting feature can also be used to automatically start retraining your model. Drift is detected by monitoring the quality of the model based on its inputs (the features, or independent variables) and its outputs (the dependent variables).
What does SageMaker Model Monitor do?
SageMaker Model Monitor monitors models in production, detects errors and emits per-feature metrics to CloudWatch. This means you do not have to develop this monitoring capability yourself.
What types of drift are monitored by Model Monitor?
There are four types of drift monitored by Model Monitor:
- Data quality
- Model quality
- Model bias (SageMaker Clarify)
- Feature attribution drift (SageMaker Clarify)
Model Monitor compares the quality of production data with that of the training data, using rules to detect Data Drift and alerting you when it occurs.
Model quality is measured by comparing the model's predictions with the actual labels the model was trained to predict. This is done by merging production data with the actual labels and evaluating the predictions against them.
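Below is a sketch of a data quality monitoring schedule using the SageMaker Python SDK; exact signatures can vary between SDK versions, and the S3 paths, IAM role and endpoint name are placeholders. It assumes the endpoint was deployed with data capture enabled.

```python
# Sketch: baseline the training data, then check captured endpoint traffic hourly.
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"   # placeholder role

monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Compute baseline statistics and constraints from the training data set.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline",
)

# Compare captured production traffic against the baseline every hour.
monitor.create_monitoring_schedule(
    monitor_schedule_name="my-endpoint-data-quality",
    endpoint_input="my-endpoint",              # endpoint must have data capture enabled
    output_s3_uri="s3://my-bucket/monitor/reports",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```

Violations found by the schedule appear as per-feature CloudWatch metrics, which can then drive the alarms described earlier or trigger automated retraining.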
Amazon SageMaker Clarify
Amazon SageMaker Clarify provides bias monitoring. Model Bias occurs when training and production data differ. This difference may be temporary, or a permanent shift in the real world data.
Amazon SageMaker Clarify also provides feature attribution drift monitoring. Feature attribution drift occurs when drift in the data changes the relative contribution (attribution) of individual features to the model's predictions. When this drift breaches a configured threshold, alerts are raised via CloudWatch.
Video: Introducing Amazon SageMaker Clarify, part 1 – Bias detection – AWS re:Invent 2020
Model deployment approaches
When deploying to production there are two risks:
- There may be a disruption to the service, meaning users cannot use the Model for a while whilst resources are being provisioned.
- The new Model may not perform as well as the one that it is replacing.
These risks can be mitigated by using Blue/Green and Canary deployment strategies. Both strategies make use of built-in SageMaker endpoint features: in a SageMaker endpoint you can deploy more than one production variant and specify the proportion of traffic routed to each, and these proportions can be changed instantly.
The AWS White paper Machine Learning Lens has a section on Model Deployment Approaches on pages 29 – 34:
- AWS White Paper: https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf
Blue/Green
- ML Exam article: Canary deployment, Blue/Green deployment and A/B testing compared
Blue/Green deployments have two phases. In the first phase the new variant is deployed in an environment identical to that of the production variant, fed synthetic data, and its monitoring metrics are checked. In the second phase live traffic is switched to the new variant and the metrics are compared with those produced by the current production variant. If a problem is identified, all live traffic is switched back to the production variant; otherwise the new variant becomes the new production variant and the old one is removed.

Canary
- ML Exam article: Canary deployment, Blue/Green deployment and A/B testing compared
A Canary release is a very risk-averse deployment strategy. It involves directing a small proportion of the live traffic to the new production variant and checking that everything works as expected. The proportion of live traffic is gradually increased until all of the traffic is directed to the new production variant, at which point the previous version can be removed. If any issues are identified, live traffic can be switched back to the original production variant.
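A minimal boto3 sketch of canary-style traffic shifting on an existing endpoint is shown below. It assumes the endpoint already hosts two production variants (as in the A/B testing example earlier); the step sizes are illustrative and the metric checks between steps are left as comments.

```python
# Sketch: gradually move live traffic from the current variant to the new one.
import boto3

sm = boto3.client("sagemaker")

for new_weight in (0.05, 0.25, 0.5, 1.0):           # canary steps
    sm.update_endpoint_weights_and_capacities(
        EndpointName="my-endpoint",
        DesiredWeightsAndCapacities=[
            {"VariantName": "current-model", "DesiredWeight": 1.0 - new_weight},
            {"VariantName": "new-model", "DesiredWeight": new_weight},
        ],
    )
    # Wait for the weight update to complete before the next step.
    sm.get_waiter("endpoint_in_service").wait(EndpointName="my-endpoint")
    # In practice: pause here and inspect error/latency metrics in CloudWatch.
    # If they degrade, set the new variant's weight back to 0 to roll back.
```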
Summary
If you are progressing through the exam structure in order, this is the last sub-domain to be studied. Congratulations on finishing!
This Study Guide has described what is needed to deploy and operationalize a Machine Learning Model in a production environment. Many of the features and capabilities draw on existing Software Management and Engineering that is used for non-Machine Learning systems. Machine Learning Models in production have their own emphasis and priorities which the emerging role of ML Ops seeks to satisfy.
Notes:
- The AWS Machine Learning Exam Readiness course refers to API versioning. I have taken this to mean SageMaker endpoints.
- The AWS Machine Learning exam readiness course can usually be relied upon to identify the areas of a broad subject that need to be focused on for the exam. Unfortunately, for this sub-domain it provides little direction and offers a vast, almost random, list of software management and engineering terms.
- Using Lambda with Amazon SageMaker has not been included yet.
Credits
Photo by Bill Jelen on Unsplash
Contains affiliate links. If you go to Whizlab’s website and make a purchase I may receive a small payment. The purchase price to you will be unchanged. Thank you for your support.
Whizlabs AWS Certified Machine Learning Specialty
Practice Exams with 271 questions, Video Lectures and Hands-on Labs from Whizlabs
Whizlabs' AWS Certified Machine Learning Specialty practice tests are designed by experts to simulate the real exam scenario. The questions are based on the exam syllabus outlined in the official documentation. These practice tests help candidates gain confidence in their exam preparation and evaluate themselves against the exam content.
Practice test content
- Free Practice test – 15 questions
- Practice test 1 – 65 questions
- Practice test 2 – 65 questions
- Practice test 3 – 65 questions
Questions and answers
Whizlab’s AWS Certified Machine Learning Specialty course
- In Whizlabs AWS Machine Learning certification course, you will learn and master how to build, train, tune, and deploy Machine Learning (ML) models on the AWS platform.
- Whizlab’s Certified AWS Machine Learning Specialty practice tests offer you a total of 200+ unique questions to get a complete idea about the real AWS Machine Learning exam.
- Also, you get access to hands-on labs in this course. There are about 10 lab sessions that are designed to take your practical skills on AWS Machine Learning to the next level.

Course content
The course has 3 resources which can be purchased separately, or together:
- 9 Practice tests with 271 questions
- Video course with 65 videos
- 9 hands on labs