A UK Guards soldier in a sentry box symbolizing AWS IAM security for Machine Learning

AWS security for machine learning

Security is a vast subject, and AWS even has its own Specialty level certification exam on it. Using the AWS course Exam Readiness: AWS Certified Machine Learning – Specialty as a guide, these revision notes give an overview of the main AWS security service, Identity and Access Management (IAM), and then highlight security features and requirements specific to SageMaker.

Questions

To confirm your understanding scroll to the bottom of the page for 10 questions and answers.

Curated videos

There are four curated videos in the revision notes. The last video, An Overview of Amazon SageMaker Security, is an AWS video by Tom Faulhaber. It is a great video that covers most of the content for this subdomain.

Security is an important activity to protect your Machine Learning processes and data. AWS provides security services and tools as part of the Shared Responsibility Model. This is where AWS provides security of the cloud, but you have to provide security in the cloud.

This Study Guide covers the content for sub-domain 4.3, Apply basic AWS security practices to machine learning solutions, of the Machine Learning Implementation and Operations knowledge domain. For more information about the exam structure see: AWS Machine Learning exam syllabus

IAM

What is IAM?

Identity and Access Management (IAM) is the AWS security service that controls who can sign in to your AWS account and what they are allowed to do there, which in turn protects your data, processes and communication in the cloud. IAM is part of the locked door through which we enter to work and by which bad people are kept out. IAM allows other people and systems to trust us with their data. IAM is also the evergreen task that is always at the top of the to-do list, no matter what you do in AWS. So whatever task you need to do, or service you wish to use, some knowledge of IAM is important.

What are IAM Users, Groups, Roles and Policies?

IAM has Users, Groups, Roles and Policies:

  • Users – a User is … well … you and me, real people. We have names and email addresses and we can be issued with user names and passwords.
  • Groups – Users can be grouped together with others that will be doing similar tasks in AWS.
  • Roles – allow you to give access privileges to AWS services, that is, not people. A Role can also be assumed temporarily by a User.
  • Policies – contain permissions that allow you to do things or be prevented from doing things.

So it works like this:

  1. Develop a Policy
  2. Add the Policy to a Group. The Group will contain people with similar tasks, for example DevOps, Data Analysts or Data Scientists.
  3. Add a User to the Group. The User now has all the privileges that are in the Policy you wrote because it is attached to the Group.

This is great for managing large numbers of users. If people move between jobs, you can detach them from one Group and add them to another one. If you need to add a new permission, for example to access a service that your organisation has agreed to use and fund, the Policy can be updated. This immediately allows every User in a Group that has the Policy attached to use the new service.

All this creating, attaching and detaching can be done through the AWS console, CloudFormation or AWS CLI.
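The three steps above can also be scripted. The following is a minimal sketch using the AWS SDK for Python (boto3), which these notes do not otherwise cover. The bucket, policy, group and user names are hypothetical, and `grant_group_access` is only defined, not called, because running it needs AWS credentials and an existing User.

```python
import json


def build_read_only_policy(bucket_name):
    """Return a policy document allowing read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTrainingData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


def grant_group_access(policy_name, group_name, user_name, policy_document):
    """Create a Policy, attach it to a new Group, and add an existing User."""
    import boto3  # imported here so the sketch loads without the SDK installed

    iam = boto3.client("iam")
    # Step 1: develop the Policy
    policy = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
    policy_arn = policy["Policy"]["Arn"]
    # Step 2: add the Policy to a Group
    iam.create_group(GroupName=group_name)
    iam.attach_group_policy(GroupName=group_name, PolicyArn=policy_arn)
    # Step 3: add a User to the Group
    iam.add_user_to_group(GroupName=group_name, UserName=user_name)
    return policy_arn


# Hypothetical bucket name, used only to show the generated document.
document = build_read_only_policy("example-ml-data")
print(json.dumps(document, indent=2))
```

Moving a person between jobs would then be a matter of removing them from one Group and adding them to another, without touching the Policy itself.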

Roles resemble Groups in that Policies are attached to them, but they grant privileges to AWS services rather than to people. For example, an Amazon SageMaker Notebook has a Role that contains the permissions controlling what the Notebook can access. Note that in more sophisticated security architectures, User permissions are kept to a minimum and the User assumes a Role on login that provides the privileges they need.

What does a Policy look like?

Policies are the core of IAM. They define in specific detail what can, or cannot, be done. Policies are JSON documents. If you define them inside a CloudFormation template you can write them in YAML or JSON, and it is a matter of personal choice which one you use. There are two types of Policies:

  • AWS managed
  • Custom

AWS managed Policies are created and managed by AWS. They have useful names that describe what they do, for example:

  • AmazonS3ReadOnlyAccess
  • AmazonRedshiftQueryEditor
  • AmazonSageMakerReadOnly

This means you do not have to look inside to guess what they do. You can add many Policies to build up the access profile you need.

The advantage of AWS managed Policies is that AWS automatically updates them when services are changed, or new features are added. The disadvantage is that AWS may change the Policy without you knowing, giving access to new features that you did not intend to allow.

Anatomy of a policy

This is the structure of a Policy:

{
  "Statement":[{
    "Sid": "statement ID and description",
    "Effect":"effect",
    "Action": [
      "action1",
      "action2"
      ],
    "Resource":"arn",
    "Condition":{
      "condition":{
        "key":"value"
        }
      }
    }
  ]
}

  • Statement – a list of Policy statements.
  • Sid – an optional ID for a Statement. This is very useful for large policies.
  • Effect – what the Statement does. It can be Allow or Deny.
  • Action – a list of the detailed actions the Statement is allowing, or denying.
  • Resource – the AWS resource that the Statement acts on, identified by its Amazon Resource Name (ARN), for example the ARN of an S3 bucket.
  • Condition – any conditions that must be satisfied for the Statement to be effective.
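To make the template concrete, here is a small Python sketch that fills in each element and prints the resulting JSON. The helper `make_statement`, the bucket name and the Sid are hypothetical. The `Version` element and the `aws:SecureTransport` condition key are standard IAM features that the template above leaves out.

```python
import json


def make_statement(sid, effect, actions, resource, condition=None):
    """Build one Policy statement, rejecting an invalid Effect."""
    if effect not in ("Allow", "Deny"):
        raise ValueError("Effect must be 'Allow' or 'Deny'")
    statement = {
        "Sid": sid,
        "Effect": effect,
        "Action": list(actions),
        "Resource": resource,
    }
    if condition is not None:
        statement["Condition"] = condition
    return statement


policy = {
    "Version": "2012-10-17",
    "Statement": [
        make_statement(
            sid="ReadModelArtifactsOverTls",
            effect="Allow",
            actions=["s3:GetObject"],
            # Hypothetical bucket; the Resource is always written as an ARN.
            resource="arn:aws:s3:::example-model-bucket/*",
            # Only effective when the request is made over HTTPS.
            condition={"Bool": {"aws:SecureTransport": "true"}},
        )
    ],
}

print(json.dumps(policy, indent=2))
```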

Infrastructure security on Amazon SageMaker

Infrastructure is the set of services that make up the AWS cloud in which you work. Here are some services that you need to be aware of, and know how they are secured:

  • Virtual Private Cloud (VPC)
  • Security Group
  • Network Address Translation Gateway (NAT)
  • Internet Gateway
  • Simple Storage Service (S3)

What is a VPC?

The VPC (Virtual Private Cloud) is your virtual network in the cloud. It contains the features that you would expect to find on a network on physical premises. The VPC contains subnets, which can be public to the internet or private, and security features to protect them. Security at the subnet level is provided by NACLs (Network Access Control Lists) and at the instance level by Security Groups. They do similar jobs in slightly different ways and allow you to restrict traffic by IP address, or range, and by protocol. For example, you could lock down access to a single PC via its IP address, and then only if it uses the HTTPS protocol.

How does a Security Group work?

A Security Group acts as a virtual firewall for instances. It controls incoming and outgoing traffic using separate rules for each. The rules enable you to filter traffic based on protocols and port numbers. Security Group rules can only allow access; they cannot deny it, so any traffic that no rule allows is blocked.
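The allow-only behaviour can be illustrated with a toy model in Python. This is not how AWS implements Security Groups, and it ignores port ranges and statefulness. The `Rule` class and `is_allowed` function are hypothetical names, and the IP range is one reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    protocol: str  # for example "tcp"
    port: int      # a single port, to keep the model small
    cidr: str      # source range the rule allows


def is_allowed(rules, protocol, port, source_ip):
    """Traffic passes only if at least one allow rule matches it."""
    address = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and address in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )


# One inbound rule: HTTPS from a single documentation-only range.
inbound_rules = [Rule("tcp", 443, "203.0.113.0/24")]

print(is_allowed(inbound_rules, "tcp", 443, "203.0.113.10"))  # True
print(is_allowed(inbound_rules, "tcp", 22, "203.0.113.10"))   # False
print(is_allowed(inbound_rules, "tcp", 443, "198.51.100.7"))  # False
```

There is no way to express a deny rule in this model, which mirrors the point above: anything not explicitly allowed is dropped.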

How does a NAT Gateway improve security?

A NAT (Network Address Translation) Gateway makes it easy to connect to the Internet from instances within a private subnet in an AWS Virtual Private Cloud (VPC). A NAT enables instances in a private subnet to connect to the internet, but prevents hosts on the internet from initiating connections with the instances.

Video – Secure your workloads with NAT Gateway

This video from AWS is 3.43 minutes long

What is an Internet Gateway?

An Internet Gateway connects a VPC to the internet. A public subnet is one whose route table sends internet-bound traffic to an Internet Gateway.
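One way to picture the difference between public and private subnets is to look at where a subnet's default route (0.0.0.0/0) points. The sketch below is a simplified illustration: `classify_subnet` is a hypothetical helper, the route table is modelled as a plain dictionary, and the gateway IDs are made up. It relies only on the AWS convention that Internet Gateway IDs start with `igw-` and NAT Gateway IDs start with `nat-`.

```python
def classify_subnet(routes):
    """Classify a subnet from its routes, given as {destination CIDR: target ID}."""
    default_target = routes.get("0.0.0.0/0", "")
    if default_target.startswith("igw-"):
        return "public"
    if default_target.startswith("nat-"):
        return "private, outbound internet via NAT Gateway"
    return "private, no internet route"


# Hypothetical route tables; "local" is the route for traffic inside the VPC.
web_subnet = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0123456789abcdef0"}
training_subnet = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0123456789abcdef0"}
isolated_subnet = {"10.0.0.0/16": "local"}

for name, routes in [
    ("web", web_subnet),
    ("training", training_subnet),
    ("isolated", isolated_subnet),
]:
    print(name, "->", classify_subnet(routes))
```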

What security does S3 have?

The contents of the S3 bucket can be encrypted by KMS encryption. This encryption can be enforced on upload so that all the contents of the bucket are encrypted.
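A common way to enforce encryption on upload is a bucket policy that denies any upload that does not request KMS encryption. The Python sketch below builds such a policy document. The function name and bucket name are hypothetical, and the policy should be checked against your own requirements before use.

```python
import json


def require_kms_encryption_policy(bucket_name):
    """Bucket policy that rejects uploads not encrypted with SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # Deny when the encryption header is anything other than aws:kms.
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            }
        ],
    }


bucket_policy = require_kms_encryption_policy("example-training-data")
print(json.dumps(bucket_policy, indent=2))
```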

There are two types of security policies for S3:

  1. Resource based policies are features of the S3 bucket. They include Access Control Lists (ACL) and bucket policies.
  2. User based policies are IAM policies that can be attached to a User, Group or Role. Since access to S3 buckets is default deny, these policies usually explicitly allow access.

AWS KMS

What types of encryption are there?

There are three types of encryption used with Machine Learning. They are all Server Side Encryption methods, differing in how the key is managed:

  1. Server-Side Encryption with S3 Managed Keys (SSE-S3)
  2. Server-Side Encryption with KMS Managed Keys (SSE-KMS)
  3. Server-Side Encryption with Customer Provided Keys (SSE-C)
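From the client's point of view, the three methods differ mainly in the request headers sent with an S3 upload. The sketch below builds those headers in Python. `sse_headers` is a hypothetical helper, the key alias is made up, and the all-zero key exists only to make the example runnable.

```python
import base64
import hashlib


def sse_headers(method, kms_key_id=None, customer_key=None):
    """Return the S3 request headers that select a server-side encryption method."""
    if method == "SSE-S3":
        # S3 creates and manages the key.
        return {"x-amz-server-side-encryption": "AES256"}
    if method == "SSE-KMS":
        # The key lives in AWS KMS; naming a specific key is optional.
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id is not None:
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if method == "SSE-C":
        # The customer supplies the key with every request.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        key_md5 = hashlib.md5(customer_key).digest()
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": base64.b64encode(
                customer_key
            ).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
                key_md5
            ).decode("ascii"),
        }
    raise ValueError(f"Unknown method: {method}")


print(sse_headers("SSE-S3"))
print(sse_headers("SSE-KMS", kms_key_id="alias/example-ml-key"))
print(sorted(sse_headers("SSE-C", customer_key=bytes(32))))
```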

Security on Amazon SageMaker

Access control

SageMaker controls access to its Notebooks. There are two types of Notebooks:

  1. SageMaker Notebooks
  2. Studio Notebooks

In SageMaker Notebooks users have root access by default, so they have administrator privileges. This root access can be disabled. In SageMaker Studio access control and isolation is achieved by using filesystem and container permissions.

Data Protection

Data protection at rest

By default SageMaker uses the AWS KMS with an AWS managed customer master key (CMK) for:

  • Notebooks
  • Training jobs
  • Amazon S3 location to store models
  • Endpoints

Data protection in motion

Communication between components inside the SageMaker managed environment is usually unencrypted, to prevent performance degradation due to the time spent encrypting and decrypting. Data in transit outside the SageMaker managed environment is protected by using HTTPS with TLS certificates for:

  • API/console
  • Notebooks
  • VPC-enabled
  • Interface endpoint
  • Limit by IP
  • Training jobs/endpoints
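When the defaults are not enough, a training job can be given its own KMS keys and can opt in to encrypting traffic between training containers. The sketch below builds only the security-related part of a CreateTrainingJob request as a Python dictionary. The function name, key aliases, S3 path and instance settings are hypothetical, and the other required fields (job name, algorithm, role, stopping condition) are omitted.

```python
def training_job_security_settings(volume_kms_key_id, output_kms_key_id, s3_output_path):
    """Security-related fragment of a SageMaker CreateTrainingJob request."""
    return {
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 2,
            "VolumeSizeInGB": 50,
            # Encrypts the storage volume attached to the training instances.
            "VolumeKmsKeyId": volume_kms_key_id,
        },
        "OutputDataConfig": {
            "S3OutputPath": s3_output_path,
            # Encrypts the model artifacts written to S3.
            "KmsKeyId": output_kms_key_id,
        },
        # Off by default for performance; turn on to encrypt traffic
        # between containers in a distributed training job.
        "EnableInterContainerTrafficEncryption": True,
    }


settings = training_job_security_settings(
    volume_kms_key_id="alias/example-volume-key",
    output_kms_key_id="alias/example-output-key",
    s3_output_path="s3://example-model-bucket/output/",
)
print(settings)
```

Enabling inter-container encryption can lengthen training time, which is the trade-off described above.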

IAM for SageMaker

Authentication

Authentication is signing in; authorization refers to the permissions you have once you are signed in.

IAM federation

You can also use your company’s single sign-on authentication or even sign in using Google or Facebook. MFA (Multi Factor Authentication) can be set up. This involves entering a code from an app on a mobile phone to verify your identity.

SageMaker Roles

SageMaker may use Roles for different tasks. Depending on your security environment you may use a few very broad Roles to perform all SageMaker tasks. The IAM managed policy AmazonSageMakerFullAccess provides a convenient way to explore SageMaker features for investigation and personal training needs. However for use cases where security has a higher priority there will be Roles developed for specific tasks with specific privileges. These AWS docs have examples of the permissions you can use.

To provide isolation of each Notebook, each user can have a Notebook Role that they assume when they log in.

Because SageMaker has a managed environment containing EC2 instances that it creates and scales, it needs the ability to pass Roles to other services such as EC2. The ability to pass a Role and assume a Role without human intervention is considered significant from a security perspective. For this reason, in organisations where security is a high priority, SageMaker may be set up in an AWS account of its own with cross account access to data in other AWS accounts.
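Two policy documents are typically involved when a Role is passed to SageMaker: a trust policy on the Role that lets the SageMaker service assume it, and a permissions policy that lets a User pass that Role, and only to SageMaker. The Python sketch below builds both. The function names, account number and Role name are hypothetical.

```python
import json

SAGEMAKER_SERVICE = "sagemaker.amazonaws.com"


def sagemaker_trust_policy():
    """Trust policy: allows the SageMaker service to assume the Role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": SAGEMAKER_SERVICE},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def pass_role_policy(role_arn):
    """Permissions policy: allows passing one Role, and only to SageMaker."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PassExecutionRoleToSageMakerOnly",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
                "Condition": {
                    "StringEquals": {"iam:PassedToService": SAGEMAKER_SERVICE}
                },
            }
        ],
    }


# Hypothetical account ID and Role name.
role_arn = "arn:aws:iam::123456789012:role/ExampleSageMakerExecutionRole"
print(json.dumps(sagemaker_trust_policy(), indent=2))
print(json.dumps(pass_role_policy(role_arn), indent=2))
```

Limiting `iam:PassRole` to a named Role and a named service keeps a User from handing a more powerful Role to a service they control.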

Logging and Monitoring

Amazon CloudWatch is used to monitor SageMaker processing. The CloudWatch Logs enable you to monitor, store, and access your log files from SageMaker events, AWS CloudTrail, and other sources.

Compliance Validation

AWS CloudTrail is used to provide an Audit Trail. CloudTrail logs a record of actions performed in SageMaker and who, or what performed them. CloudTrail captures all API calls for SageMaker, with the exception of InvokeEndpoint, as events.
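As an illustration of how such an audit trail can be read, the sketch below filters a list of CloudTrail-style records down to SageMaker API calls and reports when each happened, who made it, and what it was. The field names follow the CloudTrail record format, while the function name and the sample records are made up for the example.

```python
def sagemaker_audit_trail(records):
    """Return (time, who, action) for each SageMaker API call in the records."""
    trail = []
    for record in records:
        if record.get("eventSource") != "sagemaker.amazonaws.com":
            continue  # skip events from other services
        who = record.get("userIdentity", {}).get("arn", "unknown")
        trail.append((record.get("eventTime"), who, record.get("eventName")))
    return trail


# Hypothetical records, trimmed to the fields used above.
sample_records = [
    {
        "eventTime": "2023-01-10T09:15:00Z",
        "eventSource": "sagemaker.amazonaws.com",
        "eventName": "CreateTrainingJob",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/example-data-scientist"},
    },
    {
        "eventTime": "2023-01-10T09:16:30Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "PutObject",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/example-data-scientist"},
    },
]

for when, who, action in sagemaker_audit_trail(sample_records):
    print(when, who, action)
```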

Compliance programs

Amazon SageMaker has been assessed by third party auditors to confirm compliance with published standards. Below are three standards mentioned in the Exam Readiness course.

  1. PCI DSS – The Payment Card Industry Data Security Standard (PCI DSS) is an information security standard for organizations that handle branded credit cards from the major card schemes. https://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard
  2. HIPAA-eligible with BAA – Health Insurance Portability and Accountability Act, Business Associate Agreement. These are US standards for medical information privacy. https://en.wikipedia.org/wiki/Medical_privacy
  3. ISO – International Organization for Standardization https://www.iso.org/home.html

Resilience

Resilience is part of the AWS infrastructure of AWS Regions and Availability Zones (AZ). These provide isolation so that problems in one Region or AZ do not affect another. SageMaker uses multiple AZs for its managed environment. For example, when you specify more than one SageMaker managed EC2 instance, they are automatically created in separate AZs.

Infrastructure Security

VPCs and endpoints

Connecting to SageMaker through an interface VPC endpoint (interface endpoint) ensures that all data is transmitted within the AWS network without exposing the data to the internet. Without an endpoint, calls to a service go to its public URL, which routes the traffic over the internet. An interface endpoint instead gives the service a private IP address inside your VPC, which keeps the data transmission within the AWS network.
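Creating an interface endpoint comes down to naming the VPC, the subnets and Security Groups it should live in, and the service to connect to. The sketch below builds the arguments that could be passed to the EC2 CreateVpcEndpoint API, for example through boto3's `create_vpc_endpoint`. The helper name and all resource IDs are hypothetical, and the call itself is not made here.

```python
def sagemaker_endpoint_request(region, vpc_id, subnet_ids, security_group_ids, service="api"):
    """Arguments for an interface endpoint to the SageMaker API or Runtime."""
    if service not in ("api", "runtime"):
        raise ValueError("service must be 'api' or 'runtime'")
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.sagemaker.{service}",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the usual service hostname resolve to the private address.
        "PrivateDnsEnabled": True,
    }


# Hypothetical resource IDs.
request = sagemaker_endpoint_request(
    region="eu-west-2",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
print(request["ServiceName"])  # com.amazonaws.eu-west-2.sagemaker.api
```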

SageMaker Notebooks can access the internet to download libraries needed to process data for Machine Learning. This is enabled by default, because the Notebook is created in the SageMaker managed VPC. This internet access may be seen as a vulnerability by your organisation, so there are two actions that can be taken.

  1. Remove internet access when the Notebook is created. This can be done in CloudFormation.
  2. Create the Notebook in your own VPC. This enables you to control all the security features to give your Notebooks and data assets the protection you believe they need. Connecting to SageMaker Notebooks via a VPC interface endpoint means that communication between your VPC and the notebook instance is within the AWS network without being exposed to the internet. https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-interface-endpoint.html
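Both actions map onto settings of the notebook instance itself. The sketch below builds a CreateNotebookInstance request as a Python dictionary, with direct internet access and root access both switched off. The helper name, instance type, Role ARN and resource IDs are hypothetical. The check reflects the rule that direct internet access can only be disabled when the Notebook is placed in a subnet of your own VPC.

```python
def notebook_instance_request(
    name,
    role_arn,
    subnet_id=None,
    security_group_ids=(),
    direct_internet_access=True,
    root_access=False,
):
    """Build a CreateNotebookInstance request with explicit security settings."""
    if not direct_internet_access and subnet_id is None:
        raise ValueError(
            "Direct internet access can only be disabled for a Notebook "
            "placed in a subnet of your own VPC"
        )
    request = {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t3.medium",
        "RoleArn": role_arn,
        "DirectInternetAccess": "Enabled" if direct_internet_access else "Disabled",
        "RootAccess": "Enabled" if root_access else "Disabled",
    }
    if subnet_id is not None:
        request["SubnetId"] = subnet_id
        request["SecurityGroupIds"] = list(security_group_ids)
    return request


# Hypothetical names and IDs.
locked_down = notebook_instance_request(
    name="example-notebook",
    role_arn="arn:aws:iam::123456789012:role/ExampleNotebookRole",
    subnet_id="subnet-0123456789abcdef0",
    security_group_ids=["sg-0123456789abcdef0"],
    direct_internet_access=False,
)
print(locked_down)
```

With internet access disabled, the Notebook can only reach libraries and data through routes you provide in the VPC, such as a NAT Gateway or VPC endpoints.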

SageMaker processing jobs, training jobs, hosted endpoints and batch transform jobs will, by default, access your resources, such as data in S3 buckets, over the internet. To improve security, AWS recommends hosting your data in a private VPC, that is, a VPC without access to the internet. SageMaker can access your private VPC via an endpoint, which means that all data transmission remains within the AWS environment without any exposure to the internet. https://docs.aws.amazon.com/sagemaker/latest/dg/process-vpc.html

Video – What is an Interface VPC Endpoint and how can I create Interface Endpoint for my VPC?

This is a 5.08 minute video from AWS

Scans

SageMaker automatically scans for Common Vulnerabilities and Exposures (CVE) identified in public vulnerability databases.

Gaining insight

Restrict access by IAM policy and condition keys. An IAM policy can be used to restrict access to SageMaker in general as well as to specific SageMaker services. A Condition Key is logic within an IAM policy that further restricts access at a more granular level. For example, a condition can use an Amazon Resource Name (ARN) or a Service Name to restrict the actions of a Role that has the policy attached.
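As a sketch of what condition keys look like in practice, the Python below builds a policy with two statements: one allows opening a Notebook only from a given IP range, and one denies creating a training job that is not attached to a VPC. The function name, Sids and IP range are hypothetical. `aws:SourceIp` is a global condition key and `sagemaker:VpcSubnets` is a SageMaker-specific one, so treat the exact statements as an illustration rather than a recommended policy.

```python
import json


def restrict_sagemaker_policy(allowed_cidr):
    """Policy using condition keys to narrow two SageMaker actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OpenNotebookFromOfficeOnly",
                "Effect": "Allow",
                "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
                "Resource": "*",
                # Only effective when the caller's IP is in the allowed range.
                "Condition": {"IpAddress": {"aws:SourceIp": [allowed_cidr]}},
            },
            {
                "Sid": "DenyTrainingOutsideVpc",
                "Effect": "Deny",
                "Action": "sagemaker:CreateTrainingJob",
                "Resource": "*",
                # "Null": "true" matches requests that specify no VPC subnets.
                "Condition": {"Null": {"sagemaker:VpcSubnets": "true"}},
            },
        ],
    }


# Documentation-only IP range standing in for an office network.
restricted = restrict_sagemaker_policy("203.0.113.0/24")
print(json.dumps(restricted, indent=2))
```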

Video – An Overview of Amazon SageMaker Security (Level 100)

This is a 26.32 minute video from AWS.

Summary

This Study Guide has introduced IAM features relevant to securing a Machine Learning environment. SageMaker security requirements and features have been explained in more detail. These revision notes have used the AWS course Exam Readiness: AWS Certified Machine Learning – Specialty as a guide.

This Study Guide covers sub-domain 4.3 of the Machine Learning Implementation and Operations knowledge domain (domain 4), which has four sub-domains.

If you are progressing through the exam structure in order, the next Study Guide to review is for sub-domain 4.4 which is deploying Machine Learning models in production.

Credits

Photo by Anthony Bressy on Unsplash



10 questions and answers

By Michael Stainsbury

4.3 AWS security for machine learning (full)

10 test questions that cover sub-domain 4.3, Apply basic AWS security practices to machine learning solutions

1 / 10

A <–?–> improves security by enabling instances and services in a private subnet to connect to the internet, but prevents hosts on the internet from initiating connections.

2 / 10

What does an IAM Policy Statement comprise?

3 / 10

The contents of the S3 bucket can be encrypted by KMS encryption. This encryption can be enforced on <–?–> so that all the contents of the bucket are encrypted.

4 / 10

What are the Server Side Encryption methods used with Machine Learning?

5 / 10

What are the security features of a VPC?

6 / 10

What are the IAM sub-features used to organise security privileges?

7 / 10

Compliance Validation is achieved by using <–?–> to provide an Audit Trail. CloudTrail logs a record of actions performed in SageMaker and who, or what performed them.

8 / 10

Connecting to SageMaker through a <–?–> ensures that all data is transmitted within the AWS network without exposing the data to the internet.

9 / 10

<–?–> are written and maintained by AWS. They allow general types of access to AWS services, or types of jobs.

10 / 10

