A photograph of the flight instruments of an aeroplane to symbolize model hyperparameters

Model tuning

Hyperparameters can be thought of as the external controls that influence how the model operates, just as flight instruments control how an aeroplane flies. These values are external to the model and are controlled by the user. They can influence how an algorithm is trained and the structure of the final model.

The optimal settings are difficult to determine in advance, although prior experience with the model and data may help. Exhaustive manual searches for the best hyperparameters would take a long time and use a lot of computing resources. This is why automatic hyperparameter tuning is used to search for the optimum values.

This study guide covers subdomain 3.4 Perform hyperparameter optimization of the AWS exam. More information on the syllabus can be found in: AWS Machine Learning exam syllabus

Questions

To confirm your understanding, scroll to the bottom of the page for questions and answers.

Video: Hyperparameter Tuning with Amazon SageMaker’s Automatic Model Tuning – AWS Online Tech Talks

This is a 47.49 minute video by Leo Dirac from AWS.

  • 0 – Introduction
  • 0.48 – What’s a Hyperparameter, and why does it need tuning?
  • 1.06 – Supervised Machine Learning
  • 1.52 – Always perfect ML algorithm
  • 2.49 – Overfitting
  • 4.28 – Regularization
  • 5.40 – Learning rate
  • 7.22 – Hyperparameter: = “Any decision the algorithm author can’t make for you”.
  • 7.55 – Configurable expressiveness
  • 9.27 – Types of hyperparameters – 5 types
  • 12.40 – Tuning strategies
  • 14.20 – Grid search
  • 14.56 – High dimensional grid search
  • 16.54 – Random search
  • 19.55 – Surrogate Model – Bayesian search
  • 23.40 – E.I – Expected Improvement
  • 25.42 – Demo of automatic model tuning in SageMaker
  • 35.49 – Tips for Model Tuning – effectively using Automatic Model Tuning
  • 36.03 – Clarify your Goals
  • 36.54 – Quality vs time trade off
  • 41.36 – Intuition about Hyperparameters
  • 43.45 – More tuning tips
  • 46.42 – Automatic Model Tuning is a productivity tool, it is not magic
  • 47.49 – End

What are Parameters and Hyperparameters

Parameters are internal variables that can be estimated from the training data. They change during training and are preserved after training as part of the learned model. Parameters are used by the model to make predictions. Hyperparameters are external configurations set by the user. They control or influence how the model learns during training and do not change whilst the training job is running. Model tuning is essentially about optimizing the values of Hyperparameters.

The Characteristics of Parameters and Hyperparameters.

Parameters                                  | Hyperparameters
Trained                                     | Tuned
Internal values                             | External values
Estimated or learned from data              | Defined by the user
Usually saved as part of the trained model  | Not part of the trained model
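
The distinction can be made concrete with a minimal sketch. The function name train_linear_model and the toy data are assumptions made for illustration. The weights w and b are parameters learned from the data, whereas learning_rate and epochs are hyperparameters fixed by the user before training starts.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # parameters: estimated from the training data
    n = len(xs)
    # learning_rate and epochs are hyperparameters: they do not change during training
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b  # the learned parameters are what gets saved as the model


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
w, b = train_linear_model(xs, ys)
print(f"learned parameters: w={w:.3f}, b={b:.3f}")
```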

Model Tuning

What is model tuning

Model tuning is also known as hyperparameter optimization. Hyperparameters are variables that control the training process. These are configuration variables that do not change during a model training job. Model tuning provides optimized values for hyperparameters, which maximize your model’s predictive accuracy.

Each model has its own hyperparameters. Some are unique, and some are similar across a class of algorithms. For example, XGBoost has tree depth and maximum leaf nodes, whereas neural networks have hyperparameters for the number of layers and the width of the hidden layers.

Additional Feature Engineering

Model tuning may identify opportunities to enhance the value of the data with further Feature Engineering. For more information about Feature Engineering see: Feature Engineering for Machine Learning

Tweaking hyperparameters

When changing hyperparameters to see if the model improves you need to consider:

  • Which hyperparameters are the most influential for your model.
  • Which values you should pick.
  • How many combinations of hyperparameters should you try.

Types of Hyperparameters

There are three types of hyperparameters:

  1. Model hyperparameters
  2. Optimizer hyperparameters
  3. Data hyperparameters

Model

Model hyperparameters describe the structure of the model, for example:

  • Number of hidden units
  • First hidden layer
  • Number of layers

Model hyperparameters are used a lot for neural networks where the size and shape of the neural network has to be defined before training can start.

Optimizer

Optimizer hyperparameters are concerned with the optimizing and training process. Examples are:

  • Learning rate
  • Minibatch size
  • Number of epochs

The choice of optimizer also controls how the algorithm learns, for example:

  • Adam
  • SGD (Stochastic Gradient Descent)

Data

Data hyperparameters relate to attributes of the data. For image data this can include resizing or cropping information.

How to tune Hyperparameters

Manual tuning of Hyperparameters

In manual tuning the values of the algorithm’s hyperparameters are determined by experience or intuition. The results of one training run are used to guess hyperparameters for the next training run.

Automated Hyperparameter tuning

In automated hyperparameter tuning, or optimization, multiple training jobs are submitted with different hyperparameter values. The hyperparameter values are chosen from ranges set by the user. After each job completes, the trained model is tested using validation data and a specified target metric. Target metrics are also known as objective metrics.

For SageMaker built-in algorithms the hyperparameter tuning job is passed a JSON object with the hyperparameter details specified, which includes:

  1. The hyperparameters to be tuned
  2. The ranges for the hyperparameter values
  3. Maximum resource limits the tuning job can use
  4. The objective metric for the hyperparameter tuning job

Video: Tune Your ML Models to the Highest Accuracy with Amazon SageMaker Automatic Model Tuning

A 19.52 minute video by Emily Webber from AWS.

  • 0 – Quick recap on Model Training
  • 1.44 – What is a tuning job
  • 2.05 – Hyperparameter tuning jobs
  • 3.05 – Bayesian optimizer
  • 5.14 – How do I set up a hyperparameter tuning job
  • 6.25 – Can I use hyperparameter tuning with your own model
  • 7.04 – What if I need all my jobs tuned at the same time
  • 8.35 – Can I stop a job early if the model is not getting better
  • 9.25 – How can I maximise efficiency across tuning jobs
  • 10.58 – How do I compare results across tuning jobs
  • 12.10 – Demo
  • 18.15 – Pro tips

Using search methods to tune Hyperparameters

In hyperparameter tuning, or optimization, the process of optimizing the hyperparameters is called Searching. The user defines a Search Space, which is an n-dimensional volume containing hyperparameters within ranges chosen by the user.

Grid search

A table, or grid, of candidate values is defined with one axis for each hyperparameter, and every combination of values in the grid is tested. With two hyperparameters this is a two dimensional table. Because of the large number of combinations this method can be very resource intensive.
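
A grid search can be sketched as below. This is an illustration only: validation_score is a hypothetical stand-in for training a model and scoring it on validation data, and the hyperparameter names max_depth and eta are chosen purely as examples.

```python
from itertools import product


def validation_score(max_depth, eta):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return -((max_depth - 6) ** 2) - 10 * (eta - 0.3) ** 2


def grid_search(grid):
    """Try every combination of values in the grid and keep the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {"max_depth": [2, 4, 6, 8], "eta": [0.1, 0.3, 0.5]}
best_params, best_score = grid_search(grid)  # 4 x 3 = 12 training runs
print(best_params)
```

The number of runs is the product of the number of values on each axis, which is why the cost grows so quickly as hyperparameters are added.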

Random search

This method is similar to Grid search; however, the combinations of hyperparameter values are chosen randomly from the Search Space. This process proceeds until a resource limit, for example time, has been reached.
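
A random search can be sketched as follows, assuming a fixed budget of training jobs as the resource limit. As before, validation_score is a hypothetical stand-in for a real training job, and the ranges are illustrative.

```python
import random


def validation_score(max_depth, eta):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return -((max_depth - 6) ** 2) - 10 * (eta - 0.3) ** 2


def random_search(max_jobs, seed=0):
    """Sample hyperparameter values at random until the job budget is used up."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(max_jobs):
        params = {
            "max_depth": rng.randint(2, 10),   # integer range, inclusive
            "eta": rng.uniform(0.05, 0.5),     # continuous range
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(max_jobs=20)
print(best_params, best_score)
```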

Bayesian search

This method is a search guided by a statistical process to find optimum values for hyperparameters as quickly as possible. The Bayesian Search treats the choice of hyperparameter values as a regression problem. The results of previous training jobs are used to improve the selection of the next set of hyperparameters. Sometimes the new hyperparameters are close to the previous ones to find subtle improvements; at other times the values chosen are distant, in order to explore the range of values.
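
The idea can be sketched with a small surrogate model. The code below is not SageMaker's implementation; it is a minimal illustration that assumes a Gaussian process surrogate with a fixed kernel and an Expected Improvement rule for choosing the next value. The function validation_score and the single hyperparameter lr are hypothetical.

```python
import math
import random


def rbf(a, b, length=0.2):
    """Similarity between two hyperparameter values (RBF kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Surrogate prediction (mean, standard deviation) at x_new from past results."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected amount by which a candidate beats the best score so far."""
    z = (mean - best) / std
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return (mean - best) * cdf + std * pdf


def validation_score(lr):
    """Hypothetical stand-in for a training job; higher is better."""
    return -((lr - 0.3) ** 2)


def bayesian_search(n_jobs=10, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(3)]        # a few random jobs to start
    ys = [validation_score(x) for x in xs]
    candidates = [i / 200 for i in range(201)]   # search space: lr in [0, 1]
    for _ in range(n_jobs - 3):
        best = max(ys)
        nxt = max(candidates,
                  key=lambda c: expected_improvement(*gp_posterior(xs, ys, c), best))
        xs.append(nxt)                           # run the next "training job"
        ys.append(validation_score(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]


best_lr, best_score = bayesian_search()
print(best_lr, best_score)
```

Candidates with a high predicted score are favoured, which refines values near earlier good results, and so are candidates with high uncertainty, which explores distant parts of the range.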

Additional subjects

SageMaker Hyperparameter tuning jobs

A SageMaker hyperparameter tuning job has three inputs:

  1. Tuning job name – HyperParameterTuningJobName
  2. Tuning job config – HyperParameterTuningJobConfig
  3. Training job definition – TrainingJobDefinition

Tuning job name

A unique name for management purposes. This is passed in the HyperParameterTuningJobName tuning job parameter.

Tuning job config

The tuning job config is provided in the HyperParameterTuningJobConfig tuning job parameter. This is a JSON object that contains values for:

  • Names and ranges of hyperparameters
  • Resource limits, for example: MaxNumberOfTrainingJobs
  • Objective metrics, search strategy

Training job definition

The training job definition is provided in the TrainingJobDefinition tuning job parameter. This is a JSON object that contains values for:

  • Metrics that the training jobs emit when using a custom training algorithm.
  • Training algorithm Docker container image
  • S3 location of training data
  • S3 location of test data
  • S3 location of output
  • Hyperparameters that are not being tuned
  • Instance definition
  • The maximum duration for the training job
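
A training job definition could be sketched as follows. The field names follow the SageMaker API, but every value is a placeholder: the image URI, bucket names, role ARN, metric regex and instance type are hypothetical, and the channel names train and validation are an assumption about how the data is split.

```python
def channel(name, s3_uri):
    """Build one input data channel pointing at an S3 prefix (helper for this sketch)."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


training_job_definition = {
    "AlgorithmSpecification": {
        "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/<image>:<tag>",
        "TrainingInputMode": "File",
        # Only needed for a custom algorithm: how to find the metric in the logs
        "MetricDefinitions": [
            {"Name": "validation:accuracy", "Regex": "val_acc=([0-9\\.]+)"},
        ],
    },
    "InputDataConfig": [
        channel("train", "s3://example-bucket/train/"),
        channel("validation", "s3://example-bucket/validation/"),
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/output/"},
    "StaticHyperParameters": {"num_round": "100"},  # hyperparameters not being tuned
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
}
```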

Video: Improve model quality with Amazon SageMaker Automatic Model Tuning by Kumar Venkateswar

A 36.11 minute video by Kumar Venkateswar from AWS.

Common hyperparameters to tune

Momentum

Momentum hyperparameters speed up the learning process. This can prevent the algorithm from becoming stuck at a local minimum and increase the chance of finding the global minimum. Momentum hyperparameters can also dampen oscillation, or noise, whilst searching for the minimum.
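
The speed-up can be seen in a minimal sketch, assuming the classical momentum update on the toy function f(x) = x². The function name minimise and the chosen values are illustrative. The sketch only shows faster progress with a small learning rate; it does not demonstrate escaping a local minimum.

```python
def minimise(learning_rate, momentum, steps=50, x=1.0):
    """Gradient descent with momentum on f(x) = x**2, whose gradient is 2x."""
    velocity = 0.0
    for _ in range(steps):
        gradient = 2 * x
        # the velocity accumulates a decaying sum of past gradients
        velocity = momentum * velocity - learning_rate * gradient
        x += velocity
    return x


plain = minimise(learning_rate=0.01, momentum=0.0)
with_momentum = minimise(learning_rate=0.01, momentum=0.9)
print(abs(plain), abs(with_momentum))  # the momentum run ends closer to the minimum at 0
```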

Optimizers

Optimizers control how the model learns.

Activation Functions

Activation functions are used to allow deep learning models to learn non-linear prediction boundaries. They introduce nonlinearity to models. A commonly used activation function is the Rectifier, also known as ReLU (Rectified Linear Unit).

Learning Rate

The Learning Rate hyperparameter determines the step size of each iteration in search of the minimum. A small Learning Rate will increase the time and processing needed to find the minimum, and training may stop before it converges. A large Learning Rate will decrease processing time, but may result in oscillation around the minimum or in no convergence at all.
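
These effects can be reproduced with a minimal sketch on the toy function f(x) = x², where each step multiplies x by (1 − 2 × learning rate). The function name gradient_descent and the four learning rates are chosen for illustration.

```python
def gradient_descent(learning_rate, steps=20, x=1.0):
    """Plain gradient descent on f(x) = x**2; returns every value of x visited."""
    path = [x]
    for _ in range(steps):
        x -= learning_rate * 2 * x  # gradient of x**2 is 2x
        path.append(x)
    return path


slow = gradient_descent(0.01)         # tiny steps: still far from 0 after 20 steps
good = gradient_descent(0.4)          # converges quickly
oscillating = gradient_descent(0.95)  # overshoots, so the sign flips on every step
diverging = gradient_descent(1.1)     # overshoots further each step and never converges

for name, path in [("slow", slow), ("good", good),
                   ("oscillating", oscillating), ("diverging", diverging)]:
    print(f"{name:12s} final |x| = {abs(path[-1]):.3g}")
```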

Regularization

Regularization is used to avoid overfitting. This is achieved by adding a penalty for large parameter values to the loss during model training. This has the effect of shrinking the model parameters and simplifying the model. Regularization biases the parameters towards smaller values. The strength of this bias is set by a regularization hyperparameter, which can be tuned.

L1 / L2

L1 Regularization adds an L1 penalty equal to the sum of the absolute values of the coefficients. This can lead to sparse models with few non-zero coefficients. L1 Regularization is also known as Lasso Regularization.

L2 Regularization adds an L2 penalty equal to the sum of the squares of the coefficients. Coefficients are shrunk towards zero but are rarely set to exactly zero, so this does not lead to sparse models. L2 Regularization is also known as Ridge Regularization.
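
The difference between the two penalties can be illustrated with a small sketch. The function names and the simplified one-coefficient setting are assumptions made for illustration: each shrink function returns the value v that minimises 0.5 × (v − w)² plus the penalty, which shows why L1 produces exact zeros and L2 does not.

```python
def l1_penalty(weights, lam):
    """L1 (Lasso) penalty: lam times the sum of absolute values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (Ridge) penalty: lam times the sum of squares."""
    return lam * sum(w * w for w in weights)


def l1_shrink(w, lam):
    """Minimiser of 0.5 * (v - w)**2 + lam * |v|: small values become exactly zero."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def l2_shrink(w, lam):
    """Minimiser of 0.5 * (v - w)**2 + lam * v**2: values shrink but stay non-zero."""
    return w / (1 + 2 * lam)


weights = [3.0, -0.2, 0.05, 1.5]
lam = 0.5  # regularization strength, a tunable hyperparameter
after_l1 = [l1_shrink(w, lam) for w in weights]
after_l2 = [l2_shrink(w, lam) for w in weights]
print(after_l1)
print(after_l2)
```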

Video: Ridge vs Lasso Regression, Visualized!!!

This 9 minute video, by Josh Starmer, describes L1 and L2 Regularization using good graphs and visualization techniques.

Dropout

Dropout can be used to avoid overfitting. It is a regularizing technique used to increase the generalizing capability of the algorithm. When used, this hyperparameter has the effect of randomly disabling neurons in the neural network during training; that is, each output of a layer is either dropped or retained.
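
A minimal sketch of dropout applied to the outputs of one layer is shown below. It assumes the common "inverted dropout" variant, in which retained outputs are rescaled so that their expected value is unchanged. The function name dropout and the rate of 0.5 are illustrative.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate` and rescale the retained ones."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off when making predictions
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
layer_output = [1.0] * 10
dropped = dropout(layer_output, rate=0.5, rng=rng)
print(dropped)  # each value is either dropped (0.0) or retained and rescaled (2.0)
```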

Six tuning best practices

Hyperparameter tuning, or optimization can be extremely resource intensive. To control the resource requirements and improve optimization, best practices can be employed.

  1. Number of Hyperparameters. Whilst SageMaker limits you to searching 20 hyperparameters it is best to search far fewer. This is because the computational complexity increases with the number of hyperparameters in the Search Space.
  2. Hyperparameter ranges. Better results can be obtained by limiting the range of the hyperparameters to be searched. This is where prior experience with optimization with a type of data and algorithm can be employed. Restricting the range controls the size of the Search Space.
  3. Log scales for hyperparameters. SageMaker initially assumes that a hyperparameter is linear-scaled and only treats it as log-scaled once it has identified it as such, which can take some time. If you know that a hyperparameter is log-scaled, convert it yourself so that the tuning job does not have to discover this.
  4. Concurrent training jobs. Running lots of training jobs concurrently will complete optimizing quicker, but sequential processing will produce better results. This is because each completed training job provides information to improve the next training job. With concurrent training jobs there is much less opportunity to share this information with subsequent jobs. So Concurrency is a trade off between speed and quality.
  5. Using multiple instances. When running training jobs on multiple instances there is a similar communication issue as running jobs concurrently. You have to make sure the correct objective metric is communicated and used.
  6. Use Bayesian search. Bayesian search is a better, cheaper and faster way to tune hyperparameters. Bayesian search typically requires 10x fewer jobs than random search.

Summary

This Study Guide has covered hyperparameter optimization which is also called model tuning. Hyperparameters are the external controls used to control how the model trains and operates. Whilst hyperparameters can be controlled manually and optimized by trial and error, SageMaker also has an automatic hyperparameter optimization feature.

Credits

Aeroplane flight controls photo by Abby AR on Unsplash



10 questions and answers

Created by Michael Stainsbury

3.4 Model tuning (Silver)

10 test type questions that cover subdomain 3.4 Perform hyperparameter optimization of the Modelling knowledge domain.

1 / 10

What can Momentum hyperparameters prevent?

  • Prevent the algorithm from becoming stuck at a local minimum.
  • Prevent oscillation, or noise, whilst searching for the minimum.

2 / 10

What controls the values of the Hyperparameters?

3 / 10

What are the types of Hyperparameters?

  1. Model hyperparameters
  2. Optimizer hyperparameters
  3. Data hyperparameters

4 / 10

The <–?–> search is guided by a statistical process to find optimum values for Hyperparameters as quickly as possible.

5 / 10

The process of optimizing the hyperparameters is called <–?–>.

6 / 10

What are the inputs to a SageMaker hyperparameter tuning job?

7 / 10

What is the main aim of Model Tuning?

8 / 10

Dropout is a regularizing technique used to increase the generalizing capability of the algorithm and can be used to avoid <–?–>.

9 / 10

The two types of Regularization are called <–?–> Regularization.

10 / 10

What types of search are used to find the optimal values of hyperparameters?
