Hyperparameters can be thought of as the external controls that influence how the model operates, just as the cockpit controls of an aeroplane influence how it flies. These values are external to the model and are set by the user. They can influence how an algorithm is trained and the structure of the final model.
The optimal settings are difficult to determine in advance, although prior experience with the model and data may help. An exhaustive manual search for the best hyperparameters would take a long time and consume a lot of computing resources. This is why automatic hyperparameter tuning is used to search for the optimum values.
This study guide covers subdomain 3.4, Perform hyperparameter optimization, of the AWS exam. More information on the syllabus can be found in: AWS Machine Learning exam syllabus
- Microsoft: Hyperparameter tuning a model – Azure Machine Learning
- Google: Overview of hyperparameter tuning | AI Platform Training
- Hyperparameters Optimization. An introduction on how to fine-tune Machine and Deep Learning models
- Wikipedia: Hyperparameter optimization
Video: Hyperparameter Tuning with Amazon SageMaker’s Automatic Model Tuning – AWS Online Tech Talks
This is a 47:49 video by Leo Dirac from AWS.
- 0:00 – Introduction
- 0:48 – What’s a Hyperparameter, and why does it need tuning?
- 1:06 – Supervised Machine Learning
- 1:52 – Always perfect ML algorithm
- 2:49 – Overfitting
- 4:28 – Regularization
- 5:40 – Learning rate
- 7:22 – Hyperparameter: “Any decision the algorithm author can’t make for you”
- 7:55 – Configurable expressiveness
- 9:27 – Types of hyperparameters – 5 types
- 12:40 – Tuning strategies
- 14:20 – Grid search
- 14:56 – High dimensional grid search
- 16:54 – Random search
- 19:55 – Surrogate Model – Bayesian search
- 23:40 – EI – Expected Improvement
- 25:42 – Demo of automatic model tuning in SageMaker
- 35:49 – Tips for Model Tuning – effectively using Automatic Model Tuning
- 36:03 – Clarify your Goals
- 36:54 – Quality vs time trade-off
- 41:36 – Intuition about Hyperparameters
- 43:45 – More tuning tips
- 46:42 – Automatic Model Tuning is a productivity tool, it is not magic
- 47:49 – End
What are Parameters and Hyperparameters
Parameters are internal variables that can be estimated from the training data. They change during training and are preserved after training as part of the learned model. Parameters are used by the model to make predictions. Hyperparameters are external configurations set by the user. They control or influence how the model learns during training and do not change whilst the training job is running. Model tuning is essentially about optimizing the values of Hyperparameters.
The Characteristics of Parameters and Hyperparameters.
| Parameters | Hyperparameters |
| --- | --- |
| Internal values | External values |
| Estimated or learned from data | Defined by the user |
| Usually saved as part of the trained model | Not part of the trained model |
What is model tuning
Model tuning is also known as hyperparameter optimization. Hyperparameters are variables that control the training process. These are configuration variables that do not change during a Model training job. Model tuning provides optimized values for hyperparameters, which maximize your model’s predictive accuracy.
Each model has its own hyperparameters; some are unique, some are shared across a class of algorithms. For example, XGBoost has tree depth and maximum leaf nodes, whereas neural networks have hyperparameters for the number of layers and the hidden layer width.
Additional Feature Engineering
Model tuning may identify opportunities to enhance the value of the data with further Feature Engineering. For more information about Feature Engineering see: Feature Engineering for Machine Learning
When changing hyperparameters to see if the model improves you need to consider:
- Which hyperparameters are the most influential for your model.
- Which values you should pick.
- How many combinations of hyperparameters you should try.
Types of Hyperparameters
There are three types of hyperparameters:
- Model hyperparameters
- Optimizer hyperparameters
- Data hyperparameters
Model hyperparameters describe the structure of the model, for example:
- Number of hidden units
- Size of the first hidden layer
- Number of layers
Model hyperparameters are particularly important for neural networks, where the size and shape of the network must be defined before training can start.
Optimizer hyperparameters are concerned with the optimizing and training process. Examples are:
- Learning rate
- Minibatch size
- Number of epochs
Optimizer hyperparameters also include the choice of the optimization algorithm itself, for example:
- SGD (Stochastic Gradient Descent)
Data hyperparameters relate to attributes of the data. For image data this can include resizing or cropping information.
How to tune Hyperparameters
Manual tuning of Hyperparameters
In manual tuning, the values of the algorithm’s hyperparameters are determined by experience or intuition. The results of one training run are used to guess the hyperparameters for the next training run.
Automated Hyperparameter tuning
In automated hyperparameter tuning, or optimization, multiple training jobs are submitted with different hyperparameter values. The hyperparameter values are chosen from ranges set by the user. After each job completes, the trained model is tested against validation data using a specified target metric. Target metrics are also known as hyperparameter metrics.
- AWS: Perform Automatic Model Tuning – Amazon SageMaker
- Coursera video: Automatic model tuning using Amazon SageMaker – Introduction to Amazon SageMaker
For SageMaker built-in algorithms, the hyperparameter tuning job is passed a JSON object that specifies the hyperparameter details, including:
- The hyperparameters to be tuned
- The ranges for the hyperparameters values
- Maximum resource limits the tuning job can use
- The objective metric for the hyperparameter tuning job
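As a sketch, that JSON object can be built as a Python dict. The field names below mirror the SageMaker `HyperParameterTuningJobConfig` API, but the hyperparameter names, ranges, and metric are illustrative values for an XGBoost-style job, not a recommendation.

```python
# Sketch of a SageMaker hyperparameter tuning job configuration.
# Field names follow the SageMaker API; values are illustrative only.
tuning_job_config = {
    "Strategy": "Bayesian",  # search strategy
    "HyperParameterTuningJobObjective": {
        "Type": "Minimize",
        "MetricName": "validation:rmse",  # objective metric to optimize
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,  # cap on total training jobs
        "MaxParallelTrainingJobs": 3,   # cap on concurrent training jobs
    },
    "ParameterRanges": {
        # The API expects range bounds as strings.
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.1", "MaxValue": "0.5"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
        ],
    },
}
```

This dict constructs the request only; submitting it requires a call to the SageMaker API with a valid training job definition and IAM role.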
Video: Tune Your ML Models to the Highest Accuracy with Amazon SageMaker Automatic Model Tuning
A 19:52 video by Emily Webber from AWS.
- 0:00 – Quick recap on Model Training
- 1:44 – What is a tuning job
- 2:05 – Hyperparameter tuning jobs
- 3:05 – Bayesian optimizer
- 5:14 – How do I set up a hyperparameter tuning job
- 6:25 – Can I use hyperparameter tuning with my own model
- 7:04 – What if I need all my jobs tuned at the same time
- 8:35 – Can I stop a job early if the model is not getting better
- 9:25 – How can I maximise efficiency across tuning jobs
- 10:58 – How do I compare results across tuning jobs
- 12:10 – Demo
- 18:15 – Pro tips
Using search methods to tune Hyperparameters
In hyperparameter tuning, or optimization, the process of finding the optimal hyperparameters is called searching. The user defines a Search Space, an n-dimensional volume containing the hyperparameters within ranges chosen by the user.
- Hyperparameter Optimization With Random Search and Grid Search
- AWS: How Hyperparameter Tuning Works – Amazon SageMaker
- Hyper-parameter optimization algorithms: a short review
Grid search
In Grid search, a table, or grid, of hyperparameter value combinations is built and each combination is tested in turn. Because of the large number of combinations, this method can be very resource intensive.
Random search
This method is similar to Grid search, however the hyperparameter combinations are chosen randomly. The process proceeds until a resource limit, for example time, has been reached.
Bayesian search
This method is a search guided by a statistical model, used to find optimal hyperparameter values as quickly as possible. Bayesian search treats the choice of hyperparameter values as a regression problem: the results of previous training jobs are used to improve the selection of the next set of hyperparameters. Sometimes the new hyperparameters are close to the previous ones, to find subtle improvements; other times the values chosen are distant, to explore the range of values.
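The difference between Grid search and Random search can be sketched on a toy problem. The `validation_error` function below stands in for the result of a real training job, and its optimum is purely illustrative.

```python
import itertools
import random

# Toy stand-in for a real training job: pretend validation error depends on
# two hyperparameters, with the best values at lr=0.1, depth=5 (illustrative).
def validation_error(lr, depth):
    return (lr - 0.1) ** 2 + 0.01 * (depth - 5) ** 2

# Grid search: evaluate every combination in a fixed grid (16 jobs here).
lrs = [0.01, 0.05, 0.1, 0.5]
depths = [3, 5, 7, 9]
grid_best = min(itertools.product(lrs, depths),
                key=lambda p: validation_error(*p))

# Random search: sample combinations from the ranges until the budget
# (16 jobs, matching the grid) is exhausted.
rng = random.Random(42)
samples = [(rng.uniform(0.01, 0.5), rng.randint(3, 9)) for _ in range(16)]
random_best = min(samples, key=lambda p: validation_error(*p))

print("grid best:", grid_best)
print("random best:", random_best)
```

Random search can try learning rates the grid never contains, which is why it often outperforms a coarse grid for the same budget.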
SageMaker Hyperparameter tuning jobs
A SageMaker hyperparameter tuning job has three inputs:
- Tuning job name – HyperParameterTuningJobName
- Tuning job config – HyperParameterTuningJobConfig
- Training job definition – TrainingJobDefinition
Tuning job name
A unique name for management purposes. This is passed in the HyperParameterTuningJobName tuning job parameter.
Tuning job config
The tuning job config is provided in the HyperParameterTuningJobConfig tuning job parameter. This is a JSON object that contains values for:
- Names and ranges of hyperparameters
- Resource limits, for example: MaxNumberOfTrainingJobs
- Objective metric
- Search strategy
Training job definition
The training job definition is provided in the TrainingJobDefinition tuning job parameter. This is a JSON object that contains values for:
- Metrics that the training jobs emit when using a custom training algorithm.
- Training algorithm Docker container image
- S3 location of training data
- S3 location of test data
- S3 location of output
- Hyperparameters that are not being tuned
- Instance definition
- The maximum duration for the training job
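The training job definition can likewise be sketched as a Python dict. The field names follow the SageMaker `TrainingJobDefinition` API; the container image, S3 URIs, role ARN, and instance settings are placeholders, not real resources.

```python
# Sketch of a SageMaker TrainingJobDefinition. All resource identifiers
# (image URI, role ARN, bucket paths) are placeholders for illustration.
training_job_definition = {
    "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",  # placeholder
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "InputDataConfig": [
        {"ChannelName": "train",
         "DataSource": {"S3DataSource": {
             "S3DataType": "S3Prefix",
             "S3Uri": "s3://my-bucket/train/"}}},          # placeholder
        {"ChannelName": "validation",
         "DataSource": {"S3DataSource": {
             "S3DataType": "S3Prefix",
             "S3Uri": "s3://my-bucket/validation/"}}},     # placeholder
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},  # placeholder
    "StaticHyperParameters": {"objective": "reg:squarederror"},  # not tuned
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge",
                       "InstanceCount": 1,
                       "VolumeSizeInGB": 10},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # max training duration
}
```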
- AWS: Configure and Launch a Hyperparameter Tuning Job – Amazon SageMaker
- AWS: Example: Hyperparameter Tuning Job – Amazon SageMaker
- PDF slides: Optimizing Your Machine Learning Models on Amazon SageMaker
Video: Improve model quality with Amazon SageMaker Automatic Model Tuning by Kumar Venkateswar
A 36:11 video by Kumar Venkateswar from AWS.
Common hyperparameters to tune
Momentum
Momentum hyperparameters speed up the learning process. Momentum can prevent the algorithm from becoming stuck in a local minimum and increase the chance of finding the global minimum. Momentum hyperparameters can also dampen oscillation, or noise, whilst searching for the minimum.
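A minimal sketch of momentum, assuming plain gradient descent on the toy function f(w) = w², illustrates the update rule: a velocity term accumulates past gradients and is added to the weight.

```python
# Gradient descent with momentum on f(w) = w**2 (gradient is 2w).
# The lr and beta values are illustrative, not recommendations.
def descend(lr, beta, steps=50):
    w, v = 5.0, 0.0
    for _ in range(steps):
        grad = 2 * w
        v = beta * v - lr * grad  # velocity accumulates past gradients
        w = w + v
    return w

plain = descend(lr=0.05, beta=0.0)     # no momentum
momentum = descend(lr=0.05, beta=0.9)  # with momentum
```

On this one-dimensional bowl both runs reach the minimum; momentum's benefit shows up on noisy or poorly conditioned loss surfaces, where the accumulated velocity carries the search through small local dips.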
Optimizers
Optimizers control how the model's weights are updated as it learns.
Activation Functions
To allow deep learning models to learn non-linear prediction boundaries, Activation Functions are used. Activation Functions introduce non-linearity into models. A commonly used Activation Function is the Rectifier (ReLU) Activation Function.
Learning Rate
The Learning Rate hyperparameter determines the step size of each iteration in the search for the minimum. A small Learning Rate increases the time and processing needed to find the minimum. A large Learning Rate decreases processing time, but may overshoot the minimum, resulting in oscillation or no convergence.
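The effect of step size can be sketched with gradient descent on the toy function f(w) = w²; the three learning rates below are illustrative.

```python
# Gradient descent on f(w) = w**2; the gradient is 2w.
def final_w(lr, steps=50):
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w  # step downhill, scaled by the learning rate
    return w

small = final_w(lr=0.01)  # slow: still some distance from the minimum at 0
good = final_w(lr=0.1)    # converges quickly to near 0
large = final_w(lr=1.1)   # overshoots: each step makes |w| bigger
```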
- Wikipedia: Learning rate
Regularization
Regularization is used to avoid overfitting. This is achieved by penalizing large parameter values during model training, which has the effect of shrinking the model's parameters and simplifying the model. Regularization biases parameters towards smaller values by adding a penalty term, weighted by a tuning hyperparameter, to the training loss.
L1 / L2
L1 Regularization adds an L1 penalty equal to the absolute value of the coefficients. This can lead to sparse models with few non-zero coefficients. L1 Regularization penalises small weights more heavily than L2 and can be used to identify features to drop. L1 Regularization is also known as Lasso Regularization.
L2 Regularization adds an L2 penalty equal to the square of the magnitude of the coefficients. No coefficient is driven to zero, so this does not lead to sparse models. L2 Regularization is also known as Ridge Regularization.
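The two penalty terms can be sketched directly; `lam` below is the regularization-strength hyperparameter and the weights are illustrative.

```python
# The penalty each scheme adds to the training loss, for a weight vector.
def l1_penalty(weights, lam):
    # L1: lam * sum of absolute values (Lasso)
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2: lam * sum of squared values (Ridge)
    return lam * sum(w ** 2 for w in weights)

weights = [0.5, -2.0, 0.0, 1.5]
l1 = l1_penalty(weights, lam=0.1)  # 0.1 * (0.5 + 2.0 + 0.0 + 1.5)
l2 = l2_penalty(weights, lam=0.1)  # 0.1 * (0.25 + 4.0 + 0.0 + 2.25)
```

Note how the L2 penalty is dominated by the largest weight (−2.0), while the L1 penalty charges every non-zero weight in proportion to its size, which is what pushes small weights all the way to zero.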
Video: Ridge vs Lasso Regression, Visualized!!!
This 9-minute video by Josh Starmer describes L1 and L2 Regularization using good graphs and visualization techniques.
Dropout
Dropout can be used to avoid overfitting. It is a regularization technique used to increase the generalizing capability of the algorithm. During training, this hyperparameter randomly cancels a fraction of the neurons in the neural network, so that each layer's outputs are either dropped or retained.
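A minimal sketch of (inverted) dropout: each activation is zeroed with probability `p`, and the survivors are scaled by 1/(1−p) so the layer's expected output is unchanged; at inference time dropout is switched off.

```python
import random

def dropout(activations, p, rng):
    # Zero each activation with probability p; scale survivors by 1/(1-p).
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

rng = random.Random(0)          # fixed seed for a reproducible sketch
layer_output = [1.0] * 10000    # a layer of identical activations
dropped = dropout(layer_output, p=0.5, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)  # roughly 0.5
```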
Six tuning best practices
Hyperparameter tuning, or optimization, can be extremely resource intensive. To control the resource requirements and improve optimization, best practices can be employed.
- Number of hyperparameters. Whilst SageMaker limits you to searching 20 hyperparameters, it is best to search far fewer, because computational complexity increases with the number of hyperparameters in the Search Space.
- Hyperparameter ranges. Better results can be obtained by limiting the range of the hyperparameters to be searched. This is where prior experience with optimization with a type of data and algorithm can be employed. Restricting the range controls the size of the Search Space.
- Log scales for hyperparameters. SageMaker initially presumes a variable is linear-scaled and will only process values as log-scaled once it has identified the variable as logarithmic. If you know a hyperparameter's range spans several orders of magnitude, specify logarithmic scaling for it up front to speed up processing.
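To illustrate why scaling matters, the sketch below samples a learning-rate range both ways; the range endpoints are illustrative. Linear sampling almost never produces values near the bottom of the range, whereas log-space sampling covers every order of magnitude evenly.

```python
import math
import random

rng = random.Random(1)  # fixed seed for a reproducible sketch

# Sample a range spanning four orders of magnitude, 1000 times each way.
linear = [rng.uniform(0.0001, 1.0) for _ in range(1000)]
log_scaled = [10 ** rng.uniform(math.log10(0.0001), math.log10(1.0))
              for _ in range(1000)]

# Count samples in the smallest decade of the range (below 0.001).
tiny_linear = sum(1 for x in linear if x < 0.001)      # almost none
tiny_log = sum(1 for x in log_scaled if x < 0.001)     # about a quarter
```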
- Concurrent training jobs. Running many training jobs concurrently will complete the optimization more quickly, but sequential processing can produce better results. This is because each completed training job provides information to improve the next training job; with concurrent training jobs there is much less opportunity to share this information with subsequent jobs. So concurrency is a trade-off between speed and quality.
- Using multiple instances. Running a training job on multiple instances raises a similar communication issue to running jobs concurrently: you have to make sure the correct objective metric is communicated and used.
- Use Bayesian search. Bayesian search is a better, cheaper and faster way to tune hyperparameters, typically requiring 10x fewer training jobs than random search.
This Study Guide has covered hyperparameter optimization, which is also called model tuning. Hyperparameters are the external controls that influence how the model trains and operates. Whilst hyperparameters can be set manually and optimized by trial and error, SageMaker also has an automatic hyperparameter optimization feature.
Contains affiliate links. If you go to Whizlab’s website and make a purchase I may receive a small payment. The purchase price to you will be unchanged. Thank you for your support.
Whizlabs AWS Certified Machine Learning Specialty
Whizlab’s AWS Certified Machine Learning Specialty Practice tests are designed by experts to simulate the real exam scenario. The questions are based on the exam syllabus outlined by official documentation. These practice tests help candidates gain more confidence in exam preparation and self-evaluate against the exam content.
Practice test content
- Free Practice test – 15 questions
- Practice test 1 – 65 questions
- Practice test 2 – 65 questions
- Practice test 3 – 65 questions
Section test content
- Core ML Concepts – 10 questions
- Data Engineering – 11 questions
- Exploratory Data Analysis – 13 questions
- Modeling – 15 questions
- Machine Learning Implementation and Operations – 12 questions
10 questions and answers
Whizlab’s AWS Certified Machine Learning Specialty course
- In Whizlabs AWS Machine Learning certification course, you will learn and master how to build, train, tune, and deploy Machine Learning (ML) models on the AWS platform.
- Whizlab’s Certified AWS Machine Learning Specialty practice tests offer you a total of 200+ unique questions to get a complete idea about the real AWS Machine Learning exam.
- Also, you get access to hands-on labs in this course. There are about 10 lab sessions that are designed to take your practical skills on AWS Machine Learning to the next level.
- 9 Practice tests with 271 questions
- Video course with 65 videos
- 9 hands on labs