What is Supervised Learning?
Supervised Learning needs labeled training data. We provide data that has already been identified, and therefore labeled, as being what we are looking for. Once the Machine Learning model has been trained, it can be presented with real, previously unseen data and asked to select the correct label for it. It is the presence of labels in the training data that identifies an algorithm as using Supervised Learning.
These revision notes are part of sub-domain 3.1, Frame business problems as machine learning problems, of the Modeling domain of the AWS Machine Learning Specialty exam. A description of all the knowledge domains in the exam is in these revision notes: AWS Machine Learning exam syllabus
How is Supervised Learning used?
You could, for example, train an algorithm to recognise what healthy fruit looks like and what diseased fruit looks like. This could be achieved by using training data with pictures of healthy and diseased fruit that have been labelled manually by people. Those people will have to be trained to identify diseased fruit and then apply the correct labels. To help with this task Amazon provide two services:
Once the model is trained it can be given images of fruit to determine if they are healthy or diseased.
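The train-then-predict flow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two numeric features (fraction of brown pixels, number of dark spots), the data values, and the nearest-centroid method are all made up for the example, and a real image model would be far more involved.

```python
from collections import defaultdict
from math import dist

# Hypothetical labelled training data: each image is reduced to two made-up
# features (fraction of brown pixels, number of dark spots) plus a human label.
TRAINING_DATA = [
    ((0.02, 0.0), "healthy"),
    ((0.05, 1.0), "healthy"),
    ((0.03, 0.0), "healthy"),
    ((0.40, 6.0), "diseased"),
    ((0.55, 9.0), "diseased"),
    ((0.35, 5.0), "diseased"),
]


def train(examples):
    """Return the mean feature vector (centroid) for each label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the new example."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING_DATA)
print(predict(model, (0.04, 1.0)))  # unlabelled image with few blemishes
print(predict(model, (0.50, 7.0)))  # unlabelled image with many blemishes
```

The labels supplied by people are the only source of "truth" here: the model never sees a definition of disease, only examples of it.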
The advantages and disadvantages of Supervised Learning?
Supervised learning allows you to predict future events from accumulated historical data, and performance criteria can be optimised using experience. It is usually less computationally intensive since the models can be simpler, although the training often requires significant resources due to the quantity of data. The selection of good quality training data is important for performance. However, the choice of training data can introduce bias and the risk of overfitting. It is a popular method of solving real world problems.
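One common way to spot overfitting is to score a model on labeled data it was not trained on. The sketch below assumes a deliberately naive one-nearest-neighbour model and made-up one-dimensional data containing a single mislabeled training point; none of it comes from a real data set.

```python
# Made-up data: values below 5 should be "low", values of 5 or more "high".
# The training set contains one noisy label at x = 4.0.
TRAIN = [
    (1.0, "low"), (2.0, "low"), (3.0, "low"),
    (4.0, "high"),  # mislabelled on purpose
    (6.0, "high"), (7.0, "high"), (8.0, "high"),
]
HELD_OUT = [
    (1.5, "low"), (3.8, "low"), (4.4, "low"),
    (6.5, "high"), (7.5, "high"),
]


def nearest_neighbour_label(training, x):
    """Copy the label of the single closest training example."""
    return min(training, key=lambda example: abs(example[0] - x))[1]


def accuracy(training, examples):
    """Fraction of examples whose predicted label matches the known label."""
    correct = sum(
        nearest_neighbour_label(training, x) == label for x, label in examples
    )
    return correct / len(examples)


print("training accuracy:", accuracy(TRAIN, TRAIN))
print("held-out accuracy:", accuracy(TRAIN, HELD_OUT))
```

The model memorises the training set, noise included, so it scores perfectly on data it has seen and noticeably worse on the held-out examples near the mislabeled point.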
The applications of Supervised Learning?
Here are some examples of the applications of Supervised Learning:
- Image recognition
- Predictive analytics. This helps businesses to forecast future events based on the accumulated historical data.
- Customer sentiment analysis. This is the study of text communication from customers to determine their sentiment from the words and phrases they use. Amazon Comprehend is an AI service that also provides sentiment analysis.
- Spam detection. Spam emails are detected and directed to a spam folder based on actions performed on similar emails in the past.
There are two main types of Supervised Learning: Classification and Regression.
Video: Classification and Regression in Machine Learning
This is a brief introduction of Supervised Learning by Max Margenot from Quantopian. It is 2.48 minutes long.
Classification Machine Learning
What is Classification?
Classification is a type of Supervised Learning that results in data receiving a specific label to identify it as being a member of a class. There are two types of classification:
- Binary classification – two choices or labels
- Multiclass classification – more than two labels
In Binary Classification there are two choices or labels. An example of binary classification is a model that can detect the presence or absence of disease in a type of fruit. In Multiclass Classification there are more than two choices or labels. An example of a multiclass classification model is one that can identify the specific disease, or disease type.
Examples of classes used in Classification
These are some examples of classes that can be used with a Classification model. This group is for binary classification, where the data is classified into one of two classes:
- identification of male or female responders to comments on an e-commerce website
- classification of spam email and non-spam email
- positive and negative sentiment in text messages or Twitter feeds
This group is for multiclass classification, where the data is classified into more than two classes:
- classification of types of soil
- classification of types of crops
- classification of mood/feelings in songs/music
Applications of Classification Machine Learning
Some applications of Classification are:
- speech recognition
- handwriting recognition
- biometric identification
- document classification
What SageMaker algorithms use Classification?
- Linear Learner – for binary classification
- XGBoost – for multiclass classification
- K-Nearest Neighbors (k-NN) – for multiclass classification
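To make the idea behind k-nearest neighbours concrete, here is a minimal plain-Python sketch of multiclass classification by majority vote. It is not the SageMaker implementation; the soil features (sand fraction, clay fraction), the sample values, and the function name `knn_predict` are all assumptions made for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical labelled samples: (sand fraction, clay fraction) -> soil type.
SOIL_SAMPLES = [
    ((0.80, 0.10), "sand"),
    ((0.75, 0.15), "sand"),
    ((0.85, 0.05), "sand"),
    ((0.20, 0.60), "clay"),
    ((0.25, 0.55), "clay"),
    ((0.15, 0.65), "clay"),
    ((0.40, 0.20), "silt"),
    ((0.35, 0.25), "silt"),
    ((0.45, 0.15), "silt"),
]


def knn_predict(samples, features, k=3):
    """Label a new example by majority vote among its k closest samples."""
    nearest = sorted(samples, key=lambda sample: dist(sample[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


for query in [(0.78, 0.12), (0.22, 0.58), (0.41, 0.21)]:
    print(query, "->", knn_predict(SOIL_SAMPLES, query))
```

Nothing in the voting step limits the number of labels, which is why the same approach handles binary and multiclass problems alike.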
Recommendation is a type of Supervised Learning built on top of classification. Recommendations are provided by many of the web based services we interact with, for example:
- Google Play
Netflix analyzes your previous viewing history to predict films that you are likely to want to watch. This is a Supervised Machine Learning process. All the films have associated metadata, information that describes the film. This may include genre, age rating, starring actors, etc. These are multiclass classifications that are used by the ML algorithms to match, score and rank films to suggest to their customers. Amazon Personalize is an AI service that provides personalized recommendations.
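The "match, score and rank" idea can be illustrated with a toy example. This is not how Netflix or Amazon Personalize work internally; it is a simple similarity ranking over made-up metadata tags, with the catalogue, the film titles, and the `recommend` function all invented for the sketch.

```python
# Hypothetical catalogue: film title -> set of metadata tags.
CATALOGUE = {
    "Film A": {"sci-fi", "action"},
    "Film B": {"sci-fi", "thriller"},
    "Film C": {"romance", "comedy"},
    "Film D": {"action", "thriller"},
    "Film E": {"comedy", "family"},
    "Film F": {"sci-fi", "comedy"},
}


def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)


def recommend(watched, catalogue, top_n=2):
    """Match unseen films against the viewer's tags, score them, rank them."""
    profile = set().union(*(catalogue[title] for title in watched))
    scores = {
        title: jaccard(profile, tags)
        for title, tags in catalogue.items()
        if title not in watched
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend(["Film A", "Film B"], CATALOGUE))
```

A production recommender would learn from the behaviour of many users rather than from tag overlap alone, but the final step is the same: every candidate gets a score and the highest-scoring items are shown first.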
What SageMaker algorithms support recommendations?
Regression for Machine Learning
Regression is a supervised learning technique that finds the relationship between variables and enables us to predict a continuous output variable based on one or more predictor variables.
A regression problem is one where the output variable is a real value, such as dollars or weight. The model attempts to find a relationship between dependent and independent variables. The dependent variable is the output and the independent variable is the input, so the output depends on the input. These values are described as continuous variables because they can assume any value within a range, for example from zero up to infinity. In Regression, negative values are often changed to positive ones to prevent negative predictions. For example, temperature in Fahrenheit, which can be negative, could be converted to the Kelvin scale, which cannot be negative.
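The temperature conversion mentioned above is a one-line formula, K = (F - 32) × 5/9 + 273.15. A minimal sketch, with made-up readings and a function name chosen for this example:

```python
def fahrenheit_to_kelvin(degrees_f):
    """Convert a Fahrenheit temperature to kelvin."""
    return (degrees_f - 32.0) * 5.0 / 9.0 + 273.15


readings_f = [-40.0, 0.0, 32.0, 98.6]  # made-up example readings
readings_k = [fahrenheit_to_kelvin(f) for f in readings_f]

for f, k in zip(readings_f, readings_k):
    print(f"{f:7.2f} F = {k:7.2f} K")
```

Every converted value is positive, because no physical temperature lies below absolute zero (0 K, or -459.67 °F).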
There are many types of regression, for example:
Video: Making Friends with Machine Learning: Regression
This is a 19 minute video by Cassie Kozyrkov from Google. If you need a gentle high level introduction to Regression this is a good place to start.
Linear regression is a common type of regression used in Machine Learning. The Linear Regression algorithm works by finding the line that is closest to the data points on average. Once this line is known, predictions can be made by finding the output from a supplied input using the line. More complex relationships, or graph shapes, can be described by using other Regression techniques.
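As a rough illustration of "finding the line", the sketch below uses ordinary least squares, which picks the line that minimises the sum of squared vertical distances to the points. That choice of method, the function names, and the data values are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(a, b, x):
    """Read the output off the fitted line for a supplied input."""
    return a * x + b


# Made-up, slightly noisy points that lie close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"prediction for x = 6: {predict(a, b, 6.0):.2f}")
```

All the work happens in `fit_line`; once `a` and `b` are known, a prediction costs one multiplication and one addition.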
What is Time Series Forecasting?
Time series analysis is built on top of Regression. Time series data uses time to add order and structure to the data. So the observations are dependent on the time. Time series analysis uses historical data to predict future events. For example:
- Forecasting crop yields
- Forecasting commodity prices, such as gasoline
- Stock Market forecasting
This is an example of regression and time series data. In this case the time period is in years. The line was created by using the Trend Line feature of Microsoft Excel. The data came from a UK government website providing historical data on agriculture. The graph shows the increasing weight of pesticides in strawberry cultivation over time.
The equation of the line on the graph is like this:
y = ax + b
This simplicity begins to explain why Supervised Learning may require large resources in the training phase, to create the equation, followed by smaller resources in production. This also explains how a Machine Learning model can be installed on edge devices that have constrained resources.
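To show how small a trained linear model can be, the sketch below stores a line as two numbers in JSON and reloads it for prediction, roughly as a constrained device might. The coefficient values are made up and merely stand in for the result of a training run; the function name `load_model` is also an assumption.

```python
import json

# Made-up coefficients standing in for the output of an expensive training run.
trained_parameters = {"a": 0.5, "b": -990.0}

# The whole "model" serialises to a few bytes.
payload = json.dumps(trained_parameters)


def load_model(serialised):
    """Rebuild a prediction function for y = a*x + b from stored parameters."""
    params = json.loads(serialised)
    a, b = params["a"], params["b"]
    return lambda x: a * x + b


predict = load_model(payload)
print(len(payload), "bytes stored")
print("prediction for 2020:", predict(2020))
```

No training data is needed at prediction time, only the stored parameters.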
- Add a Linear Regression Trendline to an Excel Scatter Plot (online-tech-tips.com)
- Agriculture in the United Kingdom data sets
What SageMaker algorithms support Regression?
Supervised Learning is Machine Learning with labeled data. The supervision is the labels and the assessment in training that measures inferences against the known outcome. There are two main types of Supervised Learning: Classification and Regression.
- Image by Aline Dassel from Pixabay
- Image of strawberry sprayer, copyright 2021 NJ Seymore Ltd.
- Unsupervised learning infographic