Amazon SageMaker Ground Truth
Ground Truth Overview
Amazon SageMaker Ground Truth is a service you can use to have data labeled manually. It produces high-quality labeled data during the preprocessing stage, which is then used to train supervised learning models. You send your unlabeled data to AWS, and Ground Truth manages the rest of the workflow, returning your data with labels attached by human workers.
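As a rough illustration of what "sending data to AWS" involves: a labeling job reads its items from an input manifest stored in Amazon S3. The manifest is a JSON Lines file in which each line is a JSON object, and a `source-ref` key points to the S3 object to be labeled. The sketch below only builds the manifest text locally. The bucket name, file names and the `build_manifest` helper are assumptions made for the example, not part of any AWS library.

```python
import json


def build_manifest(s3_uris):
    """Return JSON Lines text with one {"source-ref": uri} object per line."""
    return "".join(json.dumps({"source-ref": uri}) + "\n" for uri in s3_uris)


# Hypothetical S3 objects; replace with the URIs of your own unlabeled data.
uris = [
    "s3://example-bucket/images/cat-001.jpg",
    "s3://example-bucket/images/dog-002.jpg",
]

manifest = build_manifest(uris)
print(manifest, end="")
```

The resulting text would be uploaded to S3 and referenced when the labeling job is created. That step is not shown here.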
Video: Build Highly Accurate Training Datasets at Reduced Costs with Amazon SageMaker Ground Truth
Workforce options
Ground Truth lets you choose who does the labeling. For example, you can create private Ground Truth jobs in which you provide the workforce and use Ground Truth to manage the workflow. There are three workforce options:
- You provide the workforce
- Third party vendor
- Amazon Mechanical Turk
Automated data labeling by Ground Truth
A proportion of the data provided to Ground Truth is given to human workers to label manually. These labels are then used to train a model that labels the remaining data. Some of the automatically labeled data is returned to the human workforce for validation, to check how accurate the automatic labeling was. This cycle continues until the automatic labeling is accurate enough that human intervention is no longer needed.
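The cycle described above can be sketched as a small simulation. This is a toy model under stated assumptions, not Ground Truth's actual algorithm: the "data" are numbers between 0 and 1, the "human" is a function that always returns the true label, and the "model" is a single learned threshold. All function names, the batch size and the 95% accuracy target are invented for the example.

```python
import random


def human_label(x):
    """Stand-in for a human worker: always returns the true label."""
    return 1 if x >= 0.5 else 0


def fit_threshold(labeled):
    """Toy model: midpoint between the largest 0-item and the smallest 1-item."""
    zeros = [x for x, y in labeled.items() if y == 0]
    ones = [x for x, y in labeled.items() if y == 1]
    if not zeros or not ones:
        return None  # not enough information to train yet
    return (max(zeros) + min(ones)) / 2


def predict(x, threshold):
    return 1 if x >= threshold else 0


def label_dataset(items, batch_size=20, target_accuracy=0.95, seed=0):
    pool = list(items)
    random.Random(seed).shuffle(pool)
    human, auto = {}, {}
    while pool:
        # 1. Humans label a batch of the data.
        batch, pool = pool[:batch_size], pool[batch_size:]
        for x in batch:
            human[x] = human_label(x)
        # 2. Train the model on everything humans have labeled so far.
        threshold = fit_threshold(human)
        if threshold is None:
            continue
        # 3. Humans validate a sample of the model's predictions.
        validation, pool = pool[:batch_size], pool[batch_size:]
        correct = 0
        for x in validation:
            human[x] = human_label(x)
            correct += predict(x, threshold) == human[x]
        # 4. If the model is accurate enough, it labels the rest alone.
        if validation and correct / len(validation) >= target_accuracy:
            for x in pool:
                auto[x] = predict(x, threshold)
            pool = []
    return human, auto


items = [i / 200 for i in range(200)]
human, auto = label_dataset(items)
print(f"labeled by humans: {len(human)}, labeled automatically: {len(auto)}")
```

The point of the sketch is the cost trade-off: every item in `auto` is one that no human had to label, and raising `target_accuracy` shifts more of the work back to the human workforce.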
Video: Build Highly Accurate Training Datasets Using Amazon SageMaker Ground Truth
This AWS video by Kate Werling is just under 39 minutes long, with the first 13 minutes spent introducing Ground Truth, followed by two demos. Here are the timings, in minutes and seconds, so you can select the parts most relevant to your studies:
- 0 – How can we build Machine Learning Models faster
- 4.45 – What kind of training data do I need?
- 7.10 – What is supervised learning?
- 9.19 – Amazon SageMaker Ground Truth, how it works.
- 10.18 – Amazon Mechanical Turk mentioned
- 13.22 – Demo – Mechanical Turk labeling images
- 21.28 – Demo – Aerial photography
- 33.10 – AWS Marketplace
- 34.20 – Amazon SageMaker Endpoints
- 38.39 – End
Credits
- Photo by Michael Carruth on Unsplash