SageMaker text processing algorithms

There are four SageMaker text processing algorithms: BlazingText, LDA, NTM and Sequence-to-Sequence. BlazingText converts text to numeric vectors. LDA and NTM identify topics in text documents, and Sequence-to-Sequence provides machine translation of languages. Each algorithm has its own section and embedded video.

These revision notes are part of subdomain 3.2 of the exam syllabus, "Select the appropriate model(s) for a given machine learning problem".

These revision notes describe the four SageMaker text processing algorithms. Each one processes text differently, although LDA and NTM have the same use case. BlazingText is a precursor for downstream Natural Language Processing. LDA and NTM both provide topic modeling of a large document corpus. Sequence-to-Sequence performs machine translation of languages.
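As a revision aid, the mapping from task to algorithm described above can be written as a small lookup. This is only a sketch built from the descriptions in these notes; the task labels and the function name `candidate_algorithms` are made up for illustration and are not part of any AWS API.

```python
from typing import Dict, List

# Task-to-algorithm lookup built only from the descriptions in these notes.
# The task labels are illustrative, not official AWS terminology.
ALGORITHM_FOR_TASK: Dict[str, List[str]] = {
    "word embeddings": ["BlazingText"],
    "topic modeling": ["LDA", "NTM"],
    "machine translation": ["Sequence-to-Sequence"],
}


def candidate_algorithms(task: str) -> List[str]:
    """Return the text algorithms these notes associate with a task."""
    return ALGORITHM_FOR_TASK.get(task.strip().lower(), [])


print(candidate_algorithms("Topic Modeling"))
```

Topic modeling is the only task here with two candidates, which is why exam questions on LDA versus NTM tend to hinge on secondary details rather than on the use case.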

Modeling (Domain 3)

Sequence-to-Sequence Algorithm

The SageMaker Sequence-to-Sequence algorithm is used for machine translation of languages. The algorithm takes an input sequence of tokens, for example French words, and outputs the translation as a sequence of tokens, for example English words. As well as translation, Sequence-to-Sequence can be used to summarize a document and convert speech to text. Sequence-to-Sequence is a Supervised Learning algorithm…
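To make "sequence of tokens" concrete, the sketch below turns a French sentence and its English translation into a pair of integer sequences, which is the usual shape of one supervised training example for a sequence model. It does not call SageMaker; the vocabularies and the helper names `encode` and `decode` are hypothetical, and a real job would build its vocabularies from the training corpus.

```python
# Hypothetical toy vocabularies; real ones are built from the training corpus.
SOURCE_VOCAB = {"<unk>": 0, "le": 1, "chat": 2, "dort": 3}
TARGET_VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sleeps": 3}


def encode(sentence, vocab):
    """Map whitespace-separated tokens to integer ids; unknown words become <unk>."""
    return [vocab.get(token, vocab["<unk>"]) for token in sentence.lower().split()]


def decode(ids, vocab):
    """Map integer ids back to a space-separated string of tokens."""
    inverse = {index: token for token, index in vocab.items()}
    return " ".join(inverse[i] for i in ids)


# One supervised example: the input sequence and the sequence the model should output.
source_ids = encode("Le chat dort", SOURCE_VOCAB)
target_ids = encode("The cat sleeps", TARGET_VOCAB)
print(source_ids, "->", target_ids)
```

The labeled target sequence is what makes the task supervised: the model is trained to produce `target_ids` when given `source_ids`.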

Modeling (Domain 3)

Neural Topic Model Algorithm

The Neural Topic Model Algorithm (NTM) is used to identify topics in a corpus of documents. NTM uses statistics to group words. The groups are termed Latent Representations because they are identified via word distributions in the documents. The Latent Representations reveal the semantics of the documents and so outperform analysis using the word form…
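Topic models that work from word distributions typically start with a bag-of-words view of each document, meaning a vector of word counts with word order discarded. The sketch below builds such vectors with the standard library only. It assumes that representation for illustration and is not NTM itself; the documents and the function name `bag_of_words` are invented for the example.

```python
from collections import Counter

# Two invented documents standing in for a corpus.
documents = [
    "the striker scored a late goal",
    "the central bank raised interest rates",
]

# Shared vocabulary across the corpus, sorted so the column order is stable.
vocabulary = sorted({word for doc in documents for word in doc.split()})


def bag_of_words(doc):
    """Return the count of each vocabulary word in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


# One row per document, one column per vocabulary word.
matrix = [bag_of_words(doc) for doc in documents]
for row in matrix:
    print(row)
```

A topic model looks for groups of columns that tend to be non-zero together across many rows.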

Modeling (Domain 3)

Latent Dirichlet Allocation Algorithm

The SageMaker Latent Dirichlet Allocation (LDA) algorithm is an Unsupervised Learning algorithm that groups the words in a document into topics. The topics are found from the probability distribution of the words in a document. LDA can be used to discover topics shared by documents within a text corpus. The number of topics is specified by…
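The idea of a topic as a probability distribution over words can be illustrated with a toy calculation. The sketch below is not LDA inference and does not use SageMaker: it takes two hypothetical, hand-written topic distributions (LDA would learn these from the corpus) and scores a short document against each, then normalizes the scores so they sum to 1.

```python
import math

# Hypothetical topic-word probabilities; LDA would learn these from the corpus.
TOPICS = {
    "sport": {"goal": 0.4, "team": 0.4, "bank": 0.1, "rate": 0.1},
    "finance": {"goal": 0.1, "team": 0.1, "bank": 0.4, "rate": 0.4},
}


def topic_weights(words):
    """Score each topic by the likelihood of the words, normalized to sum to 1."""
    log_scores = {
        name: sum(math.log(dist[w]) for w in words if w in dist)
        for name, dist in TOPICS.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {name: math.exp(s - top) for name, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


weights = topic_weights(["goal", "team", "goal", "bank"])
print(weights)
```

A document dominated by "goal" and "team" lands mostly in the sport topic, while the single "bank" keeps the finance weight above zero, which mirrors how a document can mix several topics.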

Modeling (Domain 3)

BlazingText Algorithm

BlazingText is AWS's name for its SageMaker built-in algorithm that can identify relationships between words in text documents. These relationships, which are also called embeddings, are expressed as vectors. The vectors preserve the semantic relationships between words, so words with similar meanings cluster together. This conversion of words to meaningful numeric vectors…
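What it means for similar words to cluster together can be shown with cosine similarity, a common way to compare embedding vectors. The sketch below uses hypothetical three-dimensional vectors written by hand; real embeddings are learned from text and have many more dimensions. The names `cosine_similarity` and `most_similar` are illustrative helpers, not BlazingText functions.

```python
import math

# Hypothetical 3-dimensional embeddings; real ones are learned and much wider.
EMBEDDINGS = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other word whose vector is closest in direction to `word`."""
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]))


print(most_similar("king"))
```

With these toy vectors, "king" and "queen" point in nearly the same direction while "apple" does not, so a downstream model can treat the first two as related.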

Whizlabs AWS Certified Machine Learning Specialty practice exams

Practice Exams with 271 questions, Video Lectures and Hands-on Labs from Whizlabs

The Whizlabs AWS Certified Machine Learning Specialty practice tests are designed by experts to simulate the real exam. The questions are based on the exam syllabus set out in the official documentation. The practice tests help candidates gain confidence in their exam preparation and evaluate themselves against the exam content.

Practice test content

  • Free Practice test – 15 questions
  • Practice test 1 – 65 questions
  • Practice test 2 – 65 questions
  • Practice test 3 – 65 questions

Section test content

  • Core ML Concepts – 10 questions
  • Data Engineering – 11 questions
  • Exploratory Data Analysis – 13 questions
  • Modeling – 15 questions
  • Machine Learning Implementation and Operations – 12 questions
