There are four SageMaker text processing algorithms: BlazingText, LDA, NTM and Sequence-to-Sequence. BlazingText converts text to numeric vectors. LDA and NTM identify topics in text documents, and Sequence-to-Sequence provides machine translation of languages. Each algorithm has its own section and embedded video.
These revision notes are part of subdomain 3.2, "Select the appropriate model(s) for a given machine learning problem", of the exam syllabus.
These revision notes describe the four SageMaker text processing algorithms. Each one processes text differently, although LDA and NTM share the same use case. BlazingText is a precursor to downstream Natural Language Processing tasks. LDA and NTM both provide topic modeling of a large document corpus. Sequence-to-Sequence performs machine translation of languages.
BlazingText is AWS's name for its SageMaker built-in algorithm that identifies relationships between words in text documents. These relationships, which are also called embeddings, are expressed as vectors. The vectors preserve the semantic relationships between words, so words with similar meanings cluster together. This conversion of words to meaningful numeric vectors…
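To make the idea of "similar words cluster together" concrete, here is a minimal sketch that compares word vectors with cosine similarity. The three-dimensional vectors are invented for illustration and are not BlazingText output; the names `toy_vectors`, `cosine_similarity` and `most_similar` are hypothetical helpers, not part of any SageMaker API.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# Real learned embeddings come from a corpus and have many more dimensions.
toy_vectors = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    scores = {
        other: cosine_similarity(vectors[word], vec)
        for other, vec in vectors.items()
        if other != word
    }
    return max(scores, key=scores.get)


print(most_similar("king", toy_vectors))
```

With these toy values, "king" sits much closer to "queen" than to "apple", which is the kind of clustering the excerpt describes.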
The SageMaker Sequence-to-Sequence algorithm is used for machine translation of languages. The algorithm takes an input sequence of tokens, for example French words, and outputs the translation as a sequence of tokens, for example English words. As well as translation, Sequence-to-Sequence can be used to summarize a document and to convert speech to text. Sequence-to-Sequence is a Supervised Learning algorithm…
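The "sequence of tokens in, sequence of tokens out" framing can be sketched as paired lists of integer token ids. This is only an illustration of how supervised source/target pairs might be represented, assuming a whitespace tokenizer and hypothetical `<pad>` and `<unk>` special tokens; it is not the SageMaker data-preparation tooling.

```python
def build_vocab(sentences, specials=("<pad>", "<unk>")):
    """Map every distinct token to an integer id, reserving ids for special tokens."""
    vocab = {token: index for index, token in enumerate(specials)}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to a list of token ids; unseen tokens map to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(token, unk) for token in sentence.lower().split()]


# Toy parallel corpus, invented for illustration.
source_sentences = ["le chat dort", "le chien mange"]
target_sentences = ["the cat sleeps", "the dog eats"]

source_vocab = build_vocab(source_sentences)
target_vocab = build_vocab(target_sentences)

# Each training example pairs a source id sequence with its target id sequence.
pairs = [
    (encode(src, source_vocab), encode(tgt, target_vocab))
    for src, tgt in zip(source_sentences, target_sentences)
]
print(pairs)
```

Source and target use separate vocabularies here because the two languages do not share words; the paired sequences are what make the task supervised.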
The Neural Topic Model Algorithm (NTM) is used to identify topics in a corpus of documents. NTM uses statistics to group words. The groups are termed Latent Representations because they are identified via word distributions in the documents. The Latent Representations reveal the semantics of the documents and so outperform analysis using the word form…
SageMaker Latent Dirichlet Allocation algorithm (LDA) is an Unsupervised Learning algorithm that groups words in a document into topics. The topics are found by a probability distribution of all the words in a document. LDA can be used to discover topics shared by documents within a text corpus. The number of topics is specified by…
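Both topic-modeling algorithms described above work from word counts rather than word order, so a corpus is usually reduced to a bag-of-words matrix first. The sketch below shows that step only, assuming simple whitespace tokenization; `bag_of_words` is a hypothetical helper and the documents are invented. It does not train a topic model.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Toy corpus, invented for illustration.
docs = [
    "stocks fell as markets closed",
    "the team won the match",
    "markets rallied as stocks rose",
]

vocabulary, matrix = bag_of_words(docs)
for row in matrix:
    print(row)
```

Each row is one document and each column one vocabulary term. A topic model then looks for groups of terms that tend to co-occur across rows, such as "stocks" and "markets" in the first and third documents.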
Whizlabs' AWS Certified Machine Learning Specialty practice exams
Whizlabs' AWS Certified Machine Learning Specialty practice tests are designed by experts to simulate the real exam. The questions are based on the exam syllabus outlined in the official documentation. The practice tests help candidates build confidence during exam preparation and evaluate themselves against the exam content.
Practice test content
- Free Practice test – 15 questions
- Practice test 1 – 65 questions
- Practice test 2 – 65 questions
- Practice test 3 – 65 questions
Section test content
- Core ML Concepts – 10 questions
- Data Engineering – 11 questions
- Exploratory Data Analysis – 13 questions
- Modeling – 15 questions
- Machine Learning Implementation and Operations – 12 questions