Sequence-to-Sequence Algorithm
The SageMaker Sequence-to-Sequence algorithm is a supervised learning algorithm most commonly used for machine translation. It takes an input sequence of tokens, for example French words, and outputs the translation as a sequence of English words. Beyond translation, Sequence-to-Sequence can also be used to summarize documents and to convert speech to text.
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/seq-2-seq.html
- AWS blog: https://aws.amazon.com/blogs/machine-learning/create-a-word-pronunciation-sequence-to-sequence-model-using-amazon-sagemaker/
Attributes
| Problem attribute | Description |
| --- | --- |
| Data types and format | Text |
| Learning paradigm or domain | Textual analysis, supervised learning |
| Problem type | Machine translation |
| Use case examples | Convert audio files to text; summarize a long text corpus; convert text from one language to another |
Training
Sequence-to-Sequence requires training data in RecordIO-Protobuf format, with tokens encoded as integers rather than the floating-point values most other algorithms expect (see the preprocessing sketch after this list). Training input files:
- train.rec
- val.rec
- vocab.src.json
- vocab.trg.json
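The vocabulary files map each token to an integer ID, and train.rec and val.rec then hold the integer-encoded sentence pairs. Below is a minimal preprocessing sketch; the special tokens and helper functions are illustrative assumptions, not the exact AWS tooling, and the sketch stops short of writing the RecordIO-Protobuf .rec files, which the AWS example notebooks handle with their own scripts.

```python
import json
from collections import Counter

def build_vocab(sentences, path):
    """Count tokens and write a token -> integer-ID map as JSON.
    The reserved IDs for special tokens are an assumption."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {"<pad>": 0, "<unk>": 1, "<s>": 2, "</s>": 3}
    for tok, _ in counts.most_common():
        vocab[tok] = len(vocab)
    with open(path, "w") as f:
        json.dump(vocab, f)
    return vocab

def encode(sentence, vocab):
    """Replace each token with its integer ID (unknowns map to <unk>)."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]

src_vocab = build_vocab(["je aime le machine learning"], "vocab.src.json")
trg_vocab = build_vocab(["i love machine learning"], "vocab.trg.json")
print(encode("je aime le machine learning", src_vocab))  # [4, 5, 6, 7, 8]
```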
Model artifacts and inference
| Attribute | Description |
| --- | --- |
| Learning paradigm | Supervised learning |
| Request format | JSON, RecordIO-Protobuf |
| Result | Same format as the request |
| Batch request format | JSON Lines |
| Batch result | JSON Lines |
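For real-time inference against a deployed endpoint, a JSON request carries the source sentences to translate. Here is a minimal sketch using boto3; the endpoint name is a placeholder, and the exact request and response layouts are documented in the AWS docs linked above.

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# JSON request body: a list of source sentences to translate.
payload = {"instances": [{"data": "je aime le machine learning"}]}

response = runtime.invoke_endpoint(
    EndpointName="my-seq2seq-endpoint",  # placeholder name
    ContentType="application/json",
    Body=json.dumps(payload),
)

# The response body is JSON containing the translated target sentences.
result = json.loads(response["Body"].read())
print(result)
```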
Processing environment
Sequence-to-Sequence supports training on a single GPU instance only; that single instance may, however, contain multiple GPUs.
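A minimal training-job sketch with the SageMaker Python SDK, reflecting the single-instance constraint. The bucket, role, and hyperparameter values are placeholders, and the channel names follow the AWS seq2seq example notebook.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
region = session.boto_region_name
image = image_uris.retrieve("seq2seq", region)  # built-in algorithm container

estimator = Estimator(
    image_uri=image,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,                # single machine only
    instance_type="ml.p3.8xlarge",   # one instance, multiple GPUs
    output_path="s3://my-bucket/seq2seq/output",  # placeholder
    sagemaker_session=session,
)

estimator.set_hyperparameters(num_layers_encoder=1, num_layers_decoder=1)

# Channels used by the AWS example notebook: train, validation, vocab
estimator.fit({
    "train": "s3://my-bucket/seq2seq/train",
    "validation": "s3://my-bucket/seq2seq/validation",
    "vocab": "s3://my-bucket/seq2seq/vocab",
})
```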
Video: Amazon SageMaker’s Built-in Algorithm Webinar Series: Sequence2Sequence
This is a one-hour video from AWS.
Credits
French and English newspapers photo by Markus Spiske on Unsplash