Two newspapers, one in French and one in English, symbolizing the SageMaker text processing algorithm Sequence-to-Sequence, which performs machine translation between languages

Sequence-to-Sequence Algorithm

The SageMaker Sequence-to-Sequence algorithm is used for machine translation. It takes an input sequence of tokens, for example French words, and outputs the translation as a sequence of English words. Beyond translation, Sequence-to-Sequence can also be used to summarize a document or to convert speech to text. Sequence-to-Sequence is a Supervised Learning algorithm.

Attributes

Problem attribute           | Description
Data types and format       | Text
Learning paradigm or domain | Textual analysis, Supervised Learning
Problem type                | Machine translation
Use case examples           | Convert audio files to text; summarize a long text corpus; convert text from one language to another

Training

Sequence-to-Sequence requires the input records to be in RecordIO-Protobuf format with integer values only, so every token must be mapped to an integer ID before training. Training input files (a data-preparation sketch follows the list):

  • train.rec
  • val.rec
  • vocab.src.json
  • vocab.trg.json
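
To make the file formats concrete, here is a minimal sketch of how the two vocabulary files could be built and how sentences are encoded as integer sequences. The reserved token IDs follow the convention used in the AWS seq2seq sample notebooks and are an assumption here, as are the toy sentences; verify them against your own setup.

```python
import json
from collections import Counter

# Reserved tokens; these IDs follow the convention in the AWS seq2seq
# sample notebooks (an assumption -- verify against your setup).
RESERVED = {"<pad>": 0, "<unk>": 1, "<s>": 2, "</s>": 3}

def build_vocab(sentences, max_size=50000):
    """Map each token to an integer ID, most frequent tokens first."""
    counts = Counter(tok for line in sentences for tok in line.split())
    vocab = dict(RESERVED)
    for token, _ in counts.most_common(max_size - len(RESERVED)):
        vocab[token] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Convert a tokenized sentence to the integer sequence seq2seq expects."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]

# Toy parallel corpus (placeholder data)
source = ["je t'aime", "bonjour le monde"]
target = ["i love you", "hello world"]

vocab_src = build_vocab(source)
vocab_trg = build_vocab(target)

# vocab.src.json / vocab.trg.json as listed above
with open("vocab.src.json", "w") as f:
    json.dump(vocab_src, f)
with open("vocab.trg.json", "w") as f:
    json.dump(vocab_trg, f)

print(encode("bonjour le monde", vocab_src))  # -> [6, 7, 8]
```

The train.rec and val.rec files then contain these integer sequences serialized to RecordIO-Protobuf, for example with the helper code shipped alongside the AWS seq2seq sample notebooks.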

Model artifacts and inference

Description          | Artifacts
Learning paradigm    | Supervised learning
Request format       | JSON, RecordIO-Protobuf
Result               | Same format as the request format used
Batch request format | JSON Lines
Batch result         | JSON Lines
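
As a sketch of what a JSON inference request against a deployed endpoint could look like; the endpoint name is a placeholder, and the request/response shape follows the AWS seq2seq examples:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# "seq2seq-endpoint" is a placeholder for your deployed endpoint name.
payload = {"instances": [{"data": "je t'aime"}]}

response = runtime.invoke_endpoint(
    EndpointName="seq2seq-endpoint",
    ContentType="application/json",
    Body=json.dumps(payload),
)

# The JSON result mirrors the request format, e.g.
# {"predictions": [{"target": "i love you"}]}
print(json.loads(response["Body"].read()))
```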

Processing environment

Sequence-to-Sequence can only train on a single instance; however, that single instance may contain multiple GPUs.
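
A minimal launch sketch with the SageMaker Python SDK illustrates this constraint: the instance count stays at 1 while the instance type provides several GPUs. The role ARN, bucket paths, and hyperparameter values are placeholders.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
region = session.boto_region_name

# Retrieve the built-in seq2seq container image for the current region.
container = image_uris.retrieve("seq2seq", region)

estimator = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,               # only a single instance is supported
    instance_type="ml.p3.8xlarge",  # but that instance may have multiple GPUs
    output_path="s3://my-bucket/seq2seq/output",  # placeholder bucket
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    max_seq_len_source=60,
    max_seq_len_target=60,
    batch_size=64,
)

# Channel names match the training files listed earlier.
estimator.fit({
    "train": "s3://my-bucket/seq2seq/train",            # train.rec
    "validation": "s3://my-bucket/seq2seq/validation",  # val.rec
    "vocab": "s3://my-bucket/seq2seq/vocab",            # vocab.src.json / vocab.trg.json
})
```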

Video: Amazon SageMaker’s Built-in Algorithm Webinar Series: Sequence2Sequence

This is a one-hour video from AWS.

Credits

French and English newspapers photo by Markus Spiske on Unsplash
