A photograph of a woman reading a newspaper to symbolize the SageMaker text processing Neural Topic Model (NTM) Algorithm

Neural Topic Model Algorithm

The Neural Topic Model (NTM) algorithm is used to identify topics in a corpus of documents. NTM groups words according to how they are distributed across the documents. These groups are termed latent representations because they are not observed directly; they are inferred from the word distributions in the corpus. Because the latent representations capture which words tend to occur together, they can reveal the semantics of the documents better than analysis based on the surface form of the words alone.
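NTM does not name the topics it learns. A common way to interpret a topic is to look at its highest-weighted words. The sketch below illustrates that idea in plain Python. The vocabulary, the topic-word weights and the `top_words` helper are hypothetical and made up for illustration. They are not SageMaker output or part of any SageMaker API.

```python
# Hypothetical toy data: a 5-word vocabulary and a 2-topic weight matrix.
# In a trained model these weights are learned; the values here are invented
# only to show how a topic is interpreted from its top-ranked words.
vocab = ["goal", "match", "vote", "election", "team"]
topic_word_weights = [
    [0.30, 0.25, 0.02, 0.03, 0.40],  # topic 0
    [0.05, 0.04, 0.45, 0.40, 0.06],  # topic 1
]


def top_words(weights, vocab, k=3):
    """Return the k vocabulary words with the highest weight in one topic."""
    ranked = sorted(range(len(vocab)), key=lambda i: weights[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


for topic_id, weights in enumerate(topic_word_weights):
    print(f"topic {topic_id}: {top_words(weights, vocab)}")
```

With these toy weights, topic 0 reads as sport and topic 1 as politics. This is the kind of semantic grouping that the latent representations provide.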

NTM addresses the same use case as LDA (Latent Dirichlet Allocation), but the underlying processing is very different. LDA is a statistical (probabilistic) model, whereas NTM uses a deep learning neural network. Either way, topic modeling makes it possible to visualize a large collection of documents in terms of the topics it contains. Compared to LDA, NTM may do a better job of discerning relevant topics. NTM is an unsupervised learning algorithm.

Attributes

  • Data types and format: Text
  • Learning paradigm or domain: Textual analysis, Unsupervised Learning
  • Problem type: Topic modeling
  • Use case examples: Organize a set of documents into topics (not known in advance)

Training

Training data can be supplied in recordIO-wrapped protobuf or CSV format. Optionally, a vocabulary file (vocab.txt) can also be provided so that the top words of each learned topic can be reported as words rather than as integer word IDs.
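NTM trains on documents represented as word-count (bag-of-words) vectors rather than raw text. The sketch below is a minimal preprocessing step in plain Python. It assumes a naive whitespace tokenizer and a made-up two-document corpus. It produces CSV rows with no header row (one column per vocabulary word) and a vocab.txt body with one word per line, where line i holds the word whose integer ID is i. The names `docs`, `tokenize` and `to_count_vector` are hypothetical. A real pipeline would usually also drop stop words and rare words, then write these strings to files and upload them to Amazon S3.

```python
import csv
import io
from collections import Counter

# Hypothetical mini-corpus; real input would be your own documents.
docs = [
    "the team scored a late goal to win the match",
    "voters head to the polls as the election campaign ends",
]


def tokenize(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


# Build the vocabulary: word -> integer ID, in order of first appearance.
vocab = {}
for doc in docs:
    for word in tokenize(doc):
        vocab.setdefault(word, len(vocab))


def to_count_vector(text, vocab):
    """Turn one document into a dense vector of word counts indexed by word ID."""
    counts = Counter(tokenize(text))
    vec = [0] * len(vocab)
    for word, n in counts.items():
        if word in vocab:
            vec[vocab[word]] = n
    return vec


rows = [to_count_vector(d, vocab) for d in docs]

# CSV training data: one row per document, one column per vocabulary word.
csv_buf = io.StringIO()
csv.writer(csv_buf).writerows(rows)

# vocab.txt contents: one word per line, ordered by integer word ID.
vocab_txt = "\n".join(sorted(vocab, key=vocab.get)) + "\n"
```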

Model artifacts and inference

  • Learning paradigm: Unsupervised Learning
  • Supporting artifacts: vocab.txt (optional)
  • Request format: CSV, JSON, recordIO-protobuf
  • Result format: JSON, recordIO-protobuf
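For the JSON request and result formats, the sketch below shows how a client might package word-count vectors and pick the dominant topic for each document from the returned topic weights. It assumes a dense `{"instances": [{"features": [...]}]}` request layout and a `{"predictions": [{"topic_weights": [...]}]}` response layout. Check both against the current SageMaker documentation before relying on them. The helper names are hypothetical, and the sample response is hand-written rather than real model output.

```python
import json


def build_json_request(count_vectors):
    """Serialize word-count vectors as a JSON inference request body.

    Assumes the dense {"instances": [{"features": [...]}]} request layout.
    """
    return json.dumps({"instances": [{"features": v} for v in count_vectors]})


def dominant_topics(response_body):
    """Return the index of the highest-weighted topic for each document.

    Assumes a response of the form {"predictions": [{"topic_weights": [...]}]}.
    """
    result = []
    for prediction in json.loads(response_body)["predictions"]:
        weights = prediction["topic_weights"]
        result.append(max(range(len(weights)), key=weights.__getitem__))
    return result


body = build_json_request([[2, 1, 0, 1]])
# Hand-written response for illustration only, not real model output.
sample = '{"predictions": [{"topic_weights": [0.1, 0.7, 0.2]}]}'
print(dominant_topics(sample))  # → [1]
```

In a deployed setup, the request body would be sent to the endpoint with a client such as boto3's `sagemaker-runtime` `invoke_endpoint` call, using `ContentType="application/json"`. That step is left out so the sketch runs offline.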

Processing environment

Both CPU and GPU instances can be used. Recommended configuration:

  • Training: GPU
  • Inference: CPU

AWS Partner Webinar: Neural Topic Modeling on Amazon SageMaker

This is a 44-minute video by Chris Burns from AWS.

Credits

Newspaper photo by Ludovica Dri on Unsplash
