The Neural Topic Model (NTM) algorithm identifies topics in a corpus of documents. It groups words statistically; the groups are termed latent representations because they are inferred from the word distributions in the documents rather than observed directly. Because latent representations capture the semantics of the documents, they outperform analysis based on word form alone.
NTM addresses the same use case as Latent Dirichlet Allocation (LDA), but the underlying processing is very different: LDA takes a statistical approach, whereas NTM uses a deep learning neural network. This makes it possible to visualize large collections of documents in terms of the topics they contain, and NTM may do a better job than LDA of discerning relevant topics. NTM is an Unsupervised Learning algorithm.
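Topic models like NTM do not read raw text; they consume per-document word counts over a fixed vocabulary. A minimal stdlib-only sketch of that representation, using an invented three-document corpus:

```python
from collections import Counter

# Hypothetical mini-corpus for illustration only.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks rose as markets rallied",
]

# Build a vocabulary and a bag-of-words count vector per document --
# the word-distribution view of a corpus that a topic model works from.
vocab = sorted({w for d in docs for w in d.split()})
word_index = {w: i for i, w in enumerate(vocab)}

def bag_of_words(doc):
    counts = Counter(doc.split())
    return [counts.get(w, 0) for w in vocab]

vectors = [bag_of_words(d) for d in docs]
```

Given vectors like these, the model's job is to find a small number of latent topics (weighted word groups) that explain the observed counts.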
- AWS docs: https://docs.aws.amazon.com/sagemaker/latest/dg/ntm.html
- AWS blog: https://aws.amazon.com/blogs/machine-learning/introduction-to-the-amazon-sagemaker-neural-topic-model/
|Attribute|Value|
|---|---|
|Data types and format|Text|
|Learning paradigm or domain|Textual analysis, Unsupervised Learning|
|Problem type|Topic modeling|
|Use case examples|Organize a set of documents into topics (not known in advance)|
The input data format can be RecordIO-wrapped protobuf or CSV. An optional text file (vocab.txt) containing the vocabulary can be supplied so that discovered topics are reported as words rather than integer IDs.
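A stdlib-only sketch of preparing the CSV variant of the training input, plus the optional vocab.txt. The word counts and vocabulary here are invented placeholders; the point is the shape of the files, one document per CSV row and one token per vocab.txt line, in column order:

```python
import csv
import io

# Hypothetical word-count vectors: one row per document,
# one column per vocabulary word.
vocab = ["cat", "dog", "market", "stock"]
vectors = [
    [2, 1, 0, 0],
    [0, 0, 3, 1],
]

# CSV training payload: each line is a document's dense word counts.
buf = io.StringIO()
csv.writer(buf).writerows(vectors)
csv_payload = buf.getvalue()

# Optional vocabulary file: one token per line, matching column order,
# so topic output can show words instead of integer IDs.
vocab_txt = "\n".join(vocab) + "\n"
```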
Model artifacts and inference
|Attribute|Value|
|---|---|
|Learning paradigm|Unsupervised Learning|
|Supporting artifacts|vocab.txt (optional)|
Both CPU and GPU instances can be used. Recommended configuration:
- Training: GPU
- Inference: CPU
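A sketch of launching an NTM training job with the SageMaker Python SDK, following the GPU-for-training recommendation above. The role ARN, bucket paths, and hyperparameter values are placeholders, not values from this document:

```python
# Assumes AWS credentials and the sagemaker SDK are configured.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Resolve the built-in NTM algorithm container for the current region.
image = image_uris.retrieve("ntm", session.boto_region_name)

ntm = Estimator(
    image_uri=image,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # GPU instance, per the recommendation above
    output_path="s3://my-bucket/ntm-output",  # placeholder bucket
    sagemaker_session=session,
)

# num_topics and feature_dim (vocabulary size) are required; values are examples.
ntm.set_hyperparameters(num_topics=20, feature_dim=5000)

# Training would then be started with the prepared channels, e.g.:
# ntm.fit({"train": "s3://my-bucket/ntm-train",
#          "auxiliary": "s3://my-bucket/ntm-vocab"})
```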
AWS Partner Webinar: Neural Topic Modeling on Amazon SageMaker
This is a 44-minute video by Chris Burns from AWS.