The AWS SageMaker ntm_20newsgroups_topic_model example notebook is an easy-to-follow introduction to SageMaker’s pre-packaged Natural Language Processing (NLP) tools. The notebook demonstrates how to use the Neural Topic Model (NTM) algorithm to extract a set of topics from a sample Usenet newsgroups dataset and visualize them as word clouds. It also contains code demonstrating how to set up an endpoint to interact with the NTM model.
SageMaker packages algorithms like NTM as container images hosted in the Amazon Elastic Container Registry (ECR), configured to accept hyperparameters and data through standard APIs. In this case, the Estimator API is used to create an instance of the NTM algorithm’s container. The Estimator instance is then used to set hyperparameters and run training jobs.
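As a rough illustration of that flow, here is a minimal sketch, assuming version 2.x of the SageMaker Python SDK (the notebook itself may use different calls depending on its SDK version). The helper names build_ntm_hyperparameters and build_ntm_estimator are hypothetical, and the instance type and default values are placeholders rather than recommendations.

```python
def build_ntm_hyperparameters(num_topics, feature_dim, epochs=50, mini_batch_size=128):
    """Collect NTM hyperparameters in a plain dict.

    num_topics and feature_dim are required by NTM; the epochs and
    mini_batch_size defaults here are illustrative assumptions.
    """
    return {
        "num_topics": num_topics,
        "feature_dim": feature_dim,  # vocabulary size of the bag-of-words input
        "epochs": epochs,
        "mini_batch_size": mini_batch_size,
    }


def build_ntm_estimator(role, output_path, hyperparameters, session=None):
    """Create an Estimator backed by the built-in NTM container (hypothetical helper)."""
    # Imported lazily so that defining these helpers needs no AWS setup.
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    session = session or sagemaker.Session()
    # Look up the ECR image for the built-in NTM algorithm in the session's region.
    image_uri = image_uris.retrieve("ntm", session.boto_region_name)

    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type="ml.c4.xlarge",  # placeholder; pick what suits your data
        output_path=output_path,
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(**hyperparameters)
    return estimator
```

With an IAM role and an S3 output location in hand, you would call build_ntm_estimator(role, output_path, build_ntm_hyperparameters(20, 2000)) and then start training with the estimator's fit method, passing the S3 locations of your training data.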
While working through the code with other datasets, I experimented with breaking up the notebook and re-attaching to existing training jobs, which are easy to look up in the SageMaker dashboard.
The Estimator.attach function is simple to use once you have the training job name, and it can also re-use an existing SageMaker session object. The training log is also available from the Estimator instance that Estimator.attach returns.
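Re-attaching might look something like the sketch below, again assuming version 2.x of the SageMaker Python SDK. The helper names attach_to_training_job and show_training_log are hypothetical, and the training job name is whatever you copy from the SageMaker dashboard.

```python
def attach_to_training_job(training_job_name, session=None):
    """Return an Estimator bound to an existing training job (hypothetical helper)."""
    # Imported lazily so that defining these helpers needs no AWS setup.
    import sagemaker
    from sagemaker.estimator import Estimator

    # Re-use the caller's session if one was passed, otherwise create a new one.
    session = session or sagemaker.Session()
    return Estimator.attach(training_job_name, sagemaker_session=session)


def show_training_log(estimator):
    """Print the training log of the job the estimator is attached to."""
    # Assumption: logs() is the SDK 2.x method that prints the job's training logs.
    estimator.logs()
```

In a fresh notebook you would call attach_to_training_job with the job name, then pass the returned Estimator to show_training_log, or use it for deployment just as if it had run the training job itself.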