SageMaker LDA topic model - how to access the params of the trained model? Also, is there a simple way to capture coherence?

Question

I'm new to SageMaker and am running some tests to measure the performance of NTM and LDA on AWS compared with LDA Mallet and the native Gensim LDA model.

I want to inspect the trained models on SageMaker and look at things like which words contribute most to each topic. I'd also like to get a measure of model coherence.

For NTM on SageMaker, I have been able to get the words with the highest contribution to each topic by downloading the output file, untarring it, and unzipping the result to expose 3 files: params, symbol.json and meta.json.

However, when I try to do the same process for LDA, the untarred output file cannot be unzipped.

Maybe I'm missing something, or should do something different for LDA compared with NTM, but I have not been able to find any documentation on this. Also, has anyone found a simple way to calculate model coherence?

Any help would be greatly appreciated!

Recommended answer

This SageMaker notebook, which dives into the scientific details of LDA, also demonstrates how to inspect the model artifacts, specifically how to obtain the estimates for the Dirichlet prior alpha and the topic-word distribution matrix beta. You can find the instructions in the section titled "Inspecting the Trained Model". For convenience, I will reproduce the relevant code here:

import os
import tarfile

import mxnet as mx

# extract the tarball into the directory that contains it
# (FILENAME_PREFIX is the directory where the tarball is located)
tarfile_fname = os.path.join(FILENAME_PREFIX, 'model.tar.gz')
with tarfile.open(tarfile_fname) as tar:
    tar.extractall(path=FILENAME_PREFIX)

# obtain the model file (should be the only file starting with "model_")
model_list = [
    fname
    for fname in os.listdir(FILENAME_PREFIX)
    if fname.startswith('model_')
]
model_fname = os.path.join(FILENAME_PREFIX, model_list[0])

# load the contents of the model file into MXNet arrays
alpha, beta = mx.ndarray.load(model_fname)

That should get you the model data. Note that the topics, which are stored as rows of beta, are not presented in any particular order.
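
To get from beta to the words that contribute most to each topic, one option is to sort each row and map the top column indices back to the vocabulary. The following is only a minimal sketch, not code from the notebook. It assumes beta has shape (num_topics, vocab_size) and that you have a list `vocab` whose order matches the columns of the training matrix. The names `top_words_per_topic`, `vocab`, `toy_beta` and `toy_vocab` are hypothetical and used purely for illustration. With the real model you would pass `beta.asnumpy()` instead of the toy array.

```python
import numpy as np


def top_words_per_topic(beta, vocab, n_top=10):
    """Return the n_top highest-weight words for each topic (row) of beta.

    Assumes beta has shape (num_topics, vocab_size) and vocab[j] is the
    word for column j. Both names are illustrative, not part of SageMaker.
    """
    beta = np.asarray(beta)
    if beta.shape[1] != len(vocab):
        raise ValueError("beta must have one column per vocabulary word")
    # argsort is ascending, so reverse each row and keep the first n_top indices
    top_idx = np.argsort(beta, axis=1)[:, ::-1][:, :n_top]
    return [[vocab[j] for j in row] for row in top_idx]


# Toy stand-in for beta.asnumpy(): 2 topics over a 4-word vocabulary.
toy_beta = np.array([
    [0.10, 0.60, 0.25, 0.05],
    [0.50, 0.05, 0.15, 0.30],
])
toy_vocab = ["apple", "bank", "loan", "fruit"]

for topic_id, words in enumerate(top_words_per_topic(toy_beta, toy_vocab, n_top=2)):
    print(topic_id, words)
```

Because the topic rows come in no particular order, the topic indices printed here are only labels and need not match the numbering from another run or another library.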
