How to get vocabulary word count from gensim word2vec?

Question

I am using the gensim word2vec package in Python. I know how to get the vocabulary from a trained model, but how do I get the count for each word in the vocabulary?

Recommended answer

Each word in the vocabulary has an associated vocabulary object, which contains an index and a count.

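# w2v is a loaded KeyedVectors object (for a full Word2Vec model, use model.wv).
# The .vocab dict exists in gensim < 4.0; gensim 4.x uses w2v.get_vecattr("word", "count").
vocab_obj = w2v.vocab["word"]
vocab_obj.count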

Output for the Google News w2v model: 2998437

So to get the count for each word, iterate over all words and their vocab objects in the vocabulary.

for word, vocab_obj in w2v.vocab.items():
    # Do something with vocab_obj.count, for example print it
    print(word, vocab_obj.count)
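
The loop above relies on the `.vocab` dictionary, which newer gensim releases (4.x) replaced with `key_to_index` and `get_vecattr`. A minimal sketch of a helper that works with either layout might look like the following. The name `vocab_counts` and the stand-in classes `FakeVocab` and `OldStyleVectors` are assumptions made for illustration, not part of gensim; with a real model you would pass `model.wv` or a loaded KeyedVectors object.

```python
from collections import namedtuple


def vocab_counts(kv):
    """Return {word: count} for a gensim KeyedVectors-like object.

    Hypothetical helper: `kv` is assumed to be `model.wv` or a loaded
    KeyedVectors instance.
    """
    if hasattr(kv, "key_to_index"):
        # gensim 4.x: per-word attributes are read with get_vecattr
        return {word: kv.get_vecattr(word, "count") for word in kv.key_to_index}
    # gensim < 4.0: each word maps to a vocab object with .index and .count
    return {word: vocab_obj.count for word, vocab_obj in kv.vocab.items()}


# Stand-ins so the sketch runs without gensim installed (not gensim classes).
FakeVocab = namedtuple("FakeVocab", ["index", "count"])


class OldStyleVectors:
    vocab = {"the": FakeVocab(0, 120), "word": FakeVocab(1, 7)}


counts = vocab_counts(OldStyleVectors())

# List words from most to least frequent
for word, count in sorted(counts.items(), key=lambda item: -item[1]):
    print(word, count)
```

One caveat worth checking against your gensim version: when vectors are loaded from a word2vec-format file without a separate vocabulary file, the stored counts are, as far as I know, rank-based placeholders and not true corpus frequencies, which would be consistent with a value like 2998437 in a model of about 3,000,000 words.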
