How can I count word frequencies in Word2Vec's training model?

Views: 405 Posted: 2020/5/18 1:07:47 Tags: python word2vec word-embedding word-frequency natural-language-processing

Question

I need to count the frequency of each word in word2vec's training model. I want to have output that looks like this:

term    count
apple   123004
country 4432180
runs    620102
...

Is it possible to do that? How would I get that data out of word2vec?

Recommended answer

Which word2vec implementation are you using?

In the popular gensim library, once a Word2Vec model has its vocabulary established (either by completing its full training, or after build_vocab() has been called), the model's wv property holds a KeyedVectors-type object. That object has a vocab property, which is a dict of Vocab-type objects, and each of those has a count property holding the word's frequency in the scanned corpus.

So you could get roughly what you seek with something like:

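from gensim.models import Word2Vec

w2v_model = Word2Vec(your_corpus, ...)
# gensim < 4.0: wv.vocab maps each word to a Vocab object that carries its corpus count
for word, vocab_entry in w2v_model.wv.vocab.items():
    print(word, vocab_entry.count)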
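The wv.vocab dict described above was removed in gensim 4.0, where per-word attributes are read with get_vecattr() and the vocabulary is exposed as key_to_index. The following is a minimal sketch, not part of the original answer, that reads counts under either API and renders the term/count table the question asks for. The helper names word_counts and format_count_table are hypothetical, chosen only for illustration.

```python
def word_counts(wv):
    """Return {word: corpus_count} for a gensim KeyedVectors-style object."""
    if hasattr(wv, "get_vecattr"):
        # gensim >= 4.0: per-word attributes are read with get_vecattr()
        return {word: wv.get_vecattr(word, "count") for word in wv.key_to_index}
    # gensim < 4.0: wv.vocab maps word -> Vocab object with a .count attribute
    return {word: entry.count for word, entry in wv.vocab.items()}


def format_count_table(counts):
    """Render counts as tab-separated 'term<TAB>count' lines, most frequent first."""
    # Sort by descending count, breaking ties alphabetically
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    lines = ["term\tcount"]
    lines.extend(f"{word}\t{count}" for word, count in rows)
    return "\n".join(lines)


# Usage with a trained model (w2v_model as in the answer's snippet):
# print(format_count_table(word_counts(w2v_model.wv)))
```

Checking for get_vecattr rather than for vocab is deliberate: the sketch assumes that only gensim 4.x objects provide get_vecattr, so older objects fall through to the vocab branch.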

Plain sets of word-vectors (such as those loaded via gensim's load_word2vec_format() method) won't have accurate counts, but are by convention usually internally ordered from most-frequent to least-frequent.
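When only a plain vector file is available, the stored order can serve as a rough frequency rank in place of real counts. The sketch below assumes the file follows the most-frequent-first convention, which is not guaranteed. The helper name frequency_ranks and the file path in the usage comment are hypothetical.

```python
def frequency_ranks(kv):
    """Map each word to its position in the stored order (0 = first stored)."""
    words = getattr(kv, "index_to_key", None)  # gensim >= 4.0
    if words is None:
        words = kv.index2word  # gensim < 4.0
    return {word: rank for rank, word in enumerate(words)}


# Usage (hypothetical path; ranks are meaningful only if the file is frequency-ordered):
# from gensim.models import KeyedVectors
# kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
# ranks = frequency_ranks(kv)
```

A rank only says that one word was probably seen more often than another; it does not recover the original counts.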
