How can I count word frequencies in Word2Vec's training model?

Views: 54 Published: 2022/1/2 17:38:07 Tags: python word2vec word-embedding word-frequency natural-language-processing


Question

I need to count the frequency of each word in word2vec's training model. I want output that looks like this:

term    count
apple   123004
country 4432180
runs    620102
...

Is it possible to do that? How would I get that data out of word2vec?

Recommended answer

Which word2vec implementation are you using?

In the popular gensim library, once a Word2Vec model has established its vocabulary (either by completing its full training, or after build_vocab() has been called), the model's wv property holds a KeyedVectors-type object. That object has a vocab property, which is a dict of Vocab-type objects, and each Vocab object has a count property holding the word's frequency in the scanned corpus.

So you could get roughly what you seek with something like:

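from gensim.models import Word2Vec
w2v_model = Word2Vec(your_corpus, ...)  # "..." stands for your other training parameters
# wv.vocab maps each word to a Vocab object (gensim 3.x API; removed in gensim 4.0)
for word in w2v_model.wv.vocab:
    print((word, w2v_model.wv.vocab[word].count))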
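In gensim 4.0 and later, the wv.vocab dict is no longer available; the words are listed in wv.index_to_key and per-word attributes are read with wv.get_vecattr(). The following is a minimal sketch under that assumption, not part of the original answer. The helper names word_counts and format_counts are hypothetical, chosen here only for illustration.

```python
def word_counts(wv):
    """Return (word, count) pairs from a gensim 4.x KeyedVectors object.

    Assumes gensim >= 4.0, where `wv.index_to_key` lists the vocabulary
    and `wv.get_vecattr(word, "count")` returns the stored frequency.
    """
    return [(word, wv.get_vecattr(word, "count")) for word in wv.index_to_key]


def format_counts(pairs):
    """Render pairs as a tab-separated term/count table like the one in the question."""
    lines = ["term\tcount"]
    lines.extend(f"{word}\t{count}" for word, count in pairs)
    return "\n".join(lines)


# Typical use with a trained model (requires gensim to be installed):
#   from gensim.models import Word2Vec
#   model = Word2Vec(sentences=your_corpus, min_count=1)
#   print(format_counts(word_counts(model.wv)))
```

Passing model.wv rather than the model itself keeps the helper usable with any KeyedVectors object, whether or not it came from a full Word2Vec model.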

Plain sets of word-vectors (such as those loaded via gensim's load_word2vec_format() method) won't have accurate counts, but by convention they are usually ordered internally from most-frequent to least-frequent.
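When only plain vectors are available but the original training text can still be read, one alternative (not something the answer above proposes) is to count tokens directly in the corpus. This is a rough sketch that assumes the corpus is an iterable of already-tokenized sentences; count_corpus_tokens and toy_corpus are hypothetical names used for illustration.

```python
from collections import Counter


def count_corpus_tokens(sentences):
    """Count how often each token appears across an iterable of token lists."""
    counts = Counter()
    for tokens in sentences:
        counts.update(tokens)
    return counts


# Toy corpus standing in for the real training data.
toy_corpus = [["apple", "runs"], ["country", "apple"]]

# Print a tab-separated term/count table, most frequent first.
print("term\tcount")
for term, count in count_corpus_tokens(toy_corpus).most_common():
    print(f"{term}\t{count}")
```

Raw corpus counts like these can differ from a trained model's vocabulary, since Word2Vec drops words that occur fewer than min_count times.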
