Load pre-trained word embeddings

Views: 68 Published: 2021/5/4 19:21:41 Tags: python encoding word2vec gensim


Question

I want to load the pre-trained word embeddings from Google News:

import gensim

model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
print(model.wv.vocab)

But I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2022' in position 62425: ordinal not in range(128)

How do I fix this? I want to list all the words in the word embeddings and then average their vectors to get a sentence embedding.

Recommended answer

I load them the same way and don't have that problem, so I suspect the print statement rather than the loading step. Your stdout is probably set up for ASCII only, whether you are in Jupyter or in a terminal. To avoid the problem, I'd suggest writing the vocabulary to a file opened with an explicit encoding, like this:

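# Write one word per line; an explicit UTF-8 encoding sidesteps the ASCII-only stdout.
# Note: gensim 4.0 and later replaced `vocab` with `key_to_index`.
with open("vocab.txt", "w", encoding="utf8") as vocab_out:
    for word in model.wv.vocab:
        vocab_out.write(word + "\n")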
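
To check whether stdout really is the culprit, something like the following sketch may help. It reports the encoding Python uses for stdout and shows one way to print text without raising UnicodeEncodeError. The helper name `safe_print` and the simulated ASCII-only stream are illustrative assumptions, not part of gensim or of the original answer.

```python
import io
import sys


def safe_print(text, stream=None):
    """Print text, escaping characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Characters outside the encoding become escapes such as \u2022
    # instead of raising UnicodeEncodeError.
    escaped = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(escaped, file=stream)


# An ASCII value here would explain the error in the question.
print("stdout encoding:", sys.stdout.encoding)

# Simulate an ASCII-only stream to show the helper in action.
ascii_buffer = io.BytesIO()
ascii_stream = io.TextIOWrapper(ascii_buffer, encoding="ascii", newline="\n")
safe_print("bullet: \u2022", ascii_stream)
ascii_stream.flush()
print(ascii_buffer.getvalue())
```

If the reported encoding is ASCII, other options are to set the `PYTHONIOENCODING=utf-8` environment variable before starting Python, or to call `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7 or later, assuming the terminal itself can display UTF-8.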

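For the averaging step mentioned in the question, a minimal sketch could look like the one below. It assumes only that the vector store supports `word in vectors` and `vectors[word]`, which gensim's KeyedVectors does; the function name `average_vector` and the tiny `toy_vectors` dictionary are hypothetical stand-ins so the example runs without the 300-dimensional Google News file.

```python
def average_vector(tokens, vectors):
    """Return the element-wise mean of the vectors of known tokens, or None."""
    # Skip out-of-vocabulary tokens instead of raising KeyError.
    found = [vectors[token] for token in tokens if token in vectors]
    if not found:
        return None
    # zip(*found) groups the i-th component of every vector together.
    return [sum(column) / len(found) for column in zip(*found)]


# Hypothetical 2-dimensional vectors standing in for the loaded model.
toy_vectors = {"load": [1.0, 2.0], "words": [3.0, 4.0]}
sentence = "load unknown words".split()
print(average_vector(sentence, toy_vectors))  # [2.0, 3.0]
```

With real embeddings the vectors are NumPy arrays, so `numpy.mean(found, axis=0)` would be the more usual way to compute the same average.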