How to turn embeddings loaded in a Pandas DataFrame into a Gensim model?


Question

I have a DataFrame whose index holds words and whose 100 float columns hold each word's embedding as a 100-d vector. I would like to convert the DataFrame into a gensim model object so that I can use its methods, especially gensim.models.keyedvectors.KeyedVectors.most_similar(), to search for similar words within my subset.

Which approach is preferable?

Thanks.

Recommended answer

Not sure what the "preferred" way of doing this is, but the text format gensim expects is pretty easy to replicate:

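import numpy as np
import pandas as pd

# Toy embeddings: the index holds the words, the columns hold the vector components.
data = pd.DataFrame([[0.15941701, 0.84058299],
                     [0.12190033, 0.87809967],
                     [0.06293788, 0.93706212]],
                    index=["these", "be", "words"])

# Write a "<vocabulary size> <vector length>" header,
# then one "<word> <v1> <v2> ..." row per word.
np.savetxt('test.txt', data.reset_index().values,
           delimiter=" ",
           header="{} {}".format(len(data), len(data.columns)),
           comments="",  # stop savetxt from prefixing the header with "# "
           fmt=["%s"] + ["%.18e"]*len(data.columns))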

The header is two space-separated integers: the number of words in the vocabulary and the length of each word vector. The first column of each row is the word itself, and the remaining columns are the elements of its vector. The odd-looking fmt argument formats the first column as a string and every other column as a float.
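To make the layout concrete, here is a minimal standard-library sketch that produces the same kind of text from a plain dict. The helper name `write_word2vec_text` and the in-memory `buffer` are made up for this illustration and are not part of gensim, NumPy, or pandas.

```python
import io


def write_word2vec_text(vectors, stream):
    """Write a {word: [floats]} mapping in the word2vec text layout."""
    dims = {len(vec) for vec in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must have the same length")
    (dim,) = dims
    # Header line: vocabulary size, then vector length.
    stream.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        # One row per word: the word, then its components, space separated.
        stream.write(word + " " + " ".join(f"{x:.18e}" for x in vec) + "\n")


# Same toy vectors as in the DataFrame above.
sample = {
    "these": [0.15941701, 0.84058299],
    "be": [0.12190033, 0.87809967],
    "words": [0.06293788, 0.93706212],
}

buffer = io.StringIO()
write_word2vec_text(sample, buffer)
text = buffer.getvalue()
print(text)
```

The first printed line is the header and each following line starts with a word. Words that contain spaces would break this layout, so the sketch assumes single-token words.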

You can then load the file in gensim and use the resulting object however you like:

from gensim.models.keyedvectors import KeyedVectors

# Load the text-format vectors written above.
word_vectors = KeyedVectors.load_word2vec_format('test.txt', binary=False)

# Cosine similarity between two words in the vocabulary.
word_vectors.similarity('these', 'words')
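For reference, the similarity reported for a pair of words is the cosine similarity of their two vectors. The sketch below recomputes that quantity by hand for the toy vectors, using only the standard library; `cosine_similarity` is a helper written for this illustration, not a gensim function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors for 'these' and 'words' from the toy DataFrame.
these = [0.15941701, 0.84058299]
words = [0.06293788, 0.93706212]

score = cosine_similarity(these, words)
print(score)
```

Because both toy vectors point in nearly the same direction, the score comes out close to 1. Expect small floating-point differences from gensim, which stores vectors as 32-bit floats by default.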
