gensim most_similar with positive and negative, how does it work?

Question

I was reading this answer, which says the following about Gensim most_similar:

it performs vector arithmetic: adding the positive vectors, subtracting the negative, then from that resulting position, listing the known-vectors closest to that angle.

But when I tested it, that was not the case. I trained a Word2Vec model on the Gensim "text8" dataset and tested these two calls:

model.most_similar(positive=['woman', 'king'], negative=['man'])

>>> [('queen', 0.7131118178367615), ('prince', 0.6359186768531799),...]


model.wv.most_similar([model["king"] + model["woman"] - model["man"]])

>>> [('king', 0.84305739402771), ('queen', 0.7326322793960571),...]

They are clearly not the same. Even the score for queen differs: 0.713 in the first result and 0.732 in the second.

So I ask the question again: how does Gensim most_similar work? Why are the results of the two calls above different?

Recommended answer

The adding and subtracting isn't all that it does; for an exact description, you should look at the source code:

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py#LC690:~:text=def%20most_similar,self%2C

You'll see there that the addition and subtraction are performed on the unit-normed version of each vector, via the get_vector(key, use_norm=True) accessor.

If you change your use of model[key] to model.get_vector(key, use_norm=True), your outside-the-method calculation of the target vector should give the same results as letting the method combine the positive and negative vectors.
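To make the difference concrete, here is a minimal NumPy-only sketch using made-up 2-D vectors rather than a trained model. The helpers unit, cosine and rank, and the exclude parameter, are hypothetical names for illustration and not part of the gensim API. The sketch only shows that combining raw vectors and combining unit-normed vectors can point in different directions, which is enough to change both the scores and the ranking. The exclude argument mimics the fact that most_similar, as far as I know, leaves the input keys out of its results, which would also explain why 'king' appears only in the second call.

```python
import numpy as np

# Toy 2-D "embeddings" with deliberately different lengths.
# These values are invented for illustration, not taken from a trained model.
vectors = {
    "king": np.array([4.0, 0.0]),
    "man": np.array([3.0, 4.0]),
    "woman": np.array([0.0, 0.5]),
    "queen": np.array([1.0, -0.1]),
    "prince": np.array([2.0, 0.5]),
}


def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(unit(a), unit(b)))


def rank(target, exclude=()):
    """Rank all known keys by cosine similarity to target, skipping `exclude`."""
    scores = {k: cosine(target, v) for k, v in vectors.items() if k not in exclude}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# What the question computed by hand: arithmetic on the raw vectors.
raw_target = vectors["king"] + vectors["woman"] - vectors["man"]

# What the answer describes: arithmetic on the unit-normed vectors.
normed_target = unit(vectors["king"]) + unit(vectors["woman"]) - unit(vectors["man"])

inputs = ("king", "woman", "man")
print("raw, nothing excluded:   ", rank(raw_target))
print("normed, inputs excluded: ", rank(normed_target, exclude=inputs))
```

Because long vectors dominate a raw sum while every vector contributes equally after unit-norming, the two targets generally differ in direction, so their cosine similarities to 'queen' differ as well.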
