Create random vector given cosine similarity

Question

Given some vector v, I want to generate another random vector w such that the cosine similarity between v and w equals a specified value. Is there a way to do this in Python?

Example: for simplicity, take the 2D vector v = [3, -4]. I want a random vector w whose cosine similarity with v is 60%, that is, +0.6. This could produce, for example, w = [-0.875, -3] or any other vector with the same cosine similarity. (Note that [0.875, 3] has cosine similarity -0.6 with v, not +0.6.)

Recommended answer

Given the vector v and cosine similarity costheta (a scalar between -1 and 1), compute w as in the function rand_cos_sim(v, costheta):

import numpy as np


def rand_cos_sim(v, costheta):
    # Form the unit vector parallel to v:
    u = v / np.linalg.norm(v)

    # Pick a random vector:
    r = np.random.multivariate_normal(np.zeros_like(v), np.eye(len(v)))

    # Form a vector perpendicular to v by removing r's component along u:
    uperp = r - r.dot(u)*u

    # Make it a unit vector:
    uperp = uperp / np.linalg.norm(uperp)

    # w is the linear combination of u and uperp with coefficients costheta
    # and sin(theta) = sqrt(1 - costheta**2), respectively:
    w = costheta*u + np.sqrt(1 - costheta**2)*uperp

    return w
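
The reason this construction works can be spelled out in a few lines. The derivation below is a sketch that uses theta for the angle with cos(theta) = costheta; the symbols u and u_perp correspond to u and uperp in the function, and r is the random vector.

```latex
\begin{aligned}
u_\perp \cdot u &\propto \bigl(r - (r \cdot u)\,u\bigr) \cdot u = r \cdot u - (r \cdot u)(u \cdot u) = 0,
  \qquad \|u\| = \|u_\perp\| = 1, \\
w &= \cos\theta \, u + \sin\theta \, u_\perp, \\
\|w\|^2 &= \cos^2\theta + \sin^2\theta = 1, \\
\frac{v \cdot w}{\|v\|\,\|w\|} &= \frac{\|v\|\,(u \cdot w)}{\|v\|} = u \cdot w
  = \cos\theta \,(u \cdot u) + \sin\theta \,(u \cdot u_\perp) = \cos\theta .
\end{aligned}
```

Because sin(theta) is taken as the non-negative square root, the randomness in w comes entirely from the direction of u_perp.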

For example:

In [17]: v = np.array([3, -4])

In [18]: w = rand_cos_sim(v, 0.6)

In [19]: w
Out[19]: array([-0.28, -0.96])

Verify the cosine similarity:

In [20]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
Out[20]: 0.6000000000000015

In [21]: w = rand_cos_sim(v, 0.6)

In [22]: w
Out[22]: array([1., 0.])

In [23]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
Out[23]: 0.6

The return value always has magnitude 1, so in the above example, there are only two possible random vectors, [1, 0] and [-0.28, -0.96].
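
The two-solution claim for the 2D case can be checked directly. The sketch below is not part of the original answer; the helper name two_d_solutions is hypothetical. It assumes that in 2D the only unit vectors perpendicular to u are u rotated by +90 and -90 degrees, so both candidates can be written down without any randomness.

```python
import numpy as np


def two_d_solutions(v, costheta):
    """Return both unit vectors in 2D whose cosine similarity with v is costheta.

    Hypothetical helper for illustration; it only handles 2D input.
    """
    v = np.asarray(v, dtype=float)
    u = v / np.linalg.norm(v)
    # Rotating u by 90 degrees counterclockwise gives a perpendicular unit vector.
    uperp = np.array([-u[1], u[0]])
    sintheta = np.sqrt(1.0 - costheta**2)
    # The other perpendicular direction is -uperp, which flips the sign of the second term.
    return costheta * u + sintheta * uperp, costheta * u - sintheta * uperp


w_plus, w_minus = two_d_solutions([3, -4], 0.6)
print(w_plus, w_minus)
```

Up to floating-point rounding, the two results are the vectors [1, 0] and [-0.28, -0.96] shown in the session above.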

Another example, this one in 3-d:

In [24]: v = np.array([3, -4, 6])

In [25]: w = rand_cos_sim(v, -0.75)

In [26]: w
Out[26]: array([ 0.3194265 ,  0.46814873, -0.82389531])

In [27]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
Out[27]: -0.75

In [28]: w = rand_cos_sim(v, -0.75)

In [29]: w
Out[29]: array([-0.48830063,  0.85783797, -0.16023891])

In [30]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
Out[30]: -0.75
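
Since cosine similarity does not depend on the length of w, the unit-length result can be rescaled freely. The variant below is a sketch rather than the answer's own code: the name rand_cos_sim_scaled and the norm and rng parameters are assumptions added for illustration. It also validates the inputs and uses NumPy's Generator API so that results can be made reproducible.

```python
import numpy as np


def rand_cos_sim_scaled(v, costheta, norm=1.0, rng=None):
    """Random vector of length `norm` whose cosine similarity with v is costheta.

    Hypothetical variant of rand_cos_sim; `norm` and `rng` are illustrative additions.
    """
    v = np.asarray(v, dtype=float)
    if v.ndim != 1 or v.size < 2:
        raise ValueError("v must be a 1-D vector with at least 2 components")
    if not -1.0 <= costheta <= 1.0:
        raise ValueError("costheta must lie in [-1, 1]")
    vnorm = np.linalg.norm(v)
    if vnorm == 0.0:
        raise ValueError("v must be nonzero")

    rng = np.random.default_rng() if rng is None else rng
    u = v / vnorm
    # Random direction, then remove its component along u and normalize.
    r = rng.standard_normal(v.size)
    uperp = r - r.dot(u) * u
    uperp /= np.linalg.norm(uperp)

    w = costheta * u + np.sqrt(1.0 - costheta**2) * uperp
    # Scaling by a positive factor changes the length but not the cosine similarity.
    return norm * w


v = np.array([3, -4, 6])
w = rand_cos_sim_scaled(v, -0.75, norm=2.0, rng=np.random.default_rng(0))
cos_sim = v.dot(w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(cos_sim, np.linalg.norm(w))
```

Passing a seeded generator makes the output repeatable; omitting rng draws a fresh random vector on each call. The norm argument is assumed to be positive, since a negative value would flip the sign of the cosine similarity.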
