Using numba for cosine similarity between a vector and rows in a matrix


Question

I found this gist, which uses numba for fast computation of cosine similarity.

import numba
import numpy as np

@numba.jit(nopython=True)
def fast_cosine(u, v):
    m = u.shape[0]
    udotv = 0
    u_norm = 0
    v_norm = 0
    for i in range(m):
        if (np.isnan(u[i])) or (np.isnan(v[i])):
            continue

        udotv += u[i] * v[i]
        u_norm += u[i] * u[i]
        v_norm += v[i] * v[i]

    u_norm = np.sqrt(u_norm)
    v_norm = np.sqrt(v_norm)

    if (u_norm == 0) or (v_norm == 0):
        ratio = 1.0
    else:
        ratio = udotv / (u_norm * v_norm)
    return ratio

Results look promising: on my machine a call takes about 500 ns with the jit decorator, versus about 200 µs without it.

I would like to use numba to parallelize this computation between a vector u and a candidate matrix M, that is, to compute the cosine similarity between u and each row of M.

Example:

def fast_cosine_matrix(u, M):
    """
    Return array of cosine similarity between u and rows in M
    >>> import numpy as np
    >>> u = np.random.rand(100)
    >>> M = np.random.rand(10, 100)
    >>> fast_cosine_matrix(u, M)
    """

One way is to rewrite the function so that the second input is a matrix. However, I get a NotImplementedError if I try to iterate over the rows of a matrix. I am going to try using slices instead.
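As an illustration of the "second input is a matrix" idea, here is a minimal sketch that loops over row indices with numba's `prange` and reads `M[i, j]` directly, rather than iterating over the rows themselves. It is not the gist author's code: the loop structure and the `try/except` fallback (which lets the sketch run as plain Python when numba is not installed) are assumptions added for illustration. The NaN handling and the 1.0 returned for zero norms mirror `fast_cosine` above.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for illustration: without numba, run the same code as plain Python.
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def fast_cosine_matrix(u, M):
    """Cosine similarity between u and each row of M, skipping NaN positions."""
    n_rows, n_cols = M.shape
    out = np.empty(n_rows, dtype=np.float64)
    for i in prange(n_rows):  # rows are independent, so they can run in parallel
        udotv = 0.0
        u_norm = 0.0
        v_norm = 0.0
        for j in range(n_cols):
            if np.isnan(u[j]) or np.isnan(M[i, j]):
                continue
            udotv += u[j] * M[i, j]
            u_norm += u[j] * u[j]
            v_norm += M[i, j] * M[i, j]
        denom = np.sqrt(u_norm) * np.sqrt(v_norm)
        if denom == 0.0:
            out[i] = 1.0  # same convention as fast_cosine for zero-norm inputs
        else:
            out[i] = udotv / denom
    return out


u = np.random.rand(100)
M = np.random.rand(10, 100)
similarities = fast_cosine_matrix(u, M)  # one value per row of M
```

Whether the parallel loop pays off likely depends on the number of rows; for a handful of rows the thread start-up cost may outweigh the gain.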

I also thought about using vectorize, but I could not get it to work.

Recommended answer

Alternative: make a generalized ufunc (gufunc) with numba

@numba.guvectorize(["void(float64[:], float64[:], float64[:])"], "(n),(n)->()", target='parallel')
def fast_cosine_gufunc(u, v, result):
    m = u.shape[0]
    udotv = 0
    u_norm = 0
    v_norm = 0
    for i in range(m):
        if (np.isnan(u[i])) or (np.isnan(v[i])):
            continue

        udotv += u[i] * v[i]
        u_norm += u[i] * u[i]
        v_norm += v[i] * v[i]

    u_norm = np.sqrt(u_norm)
    v_norm = np.sqrt(v_norm)

    if (u_norm == 0) or (v_norm == 0):
        ratio = 1.0
    else:
        ratio = udotv / (u_norm * v_norm)
    result[:] = ratio


u = np.random.rand(100)
M = np.random.rand(100000, 100)

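fast_cosine_gufunc(u, M[0,:])  # single row: returns one similarity value
fast_cosine_gufunc(u, M)       # u is broadcast against every row: returns an array of shape (100000,)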
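To sanity-check the gufunc output, it may help to compare it against a NumPy-only reference. The sketch below is an assumption added for illustration, not part of the original answer, and `cosine_rows_reference` is a hypothetical name. It follows the same conventions as the code above: positions where either input is NaN are skipped, and 1.0 is returned when either norm is zero.

```python
import numpy as np


def cosine_rows_reference(u, M):
    """NumPy reference: cosine similarity between u and each row of M."""
    u = np.asarray(u, dtype=np.float64)
    M = np.asarray(M, dtype=np.float64)
    # A position counts only if neither u nor the row of M is NaN there.
    valid = ~(np.isnan(u)[None, :] | np.isnan(M))
    u_masked = np.where(valid, u[None, :], 0.0)
    M_masked = np.where(valid, M, 0.0)
    udotv = (u_masked * M_masked).sum(axis=1)
    u_norm = np.sqrt((u_masked * u_masked).sum(axis=1))
    v_norm = np.sqrt((M_masked * M_masked).sum(axis=1))
    denom = u_norm * v_norm
    out = np.ones(M.shape[0], dtype=np.float64)  # 1.0 where a norm is zero
    np.divide(udotv, denom, out=out, where=denom > 0)
    return out


rng = np.random.default_rng(0)
u_check = rng.random(100)
M_check = rng.random((10, 100))
expected = cosine_rows_reference(u_check, M_check)
# With the gufunc defined, this comparison should hold:
# np.allclose(fast_cosine_gufunc(u_check, M_check), expected)
```

The reference builds several temporary arrays the size of M, so it is meant for checking correctness on small inputs rather than for speed.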
