用于计算平方欧几里德距离可能的优化 [英] Possible optimizations for calculating squared euclidean distance

查看:669
本文介绍了用于计算平方欧几里德距离可能的优化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要每天做几百万元欧氏距离计算的Python项目。

I need to do a few hundred million euclidean distance calculations every day in a Python project.

下面就是我开始了:

def euclidean_dist_square(x, y):
    diff = np.array(x) - np.array(y)
    return np.dot(diff, diff)

这是相当快,我已经放弃了开方计算,因为我需要为项目排序只(最近邻搜索)。它仍然是脚本的瓶颈虽然。因此,我写了一个C扩展,从而计算出距离。计算总是与128维向量进行。

This is quite fast and I already dropped the sqrt calculation since I need to rank items only (nearest-neighbor search). It is still the bottleneck of the script though. Therefore I have written a C extension, which calculates the distance. The calculation is always done with 128-dimensional vectors.

#include "euclidean.h"
#include <math.h>

double euclidean(double x[128], double y[128])
{
    double Sum;
    for(int i=0;i<128;i++)
    {
        Sum = Sum + pow((x[i]-y[i]),2.0);
    }
    return Sum;
}

完成code为扩展名是在这里: https://gist.github.com/herrbuerger / bd63b73f3c5cf1cd51de

现在这给了一个很好的增速相比于numpy的版本。

Now this gives a nice speedup in comparison to the numpy version.

但有什么办法,以进一步加快这(这是我的第一个C扩展有史以来所以我假设有)?随着次数此功能用于每一天,每一微秒实际上提供的好处。

But is there any way to speed this up further (this is my first C extension ever so I assume there is)? With the number of times this function is used every day, every microsecond would actually provide a benefit.

你们当中有些人可能会建议用Python完全移植这为另一种语言,可惜这是一个较大的项目,而不是一个选项:(

Some of you might suggest porting this completely from Python to another language, unfortunately this is a larger project and not an option :(

感谢。

修改

我已经张贴在codeReview这个问题:<一href=\"http://$c$creview.stackexchange.com/questions/52218/possible-optimizations-for-calculating-squared-euclidean-distance\">http://$c$creview.stackexchange.com/questions/52218/possible-optimizations-for-calculating-squared-euclidean-distance

I have posted this question on CodeReview: http://codereview.stackexchange.com/questions/52218/possible-optimizations-for-calculating-squared-euclidean-distance

我将在万一有人已经开始写一个答案一小时删除这个问题。

I will delete this question in an hour in case someone has started to write an answer.

推荐答案

来计算numpy的欧氏距离,我知道是<一个最快的方法href=\"http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html\"相对=nofollow>一个在scikit学习,可作为

The fastest way to compute Euclidean distances in NumPy that I know is the one in scikit-learn, which can be summed up as

def squared_distances(X, Y):
    """Return a distance matrix for each pair of rows i, j in X, Y."""
    # http://stackoverflow.com/a/19094808/166749
    X_row_norms = np.einsum('ij,ij->i', X, X)
    Y_row_norms = np.einsum('ij,ij->i', Y, Y)
    distances = np.dot(X, Y)
    distances *= -2
    distances += X_row_norms
    distances += Y_row_norms

    np.maximum(distances, 0, distances)  # get rid of negatives; optional
    return distances

在这片code的瓶颈是矩阵乘法( np.dot ),所以请确保您numpy的是对一个良好的BLAS实现链接;与多核机器和足够大的输入矩阵上的多线程BLAS,应该比任何你可以更快的在C.注意掀起它依赖于二项式公式

The bottleneck in this piece of code is matrix multiplication (np.dot), so make sure your NumPy is linked against a good BLAS implementation; with a multithreaded BLAS on a multicore machine and large enough input matrices, it should be faster than anything you can whip up in C. Note that it relies on the binomial formula

||x - y||² = ||x||² + ||y||² - 2 x⋅y    

和,要么 X_row_norms Y_row_norms 可以在调用缓存为K-NN的用例。

and that either X_row_norms or Y_row_norms can be cached across invocations for the k-NN use case.

(我是这个code的合着者,我花了相当长的一段时间来优化它和SciPy的实施; scikit学习是在一定的准确性牺牲速度快,但对于k-NN说不该 ŧ事情太多了。在SciPy的实施, scipy.spatial.distance 可用,实际上是code你只是写和更准确的优化版。)

(I'm a coauthor of this code, and I spent quite some time optimizing both it and the SciPy implementation; scikit-learn is faster at the expense of some accuracy, but for k-NN that shouldn't matter too much. The SciPy implementation, available in scipy.spatial.distance, is actually an optimized version of the code you just wrote and is more accurate.)

这篇关于用于计算平方欧几里德距离可能的优化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆