Suggestions on how to speed up a distance calculation

Views: 77 Published: 2020/7/4 22:07:50 Tags: python performance python-c-api


Question

Please consider the following class:

class SquareErrorDistance(object):
    def __init__(self, dataSample):
        variance = var(list(dataSample))
        if variance == 0:
            self._norm = 1.0
        else:
            self._norm = 1.0 / (2 * variance)

    def __call__(self, u, v): # u and v are floats
        return (u - v) ** 2 * self._norm

I use it to calculate the distance between two elements of a vector. I basically create one instance of that class for every dimension of the vector that uses this distance measure (there are dimensions that use other distance measures). Profiling reveals that the __call__ function of this class accounts for 90% of the running time of my knn implementation (who would have thought). I do not think there is any pure-Python way to speed this up, but maybe if I implement it in C?
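The per-dimension setup described above could look roughly like the following sketch. Several things here are assumptions rather than part of the question: `statistics.pvariance` stands in for the unspecified `var`, the helper `vector_distance` is hypothetical, and summing the per-dimension results is only one plausible way to combine them.

```python
from statistics import pvariance  # assumption: stands in for the question's unspecified var()


class SquareErrorDistance(object):
    def __init__(self, dataSample):
        variance = pvariance(list(dataSample))
        if variance == 0:
            self._norm = 1.0
        else:
            self._norm = 1.0 / (2 * variance)

    def __call__(self, u, v):  # u and v are floats
        return (u - v) ** 2 * self._norm


def vector_distance(metrics, a, b):
    # Hypothetical helper: apply each dimension's metric and sum the results.
    return sum(metric(u, v) for metric, u, v in zip(metrics, a, b))


# Three sample vectors with two dimensions each (made-up data).
samples = [(1.0, 10.0), (3.0, 10.0), (5.0, 10.0)]

# One metric instance per dimension, built from that dimension's values.
metrics = [SquareErrorDistance(column) for column in zip(*samples)]

print(vector_distance(metrics, samples[0], samples[2]))
```

Every comparison of two vectors makes one Python-level call per dimension, which is consistent with __call__ dominating the profile.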

If I run a simple C program that just calculates distances for random values using the formula above, it is orders of magnitude faster than Python. So I tried using ctypes to call a C function that does the computation, but apparently the conversion of the parameters and return values is far too expensive, because the resulting code is much slower.
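One way to get a feel for the per-call cost of ctypes without compiling anything is to time a call to a C function that CPython already ships. This is only a sketch and not the code from the question: it assumes CPython (for `ctypes.pythonapi`), uses `PyFloat_AsDouble` purely as a trivial stand-in for a C function, and the `square_error` helper with its default norm is made up for the comparison.

```python
import ctypes
import timeit

# PyFloat_AsDouble is part of the CPython C-API; it is used here only as a
# trivial C function that takes a Python float and returns a C double.
as_double = ctypes.pythonapi.PyFloat_AsDouble
as_double.argtypes = [ctypes.py_object]
as_double.restype = ctypes.c_double


def square_error(u, v, norm=0.5):
    # Hypothetical pure-Python version of the distance formula.
    return (u - v) ** 2 * norm


calls = 100_000
t_ctypes = timeit.timeit(lambda: as_double(2.5), number=calls)
t_python = timeit.timeit(lambda: square_error(1.5, 4.0), number=calls)

print("ctypes call:      %.3f microseconds" % (t_ctypes / calls * 1e6))
print("pure-Python call: %.3f microseconds" % (t_python / calls * 1e6))
```

If the ctypes call alone costs about as much as the whole pure-Python computation, moving only the arithmetic into C through ctypes cannot pay off. The numbers depend on the machine and the Python version, so they are worth measuring rather than assuming.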

I could of course implement the entire knn in C and just call that, but the problem is that, as I described, I use different distance functions for some dimensions of the vectors, and translating these to C would be too much work.


So what are my alternatives? Will writing the C-function using the Python C-API get rid of the overhead? Are there any other ways to speed this calculation up?

Recommended answer

The following Cython code (I realize the first line of __init__ is different; I replaced it with random stuff because I don't know var and because it doesn't matter anyway, since you stated __call__ is the bottleneck):

cdef class SquareErrorDistance:
    cdef double _norm

    def __init__(self, dataSample):
        variance = round(sum(dataSample)/len(dataSample))
        if variance == 0:
            self._norm = 1.0
        else:
            self._norm = 1.0 / (2 * variance)

    def __call__(self, double u, double v): # u and v are floats
        return (u - v) ** 2 * self._norm

Compiled via a simple setup.py (just the example from the docs with the file name altered), it performs nearly 20 times better than the equivalent pure Python in a simple contrived timeit benchmark. Note that the only changes were cdefs for the _norm field and the __call__ parameters. I consider this pretty impressive.
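The benchmark itself is not shown in the answer. A minimal sketch of how the pure-Python baseline could be timed with timeit follows; the simplified constructor that takes the norm directly, the argument values, and the call count are all assumptions made for illustration.

```python
import timeit


class SquareErrorDistance(object):
    """Pure-Python baseline; the norm is passed in directly to keep the sketch short."""

    def __init__(self, norm):
        self._norm = norm

    def __call__(self, u, v):  # u and v are floats
        return (u - v) ** 2 * self._norm


dist = SquareErrorDistance(0.5)  # arbitrary norm, chosen only for illustration

# Time repeated calls to __call__, the hot spot identified by profiling.
calls = 100_000
seconds = timeit.timeit(lambda: dist(1.5, 4.0), number=calls)
print("%.3f microseconds per call" % (seconds / calls * 1e6))
```

To compare, one would build the Cython class, import it in place of the pure-Python one, and rerun the same timing. The speedup will vary with the machine and the Python and Cython versions.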
