Improve performance of function without parallelization


Question

Some weeks ago I posted a question, "Speed up nested for loop with elements exponentiation" (http://stackoverflow.com/questions/21269833/speed-up-nested-for-loop-with-elements-exponentiation), which got a very good answer from abarnert. This question is related to that one, since it makes use of the performance improvements suggested by that user.

I need to improve the performance of a function that calculates three factors and then applies an exponential to them.

Here's an MWE of my code:

import numpy as np
import timeit

def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N)

# Data lists.
array1 = np.array([random_data(4) for _ in range(1000)])
array2 = np.array([random_data(3) for _ in range(2000)])

# Function.
def func():
    # Empty list that holds all values obtained in for loop.    
    lst = []
    for elem in array1:
        # Avoid numeric errors if one of these values is 0.            
        e_1, e_2 = max(elem[0], 1e-10), max(elem[1], 1e-10)
        # Obtain three parameters.
        A = 1./(e_1*e_2)
        B = -0.5*((elem[2]-array2[:,0])/e_1)**2
        C = -0.5*((elem[3]-array2[:,1])/e_2)**2
        # Apply exponential.
        value = A*np.exp(B+C)
        # Store value in list.
        lst.append(value)

    return lst

# time function.
func_time = timeit.timeit(func, number=100)
print(func_time)

Is it possible to speed up func without having to resort to parallelization?

Recommended answer

Here's what I have so far. My approach is to do as much of the math as possible across numpy arrays.

Optimizations:

  • Calculate the A values within numpy
  • Re-factor the calculation of B and C by splitting them into factors, some of which can be computed within numpy
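The second optimization rests on the identity -0.5*((x - y)/e)**2 == (x - y)**2 * (-0.5/e**2), which lets the factor -0.5/e**2 be computed once for every row before the loop. A minimal check of that identity is sketched here; the names x, e and y and their sample values are illustrative stand-ins, not part of the original code, and np.random.default_rng assumes a reasonably recent NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

x = 3.7                          # stand-in for elem[2] (hypothetical value)
e = 1.9                          # stand-in for e_1 (hypothetical value)
y = rng.uniform(0., 10., 2000)   # stand-in for array2[:, 0]

# Form used in the question: divide first, then square.
original = -0.5 * ((x - y) / e) ** 2

# Refactored form: square the difference, then scale by a
# per-row factor that does not depend on y.
factor = -0.5 / e ** 2
refactored = (x - y) ** 2 * factor

same = np.allclose(original, refactored)
print(same)
```

Because the factor depends only on the row of array1, it can be precomputed as a whole array, which is what Bfactors and Cfactors do in the answer's code.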

Code:

def optfunc():
    e0 = array1[:, 0]
    e1 = array1[:, 1]
    e2 = array1[:, 2]
    e3 = array1[:, 3]

    ar0 = array2[:, 0]
    ar1 = array2[:, 1]

    As = 1./(e0 * e1)
    Bfactors = -0.5 * (1 / e0**2)
    Cfactors = -0.5 * (1 / e1**2)

    lst = []
    for i, elem in enumerate(array1):
        B = ((elem[2] - ar0) ** 2) * Bfactors[i]
        C = ((elem[3] - ar1) ** 2) * Cfactors[i]

        value = As[i]*np.exp(B+C)

        lst.append(value)

    return lst

print(np.allclose(optfunc(), func()))

# time function.
func_time = timeit.timeit(func, number=10)
opt_func_time = timeit.timeit(optfunc, number=10)
print("%.3fs --> %.3fs" % (func_time, opt_func_time))

Results:

True
0.759s --> 0.485s


At this point I'm stuck. I managed to do it entirely without Python for loops, but it is slower than the above version for a reason I do not yet understand:

def optfunc():
    x = array1
    y = array2

    x0 = x[:, 0]
    x1 = x[:, 1]
    x2 = x[:, 2]
    x3 = x[:, 3]

    y0 = y[:, 0]
    y1 = y[:, 1]

    A = 1./(x0 * x1)
    Bfactors = -0.5 * (1 / x0**2)
    Cfactors = -0.5 * (1 / x1**2)

    B = (np.transpose([x2]) - y0)**2 * np.transpose([Bfactors])
    C = (np.transpose([x3]) - y1)**2 * np.transpose([Cfactors])

    return np.transpose([A]) * np.exp(B + C)

Results:

True
0.780s --> 0.558s

Note, however, that the latter returns an np.array whereas the former returns only a Python list. This might account for the difference, but I'm not sure.
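To compare the list-returning and array-returning approaches directly, here is a hedged sketch of a loop-free variant written for Python 3. It makes two changes that are assumptions of this sketch rather than part of the answer: it keeps the question's 1e-10 clamp (which the answer's versions omit) by using np.maximum, and it uses in-place operations to limit the number of temporary (1000, 2000) arrays. Whether this is actually faster was not measured here and will likely depend on the machine and NumPy version. The function names loop_func and broadcast_func are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
array1 = rng.uniform(0., 10., (1000, 4))
array2 = rng.uniform(0., 10., (2000, 3))

def loop_func():
    # The question's loop: returns a Python list of 1-D arrays.
    lst = []
    for elem in array1:
        e_1, e_2 = max(elem[0], 1e-10), max(elem[1], 1e-10)
        A = 1. / (e_1 * e_2)
        B = -0.5 * ((elem[2] - array2[:, 0]) / e_1) ** 2
        C = -0.5 * ((elem[3] - array2[:, 1]) / e_2) ** 2
        lst.append(A * np.exp(B + C))
    return lst

def broadcast_func():
    # Clamp as in the question, and add a trailing axis so that
    # each (1000, 1) column broadcasts against a (2000,) row.
    e_1 = np.maximum(array1[:, 0], 1e-10)[:, None]
    e_2 = np.maximum(array1[:, 1], 1e-10)[:, None]
    # (1000, 1) - (2000,) broadcasts to (1000, 2000).
    out = ((array1[:, 2, None] - array2[:, 0]) / e_1) ** 2
    out += ((array1[:, 3, None] - array2[:, 1]) / e_2) ** 2
    out *= -0.5
    np.exp(out, out=out)   # exponentiate in place
    out /= e_1 * e_2       # apply the A factor in place
    return out             # a single 2-D np.array

as_list = loop_func()
as_array = broadcast_func()
match = np.allclose(np.array(as_list), as_array)
print(match, as_array.shape)
```

Both functions can be passed to timeit.timeit in the same way as func and optfunc above to see how they compare on a given machine.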
