efficient loop over numpy array


Problem description

Versions of this question have already been asked but I have not found a satisfactory answer.

Problem: given a large numpy vector, find indices of the vector elements which are duplicated (a variation of that could be comparison with tolerance).

So the problem is ~O(N^2) and memory-bound (at least from the current algorithm's point of view). I wonder why, whatever I tried, Python is 100x or more slower than equivalent C code.

import numpy as np
N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1  # integer division: N/2 is a float in Python 3 and cannot index
vect[N // 4] = 1
dupl = []
print("init done")
counter = 0
for i in range(N):
    for j in range(i+1, N):
        if vect[i] == vect[j]:
            dupl.append(j)
            counter += 1

print("counter =", counter)
print(dupl)
# For simplicity, this code ignores repeated indices 
# which can be trimmed later. Ref output is
# counter = 3
# [2500, 5000, 5000]

I tried using numpy iterators but they are even worse (~ x4-5) http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html

Using N=10,000 I'm getting 0.1 sec in C, 12 sec in Python (code above), 40 sec in Python using np.nditer, 50 sec in Python using np.ndindex. I pushed it to N=160,000 and the timing scales as N^2 as expected.
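Worth noting: for exact matches, the O(N^2) pairwise scan can be avoided entirely by sorting once and comparing neighbours. A minimal sketch (not from the original post; unlike the reference output it reports each duplicated index once rather than one entry per matching pair):

```python
import numpy as np

N = 10000
vect = np.arange(float(N))
vect[N // 2] = 1
vect[N // 4] = 1

# Sort once (O(N log N)); equal elements become neighbours in sorted order.
order = np.argsort(vect, kind="stable")
sorted_vals = vect[order]
# True where an element equals its predecessor in the sorted array,
# i.e. for every occurrence of a value after the first one.
is_dup = np.concatenate(([False], sorted_vals[1:] == sorted_vals[:-1]))
dupl = np.sort(order[is_dup])
print(dupl)  # [2500 5000]
```

For the tolerance variant, the neighbour comparison becomes `np.diff(sorted_vals) < eps`, though clustering near-equal values correctly then needs more care.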

Answer

Since the answers have stopped coming and none was totally satisfactory, I'm posting my own solution for the record.

It is my understanding that it's the assignment which makes Python slow in this case, not the nested loops as I thought initially. Using a library or compiled code eliminates the need for assignments and performance improves dramatically.

from __future__ import print_function
import numpy as np
from numba import jit

N = 10000
vect = np.arange(N, dtype=np.float32)

vect[N // 2] = 1  # integer division: N/2 is a float in Python 3 and cannot index
vect[N // 4] = 1
dupl = np.zeros(N, dtype=np.int32)

print("init done")
# uncomment to enable compiled function
#@jit
def duplicates(i, counter, dupl, vect):
    eps = 0.01
    ns = len(vect)
    for j in range(i+1, ns):
        # replace if to use approx comparison
        #if abs(vect[i] - vect[j]) < eps:
        if vect[i] == vect[j]:
            dupl[counter] = j
            counter += 1
    return counter

counter = 0
for i in range(N):
    counter = duplicates(i, counter, dupl, vect)

print("counter =", counter)
print(dupl[0:counter])

Tests

# no jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]

elapsed 10.135 s

# with jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]

elapsed 0.480 s

The performance of the compiled version (with @jit uncommented) is close to the C code's, ~0.1 - 0.2 sec. Perhaps eliminating the remaining outer loop could improve performance even further. The difference is even larger when using the approximate comparison with eps: the pure-Python version slows down dramatically, while the compiled version barely changes.

# no jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]

elapsed 109.218 s

# with jit
$ time python array-test-numba.py
init done
counter = 3
[2500 5000 5000]

elapsed 0.506 s

This is a ~200x difference. In the real code, I had to put both loops inside the function, as well as use a function template with variable types, so it was a bit more complex, but not by much.
