Speed up structured NumPy array


Question

NumPy arrays are great for both performance and ease of use (slicing and indexing are easier than with lists).

I tried to build a data container from a NumPy structured array instead of a dict of NumPy arrays. The problem is that performance is much worse: about 2.5 times slower with homogeneous data and about 32 times slower with heterogeneous data (I'm talking about NumPy data types).

Is there a way to speed up structured arrays? I tried changing the memory order from 'c' to 'f', but this had no effect.

Here's my profiling code:

import time
import numpy as np

NP_SIZE = 100000
N_REP = 100

np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}

t0 = time.time()
for i in range(N_REP):
    np_homo['a'] += i

t1 = time.time()
for i in range(N_REP):
    np_hetro['a'] += i

t2 = time.time()
for i in range(N_REP):
    dict_homo['a'] += i

t3 = time.time()
for i in range(N_REP):
    dict_hetro['a'] += i
t4 = time.time()

print('Homogeneous NumPy struct array took {:.4f}s'.format(t1 - t0))
print('Heterogeneous NumPy struct array took {:.4f}s'.format(t2 - t1))
print('Homogeneous dict of NumPy arrays took {:.4f}s'.format(t3 - t2))
print('Heterogeneous dict of NumPy arrays took {:.4f}s'.format(t4 - t3))

Edit: I forgot to include my timing numbers:

Homogeneous NumPy struct array took 0.0101s
Heterogeneous NumPy struct array took 0.1367s
Homogeneous dict of NumPy arrays took 0.0042s
Heterogeneous dict of NumPy arrays took 0.0042s

Edit 2: I added some additional test cases using the timeit module:

import numpy as np
import timeit

NP_SIZE = 1000000

def time(data, txt, n_rep=1000):
    def intern():
        data['a'] += 1

    time = timeit.timeit(intern, number=n_rep)
    print('{} {:.4f}'.format(txt, time))


np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}

time(np_homo, 'Homogeneous NumPy struct array')
time(np_hetro, 'Heterogeneous NumPy struct array')
time(dict_homo, 'Homogeneous dict of NumPy arrays')
time(dict_hetro, 'Heterogeneous dict of NumPy arrays')

This results in:

Homogeneous NumPy struct array 0.7989
Heterogeneous NumPy struct array 13.5253
Homogeneous dict of NumPy arrays 0.3750
Heterogeneous dict of NumPy arrays 0.3744

The ratios between the runs seem reasonably stable across both timing methods and different array sizes.

In case it matters: Python 3.4, NumPy 1.9.2.

Recommended answer

In my quick timing tests the difference isn't that large:

In [717]: dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
In [718]: timeit dict_homo['a']+=1
10000 loops, best of 3: 25.9 µs per loop
In [719]: np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
In [720]: timeit np_homo['a'] += 1
10000 loops, best of 3: 29.3 µs per loop

In the dict_homo case, the fact that the array is embedded in a dictionary is a minor point. Simple dictionary access like this is fast, basically the same as accessing the array by variable name.

So the first case is basically a test of += for a 1d array.
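As a minimal sketch of why the dictionary itself adds little: the dictionary only stores a reference to the array, so += through the key updates the same 1d buffer in place. The name arr is introduced here purely for illustration.

```python
import numpy as np

arr = np.zeros(10000)
dict_homo = {'a': arr, 'b': np.zeros(10000)}

# += through the dictionary key works in place on the array object itself
dict_homo['a'] += 1

# The dictionary still refers to the very same array, which is contiguous
print(dict_homo['a'] is arr)
print(arr.flags['C_CONTIGUOUS'])
print(arr[:3])
```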

In the structured case, the a and b values alternate in the data buffer, so np_homo['a'] is a view that 'pulls out' alternate numbers. So it's not surprising that it would be a bit slower.

In [721]: np_homo
Out[721]: 
array([(41111.0, 0.0), (41111.0, 0.0), (41111.0, 0.0), ..., (41111.0, 0.0),
       (41111.0, 0.0), (41111.0, 0.0)], 
      dtype=[('a', '<f8'), ('b', '<f8')])
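The interleaving can be checked by looking at strides. The following is a small sketch, not part of the original timings; the names plain, homo_a and hetro_a are introduced for illustration. With the default (packed) layout, the heterogeneous record is 12 bytes long, so consecutive 'a' values are 12 bytes apart and do not all fall on 8-byte boundaries. That may contribute to the larger slowdown reported in the question, but this is an assumption that the timings here do not test.

```python
import numpy as np

np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
np_hetro = np.zeros(10000, dtype=[('a', np.double), ('b', np.int32)])
plain = np.zeros(10000)

homo_a = np_homo['a']    # view on field 'a' of 16-byte records
hetro_a = np_hetro['a']  # view on field 'a' of 12-byte packed records

# A plain 1d double array steps 8 bytes between elements and is contiguous
print(plain.strides, plain.flags['C_CONTIGUOUS'])
# The field views step over a whole record each time, so they are not contiguous
print(homo_a.strides, homo_a.flags['C_CONTIGUOUS'])
print(hetro_a.strides, np_hetro.dtype.itemsize)
```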

A 2d array also interleaves the column values.

In [722]: np_twod=np.zeros((10000,2), np.double)
In [723]: timeit np_twod[:,0]+=1
10000 loops, best of 3: 36.8 µs per loop

Surprisingly, it's actually a bit slower than the structured case. Using order='F' or a (2,10000) shape speeds it up a bit, but it's still not quite as good as the structured case.
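A short sketch of what the order='F' and (2,10000) variants change in memory; the variable names are illustrative. In the default C order a column is strided, while in the other two layouts the selected values sit next to each other.

```python
import numpy as np

c_order = np.zeros((10000, 2), np.double)             # rows stored one after another
f_order = np.zeros((10000, 2), np.double, order='F')  # columns stored one after another
two_rows = np.zeros((2, 10000), np.double)            # same data laid out as two long rows

# Column of a C-ordered array: skips over the other column's value each step
print(c_order[:, 0].strides)
# Column of an F-ordered array and row of the (2, 10000) array: adjacent values
print(f_order[:, 0].strides)
print(two_rows[0].strides)
```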

These are small test times, so I won't make grand claims. But the structured array doesn't look bad.

Another timing test, initializing the array or dictionary fresh at each step:

In [730]: %%timeit np_twod=np.zeros((10000,2), np.double)
np_twod[:,0] += 1
   .....: 
10000 loops, best of 3: 36.7 µs per loop
In [731]: %%timeit np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
np_homo['a'] += 1
   .....: 
10000 loops, best of 3: 38.3 µs per loop
In [732]: %%timeit dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
dict_homo['a'] += 1
   .....: 
10000 loops, best of 3: 25.4 µs per loop

The 2d and structured cases are closer, with somewhat better performance for the dictionary (1d) case. I tried this with np.ones as well, since np.zeros can have delayed allocation, but there was no difference in behavior.
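For readers without IPython, the same comparison can be sketched with the timeit module. This is a rough approximation of the tests above, not the exact procedure used for the quoted numbers: best_time, update_field and update_column are hypothetical helpers, the data is created once per case rather than per loop, and absolute times will vary by machine and NumPy version.

```python
import timeit
import numpy as np

N = 10000

def best_time(make_data, update, number=1000, repeat=3):
    """Hypothetical helper: best per-loop time in seconds for update(data)."""
    data = make_data()
    times = timeit.repeat(lambda: update(data), number=number, repeat=repeat)
    return min(times) / number

def update_field(d):
    d['a'] += 1       # works for both a dict of arrays and a structured array

def update_column(d):
    d[:, 0] += 1      # first column of a 2d array

results = {
    'dict of 1d arrays': best_time(
        lambda: {'a': np.zeros(N), 'b': np.zeros(N)}, update_field),
    'structured array': best_time(
        lambda: np.zeros(N, dtype=[('a', np.double), ('b', np.double)]), update_field),
    '2d array column': best_time(
        lambda: np.zeros((N, 2), np.double), update_column),
}

for label, seconds in results.items():
    print('{:<20s} {:.1f} us per loop'.format(label, seconds * 1e6))
```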
