How do I maximize efficiency with numpy arrays?


Question

I am just getting to know numpy, and I am impressed by its claims of C-like efficiency with memory access in its ndarrays. I wanted to see the differences between these and ordinary Python lists for myself, so I ran a quick timing test, performing a few of the same simple tasks with numpy and without it. As expected, numpy outclassed regular lists by an order of magnitude in the allocation of arrays and in arithmetic operations on them. But this segment of code, identical in both tests, took about 1/8 of a second with a regular list and slightly over 2.5 seconds with numpy:

file = open('timing.log','w')
for num in a2:
    if num % 1000 == 0:
        file.write("Multiple of 1000!\r\n")

file.close()

Does anyone know why this might be, and whether there is some other syntax I should be using for operations like this to take better advantage of what the ndarray can do?

Thanks...

To answer Wayne's comment... I timed them both repeatedly and in different orders and got pretty much identical results each time, so I doubt it's another process. I put

start = time()

at the top of the file after the numpy import, and then I have statements like

print 'Time after traversal:\t',(time() - start)

throughout.

Recommended answer

a2 is a NumPy array, right? One possible reason it might be taking so long with NumPy (if other processes' activity doesn't account for it, as Wayne Werner suggested) is that you're iterating over the array using a Python loop. At every step of the iteration, Python has to fetch a single value out of the NumPy array and wrap it in a separate scalar object before the modulo and comparison can run, which is not a particularly fast operation.
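The per-element overhead described above can be observed with a small sketch. It assumes a2 is a one-dimensional integer array; the question does not show how a2 was built, so the size, the names `a2_list` and `a2_array`, and the helper `count_multiples` are illustrative only. Absolute timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

# Assumption: a2 holds integers; the question does not show its construction.
n = 100_000
a2_list = list(range(n))
a2_array = np.arange(n)

# Iterating over an ndarray yields NumPy scalar objects, not plain Python ints.
first = next(iter(a2_array))
print(type(first))


def count_multiples(values):
    """Count multiples of 1000 with an explicit Python loop."""
    count = 0
    for num in values:
        if num % 1000 == 0:
            count += 1
    return count


# Same loop, two containers: only the element type differs.
list_time = timeit.timeit(lambda: count_multiples(a2_list), number=5)
array_time = timeit.timeit(lambda: count_multiples(a2_array), number=5)
print("list loop: ", list_time)
print("array loop:", array_time)
```

Both calls return the same count; only the time spent producing and operating on each element differs.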

NumPy works much better when you are able to perform operations on the whole array as a unit. In your case, one option (maybe not even the fastest) would be

file.write("Multiple of 1000!\r\n" * (a2 % 1000 == 0).sum())
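Broken into steps, the one-liner above builds a boolean array and counts its True entries. The following is a minimal sketch under stated assumptions: a2 is filled with illustrative values, `io.StringIO` stands in for the log file from the question, and `np.count_nonzero` is used as an alternative to `.sum()` on the mask.

```python
import io

import numpy as np

# Assumption: a2 is a 1-D integer array; the values here are illustrative.
a2 = np.arange(10_000)

# The elementwise modulo and comparison run in compiled code and
# produce a boolean array (the mask).
mask = (a2 % 1000 == 0)

# Count the True entries; equivalent to mask.sum() for a boolean array.
hits = int(np.count_nonzero(mask))

# io.StringIO stands in for the log file opened in the question.
log = io.StringIO()
log.write("Multiple of 1000!\r\n" * hits)
```

The file is written once, with the message repeated `hits` times, instead of once per matching element.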

Try comparing that to the pure-Python equivalent,

file.write("Multiple of 1000!\r\n" * sum(1 for i in a2 if i % 1000 == 0))
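If an explicit Python loop cannot be avoided, one commonly used workaround (not part of the original answer) is to convert the array to a list first with `ndarray.tolist()`, which returns plain Python ints for an integer array. This is a sketch with illustrative data; whether it helps in a given program is best checked by timing it.

```python
import numpy as np

# Assumption: a2 is a 1-D integer array; the values here are illustrative.
a2 = np.arange(10_000)

# tolist() converts every element to a built-in Python int in one pass,
# so the loop below no longer creates a NumPy scalar per element.
as_list = a2.tolist()

count = sum(1 for i in as_list if i % 1000 == 0)
print(count)
```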
