Why use numpy over list based on speed?

Question

In reference to "Why use NumPy instead of Python lists?"

tom10 said:

Speed: Here's a test on doing a sum over a list and a NumPy array, showing that the sum on the NumPy array is 10x faster (in this test -- mileage may vary).

But my test using the following code:

import numpy as np
import time

N = 100000

# using numpy: np.append returns a new array on every call
start = time.time()
array = np.array([])

for i in range(N):
    array = np.append(array, i)

end = time.time()
print("Using numpy: ", round(end-start, 2))

# using list: append in place, convert to an array once at the end
start = time.time()
values = []  # renamed from `list` to avoid shadowing the built-in

for i in range(N):
    values.append(i)

values = np.array(values)
end = time.time()
print("Using list : ", round(end-start, 2))

gives the result:

Using numpy: 8.35
Using list : 0.02

Is it true that when using "append", a list is better than numpy?

Recommended answer

To answer your question, yes. Appending to an array is an expensive operation, while lists make it relatively cheap (see Internals of Python list, access and resizing runtimes for why). However, that's no reason to abandon numpy. There are other ways to easily add data to a numpy array.
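To see where the cost comes from, here is a minimal sketch (the variable names are illustrative and not part of the original post). np.append never grows an array in place; it returns a newly allocated array containing a copy of the old data, so a loop of N appends ends up copying on the order of N²/2 elements in total. list.append, by contrast, mutates the existing list object.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.append(a, 3.0)  # allocates a new array and copies a into it

a_unchanged = a.shape == (2,)          # the original array is not modified
separate = not np.shares_memory(a, b)  # b owns its own buffer

items = [1.0, 2.0]
same_list = items                      # second reference to the same list object
items.append(3.0)                      # grows the existing list in place
grown_in_place = same_list is items and len(same_list) == 3

print(a_unchanged, separate, grown_in_place)
```

That repeated copying is what the 8.35 second figure in the question is measuring, and it is why it pays to add data to a numpy array some other way.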

There are a surprising (to me, anyway) number of ways to do this. Skip to the bottom to see benchmarks for each of them.

Probably the most common is to simply pre-allocate the array, and index into that,

#using preallocated numpy
start = time.time()
array = np.zeros(N)

for i in range(N):
    array[i] = i

end = time.time()
print ("Using preallocated numpy: ", round(end-start, 5), end="\n")

Of course, you can preallocate the memory for a list too, so let's include that for a benchmark comparison.

#using preallocated list
start = time.time()
res = [None]*N

for i in range(N):
    res[i] = i

res = np.array(res)
end = time.time()
print ("Using preallocated list : ", round(end-start, 5), end="\n")

Depending on your code, it may also be helpful to use numpy's fromiter function, which uses the results of an iterator to initialize the array.

#using numpy fromiter shortcut
start = time.time()

res = np.fromiter(range(N), dtype='float64') # Use same dtype as other tests

end = time.time()
print ("Using fromiter : ", round(end-start, 5), end="\n")
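As a side note that was not part of the benchmark above: when the number of items is known in advance, np.fromiter accepts a count argument, which lets it allocate the output once instead of resizing it while it reads the iterator. A small sketch, assuming the same N and dtype as the other tests:

```python
import numpy as np

N = 100000

# count tells fromiter how many items to read, so the output array
# can be sized once up front rather than grown on demand
res = np.fromiter(range(N), dtype='float64', count=N)

print(res.shape, res.dtype)
```

Whether this makes a measurable difference depends on the iterator and on N, so it is worth timing on your own data.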

Of course, using a built-in iterator isn't terribly flexible, so let's try a custom iterator as well,

#using custom iterator
start = time.time()
def it(N):
    i = 0
    while i < N:
        yield i
        i += 1

res = np.fromiter(it(N), dtype='float64') # Use same dtype as other tests

end = time.time()
print ("Using custom iterator : ", round(end-start, 5), end="\n")

That's two very flexible ways of using numpy. The first, using a preallocated array, is the most flexible. Let's see how they compare:

Using numpy:  2.40017
Using list :  0.0164
Using preallocated numpy:  0.01604
Using preallocated list :  0.01322
Using fromiter :  0.00577
Using custom iterator :  0.01458

Right off, you can see that preallocating makes numpy much faster than appending to it, which puts it on par with using a list; preallocating the list as well leaves both at about the same speed. Using a built-in iterator is extremely fast, although performance drops back into the range of the preallocated array and list when a custom iterator is used.

Converting code directly to numpy often performs poorly, as with append. Finding an approach that uses numpy's own methods can almost always give a substantial improvement. In this case, preallocate the array or express the calculation of each element as an iterator to get performance similar to Python lists. Or use a vanilla Python list, since the performance is about the same.
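The timings in this answer come from single time.time() measurements, which can be noisy. If you want to repeat the comparison on your own machine, one possible harness uses the standard library's timeit module and keeps the best of a few runs. This is only a sketch: the function names are made up for illustration, and N is smaller than in the post so that the slow np.append case finishes quickly.

```python
import timeit
import numpy as np

N = 10000  # assumption: reduced from the post's 100000 to keep the run short

def append_numpy():
    arr = np.array([])
    for i in range(N):
        arr = np.append(arr, i)  # copies the whole array each time
    return arr

def prealloc_numpy():
    arr = np.zeros(N)
    for i in range(N):
        arr[i] = i
    return arr

def list_then_convert():
    items = []
    for i in range(N):
        items.append(i)
    return np.array(items, dtype='float64')

candidates = [append_numpy, prealloc_numpy, list_then_convert]
timings = {}
for fn in candidates:
    # one call per measurement, best of three measurements
    timings[fn.__name__] = min(timeit.repeat(fn, number=1, repeat=3))

for name, seconds in timings.items():
    print(f"{name}: {seconds:.5f} s")
```

The absolute numbers will differ from the ones quoted above; the relative ordering is the thing to look at.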

EDITS: The original answer also included np.fromfunction. This was removed since it didn't fit the use case of adding one element at a time; fromfunction actually initializes the array and uses numpy's broadcasting to make a single function call. It is about a hundred times faster, so if you can solve your problem using broadcasting, don't bother with these other methods.
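For reference, a whole-array version of the same task might look like the sketch below. It is not the code that was removed from the original answer, just an illustration under the same N and dtype. np.fromfunction calls the supplied function once, passing it an array of all the indices, so the function body operates on the entire array rather than on one element at a time.

```python
import numpy as np

N = 100000

# the lambda receives one array holding every index 0..N-1,
# so it runs a single time instead of N times
res = np.fromfunction(lambda i: i, (N,), dtype='float64')

# for this particular sequence, arange builds the same values directly
same = np.arange(N, dtype='float64')

print(np.array_equal(res, same))
```

This only helps when each element can be computed from its index with array operations; if every element depends on earlier results or external input, the element-by-element approaches above still apply.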
