Is there a list of big O complexities for the numpy library?

Problem description

I'm doing a time complexity analysis of an algorithm and need to know what kind of complexities certain numpy operations have.

For some, I assume they match the underlying mathematical operation. Like np.dot(array1, array2) would be O(n). For others, I am not as sure. For example, is np.array(my_array) O(1)? or is it O(n)? Does it simply reassign a pointer or is it iterating over the list and copying out each value?

I want to be sure of each operation's complexity. Is there somewhere I can find this information? Or should I just assume they match the mathematical operation?

Recommended answer

Big-O complexity is not often used with Python and numpy. It's a measure of how the code scales with problem size. That's useful in a compiled language like C. But here the code is a mix of interpreted Python and compiled code. Both can have the same big-O, but the interpreted version will be orders of magnitude slower. That's why most of the SO questions about improving numpy speed talk about 'removing loops' and 'vectorizing'.

Also, few operations are purely O(n); most are a mix: there's a setup cost plus a per-element cost. If the per-element cost is small, the setup cost dominates.

If you start with lists, it's often faster to iterate over the list, because converting a list to an array has a substantial O(n) overhead.

If you already have arrays, avoid (Python-level) iteration where possible. Iteration is part of most calculations, but numpy lets you do much of it in faster compiled code (a faster O(n)).
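A minimal sketch of that point: both versions below do the same O(n) sum-of-squares, but one loops at the Python level and the other pushes the loop into compiled code. (The array size here is an arbitrary choice for illustration.)

```python
import numpy as np

arr = np.arange(100_000, dtype=np.int64)

# Python-level iteration: one interpreted step per element.
loop_total = 0
for x in arr:
    loop_total += int(x) * int(x)

# Vectorized: the same O(n) work, but the loop runs in compiled code.
vec_total = int((arr * arr).sum())

assert loop_total == vec_total
```

Both scale linearly with the array size; only the constant factor differs, and it differs by orders of magnitude.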

At some point you have to understand how numpy stores its arrays. The distinction between a view and a copy is important: a view is in effect O(1), while a copy is O(n).
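As a small sketch of the view/copy distinction, basic slicing returns a view over the same buffer, while `.copy()` allocates and fills a new one:

```python
import numpy as np

a = np.arange(10)

v = a[::2]          # basic slicing returns a view: O(1), same buffer
c = a[::2].copy()   # explicit copy: O(n), new buffer

a[0] = 99           # the view sees the change; the copy does not
print(v[0], c[0])   # 99 0
```

`np.shares_memory(a, v)` is True and `np.shares_memory(a, c)` is False, which is a handy way to check which one an operation gave you.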

Often you'll see SO answers do timeit speed comparisons. I often add the caution that results may vary with problem size. The better answers time a range of problem sizes and show the results on a plot. The results are often a mix of straight lines (O(n)) and curves (varying blends of O(1) and O(n) components).
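Such a size sweep can be sketched like this (the sizes and repeat count are arbitrary; absolute numbers are machine-dependent):

```python
import timeit
import numpy as np

# Time np.array(list) at a few sizes. Expect roughly linear growth at
# large n, with the fixed setup cost dominating at small n.
times = {}
for n in (10, 1_000, 100_000):
    alist = list(range(n))
    times[n] = timeit.timeit(lambda: np.array(alist), number=200) / 200
    print(f"n={n:>7}: {times[n] * 1e6:8.2f} µs per call")
```

Plotting `times` against `n` is what separates the straight lines from the curves.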

You asked specifically about np.array. Here are some sample timings:

In [134]: %%timeit alist = list(range(1000))
     ...: np.array(alist)
67.9 µs ± 839 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [135]: %%timeit alist = list(range(10))
     ...: np.array(alist)
2.19 µs ± 9.88 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [136]: %%timeit alist = list(range(2000))
     ...: np.array(alist)
134 µs ± 1.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Copying an array:

In [137]: %%timeit alist = list(range(2000)); arr=np.array(alist)
     ...: np.array(arr)
1.77 µs ± 24.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Without a copy:

In [138]: %%timeit alist = list(range(2000)); arr=np.array(alist)
     ...: np.array(arr, copy=False) 
237 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Converting from a list of strings:

In [139]: %%timeit alist = [str(i) for i in range(2000)]
     ...: np.array(alist, dtype=int)
286 µs ± 4.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Almost all calculations in numpy are O(n). If they involve each element of an array, speed depends on the size of the array. Some array manipulations are O(1), such as reshaping, because they don't actually do anything with the data; they only change properties like shape and strides.
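A quick sketch of the reshape case: no data moves, the result shares the original buffer, and only the shape/strides metadata differs (the `dtype=np.int64` here pins the strides so the printed values are predictable):

```python
import numpy as np

a = np.arange(12, dtype=np.int64)
b = a.reshape(3, 4)        # O(1): only shape/strides metadata changes

print(np.shares_memory(a, b))   # True: same underlying buffer
print(b.shape, b.strides)       # (3, 4) (32, 8) -- strides in bytes
```

Transposing is O(1) for the same reason; operations like `ravel()` on a non-contiguous array, by contrast, must copy and are O(n).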

Search problems often grow faster than O(n); usually numpy is not the best choice for that kind of problem. Smart use of Python lists and dictionaries can be faster.
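For example (a sketch with arbitrary sizes and values): a hash-based `set` answers a membership query in O(1) on average, while the numpy equivalent scans the whole array, O(n) per query:

```python
import numpy as np

data = list(range(100_000))
arr = np.array(data)
lookup = set(data)            # hash table: O(1) average per query

found_set = 54_321 in lookup              # constant-time probe
found_arr = bool((arr == 54_321).any())   # scans every element: O(n)

print(found_set, found_arr)   # True True
```

With many repeated queries the difference compounds, which is why dictionaries and sets often beat numpy on lookup-heavy problems.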
