Apply function along ranges in numpy array

Question

Say I have the following numpy array:

import numpy as np

a = np.arange(20)

And also an array containing indices as follows:

ix = np.array([4,10,15])

I've been trying to come up with a vectorized solution to the following question: how can I apply a function to each of the chunks obtained by splitting a at the indices in ix?

So say I were to split a with np.split (I'm only using np.split here to illustrate the groups to which I would like to apply the function):

np.split(a,ix)

[array([0, 1, 2, 3]),
 array([4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14]),
 array([15, 16, 17, 18, 19])]

And say for instance I'd like to take the sum on each chunk, so giving:

[6, 39, 60, 85]
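
For reference, the setup above can be written as one self-contained script. This is only a baseline sketch of the non-vectorized approach; the names chunks and chunk_sums are illustrative and do not appear in the original post.

```python
import numpy as np

a = np.arange(20)
ix = np.array([4, 10, 15])

# np.split cuts `a` just before each index in `ix`, giving len(ix) + 1 chunks.
chunks = np.split(a, ix)

# Baseline (not vectorized): reduce each chunk separately in a Python loop.
chunk_sums = [int(chunk.sum()) for chunk in chunks]
print(chunk_sums)  # [6, 39, 60, 85]
```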

How can I vectorize this using numpy?

Recommended answer

split produces a list of arrays, which may differ in length. It actually does so iteratively:

In [12]: alist = []
In [13]: alist.append(a[0:ix[0]])
In [14]: alist.append(a[ix[0]:ix[1]])
In [15]: alist.append(a[ix[1]:ix[2]])
....
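
The same slicing pattern can be written as a loop over consecutive boundaries. This is a sketch for 1-D input only; split_by_slicing is a hypothetical helper name, not a NumPy function, and it is not claimed to match np.split's internals line for line.

```python
import numpy as np

a = np.arange(20)
ix = np.array([4, 10, 15])

def split_by_slicing(arr, indices):
    """Hypothetical helper: mimic np.split(arr, indices) for a 1-D array."""
    # Add the implicit start and end so every chunk has two boundaries.
    bounds = [0, *indices, len(arr)]
    return [arr[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

alist = split_by_slicing(a, ix)
```

Each element of alist is a view into a, so building the list is cheap, but it still takes one Python-level slice per chunk.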

Applying sum to each element of the list individually makes sense:

In [11]: [np.sum(row) for row in alist]
Out[11]: [6, 39, 60, 85]

When you have a list of arrays that differ in shape, it's a good bet that you'll have to do a Python level iteration on it.
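
One case where the Python-level loop can be avoided, assuming the function is a NumPy ufunc reduction such as sum, max or min, is ufunc.reduceat. This goes beyond what the answer itself proposes, so treat it as a sketch: it does not generalize to arbitrary Python functions, and it assumes the indices in ix are strictly increasing and within bounds (reduceat treats repeated or decreasing indices differently from np.split).

```python
import numpy as np

a = np.arange(20)
ix = np.array([4, 10, 15])

# reduceat needs the start of every chunk, including the first one at 0.
starts = np.r_[0, ix]

chunk_sums = np.add.reduceat(a, starts)
chunk_maxes = np.maximum.reduceat(a, starts)
print(chunk_sums)   # [ 6 39 60 85]
print(chunk_maxes)  # [ 3  9 14 19]
```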

Fast 'vectorization' means performing the calculations in compiled code. Most of that code is built around multidimensional arrays, e.g. 2d ones. If your split had produced equal-size arrays, you could use np.sum with the appropriate axis parameter.

In [23]: a1 = a.reshape(4,5)
In [24]: a1
Out[24]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
In [25]: np.sum(a1, axis=1)
Out[25]: array([10, 35, 60, 85])

Sometimes we can play tricks to cast the problem into an n-d one, for example if the first array of your split were padded with a 0. But that casting itself might require iteration.
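
A minimal sketch of such a padding trick, using a boolean mask so that the padding itself needs no Python loop. The variable names are illustrative, and zero is only a safe fill value because the reduction here is a sum; a minimum or a product would need a different fill value.

```python
import numpy as np

a = np.arange(20)
ix = np.array([4, 10, 15])

# Length of every chunk: differences between consecutive boundaries.
lengths = np.diff(np.r_[0, ix, len(a)])           # [4 6 5 5]

# mask[i, j] is True where row i has a real (non-padding) element.
mask = np.arange(lengths.max()) < lengths[:, None]

# Boolean assignment fills the True cells in row-major order,
# which matches the order of the elements in `a`.
padded = np.zeros(mask.shape, dtype=a.dtype)
padded[mask] = a

sums = padded.sum(axis=1)
print(sums)  # [ 6 39 60 85]
```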

As discussed in "Origin of AttributeError: object has no attribute 'cos'" (and the questions it links to), math (ufunc) functions applied to object dtype arrays end up delegating the action to the corresponding methods of the objects. But that still involves a (near) Python-level iteration over the objects.
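
A small illustration of that delegation. Angle is a hypothetical class invented for this sketch; the only thing that matters is that it has a method named cos, which is what np.cos looks up on each element of an object dtype array.

```python
import math
import numpy as np

class Angle:
    """Hypothetical wrapper class, used only to expose a .cos() method."""
    def __init__(self, radians):
        self.radians = radians

    def cos(self):
        return math.cos(self.radians)

objs = np.array([Angle(0.0), Angle(math.pi)], dtype=object)

# For object dtype, np.cos calls each element's own .cos() method in turn.
result = np.cos(objs)
print(result)  # object array holding 1.0 and -1.0
```

Plain Python floats have no cos method, so the same call on an object array of floats fails with an error instead of falling back to a fast numeric loop.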

Some timings:

In [57]: timeit [np.sum(row) for row in alist]
31.7 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [58]: timeit np.sum(list(itertools.zip_longest(*alist, fillvalue=0)),axis=0)
25.2 µs ± 82 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [59]: timeit np.nansum(pd.DataFrame(alist), axis=1)
908 µs ± 28.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [61]: timeit np.frompyfunc(sum,1,1)(alist)
12.9 µs ± 21.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In this last case the Python sum is faster than np.sum. But that's true with the list comprehension as well:

In [63]: timeit [sum(row) for row in alist]
6.86 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
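
To repeat these comparisons outside IPython, something like the following sketch can be used. It assumes alist comes from np.split(a, ix) as in the question, and it fills the object array explicitly because recent NumPy versions refuse to build an array from a ragged list implicitly. The variants dictionary and the repeat count are illustrative choices, and the numbers will differ from machine to machine.

```python
import itertools
import timeit
import numpy as np

a = np.arange(20)
ix = np.array([4, 10, 15])
alist = np.split(a, ix)

# Explicit 1-D object array holding the four chunks (assumption: needed on
# NumPy versions that reject ragged input to np.array / np.asarray).
obj = np.empty(len(alist), dtype=object)
for i, chunk in enumerate(alist):
    obj[i] = chunk

variants = {
    "np.sum comprehension": lambda: [np.sum(row) for row in alist],
    "builtin sum comprehension": lambda: [sum(row) for row in alist],
    "zip_longest + np.sum": lambda: np.sum(
        list(itertools.zip_longest(*alist, fillvalue=0)), axis=0),
    "frompyfunc(sum)": lambda: np.frompyfunc(sum, 1, 1)(obj),
}

number = 2000  # calls per measurement
for name, fn in variants.items():
    per_call = timeit.timeit(fn, number=number) / number
    print(f"{name:28s} {per_call * 1e6:8.2f} µs")
```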

And with Divakar's whiz-bang numpy_fillna, from "Numpy: Fix array with rows of different lengths by filling the empty elements with zeros":

In [70]: timeit numpy_fillna(np.array(alist)).sum(axis=1)
44.2 µs ± 208 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Once you have a multidimensional array, the numpy code is fast. But if you start with a list, even a list of arrays, Python list methods are often faster. The time it takes to construct an array (or DataFrame) is never trivial.
