Python jagged array operation efficiency

Question

I am new to Python and I am looking for the most efficient way to perform operations on a jagged array.

I have a jagged array like this:

from numpy import array
A = array([[array([1, 2, 3]), array([4, 5])],[array([6, 7, 8, 9]), array([10])]], dtype=object)

I want to be able to do things like this:

A=A[A>4]
B=A+A

Apparently Python is very efficient at operations like this on numpy arrays, but unfortunately I need to do this for jagged arrays and I haven't found such an object in Python. Does it exist in Python, or is there a library that allows efficient operations on jagged arrays?

For the example I gave, here are the outputs I'd like:

A = array([[array([]), array([5])],[array([6, 7, 8, 9]), array([10])]], dtype=object)
B = array([[array([]), array([10])],[array([12, 14, 16, 18]), array([20])]], dtype=object)

But maybe, given the way Python works, it simply cannot do efficient operations on jagged arrays the way it does on numpy arrays. I don't know the details.

Recommended answer

Your array is 2x2:

In [298]: A
Out[298]: 
array([[array([1, 2, 3]), array([4, 5])],
       [array([6, 7, 8, 9]), array([10])]], dtype=object)

While A+A works, a boolean test such as A>4 has not been implemented for this kind of array:

In [299]: A>4
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I'm going to flatten A because it makes it easier to compare with list operations:

In [301]: A1=A.flatten()

In [303]: A1+A1
Out[303]: 
array([array([2, 4, 6]), array([ 8, 10]), array([12, 14, 16, 18]),
       array([20])], dtype=object)

In [304]: [a+a for a in A1]
Out[304]: [array([2, 4, 6]), array([ 8, 10]), array([12, 14, 16, 18]), array([20])]

In [305]: timeit A1+A1
100000 loops, best of 3: 6.85 µs per loop

In [306]: timeit [a+a for a in A1]
100000 loops, best of 3: 9.09 µs per loop

The array operation is a bit faster than a list comprehension. But if I first turn the array into a list:

In [307]: A1l=A1.tolist()

In [308]: A1l
Out[308]: [array([1, 2, 3]), array([4, 5]), array([6, 7, 8, 9]), array([10])]

In [309]: timeit [a+a for a in A1l]
100000 loops, best of 3: 5.2 µs per loop

The times improve. This is a good indication that A1+A1 (or even A+A) is using a similar sort of iteration.

So the straightforward way of performing your A, B calculation is:

In [310]: A2=[a[a>4] for a in A1]
In [311]: B=[a+a for a in A2]
In [312]: B
Out[312]: [array([], dtype=int32), array([10]), array([12, 14, 16, 18]), array([20])]

(We can convert between arrays and lists as needed.)
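If the original 2x2 layout should be kept rather than flattened, the same per-subarray filtering can be written as a small loop over the indices. The following is only a sketch; `make_jagged` and `map_jagged` are hypothetical helper names, not part of numpy.

```python
import numpy as np

def make_jagged(rows):
    """Build a 2-D object array from nested lists of 1-D sequences (hypothetical helper)."""
    out = np.empty((len(rows), len(rows[0])), dtype=object)
    for i, row in enumerate(rows):
        for j, item in enumerate(row):
            out[i, j] = np.asarray(item)  # store the subarray as a single object
    return out

def map_jagged(func, jagged):
    """Apply func to every subarray and return an object array of the same shape."""
    out = np.empty(jagged.shape, dtype=object)
    for idx in np.ndindex(jagged.shape):
        out[idx] = func(jagged[idx])
    return out

A = make_jagged([[[1, 2, 3], [4, 5]], [[6, 7, 8, 9], [10]]])
A2 = map_jagged(lambda a: a[a > 4], A)  # boolean mask applied per subarray
B = A2 + A2                             # addition on the object array works directly
```

The loop still runs in Python, so this is a convenience for keeping the shape, not a speedup.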

A numpy array stores its data in a flat data buffer, and uses the shape and strides attributes to quickly calculate the location of any element, regardless of the dimensions. The fast array operations use compiled code that rapidly steps through the data buffers of the arguments, performing the operations element by element (or in some other combination).
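As a small illustration of the shape and strides bookkeeping, the sketch below computes where an element sits in the flat buffer of an ordinary numeric array. The helper `flat_offset` is a hypothetical name introduced here for illustration only.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)
print(x.shape)    # (3, 4)
print(x.strides)  # (32, 8): bytes to step per row and per column

def flat_offset(arr, index):
    """Byte offset of arr[index] from the start of the buffer (hypothetical helper)."""
    return sum(i * s for i, s in zip(index, arr.strides))

offset = flat_offset(x, (2, 1))
item = offset // x.itemsize  # position of the element in the flat buffer

# A transpose reuses the same buffer and only swaps the strides.
xt = x.T
print(xt.strides)  # (8, 32)
```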

An object dtype array also has a flat data buffer, but its elements are pointers to lists or arrays stored elsewhere. So while it can index individual elements quickly, it still has to make one or more Python calls to access the subarrays. Especially when the array is 1d, it is virtually the same as a flat list holding the same pointers.
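The pointer behaviour can be checked directly. This sketch fills a 1d object array from a list and shows that both containers refer to the same subarray objects; the variable names are chosen here for illustration.

```python
import numpy as np

subarrays = [np.array([1, 2, 3]), np.array([4, 5]),
             np.array([6, 7, 8, 9]), np.array([10])]

obj = np.empty(len(subarrays), dtype=object)
for i, sub in enumerate(subarrays):
    obj[i] = sub  # stores a reference to the subarray, not a copy

same_object = obj[0] is subarrays[0]

subarrays[0][0] = 100       # mutate through the list...
seen_via_array = obj[0][0]  # ...and the change is visible through the object array
```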

Multidimensional object arrays are nicer than nested lists. You can reshape them, access elements (A[1,3] vs. Al[1][3]), transpose them, etc. But when it comes to iterating through all the subarrays, they don't offer much of a benefit.
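A brief sketch of those conveniences, using the 2x2 array from the question. The array is built element by element here so that the example does not depend on how numpy infers the shape of nested input.

```python
import numpy as np

A = np.empty((2, 2), dtype=object)
A[0, 0] = np.array([1, 2, 3])
A[0, 1] = np.array([4, 5])
A[1, 0] = np.array([6, 7, 8, 9])
A[1, 1] = np.array([10])

Al = A.tolist()      # equivalent nested list

first = A[1, 0]      # tuple-style indexing on the array
same = Al[1][0]      # chained indexing on the nested list

At = A.T             # transpose: At[0, 1] corresponds to A[1, 0]
flat = A.reshape(4)  # 1d array over the same four subarrays
lengths = [len(a) for a in flat]  # still a Python-level loop
```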

Looking again at your 2d array:

In [315]: timeit A+A
100000 loops, best of 3: 6.93 µs per loop  # 6.85 for A1+A1 (above)

In [316]: timeit [[j+j for j in i] for i in A]
100000 loops, best of 3: 17.1 µs per loop

In [317]: Al = A.tolist()

In [318]: timeit [[j+j for j in i] for i in Al]
100000 loops, best of 3: 7.01 µs per loop    # 5.2 for A1l flat list

So summing the object array takes basically the same time as iterating through the equivalent nested list.
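Outside IPython, a similar comparison can be sketched with the standard library timeit module. The function names and the iteration count below are arbitrary choices for illustration, and the absolute numbers will differ by machine and numpy version.

```python
import timeit
import numpy as np

A = np.empty((2, 2), dtype=object)
A[0, 0] = np.array([1, 2, 3])
A[0, 1] = np.array([4, 5])
A[1, 0] = np.array([6, 7, 8, 9])
A[1, 1] = np.array([10])
Al = A.tolist()

def add_array():
    return A + A  # numpy iterates over the object pointers internally

def add_nested_list():
    return [[j + j for j in i] for i in Al]  # explicit Python iteration

n = 10_000
t_array = timeit.timeit(add_array, number=n) / n
t_list = timeit.timeit(add_nested_list, number=n) / n
print(f"object array: {t_array * 1e6:.2f} µs per call")
print(f"nested list:  {t_list * 1e6:.2f} µs per call")
```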
