How to broadcast a function over a numpy array, when dtype=object?

Question

If I have an array of numerical values, which had to use object pointers instead of values as the data type, due to unequal vector lengths:

In [145]: import numpy as np

In [147]: a = np.array([[1,2],[3,4,5]], dtype=object)

In [148]: a
Out[148]: array([[1, 2], [3, 4, 5]], dtype=object)

In [150]: np.sin(a)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-58d97006f018> in <module>()
----> 1 np.sin(a)

In [152]: np.sin(a[0])
Out[152]: array([ 0.84147098,  0.90929743])

How do I broadcast a function over the actual numerical values without having to manually traverse the array?

Recommended answer

There are a couple of different issues here. First, there's little to be gained by broadcasting over python objects in numpy; you'll probably do better using pure python in this case.

>>> a = np.array([[1, 2, 3], [4, 5, 6]], dtype=object)
>>> b = np.arange(1, 7).reshape(2, 3)
>>> c = [[1, 2, 3], [4, 5, 6]]
>>> %timeit a * 5
100000 loops, best of 3: 4.28 µs per loop
>>> %timeit b * 5
100000 loops, best of 3: 2.08 µs per loop
>>> %timeit [[x * 5 for x in l] for l in c]
1000000 loops, best of 3: 998 ns per loop

Those speeds will scale a bit unevenly but you get the idea.
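For the ragged data in the question, a minimal sketch of the pure-Python route might look like the following. The names `ragged`, `sines`, and `sines_np` are illustrative and not from the original post.

```python
import math
import numpy as np

ragged = [[1, 2], [3, 4, 5]]

# Plain nested comprehension: no numpy involved at all.
sines = [[math.sin(x) for x in row] for row in ragged]

# Alternative: keep each row as its own numeric array, let numpy
# vectorize within a row, and let Python loop over the rows.
sines_np = [np.sin(np.asarray(row, dtype=float)) for row in ragged]
```

The second form is probably only worth it when the individual rows are long; for short rows the comprehension is likely to be at least as fast.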

Second, the problem isn't directly related to broadcasting. numpy will happily broadcast over python lists. The result just isn't what you expect:

>>> a = np.array([[1, 2, 3], [4, 5]], dtype=object)
>>> a * 5
    array([[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
       [4, 5, 4, 5, 4, 5, 4, 5, 4, 5]], dtype=object)

numpy allows the objects in the array to define their own versions of whichever operator or function it's broadcasting. In this case, python lists define * as repetition! This holds even for heterogeneous arrays; try this: np.array([5, [1, 2]], dtype=object) * 5. The reason sin doesn't broadcast in this case is that python lists don't define sin at all.
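To make the delegation concrete, here is a small sketch. `Angle` is a hypothetical class invented for this illustration; the point is only that, for object arrays, `np.sin` looks for a `sin` method on each element, and `*` falls back to each element's own multiplication.

```python
import math
import numpy as np

class Angle:
    """Hypothetical wrapper type, used only to illustrate the lookup."""
    def __init__(self, radians):
        self.radians = radians

    def sin(self):
        return math.sin(self.radians)

# Object array whose elements define their own sin() method.
angles = np.empty(2, dtype=object)
angles[0] = Angle(0.0)
angles[1] = Angle(math.pi / 2)
sin_values = np.sin(angles)   # calls Angle.sin() on each element

# Heterogeneous object array: an int and a list side by side.
mixed = np.empty(2, dtype=object)
mixed[0] = 5
mixed[1] = [1, 2]
scaled = mixed * 5            # the int is multiplied, the list is repeated
```

Note that the result of `np.sin(angles)` is still an object array, so nothing is gained in speed; the call is just a loop over Python method calls.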

You'd probably be better off using a fixed-width array with a mask.

>>> np.ma.array([[1, 2, 3], [4, 5, 6]], mask=[[0, 0, 0], [0, 0, 1]])
    masked_array(data =
 [[1 2 3]
 [4 5 --]],
             mask =
 [[False False False]
 [False False  True]],
       fill_value = 999999)

As you can see, you can "simulate" a ragged array this way, and it will behave just as expected.

>>> a = np.ma.array([[1, 2, 3], [4, 5, 6]], mask=[[0, 0, 0], [0, 0, 1]])
>>> np.sin(a)
    masked_array(data =
 [[0.841470984808 0.909297426826 0.14112000806]
 [-0.756802495308 -0.958924274663 --]],
             mask =
 [[False False False]
 [False False  True]],
       fill_value = 1e+20)
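If the data starts out as ragged Python lists, as in the question, one possible way to get to such a masked array is to pad every row to the longest length and mask the padding. The helper `ragged_to_masked` below is an assumption made for this sketch, not a numpy function.

```python
import numpy as np

def ragged_to_masked(rows, fill=0.0):
    """Pad ragged rows to a rectangle and mask the padded slots."""
    width = max(len(r) for r in rows)
    data = np.full((len(rows), width), fill, dtype=float)
    mask = np.ones((len(rows), width), dtype=bool)   # start fully masked
    for i, r in enumerate(rows):
        data[i, :len(r)] = r          # copy the real values
        mask[i, :len(r)] = False      # and unmask exactly those slots
    return np.ma.array(data, mask=mask)

m = ragged_to_masked([[1, 2], [3, 4, 5]])
s = np.sin(m)   # the ufunc is applied elementwise; padding stays masked
```

The padded slots hold the `fill` value internally, but because they are masked they do not show up in results or reductions.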

It's worth mentioning a few ways to create masked arrays. In your case, masked_invalid might be useful.

>>> np.ma.masked_invalid([[1, 2, 3], [4, 5, np.nan]])
masked_array(data =
 [[1.0 2.0 3.0]
 [4.0 5.0 --]],
             mask =
 [[False False False]
 [False False  True]],
       fill_value = 1e+20)
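As a rough sketch of why this is convenient: once the NaN padding is masked, reductions skip it. The variable names here are illustrative.

```python
import numpy as np

# Rows padded with NaN to a common width.
padded = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, np.nan]])

m = np.ma.masked_invalid(padded)

row_means = m.mean(axis=1)   # masked entries are ignored per row
valid = m.compressed()       # 1-D array of the unmasked values only
```

Here the second row's mean is taken over its two valid entries, not three.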

You can also create masked arrays using conditions:

>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.ma.masked_where(x > 5, x)
masked_array(data =
 [[1 2 3]
 [4 5 --]],
             mask =
 [[False False False]
 [False False  True]],
       fill_value = 999999)
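Going the other way is also straightforward. A minimal sketch, assuming you want either ragged Python lists back or a dense array with a placeholder in the masked slots:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
m = np.ma.masked_where(x > 5, x)

# Iterating a 2-D masked array yields 1-D masked rows;
# compressed() drops the masked entries from each one.
rows = [row.compressed().tolist() for row in m]

# filled() returns a plain ndarray with masked slots replaced.
dense = m.filled(0)
```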

For a full list of variations on these techniques, see here.
