Vectorizing operation on numpy array
Question
I have a numpy array containing many three-dimensional numpy arrays, where each of these sub-elements is a grayscale image. I want to use numpy's vectorize to apply an affine transformation to each image in the array.
Here is a minimal example that reproduces the issue:
import cv2
import numpy as np
from functools import partial
# create four blank images
data = np.zeros((4, 1, 96, 96), dtype=np.uint8)
M = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.float32) # dummy affine transformation matrix
size = (96, 96) # output image size
Now I want to pass each of the images in data to cv2.warpAffine(src, M, dsize). Before I vectorize it, I first create a partial function that binds M and dsize:
warpAffine = lambda M, size, img : cv2.warpAffine(img, M, size) # re-order function parameters
partialWarpAffine = partial(warpAffine, M, size)
vectorizedWarpAffine = np.vectorize(partialWarpAffine)
print data[:, 0].shape # prints (4, 96, 96)
vectorizedWarpAffine(data[:, 0])
But this outputs:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1573, in __call__
return self._vectorize_call(func=func, args=vargs)
File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1633, in _vectorize_call
ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1597, in _get_ufunc_and_otypes
outputs = func(*inputs)
File "<stdin>", line 1, in <lambda>
TypeError: src is not a numpy array, neither a scalar
What am I doing wrong - why can't I vectorize an operation on numpy arrays?
Recommended answer
The problem is that just using partial doesn't make the other arguments go away as far as vectorize is concerned. The partial object is stored as vectorizedWarpAffine.pyfunc, which keeps track of whatever pre-bound arguments you'd like it to use when calling vectorizedWarpAffine.pyfunc.func (which is still a multi-argument function).
You can see it like this (after you import inspect):
In [19]: inspect.getargspec(vectorizedWarpAffine.pyfunc.func)
Out[19]: ArgSpec(args=['M', 'size', 'img'], varargs=None, keywords=None, defaults=None)
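The same inspection can be reproduced without OpenCV or NumPy. The sketch below is an illustration, not part of the original session: warp is a hypothetical stand-in with the same parameter order as the lambda, and the bound values are placeholders. It uses inspect.getfullargspec because inspect.getargspec was removed in Python 3.11.

```python
import inspect
from functools import partial


def warp(M, size, img):
    """Hypothetical stand-in with the same parameter order as the lambda."""
    return (img, M, size)


# Pre-bind the first two positional arguments, as in the question.
bound = partial(warp, "M", (96, 96))

# The partial object remembers the wrapped function and the bound values.
print(bound.func is warp)
print(bound.args)

# The underlying function still declares all three parameters.
print(inspect.getfullargspec(bound.func).args)

# Calling the partial supplies only the remaining argument.
print(bound("img"))
```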
To get around this, you can use the excluded option to np.vectorize, which says which arguments (positional or keyword) to ignore when wrapping the vectorization behavior:
vectorizedWarpAffine = np.vectorize(partialWarpAffine,
excluded=set((0, 1)))
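As a minimal sketch of how excluded behaves on its own, independent of OpenCV, the example below excludes the first positional argument so that it is passed through whole while the second argument is iterated element by element. The function polyval and its inputs are hypothetical and chosen only for illustration.

```python
import numpy as np


def polyval(coeffs, x):
    """Evaluate a polynomial (highest power first) at one scalar x."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


# Position 0 (coeffs) is excluded: it reaches polyval unchanged on every call.
# Position 1 (x) is vectorized: polyval sees one element at a time.
vpolyval = np.vectorize(polyval, excluded={0})

values = vpolyval([1, 2, 3], np.array([0, 1, 2]))
print(values)
```

Note that the indices in excluded refer to the positions in the call to the vectorized function, so it is worth double-checking which argument sits at each position once some arguments have been pre-bound.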
When I make this change, the code appears to actually execute the vectorized function now, but it hits an actual error in the imgwarp.cpp code, presumably due to some bad data assumption on this test data:
In [21]: vectorizedWarpAffine(data[:, 0])
OpenCV Error: Assertion failed (cn <= 4 && ssize.area() > 0) in remapBilinear, file -------src-dir-------/opencv-2.4.6.1/modules/imgproc/src/imgwarp.cpp, line 2296
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-21-3fb586393b75> in <module>()
----> 1 vectorizedWarpAffine(data[:, 0])
/home/ely/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in __call__(self, *args, **kwargs)
1570 vargs.extend([kwargs[_n] for _n in names])
1571
-> 1572 return self._vectorize_call(func=func, args=vargs)
1573
1574 def _get_ufunc_and_otypes(self, func, args):
/home/ely/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in _vectorize_call(self, func, args)
1628 """Vectorized call to `func` over positional `args`."""
1629 if not args:
-> 1630 _res = func()
1631 else:
1632 ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
/home/ely/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in func(*vargs)
1565 the_args[_i] = vargs[_n]
1566 kwargs.update(zip(names, vargs[len(inds):]))
-> 1567 return self.pyfunc(*the_args, **kwargs)
1568
1569 vargs = [args[_i] for _i in inds]
/home/ely/programming/np_vect.py in <lambda>(M, size, img)
10 size = (96, 96) # output image size
11
---> 12 warpAffine = lambda M, size, img : cv2.warpAffine(img, M, size) # re-order function parameters
13 partialWarpAffine = partial(warpAffine, M, size)
14
error: -------src-dir-------/opencv-2.4.6.1/modules/imgproc/src/imgwarp.cpp:2296: error: (-215) cn <= 4 && ssize.area() > 0 in function remapBilinear
As a side note: I am seeing a shape of (4, 96, 96) for your data, not (4, 10, 10).
Also note that using np.vectorize is not a technique for improving the performance of a function. All it does is gently wrap your function call inside a superficial for-loop (albeit at the NumPy level). It is a technique for writing functions that automatically adhere to NumPy broadcasting rules and for making your API superficially similar to NumPy's API, whereby function calls are expected to work correctly on top of ndarray arguments.
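To make the for-loop point concrete, here is a small sketch (not from the original answer) that records what np.vectorize hands to the wrapped function. The names record and seen_ndims are hypothetical. otypes is passed so that vectorize does not need an extra probing call to infer the output type.

```python
import numpy as np

seen_ndims = []


def record(x):
    """Remember the dimensionality of each argument, then add one."""
    seen_ndims.append(np.ndim(x))
    return x + 1


vrecord = np.vectorize(record, otypes=[np.int64])
out = vrecord(np.zeros((2, 3), dtype=np.int64))

# One call per element, and every call received a 0-dimensional value.
print(len(seen_ndims), set(seen_ndims), out.shape)
```

Applied to an array such as data[:, 0] from the question, this suggests the wrapped function is invoked with individual pixel values rather than with whole (96, 96) images, which may be worth keeping in mind when debugging.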
Added: The main reason you are using partial in this case is to get a new function that's ostensibly "single-argumented", but that doesn't work out as planned because of the way partial works. So why not just get rid of partial altogether?
You can leave your lambda function exactly as it is, even with the two non-array positional arguments, but still ensure that the third argument is treated as something to vectorize over. To do this, you just use excluded as above, but you also need to tell vectorize what to expect as the output.
The reason for this is that vectorize will try to determine what the output type is supposed to be by running your function on the first element of the data you supply. In this case (I am not fully sure, and it would be worth more debugging) this seems to be what triggers the "src is not a numpy array, neither a scalar" error you were seeing.
So to prevent vectorize from even trying it, you can provide a list of the output types yourself, like this:
vectorizedWarpAffine = np.vectorize(warpAffine,
excluded=(0, 1),
otypes=[np.ndarray])
And it works:
In [29]: vectorizedWarpAffine(M, size, data[:, 0])
Out[29]:
array([[[ array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
...
I think this is a lot nicer because now, when you call vectorizedWarpAffine, you still explicitly pass the other positional arguments, instead of going through the layer of misdirection where they are pre-bound with partial, and yet the third argument is still treated vectorially.
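For readers on NumPy 1.12 or later, one alternative worth trying is the signature parameter of np.vectorize, which asks vectorize to loop over whole sub-arrays instead of scalars. The sketch below is only an illustration under that assumption: mirror is a hypothetical per-image operation standing in for cv2.warpAffine, and the data is a small made-up stack of images.

```python
import numpy as np


def mirror(img):
    """Hypothetical per-image operation standing in for cv2.warpAffine."""
    assert img.ndim == 2  # each call should receive one whole 2-D image
    return img[:, ::-1]


# The core signature says: take one (h, w) array, return one (h, w) array.
per_image = np.vectorize(mirror, signature='(h,w)->(h,w)')

# Four small 3x5 "images" stacked along the first axis.
data = np.arange(4 * 3 * 5, dtype=np.uint8).reshape(4, 3, 5)

vectorized = per_image(data)
looped = np.stack([mirror(img) for img in data])

print(vectorized.shape, np.array_equal(vectorized, looped))
```

The explicit loop with np.stack gives the same result here, which is consistent with the point above that np.vectorize is a convenience rather than a speed-up.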