How to make Numpy treat each row/tensor as a value
Problem description
Many functions, such as in1d and setdiff1d, are designed for 1-d arrays. One workaround for applying them to N-dimensional arrays is to make numpy treat each row (or, more generally, each higher-dimensional sub-array) as a single value.
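A small sketch of why the workaround is needed, using made-up arrays and np.isin (the shape-preserving counterpart of in1d; in1d itself flattens its inputs). Membership is tested one scalar at a time, so a row whose numbers all occur somewhere in B looks "present" even when the row itself is not in B:

```python
import numpy as np

# Illustrative arrays: [4, 1] is not a row of B, but both 4 and 1 occur in B
A = np.array([[1, 4], [4, 1]])
B = np.array([[1, 4]])

# Membership is checked per scalar element, not per row
elementwise = np.isin(A, B)

# Reducing along each row does not recover row membership
naive_rows = elementwise.all(axis=1)
print(naive_rows)  # [ True  True], although only the first row is in B
```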
One approach I found is in Joe Kington's answer to "Get intersecting rows across two 2D numpy arrays".
The following code is taken from that answer. The task Joe Kington faced was to detect the common rows of two arrays, A and B, while trying to use in1d.
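import numpy as np
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
nrows, ncols = A.shape
# One field per column ('f0', 'f1', ...), each with the array's own dtype,
# so that one record spans exactly one row
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}
# Reinterpret both arrays so each row is a single record, then intersect
C = np.intersect1d(A.view(dtype), B.view(dtype))
# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)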
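To see what each step does, the sketch below re-runs the same computation with a few inspection prints; the printed shapes assume the arrays above.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])
ncols = A.shape[1]
dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
         'formats': ncols * [A.dtype]}

# Reinterpret the same bytes: each 2-element row becomes one structured record
Av = A.view(dtype)
print(Av.shape)  # (3, 1): the last axis shrinks from 2 items to 1 record

# intersect1d flattens its inputs, so it sees 3 records per array
common = np.intersect1d(A.view(dtype), B.view(dtype))
print(common.shape)  # (2,)

# Viewing the records as plain integers again and reshaping restores rows
rows = common.view(A.dtype).reshape(-1, ncols)
print(rows.tolist())  # [[1, 4], [3, 6]]
```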
I am hoping you can help me with any of the following three questions. First, I do not understand the mechanism behind this method. Can you try to explain it to me?
Second, are there other ways to make numpy treat a sub-array as one object?
Third, a more open question: does Joe's approach have any drawbacks? That is, can treating rows as values cause problems? Sorry, this question is pretty broad.
Recommended answer
Here is what I have learned. The method Joe used relies on structured arrays, which let users define what a single cell/element contains.
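For the row trick, the "cell" is defined as a record with one field per column. The sketch below (assuming a 64-bit integer array, chosen explicitly so the sizes are predictable) builds the same kind of dtype Joe used and checks that one record occupies exactly as many bytes as one row, which is what lets view reinterpret each row as a single element.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]], dtype=np.int64)
ncols = A.shape[1]

# One field per column, each with the array's own element type
row_dtype = np.dtype({'names': ['f{}'.format(i) for i in range(ncols)],
                      'formats': ncols * [A.dtype]})

print(row_dtype.names)                         # ('f0', 'f1')
print(row_dtype.itemsize, A.itemsize * ncols)  # 16 16
```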
Let's look at the description of the first example in the documentation.
import numpy as np
x = np.array([(1, 2., 'Hello'), (2, 3., "World")],
             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
Here we have created a one-dimensional array of length 2. Each element of this array is a structure that contains three items, a 32-bit integer, a 32-bit float, and a string of length 10 or less.
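A quick way to confirm this description is to inspect the array. The sketch below assumes the default unaligned layout, so the record size is just the sum of the field sizes.

```python
import numpy as np

x = np.array([(1, 2., 'Hello'), (2, 3., "World")],
             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])

print(x.shape)           # (2,): two elements, each a whole record
print(x['foo'])          # [1 2]: one field across all records
print(x.dtype.itemsize)  # 18 = 4 + 4 + 10 bytes (no alignment padding)
```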
Without passing in dtype, however, we would get a 2 by 3 array in which every element is converted to one common type (strings, in this case).
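For comparison, a sketch of the same data without a dtype. NumPy then has to find one common type for all six values, and with a string present that type is a Unicode string:

```python
import numpy as np

y = np.array([(1, 2., 'Hello'), (2, 3., "World")])

print(y.shape)       # (2, 3)
print(y.dtype.kind)  # U: the numbers were converted to strings
print(y[0, 1])       # 2.0 (a string, not a float)
```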
With this method, we can make numpy treat each row of a higher-dimensional array as a single element, provided the dtype is set properly.
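The same view should also work with the other 1-d set routines mentioned in the question. Below is a sketch applying it to setdiff1d (rows of A that are not in B) and to np.isin (row-wise membership). The helper name `as_records` is hypothetical, and it calls np.ascontiguousarray so the view is always taken on C-ordered memory.

```python
import numpy as np

def as_records(a):
    """Hypothetical helper: view each row of a 2-d array as one structured record."""
    a = np.ascontiguousarray(a)  # view() needs the last axis to be contiguous
    dtype = {'names': ['f{}'.format(i) for i in range(a.shape[1])],
             'formats': a.shape[1] * [a.dtype]}
    return a.view(dtype).ravel()  # shape (nrows,)

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

# Rows of A that are not rows of B
diff = np.setdiff1d(as_records(A), as_records(B))
diff_rows = diff.view(A.dtype).reshape(-1, A.shape[1])
print(diff_rows.tolist())  # [[2, 5]]

# Row-wise membership: one boolean per row of A
member = np.isin(as_records(A), as_records(B))
print(member.tolist())  # [True, False, True]
```

Both arrays must share the same element dtype; otherwise the two record types differ and the comparison is not meaningful.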
Another trick Joe showed is that we don't need to actually build a new numpy array to achieve this. We can use the view function (see ndarray.view) to change how numpy interprets the existing data in memory. The Notes section of ndarray.view is worth reading before using this method; I cannot guarantee that it has no side effects. The paragraph below, from that section, calls for caution.
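Because view only reinterprets the existing buffer, the structured view and the original array share memory. A minimal sketch of that, which also shows the flip side: writing through the view modifies the original.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
dtype = {'names': ['f0', 'f1'], 'formats': 2 * [A.dtype]}

Av = A.view(dtype)
print(np.shares_memory(A, Av))  # True: no data was copied

# Writing through a field of the view changes the original array
Av['f0'][0, 0] = 100
print(A[0].tolist())  # [100, 4]
```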
For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
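A sketch of one way to sidestep the layout issue described in this note, assuming it is acceptable to copy the data: pass the array through np.ascontiguousarray before taking the view. A transposed array (whose last axis is not contiguous) serves as the example; the try block only reports what the direct view does on your NumPy version.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
T = A.T  # shape (2, 3); a transposed view whose last axis is not contiguous
dtype = {'names': ['f0', 'f1', 'f2'], 'formats': 3 * [T.dtype]}

try:
    T.view(dtype)
except ValueError as err:
    # Typically refused for this layout; exact behavior depends on NumPy version
    print('direct view failed:', err)

# Copy into C order first; the records then match the rows as printed
records = np.ascontiguousarray(T).view(dtype).ravel()
print(records.tolist())  # [(1, 2, 3), (4, 5, 6)]
```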
Other references
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
https://docs.scipy.org/doc/numpy-1.13.0/reference/generation/numpy.dtype.html