When turning a list of lists of tuples to an array, how can I stop tuples from creating a 3rd dimension?
Question
I have a list of lists (each sublist of the same length) of tuples (each tuple of the same length, 2). Each sublist represents a sentence, and the tuples are bigrams of that sentence.
When using np.asarray to turn this into an array, NumPy seems to interpret the tuples as me asking for a 3rd dimension to be created.
Full working code here:
import numpy as np
from nltk import bigrams
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
bi_grams = []
for sent in arr:
    bi_grams.append(list(bigrams(sent)))
bi_grams = np.asarray(bi_grams)
print(bi_grams)
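For readers without nltk installed, here is a minimal reproduction sketch. It assumes that pairing each token with its successor via zip(sent, sent[1:]) yields the same pairs as nltk.bigrams for these inputs, and it uses plain Python lists in place of the NumPy array arr:

```python
import numpy as np

# Stand-in for nltk.bigrams (assumption): pair each item with its successor.
sentences = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
bi_grams = [list(zip(sent, sent[1:])) for sent in sentences]
print(bi_grams)   # [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]

arr = np.asarray(bi_grams)
print(arr.shape)  # (3, 2, 2): each 2-tuple became a third axis
```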
So before turning bi_grams to an array it looks like this: [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]
Output of the above code:
array([[[1, 2],
        [2, 3]],

       [[4, 5],
        [5, 6]],

       [[7, 8],
        [8, 9]]])
Converting a list of lists to an array in this way is normally fine and creates a 2D array, but it seems that Python interprets the tuples as an added dimension, so the output has shape (3, 2, 2), when in fact I want, and was expecting, a shape of (3, 2).
My desired output is:
array([[(1, 2), (2, 3)],
       [(4, 5), (5, 6)],
       [(7, 8), (8, 9)]])
which is of shape (3, 2).
Why does this happen? How can I achieve the array in the form/shape that I want?
Answer
To np.array, your list of lists of tuples isn't any different from a list of lists of lists. It's iterables all the way down. np.array tries to create as high a dimensional array as possible; in this case that is 3d.
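A quick check of the "iterables all the way down" point, as a sketch: building the array from tuples and from equivalent inner lists gives the same shape and the same values.

```python
import numpy as np

from_tuples = np.array([[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]])
from_lists = np.array([[[1, 2], [2, 3]], [[4, 5], [5, 6]], [[7, 8], [8, 9]]])

# np.array treats the inner tuples and inner lists alike: both become an axis.
print(from_tuples.shape, from_lists.shape)      # (3, 2, 2) (3, 2, 2)
print(np.array_equal(from_tuples, from_lists))  # True
```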
There are ways of sidestepping that and making a 2d array that contains objects, where those objects are things like tuples. But as noted in the comments, why would you want that?
In a recent SO question, I came up with this way of turning an n-d array into an object array of (n-m)-d shape (here alist is the nested list of tuples from the question):
In [267]: res = np.empty((3,2),object)
In [268]: arr = np.array(alist)
In [269]: for ij in np.ndindex(res.shape):
     ...:     res[ij] = arr[ij]
     ...:
In [270]: res
Out[270]:
array([[array([1, 2]), array([2, 3])],
       [array([4, 5]), array([5, 6])],
       [array([7, 8]), array([8, 9])]], dtype=object)
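The session above can be wrapped in a reusable function. This is a sketch only; to_object_array is a hypothetical name, and it assumes 0 < m < arr.ndim so that each outer index selects a single element of the result.

```python
import numpy as np

def to_object_array(arr, m):
    """Hypothetical helper: pack the trailing m axes of arr into sub-array
    objects, returning an object array of shape arr.shape[:-m]."""
    arr = np.asarray(arr)
    if not 0 < m < arr.ndim:
        raise ValueError("m must satisfy 0 < m < arr.ndim")
    outer_shape = arr.shape[:arr.ndim - m]
    res = np.empty(outer_shape, dtype=object)
    for idx in np.ndindex(outer_shape):
        # idx addresses one element of res, so the sub-array is stored as one object.
        res[idx] = arr[idx]
    return res

alist = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]
res = to_object_array(np.array(alist), 1)
print(res.shape, res.dtype)  # (3, 2) object
print(res[2, 1])             # [8 9]
```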
But that's a 2d array of arrays, not of tuples.
In [271]: for ij in np.ndindex(res.shape):
     ...:     res[ij] = tuple(arr[ij].tolist())
     ...:
In [272]: res
Out[272]:
array([[(1, 2), (2, 3)],
       [(4, 5), (5, 6)],
       [(7, 8), (8, 9)]], dtype=object)
That's better (or is it?)
Or I could index the nested list directly:
In [274]: for i,j in np.ndindex(res.shape):
     ...:     res[i,j] = alist[i][j]
     ...:
In [275]: res
Out[275]:
array([[(1, 2), (2, 3)],
       [(4, 5), (5, 6)],
       [(7, 8), (8, 9)]], dtype=object)
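The direct-indexing approach can be packaged the same way. A sketch, with nested_to_tuple_array as a hypothetical name, assuming the nested list is rectangular (every row has the same length). The last print confirms the array holds the original tuple objects rather than copies.

```python
import numpy as np

def nested_to_tuple_array(nested):
    """Hypothetical helper: 2-d object array whose elements are the tuples
    from nested. Assumes nested is rectangular."""
    n_rows, n_cols = len(nested), len(nested[0])
    if any(len(row) != n_cols for row in nested):
        raise ValueError("all rows must have the same length")
    res = np.empty((n_rows, n_cols), dtype=object)
    for i, j in np.ndindex(res.shape):
        res[i, j] = nested[i][j]  # store a reference to the tuple itself
    return res

alist = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]
res = nested_to_tuple_array(alist)
print(res.shape)                 # (3, 2)
print(res[1, 0])                 # (4, 5)
print(res[1, 0] is alist[1][0])  # True
```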
I'm using ndindex to generate all the indices of a (3,2) array.
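To see what ndindex produces, this small sketch lists every index tuple for a (3, 2) shape. The order matches two nested for loops over the rows and then the columns.

```python
import numpy as np

indices = list(np.ndindex((3, 2)))
print(indices)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

# Same order as the equivalent nested loops.
print(indices == [(i, j) for i in range(3) for j in range(2)])  # True
```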
The structured array mentioned in the comments works because for a compound dtype, tuples are distinct from lists.
In [277]: np.array(alist, 'i,i')
Out[277]:
array([[(1, 2), (2, 3)],
       [(4, 5), (5, 6)],
       [(7, 8), (8, 9)]], dtype=[('f0', '<i4'), ('f1', '<i4')])
Technically, though, that isn't an array of tuples. It just represents the elements (or records) of the array as tuples.
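A short sketch of what "represents the records as tuples" means in practice. Each element is a numpy.void record with fields f0 and f1 (the default names shown in Out[277]), and tolist() is what turns the records into real Python tuples.

```python
import numpy as np

alist = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]
rec = np.array(alist, 'i,i')  # two C-int fields, named f0 and f1 by default

print(rec.shape)               # (3, 2)
print(rec['f0'])               # first item of every pair, as a (3, 2) int array
print(type(rec[0, 0]))         # <class 'numpy.void'>: a record, not a tuple
print(rec.tolist() == alist)   # True: tolist() converts records to tuples
```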
In the object dtype array, the elements of the array are pointers to the tuples in the list (at least in the Out[275] case). In the structured array case, the numbers are stored the same way as in a 3d array: as bytes in the array's data buffer.
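The storage difference can be checked directly. In this sketch, np.intc is used for the plain 3-d array on the assumption that it matches the 'i' fields of the structured dtype. The object array holds references to the original tuples, while the structured array's buffer is byte-for-byte the same as the 3-d integer array's.

```python
import numpy as np

alist = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]

# Object array: each element refers to a tuple that lives in alist.
obj = np.empty((3, 2), dtype=object)
for i, j in np.ndindex(obj.shape):
    obj[i, j] = alist[i][j]
print(obj[0, 0] is alist[0][0])          # True

# Structured array vs. plain 3-d array: same C ints, same byte layout.
rec = np.array(alist, 'i,i')
arr3d = np.array(alist, dtype=np.intc)
print(rec.tobytes() == arr3d.tobytes())  # True

# Reinterpreting the record buffer as plain ints recovers the 3-d array.
as_ints = rec.view(np.intc).reshape(3, 2, 2)
print(np.array_equal(as_ints, arr3d))    # True
```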