Load data with rows of different sizes into a NumPy array
Question
Suppose I have a text file that contains data like this:
1 2 3 4 5
6 7 8
9 10 11 12 13 14
15 16 17 18 19
How do I load it into a NumPy array so that it looks like this?
[1 2 3 4 5 0
6 7 8 0 0 0
9 10 11 12 13 14
15 16 17 18 19 0 ]
The method I've been using so far reads the text file line by line, appends each row to a list, finds the longest row, and pads the remaining rows to that length.
Could anyone suggest a more efficient way?
Many thanks!
Recommended answer
Padding a list of lists can be done in various ways, but since you are already reading this from a file, I think itertools.zip_longest is a good place to start.
In [200]: import numpy as np
In [201]: txt = """1 2 3 4 5
...: 6 7 8
...: 9 10 11 12 13 14
...: 15 16 17 18 19"""
Read and parse the lines of text:
In [202]: alist = []
In [203]: for line in txt.splitlines():
...: alist.append([int(i) for i in line.split()])
...:
In [204]: alist
Out[204]: [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]
zip_longest (shown here in its Python 3 form; in Python 2 it is called izip_longest) takes a fillvalue. Note that it yields columns, so the result has to be transposed:
In [205]: from itertools import zip_longest
In [206]: list(zip_longest(*alist, fillvalue=0))
Out[206]:
[(1, 6, 9, 15),
(2, 7, 10, 16),
(3, 8, 11, 17),
(4, 0, 12, 18),
(5, 0, 13, 19),
(0, 0, 14, 0)]
In [207]: np.array(_).T
Out[207]:
array([[ 1, 2, 3, 4, 5, 0],
[ 6, 7, 8, 0, 0, 0],
[ 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 0]])
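The steps above can be collected into one small function. This is only a minimal sketch: the name pad_with_zip_longest is hypothetical and not part of the original answer, and it assumes every row is a plain list of integers.

```python
from itertools import zip_longest

import numpy as np


def pad_with_zip_longest(rows, fillvalue=0):
    """Pad ragged rows into a rectangular array (hypothetical helper name)."""
    # zip_longest groups the i-th item of every row, so it yields columns,
    # substituting fillvalue where a row has run out of items.
    columns = list(zip_longest(*rows, fillvalue=fillvalue))
    # Transpose so that each original row is a row of the array again.
    return np.array(columns).T


text = """1 2 3 4 5
6 7 8
9 10 11 12 13 14
15 16 17 18 19"""
rows = [[int(tok) for tok in line.split()] for line in text.splitlines()]
padded = pad_with_zip_longest(rows)
print(padded)
```

Because zip_longest is lazy, wrapping it in list() before calling np.array is what makes NumPy see a 2-D sequence rather than a single iterator object.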
zip(*) can also be used to 'transpose' the list of lists. Here alist1 appears to be the zip_longest result from Out[206]; the step that defined it is not shown:
In [209]: list(zip(*alist1))
Out[209]:
[(1, 2, 3, 4, 5, 0),
(6, 7, 8, 0, 0, 0),
(9, 10, 11, 12, 13, 14),
(15, 16, 17, 18, 19, 0)]
I'm guessing that what you are doing now is more like this:
In [211]: maxlen = max([len(i) for i in alist])
In [212]: maxlen
Out[212]: 6
In [213]: arr = np.zeros((len(alist), maxlen),int)
In [214]: for row, line in zip(arr, alist):
...: row[:len(line)] = line
...:
In [215]: arr
Out[215]:
array([[ 1, 2, 3, 4, 5, 0],
[ 6, 7, 8, 0, 0, 0],
[ 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 0]])
Which looks pretty good to me.
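For reading straight from a file, the preallocate-and-fill loop above might be wrapped as follows. This is a sketch under stated assumptions: load_ragged is a hypothetical name, the file holds whitespace-separated integers, blank lines are skipped, and at least one non-blank line exists. An io.StringIO object stands in for a real file handle so the example is self-contained.

```python
import io

import numpy as np


def load_ragged(fileobj, fill=0):
    """Read whitespace-separated integers and pad short rows (hypothetical helper)."""
    # Parse each non-blank line into a list of ints.
    rows = [[int(tok) for tok in line.split()] for line in fileobj if line.strip()]
    maxlen = max(len(row) for row in rows)
    # Preallocate the padded array, then copy each row into its leading slots.
    arr = np.full((len(rows), maxlen), fill, dtype=int)
    for out_row, row in zip(arr, rows):
        out_row[:len(row)] = row  # out_row is a view, so this writes into arr
    return arr


# StringIO stands in for open("data.txt"); the file name is an assumption.
fake_file = io.StringIO("1 2 3 4 5\n6 7 8\n9 10 11 12 13 14\n15 16 17 18 19\n")
arr = load_ragged(fake_file)
print(arr)
```

With a real file, the call would be made inside a with open(...) block and the file object passed in the same way.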
A regular poster, Divakar, likes to post a solution that uses cumsum. Let's see if I can reproduce it. It involves constructing a 1d mask that marks where the nonzero values are supposed to go. Working backwards, we need a mask like:
In [240]: mask=arr.ravel()>0
In [241]: mask
Out[241]:
array([ True, True, True, True, True, False, True, True, True,
False, False, False, True, True, True, True, True, True,
True, True, True, True, True, False], dtype=bool)
So:
In [242]: arr.flat[mask] = np.hstack(alist)
There's a trick to this mapping that I haven't quite internalized!
The trick is to broadcast the row lengths against [0,1,2,3,4,5]:
In [276]: lens=[len(i) for i in alist]
In [277]: maxlen=max(lens)
In [278]: mask=np.array(lens)[:,None]>np.arange(maxlen)
In [279]: mask
Out[279]:
array([[ True, True, True, True, True, False],
[ True, True, True, False, False, False],
[ True, True, True, True, True, True],
[ True, True, True, True, True, False]], dtype=bool)
In [280]: arr = np.zeros((len(alist), maxlen),int)
In [281]: arr[mask] = np.hstack(alist)
In [282]: arr
Out[282]:
array([[ 1, 2, 3, 4, 5, 0],
[ 6, 7, 8, 0, 0, 0],
[ 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 0]])
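The broadcasting steps can be sketched as a reusable function. The name pad_with_mask is hypothetical, and integer data is assumed. Unlike the arr.ravel() > 0 mask used earlier, this mask is built from the row lengths alone, so it stays correct when the data itself contains zeros.

```python
import numpy as np


def pad_with_mask(rows, fill=0):
    """Pad ragged integer rows using a broadcast length mask (hypothetical helper)."""
    lens = np.array([len(row) for row in rows])
    # Shape (nrows, 1) compared with shape (maxlen,) broadcasts to (nrows, maxlen):
    # mask[i, j] is True exactly when column j lies inside row i.
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, fill, dtype=int)
    # Boolean indexing fills the True slots in row-major order, which matches
    # the order of the concatenated rows.
    out[mask] = np.concatenate(rows)
    return out


rows = [[1, 0, 3], [4], [5, 6]]  # note the genuine zero in the first row
result = pad_with_mask(rows, fill=-1)
print(result)
```

Using a fill value of -1 here makes it visible which entries are padding and which zero came from the data.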