numpy loadtxt: ValueError: Wrong number of columns
Problem description
Given a file TEST.txt structured as follows:
a 45
b 45 55
c 66
When I try to open it:
import numpy as np
a = np.loadtxt(r'TEST.txt', delimiter='\t', dtype=str)
I get the following error:
ValueError: Wrong number of columns at line 2
It's clearly due to the fact that the second line has three columns instead of two, but I can't find an answer to my problem using the documentation.
Is there any way to fix this while keeping all the data in one array?
In Matlab I can do something like:
a=textscan(fopen('TEST.txt'),'%s%s%s');
Something similar in Python would be appreciated.
Recommended answer
Try np.genfromtxt. It handles missing values; loadtxt does not. Compare their docs.
Missing values can be tricky when the delimiter is whitespace, but with tabs it should be OK. If there are still problems, test it with a comma (,) delimiter.
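To see the difference described above, here is a minimal sketch on a made-up comma-separated sample (not the question's TEST.txt): an empty field in the middle of a row is filled in by genfromtxt (with nan, since the default dtype is float), while loadtxt cannot convert the empty string and raises ValueError.

```python
import io

import numpy as np

# Made-up sample (not TEST.txt): row 2 has an empty middle field ("4,,6").
data = "1,2,3\n4,,6\n"

# genfromtxt treats the empty field as missing and fills it (nan for floats).
filled = np.genfromtxt(io.StringIO(data), delimiter=",")
print(filled)

# loadtxt has no missing-value handling, so converting '' to float fails.
loadtxt_failed = False
try:
    np.loadtxt(io.StringIO(data), delimiter=",")
except ValueError as exc:
    loadtxt_failed = True
    print("loadtxt:", exc)
```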
Oops, you still need the extra delimiter, e.g.:
a, 34,
b, 43, 34
c, 34,
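A quick check of this point, as a sketch assuming NumPy 1.14 or later (for the `encoding` argument). Passing `autostrip=True` to drop the blank after each comma is an added choice, not part of the answer. With every short row ending in an extra comma, genfromtxt fills the empty field with `''`; the same rows without the extra commas are ragged and rejected.

```python
import io

import numpy as np

# Every short row ends with an extra comma, so each row splits into 3 fields.
padded = "a, 34,\nb, 43, 34\nc, 34,\n"
arr = np.genfromtxt(io.StringIO(padded), delimiter=",", dtype=str,
                    autostrip=True,      # added choice: strip blanks around fields
                    encoding="utf-8")    # read fields as str rather than bytes
print(arr)

# Without the extra commas the rows have 2, 3 and 2 fields; genfromtxt
# refuses such ragged input by default (invalid_raise=True).
ragged = "a, 34\nb, 43, 34\nc, 34\n"
ragged_rejected = False
try:
    np.genfromtxt(io.StringIO(ragged), delimiter=",", dtype=str, encoding="utf-8")
except ValueError as exc:
    ragged_rejected = True
    print("ValueError:", exc)
```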
Both loadtxt and genfromtxt accept any iterable that delivers the text line by line. So a simple approach is to call readlines, tweak the lines that have missing values and delimiters, and pass that list of lines to the loader. Or you can write this as a 'filter' or generator. This approach has been described in a number of previous SO questions.
In [36]: txt=b"""a\t45\t\nb\t45\t55\nc\t66\t""".splitlines()
In [37]: txt
Out[37]: [b'a\t45\t', b'b\t45\t55', b'c\t66\t']
In [38]: np.genfromtxt(txt,delimiter='\t',dtype=str)
Out[38]:
array([['a', '45', ''],
['b', '45', '55'],
['c', '66', '']],
dtype='<U2')
I'm using Python3 so the byte strings are marked with a 'b' (for baby and me).
For strings, this is overkill; but genfromtxt makes it easy to construct a structured array with a different dtype for each column. Note that such an array is 1-d, with named fields rather than numbered columns.
In [50]: np.genfromtxt(txt,delimiter='\t',dtype=None)
Out[50]:
array([(b'a', 45, -1), (b'b', 45, 55), (b'c', 66, -1)],
dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])
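The 1-d, named-field layout can be inspected directly. A small sketch, assuming NumPy 1.14 or later so that `encoding="utf-8"` returns the text column as `str` instead of the `b'a'` bytes seen in the session; the input lines are the same tab-padded sample used above.

```python
import numpy as np

# Same tab-delimited lines as in the session; short rows end with a trailing tab.
lines = ["a\t45\t", "b\t45\t55", "c\t66\t"]

# dtype=None asks genfromtxt to infer one dtype per column,
# which produces a 1-D structured array with one record per line.
rec = np.genfromtxt(lines, delimiter="\t", dtype=None, encoding="utf-8")

print(rec.shape)        # a single axis: one record per line
print(rec.dtype.names)  # auto-generated field names
print(rec["f1"])        # a column is selected by field name, not by index
print(rec[1])           # one row is a single record
```

Missing integers in the last field come back as -1, matching the `Out[50]` session, because that is genfromtxt's default fill for an integer column.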
To pad the lines, I could define a function like:
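def foo(astr, delimiter=b',', cnt=3, fill=b' '):
    # split the line into its fields
    c = astr.strip().split(delimiter)
    # append fill values so that at least cnt fields exist
    c.extend([fill] * cnt)
    # keep exactly cnt fields and rejoin them with the delimiter
    return delimiter.join(c[:cnt])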
And use it like this:
In [85]: txt=b"""a\t45\nb\t45\t55\nc\t66""".splitlines()
In [87]: txt1=[foo(t,b'\t',3,b'0') for t in txt]
In [88]: txt1
Out[88]: [b'a\t45\t0', b'b\t45\t55', b'c\t66\t0']
In [89]: np.genfromtxt(txt1,delimiter='\t',dtype=None)
Out[89]:
array([(b'a', 45,  0), (b'b', 45, 55), (b'c', 66,  0)],
dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])
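The same padding can also be written as the 'filter' or generator mentioned earlier. A minimal sketch: `pad_lines` is a hypothetical helper (not part of NumPy) that works on `str` lines, assumes the column count (3 here) is known in advance, and pads with an empty field; `io.StringIO` stands in for `open('TEST.txt')` so the example is self-contained.

```python
import io

import numpy as np


def pad_lines(lines, ncols, delimiter="\t", fill=""):
    """Hypothetical filter: yield each line with exactly ncols fields.

    Short rows are padded with `fill`; fields beyond ncols are dropped.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        fields += [fill] * (ncols - len(fields))  # adds nothing if the row is full
        yield delimiter.join(fields[:ncols])


# Stand-in for open('TEST.txt'): the tab-delimited sample from the question.
source = io.StringIO("a\t45\nb\t45\t55\nc\t66\n")

# genfromtxt consumes the generator line by line; dtype=str keeps every field as text.
arr = np.genfromtxt(pad_lines(source, 3), delimiter="\t", dtype=str, encoding="utf-8")
print(arr)
```

With the real file, the call would be `pad_lines(f, 3)` inside `with open('TEST.txt') as f:`. The padding happens as genfromtxt reads each line, so no padded copy of the file has to be built first.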