Numpy loadtxt: ValueError: Wrong number of columns


Question

I have a tab-delimited file TEST.txt structured as follows:

a   45
b   45  55
c   66

When I try to open it:

import numpy as np
a= np.loadtxt(r'TEST.txt',delimiter='\t',dtype=str)

I get the following error:

ValueError: Wrong number of columns at line 2

This is clearly because the second line has three columns instead of two, but I couldn't find a solution in the documentation.

Is there any way I can fix this while keeping all the data in one array?

In Matlab I can do something like:

a=textscan(fopen('TEST.txt'),'%s%s%s');

Something similar in Python would be appreciated.

Recommended answer

Try np.genfromtxt. It handles missing values; loadtxt does not. Compare their docs.

Missing values can be tricky when the delimiter is whitespace, but with tabs it should be OK. If there are still problems, test it with a comma delimiter.

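Oops, you still need the extra delimiter: a missing field is only recognized when its delimiter is present, so every line must contain the same number of delimiters.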

For example:

a, 34,
b, 43, 34
c, 34,
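A minimal sketch of the point above, assuming comma-delimited text lines held in a list (the names padded and ragged are only for illustration). With the trailing delimiter present, genfromtxt reads the empty field as a missing value; without it, the column counts disagree and it raises ValueError.

```python
import numpy as np

# Every short row carries a trailing comma, so each line has three fields.
padded = ["a,34,", "b,43,34", "c,34,"]
arr = np.genfromtxt(padded, delimiter=",", dtype=str)
print(arr)

# Without the trailing comma the rows have different field counts
# and genfromtxt rejects the input.
ragged = ["a,34", "b,43,34", "c,34"]
try:
    np.genfromtxt(ragged, delimiter=",", dtype=str)
    ragged_ok = True
except ValueError as err:
    ragged_ok = False
    print("ragged input rejected:", err)
```

If the fields are separated by a comma plus spaces, as in the lines above, passing autostrip=True should remove the surrounding whitespace from each value.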

Both loadtxt and genfromtxt accept any iterable that delivers the text line by line. So a simple approach is to readlines, tweak the lines that are missing values and delimiters, and pass that list of lines to the loader. Or you can write this as a 'filter' or generator. This approach has been described in a number of previous SO questions.

In [36]: txt=b"""a\t45\t\nb\t45\t55\nc\t66\t""".splitlines()
In [37]: txt
Out[37]: [b'a\t45\t', b'b\t45\t55', b'c\t66\t']
In [38]: np.genfromtxt(txt,delimiter='\t',dtype=str)
Out[38]: 
array([['a', '45', ''],
       ['b', '45', '55'],
       ['c', '66', '']], 
      dtype='<U2')

I'm using Python 3, so the byte strings are marked with a 'b' (for baby and me).

For strings, this is overkill; but genfromtxt makes it easy to construct a structured array with a different dtype for each column. Note that such an array is 1d, with named fields, not numbered columns.

In [50]: np.genfromtxt(txt,delimiter='\t',dtype=None)
Out[50]: 
array([(b'a', 45, -1), (b'b', 45, 55), (b'c', 66, -1)], 
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])
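To illustrate what "1d, with named fields" means in practice, here is a small sketch that indexes such an array by field name. It assumes text (not byte) lines and passes encoding='utf-8' so the string column comes back as str; the field names f0, f1, f2 are the defaults genfromtxt assigns.

```python
import numpy as np

# Tab-delimited lines, already padded with a trailing tab where a value is missing.
lines = "a\t45\t\nb\t45\t55\nc\t66\t".splitlines()

# dtype=None lets genfromtxt pick a dtype per column.
arr = np.genfromtxt(lines, delimiter="\t", dtype=None, encoding="utf-8")

print(arr.ndim)         # one dimension: one record per input line
print(arr.dtype.names)  # the automatically assigned field names
print(arr["f1"])        # a whole column, selected by field name
print(arr[1])           # a whole record, selected by position
```

Columns are selected with arr['f1'] rather than arr[:, 1], because the array has one record per line, not a second axis.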

To pad the lines I could define a function like:

def foo(astr,delimiter=b',',cnt=3,fill=b' '):
    c = astr.strip().split(delimiter)
    c.extend([fill]*cnt)
    return delimiter.join(c[:cnt])

and use it as:

In [85]: txt=b"""a\t45\nb\t45\t55\nc\t66""".splitlines()

In [87]: txt1=[foo(t,b'\t',3,b'0') for t in txt]
In [88]: txt1
Out[88]: [b'a\t45\t0', b'b\t45\t55', b'c\t66\t0']
In [89]: np.genfromtxt(txt1,delimiter='\t',dtype=None)
Out[89]: 
array([(b'a', 45, 0), (b'b', 45, 55), (b'c', 66, 0)], 
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])
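The 'filter' or generator variant mentioned earlier could look roughly like the sketch below. The helper pad_lines is a hypothetical name, and the column count of 3 is assumed to be known in advance; io.StringIO stands in for an open TEST.txt so the example is self-contained.

```python
import io
import numpy as np

def pad_lines(lines, delimiter="\t", ncols=3, fill=""):
    """Yield each line padded (or truncated) to exactly ncols fields."""
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        fields.extend([fill] * ncols)          # over-pad, then cut to size
        yield delimiter.join(fields[:ncols])

# Stand-in for open('TEST.txt'); any iterable of text lines works.
source = io.StringIO("a\t45\nb\t45\t55\nc\t66\n")

# The generator is consumed lazily, one padded line at a time.
padded = np.genfromtxt(pad_lines(source), delimiter="\t", dtype=str)
print(padded)
```

Compared with building a list via readlines, the generator avoids holding a second copy of the file in memory, which may matter for large inputs.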
