numpy.loadtxt and tab separated values: data type not understood
Question
I'm having problems importing tab-separated values using numpy.loadtxt.
The rows I need to import have the following form:
01-Aug-2013 1143_051-100 r 702 135 32 7
I only want to read columns 0, 2, 3, 4, 5 and 6. This is what I have so far:
numpy.loadtxt(test,dtype= (str,str,int,int,int,int), delimiter= "\t", usecols = (0,2,3,4,5,6))
This returns data type not understood. What am I missing here?
Recommended answer
To achieve fast indexing, NumPy relies on each dtype having a fixed width. So if you specify a string dtype, you also have to specify the number of bytes in the string. So
dtype = '|S11,|S1,<i4,<i4,<i4,<i4'
would work for the data you posted.
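To make the fixed-width point concrete, here is a minimal sketch that inspects the dtype string above and checks that a bare tuple of Python types is rejected. The variable names are illustrative only, and the exact error message may differ between NumPy versions.

```python
import numpy as np

# Each field has a fixed size: an 11-byte string, a 1-byte string, four 32-bit ints.
dt = np.dtype('|S11,|S1,<i4,<i4,<i4,<i4')
field_names = dt.names      # auto-generated field names f0 .. f5
row_bytes = dt.itemsize     # 11 + 1 + 4 * 4 bytes per row

# A bare tuple of Python types is not a valid dtype specification.
try:
    np.dtype((str, str, int, int, int, int))
    tuple_accepted = True
except TypeError:
    tuple_accepted = False

print(field_names, row_bytes, tuple_accepted)
```

Because every field has a known size, each row occupies the same number of bytes, which is what lets NumPy index into the array quickly.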
However, it is easier to use np.genfromtxt instead of np.loadtxt when the strings have variable width, since you can specify dtype=None and let np.genfromtxt make an educated guess about the dtype of each column.
In [15]: np.genfromtxt('data', delimiter='\t', dtype=None, usecols=(0,2,3,4,5,6))
Out[15]:
array(('01-Aug-2013', 'r', 702, 135, 32, 7),
dtype=[('f0', 'S11'), ('f1', 'S1'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4'), ('f5', '<i4')])
or
In [16]: np.loadtxt('data', delimiter='\t', dtype='|S11,|S1,<i4,<i4,<i4,<i4', usecols=(0,2,3,4,5,6))
Out[16]:
array(('01-Aug-2013', 'r', 702, 135, 32, 7),
dtype=[('f0', 'S11'), ('f1', 'S1'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4'), ('f5', '<i4')])
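As a self-contained sketch of the variable-width case, the snippet below feeds two made-up rows to np.genfromtxt through an in-memory file. The second row and the explicit encoding='utf-8' argument are assumptions added for illustration; they are not part of the original question. On Python 3 with a recent NumPy, passing an encoding makes the string columns come back as str rather than bytes.

```python
import io
import numpy as np

# Hypothetical sample: two tab-separated rows whose string columns differ in width.
sample = io.StringIO(
    "01-Aug-2013\t1143_051-100\tr\t702\t135\t32\t7\n"
    "2-Sep-2013\t1143_051-101\tabc\t15\t9\t3\t1\n"
)

# dtype=None asks genfromtxt to infer a dtype per column, sizing the string
# fields to fit the longest value it sees.
data = np.genfromtxt(sample, delimiter='\t', dtype=None, encoding='utf-8',
                     usecols=(0, 2, 3, 4, 5, 6))

dates = data['f0'].tolist()
flags = data['f1'].tolist()
counts = data['f2'].tolist()
print(data.dtype)
```

The fields are still fixed-width in the resulting array; genfromtxt simply picks the widths for you after scanning the data.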