numpy.loadtxt and tab separated values: data type not understood

Question

I'm having problems importing tab-separated values using numpy.loadtxt.

The rows I need to import have the following form:

01-Aug-2013 1143_051-100    r   702 135 32  7   

I only want to read columns 0, 2, 3, 4, 5 and 6. This is what I have so far:

numpy.loadtxt(test,dtype= (str,str,int,int,int,int), delimiter= "\t", usecols = (0,2,3,4,5,6))

This fails with "data type not understood". What am I missing here?

Recommended answer

A plain tuple of Python types such as (str, str, int, int, int, int) is not a valid dtype specification, which is why NumPy reports that the data type is not understood. To achieve fast indexing, NumPy relies on each dtype having a fixed width. So if you specify a string dtype, you also have to specify the number of bytes in the string. So

dtype = '|S11,|S1,<i4,<i4,<i4,<i4'

would work for the data you posted.
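
As a minimal, self-contained sketch of the fixed-width approach, the snippet below reads from an in-memory buffer instead of a file. Several things here are assumptions rather than part of the original answer: the second data row is made up for illustration, the `raw` and `row_dtype` names are arbitrary, and `U11`/`U1` (unicode strings) are swapped in for `S11`/`S1` so that the values come back as `str` rather than `bytes` under Python 3.

```python
import io

import numpy as np

# The first row is the one from the question; the second row is
# invented here purely so the result has more than one record.
raw = (
    "01-Aug-2013\t1143_051-100\tr\t702\t135\t32\t7\n"
    "02-Aug-2013\t1143_051-100\tr\t698\t140\t30\t9\n"
)

# Every field has a fixed width: U11 is an 11-character string,
# U1 a 1-character string, i4 a 32-bit integer.
row_dtype = "U11,U1,i4,i4,i4,i4"

data = np.loadtxt(
    io.StringIO(raw),
    dtype=row_dtype,
    delimiter="\t",
    usecols=(0, 2, 3, 4, 5, 6),
)

# A comma-separated dtype string gets default field names f0, f1, ...
print(data.dtype.names)
print(data["f0"])  # the date strings
print(data["f2"])  # the first integer column
```

Note that a string longer than the declared width would be silently truncated, so the widths have to be chosen for the longest value you expect.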

However, when the strings have variable width it is easier to use np.genfromtxt instead of np.loadtxt, since you can specify dtype=None and let np.genfromtxt make an educated guess about the dtype of each column.

In [15]: np.genfromtxt('data', delimiter='\t', dtype=None, usecols=(0,2,3,4,5,6))
Out[15]: 
array(('01-Aug-2013', 'r', 702, 135, 32, 7), 
      dtype=[('f0', 'S11'), ('f1', 'S1'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4'), ('f5', '<i4')])

In [16]: np.loadtxt('data', delimiter='\t', dtype='|S11,|S1,<i4,<i4,<i4,<i4', usecols=(0,2,3,4,5,6))
Out[16]: 
array(('01-Aug-2013', 'r', 702, 135, 32, 7), 
      dtype=[('f0', 'S11'), ('f1', 'S1'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4'), ('f5', '<i4')])
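
The session above shows byte-string fields ('S11'), as the answer's environment produced them. For a version that can be run without a data file, here is a hedged sketch of the dtype=None approach on an in-memory buffer. The second row is invented for illustration, the `raw` name is arbitrary, and passing `encoding="utf-8"` is an assumption added here so that string columns are inferred as unicode rather than bytes; the exact integer width NumPy picks is platform dependent.

```python
import io

import numpy as np

# The first row is the one from the question; the second row is
# invented here purely so the result has more than one record.
raw = (
    "01-Aug-2013\t1143_051-100\tr\t702\t135\t32\t7\n"
    "02-Aug-2013\t1143_051-100\tr\t698\t140\t30\t9\n"
)

# dtype=None asks genfromtxt to infer a type for each selected column.
data = np.genfromtxt(
    io.StringIO(raw),
    delimiter="\t",
    dtype=None,
    encoding="utf-8",
    usecols=(0, 2, 3, 4, 5, 6),
)

print(data.dtype)   # inferred per-column types
print(data["f0"])   # the date strings
print(data["f2"])   # the first integer column
```

Because the widths are inferred from the data, there is no need to know the longest string in advance.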
