Read values from a text file using numpy loadtxt function


Question

I have a file of this form:

label1, value1, value2, value3,
label2, value1, value2, value3,
...

I want to read it using numpy's loadtxt function so that each label is kept together with its values. The final result should be an array of arrays, where each inner array holds the label and an array of features, like this:

array([[label1, [value1, value2, value3]],
       [label2, [value1, value2, value3]]])

I have tried the following, but it did not work:

c = StringIO(u"text.txt")
np.loadtxt(c,
   dtype={'samples': ('label', 'features'), 'formats': ('s9',np.float)},
   delimiter=',', skiprows=0)

Any ideas?

Answer

You are on the right track with defining the dtype. You are just missing the field shape.
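Applied to the np.loadtxt call from the question, a minimal sketch of the fix could look like this. It rests on a few assumptions beyond the original answer: the file contents (not the filename) are handed to loadtxt, since StringIO(u"text.txt") creates an in-memory file whose content is the literal text text.txt; the dictionary form uses the key 'names' rather than 'samples'; 'f8' replaces np.float, which was removed in NumPy 1.24; and the trailing comma on each line is stripped first, because otherwise every row splits into five columns, one more than the dtype describes. The sample data in raw is made up for illustration.

```python
from io import StringIO

import numpy as np

# Made-up contents in the question's format (note the trailing commas).
raw = """label1, 12, 23.2, 232,
label2, 23, 2324, 324,
"""

# Strip the newline and the trailing comma so each row has exactly 4 columns.
# For a real file the same idea would be:
#   with open("text.txt") as f: lines = [line.rstrip().rstrip(",") for line in f]
lines = [line.rstrip().rstrip(",") for line in StringIO(raw)]

# Dictionary form of the dtype: the keys are 'names' and 'formats'.
# '(3,)f8' is the field shape: 'features' holds 3 float64 values per record.
dt = np.dtype({"names": ["label", "features"], "formats": ["U10", "(3,)f8"]})

data = np.loadtxt(lines, dtype=dt, delimiter=",")
print(data["label"])     # 1-D array of labels
print(data["features"])  # 2-D array with one row of 3 values per label
```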

I'll demonstrate:

A 'text' file, given as a list of lines (bytes in Py3):

In [95]: txt=b"""label1, 12, 23.2, 232
   ....: label2, 23, 2324, 324
   ....: label3, 34, 123, 2141
   ....: label4, 0, 2, 3
   ....: """

In [96]: txt=txt.splitlines()

A dtype with 2 fields, one with strings, the other with floats (the 3 is the 'field shape'):

In [98]: dt=np.dtype([('label','U10'),('values', 'float',(3))])

In [99]: data=np.genfromtxt(txt,delimiter=',',dtype=dt)

In [100]: data
Out[100]: 
array([('label1', [12.0, 23.2, 232.0]), ('label2', [23.0, 2324.0, 324.0]),
       ('label3', [34.0, 123.0, 2141.0]), ('label4', [0.0, 2.0, 3.0])], 
      dtype=[('label', '<U10'), ('values', '<f8', (3,))])

In [101]: data['label']
Out[101]: 
array(['label1', 'label2', 'label3', 'label4'], 
      dtype='<U10')

In [103]: data['values']
Out[103]: 
array([[  1.20000000e+01,   2.32000000e+01,   2.32000000e+02],
       [  2.30000000e+01,   2.32400000e+03,   3.24000000e+02],
       [  3.40000000e+01,   1.23000000e+02,   2.14100000e+03],
       [  0.00000000e+00,   2.00000000e+00,   3.00000000e+00]])

With this definition the numeric values can be accessed as a 2d array. Sub-arrays like this are underappreciated.
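To illustrate the point about 2d access, here is a small sketch (the two-row array is built by hand rather than parsed) showing that data['values'] behaves like any 2-D float array, and how the nested [label, [values]] layout from the question could be recovered if it is really needed:

```python
import numpy as np

dt = np.dtype([("label", "U10"), ("values", "float", (3,))])
data = np.array(
    [("label1", [12.0, 23.2, 232.0]), ("label2", [23.0, 2324.0, 324.0])],
    dtype=dt,
)

values = data["values"]          # a plain (2, 3) float64 array
row_means = values.mean(axis=1)  # ordinary 2-D reductions work directly
col_max = values.max(axis=0)

# Rebuild the question's [label, [v1, v2, v3]] layout as Python lists.
nested = [[str(label), vals.tolist()] for label, vals in zip(data["label"], values)]
```

Since field access returns a view, writing into values would also change data.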

The dtype could have been specified with the dictionary syntax, but I'm more familiar with the list-of-tuples form.

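Roughly equivalent dtype specifications (note that 'f' means float32 rather than float64, 'S10' stores byte strings rather than Unicode text, and the comma-separated string forms name the fields f0 and f1 instead of label and values):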

np.dtype("U10, (3,)f")
np.dtype({'names':['label','values'], 'formats':['S10','(3,)f']})
np.genfromtxt(txt,delimiter=',',dtype='S10,(3,)f')
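One way to check which of these specifications really match is to compare the dtype objects directly. This sketch covers only the Unicode variants; the outputs in the comments follow from NumPy's rules for comma-separated dtype strings:

```python
import numpy as np

dt_tuples = np.dtype([("label", "U10"), ("values", "float", (3,))])
dt_dict = np.dtype({"names": ["label", "values"], "formats": ["U10", "(3,)f8"]})
dt_string = np.dtype("U10, (3,)f")

# Same field names, types and byte layout, so the two dtypes compare equal.
print(dt_tuples == dt_dict)    # True

# The comma-string form auto-names its fields, and 'f' is float32.
print(dt_string.names)         # ('f0', 'f1')
print(dt_string["f1"].base)    # float32
print(dt_string["f1"].shape)   # (3,)
```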

===============================

I think that this txt, if parsed with dtype=None, would produce

In [30]: y
Out[30]: 
array([('label1', 12.0, 23.2, 232.0), ('label2', 23.0, 2324.0, 324.0),
       ('label3', 34.0, 123.0, 2141.0), ('label4', 0.0, 2.0, 3.0)], 
      dtype=[('f0', '<U10'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])

That could be converted to the subfield form with

y.view(dt)

This works as long as the underlying data representation (seen as a flat list of bytes) is compatible: here each record holds 10 Unicode characters (40 bytes) followed by 3 float64 values.
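A self-contained sketch of that conversion, where y is built by hand to mimic the flat dtype=None result shown above rather than parsed from text:

```python
import numpy as np

# Flat layout: one string field followed by three separate float fields.
flat_dt = np.dtype([("f0", "U10"), ("f1", "f8"), ("f2", "f8"), ("f3", "f8")])
y = np.array([("label1", 12.0, 23.2, 232.0), ("label4", 0.0, 2.0, 3.0)], dtype=flat_dt)

# Subfield layout: the same bytes, with the floats grouped into one (3,) field.
dt = np.dtype([("label", "U10"), ("values", "f8", (3,))])

# Both are 40 bytes of text plus 24 bytes of floats, 64 bytes per record.
assert y.dtype.itemsize == dt.itemsize == 64
z = y.view(dt)
print(z["values"])  # 2-D array of shape (2, 3)
```

Because view reinterprets the existing buffer instead of copying, z and y share memory, so a change made through one is visible through the other.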
