NumPy Structured Arrays by Name AND Index

Question

I can never seem to get NumPy arrays to work nicely for me. :(

My dataset is simple: 150 rows of 4 floats followed by one string. I tried the following:

import numpy as np

data = np.genfromtxt("iris.data2", delimiter=",",
                     names=["SL", "SW", "PL", "PW", "class"],
                     dtype=[float, float, float, float, '|S16'])

print(data.shape)      # ---> (150, 0)
print(data["PL"])
print(data[:, 0:3])    # <--- error

So I changed it to just 5 floats by doing a simple file replace. I only did this because I couldn't get the non-homogeneous array to work nicely with both column-name and index access. But now that I have made it homogeneous, it still gives me back a shape of (150, 0) and an error.

data = np.genfromtxt("iris.data", delimiter=",",
                     names=["SL", "SW", "PL", "PW", "class"])

print(data.shape)      # ---> (150, 0)
print(data["PL"])
print(data[:, 0:3])    # <--- error

When I remove the names entirely, index-based column access works, but obviously name-based access no longer does.

data = np.genfromtxt("iris.data", delimiter=",")

print(data.shape)      # ---> (150, 5)
# print(data["PL"])
print(data[:, 0:3])    # ---> WORKS GREAT!!!

Why is this and how do I fix it? Ideally I would like both name and index column access without replacing the string with a float-code, but I will do it if I need to in order to get name and index column access.

Recommended answer

There's a clear distinction between the fields of a 1d structured array and the columns of a 2d array. They aren't interchangeable. Field names aren't simply column labels. If that isn't clear, you may need to read the dtype or structured array docs in more detail.
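To make the distinction concrete, here is a minimal sketch using two made-up rows in the same layout as the question (four numbers and a string). The sample text and the variable names `structured` and `plain` are assumptions for illustration, and `encoding="utf-8"` is passed so that the string column is read as text.

```python
import numpy as np

# Two made-up rows: four numbers followed by a string label.
txt = "1,2,3,4,txt\n5,6,7,8,abc"

# Mixed columns: genfromtxt returns a 1d structured array with one record per row.
structured = np.genfromtxt(txt.splitlines(), delimiter=",",
                           dtype=None, encoding="utf-8")
print(structured.shape)         # a single axis, one entry per row
print(structured.dtype.names)   # the fields live in the dtype, not in a second axis

# A 1d array has no column axis, so 2d-style slicing fails.
try:
    structured[:, 0:3]
except IndexError as exc:
    print("IndexError:", exc)

# Reading only the numeric columns gives an ordinary 2d float array instead.
plain = np.genfromtxt(txt.splitlines(), delimiter=",", usecols=(0, 1, 2, 3))
print(plain.shape)
print(plain[:, 0:3])
```

The structured result has rows but no columns in the array sense, which is why `data[:, 0:3]` raises an error as soon as names or mixed types are involved.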

Define a dummy file:

In [93]: txt=b"""1,2,3,4,txt
   ....: 5,6,7,8,abc"""

In [94]: np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None)
Out[94]: 
array([(1, 2, 3, 4, 'txt'), (5, 6, 7, 8, 'abc')], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', 'S3')])

With mixed columns the default way to load it is a structured array, with 2 rows (shape=(2,)), and 5 fields, indexed as data['f0'] or data[['f0','f2']]. The ability to index several fields at once is limited.
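A short sketch of what single-field and multi-field indexing return. The array is built by hand here rather than loaded from a file, and the `U3` string type is an assumption chosen so the example behaves the same on current Python and NumPy versions. `np.column_stack` is shown as one possible way to turn a few same-typed fields into a real 2d array.

```python
import numpy as np

# Hand-built stand-in for the array loaded above (field names f0..f4).
data = np.array(
    [(1, 2, 3, 4, "txt"), (5, 6, 7, 8, "abc")],
    dtype=[("f0", "i4"), ("f1", "i4"), ("f2", "i4"), ("f3", "i4"), ("f4", "U3")],
)

# One field name returns an ordinary 1d array of that field's type.
one = data["f0"]
print(one.shape, one.dtype)

# A list of field names returns another 1d structured array, not a 2d array.
several = data[["f0", "f2"]]
print(several.shape, several.dtype.names)

# To get real columns, stack the individual fields side by side.
stacked = np.column_stack([data[name] for name in ["f0", "f2"]])
print(stacked.shape)
print(stacked)
```

The multi-field result keeps its field structure, so 2d slicing still does not apply to it; stacking the fields is what produces an array with a column axis.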

But we can define a compound dtype, such as:

In [102]: dt=np.dtype([('data',float,(4,)),('lbl','|S5')])

In [103]: dt
Out[103]: dtype([('data', '<f8', (4,)), ('lbl', 'S5')])

In [104]: np.genfromtxt(txt.splitlines(),delimiter=',',dtype=dt)
Out[104]: 
array([([1.0, 2.0, 3.0, 4.0], 'txt'), ([5.0, 6.0, 7.0, 8.0], 'abc')], 
      dtype=[('data', '<f8', (4,)), ('lbl', 'S5')])

In [105]: data=np.genfromtxt(txt.splitlines(),delimiter=',',dtype=dt)

In [106]: data['data']
Out[106]: 
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.]])

In [107]: data['lbl']
Out[107]: 
array(['txt', 'abc'], 
      dtype='|S5')

In [108]: data[0]
Out[108]: ([1.0, 2.0, 3.0, 4.0], 'txt')

Now data['data'] is a 2d array, containing the numeric values from the original text.
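With that compound dtype, name access and column-index access can be combined: pick the field by name, then slice the resulting 2d array by column. The sketch below builds the array directly with `np.array` instead of `genfromtxt`, and uses a `U5` label field; both are assumptions made to keep the example self-contained.

```python
import numpy as np

# Compound dtype: one 4-wide float field plus a string label.
dt = np.dtype([("data", float, (4,)), ("lbl", "U5")])
data = np.array(
    [([1.0, 2.0, 3.0, 4.0], "txt"), ([5.0, 6.0, 7.0, 8.0], "abc")],
    dtype=dt,
)

values = data["data"]      # 2d float array, one row per record
print(values.shape)
print(values[:, 0:3])      # ordinary column slicing works on this field

# The label field can be used to select rows of the numeric block.
mask = data["lbl"] == "abc"
print(values[mask, 2])     # third numeric column of the rows labelled "abc"
```

The string column stays in the array, so there is no need to replace it with a float code just to get column slicing on the numbers.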

The field names can be fetched as a tuple:

In [112]: data.dtype.names
Out[112]: ('data', 'lbl')

so it is possible to perform the usual list/tuple indexing on them, and even do something as convoluted as viewing the fields in reverse order:

In [115]: data[list(data.dtype.names[::-1])]
Out[115]: 
array([('txt', [1.0, 2.0, 3.0, 4.0]), ('abc', [5.0, 6.0, 7.0, 8.0])], 
      dtype=[('lbl', 'S5'), ('data', '<f8', (4,))])
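The same trick gives a form of positional access to named fields: index into `dtype.names` to find the name, then use the name on the array. The sketch below assumes the five field names from the question and two illustrative iris-style rows. It also uses `numpy.lib.recfunctions.structured_to_unstructured`, which is not part of the original answer and needs NumPy 1.16 or later, to turn the selected float fields into a 2d array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative rows using the field names from the question.
data = np.array(
    [(5.1, 3.5, 1.4, 0.2, "setosa"), (7.0, 3.2, 4.7, 1.4, "versicolor")],
    dtype=[("SL", float), ("SW", float), ("PL", float), ("PW", float),
           ("class", "U16")],
)

names = data.dtype.names

# Field by position: look the name up first, then index with it.
third = data[names[2]]                 # same as data["PL"]
print(third)

# Several fields by position: still a structured array.
first_three = data[list(names[0:3])]
print(first_three.dtype.names)

# Same-typed fields can be converted into an ordinary 2d array.
block = rfn.structured_to_unstructured(first_three)
print(block.shape)
print(block)
```

This keeps one array with named fields while still allowing slices such as "the first three columns" when a plain 2d block is needed.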
