Slicing columns of tuples stored in a NumPy array


Question

I have imported a text file into a numpy array as shown below.

data=np.genfromtxt(f,dtype=None,delimiter=',',names=None)

Here f contains the path to my CSV file.

data now contains the following:

array([(534, 116.48482, 39.89821, '2008-02-03 00:00:49'),
   (650, 116.4978, 39.98097, '2008-02-03 00:00:02'),
   (675, 116.31873, 39.9374, '2008-02-03 00:00:04'),
   (715, 116.70027, 40.16545, '2008-02-03 00:00:45'),
   (2884, 116.67727, 39.88201, '2008-02-03 00:00:48'),
   (3799, 116.29838, 40.04533, '2008-02-03 00:00:37'),
   (4549, 116.48405, 39.91403, '2008-02-03 00:00:42'),
   (4819, 116.42967, 39.93963, '2008-02-03 00:00:43')],
    dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')])

If I now try to slice by column, i.e. extract the first or the second column, using

data[:,0]

it fails with "too many indices". I figured out that this is due to the way the data is stored: every row is stored as a tuple rather than as a list or array. The "ugliest" workaround I could think of for slicing the columns is to convert the tuple in each row to a list and put the result back into a numpy array, something like this:

data=np.asarray([list(i) for i in data])

But with this approach I lose the data type of each column: every element is stored as a string, rather than as the integer or float that was detected automatically in the original array.

Is there a way to slice the columns without having to use iteration?

Recommended answer

What np.genfromtxt has created for you is not an array of tuples, which would have had object dtype, but a structured array (often called a record array). You can tell from the compound dtype:

dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', 'S19')]

Each tuple in that list holds the name of a field and its dtype: <i4 is a little-endian 4-byte integer, <f8 is a little-endian 8-byte float, and S19 is a 19-byte string. You can access the fields by name (x here is the array called data in the question):

In [2]: x['f0']
Out[2]: array([ 534,  650,  675,  715, 2884, 3799, 4549, 4819])

In [3]: x['f1']
Out[3]: 
array([ 116.48482,  116.4978 ,  116.31873,  116.70027,  116.67727,
        116.29838,  116.48405,  116.42967])
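For anyone who wants to try this without the original file, here is a minimal, self-contained sketch. It is not part of the original answer. It assumes a three-row inline sample (copied from the question) read through io.StringIO in place of the path f, and it passes encoding="utf-8", so the last column comes back as a Unicode string field rather than the S19 bytes shown above. The np.column_stack step at the end is one possible way to get an ordinary 2-D float array that accepts [:, 0] indexing. The variable names csv_text and coords are made up for illustration.

```python
import io

import numpy as np

# Inline stand-in for the CSV file; the rows are copied from the question.
csv_text = (
    "534,116.48482,39.89821,2008-02-03 00:00:49\n"
    "650,116.4978,39.98097,2008-02-03 00:00:02\n"
    "675,116.31873,39.9374,2008-02-03 00:00:04\n"
)

# dtype=None lets genfromtxt infer a type per column, which yields a
# structured array with automatically named fields f0, f1, f2, f3.
data = np.genfromtxt(
    io.StringIO(csv_text),
    dtype=None,
    delimiter=",",
    names=None,
    encoding="utf-8",  # assumption: decode text columns as str, not bytes
)

# A "column" is a field, selected by name rather than by position.
ids = data["f0"]
longitudes = data["f1"]

# If positional slicing is needed, stack the numeric fields into a plain
# 2-D float array; each field keeps its values, no string conversion.
coords = np.column_stack([data["f1"], data["f2"]])

print(data.dtype.names)
print(ids)
print(coords[:, 0])
```

Selecting a field by name returns a regular 1-D array with that field's own dtype, which avoids the string conversion the question ran into when rebuilding the array from lists.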
