numpy.genfromtxt produces an array of what looks like tuples, not a 2D array: why?

Question

I'm running genfromtxt as follows:

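import numpy as np

date_conv = lambda x: str(x).replace(":", "/")  # 06:03:2006 -> 06/03/2006
time_conv = lambda x: str(x)                    # keep the time as a string

# radii_indices: list of extra column indices defined elsewhere in my script
a = np.genfromtxt("input.txt", delimiter=',', skip_header=4,
                  usecols=[0, 1] + radii_indices,
                  converters={0: date_conv, 1: time_conv})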

Here, input.txt comes from this gist.

When I look at the result, it is a 1D array, not a 2D array:

>>> np.shape(a)
(918,)

It seems to be an array of tuples instead:

>>> a[0]
('06/03/2006', '08:27:23', 6.4e-05, 0.000336, 0.001168, 0.002716, 0.004274, 0.004658, 0.003756, 0.002697, 0.002257, 0.002566, 0.003522, 0.004471, 0.00492, 0.005602, 0.006956, 0.008442, 0.008784, 0.006976, 0.003917, 0.001494, 0.000379, 6.4e-05)

If I remove the converters specification from the genfromtxt call, it works fine and produces a 2D array:

>>> np.shape(a)
(918, 24)

Solution

What is returned is called a structured ndarray; see, e.g., http://docs.scipy.org/doc/numpy/user/basics.rec.html. This happens because your data is not homogeneous, i.e. not all elements have the same type: the data contains both strings (the first two columns) and floats. A regular NumPy array has to be homogeneous (see here for an explanation).
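A minimal way to see this is to let genfromtxt infer a type per column with dtype=None on a tiny in-memory sample shaped like the question's data. The two rows below are illustrative values, not the real gist, and this is not the asker's exact call (no converters), but it produces the same kind of 1D structured result:

```python
import io

import numpy as np

# Illustrative sample: date and time strings followed by two float columns
sample = (
    "06:03:2006,08:27:23,6.4e-05,0.000336\n"
    "06:03:2006,08:30:00,0.001168,0.002716\n"
)

# dtype=None asks genfromtxt to infer a type for each column separately;
# encoding="utf-8" makes the string columns come back as str rather than bytes
mixed = np.genfromtxt(io.StringIO(sample), delimiter=",", dtype=None, encoding="utf-8")

print(mixed.shape)        # (2,): one record per row, not a 2D grid
print(mixed.dtype.names)  # ('f0', 'f1', 'f2', 'f3')
# f0 gets a unicode string dtype, f2 gets float64
print(mixed.dtype["f0"], mixed.dtype["f2"])
```

Because the columns do not share one type, NumPy cannot lay them out as a single (rows, columns) grid, so each row becomes one compound element instead.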

A structured array 'solves' this homogeneity constraint by storing each record (row) as one tuple-like element. That is why the returned array is 1D: it is a single series of records, but each record consists of several fields, so you can still think of it as rows and columns. Individual columns are accessible by field name as a['nameofcolumn'], e.g. a['Julian_Day']; without the names argument, genfromtxt assigns default field names f0, f1, and so on.
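The "rows and columns" view can be sketched as follows, again on a small illustrative sample rather than the real file: index with an integer to get one record, with a field name to get one column. If you need an ordinary 2D float array of just the numeric columns, numpy.lib.recfunctions.structured_to_unstructured (available in NumPy 1.16 and later) can stack same-typed fields; using it here is a suggestion, not part of the original answer.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Same illustrative two-row sample as a stand-in for the real file
sample = (
    "06:03:2006,08:27:23,6.4e-05,0.000336\n"
    "06:03:2006,08:30:00,0.001168,0.002716\n"
)
rec = np.genfromtxt(io.StringIO(sample), delimiter=",", dtype=None, encoding="utf-8")

row = rec[0]        # a single record (numpy.void); it prints like a tuple
dates = rec["f0"]   # a whole column, selected by its default field name
times = rec["f1"]

# Copy only the float fields into an ordinary (rows, columns) float array
values = rfn.structured_to_unstructured(rec[["f2", "f3"]])
print(values.shape)  # (2, 2)
```

With the real file, the float fields to pass would be the radius columns selected by usecols.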

The reason it returns a 2D array when you remove the converters for the first two columns is that genfromtxt then treats all the data as having the same type and returns a normal ndarray (the default type is float, but you can specify another type with the dtype argument).
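The sketch below, using an illustrative two-row sample in place of the real file, shows what that 2D float array contains: with the default float dtype, the date and time strings cannot be parsed, and genfromtxt's default loose conversion fills those cells with nan instead of raising. The name numeric is just for illustration.

```python
import io

import numpy as np

sample = (
    "06:03:2006,08:27:23,6.4e-05,0.000336\n"
    "06:03:2006,08:30:00,0.001168,0.002716\n"
)

# No dtype and no converters: every column is parsed with the default float dtype
plain = np.genfromtxt(io.StringIO(sample), delimiter=",", encoding="utf-8")
print(plain.shape)  # (2, 4)

# The date/time strings cannot be parsed as floats, so they come back as nan
print(np.isnan(plain[:, :2]).all())  # True

# Passing dtype=float explicitly for only the numeric columns gives a clean 2D array
numeric = np.genfromtxt(io.StringIO(sample), delimiter=",", usecols=[2, 3],
                        dtype=float, encoding="utf-8")
print(numeric.shape)  # (2, 2)
```

So the "working" 2D result is homogeneous only because the string columns were turned into nan.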

EDIT: If you want to use the column names, pass the names argument (and set skip_header to 3, so that the line holding the column names is the first line genfromtxt reads):

a2 = np.genfromtxt("input.txt", delimiter=',', skip_header=3, names=True, dtype=None,
                   usecols=[0, 1] + radii_indices, converters={0: date_conv, 1: time_conv})

Then you can do, e.g.:

>>> a2['Dateddmmyyyy']
array(['06/03/2006', '06/03/2006', '18/03/2006', '19/03/2006',
       '19/03/2006', '19/03/2006', '19/03/2006', '19/03/2006',
       '19/03/2006', '19/03/2006'], 
      dtype='|S10')
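A small sketch of names=True, assuming a hypothetical header row (the gist's actual header text is not reproduced here), also suggests where a field name like Dateddmmyyyy comes from: by default genfromtxt sanitizes field names, deleting characters such as parentheses and slashes (controlled by its deletechars argument).

```python
import io

import numpy as np

# Hypothetical header row; the real file's header text may differ
sample = (
    "Date(dd/mm/yyyy),Time,r1,r2\n"
    "06:03:2006,08:27:23,6.4e-05,0.000336\n"
    "06:03:2006,08:30:00,0.001168,0.002716\n"
)

# names=True takes field names from the first line that is not skipped;
# characters such as "(", ")" and "/" are stripped from those names by default
a2 = np.genfromtxt(io.StringIO(sample), delimiter=",", names=True,
                   dtype=None, encoding="utf-8")

print(a2.dtype.names)  # ('Dateddmmyyyy', 'Time', 'r1', 'r2')
print(a2["Time"])      # ['08:27:23' '08:30:00']
```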