numpy.genfromtxt produces array of what looks like tuples, not a 2D array: why?
I'm running genfromtxt like below:
import numpy as np

# radii_indices is a list of column indices defined elsewhere
date_conv = lambda x: str(x).replace(":", "/")
time_conv = lambda x: str(x)
a = np.genfromtxt("input.txt", delimiter=',', skip_header=4,
                  usecols=[0, 1] + radii_indices, converters={0: date_conv, 1: time_conv})
Where input.txt is from this gist.
When I look at the results, it is a 1D array not a 2D array:
>>> np.shape(a)
(918,)
It seems to be an array of tuples instead:
>>> a[0]
('06/03/2006', '08:27:23', 6.4e-05, 0.000336, 0.001168, 0.002716, 0.004274, 0.004658, 0.003756, 0.002697, 0.002257, 0.002566, 0.003522, 0.004471, 0.00492, 0.005602, 0.006956, 0.008442, 0.008784, 0.006976, 0.003917, 0.001494, 0.000379, 6.4e-05)
If I remove the converters specification from the genfromtxt call, it works fine and produces a 2D array:
>>> np.shape(a)
(918, 24)
What is returned is called a structured ndarray; see e.g. http://docs.scipy.org/doc/numpy/user/basics.rec.html. This happens because your data is not homogeneous, i.e. not all elements have the same type: it contains both strings (the first two columns) and floats. NumPy arrays have to be homogeneous (see here for an explanation).
The structured array 'solves' this homogeneity constraint by using a tuple for each record (row). That is why the returned array is 1D: it is one series of tuples, but each tuple (row) consists of several fields, so you can still regard it as rows and columns. The different columns are accessible as a['nameofcolumn'], e.g. a['Julian_Day'].
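To make the 1D shape concrete, here is a minimal sketch. It assumes a made-up two-row sample in place of input.txt (the real file is not reproduced here) and lets genfromtxt infer per-column types with dtype=None. Because no names are given, the fields get the automatic names f0, f1, f2.

```python
import io
import numpy as np

# Hypothetical stand-in for input.txt: one string column and two float columns.
sample = io.StringIO("06/03/2006,0.5,1.5\n18/03/2006,2.5,3.5\n")

# dtype=None infers a type per column; mixed types produce a structured array.
a = np.genfromtxt(sample, delimiter=',', dtype=None, encoding='utf-8')

print(a.shape)        # one entry per row, so the array is 1D
print(a.dtype.names)  # automatically generated field names
print(a['f1'])        # a single column, returned as a plain float array
```

Each element of a is one record, and indexing by field name pulls a whole column out of all records.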
The reason that it returns a 2D array when you remove the converters for the first two columns is that, in that case, genfromtxt treats all data as being of the same type, so a normal ndarray is returned (the default type is float, but you can specify this with the dtype argument).
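The homogeneous case can be sketched the same way. This example again assumes a made-up two-row sample rather than the real input.txt, and it uses usecols to read only the numeric columns, so every value is a float and an ordinary 2D array comes back.

```python
import io
import numpy as np

# Hypothetical stand-in for input.txt: one string column and two float columns.
sample = io.StringIO("06/03/2006,0.5,1.5\n18/03/2006,2.5,3.5\n")

# Only the float columns are read, with the default dtype (float).
b = np.genfromtxt(sample, delimiter=',', usecols=[1, 2])

print(b.shape)  # rows x columns
print(b.dtype)  # a single dtype shared by all elements
```

Reading the string columns and the numeric columns in two separate calls is one way to keep a plain 2D array for the numbers, if a structured array is not wanted.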
EDIT: If you want to make use of the column names, you can use the names argument (and set skip_header to only three):
a2 = np.genfromtxt("input.txt", delimiter=',', skip_header=3, names = True, dtype = None,
usecols=[0, 1] + radii_indices, converters={0: date_conv, 1: time_conv})
Then you can do, e.g.:
>>> a2['Dateddmmyyyy']
array(['06/03/2006', '06/03/2006', '18/03/2006', '19/03/2006',
'19/03/2006', '19/03/2006', '19/03/2006', '19/03/2006',
'19/03/2006', '19/03/2006'],
dtype='|S10')
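As a rough, self-contained illustration of names=True, the sketch below uses a hypothetical header line and two invented rows, not the actual contents of input.txt. genfromtxt takes the field names from the first line it reads and strips characters that are not valid in a field name, which would explain a name like Dateddmmyyyy.

```python
import io
import numpy as np

# Hypothetical header and data; the real input.txt has more columns and header lines.
sample = io.StringIO(
    "Date(dd:mm:yyyy),Julian_Day\n"
    "06/03/2006,2453800.5\n"
    "18/03/2006,2453812.5\n"
)

# names=True reads the field names from the first line; dtype=None infers column types.
a2 = np.genfromtxt(sample, delimiter=',', names=True, dtype=None, encoding='utf-8')

print(a2.dtype.names)    # sanitized field names taken from the header
print(a2['Julian_Day'])  # column access by name
```

Printing a2.dtype.names first is a quick way to see which names genfromtxt actually produced before indexing by them.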