Numpy's genfromtxt returns different structured data depending on dtype parameters
Question
I have the following:
from numpy import genfromtxt
seg_data1 = genfromtxt('./datasets/segmentation.all', delimiter=',', dtype="|S5")
seg_data2 = genfromtxt('./datasets/segmentation.all', delimiter=',', dtype=["|S5"] + ["float" for n in range(19)])
print seg_data1
print seg_data2
print seg_data1[:,0:1]
print seg_data2[:,0:1]
It turns out that seg_data1 and seg_data2 are not the same kind of structure. Here's what printed:
[['BRICK' '140.0' '125.0' ..., '7.777' '0.545' '-1.12']
['BRICK' '188.0' '133.0' ..., '8.444' '0.538' '-0.92']
['BRICK' '105.0' '139.0' ..., '7.555' '0.532' '-0.96']
...,
['CEMEN' '128.0' '161.0' ..., '10.88' '0.540' '-1.99']
['CEMEN' '150.0' '158.0' ..., '12.22' '0.503' '-1.94']
['CEMEN' '124.0' '162.0' ..., '14.55' '0.479' '-2.02']]
[ ('BRICK', 140.0, 125.0, 9.0, 0.0, 0.0, 0.2777779, 0.06296301, 0.66666675, 0.31111118, 6.185185, 7.3333335, 7.6666665, 3.5555556, 3.4444444, 4.4444447, -7.888889, 7.7777777, 0.5456349, -1.1218182)
('BRICK', 188.0, 133.0, 9.0, 0.0, 0.0, 0.33333334, 0.26666674, 0.5, 0.077777736, 6.6666665, 8.333334, 7.7777777, 3.8888888, 5.0, 3.3333333, -8.333333, 8.444445, 0.53858024, -0.92481726)
('BRICK', 105.0, 139.0, 9.0, 0.0, 0.0, 0.27777782, 0.107407436, 0.83333325, 0.52222216, 6.111111, 7.5555553, 7.2222223, 3.5555556, 4.3333335, 3.3333333, -7.6666665, 7.5555553, 0.5326279, -0.96594584)
...,
('CEMEN', 128.0, 161.0, 9.0, 0.0, 0.0, 0.55555534, 0.25185192, 0.77777785, 0.16296278, 7.148148, 5.5555553, 10.888889, 5.0, -4.7777777, 11.222222, -6.4444447, 10.888889, 0.5409177, -1.9963073)
('CEMEN', 150.0, 158.0, 9.0, 0.0, 0.0, 2.166667, 1.6333338, 1.388889, 0.41851807, 8.444445, 7.0, 12.222222, 6.111111, -4.3333335, 11.333333, -7.0, 12.222222, 0.50308645, -1.9434487)
('CEMEN', 124.0, 162.0, 9.0, 0.11111111, 0.0, 1.3888888, 1.1296295, 2.0, 0.8888891, 10.037037, 8.0, 14.555555, 7.5555553, -6.111111, 13.555555, -7.4444447, 14.555555, 0.4799313, -2.0293121)]
[['BRICK']
['BRICK']
['BRICK']
...,
['CEMEN']
['CEMEN']
['CEMEN']]
Traceback (most recent call last):
File "segmentationdata.py", line 14, in <module>
print seg_data2[:,0:1]
IndexError: too many indices for array
I'd rather have genfromtxt return data in the form of seg_data1, though I don't know of any built-in way to force seg_data2 to conform to that type. As far as I know there's no easy way to do:
seg_target1 = seg_data1[:,0:1]
seg_data1 = seg_data1[:,1:]
for seg_data2. Now I could do data.astype(float), but the point is, isn't that what genfromtxt should have done to begin with when I gave it that dtype array?
Recommended answer
With dtype="|S5" you import all columns as strings (5 characters each). The result is a 2d array with rows like
['BRICK' '140.0' '125.0' ..., '7.777' '0.545' '-1.12']
With dtype=["|S5"] + ["float" for n in range(19)] you specify a dtype for each column, and the result is a structured array. It is 1d with 20 fields. You access the fields by name (look at seg_data2.dtype), not by column number.
An element, or record, of this array is displayed as a tuple, and includes a string and 19 floats:
('BRICK', 140.0, 125.0, 9.0, 0.0, 0.0, 0.2777779, 0.06296301, 0.66666675, 0.31111118, 6.185185, 7.3333335, 7.6666665, 3.5555556, 3.4444444, 4.4444447, -7.888889, 7.7777777, 0.5456349, -1.1218182)
print(seg_data2['f0'])  # the initial character column
Specifying dtype=None should produce the same thing, possibly with some integer columns instead of all floats.
It is also possible to specify a dtype with 2 fields, one the string column, and the other the 19 floats. I'd have to check the docs and run a few test cases to be sure of the format.
I think you read enough of the genfromtxt docs to see that you could specify a compound dtype, but not enough to understand the results.
=================
Example of importing csv with text and numbers:
In [139]: txt=b"""one 1 2 3
...: two 4 5 6
...: """
Default: all floats
In [140]: np.genfromtxt(txt.splitlines())
Out[140]:
array([[ nan, 1., 2., 3.],
[ nan, 4., 5., 6.]])
Automatic dtype selection, 4 fields:
In [141]: np.genfromtxt(txt.splitlines(),dtype=None)
Out[141]:
array([(b'one', 1, 2, 3), (b'two', 4, 5, 6)],
dtype=[('f0', 'S3'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])
User specified field dtypes:
In [142]: np.genfromtxt(txt.splitlines(),dtype='str,int,float,int')
Out[142]:
array([('', 1, 2.0, 3), ('', 4, 5.0, 6)],
dtype=[('f0', '<U'), ('f1', '<i4'), ('f2', '<f8'), ('f3', '<i4')])
Compound dtype, with column count for the numeric field (and correction to string column):
In [145]: np.genfromtxt(txt.splitlines(),dtype='S5,(3)int')
Out[145]:
array([(b'one', [1, 2, 3]), (b'two', [4, 5, 6])],
dtype=[('f0', 'S5'), ('f1', '<i4', (3,))])
In [146]: _['f0']
Out[146]:
array([b'one', b'two'],
dtype='|S5')
In [149]: _['f1']
Out[149]:
array([[1, 2, 3],
[4, 5, 6]])
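If you need to do math across the numeric fields, this last case (or something more elaborate) might be most convenient.
To generate something more complicated it may be best to develop the dtype in a separate expression (dtype syntax can be tricky):
In [172]: dt=np.dtype([('f0','|S5'),('f1',[('f10',int),('f11',float,(2))])])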
In [173]: np.genfromtxt(txt.splitlines(),dtype=dt)
Out[173]:
array([(b'one', (1, [2.0, 3.0])), (b'two', (4, [5.0, 6.0]))],
dtype=[('f0', 'S5'), ('f1', [('f10', '<i4'), ('f11', '<f8', (2,))])])
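Applied to the layout in the question (one label column followed by numeric columns), the compound dtype idea might look like the sketch below. The sample bytes are a hypothetical stand-in for segmentation.all, with 3 numeric columns instead of 19, and the variable names are made up for illustration. The second half assumes NumPy 1.16 or later, where numpy.lib.recfunctions.structured_to_unstructured is available, and shows one way to turn the numeric fields of a flat structured array into a 2d float array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for the real file: a label plus 3 numeric columns.
sample = b"BRICK,140.0,125.0,9.0\nCEMENT,128.0,161.0,9.0\n"

# Compound dtype: a 5-byte string field and one sub-array field of 3 floats.
data = np.genfromtxt(sample.splitlines(), delimiter=",", dtype="S5,(3)float")

labels = data["f0"]      # 1d array of byte strings, truncated to 5 bytes
features = data["f1"]    # ordinary 2d float array, shape (rows, 3)
row_means = features.mean(axis=1)   # math across the numeric columns

# Alternative: one field per column, then convert the numeric fields to 2d.
flat = np.genfromtxt(sample.splitlines(), delimiter=",", dtype="S5,f8,f8,f8")
numeric_names = list(flat.dtype.names[1:])   # every field except the label
as_2d = rfn.structured_to_unstructured(flat[numeric_names])
```

Here features and as_2d hold the same numbers; the compound dtype simply avoids the conversion step. For the file in the question the sub-array count would be 19 rather than 3.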