Python genfromtxt multiple datatypes
Question
I would like to read in a csv file using genfromtxt. I have six columns that are float, and one column that is a string.
How do I set the datatype so that the float columns will be read in as floats and the string column will be read in as strings? I tried dtype='void' but that is not working.
Any suggestions? Thanks.
The .csv file:
999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
x = sys.argv[1]
f = open(x, 'r')
y = np.genfromtxt(f, delimiter = ',', dtype=[('f0', '<f8'), ('f1', 'S4'), (\
'f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8')])
ionenergy = y[:,0]
units = y[:,1]
The error:
ionenergy = y[:,0]
IndexError: invalid index
I don't get this error when I specify a single data type.
Recommended answer
dtype=None tells genfromtxt to guess the appropriate dtype.
From the docs:
dtype: dtype, optional
Data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.
(My emphasis.)
Since your data is comma-separated, be sure to include delimiter=',' or else np.genfromtxt will interpret each column (except the last) as including a string character (the comma) and therefore mistakenly assign a string dtype to each of those columns.
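The effect of omitting the delimiter can be checked with a small sketch. It assumes the three sample rows from the question, held in an in-memory io.StringIO buffer rather than a file, and passes encoding="utf-8" (not used in the original answer) so that string columns come back as text on any recent NumPy version.

```python
import io
import numpy as np

# Sample rows from the question, kept in memory instead of on disk.
text = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# No delimiter: fields are split on whitespace, so the commas stay attached.
no_delim = np.genfromtxt(io.StringIO(text), dtype=None, encoding="utf-8")

# With the delimiter: every field is parsed on its own.
with_delim = np.genfromtxt(
    io.StringIO(text), dtype=None, delimiter=",", encoding="utf-8"
)

# One-letter dtype kind per column: 'U' text, 'i' integer, 'f' float.
kinds_no_delim = [no_delim.dtype[n].kind for n in no_delim.dtype.names]
kinds_with_delim = [with_delim.dtype[n].kind for n in with_delim.dtype.names]
print(kinds_no_delim)
print(kinds_with_delim)
```

Without the delimiter, every column except the last should come back as text; with it, only the second column should.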
For example:
import numpy as np
arr = np.genfromtxt('data', dtype=None, delimiter=',')
print(arr.dtype)
# [('f0', '<f8'), ('f1', 'S4'), ('f2', '<i4'), ('f3', '<f8'), ('f4', '<f8')]
This shows the names and dtypes of each column. For example, ('f2', '<i4') means the third column has name 'f2' and is of dtype '<i4'. The i means it is an integer dtype. If you need the third column to be a float dtype then there are a few options.
- You could manually edit the data by adding a decimal point in the third column to force genfromtxt to interpret values in that column to be of a float dtype.
- You could supply the dtype explicitly in the call to genfromtxt:
arr = np.genfromtxt(
    'data', delimiter=',',
    dtype=[('f0', '<f8'), ('f1', 'S4'), ('f2', '<f4'), ('f3', '<f8'), ('f4', '<f8')])
# The third column is now read as float (exact print formatting varies by NumPy version).
print(arr)
# [(999.9, ' abc', 34.0, 78.0, 12.3) (1.3, ' ghf', 12.0, 8.4, 23.7)
#  (101.7, ' evf', 89.0, 2.4, 11.3)]
print(arr['f2'])
# [ 34.  12.  89.]
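The difference between the guessed and the explicit dtype for the third column can be confirmed with a self-contained sketch. It assumes the sample rows from the question in an io.StringIO buffer, and it swaps 'S4' for 'U4' together with encoding="utf-8" so the string column holds text rather than bytes; both choices are assumptions made for illustration, not part of the original answer.

```python
import io
import numpy as np

# Sample rows from the question, kept in memory instead of on disk.
text = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# Let genfromtxt guess: the third column holds only whole numbers.
guessed = np.genfromtxt(
    io.StringIO(text), delimiter=",", dtype=None, encoding="utf-8"
)

# Spell the dtype out: the third column is declared as float.
explicit = np.genfromtxt(
    io.StringIO(text), delimiter=",", encoding="utf-8",
    dtype=[("f0", "<f8"), ("f1", "U4"), ("f2", "<f8"), ("f3", "<f8"), ("f4", "<f8")],
)

print(guessed.dtype["f2"].kind)   # kind of the guessed third column
print(explicit.dtype["f2"].kind)  # kind of the declared third column
print(explicit["f2"])
```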
The error message IndexError: invalid index is being generated by the line
ionenergy = y[:,0]
When you have mixed dtypes, np.genfromtxt returns a structured array. A structured array built this way is one-dimensional (one record per row), so a two-dimensional index such as y[:, 0] is invalid. You need to read up on structured arrays because the syntax for accessing columns differs from the syntax used for plain arrays of homogeneous dtype.
Instead of y[:, 0], to access the first column of the structured array y, use
y['f0']
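A short sketch of how indexing behaves on the structured array, assuming the sample rows from the question in an io.StringIO buffer and encoding="utf-8" (an addition for portability across NumPy versions):

```python
import io
import numpy as np

# Sample rows from the question, kept in memory instead of on disk.
text = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

y = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=None, encoding="utf-8")

print(y.ndim, y.shape)  # 1 (3,)  -> one dimension, one record per row
print(y["f0"])          # a whole column, selected by field name
print(y[0])             # a whole row (record), selected by position

# Two indices on a one-dimensional array raise IndexError.
caught = None
try:
    y[:, 0]
except IndexError as exc:
    caught = exc
print("IndexError:", caught)
```

Field names select columns and integer positions select rows; the two can be combined as y["f0"][0] or y[0]["f0"].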
Or, better yet, supply the names parameter in np.genfromtxt, so you can use a more relevant column name, like y['ionenergy']:
import numpy as np
arr = np.genfromtxt(
'data', delimiter=',', dtype=None,
names=['ionenergy', 'foo', 'bar', 'baz', 'quux', 'corge'])
print(arr['ionenergy'])
# [ 999.9 1.3 101.7]
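Applied to the two variables from the question, this might look like the sketch below. The names 'ionenergy' and 'units' follow the question's variable names; 'col2', 'col3' and 'col4' are hypothetical placeholders. The autostrip=True argument, which the original answer does not use, removes the space after each comma so the strings come back as 'abc' rather than ' abc'.

```python
import io
import numpy as np

# Sample rows from the question, kept in memory instead of on disk.
text = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

arr = np.genfromtxt(
    io.StringIO(text), delimiter=",", dtype=None, encoding="utf-8",
    # 'col2'..'col4' are placeholder names, not taken from the question.
    names=["ionenergy", "units", "col2", "col3", "col4"],
    autostrip=True,  # strip the whitespace around each field
)

# Columns are selected by name instead of y[:, 0] and y[:, 1].
ionenergy = arr["ionenergy"]
units = arr["units"]
print(ionenergy)
print(units)
```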