Python genfromtxt multiple datatypes

Question

I would like to read in a csv file using genfromtxt. I have six columns that are float, and one column that is a string.

How do I set the datatype so that the float columns will be read in as floats and the string column will be read in as strings? I tried dtype='void' but that is not working.

Suggestions?

Thanks

.csv file

999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3



import sys

import numpy as np

x = sys.argv[1]
f = open(x, 'r')
y = np.genfromtxt(f, delimiter=',', dtype=[
    ('f0', '<f8'), ('f1', 'S4'), ('f2', '<f8'), ('f3', '<f8'),
    ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8')])

ionenergy = y[:,0]  # this line raises the IndexError shown below
units = y[:,1]

Error:

ionenergy = y[:,0]
IndexError: invalid index

I don't get this error when I specify a single data type.
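A minimal sketch of why the single-dtype case behaves differently. It assumes the sample rows above are held in an in-memory string (io.StringIO stands in for the file; DATA, single and numeric are illustrative names). With one dtype, genfromtxt returns an ordinary 2-D array, so y[:, 0] is valid. Text that cannot be parsed as a float is filled with nan. The usecols variant skips the text column entirely.

```python
import io

import numpy as np

# The sample rows from the question, kept in memory (illustrative stand-in for the file).
DATA = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# A single dtype gives a plain 2-D array; the text column becomes nan.
single = np.genfromtxt(io.StringIO(DATA), delimiter=",", dtype=float)
print(single.shape)   # (3, 5)
print(single[:, 1])   # [nan nan nan]

# Skipping the text column with usecols leaves a clean 2-D float array,
# so positional indexing such as [:, 0] works.
numeric = np.genfromtxt(io.StringIO(DATA), delimiter=",", dtype=float,
                        usecols=(0, 2, 3, 4))
ionenergy = numeric[:, 0]
print(ionenergy)      # [999.9   1.3 101.7]
```

The trade-off is that the string column is lost. Keeping it requires the mixed-dtype (structured) approach discussed in the answer.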

Answer

dtype=None tells genfromtxt to guess the appropriate dtype.

From the docs:

dtype: dtype, optional

Data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.

(Emphasis mine.)

Since your data is comma-separated, be sure to include delimiter=','. Otherwise np.genfromtxt will interpret each column (except the last) as including a string character (the comma), and it will therefore mistakenly assign a string dtype to each of those columns.

For example:

import numpy as np

arr = np.genfromtxt('data', dtype=None, delimiter=',')

print(arr.dtype)
# [('f0', '<f8'), ('f1', 'S4'), ('f2', '<i4'), ('f3', '<f8'), ('f4', '<f8')]

This shows the name and dtype of each column. For example, ('f2', '<i4') means the third column has the name 'f2' and is of dtype '<i4'. The i means it is an integer dtype. If you need the third column to be a float dtype, there are a few options:

  1. You could manually edit the data, adding a decimal point to the values in the third column, to force genfromtxt to interpret that column as a float dtype.
  2. You could supply the dtype explicitly in the call to genfromtxt:

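import numpy as np

# Explicit per-column types; 'f2' (the third column) is read as float32.
arr = np.genfromtxt(
    'data', delimiter=',',
    dtype=[('f0', '<f8'), ('f1', 'S4'), ('f2', '<f4'), ('f3', '<f8'), ('f4', '<f8')])

print(arr)
# (string and float formatting differ between NumPy versions)
# [(999.9, ' abc', 34.0, 78.0, 12.3) (1.3, ' ghf', 12.0, 8.4, 23.7)
#  (101.7, ' evf', 89.0, 2.4, 11.3)]

print(arr['f2'])
# [34. 12. 89.]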
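The same explicit types can also be written as one comma-separated dtype string, a NumPy shorthand that the genfromtxt documentation itself uses together with names. The following is a minimal sketch under these assumptions:

- The sample rows are held in an in-memory string.
- The field names c2 to c4 are hypothetical placeholders.
- encoding="utf-8" is added so that the text column comes back as str ('U4') rather than bytes.

```python
import io

import numpy as np

# Sample rows from the question (illustrative in-memory stand-in for the file).
DATA = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# One dtype string: float64, 4-character text, then three float64 columns.
# names= replaces the default f0..f4 (c2..c4 are placeholder names).
arr = np.genfromtxt(
    io.StringIO(DATA),
    delimiter=",",
    dtype="f8,U4,f8,f8,f8",
    names=["ionenergy", "units", "c2", "c3", "c4"],
    encoding="utf-8",
)
print(arr.dtype.names)  # ('ionenergy', 'units', 'c2', 'c3', 'c4')
print(arr["c2"])        # [34. 12. 89.]
```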


This is the line that generates the error IndexError: invalid index:

ionenergy = y[:,0]

When you have mixed dtypes, np.genfromtxt returns a structured array. You need to read up on structured arrays, because the syntax for accessing columns differs from the syntax used for plain arrays of homogeneous dtype.
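A minimal sketch of that difference, assuming the sample rows are held in an in-memory string (DATA and y are illustrative names; encoding="utf-8" just makes the text column str). A structured array has one dimension with one record per row. Columns are fields selected by name, so a second positional index has nothing to refer to.

```python
import io

import numpy as np

# Sample rows from the question (illustrative in-memory stand-in for the file).
DATA = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

y = np.genfromtxt(io.StringIO(DATA), delimiter=",", dtype=None, encoding="utf-8")

# Mixed dtypes: a 1-D array of records, not a 2-D table.
print(y.ndim, y.shape)  # 1 (3,)

# Select a column by field name.
print(y["f0"])          # [999.9   1.3 101.7]

# A 2-D style index fails because there is no second axis.
try:
    y[:, 0]
except IndexError as exc:
    print("IndexError:", exc)
```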

To access the first column of the structured array y, use the field name instead of y[:, 0]:

y['f0']

Or, better yet, supply the names parameter in np.genfromtxt, so you can use a more relevant column name, like y['ionenergy']:

import numpy as np

# One name per column (the sample data has five columns).
arr = np.genfromtxt(
    'data', delimiter=',', dtype=None,
    names=['ionenergy', 'foo', 'bar', 'baz', 'quux'])

print(arr['ionenergy'])
# [999.9   1.3 101.7]
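The earlier output also shows that the text column keeps the space that follows each comma (' abc'). If you want that space removed, genfromtxt's autostrip parameter strips surrounding whitespace from every field. The following is a minimal sketch under these assumptions:

- The sample rows are held in an in-memory string.
- The names after 'units' are hypothetical placeholders.
- encoding="utf-8" makes the text column str.

```python
import io

import numpy as np

# Sample rows from the question (illustrative in-memory stand-in for the file).
DATA = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# autostrip=True trims each field, so the text column is "abc", not " abc".
arr = np.genfromtxt(
    io.StringIO(DATA),
    delimiter=",",
    dtype=None,
    names=["ionenergy", "units", "c2", "c3", "c4"],  # c2..c4 are placeholders
    autostrip=True,
    encoding="utf-8",
)
print(arr["units"])      # ['abc' 'ghf' 'evf']
print(arr["ionenergy"])  # [999.9   1.3 101.7]
```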
