numpy genfromtxt issues in Python3

Question

I'm trying to use genfromtxt with Python3 to read a simple CSV file containing strings and numbers. For example, something like the following (hereinafter "test.csv"):

1,a
2,b
3,c

With Python2, the following works well:

import numpy
data=numpy.genfromtxt("test.csv", delimiter=",", dtype=None)
# Now data is something like [(1, 'a') (2, 'b') (3, 'c')]

In Python3 the same code returns [(1, b'a') (2, b'b') (3, b'c')]. This is somewhat expected, given the different way Python3 reads files. I therefore use a converter to decode the strings:

decodef = lambda x: x.decode("utf-8")
data=numpy.genfromtxt("test.csv", delimiter=",", dtype="f8,S8", converters={1: decodef})

This works with Python2, but not with Python3 (same [(1, b'a') (2, b'b') (3, b'c')] output). However, if in Python3 I use the code above to read only one column:

data=numpy.genfromtxt("test.csv", delimiter=",", usecols=(1,), dtype="S8", converters={1: decodef})

the output strings are ['a' 'b' 'c'], decoded as expected.

I've also tried passing a file object opened in 'rb' mode, as suggested at this link, but that made no difference.

Why does the converter work when only one column is read, but not when two columns are read? Could you please suggest the correct way to use genfromtxt in Python3? Am I doing something wrong? Thank you in advance!

Recommended answer

The answer to my problem is to use the dtype for unicode strings (U2, for example).

Thanks to E.Kehler's answer, I found the solution. If I use str in place of S8 in the dtype definition, the output for the 2nd column is empty:

numpy.genfromtxt("test.csv", delimiter=",", dtype='f8,str')

The output is:

array([(1.0, ''), (2.0, ''), (3.0, '')], dtype=[('f0', '<f8'), ('f1', '<U0')])

This suggested to me that the correct dtype to solve my problem is a unicode string:

numpy.genfromtxt("test.csv", delimiter=",", dtype='f8,U2')

which gives the expected output:

array([(1.0, 'a'), (2.0, 'b'), (3.0, 'c')], dtype=[('f0', '<f8'), ('f1', '<U2')])
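
For anyone who wants to try the unicode dtype without creating a file, here is a minimal sketch. It assumes a reasonably recent numpy in which genfromtxt accepts text streams, and it uses an in-memory io.StringIO as a stand-in for "test.csv"; the variable names are only illustrative.

```python
import io
import numpy

# In-memory stand-in for "test.csv" (same three rows as in the question).
csv_text = "1,a\n2,b\n3,c\n"

# "f8,U2": first column as 64-bit float, second as unicode string (max 2 chars).
data = numpy.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype="f8,U2")

# With no names given, the fields get the default names f0 and f1.
numbers = data["f0"]
letters = data["f1"]

print(numbers.tolist())
print(letters.tolist())
print(letters.dtype.kind)  # "U" means unicode str, "S" would mean bytes
```

Note that U2 truncates any string longer than two characters, so the width should be chosen to fit the longest value expected in the column.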

Useful information can also be found on the numpy data types doc page.
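
As a side note that is not part of the original answer: newer numpy releases (1.14 and later, as far as I know) add an encoding argument to genfromtxt. A sketch under that assumption, again using an in-memory stand-in for "test.csv", shows that dtype=None can then infer a unicode column without a converter or a hand-written dtype:

```python
import io
import numpy

# In-memory stand-in for "test.csv" (same three rows as in the question).
csv_text = "1,a\n2,b\n3,c\n"

# Assumption: numpy >= 1.14, where genfromtxt accepts the encoding argument.
data = numpy.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=None,          # let genfromtxt infer a type per column
    encoding="utf-8",    # decode text so strings come back as str, not bytes
)

# One dtype kind per inferred field: "i" for integer, "U" for unicode.
inferred_kinds = [data.dtype[name].kind for name in data.dtype.names]

print(data["f1"].tolist())
print(inferred_kinds)
```

The explicit 'f8,U2' dtype from the answer remains the more predictable choice when the column types are known in advance.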