How to preserve column names starting with a minus when using numpy.genfromtxt?

Question

Similar to this question, numpy.genfromtxt modifies my columns' names:

import numpy as np
from io import BytesIO  # http://stackoverflow.com/a/11970414/321973

str = 'x,-1,1\n0,1,1\n1,2,3'
data = np.genfromtxt(BytesIO(str.encode()), delimiter=',', names=True)
print(data.dtype.names)


yields ('x', '1', '1_1') instead of the desired ('x', '-1', '1') (or even better, ('x', -1, 1)). I tried deletechars="""~!@#$%^&*()=+~\|]}[{';: /?>,<""" as suggested there to no avail.

Recommended answer

The behavior you're seeing is caused by np.genfromtxt using the NameValidator class to automatically strip certain non-alphanumeric characters from the field names.
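To see the stripping in isolation, the validator can be called directly on the header. This is only a sketch: it assumes NameValidator is importable from the private module numpy.lib._iotools, which is an internal location and may move between NumPy releases.

```python
# Assumption: NameValidator lives in the private module numpy.lib._iotools.
from numpy.lib._iotools import NameValidator

header = ['x', '-1', '1']

# With the default settings, '-' is one of the characters deleted from names.
# '-1' therefore collapses to '1', and the second '1' gets a '_1' suffix
# so that the field names stay unique.
default_names = NameValidator().validate(header)
print(default_names)
```

The names printed here match the ones reported in the question, which suggests that the renaming happens in the validator rather than in the parsing of the data rows.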

It's perfectly legal for a field name to contain a '-' character, e.g.:

x = np.array((1,), dtype=[('-1', 'i')])
print(x['-1'])
# 1

In fact, two out of three of the modified field names you get back from np.genfromtxt are also not "valid Python identifiers" ('1' and '1_1', since they start with digits).
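A quick way to check this claim is str.isidentifier(), which reports whether a string has the form of a Python identifier. The tuple of names below is just the set of names discussed in this thread.

```python
# Names discussed above: the three returned by genfromtxt, plus the desired '-1'.
names = ('x', '1', '1_1', '-1')

# Map each name to whether it has the form of a Python identifier.
checks = {name: name.isidentifier() for name in names}
for name, ok in checks.items():
    print(repr(name), ok)
```

Only 'x' passes; the names starting with a digit fail for the same reason '-1' does, in that none of them could be used as an attribute name.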

It's therefore possible to construct the array you describe as long as you bypass using np.genfromtxt to set the field names. One way to do it would be to initialize an empty array, specify the field names and dtypes explicitly, then fill it with the rest of the string contents:

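names = str.splitlines()[0].split(',')
types = ('i',) * 3
# zip() is lazy in Python 3; NumPy needs a list of (name, type) pairs
dtype = list(zip(names, types))

data = np.empty(2, dtype=dtype)
# skip_header replaces the old skiprows argument, which genfromtxt no longer accepts
data[:] = np.genfromtxt(BytesIO(str.encode()), delimiter=',', dtype=dtype,
                        skip_header=1)
print(repr(data))
# With NumPy 1.14 or later, where structured assignment is positional:
# array([(0, 1, 1), (1, 2, 3)],
#       dtype=[('x', '<i4'), ('-1', '<i4'), ('1', '<i4')])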
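A variant of the same idea that avoids assigning one structured array to another is sketched below. It assumes the data rows are all integers, reads them with np.loadtxt as a plain 2-D array, and then builds the structured array from row tuples. The variable names (text, raw, and so on) are illustrative and not part of the original answer.

```python
import numpy as np
from io import StringIO

text = 'x,-1,1\n0,1,1\n1,2,3'

# Split the header line from the data rows by hand so that
# NumPy never gets a chance to rewrite the column names.
header, body = text.split('\n', 1)
names = header.split(',')

# Read the numeric part as an ordinary 2-D integer array.
raw = np.loadtxt(StringIO(body), delimiter=',', dtype=int, ndmin=2)

# Build the structured array; each row must be a tuple for this to work.
dtype = [(name, 'i4') for name in names]
data = np.array([tuple(row) for row in raw], dtype=dtype)

print(data.dtype.names)
print(data['-1'])
```

Because the rows are passed as tuples, the values are matched to fields by position, so the '-1' column keeps its data regardless of how the field names compare.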

However, just because you can doesn't mean you should - there may well be other unpredictable consequences to having a '-' in one of your field names. The safest option is to stick with using only valid Python identifiers as field names.
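If you follow that advice, one option is to translate the header labels into valid identifiers yourself and pass them to np.genfromtxt, so the renaming is under your control. The sketch below uses a hypothetical helper, to_identifier, and arbitrary 'neg' and 'c' prefixes; neither comes from NumPy or from the original answer.

```python
import re
import numpy as np
from io import BytesIO

def to_identifier(label):
    """Hypothetical helper: turn a header label into a valid Python identifier."""
    # Spell out the minus sign, then replace any other non-word character.
    cleaned = re.sub(r'\W', '_', label.replace('-', 'neg'))
    # Identifiers cannot be empty or start with a digit, so add a prefix.
    if not cleaned or cleaned[0].isdigit():
        cleaned = 'c' + cleaned
    return cleaned

text = 'x,-1,1\n0,1,1\n1,2,3'
labels = text.splitlines()[0].split(',')
safe = [to_identifier(label) for label in labels]

# Skip the original header line and supply the sanitized names explicitly.
data = np.genfromtxt(BytesIO(text.encode()), delimiter=',',
                     names=safe, skip_header=1)

# Keep a lookup from the original labels to the sanitized field names.
label_to_field = dict(zip(labels, safe))
print(data.dtype.names)
print(data[label_to_field['-1']])
```

The label_to_field dictionary keeps the original labels available for lookups, while the array itself only carries names that are safe to use as identifiers.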
