How to preserve column names while importing data using numpy?
Question
I am using the numpy library in Python to import CSV file data into an ndarray as follows:
import numpy as np

data = np.genfromtxt('mydata.csv', delimiter=',', dtype=None, names=True)
The result has the following column names:
print(data.dtype.names)
('row_label', 'MyDataColumn1_0', 'MyDataColumn1_1')
The original column names are:
row_label, My-Data-Column-1.0, My-Data-Column-1.1
It appears that NumPy is forcing my column names to adopt C-style variable name formatting. Yet there are many cases where my Python scripts need to access columns by name, so I need the column names to remain unchanged. To accomplish this, either NumPy needs to preserve the original column names, or I need to convert my column names to the format NumPy is using.
Is there a way to preserve the original column names during import?
If not, is there an easy way to convert column labels to the format NumPy is using, preferably with some NumPy function?
Recommended answer
If you set names=True, then the first line of your data file is passed through this function:
validate_names = NameValidator(excludelist=excludelist,
                               deletechars=deletechars,
                               case_sensitive=case_sensitive,
                               replace_space=replace_space)
These are the options that you can supply:
excludelist : sequence, optional
    A list of names to exclude. This list is appended to the default list
    ['return','file','print']. Excluded names are appended an underscore:
    for example, `file` would become `file_`.
deletechars : str, optional
    A string combining invalid characters that must be deleted from the
    names.
defaultfmt : str, optional
    A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
    Whether to automatically strip white spaces from the variables.
replace_space : char, optional
    Character(s) used in replacement of white spaces in the variables
    names. By default, use a '_'.
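For the second part of the question (converting your own labels to whatever NumPy would produce), one possible approach is to call the validator directly. The sketch below assumes that NameValidator can be imported from the private module numpy.lib._iotools, which is the file linked as the source of this answer; because the module is private, its location and behaviour may differ between NumPy versions. The names original, converted and lookup are illustrative only.

```python
# Assumption: NameValidator lives in the private module numpy.lib._iotools.
# Private modules can move or change between NumPy releases.
from numpy.lib._iotools import NameValidator

# Header labels as they appear in the CSV file from the question.
original = ["row_label", "My-Data-Column-1.0", "My-Data-Column-1.1"]

# A validator built with default settings applies the default deletechars
# set, which is what genfromtxt uses when no deletechars is supplied.
validator = NameValidator()
converted = validator.validate(original)

# Map each original label to the field name NumPy would use.
lookup = dict(zip(original, converted))
print(lookup)
```

With such a mapping, a script could keep using the original labels and translate them to the field names at the point of access, for example data[lookup["My-Data-Column-1.0"]].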
Perhaps you could try passing an empty string as deletechars. But you'd be better off modifying and passing this default set:
defaultdeletechars = set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<""")
Just take out the period and minus sign from that set, and pass it as:
np.genfromtxt(..., names=True, deletechars="""~!@#$%^&*()=+~\|]}[{';: /?>,<""")
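As a quick check of this suggestion, here is a minimal, self-contained sketch that reads a small in-memory CSV instead of mydata.csv. The sample rows, the variable names csv_text and keep_dash_and_dot, and the encoding="utf-8" argument are assumptions made for illustration; only the deletechars string comes from the answer above.

```python
import io

import numpy as np

# Hypothetical sample data using the header from the question.
csv_text = (
    "row_label,My-Data-Column-1.0,My-Data-Column-1.1\n"
    "r1,1.5,2.5\n"
    "r2,3.5,4.5\n"
)

# The default deletechars with the minus sign and the period removed,
# so both characters survive in the field names.
keep_dash_and_dot = r"""~!@#$%^&*()=+~\|]}[{';: /?>,<"""

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=None,          # infer a type per column
    names=True,          # take field names from the first line
    encoding="utf-8",    # assumption: read text columns as str
    deletechars=keep_dash_and_dot,
)

print(data.dtype.names)
# Columns can now be accessed by their original labels.
print(data["My-Data-Column-1.0"])
```

Field names that contain a minus sign or a period cannot be used with attribute-style access on record arrays, so index access such as data["My-Data-Column-1.0"] is the way to reach these columns.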
Here is the source: https://github.com/numpy/numpy/blob/master/numpy/lib/_iotools.py#l245