Is it possible to directly rename the columns of a pandas DataFrame stored in an HDF5 file?
Problem description
I have a very large pandas DataFrame stored in an HDF5 file, and I need to rename its columns.
The straightforward way is to read the DataFrame in chunks with HDFStore.select, rename the columns, and write the chunks to another HDF5 file.
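For comparison, the chunked read/rename/write approach from the question could look roughly like this. It is a minimal sketch that assumes a table-format DataFrame with numeric columns; copy_with_renamed_columns, renamed.h5 and the chunk size are hypothetical choices, not anything from the original post.

```python
import numpy as np
import pandas as pd


def copy_with_renamed_columns(src_path, dst_path, key, mapping, chunksize=100_000):
    """Hypothetical helper: copy `key` from src_path to dst_path, renaming columns.

    Only works for table-format data, because select(..., chunksize=...)
    needs the table format to iterate over the rows.
    """
    with pd.HDFStore(src_path, mode='r') as src, pd.HDFStore(dst_path, mode='w') as dst:
        # select() with chunksize yields one DataFrame per chunk of rows
        for chunk in src.select(key, chunksize=chunksize):
            # rename the columns of this chunk, then append it to the new file
            dst.append(key, chunk.rename(columns=mapping), format='table')


df = pd.DataFrame(np.random.randn(10, 3), columns=list('abc'))
df.to_hdf('test.h5', key='df', format='table')
copy_with_renamed_columns('test.h5', 'renamed.h5', 'df', {'a': 'new'}, chunksize=4)
print(pd.read_hdf('renamed.h5', 'df').columns.tolist())  # ['new', 'b', 'c']
```

This rewrites every row, which is exactly the cost the question wants to avoid. With string columns, appending later chunks can also fail if they contain longer strings than the first chunk unless min_itemsize is set.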
But that seems like a clumsy and inefficient way to do it. Is there a way to rename the columns directly in the HDF5 file?
Recommended answer
It can be done by changing the metadata. BIG WARNING: this may corrupt your file, so proceed at your own risk.
Create a store. It must be in table format. I didn't use data_columns here, but renaming those requires only a slight change.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.random.randn(10, 3), columns=list('abc'))
In [4]: df.to_hdf('test.h5', key='df', format='table')
In [25]: pd.read_hdf('test.h5','df')
Out[25]:
           a          b          c
0   1.366298   0.844646  -0.470735
1  -1.438387  -1.288432   0.250763
2  -1.290225  -0.390315  -0.138440
3   2.343019   0.632340  -0.539334
4  -1.184943   0.566479   1.977939
5  -1.530772   0.757110  -0.013930
6  -0.300345  -0.951563  -1.013957
7  -0.073975  -0.256521   1.024525
8  -0.179189  -1.767918   0.591720
9   0.641028   0.205522   1.947618
Get a handle to the store itself:
In [26]: store = pd.HDFStore('test.h5')
You need to change the metadata in two places. First, here at the top level:
In [28]: store.get_storer('df').attrs['non_index_axes']
Out[28]: [(1, ['a', 'b', 'c'])]
In [29]: store.get_storer('df').attrs.non_index_axes = [(1, ['new','b','c'])]
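When several columns need new names, the new non_index_axes value can be computed from a mapping instead of being typed by hand. A minimal sketch; rename_non_index_axes is a hypothetical helper, not part of pandas, and it assumes the [(axis, [labels...])] shape shown in Out[28]:

```python
def rename_non_index_axes(non_index_axes, mapping):
    """Hypothetical helper: rename labels inside pandas' non_index_axes metadata.

    non_index_axes is a list of (axis_number, labels) pairs, e.g.
    [(1, ['a', 'b', 'c'])] for a table-format DataFrame.
    Labels that are not keys of `mapping` are kept unchanged.
    """
    return [
        (axis, [mapping.get(label, label) for label in labels])
        for axis, labels in non_index_axes
    ]


print(rename_non_index_axes([(1, ['a', 'b', 'c'])], {'a': 'new'}))
# [(1, ['new', 'b', 'c'])]
```

In the session above it would be used as store.get_storer('df').attrs.non_index_axes = rename_non_index_axes(store.get_storer('df').attrs.non_index_axes, {'a': 'new'}).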
Then here, in the attributes of the underlying PyTables table:
In [31]: store.get_storer('df').table.attrs
Out[31]:
/df/table._v_attrs (AttributeSet), 12 attributes:
[CLASS := 'TABLE',
FIELD_0_FILL := 0,
FIELD_0_NAME := 'index',
FIELD_1_FILL := 0.0,
FIELD_1_NAME := 'values_block_0',
NROWS := 10,
TITLE := '',
VERSION := '2.7',
index_kind := 'integer',
values_block_0_dtype := 'float64',
values_block_0_kind := ['a', 'b', 'c'],
values_block_0_meta := None]
In [33]: store.get_storer('df').table.attrs.values_block_0_kind = ['new','b','c']
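The example frame has a single float64 block, so only values_block_0_kind needs changing. A DataFrame with several dtypes is typically stored in several blocks (values_block_0, values_block_1, ...), each with its own _kind list; that is an assumption worth checking against the attrs of your own file. A sketch of a hypothetical helper that works out which _kind attributes need new values:

```python
def renamed_block_kinds(table_attrs, mapping):
    """Hypothetical helper: compute new values for values_block_N_kind attributes.

    table_attrs maps attribute names to values (for example, collected from
    a PyTables AttributeSet). Only attributes containing at least one
    column that is renamed by `mapping` appear in the result.
    """
    updates = {}
    for name, labels in table_attrs.items():
        if name.startswith('values_block_') and name.endswith('_kind'):
            new_labels = [mapping.get(label, label) for label in labels]
            if new_labels != list(labels):
                updates[name] = new_labels
    return updates


attrs = {
    'FIELD_0_NAME': 'index',
    'index_kind': 'integer',
    'values_block_0_dtype': 'float64',
    'values_block_0_kind': ['a', 'b', 'c'],
}
print(renamed_block_kinds(attrs, {'a': 'new'}))
# {'values_block_0_kind': ['new', 'b', 'c']}
```

Against a live store, the input could be built with {n: getattr(a, n) for n in a._v_attrnames} where a = store.get_storer('df').table.attrs, and each returned entry written back with setattr(a, name, value).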
Close the store to save the changes:
In [34]: store.close()
In [35]: pd.read_hdf('test.h5','df')
Out[35]:
         new          b          c
0   1.366298   0.844646  -0.470735
1  -1.438387  -1.288432   0.250763
2  -1.290225  -0.390315  -0.138440
3   2.343019   0.632340  -0.539334
4  -1.184943   0.566479   1.977939
5  -1.530772   0.757110  -0.013930
6  -0.300345  -0.951563  -1.013957
7  -0.073975  -0.256521   1.024525
8  -0.179189  -1.767918   0.591720
9   0.641028   0.205522   1.947618
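Putting both metadata edits together, a self-contained script might look like the following. rename_hdf_columns is a hypothetical helper, not a pandas API. It assumes a table-format DataFrame without data_columns, and it relies on pandas internals (non_index_axes, values_block_N_kind) that could change between versions, so try it on a copy of the file first.

```python
import numpy as np
import pandas as pd


def rename_hdf_columns(path, key, mapping):
    """Hypothetical helper: rename columns by editing pandas' HDF5 metadata in place.

    WARNING: this rewrites internal metadata and may corrupt the file.
    Columns stored as data_columns are NOT handled here.
    """
    with pd.HDFStore(path, mode='a') as store:
        storer = store.get_storer(key)
        # 1) top-level metadata: a list of (axis, [labels...]) pairs
        storer.attrs.non_index_axes = [
            (axis, [mapping.get(label, label) for label in labels])
            for axis, labels in storer.attrs.non_index_axes
        ]
        # 2) per-block column lists stored on the PyTables table node;
        #    iterate over a copy because reassigning may modify _v_attrnames
        table_attrs = storer.table.attrs
        for name in list(table_attrs._v_attrnames):
            if name.startswith('values_block_') and name.endswith('_kind'):
                labels = getattr(table_attrs, name)
                setattr(table_attrs, name, [mapping.get(label, label) for label in labels])
    # leaving the with-block closes the store, which writes the changes to disk


df = pd.DataFrame(np.random.randn(10, 3), columns=list('abc'))
df.to_hdf('test.h5', key='df', format='table')
rename_hdf_columns('test.h5', 'df', {'a': 'new'})
print(pd.read_hdf('test.h5', 'df').columns.tolist())  # ['new', 'b', 'c']
```

Using HDFStore as a context manager guarantees the store is closed, and therefore saved, even if one of the edits raises an exception.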