Append data to HDF5 file with Pandas, Python
Problem description
I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.
The financial data is updated every minute, and I need to append a row of data to every existing table in my .h5 file each minute.
Here is what I have tried so far, but no matter what I do, it overwrites the .h5 file instead of appending the data.
HDFStore way:
#we open the hdf5 file
save_hdf = HDFStore('test.h5')
ohlcv_candle.to_hdf('test.h5')
#we give the dataframe a key value
#format=table so we can append data
save_hdf.put('name_of_frame',ohlcv_candle, format='table', data_columns=True)
#we print our dataframe by calling the hdf file with the key
#just doing this as a test
print(save_hdf['name_of_frame'])
The other way I have tried it, to_hdf:
#format=t so we can append data , mode=r+ to specify the file exists and
#we want to append to it
ohlcv_candle.to_hdf('test.h5', key='this_is_a_key', mode='r+', format='t')
#again just printing to check if it worked
print(pd.read_hdf('test.h5', key='this_is_a_key'))
Here is what one of the DataFrames looks like after being read with read_hdf:
time open high low close volume PP
0 1505305260 3137.89 3147.15 3121.17 3146.94 6.205397 3138.420000
1 1505305320 3146.86 3159.99 3130.00 3159.88 8.935962 3149.956667
2 1505305380 3159.96 3160.00 3159.37 3159.66 4.524017 3159.676667
3 1505305440 3159.66 3175.51 3151.08 3175.51 8.717610 3167.366667
4 1505305500 3175.25 3175.53 3170.44 3175.53 3.187453 3173.833333
The next time I get data (every minute), I would like one row of it added at index 5 across all my columns, then at 6, 7, and so on, without having to read and manipulate the entire file in memory, as that would defeat the point of doing this. If there is a better way of solving this, do not be shy to recommend it.
P.S. Sorry for the formatting of the table here.
Recommended answer
pandas.HDFStore.put() has a parameter append (which defaults to False), and that default instructs pandas to overwrite instead of appending.
So try this:
store = pd.HDFStore('test.h5')
store.append('name_of_frame', ohlcv_candle, format='t', data_columns=True)
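To make the per-minute append concrete, here is a minimal sketch. It assumes PyTables is installed (pandas needs it for HDF5). The helper name append_candle, the temporary file path, and the sample values are all illustrative, not part of the original question. It also continues the integer index from the number of rows already stored, which is one way to get the "index 5, then 6, then 7" behaviour without loading the table.

```python
import os
import tempfile

import pandas as pd


def append_candle(path, key, candle):
    """Append the rows of `candle` to the table stored under `key` (hypothetical helper)."""
    with pd.HDFStore(path, mode="a") as store:
        # Continue the integer index from the rows already on disk,
        # using only the table's metadata rather than its contents.
        start = store.get_storer(key).nrows if key in store else 0
        candle = candle.copy()
        candle.index = range(start, start + len(candle))
        store.append(key, candle, format="table", data_columns=True)


def demo():
    # Illustrative file location and values only.
    path = os.path.join(tempfile.mkdtemp(), "example.h5")
    first = pd.DataFrame({"time": [1505305260], "open": [3137.89], "close": [3146.94]})
    second = pd.DataFrame({"time": [1505305320], "open": [3146.86], "close": [3159.88]})
    append_candle(path, "name_of_frame", first)
    append_candle(path, "name_of_frame", second)
    return pd.read_hdf(path, key="name_of_frame")
```

Calling demo() should return a two-row frame, one row per append. Opening the store in a with block closes the file after each write, which matters when another process reads it between updates.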
We can also use store.put(..., append=True), but the file must also have been created in table format:
store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)
NOTE: appending works only for the table format (format='t' is an alias for format='table').
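The to_hdf route from the question can be made to append in the same way. The sketch below assumes PyTables is installed and uses an illustrative temporary file and made-up rows. It passes append=True with mode='a' and format='table', then reads back only the last stored row through start/stop so the whole table never has to be loaded.

```python
import os
import tempfile

import pandas as pd


def demo_to_hdf():
    # Illustrative file location and values only.
    path = os.path.join(tempfile.mkdtemp(), "example.h5")
    key = "this_is_a_key"
    rows = [
        pd.DataFrame({"time": [1505305260], "close": [3146.94]}, index=[0]),
        pd.DataFrame({"time": [1505305320], "close": [3159.88]}, index=[1]),
        pd.DataFrame({"time": [1505305380], "close": [3159.66]}, index=[2]),
    ]
    for row in rows:
        # mode="a" creates the file if needed and keeps existing content;
        # append=True adds rows instead of replacing the table under `key`.
        row.to_hdf(path, key=key, mode="a", format="table",
                   append=True, data_columns=True)

    with pd.HDFStore(path, mode="r") as store:
        nrows = store.get_storer(key).nrows
        # Read only the final row from disk.
        last = store.select(key, start=nrows - 1, stop=nrows)
    return nrows, last
```

Without append=True, to_hdf writes the frame as a fresh table under the key, which matches the overwriting behaviour described in the question.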