Update a large pandas DataFrame stored in a PyTable with another large pandas DataFrame

Views: 164  Published: 2017/3/26 2:42:52  Tags: python pandas hdf5 pytables dataframe

Problem description

I am trying to create a function that updates a pandas DataFrame stored in a PyTable with new data from another pandas DataFrame. I want to check whether data is missing in the PyTable for specific DatetimeIndex entries (the value is NaN, or a new Timestamp is available), fill those gaps with values from the given DataFrame, and write the result back to the PyTable. Basically, I just want to update a PyTable. I can get the combined DataFrame with the combine_first method in pandas. Below, the PyTable is created with dummy data:

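import datetime as dt
import numpy as np
import pandas as pd

# 20,000 rows at 10-minute intervals, starting 2001-01-01 00:00
index = pd.date_range(start=dt.datetime(2001, 1, 1, 0, 0), periods=20000, freq='10min')
data_in_pytable = pd.DataFrame(index=index, data=np.random.randn(20000, 2), columns=['value_1', 'value_2'])
# mode='w' creates the file; append=True stores the data in the appendable table format
data_in_pytable.to_hdf(r'C:\pytable.h5', key='test', mode='w', append=True, complevel=9, complib='zlib')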

So the PyTable is created. Now assume I have another DataFrame with which I want to update the PyTable:

new_index = pd.date_range(start=dt.datetime(2001, 5, 1, 0, 0), periods=10000, freq='10min')
data_to_update = pd.DataFrame(index=new_index, data=np.random.randn(10000, 2), columns=['value_1', 'value_2'])
store = pd.HDFStore(r'C:\pytable.h5', mode='r+', complevel=9, complib='zlib')
# combine the stored data with the new data, then append the result to the same node
store.append('test', store.select('test').combine_first(data_to_update))
store.close()

The problem is that the PyTable keeps the original rows instead of updating them. I now have duplicate entries (by index), because the original values are not overwritten.
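The symptom can be reproduced on a small scale without an HDF5 file. The sketch below is only an illustration: the frames `stored` and `incoming` are made-up stand-ins, and `pd.concat` is used as an assumed in-memory substitute for `store.append`, which likewise adds rows after the existing ones.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the stored table and the incoming data (hypothetical values).
stored = pd.DataFrame(
    {"value_1": [1.0, np.nan, 3.0]},
    index=pd.date_range("2001-01-01 00:00", periods=3, freq="10min"),
)
incoming = pd.DataFrame(
    {"value_1": [20.0, 30.0, 40.0]},
    index=pd.date_range("2001-01-01 00:10", periods=3, freq="10min"),
)

# combine_first keeps stored values, fills NaN gaps and adds new timestamps.
combined = stored.combine_first(incoming)

# Appending the combined frame after the stored rows repeats every original
# timestamp; pd.concat stands in for store.append here.
after_append = pd.concat([stored, combined])
n_duplicates = int(after_append.index.duplicated().sum())
print(combined)
print("duplicated index entries:", n_duplicates)
```

The combined frame already contains every original row, so appending it to the node necessarily repeats each original timestamp.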

Summary: How can I update a PyTable with another DataFrame?

Thanks, Elv

Solution

In the end, I figured it out myself. In my case it is acceptable to overwrite the entire node, because combine_first already returns both the original and the new values, so it is fine to use

# format='table' is the current spelling of the older table=True keyword
store.put(key, value, format='table', append=False)

instead of

store.append(key, value)
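Put together, the read, combine and overwrite steps might look like the sketch below. The helper name `update_table`, the temporary file path and the small test frames are assumptions made for illustration; they are not part of the original answer. It requires PyTables to be installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def update_table(path, key, new_data):
    """Overwrite node `key` with the stored data combined with `new_data`.

    Hypothetical helper: loads the whole node into memory, so it only suits
    tables that fit in RAM.
    """
    with pd.HDFStore(path, mode="a", complevel=9, complib="zlib") as store:
        if key in store:
            # Stored values win; NaN gaps and new timestamps come from new_data.
            combined = store.select(key).combine_first(new_data)
        else:
            combined = new_data
        # put with append=False replaces the node instead of adding rows.
        store.put(key, combined, format="table", append=False)
    return combined


# Usage with a throwaway file and made-up values.
path = os.path.join(tempfile.mkdtemp(), "pytable.h5")
old = pd.DataFrame(
    {"value_1": [1.0, np.nan]},
    index=pd.date_range("2001-01-01", periods=2, freq="10min"),
)
new = pd.DataFrame(
    {"value_1": [5.0, 6.0, 7.0]},
    index=pd.date_range("2001-01-01", periods=3, freq="10min"),
)
update_table(path, "test", old)
update_table(path, "test", new)

with pd.HDFStore(path, mode="r") as store:
    result = store.select("test")
print(result)
```

If the new values should take precedence over the stored ones, the call can be reversed to `new_data.combine_first(stored)`. Note also that HDF5 does not reclaim the space of an overwritten node automatically, so a file that is rewritten often may grow until it is repacked.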
