Setting DataFrame values with enlargement

Question

I have two DataFrames (each with a DatetimeIndex) and want to update the first frame (the older one) with data from the second frame (the newer one).

The new frame may contain more recent data for rows already contained in the old frame. In this case, data in the old frame should be overwritten with data from the new frame. The newer frame may also have more columns or rows than the first one. In this case, the old frame should be enlarged with the data in the new frame.

The pandas docs state that

"The .loc/.ix/[] operations can perform enlargement when setting a non-existent key for that axis"

"a DataFrame can be enlarged on either axis via .loc"

However, this doesn't seem to work and raises a KeyError. Example:

In [195]: df1
Out[195]: 
                     A  B  C
2015-07-09 12:00:00  1  1  1
2015-07-09 13:00:00  1  1  1
2015-07-09 14:00:00  1  1  1
2015-07-09 15:00:00  1  1  1

In [196]: df2
Out[196]: 
                     A  B  C  D
2015-07-09 14:00:00  2  2  2  2
2015-07-09 15:00:00  2  2  2  2
2015-07-09 16:00:00  2  2  2  2
2015-07-09 17:00:00  2  2  2  2

In [197]: df1.loc[df2.index] = df2
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-197-74e630e87cf8> in <module>()
----> 1 df1.loc[df2.index] = df2

/.../pandas/core/indexing.pyc in __setitem__(self, key, value)
    112 
    113     def __setitem__(self, key, value):
--> 114         indexer = self._get_setitem_indexer(key)
    115         self._setitem_with_indexer(indexer, value)
    116 

/.../pandas/core/indexing.pyc in _get_setitem_indexer(self, key)
    107 
    108         try:
--> 109             return self._convert_to_indexer(key, is_setter=True)
    110         except TypeError:
    111             raise IndexingError(key)

/.../pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
   1110                 mask = check == -1
   1111                 if mask.any():
-> 1112                     raise KeyError('%s not in index' % objarr[mask])
   1113 
   1114                 return _values_from_object(indexer)

KeyError: "['2015-07-09T18:00:00.000000000+0200' '2015-07-09T19:00:00.000000000+0200'] not in index"

What is the best way (with respect to performance, as my real data is much larger) to achieve the desired updated and enlarged DataFrame? This is the result I would like to see:

                     A  B  C    D
2015-07-09 12:00:00  1  1  1  NaN
2015-07-09 13:00:00  1  1  1  NaN
2015-07-09 14:00:00  2  2  2    2
2015-07-09 15:00:00  2  2  2    2
2015-07-09 16:00:00  2  2  2    2
2015-07-09 17:00:00  2  2  2    2

Recommended answer

df2.combine_first(df1) (see the pandas documentation for DataFrame.combine_first) seems to meet your requirement. The code snippet and its output are below.

import pandas as pd

print('pandas-version: ', pd.__version__)

# Older frame: four hourly rows with columns A, B and C.
df1 = pd.DataFrame.from_records([('2015-07-09 12:00:00',1,1,1),
                                 ('2015-07-09 13:00:00',1,1,1),
                                 ('2015-07-09 14:00:00',1,1,1),
                                 ('2015-07-09 15:00:00',1,1,1)],
                                columns=['Dt', 'A', 'B', 'C']).set_index('Dt')

# Newer frame: overlaps the last two rows of df1 and adds two rows and column D.
df2 = pd.DataFrame.from_records([('2015-07-09 14:00:00',2,2,2,2),
                                 ('2015-07-09 15:00:00',2,2,2,2),
                                 ('2015-07-09 16:00:00',2,2,2,2),
                                 ('2015-07-09 17:00:00',2,2,2,2)],
                                columns=['Dt', 'A', 'B', 'C', 'D']).set_index('Dt')

# Values from df2 take priority; anything df2 lacks is filled in from df1.
res_combine1st = df2.combine_first(df1)
print(res_combine1st)

Output

pandas-version:  0.15.2
                     A  B  C   D
Dt                              
2015-07-09 12:00:00  1  1  1 NaN
2015-07-09 13:00:00  1  1  1 NaN
2015-07-09 14:00:00  2  2  2   2
2015-07-09 15:00:00  2  2  2   2
2015-07-09 16:00:00  2  2  2   2
2015-07-09 17:00:00  2  2  2   2
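
The snippet above indexes both frames by strings, whereas the question uses a DatetimeIndex. The following is a minimal sketch, not a definitive implementation, that repeats the idea with real timestamps. It also shows one possible way to get the .loc-style assignment from the question working: reindex the old frame to the union of both axes first, so that every key already exists when .loc assigns. The frames and the names combined and enlarged are made up for illustration, and which variant is faster on large data is not measured here.

```python
import pandas as pd

# Hypothetical frames mirroring the question, built with a real DatetimeIndex.
old_index = pd.to_datetime(["2015-07-09 12:00", "2015-07-09 13:00",
                            "2015-07-09 14:00", "2015-07-09 15:00"])
new_index = pd.to_datetime(["2015-07-09 14:00", "2015-07-09 15:00",
                            "2015-07-09 16:00", "2015-07-09 17:00"])
df1 = pd.DataFrame(1, index=old_index, columns=list("ABC"))
df2 = pd.DataFrame(2, index=new_index, columns=list("ABCD"))

# Variant 1: combine_first keeps df2's values and fills the gaps from df1.
combined = df2.combine_first(df1)

# Variant 2: enlarge df1 to the union of both axes (new cells become NaN),
# then overwrite the part covered by df2. All keys now exist, so .loc
# no longer has to enlarge anything.
enlarged = df1.reindex(index=df1.index.union(df2.index),
                       columns=df1.columns.union(df2.columns))
enlarged.loc[df2.index, df2.columns] = df2

print(combined)
print(enlarged)
```

Both variants leave df1 and df2 untouched and return a new frame. The integer dtypes of the two results may differ between pandas versions, because enlarging introduces NaN, so compare them after casting to float if you need an exact equality check.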
