Setting DataFrame values with enlargement
Question
I have two DataFrames (each with a DatetimeIndex) and want to update the first frame (the older one) with data from the second frame (the newer one).
The new frame may contain more recent data for rows already contained in the old frame. In this case, data in the old frame should be overwritten with data from the new frame. The newer frame may also have more columns or rows than the first one. In this case, the old frame should be enlarged by the data in the new frame.
Pandas docs state that
"The .loc/.ix/[] operations can perform enlargement when setting a non-existent key for that axis"
and
"a DataFrame can be enlarged on either axis via .loc"
However, this doesn't seem to work and throws a KeyError. Example:
In [195]: df1
Out[195]:
                     A  B  C
2015-07-09 12:00:00  1  1  1
2015-07-09 13:00:00  1  1  1
2015-07-09 14:00:00  1  1  1
2015-07-09 15:00:00  1  1  1
In [196]: df2
Out[196]:
                     A  B  C  D
2015-07-09 14:00:00  2  2  2  2
2015-07-09 15:00:00  2  2  2  2
2015-07-09 16:00:00  2  2  2  2
2015-07-09 17:00:00  2  2  2  2
In [197]: df1.loc[df2.index] = df2
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-197-74e630e87cf8> in <module>()
----> 1 df1.loc[df2.index] = df2
/.../pandas/core/indexing.pyc in __setitem__(self, key, value)
112
113 def __setitem__(self, key, value):
--> 114 indexer = self._get_setitem_indexer(key)
115 self._setitem_with_indexer(indexer, value)
116
/.../pandas/core/indexing.pyc in _get_setitem_indexer(self, key)
107
108 try:
--> 109 return self._convert_to_indexer(key, is_setter=True)
110 except TypeError:
111 raise IndexingError(key)
/.../pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
1110 mask = check == -1
1111 if mask.any():
-> 1112 raise KeyError('%s not in index' % objarr[mask])
1113
1114 return _values_from_object(indexer)
KeyError: "['2015-07-09T18:00:00.000000000+0200' '2015-07-09T19:00:00.000000000+0200'] not in index"
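For context, the enlargement described in the quoted docs can be sketched for the single-label case. The snippet below is a minimal illustration, not the code from the question: the small frame and the names new_row, attempt and list_enlarged are made up for the example. It sets one missing row label and one missing column label through .loc, then tries a list of labels that includes a missing one on a copy and records whether pandas accepted it. In the pandas versions I am aware of, the list case raises a KeyError like the one in the traceback above, so no expected output is shown for it.

```python
import pandas as pd

# Hypothetical small frame, similar in shape to df1 from the question.
df = pd.DataFrame(
    1,
    index=pd.to_datetime(["2015-07-09 12:00", "2015-07-09 13:00"]),
    columns=["A", "B", "C"],
)

# Setting a single missing row label enlarges the frame by one row.
new_row = pd.Timestamp("2015-07-09 14:00")
df.loc[new_row] = [2, 2, 2]

# Setting a single missing column label enlarges it along the other axis.
df.loc[:, "D"] = 0

# A list of labels that includes a missing one is a different code path.
# Try it on a copy and record the outcome instead of assuming it.
attempt = df.copy()
labels = pd.to_datetime(["2015-07-09 14:00", "2015-07-09 15:00"])
try:
    attempt.loc[labels] = 9
    list_enlarged = True
except KeyError:
    list_enlarged = False

print(df)
print("list of labels enlarged the frame:", list_enlarged)
```

So enlargement via .loc is label-by-label, which is why assigning a whole frame with df1.loc[df2.index] = df2 is not a drop-in way to merge two frames.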
What is the best way (with respect to performance, as my real data is much larger) to achieve the desired updated and enlarged DataFrame? This is the result I would like to see:
                     A  B  C    D
2015-07-09 12:00:00  1  1  1  NaN
2015-07-09 13:00:00  1  1  1  NaN
2015-07-09 14:00:00  2  2  2    2
2015-07-09 15:00:00  2  2  2    2
2015-07-09 16:00:00  2  2  2    2
2015-07-09 17:00:00  2  2  2    2
Recommended answer
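df2.combine_first(df1) (documentation) seems to meet your requirement: values from df2 take precedence, and df1 supplies the rows, columns and NaN cells that df2 lacks. Code snippet and output follow.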
from __future__ import print_function  # keeps print() working on Python 2 as well
import pandas as pd

print('pandas-version: ', pd.__version__)

# Older frame: columns A-C, rows 12:00 to 15:00
df1 = pd.DataFrame.from_records([('2015-07-09 12:00:00',1,1,1),
                                 ('2015-07-09 13:00:00',1,1,1),
                                 ('2015-07-09 14:00:00',1,1,1),
                                 ('2015-07-09 15:00:00',1,1,1)],
                                columns=['Dt', 'A', 'B', 'C']).set_index('Dt')

# Newer frame: adds column D, overlaps at 14:00 and 15:00, adds 16:00 and 17:00
df2 = pd.DataFrame.from_records([('2015-07-09 14:00:00',2,2,2,2),
                                 ('2015-07-09 15:00:00',2,2,2,2),
                                 ('2015-07-09 16:00:00',2,2,2,2),
                                 ('2015-07-09 17:00:00',2,2,2,2)],
                                columns=['Dt', 'A', 'B', 'C', 'D']).set_index('Dt')

# Values from df2 win; df1 fills in whatever df2 does not have
res_combine1st = df2.combine_first(df1)
print(res_combine1st)
Output
pandas-version: 0.15.2
                     A  B  C    D
Dt
2015-07-09 12:00:00  1  1  1  NaN
2015-07-09 13:00:00  1  1  1  NaN
2015-07-09 14:00:00  2  2  2    2
2015-07-09 15:00:00  2  2  2    2
2015-07-09 16:00:00  2  2  2    2
2015-07-09 17:00:00  2  2  2    2
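Since the question uses a DatetimeIndex and asks about performance, here is a hedged sketch of the same update on real timestamps, next to one possible alternative that is not part of the answer above: concatenating both frames and keeping the last occurrence of each index label. The names combined and via_combine_first are made up for this example, and which approach is faster on large data is something to measure rather than assume.

```python
import pandas as pd

# Frames rebuilt from the question, this time with a real DatetimeIndex.
df1 = pd.DataFrame(
    1,
    index=pd.to_datetime([
        "2015-07-09 12:00", "2015-07-09 13:00",
        "2015-07-09 14:00", "2015-07-09 15:00",
    ]),
    columns=["A", "B", "C"],
)
df2 = pd.DataFrame(
    2,
    index=pd.to_datetime([
        "2015-07-09 14:00", "2015-07-09 15:00",
        "2015-07-09 16:00", "2015-07-09 17:00",
    ]),
    columns=["A", "B", "C", "D"],
)

# Approach from the answer: df2 has priority, df1 fills the gaps.
via_combine_first = df2.combine_first(df1)

# Alternative sketch: stack both frames, then keep only the last
# (newest) row for every duplicated timestamp.
combined = pd.concat([df1, df2])
combined = combined[~combined.index.duplicated(keep="last")].sort_index()

print(via_combine_first)
print(combined)
```

The two approaches differ in one respect worth knowing: combine_first works cell by cell, so a NaN inside df2 is filled from df1, whereas the concat variant takes whole rows from df2, NaN included. For the data in the question both give the same six rows and four columns.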