pandas DataFrame concat / update ("upsert")?
Question
I am looking for an elegant way to append all the rows from one DataFrame to another DataFrame (both DataFrames having the same index and column structure), but in cases where the same index value appears in both DataFrames, use the row from the second data frame.
For example, if I start with:
df1:
               A     B
date
'2015-10-01'  'A1'  'B1'
'2015-10-02'  'A2'  'B2'
'2015-10-03'  'A3'  'B3'
df2:
               A     B
date
'2015-10-02'  'a1'  'b1'
'2015-10-03'  'a2'  'b2'
'2015-10-04'  'a3'  'b3'
I want the result to be:
               A     B
date
'2015-10-01'  'A1'  'B1'
'2015-10-02'  'a1'  'b1'
'2015-10-03'  'a2'  'b2'
'2015-10-04'  'a3'  'b3'
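To make the example reproducible, the three frames above can be built roughly as follows. This is a sketch that assumes the dates are plain strings in an index named `date`; the question does not say whether they are strings or datetimes, and the name `expected` is just for illustration.

```python
import pandas as pd

# df1: the existing rows, indexed by date
df1 = pd.DataFrame(
    {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]},
    index=pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date"),
)

# df2: the incoming rows; 2015-10-02 and 2015-10-03 overlap with df1
df2 = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]},
    index=pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date"),
)

# desired "upsert" result: df1's row for 2015-10-01, df2's rows for the rest
expected = pd.DataFrame(
    {"A": ["A1", "a1", "a2", "a3"], "B": ["B1", "b1", "b2", "b3"]},
    index=pd.Index(
        ["2015-10-01", "2015-10-02", "2015-10-03", "2015-10-04"], name="date"
    ),
)
```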
This is analogous to what I think is called "upsert" in some SQL systems: a combination of update and insert, in the sense that each row from df2 is either (a) used to update an existing row in df1 if the row key already exists in df1, or (b) inserted into df1 at the end if the row key does not already exist.
I came up with the following:
import pandas as pd
result = (pd.concat([df1, df2])  # concat the two DataFrames
          .reset_index()          # turn 'date' into a regular column
          .groupby('date')        # group rows by values in the 'date' column
          .tail(1)                # take the last row in each group
          .set_index('date'))     # restore 'date' as the index
This seems to work, but it relies on the rows within each groupby group always staying in the same order as in the original DataFrames, which I haven't checked, and it seems displeasingly convoluted.
Does anyone have any ideas for a more straightforward solution?
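The groupby chain can be checked on the example data, as sketched below next to a groupby-free variant that is not from the original thread. `Index.duplicated(keep="last")` marks every occurrence of a date except the last one, and because `pd.concat` places all of `df2` after `df1`, the surviving row for an overlapping date is the one from `df2`. On the ordering worry itself, the `DataFrame.groupby` documentation states that groupby preserves the order of rows within each group, so `tail(1)` does pick the later row.

```python
import pandas as pd

idx1 = pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date")
idx2 = pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date")
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=idx1)
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=idx2)

# the question's chain: last row per date after stacking df1 above df2
via_groupby = (
    pd.concat([df1, df2])
    .reset_index()
    .groupby("date")
    .tail(1)
    .set_index("date")
)

# groupby-free variant: drop every occurrence of a date except the last one
combined = pd.concat([df1, df2])
via_duplicated = combined[~combined.index.duplicated(keep="last")]

# both should hold df2's rows for 2015-10-02 and 2015-10-03
print(via_duplicated)
```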
Recommended answer
One solution is to concatenate df1 with the new rows in df2 (i.e. those whose index does not appear in df1), then update the overlapping values with those from df2.
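import pandas as pd
df = pd.concat([df1, df2[~df2.index.isin(df1.index)]])  # append rows whose date is new to df1
df.update(df2)  # overwrite overlapping rows in place with df2's non-NA values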
>>> df
A B
2015-10-01 A1 B1
2015-10-02 a1 b1
2015-10-03 a2 b2
2015-10-04 a3 b3
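A self-contained sketch of this concat-then-update approach follows. It uses a hypothetical variant of `df2` with one missing value to illustrate a documented property of `DataFrame.update`: it only copies non-NA values from the other frame, so for 2015-10-03 column B keeps df1's `B3` instead of becoming empty. Whether that is desirable depends on whether a missing value in df2 means "unknown" or "cleared".

```python
import pandas as pd

idx1 = pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date")
idx2 = pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date")
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=idx1)
# hypothetical variant of df2: column B is missing for 2015-10-03
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", None, "b3"]}, index=idx2)

# append only the dates that df1 does not have yet
df = pd.concat([df1, df2[~df2.index.isin(df1.index)]])
# overwrite overlapping cells in place; missing values in df2 are skipped
df.update(df2)
print(df)
```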
Per the suggestion of @chrisb, this can further be simplified as follows:
pd.concat([df1[~df1.index.isin(df2.index)], df2])
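The simplified one-liner can be wrapped in a small helper, sketched below. The name `upsert` and its `sort` parameter are hypothetical, and the `sort_index()` call is an addition rather than part of the original answer: without it, the surviving df1 rows come first and all df2 rows follow, which only happens to be date order in this example. Unlike the `update` variant, this version takes each overlapping row from df2 wholesale, including any missing values.

```python
import pandas as pd


def upsert(old: pd.DataFrame, new: pd.DataFrame, sort: bool = True) -> pd.DataFrame:
    """Hypothetical helper: rows of `new` replace or extend rows of `old` by index label."""
    # keep only the rows of `old` whose label does not appear in `new`
    kept = old[~old.index.isin(new.index)]
    # every row of `new` is included, whether it replaces or extends `old`
    out = pd.concat([kept, new])
    # optional: restore index order (lexicographic order works for ISO date strings)
    return out.sort_index() if sort else out


idx1 = pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date")
idx2 = pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date")
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=idx1)
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=idx2)

print(upsert(df1, df2))
```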
Thanks, Chris!