pandas DataFrame concat / update ("upsert")?

Problem description

I am looking for an elegant way to append all the rows from one DataFrame to another DataFrame (both DataFrames having the same index and column structure), but in cases where the same index value appears in both DataFrames, use the row from the second data frame.

For example, if I start with:

df1:
                    A      B
    date
    '2015-10-01'  'A1'   'B1'
    '2015-10-02'  'A2'   'B2'
    '2015-10-03'  'A3'   'B3'

df2:
                    A      B
    date
    '2015-10-02'  'a1'   'b1'
    '2015-10-03'  'a2'   'b2'
    '2015-10-04'  'a3'   'b3'

I would like the result to be:

                    A      B
    date
    '2015-10-01'  'A1'   'B1'
    '2015-10-02'  'a1'   'b1'
    '2015-10-03'  'a2'   'b2'
    '2015-10-04'  'a3'   'b3'

This is analogous to what I think is called "upsert" in some SQL systems: a combination of update and insert, in the sense that each row from df2 is either (a) used to update an existing row in df1 if the row key already exists in df1, or (b) inserted into df1 at the end if the row key does not already exist.
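For comparison, pandas has a built-in method with nearly these semantics: DataFrame.combine_first, which takes values from the calling frame and falls back to the other frame over the union of both indexes. This technique is not used in the question or the answer below; the sketch assumes the 'date' index holds plain strings, as in the printed frames. One caveat: combine_first treats NaN in df2 as "missing" and fills it from df1, so it is not a strict row-level upsert when df2 contains NaN, and the result comes back sorted by index.

```python
import pandas as pd

# Sample frames from the question (assumption: string keys in a 'date' index).
idx1 = pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date")
idx2 = pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date")
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=idx1)
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=idx2)

# Values from df2 win; keys only present in df1 are filled from df1.
# Note: NaN cells in df2 would also be filled from df1.
upserted = df2.combine_first(df1)
print(upserted)
```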

I came up with the following:

import pandas as pd

result = (
    pd.concat([df1, df2])     # concat the two DataFrames
    .reset_index()            # turn 'date' into a regular column
    .groupby('date')          # group rows by values in the 'date' column
    .tail(1)                  # take the last row in each group
    .set_index('date')        # restore 'date' as the index
)

This seems to work, but it relies on the rows within each groupby group always staying in the same order as in the original DataFrames, which I haven't verified, and it seems displeasingly convoluted.
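On the ordering concern: the pandas documentation for GroupBy.tail states that it returns a subset of rows from the original DataFrame with the original index and order preserved, so the last row per group is the one that came last in the concatenation. A shorter way to express the same "last occurrence wins" idea, sketched here as an alternative not taken from the original thread, is Index.duplicated(keep="last"). The sample frames and string keys are assumptions.

```python
import pandas as pd

# Sample frames from the question (assumption: string keys in a 'date' index).
idx1 = pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date")
idx2 = pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date")
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=idx1)
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=idx2)

# Put df2 after df1 so that, for any repeated key, the df2 row comes last.
combined = pd.concat([df1, df2])

# duplicated(keep="last") flags every occurrence of a key except the last one;
# negating it keeps exactly one row per key (the df2 row when the key is shared).
upserted = combined[~combined.index.duplicated(keep="last")]

# The groupby/tail version from the question yields the same frame.
via_groupby = combined.reset_index().groupby("date").tail(1).set_index("date")
pd.testing.assert_frame_equal(via_groupby, upserted)
print(upserted)
```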

Does anyone have any ideas for a more straightforward solution?

Recommended answer

One solution is to concatenate df1 with the new rows in df2 (i.e. those whose index value does not appear in df1), then update the values with those from df2.

df = pd.concat([df1, df2[~df2.index.isin(df1.index)]])
df.update(df2)

>>> df
             A   B
2015-10-01  A1  B1
2015-10-02  a1  b1
2015-10-03  a2  b2
2015-10-04  a3  b3
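A self-contained version of the concat + update approach, with the two steps labeled, might look like the sketch below (sample frames assumed as before). Two details of DataFrame.update worth knowing: it modifies the frame in place and returns None, and it only copies non-NA values from df2, so a NaN in df2 leaves the existing df1 value untouched.

```python
import pandas as pd

# Sample frames from the question (assumption: string keys in a 'date' index).
idx1 = pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date")
idx2 = pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date")
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=idx1)
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=idx2)

# "Insert": append only the df2 rows whose key is not already in df1.
df = pd.concat([df1, df2[~df2.index.isin(df1.index)]])

# "Update": overwrite shared keys with df2's non-NA values (in place, returns None).
df.update(df2)
print(df)
```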

Per the suggestion of @chrisb, this can further be simplified as follows:

pd.concat([df1[~df1.index.isin(df2.index)], df2])
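A runnable sketch of the simplified version, again with assumed sample frames: it drops every df1 row whose key also appears in df2 and then appends all of df2. Because whole rows are taken from df2, a NaN in df2 does replace the df1 value here, unlike with update. The df1-only rows come first, so sorting by key afterwards (an optional extra step) may be wanted.

```python
import pandas as pd

# Sample frames from the question (assumption: string keys in a 'date' index).
idx1 = pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date")
idx2 = pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date")
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=idx1)
df2 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}, index=idx2)

# Keep the df1 rows not superseded by df2, then append every df2 row.
upserted = pd.concat([df1[~df1.index.isin(df2.index)], df2])

# Optional: order rows by key rather than by "df1-only rows first, then df2".
upserted = upserted.sort_index()
print(upserted)
```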

Thanks, Chris!
