How to update a dataframe in Pandas Python


Question


I have the following two dataframes in pandas:

DF1:
AuthorID1  AuthorID2  Co-Authored
A1         A2         0
A1         A3         0
A1         A4         0
A2         A3         0

DF2:
AuthorID1  AuthorID2  Co-Authored
A1         A2         5
A2         A3         6
A6         A7         9


Without looping and comparing, I would like to find the AuthorID1/AuthorID2 pairs in DF2 that also exist in DF1 and update DF1's Co-Authored values accordingly. For the two tables above, the result would be:

Resulting Updated DF1:
AuthorID1  AuthorID2  Co-Authored
A1         A2         5
A1         A3         0
A1         A4         0
A2         A3         6


Is there a fast way to do this? DF1 has 7 million rows, so looping and comparing would take forever.


Update: note that the last row of DF2 (A6, A7) should not be part of the update, since that pair does not exist in DF1.

Answer


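You can use update, but note that DataFrame.update matches rows by index label, not by column values. Move the AuthorID1/AuthorID2 pair into the index of both frames first and restore it as columns afterwards:

keys = ['AuthorID1', 'AuthorID2']
df1 = df1.set_index(keys)
df1.update(df2.set_index(keys))
df1 = df1.reset_index()
print (df1)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2          5.0
1        A1        A3          0.0
2        A1        A4          0.0
3        A2        A3          6.0

Because update only implements a left join on df1's index, pairs that exist only in df2 (A6, A7) are ignored. Depending on the pandas version, update may also upcast the integer Co-Authored column to float, as in the output above.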
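A small self-contained sketch with the question's data makes the index alignment visible: a plain df1.update(df2) overwrites rows 0 to 2 by their default labels, including the author columns, while a key-aligned update only changes Co-Authored. The helper name update_by_keys is hypothetical, not part of pandas.

```python
import pandas as pd

# The two frames from the question
df1 = pd.DataFrame({'AuthorID1': ['A1', 'A1', 'A1', 'A2'],
                    'AuthorID2': ['A2', 'A3', 'A4', 'A3'],
                    'Co-Authored': [0, 0, 0, 0]})
df2 = pd.DataFrame({'AuthorID1': ['A1', 'A2', 'A6'],
                    'AuthorID2': ['A2', 'A3', 'A7'],
                    'Co-Authored': [5, 6, 9]})

KEYS = ['AuthorID1', 'AuthorID2']


def update_by_keys(left, right, keys=KEYS):
    """Hypothetical helper: return a new frame built from `left`, updated
    from `right` by matching rows on the `keys` columns instead of the index."""
    out = left.set_index(keys)
    # update() reindexes `right` to `out`'s index (a left join), so pairs
    # that exist only in `right`, such as (A6, A7), are ignored
    out.update(right.set_index(keys))
    return out.reset_index()


# Plain update matches rows on the default index labels 0, 1, 2,
# so it also changes the author columns of rows 1 and 2
naive = df1.copy()
naive.update(df2)

result = update_by_keys(df1, df2)
print(naive)
print(result)
```

The key-aligned result keeps every author pair of df1 in place; only the matched pairs (A1, A2) and (A2, A3) receive new Co-Authored values.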

Example:

import pandas as pd

df1 = pd.DataFrame({'AuthorID1': {0: 'A1', 1: 'A1', 2: 'A1', 3: 'A2'},
                    'AuthorID2': {0: 'A2', 1: 'A3', 2: 'A4', 3: 'A3'},
                    'Co-Authored': {0: 0, 1: 0, 2: 0, 3: 0},
                    'new': {0: 7, 1: 8, 2: 1, 3: 3}})

df2 = pd.DataFrame({'AuthorID1': {0: 'A1', 1: 'A2'},
                    'AuthorID2': {0: 'A2', 1: 'A3'},
                    'Co-Authored': {0: 5, 1: 6}})

print (df1)
  AuthorID1 AuthorID2  Co-Authored  new
0        A1        A2            0    7
1        A1        A3            0    8
2        A1        A4            0    1
3        A2        A3            0    3

print (df2)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2            5
1        A2        A3            6

# update aligns on the index, so match rows on the author pair instead
keys = ['AuthorID1', 'AuthorID2']
df1 = df1.set_index(keys)
df1.update(df2.set_index(keys))
df1 = df1.reset_index()
print (df1)
  AuthorID1 AuthorID2  Co-Authored  new
0        A1        A2          5.0    7
1        A1        A3          0.0    8
2        A1        A4          0.0    1
3        A2        A3          6.0    3

EDIT by comment:


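If you want to filter df2 by df1 explicitly first, compare whole AuthorID1/AuthorID2 pairs by building a MultiIndex from each frame. DataFrame.isin with a DataFrame argument matches element by element on aligned index and column labels, not row by row, so it does not test pair membership:

keys = ['AuthorID1', 'AuthorID2']
mask = pd.MultiIndex.from_frame(df2[keys]).isin(pd.MultiIndex.from_frame(df1[keys]))
df2 = df2[mask]
print (df2)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2            5
1        A2        A3            6

df1 = df1.set_index(keys)
df1.update(df2.set_index(keys))
df1 = df1.reset_index()
print (df1)
  AuthorID1 AuthorID2  Co-Authored
0        A1        A2          5.0
1        A1        A3          0.0
2        A1        A4          0.0
3        A2        A3          6.0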
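If you would rather not move the keys into the index, a left merge on the author pair followed by fillna is a common alternative. This is a different technique from the update call in the answer, sketched here under the assumption that each pair occurs at most once in df2; the '_new' suffix is an arbitrary choice.

```python
import pandas as pd

# The two frames from the question
df1 = pd.DataFrame({'AuthorID1': ['A1', 'A1', 'A1', 'A2'],
                    'AuthorID2': ['A2', 'A3', 'A4', 'A3'],
                    'Co-Authored': [0, 0, 0, 0]})
df2 = pd.DataFrame({'AuthorID1': ['A1', 'A2', 'A6'],
                    'AuthorID2': ['A2', 'A3', 'A7'],
                    'Co-Authored': [5, 6, 9]})

keys = ['AuthorID1', 'AuthorID2']

# A left merge keeps every row of df1 in its original order and drops pairs
# that exist only in df2 (A6/A7). Duplicate pairs in df2 would duplicate
# rows of df1, hence the at-most-once assumption.
merged = df1.merge(df2, on=keys, how='left', suffixes=('', '_new'))

# Use df2's value where a pair matched, otherwise keep df1's value,
# then restore the original integer dtype (no NaN remains after fillna)
merged['Co-Authored'] = (merged['Co-Authored_new']
                         .fillna(merged['Co-Authored'])
                         .astype(df1['Co-Authored'].dtype))
result = merged.drop(columns='Co-Authored_new')
print(result)
```

Like update, this runs as whole-column operations with no Python-level loop over the rows, and unmatched rows keep their original Co-Authored value.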

