pandas - Given two dataframes, remove differences

Problem description

I have the following two dataframes: df1 and df2. For each user, I want to remove the rows of df1 whose itemid does not appear for that user in df2.

df1

userid itemid
  1       1
  1       3
  1       4
  2       1
  2       2
  2       3
  2       4


df2

userid itemid
  1       1
  1       2
  1       3
  1       4
  2       1
  2       2
  2       3

Since userid=1 in df1 has itemids 1,3,4 and userid=1 in df2 has itemids 1,2,3,4, I don't have to remove any rows from df1 for that user. However, for userid=2, df1 has itemids 1,2,3,4, while df2 has itemids 1,2,3. In this case, I want to remove the last row because itemid=4 is not in df2 for userid=2. Therefore, the answer should be the following:

new_df1

userid itemid
  1       1
  1       3
  1       4
  2       1
  2       2
  2       3

Please note that df2 shouldn't change. I want only df1 to change.
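
To make the question reproducible, the two tables above can be built as follows. This is a minimal sketch: the values are copied from the tables in the question, and the integer dtype is an assumption.

```python
import pandas as pd

# Sample data copied from the df1 and df2 tables in the question.
df1 = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2, 2, 2],
    "itemid": [1, 3, 4, 1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "userid": [1, 1, 1, 1, 2, 2, 2],
    "itemid": [1, 2, 3, 4, 1, 2, 3],
})

print(df1)
print(df2)
```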

Recommended answer

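Use merge with a left join and indicator=True, which adds a helper column _merge marking whether each row was found in both dataframes or only in the left one. Then filter with query and remove the helper column with drop: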

print (pd.merge(df1, df2, how='left', indicator=True))
   userid  itemid     _merge
0       1       1       both
1       1       3       both
2       1       4       both
3       2       1       both
4       2       2       both
5       2       3       both
6       2       4  left_only

df = (pd.merge(df1, df2, how='left', indicator=True)
        .query("_merge != 'left_only'")
        .drop('_merge', axis=1))
print (df)
   userid  itemid
0       1       1
1       1       3
2       1       4
3       2       1
4       2       2
5       2       3

Alternative solution with boolean indexing:

df = pd.merge(df1, df2, how='left', indicator=True)
df = df[df['_merge'] != 'left_only'].drop('_merge',axis=1)
print (df)
   userid  itemid
0       1       1
1       1       3
2       1       4
3       2       1
4       2       2
5       2       3
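
For a version that can be run end to end, the merge-based approach above might be wrapped as in the sketch below. The helper name keep_rows_found_in is hypothetical and not part of the original answer. The column names userid and itemid are taken from the question, and the drop_duplicates step is an added assumption: without it, duplicate rows in df2 would cause a left merge to repeat the matching rows of df1.

```python
import pandas as pd


def keep_rows_found_in(left, right):
    """Return the rows of `left` whose (userid, itemid) pair also occurs in `right`.

    Hypothetical helper; neither input dataframe is modified.
    """
    # Assumption: drop duplicate pairs from `right` so the left merge
    # cannot multiply rows of `left`. drop_duplicates returns a new object.
    right_unique = right[["userid", "itemid"]].drop_duplicates()

    # indicator=True adds a `_merge` column with 'both' or 'left_only'
    # for each row of a left join.
    merged = pd.merge(left, right_unique, how="left",
                      on=["userid", "itemid"], indicator=True)

    # Keep only rows found in both frames, then drop the helper column.
    return (
        merged[merged["_merge"] == "both"]
        .drop(columns="_merge")
        .reset_index(drop=True)
    )


# Sample data from the question.
df1 = pd.DataFrame({"userid": [1, 1, 1, 2, 2, 2, 2],
                    "itemid": [1, 3, 4, 1, 2, 3, 4]})
df2 = pd.DataFrame({"userid": [1, 1, 1, 1, 2, 2, 2],
                    "itemid": [1, 2, 3, 4, 1, 2, 3]})

df2_before = df2.copy()  # kept only to confirm df2 is left unchanged
new_df1 = keep_rows_found_in(df1, df2)
print(new_df1)
```

Because merge builds a new dataframe, the result carries a fresh 0-based index rather than the original row labels of df1, and df2 itself is never altered.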
