pandas - Given two dataframes, remove differences
Problem description
I have the following two dataframes: df1 and df2. For each user, I want to remove the rows of df1 that contain itemids which do not appear in df2 for that same user.
df1
userid itemid
1 1
1 3
1 4
2 1
2 2
2 3
2 4
df2
userid itemid
1 1
1 2
1 3
1 4
2 1
2 2
2 3
Since userid=1 in df1 has itemids 1,3,4 and userid=1 in df2 has itemids 1,2,3,4, I don't have to remove any of that user's rows from df1. However, for userid=2, df1 has itemids 1,2,3,4, while df2 has itemids 1,2,3. In this case, I want to remove the last row, because itemid=4 does not appear in df2 for userid=2. Therefore, the answer should be the following:
new_df1
userid itemid
1 1
1 3
1 4
2 1
2 2
2 3
Please note that df2 shouldn't change. I want only df1 to change.
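To make the answer below reproducible, the two example frames can be built directly. This is a minimal sketch that assumes both columns hold plain integers, as the tables suggest.

```python
import pandas as pd

# df1: the frame to be filtered (userid 2 has an extra itemid 4)
df1 = pd.DataFrame({'userid': [1, 1, 1, 2, 2, 2, 2],
                    'itemid': [1, 3, 4, 1, 2, 3, 4]})

# df2: the reference frame; it is only read, never modified
df2 = pd.DataFrame({'userid': [1, 1, 1, 1, 2, 2, 2],
                    'itemid': [1, 2, 3, 4, 1, 2, 3]})

print(df1)
print(df2)
```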
Recommended answer
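Use a left `merge` with `indicator=True`, which adds a helper column `_merge` marking the rows that exist only in df1 as `left_only`. Then filter those rows out with `query` and remove the helper column with `drop`: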
import pandas as pd

print(pd.merge(df1, df2, how='left', indicator=True))
userid itemid _merge
0 1 1 both
1 1 3 both
2 1 4 both
3 2 1 both
4 2 2 both
5 2 3 both
6 2 4 left_only
# Wrap the method chain in parentheses so it can span several lines
df = (pd.merge(df1, df2, how='left', indicator=True)
        .query("_merge != 'left_only'")
        .drop('_merge', axis=1))
print(df)
userid itemid
0 1 1
1 1 3
2 1 4
3 2 1
4 2 2
5 2 3
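One caveat, not raised in the original answer: a left merge emits one output row per matching row of df2, so if df2 ever contains a repeated (userid, itemid) pair, the matching row of df1 is duplicated. The sketch below shows the effect and guards against it by de-duplicating the key columns of df2 before merging; the repeated row in this df2 is hypothetical, added only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'userid': [1, 1, 1, 2, 2, 2, 2],
                    'itemid': [1, 3, 4, 1, 2, 3, 4]})
# Hypothetical df2 with the pair (2, 1) repeated, to show the pitfall
df2 = pd.DataFrame({'userid': [1, 1, 1, 1, 2, 2, 2, 2],
                    'itemid': [1, 2, 3, 4, 1, 1, 2, 3]})

keys = ['userid', 'itemid']

# Naive merge: row (2, 1) of df1 matches twice and appears twice
naive = (pd.merge(df1, df2, how='left', indicator=True)
           .query("_merge != 'left_only'")
           .drop('_merge', axis=1))

# Dropping duplicate key pairs in df2 keeps the result one-to-one with df1
safe = (pd.merge(df1, df2[keys].drop_duplicates(), on=keys,
                 how='left', indicator=True)
          .query("_merge != 'left_only'")
          .drop('_merge', axis=1))

print(len(naive), len(safe))  # 7 6
```

With the original df2 from the question, which has no repeated pairs, both variants give the same six rows.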
Alternative solution with boolean indexing:
df = pd.merge(df1, df2, how='left', indicator=True)
df = df[df['_merge'] != 'left_only'].drop('_merge', axis=1)
print(df)
userid itemid
0 1 1
1 1 3
2 1 4
3 2 1
4 2 2
5 2 3
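A different technique, not part of the original answer, avoids the helper column entirely: treat each (userid, itemid) pair as a key and test membership with `MultiIndex.isin`. Unlike the merge-based versions, this keeps df1's original index labels instead of the fresh index produced by the merge. A minimal sketch, assuming pandas 0.24 or later (for `MultiIndex.from_frame`):

```python
import pandas as pd

df1 = pd.DataFrame({'userid': [1, 1, 1, 2, 2, 2, 2],
                    'itemid': [1, 3, 4, 1, 2, 3, 4]})
df2 = pd.DataFrame({'userid': [1, 1, 1, 1, 2, 2, 2],
                    'itemid': [1, 2, 3, 4, 1, 2, 3]})

keys = ['userid', 'itemid']

# Turn the key columns of each frame into a MultiIndex of (userid, itemid) pairs
pairs1 = pd.MultiIndex.from_frame(df1[keys])
pairs2 = pd.MultiIndex.from_frame(df2[keys])

# Boolean mask: True where df1's pair also occurs in df2; df2 is left untouched
mask = pairs1.isin(pairs2)
new_df1 = df1[mask]
print(new_df1)
```

Because it is a pure membership test, repeated pairs in df2 cannot duplicate rows of df1.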