Python Pandas: Merge or Filter DataFrame by Another. Is there a Better Way?
Question
One situation I sometimes encounter is that I have two dataframes (df1, df2) and I want to create a new dataframe (df3) based on the intersection of multiple columns between df1 and df2.
For example, I want to create df3 by filtering df1 on the columns Campaign and Group.
import pandas as pd
df1 = pd.DataFrame({'Campaign':['Campaign 1', 'Campaign 2', 'Campaign 3', 'Campaign 3', 'Campaign 4'], 'Group':['Some group', 'Arbitrary Group', 'Group 1', 'Group 2', 'Done Group'], 'Metric':[245,91,292,373,32]}, columns=['Campaign', 'Group', 'Metric'])
df2 = pd.DataFrame({'Campaign':['Campaign 3', 'Campaign 3'], 'Group':['Group 1', 'Group 2'], 'Metric':[23, 456]}, columns=['Campaign', 'Group', 'Metric'])
df1
     Campaign            Group  Metric
0  Campaign 1       Some group     245
1  Campaign 2  Arbitrary Group      91
2  Campaign 3          Group 1     292
3  Campaign 3          Group 2     373
4  Campaign 4       Done Group      32
df2
     Campaign    Group  Metric
0  Campaign 3  Group 1      23
1  Campaign 3  Group 2     456
I know I can do this with a merge ...
df3 = df1.merge(df2, how='inner', on=['Campaign', 'Group'], suffixes=('','_del'))
# df3
     Campaign    Group  Metric  Metric_del
0  Campaign 3  Group 1     292          23
1  Campaign 3  Group 2     373         456
... but then I have to figure out how to drop the columns that end with _del. I guess this:
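# DataFrame.select was removed in pandas 1.0; a boolean mask over the column names does the same job
df3.loc[:, ~df3.columns.str.endswith('_del')]
## The result I'm going for, but it required a merge and then a column selection (2 steps)
     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373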
Questions
What I'm mainly interested in is returning df1, simply filtered on df2's Campaign|Group values.
- Is there a better way to return df1 without resorting to merge?
- Is there a way to merge but NOT return df2's columns, so that only df1's columns come back?
Recommended answer
Assuming that df1 and df2 have exactly the same columns, you can first set the join-key columns as the index, then use df1.reindex(df2.index) followed by .dropna() to produce the intersection.
df3 = df1.set_index(['Campaign', 'Group'])
df4 = df2.set_index(['Campaign', 'Group'])
# reindex first and dropna will produce the intersection
df3.reindex(df4.index).dropna(how='all').reset_index()
     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
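To see why the .dropna() step matters, here is a minimal sketch in which df2 contains a key pair that df1 lacks. The extra ('Campaign 9', 'Group X') row and the names left, right, reindexed and result are made up for illustration and are not part of the original question.

```python
import pandas as pd

keys = ['Campaign', 'Group']
df1 = pd.DataFrame({
    'Campaign': ['Campaign 1', 'Campaign 3', 'Campaign 3'],
    'Group': ['Some group', 'Group 1', 'Group 2'],
    'Metric': [245, 292, 373],
})
# Hypothetical df2: the last key pair does not occur in df1
df2 = pd.DataFrame({
    'Campaign': ['Campaign 3', 'Campaign 3', 'Campaign 9'],
    'Group': ['Group 1', 'Group 2', 'Group X'],
    'Metric': [23, 456, 7],
})

left = df1.set_index(keys)
right = df2.set_index(keys)

# reindex keeps every key from df2; keys missing from df1 become all-NaN rows
reindexed = left.reindex(right.index)
# dropping the all-NaN rows leaves only the keys present in both frames
result = reindexed.dropna(how='all').reset_index()
```

One side effect worth knowing: as soon as reindex introduces a NaN, the integer Metric column is converted to float, and it stays float after dropna.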
When the keys are not unique, use .isin.
# create some duplicated keys and values
# (DataFrame.append was removed in pandas 2.0, so use pd.concat instead)
df3 = pd.concat([df3, df3])
df4 = pd.concat([df4, df4])
# isin
df3[df3.index.isin(df4.index)].reset_index()
     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
2  Campaign 3  Group 1     292
3  Campaign 3  Group 2     373
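On the second question (merging without getting df2's columns back), one possible sketch, which is not part of the answer above, is to pass only df2's key columns to merge. The same block also shows the .isin filter applied directly to df1 without permanently changing its index. The names keys, merged, mask and filtered are introduced here for illustration.

```python
import pandas as pd

keys = ['Campaign', 'Group']
df1 = pd.DataFrame({
    'Campaign': ['Campaign 1', 'Campaign 2', 'Campaign 3', 'Campaign 3', 'Campaign 4'],
    'Group': ['Some group', 'Arbitrary Group', 'Group 1', 'Group 2', 'Done Group'],
    'Metric': [245, 91, 292, 373, 32],
})
df2 = pd.DataFrame({
    'Campaign': ['Campaign 3', 'Campaign 3'],
    'Group': ['Group 1', 'Group 2'],
    'Metric': [23, 456],
})

# Option 1: merge against df2's key columns only, so no suffixed columns appear.
# drop_duplicates keeps repeated keys in df2 from multiplying df1's rows.
merged = df1.merge(df2[keys].drop_duplicates(), on=keys, how='inner')

# Option 2: build a boolean mask from the key pairs and filter df1 directly.
# This keeps df1's original row labels (2 and 3 here).
mask = df1.set_index(keys).index.isin(df2.set_index(keys).index)
filtered = df1[mask]
```

Both options return only df1's columns. The merge resets the row labels to 0..n-1, while the mask keeps df1's original labels, which may matter if the result is later aligned back to df1.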