Python Pandas: Merge or Filter DataFrame by Another. Is there a Better Way?

Problem description

One situation I sometimes encounter is, I have two dataframes (df1, df2) and I want to create a new dataframe (df3) based on the intersection of multiple columns between df1 and df2.

For example, I want to create df3 by filtering df1 by columns Campaign and Group.

import pandas as pd
df1 = pd.DataFrame({'Campaign':['Campaign 1', 'Campaign 2', 'Campaign 3', 'Campaign 3', 'Campaign 4'], 'Group':['Some group', 'Arbitrary Group', 'Group 1', 'Group 2', 'Done Group'], 'Metric':[245,91,292,373,32]}, columns=['Campaign', 'Group', 'Metric'])
df2 = pd.DataFrame({'Campaign':['Campaign 3', 'Campaign 3'], 'Group':['Group 1', 'Group 2'], 'Metric':[23, 456]}, columns=['Campaign', 'Group', 'Metric'])

df1

     Campaign            Group  Metric
0  Campaign 1       Some group     245
1  Campaign 2  Arbitrary Group      91
2  Campaign 3          Group 1     292
3  Campaign 3          Group 2     373
4  Campaign 4       Done Group      32

df2

     Campaign    Group  Metric
0  Campaign 3  Group 1      23
1  Campaign 3  Group 2     456

I know I can do this with a merge ...

df3 = df1.merge(df2, how='inner', on=['Campaign', 'Group'], suffixes=('','_del'))
#df3
     Campaign    Group  Metric  Metric_del
0  Campaign 3  Group 1     292          23
1  Campaign 3  Group 2     373         456

but then I have to figure out how to drop columns that end with _del. I guess this:

df3.loc[:, ~df3.columns.str.endswith('_del')]
## The result I'm going for, but it requires a merge and then a column filter (2 steps)
     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373

Question

What I'm mainly interested in is returning df1 that's simply filtered on df2's Campaign|Group values.

  1. Is there a better way to return df1 without resorting to merge?

  2. Is there a way to merge but return only df1's columns, without bringing df2's columns into the result?

Recommended answer

Assuming that df1 and df2 have exactly the same columns, you can first set the join-key columns as the index, then use df1.reindex(df2.index) followed by .dropna() to produce the intersection.

df3 = df1.set_index(['Campaign', 'Group'])
df4 = df2.set_index(['Campaign', 'Group'])
# reindex first and dropna will produce the intersection
df3.reindex(df4.index).dropna(how='all').reset_index()

     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
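The .dropna() step only matters when df2 contains keys that df1 lacks. The following is a minimal sketch of that case, not part of the original answer; the key ('Campaign 9', 'Group X') and the shortened frames are hypothetical data chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Campaign': ['Campaign 1', 'Campaign 3', 'Campaign 3'],
    'Group': ['Some group', 'Group 1', 'Group 2'],
    'Metric': [245, 292, 373],
})
# Hypothetical df2: its second key does not exist in df1.
df2 = pd.DataFrame({
    'Campaign': ['Campaign 3', 'Campaign 9'],
    'Group': ['Group 1', 'Group X'],
    'Metric': [23, 456],
})

keys = ['Campaign', 'Group']
left = df1.set_index(keys)
right = df2.set_index(keys)

# reindex returns one row per key in df2; keys missing from df1 are filled with NaN
reindexed = left.reindex(right.index)

# dropping the all-NaN rows leaves only the keys present in both frames
result = reindexed.dropna(how='all').reset_index()
print(result)
```

Two caveats apply to this approach: introducing NaN upcasts an integer Metric column to float, and dropna(how='all') would also discard any df1 row whose values were all missing to begin with.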

When the keys are not unique, use .isin instead.

# create some duplicated keys and values
df3 = pd.concat([df3, df3])
df4 = pd.concat([df4, df4])

# isin
df3[df3.index.isin(df4.index)].reset_index()

     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
2  Campaign 3  Group 1     292
3  Campaign 3  Group 2     373
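As for the second question (merging while keeping only df1's columns), one possible approach is to merge against just the key columns of df2, so that no suffixed columns are created. This is a sketch rather than part of the original answer; the drop_duplicates() call is added on the assumption that each df1 row should appear at most once even if df2 repeats a key.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Campaign': ['Campaign 1', 'Campaign 2', 'Campaign 3', 'Campaign 3', 'Campaign 4'],
    'Group': ['Some group', 'Arbitrary Group', 'Group 1', 'Group 2', 'Done Group'],
    'Metric': [245, 91, 292, 373, 32],
})
df2 = pd.DataFrame({
    'Campaign': ['Campaign 3', 'Campaign 3'],
    'Group': ['Group 1', 'Group 2'],
    'Metric': [23, 456],
})

keys = ['Campaign', 'Group']

# Pass only df2's key columns to merge, so df2's Metric never enters the result.
# drop_duplicates() prevents repeated keys in df2 from multiplying df1's rows.
df3 = df1.merge(df2[keys].drop_duplicates(), how='inner', on=keys)
print(df3)
```

Because the right-hand frame carries nothing but the keys, the result has exactly df1's columns and there is nothing to drop afterwards.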
