Python Pandas:在多个条件下合并数据框 [英] Python Pandas: Merging data frames on multiple conditions
本文介绍了Python Pandas:在多个条件下合并数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我希望合并在多种情况下通过sql获取的数据帧.
I wish to merge data frames as fetched via sql under multiple condition.
- df1:第一个df包含客户ID,群集ID和客户区域ID.
- 第二个df包含投诉ID,注册号.
df1和df2如下所示:
The df1 and df2 are shown below:
df1
Customer ID Cluster ID Customer Zone ID
CUS1001.A CUS1001.X CUS1000
CUS1001.B CUS1001.X CUS1000
CUS1001.C CUS1001.X CUS1000
CUS1001.D CUS1001.X CUS1000
CUS1001.E CUS1001.X CUS1000
CUS2001.A CUS2001.X CUS2000
df2:
Complain ID RegistrationNumber Status
CUS3501.A 99231 open
CUS1001.B 21340 open
CUS1001.X 32100 open
我希望在满足以下条件的情况下合并这两个数据框:
I wish to merge these two data frame with following condition:
if(Complain ID == Customer ID):
Merge on Customer ID
Elif(Complain ID == Cluster ID):
Merge on Customer ID
Elif (Complain ID == Customer Zone ID):
Merge on Customer ID
Else:
Merge empty row.
最终结果应如下所示:
Customer ID Cluster ID Customer Zone ID Complain ID Regi ID Status
CUS1001.A CUS1001.X CUS1000 CUS1001.X 32100 open
CUS1001.B CUS1001.X CUS1000 CUS1001.B 21340 open
CUS1001.C CUS1001.X CUS1000 CUS1001.X 32100 open
. . . . . .
. . . . . .
CUS2001.A CUS2001.X CUS2000 0 0 0
请帮助!
推荐答案
尝试使用... ...使用pandas
:melt
,merge
和concat
Try this ...using pandas
: melt
, merge
and concat
df=pd.melt(df1)
df=df.merge(df2,left_on='value',right_on='Complain ID',how='left')
df['number']=df.groupby('variable').cumcount()
df=df.groupby('number').bfill()
Target=pd.concat([df1,df.iloc[:5,2:6]],axis=1).fillna(0).drop('number',axis=1)
Target
Out[39]:
Customer ID Cluster ID Customer Zone ID Complain ID RegistrationNumber \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X 32100.0
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B 21340.0
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X 32100.0
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X 32100.0
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X 32100.0
5 CUS2001.A CUS2001.X CUS2000 0 0.0
Status
0 open
1 open
2 open
3 open
4 open
5 0
更新
通过使用numpy的intersect1d
,我个人比以前更喜欢这种方法.
Update
By using numpy's intersect1d
, Personally I like this approach most than the previous one .
df1.MatchId=[np.intersect1d(x,df2.ComplainID.values) for x in df1[['CustomerID','ClusterID']].values]
df1.MatchId=df1.MatchId.apply(pd.Series)
df1
Out[307]:
CustomerID ClusterID CustomerZoneID MatchId
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN
df1.merge(df2,left_on='MatchId',right_on='ComplainID',how='left')
Out[311]:
CustomerID ClusterID CustomerZoneID MatchId ComplainID \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN NaN
RegistrationNumber Status
0 32100.0 open
1 21340.0 open
2 32100.0 open
3 32100.0 open
4 32100.0 open
5 NaN NaN
这篇关于Python Pandas:在多个条件下合并数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文