Create multiple columns based on multiple column conditions from another dataframe

Problem description

I have two dataframes derived from CSV files. df1:

 |BID    |Datetime           |TrId |Code|LineNumber|Vol  |Grade      |PId
0|1002867|2019-08-19 01:27:53|1459 |f   |10        |33.88|Vd         |4  
1|1002867|2019-08-19 01:39:05|1460 |f   |10        |18.13|EE         |5  
2|1002867|2019-08-19 01:39:55|1461 |f   |10        |21.8 |Ad         |9  
3|1002867|2019-08-19 01:39:55|1461 |f   |20        |500  |Vd         |10 
4|1002147|2019-08-19 01:26:21|2764 |f   |10        |33.86|V9         |3  
5|1002147|2019-10-19 01:31:57|2765 |f   |10        |3.48 |V9         |2  
9|1001257|2019-08-19 01:49:54|11524|f   |10        |19.93|Ul         |5  

df2

 |sId  |BID    |StartDateTime      |EndDateTime        
0|10007|1002867|2019-07-26 05:11:05|2019-10-05 21:50:55
1|10006|1002147|2019-08-18 05:11:05|2019-10-05 21:50:55
2|10006|1002147|2019-10-05 21:50:55|2019-11-06 21:50:28
3|10006|1002147|2019-10-06 21:50:28|2019-10-08 03:56:20
4|10006|1002147|2019-10-08 03:56:20|2019-10-09 03:50:35
5|10006|1002147|2019-10-09 03:50:35|2019-10-10 05:12:30
6|10006|1002147|2019-10-10 05:12:30|2019-10-11 05:12:38
7|10009|1002348|2019-09-26 04:21:12|2019-10-06 04:16:00
8|10009|1002348|2019-10-06 04:16:00|2019-10-07 04:11:38
9|10009|1002348|2019-10-07 04:11:38|2019-10-08 04:13:12

Note that the two dataframes do not have the same length.

I want to add the columns sId, StartDateTime and EndDateTime from df2 to df1 only when the following conditions hold:

if df1.BID == df2.BID and df1.Datetime lies between df2.StartDateTime and df2.EndDateTime
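
One detail the condition leaves open is whether the interval ends are inclusive. In df2, row 1 ends at exactly 2019-10-05 21:50:55, the same moment row 2 starts, so a df1 row with BID 1002147 and that Datetime (none of the sample rows has it) would satisfy the condition for both rows. A small sketch, using only those two df2 rows and a hypothetical timestamp, comparing a closed and a half-open interval test:

```python
import pandas as pd

# Rows 1 and 2 of df2 from the question: the first interval ends at exactly
# the moment the second one starts
intervals = pd.DataFrame({
    "sId": [10006, 10006],
    "StartDateTime": pd.to_datetime(["2019-08-18 05:11:05", "2019-10-05 21:50:55"]),
    "EndDateTime": pd.to_datetime(["2019-10-05 21:50:55", "2019-11-06 21:50:28"]),
})

# Hypothetical df1 timestamp lying exactly on the shared boundary
ts = pd.Timestamp("2019-10-05 21:50:55")

# Closed intervals: Start <= ts <= End (what Series.between does by default)
closed = (intervals["StartDateTime"] <= ts) & (ts <= intervals["EndDateTime"])
# Half-open intervals: Start <= ts < End (Series.between(..., inclusive="left"))
half_open = (intervals["StartDateTime"] <= ts) & (ts < intervals["EndDateTime"])

print(closed.tolist())     # [True, True]  -> ts matches both intervals
print(half_open.tolist())  # [False, True] -> ts matches only the second one
```

Series.between, which both the attempted code and the answer use, is inclusive on both ends by default; passing inclusive="left" (pandas 1.3 and later) gives the half-open behaviour, so each timestamp belongs to at most one of these back-to-back intervals.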

My result should look like this:

 |BID    |Datetime           |TrId |Code|LineNumber|Vol  |Grade      |PId|sId  |StartDateTime      |EndDateTime        
0|1002867|2019-08-19 01:27:53|1459 |f   |10        |33.88|Vd         |4  |10007|2019-07-26 05:11:05|2019-10-05 21:50:55
1|1002867|2019-08-19 01:39:05|1460 |f   |10        |18.13|EE         |5  |10007|2019-07-26 05:11:05|2019-10-05 21:50:55
2|1002867|2019-08-19 01:39:55|1461 |f   |10        |21.8 |Ad         |9  |10007|2019-07-26 05:11:05|2019-10-05 21:50:55
3|1002867|2019-08-19 01:39:55|1461 |f   |20        |500  |Vd         |10 |10007|2019-07-26 05:11:05|2019-10-05 21:50:55
4|1002147|2019-08-19 01:26:21|2764 |f   |10        |33.86|V9         |3  |10006|2019-08-18 05:11:05|2019-10-05 21:50:55
5|1002147|2019-10-19 01:31:57|2765 |f   |10        |3.48 |V9         |2  |10006|2019-10-05 21:50:55|2019-11-06 21:50:28
9|1001257|2019-08-19 01:49:54|11524|f   |10        |19.93|Ul         |5  |NA   |NA                 |NA                 

I have tried using the method from this post: Create column based on multiple column conditions from another dataframe

However, I only get the site ID (sId) in my result, not StartDateTime and EndDateTime. How can I get these columns in my result?

Code I tried:

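import pandas as pd

# For each sId, mark the df1 rows whose BID matches one of that site's
# intervals and whose Datetime lies inside it (inclusive on both ends)
cols = ['BID', 'StartDateTime', 'EndDateTime']
for key, grp in df2.groupby('sId'):
    masks = (df1['BID'].eq(bid) & df1['Datetime'].between(start, end)
             for bid, start, end in grp[cols].itertuples(index=False))
    # Only the sId column is assigned here
    df1.loc[pd.concat(masks, axis=1).any(axis=1), 'sId'] = key

df1['sId'] = df1['sId'].fillna('NA')
print(df1)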

It only prints:

 |BID    |Datetime           |TrId |Code|LineNumber|Vol  |Grade      |PId|sId  
0|1002867|2019-08-19 01:27:53|1459 |f   |10        |33.88|Vd         |4  |10007
1|1002867|2019-08-19 01:39:05|1460 |f   |10        |18.13|EE         |5  |10007
2|1002867|2019-08-19 01:39:55|1461 |f   |10        |21.8 |Ad         |9  |10007
3|1002867|2019-08-19 01:39:55|1461 |f   |20        |500  |Vd         |10 |10007
4|1002147|2019-08-19 01:26:21|2764 |f   |10        |33.86|V9         |3  |10006
5|1002147|2019-10-19 01:31:57|2765 |f   |10        |3.48 |V9         |2  |10006
9|1001257|2019-08-19 01:49:54|11524|f   |10        |19.93|Ul         |5  |NA   
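
The attempted loop only ever assigns sId, which is why the other two columns never appear. A minimal sketch of the same idea that iterates over individual df2 rows with DataFrame.itertuples, so that each match can also copy its StartDateTime and EndDateTime. It assumes the datetime columns have been parsed with pd.to_datetime and reproduces only the BID, Datetime and TrId columns of df1; the names sid, start and end are illustrative.

```python
import pandas as pd

# Sample data from the question (df1 reduced to the columns the condition uses)
df1 = pd.DataFrame(
    {
        "BID": [1002867, 1002867, 1002867, 1002867, 1002147, 1002147, 1001257],
        "Datetime": pd.to_datetime([
            "2019-08-19 01:27:53", "2019-08-19 01:39:05", "2019-08-19 01:39:55",
            "2019-08-19 01:39:55", "2019-08-19 01:26:21", "2019-10-19 01:31:57",
            "2019-08-19 01:49:54",
        ]),
        "TrId": [1459, 1460, 1461, 1461, 2764, 2765, 11524],
    },
    index=[0, 1, 2, 3, 4, 5, 9],
)
df2 = pd.DataFrame({
    "sId": [10007] + [10006] * 6 + [10009] * 3,
    "BID": [1002867] + [1002147] * 6 + [1002348] * 3,
    "StartDateTime": pd.to_datetime([
        "2019-07-26 05:11:05", "2019-08-18 05:11:05", "2019-10-05 21:50:55",
        "2019-10-06 21:50:28", "2019-10-08 03:56:20", "2019-10-09 03:50:35",
        "2019-10-10 05:12:30", "2019-09-26 04:21:12", "2019-10-06 04:16:00",
        "2019-10-07 04:11:38",
    ]),
    "EndDateTime": pd.to_datetime([
        "2019-10-05 21:50:55", "2019-10-05 21:50:55", "2019-11-06 21:50:28",
        "2019-10-08 03:56:20", "2019-10-09 03:50:35", "2019-10-10 05:12:30",
        "2019-10-11 05:12:38", "2019-10-06 04:16:00", "2019-10-07 04:11:38",
        "2019-10-08 04:13:12",
    ]),
})

# One output Series per new column, initialised to "missing"
sid = pd.Series(pd.NA, index=df1.index, dtype="Int64")
start = pd.Series(pd.NaT, index=df1.index, dtype="datetime64[ns]")
end = pd.Series(pd.NaT, index=df1.index, dtype="datetime64[ns]")

for row in df2.itertuples(index=False):
    # df1 rows with the same BID whose Datetime lies in this interval (inclusive)
    hit = df1["BID"].eq(row.BID) & df1["Datetime"].between(row.StartDateTime, row.EndDateTime)
    sid[hit] = row.sId
    start[hit] = row.StartDateTime
    end[hit] = row.EndDateTime

result = df1.assign(sId=sid, StartDateTime=start, EndDateTime=end)
print(result)
```

Because later df2 rows overwrite earlier ones, a df1 row that falls inside several intervals ends up with the last matching one. The loop runs once per df2 row, so on large frames the merge-based approach in the answer is likely to be faster.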

Recommended answer

Assuming that 'sId' in df2 is always filled with a value, the code below produces the desired rows and columns for the sample data:

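import pandas as pd
# Left-join df2 on BID, then keep in-interval pairs and rows with no df2 match (NaN sId)
df3 = pd.merge(df1, df2, on='BID', how='left')
result = df3[df3['Datetime'].between(df3.StartDateTime, df3.EndDateTime) | df3.sId.isna()]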
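
The filter in this answer has a limitation worth knowing: df3.sId.isna() only keeps df1 rows whose BID does not appear in df2 at all. A df1 row whose BID is in df2 but whose Datetime falls inside none of that BID's intervals (for example a BID 1002147 row dated before 2019-08-18 05:11:05) is dropped instead of being kept with NA, and the merge also replaces df1's index with a fresh one. A minimal sketch of a variant that keeps every df1 row and its original index, assuming the datetime columns are parsed with pd.to_datetime and reproducing only the BID, Datetime and TrId columns of df1; the intermediate names pairs and matches are illustrative.

```python
import pandas as pd

# Sample data from the question (df1 reduced to the columns the condition uses)
df1 = pd.DataFrame(
    {
        "BID": [1002867, 1002867, 1002867, 1002867, 1002147, 1002147, 1001257],
        "Datetime": pd.to_datetime([
            "2019-08-19 01:27:53", "2019-08-19 01:39:05", "2019-08-19 01:39:55",
            "2019-08-19 01:39:55", "2019-08-19 01:26:21", "2019-10-19 01:31:57",
            "2019-08-19 01:49:54",
        ]),
        "TrId": [1459, 1460, 1461, 1461, 2764, 2765, 11524],
    },
    index=[0, 1, 2, 3, 4, 5, 9],
)
df2 = pd.DataFrame({
    "sId": [10007] + [10006] * 6 + [10009] * 3,
    "BID": [1002867] + [1002147] * 6 + [1002348] * 3,
    "StartDateTime": pd.to_datetime([
        "2019-07-26 05:11:05", "2019-08-18 05:11:05", "2019-10-05 21:50:55",
        "2019-10-06 21:50:28", "2019-10-08 03:56:20", "2019-10-09 03:50:35",
        "2019-10-10 05:12:30", "2019-09-26 04:21:12", "2019-10-06 04:16:00",
        "2019-10-07 04:11:38",
    ]),
    "EndDateTime": pd.to_datetime([
        "2019-10-05 21:50:55", "2019-10-05 21:50:55", "2019-11-06 21:50:28",
        "2019-10-08 03:56:20", "2019-10-09 03:50:35", "2019-10-10 05:12:30",
        "2019-10-11 05:12:38", "2019-10-06 04:16:00", "2019-10-07 04:11:38",
        "2019-10-08 04:13:12",
    ]),
})

# 1. Pair each df1 row with every df2 interval for the same BID, keeping the
#    original df1 index in a column named "index" (reset_index's default name)
pairs = df1.reset_index().merge(df2, on="BID", how="inner")

# 2. Keep only the pairs whose Datetime falls inside the interval (inclusive)
inside = pairs["Datetime"].between(pairs["StartDateTime"], pairs["EndDateTime"])
matches = pairs.loc[inside, ["index", "sId", "StartDateTime", "EndDateTime"]].set_index("index")

# 3. Attach the matches back onto df1 by index; rows without a match get NaN/NaT
result = df1.join(matches, how="left")
print(result)
```

If a df1 row falls inside more than one interval, matches contains a repeated index label and the join duplicates that df1 row; filtering with matches[~matches.index.duplicated()] before the join keeps only the first matching interval per row.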
