Append a column from one df to another based on the date column on both dfs - pandas


Problem description

I have two dfs as shown below.

df1:

Date                t_factor     
2020-02-01             5             
2020-02-02             23              
2020-02-03             14           
2020-02-04             23
2020-02-05             23  
2020-02-06             23          
2020-02-07             30            
2020-02-08             29            
2020-02-09             100
2020-02-10             38
2020-02-11             38               
2020-02-12             38                    
2020-02-13             70           
2020-02-14             70 
2020-02-15             38               
2020-02-16             38                    
2020-02-17             70           
2020-02-18             70 
2020-02-19             38               
2020-02-20             38                    
2020-02-21             70           
2020-02-22             70 
2020-02-23             38               
2020-02-24             38                    
2020-02-25             70           
2020-02-26             70 
2020-02-27             70 

df2:

From                to                   plan          score
2020-02-03          2020-02-05           start         20
2020-02-07          2020-02-08           foundation    25
2020-02-10          2020-02-12           learn         10
2020-02-14          2020-02-16           practice      20
2020-02-15          2020-02-21           exam          30
2020-02-20          2020-02-23           test          10
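
To reproduce the question, the two tables above could be built roughly as follows. This is only a sketch and not part of the original post: it assumes the date columns should be parsed to datetime64 (pd.merge_asof does not accept plain strings as keys), and it copies the values from the tables by hand.

```python
import pandas as pd

# df1: one row per day from 2020-02-01 to 2020-02-27 (values copied from the table above)
df1 = pd.DataFrame({
    "Date": pd.date_range("2020-02-01", "2020-02-27", freq="D"),
    "t_factor": [5, 23, 14, 23, 23, 23, 30, 29, 100, 38, 38, 38, 70, 70,
                 38, 38, 70, 70, 38, 38, 70, 70, 38, 38, 70, 70, 70],
})

# df2: one row per plan with an inclusive From/to date range
df2 = pd.DataFrame({
    "From": ["2020-02-03", "2020-02-07", "2020-02-10",
             "2020-02-14", "2020-02-15", "2020-02-20"],
    "to": ["2020-02-05", "2020-02-08", "2020-02-12",
           "2020-02-16", "2020-02-21", "2020-02-23"],
    "plan": ["start", "foundation", "learn", "practice", "exam", "test"],
    "score": [20, 25, 10, 20, 30, 10],
})

# Parse the string dates so they can be compared with df1["Date"]
df2[["From", "to"]] = df2[["From", "to"]].apply(pd.to_datetime)
```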

From the above, I would like to append the plan column to df1, based on the From and to dates in df2 and the Date value in df1.

Expected output:

output_df

Date                t_factor        plan
2020-02-01             5            NaN
2020-02-02             23           NaN   
2020-02-03             14           start          
2020-02-04             23           start
2020-02-05             23           start  
2020-02-06             23           NaN
2020-02-07             30           foundation               
2020-02-08             29           foundation        
2020-02-09             100          NaN
2020-02-10             38           learn
2020-02-11             38           learn              
2020-02-12             38           learn                   
2020-02-13             70           NaN
2020-02-14             70           practice
2020-02-15             38           NaN              
2020-02-16             38           NaN                    
2020-02-17             70           exam      
2020-02-18             70           exam
2020-02-19             38           exam   
2020-02-20             38           NaN                 
2020-02-21             70           NaN         
2020-02-22             70           test
2020-02-23             38           test             
2020-02-24             38           NaN        
2020-02-25             70           NaN
2020-02-26             70           NaN
2020-02-27             70           NaN

Note:

If there is any overlapping date, then keep plan as NaN for that date.

Example:

2020-02-14 2020-02-16 plan practice

2020-02-15 2020-02-21 plan exam

So there is an overlap on 2020-02-15 and 2020-02-16.

Hence plan should be NaN for that date range.
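
One way to check which dates fall under this rule is to expand each plan into its individual days and count how many plans cover each day. The sketch below does that for a two-row excerpt of df2. The names days, coverage and overlapping are illustrative, and the From/to ranges are assumed to be inclusive on both ends, as the example above implies.

```python
import pandas as pd

# Two-row excerpt of df2 holding the overlapping plans from the example
df2 = pd.DataFrame({
    "From": pd.to_datetime(["2020-02-14", "2020-02-15"]),
    "to": pd.to_datetime(["2020-02-16", "2020-02-21"]),
    "plan": ["practice", "exam"],
})

# Expand every plan into one row per covered day (both endpoints included)
days = pd.concat(
    [pd.DataFrame({"Date": pd.date_range(r.From, r.to), "plan": r.plan})
     for r in df2.itertuples(index=False)],
    ignore_index=True,
)

# A date overlaps when more than one plan covers it
coverage = days.groupby("Date")["plan"].count()
overlapping = coverage[coverage > 1].index

print(overlapping.strftime("%Y-%m-%d").tolist())  # ['2020-02-15', '2020-02-16']
```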

I would like to implement a function along these lines:

def func(df1, df2):
    ...  # build output_df from df1 and df2
    return output_df


Recommended answer

Use the following. (This solution handles the case where the From and to date ranges in df2 overlap and the value of plan has to be chosen with respect to the earliest possible date.)

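import pandas as pd

# merge_asof needs sorted keys; Date, From and to are assumed to be datetime64 columns
d1 = df1.sort_values('Date')
d2 = df2.sort_values('From')

# Backward match: the most recent plan whose From is on or before Date
df = pd.merge_asof(d1, d2[['From', 'plan']], left_on='Date', right_on='From')
# Forward match: the next plan whose to is on or after Date (stored as plan_r)
df = pd.merge_asof(df, d2[['to', 'plan']], left_on='Date', right_on='to',
                   direction='forward', suffixes=('', '_r')).drop(columns=['From', 'to'])

# Keep plan only where the backward and forward matches agree
df['plan'] = df['plan'].mask(df['plan'].ne(df.pop('plan_r')))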

Details:

Use pd.merge_asof to perform an asof merge of the dataframes d1 and d2 on the columns Date and From, with the default direction='backward', to create a new merged dataframe df. Then use pd.merge_asof again to merge df and d2 on the columns Date and to, this time with direction='forward'. Before the final masking step, df looks like this:
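
To see what the two directions do in isolation, here is a small sketch on toy data (the frames left and right are hypothetical and contain a single plan). A backward merge picks the last From on or before each Date, while a forward merge picks the first to on or after it, so only dates inside the range get the same plan from both.

```python
import pandas as pd

# Three dates: before, inside and after the single plan's range
left = pd.DataFrame({"Date": pd.to_datetime(["2020-02-02", "2020-02-04", "2020-02-06"])})
right = pd.DataFrame({
    "From": pd.to_datetime(["2020-02-03"]),
    "to": pd.to_datetime(["2020-02-05"]),
    "plan": ["start"],
})

# direction='backward' (the default): last From <= Date
backward = pd.merge_asof(left, right[["From", "plan"]],
                         left_on="Date", right_on="From")

# direction='forward': first to >= Date
forward = pd.merge_asof(left, right[["to", "plan"]],
                        left_on="Date", right_on="to", direction="forward")

print(backward["plan"].tolist())  # [nan, 'start', 'start']
print(forward["plan"].tolist())   # ['start', 'start', nan]
```

Only 2020-02-04 receives 'start' from both merges, which is why comparing the two results isolates the dates that lie inside a range.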

print(df)

         Date  t_factor        plan      plan_r
0  2020-02-01         5         NaN       start
1  2020-02-02        23         NaN       start
2  2020-02-03        14       start       start
3  2020-02-04        23       start       start
4  2020-02-05        23       start       start
5  2020-02-06        23       start  foundation
6  2020-02-07        30  foundation  foundation
7  2020-02-08        29  foundation  foundation
8  2020-02-09       100  foundation       learn
9  2020-02-10        38       learn       learn
10 2020-02-11        38       learn       learn
11 2020-02-12        38       learn       learn
12 2020-02-13        70       learn    practice
13 2020-02-14        70    practice    practice
14 2020-02-15        38        exam    practice
15 2020-02-16        38        exam    practice
16 2020-02-17        70        exam        exam
17 2020-02-18        70        exam        exam
18 2020-02-19        38        exam        exam
19 2020-02-20        38        test        exam
20 2020-02-21        70        test        exam
21 2020-02-22        70        test        test
22 2020-02-23        38        test        test
23 2020-02-24        38        test         NaN
24 2020-02-25        70        test         NaN
25 2020-02-26        70        test         NaN
26 2020-02-27        70        test         NaN

Use Series.ne + Series.mask to mask the values in column plan where plan is not equal to plan_r.
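
The masking step can be tried on its own with two short Series. This is an illustrative sketch with made-up values. The point to note is that NaN never compares equal to anything, so rows where either side is NaN also count as mismatches and end up as NaN.

```python
import numpy as np
import pandas as pd

plan = pd.Series(["start", "start", np.nan, "test"])
plan_r = pd.Series(["start", "foundation", "start", np.nan])

# True wherever the two columns disagree (including rows with NaN on either side)
mismatch = plan.ne(plan_r)

# Replace the disagreeing rows with NaN and keep the rest
masked = plan.mask(mismatch)

print(mismatch.tolist())  # [False, True, True, True]
print(masked.tolist())    # ['start', nan, nan, nan]
```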

print(df)

         Date  t_factor        plan
0  2020-02-01         5         NaN
1  2020-02-02        23         NaN
2  2020-02-03        14       start
3  2020-02-04        23       start
4  2020-02-05        23       start
5  2020-02-06        23         NaN
6  2020-02-07        30  foundation
7  2020-02-08        29  foundation
8  2020-02-09       100         NaN
9  2020-02-10        38       learn
10 2020-02-11        38       learn
11 2020-02-12        38       learn
12 2020-02-13        70         NaN
13 2020-02-14        70    practice
14 2020-02-15        38         NaN
15 2020-02-16        38         NaN
16 2020-02-17        70        exam
17 2020-02-18        70        exam
18 2020-02-19        38        exam
19 2020-02-20        38         NaN
20 2020-02-21        70         NaN
21 2020-02-22        70        test
22 2020-02-23        38        test
23 2020-02-24        38         NaN
24 2020-02-25        70         NaN
25 2020-02-26        70         NaN
26 2020-02-27        70         NaN
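
Since the question asks for a function, the steps of the answer could be wrapped as in the sketch below. The name func comes from the question. The body repeats the answer's approach with two assumptions added here: the date columns are already datetime64, and the to values are in ascending order once df2 is sorted by From (true for the data in the question, but not guaranteed in general). The usage example uses a shortened, hypothetical slice of the data.

```python
import pandas as pd

def func(df1, df2):
    """Append df2['plan'] to df1, leaving NaN where the two lookups disagree."""
    d1 = df1.sort_values("Date")
    d2 = df2.sort_values("From")
    # Backward match on From, then forward match on to (stored as plan_r)
    out = pd.merge_asof(d1, d2[["From", "plan"]], left_on="Date", right_on="From")
    out = pd.merge_asof(out, d2[["to", "plan"]], left_on="Date", right_on="to",
                        direction="forward", suffixes=("", "_r"))
    out = out.drop(columns=["From", "to"])
    # Keep plan only where both matches agree
    out["plan"] = out["plan"].mask(out["plan"].ne(out.pop("plan_r")))
    return out

# Shortened example: 2020-02-13 .. 2020-02-24 and the last three plans
df1 = pd.DataFrame({
    "Date": pd.to_datetime([f"2020-02-{d:02d}" for d in range(13, 25)]),
    "t_factor": [70, 70, 38, 38, 70, 70, 38, 38, 70, 70, 38, 38],
})
df2 = pd.DataFrame({
    "From": pd.to_datetime(["2020-02-14", "2020-02-15", "2020-02-20"]),
    "to": pd.to_datetime(["2020-02-16", "2020-02-21", "2020-02-23"]),
    "plan": ["practice", "exam", "test"],
})

output_df = func(df1, df2)
print(output_df)
```

On this slice, plan is kept for 2020-02-14 (practice), 2020-02-17 to 2020-02-19 (exam) and 2020-02-22 to 2020-02-23 (test), and is NaN elsewhere, matching the corresponding rows of the expected output.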
