Append a column from one df to another based on the date column on both dfs - pandas
Problem description
I have two dfs as shown below.
df1:
Date t_factor
2020-02-01 5
2020-02-02 23
2020-02-03 14
2020-02-04 23
2020-02-05 23
2020-02-06 23
2020-02-07 30
2020-02-08 29
2020-02-09 100
2020-02-10 38
2020-02-11 38
2020-02-12 38
2020-02-13 70
2020-02-14 70
2020-02-15 38
2020-02-16 38
2020-02-17 70
2020-02-18 70
2020-02-19 38
2020-02-20 38
2020-02-21 70
2020-02-22 70
2020-02-23 38
2020-02-24 38
2020-02-25 70
2020-02-26 70
2020-02-27 70
df2:
From to plan score
2020-02-03 2020-02-05 start 20
2020-02-07 2020-02-08 foundation 25
2020-02-10 2020-02-12 learn 10
2020-02-14 2020-02-16 practice 20
2020-02-15 2020-02-21 exam 30
2020-02-20 2020-02-23 test 10
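To reproduce the example, the two tables can be built as DataFrames. This is a minimal sketch that copies the values from the tables above and assumes Date, From and to should be parsed as datetimes (the question shows them only as text).

```python
import pandas as pd

# t_factor for 2020-02-01 .. 2020-02-27, copied from the df1 table
t_factor = [5, 23, 14, 23, 23, 23, 30, 29, 100, 38, 38, 38, 70, 70,
            38, 38, 70, 70, 38, 38, 70, 70, 38, 38, 70, 70, 70]
df1 = pd.DataFrame({
    "Date": pd.date_range("2020-02-01", "2020-02-27", freq="D"),
    "t_factor": t_factor,
})

# one row per plan interval; the expected output treats From and to as inclusive
df2 = pd.DataFrame({
    "From": pd.to_datetime(["2020-02-03", "2020-02-07", "2020-02-10",
                            "2020-02-14", "2020-02-15", "2020-02-20"]),
    "to": pd.to_datetime(["2020-02-05", "2020-02-08", "2020-02-12",
                          "2020-02-16", "2020-02-21", "2020-02-23"]),
    "plan": ["start", "foundation", "learn", "practice", "exam", "test"],
    "score": [20, 25, 10, 20, 30, 10],
})
```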
From the above, I would like to append the plan column to df1 based on the From and to date values in df2 and the Date value in df1.
Expected output:
output_df
Date t_factor plan
2020-02-01 5 NaN
2020-02-02 23 NaN
2020-02-03 14 start
2020-02-04 23 start
2020-02-05 23 start
2020-02-06 23 NaN
2020-02-07 30 foundation
2020-02-08 29 foundation
2020-02-09 100 NaN
2020-02-10 38 learn
2020-02-11 38 learn
2020-02-12 38 learn
2020-02-13 70 NaN
2020-02-14 70 practice
2020-02-15 38 NaN
2020-02-16 38 NaN
2020-02-17 70 exam
2020-02-18 70 exam
2020-02-19 38 exam
2020-02-20 38 NaN
2020-02-21 70 NaN
2020-02-22 70 test
2020-02-23 38 test
2020-02-24 38 NaN
2020-02-25 70 NaN
2020-02-26 70 NaN
2020-02-27 70 NaN
Note:
If any dates overlap, keep plan as NaN for those dates.
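The rule can be checked directly: a date gets a plan only if exactly one [From, to] interval contains it. The sketch below is a brute-force version of that rule, not the answer's method; the helper name label_by_unique_interval is hypothetical. It compares every date with every interval via NumPy broadcasting, so it builds an n_dates by n_intervals boolean matrix and suits small to medium inputs.

```python
import numpy as np
import pandas as pd


def label_by_unique_interval(dates, starts, ends, labels):
    """Hypothetical helper: for each date, return the label of the single
    interval [start, end] that contains it, or NaN if zero or several do."""
    d = np.asarray(dates, dtype="datetime64[ns]")[:, None]   # shape (n_dates, 1)
    s = np.asarray(starts, dtype="datetime64[ns]")[None, :]  # shape (1, n_intervals)
    e = np.asarray(ends, dtype="datetime64[ns]")[None, :]
    inside = (d >= s) & (d <= e)      # True where a date lies in an interval (inclusive)
    hits = inside.sum(axis=1)         # number of covering intervals per date
    first = inside.argmax(axis=1)     # first covering interval (0 if none, masked below)
    picked = np.asarray(labels, dtype=object)[first]
    return pd.Series(np.where(hits == 1, picked, np.nan))


dates = pd.date_range("2020-02-01", "2020-02-27", freq="D")
starts = pd.to_datetime(["2020-02-03", "2020-02-07", "2020-02-10",
                         "2020-02-14", "2020-02-15", "2020-02-20"])
ends = pd.to_datetime(["2020-02-05", "2020-02-08", "2020-02-12",
                       "2020-02-16", "2020-02-21", "2020-02-23"])
labels = ["start", "foundation", "learn", "practice", "exam", "test"]

plan = label_by_unique_interval(dates, starts, ends, labels)
print(pd.DataFrame({"Date": dates, "plan": plan}))
```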
Example:
From 2020-02-14 to 2020-02-16 the plan is practice, and from 2020-02-15 to 2020-02-21 the plan is exam. These two ranges overlap on 2020-02-15 and 2020-02-16.
Hence plan should be NaN for that date range.
I would like to implement this as a function:
def func(df1, df2):
    return output_df
Recommended answer
Use the following. (This solution applies when the From and to dates in df2 overlap and the value from column plan has to be chosen with respect to the earliest date possible.)
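import pandas as pd

# parse the dates so merge_asof compares real datetimes; each side must be sorted by its own key
d1 = df1.assign(Date=pd.to_datetime(df1['Date'])).sort_values('Date')
d2 = df2.assign(From=pd.to_datetime(df2['From']), to=pd.to_datetime(df2['to']))
df = pd.merge_asof(d1, d2.sort_values('From')[['From', 'plan']], left_on='Date', right_on='From')
df = pd.merge_asof(df, d2.sort_values('to')[['to', 'plan']], left_on='Date', right_on='to',
                   direction='forward', suffixes=('', '_r')).drop(columns=['From', 'to'])
# keep plan only where the backward and forward lookups agree
df['plan'] = df['plan'].mask(df['plan'].ne(df.pop('plan_r')))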
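Packaged as the func(df1, df2) the question asks for, the answer's steps might look like the sketch below. It assumes the date columns hold strings or datetimes that pd.to_datetime can parse, and it rebuilds the sample data inline so it runs on its own.

```python
import pandas as pd


def func(df1, df2):
    """Return df1 with a 'plan' column, using the answer's two merge_asof lookups."""
    d1 = df1.assign(Date=pd.to_datetime(df1["Date"])).sort_values("Date")
    d2 = df2.assign(From=pd.to_datetime(df2["From"]), to=pd.to_datetime(df2["to"]))
    # backward lookup: plan of the last interval whose From <= Date
    out = pd.merge_asof(d1, d2.sort_values("From")[["From", "plan"]],
                        left_on="Date", right_on="From")
    # forward lookup: plan of the first interval whose to >= Date
    out = pd.merge_asof(out, d2.sort_values("to")[["to", "plan"]],
                        left_on="Date", right_on="to",
                        direction="forward", suffixes=("", "_r"))
    out = out.drop(columns=["From", "to"])
    # keep plan only where both lookups agree, NaN elsewhere
    out["plan"] = out["plan"].mask(out["plan"].ne(out.pop("plan_r")))
    return out


df1 = pd.DataFrame({
    "Date": pd.date_range("2020-02-01", "2020-02-27", freq="D").strftime("%Y-%m-%d"),
    "t_factor": [5, 23, 14, 23, 23, 23, 30, 29, 100, 38, 38, 38, 70, 70,
                 38, 38, 70, 70, 38, 38, 70, 70, 38, 38, 70, 70, 70],
})
df2 = pd.DataFrame({
    "From": ["2020-02-03", "2020-02-07", "2020-02-10",
             "2020-02-14", "2020-02-15", "2020-02-20"],
    "to": ["2020-02-05", "2020-02-08", "2020-02-12",
           "2020-02-16", "2020-02-21", "2020-02-23"],
    "plan": ["start", "foundation", "learn", "practice", "exam", "test"],
    "score": [20, 25, 10, 20, 30, 10],
})

output_df = func(df1, df2)
print(output_df)
```

Because only the latest-starting and the earliest-ending intervals are compared, a date inside one long interval that fully contains a shorter, already finished one is also set to NaN (hypothetical example: 03-07 with intervals [03-01, 03-10] and [03-03, 03-05]). If such nesting can occur in the data, counting the covering intervals directly is safer.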
Details:
Use pd.merge_asof to perform an asof merge of the dataframes d1 and d2 on the corresponding columns Date and From with the default direction='backward' (for each Date, take the plan of the last row whose From is on or before it), creating a new merged dataframe df. Then use pd.merge_asof again to asof-merge the dataframes df and d2 on the corresponding columns Date and to with direction='forward' (for each Date, take the plan of the first row whose to is on or after it); this second value ends up in plan_r.
print(df)
Date t_factor plan plan_r
0 2020-02-01 5 NaN start
1 2020-02-02 23 NaN start
2 2020-02-03 14 start start
3 2020-02-04 23 start start
4 2020-02-05 23 start start
5 2020-02-06 23 start foundation
6 2020-02-07 30 foundation foundation
7 2020-02-08 29 foundation foundation
8 2020-02-09 100 foundation learn
9 2020-02-10 38 learn learn
10 2020-02-11 38 learn learn
11 2020-02-12 38 learn learn
12 2020-02-13 70 learn practice
13 2020-02-14 70 practice practice
14 2020-02-15 38 exam practice
15 2020-02-16 38 exam practice
16 2020-02-17 70 exam exam
17 2020-02-18 70 exam exam
18 2020-02-19 38 exam exam
19 2020-02-20 38 test exam
20 2020-02-21 70 test exam
21 2020-02-22 70 test test
22 2020-02-23 38 test test
23 2020-02-24 38 test NaN
24 2020-02-25 70 test NaN
25 2020-02-26 70 test NaN
26 2020-02-27 70 test NaN
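To see what each direction does in isolation, here is a small sketch on two dates from the example, 2020-02-06 (in a gap) and 2020-02-15 (in an overlap), against a three-row subset of df2 chosen only for illustration.

```python
import pandas as pd

left = pd.DataFrame({"Date": pd.to_datetime(["2020-02-06", "2020-02-15"])})
right = pd.DataFrame({
    "From": pd.to_datetime(["2020-02-03", "2020-02-14", "2020-02-15"]),
    "to": pd.to_datetime(["2020-02-05", "2020-02-16", "2020-02-21"]),
    "plan": ["start", "practice", "exam"],
})

# backward (default): last row with From <= Date; exact matches are allowed
back = pd.merge_asof(left, right.sort_values("From")[["From", "plan"]],
                     left_on="Date", right_on="From")
# forward: first row with to >= Date
fwd = pd.merge_asof(left, right.sort_values("to")[["to", "plan"]],
                    left_on="Date", right_on="to", direction="forward")
print(back[["Date", "plan"]])
print(fwd[["Date", "plan"]])
```

On both dates the backward and forward plans differ, which is where the final masking step sets plan to NaN.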
Use Series.ne + Series.mask to mask the values in column plan wherever plan is not equal to plan_r, so plan keeps a value only where both lookups return the same plan.
print(df)
Date t_factor plan
0 2020-02-01 5 NaN
1 2020-02-02 23 NaN
2 2020-02-03 14 start
3 2020-02-04 23 start
4 2020-02-05 23 start
5 2020-02-06 23 NaN
6 2020-02-07 30 foundation
7 2020-02-08 29 foundation
8 2020-02-09 100 NaN
9 2020-02-10 38 learn
10 2020-02-11 38 learn
11 2020-02-12 38 learn
12 2020-02-13 70 NaN
13 2020-02-14 70 practice
14 2020-02-15 38 NaN
15 2020-02-16 38 NaN
16 2020-02-17 70 exam
17 2020-02-18 70 exam
18 2020-02-19 38 exam
19 2020-02-20 38 NaN
20 2020-02-21 70 NaN
21 2020-02-22 70 test
22 2020-02-23 38 test
23 2020-02-24 38 NaN
24 2020-02-25 70 NaN
25 2020-02-26 70 NaN
26 2020-02-27 70 NaN
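The masking step relies on how Series.ne treats missing values: NaN compared with anything, including another NaN, counts as "not equal", so rows where either lookup found nothing also end up as NaN. A tiny illustration with made-up values:

```python
import numpy as np
import pandas as pd

plan = pd.Series(["start", "start", np.nan, "exam"])
plan_r = pd.Series(["start", "foundation", "start", np.nan])

disagree = plan.ne(plan_r)     # missing values compare as not equal
result = plan.mask(disagree)   # replace every disagreeing position with NaN
print(disagree.tolist())
print(result.tolist())
```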