python: assign a value to a pandas df if a date falls between a range of dates in another df


Problem description

What is the best way to create a new column and assign a value if a date falls between two dates in another dataframe?

For example:

dataframe A    
date          values
2017-05-16      x  
2017-04-12      Y


dataframe B    # start/end dates to filter on, with the associated id

start            end           id
2017-05-08     2017-05-18      34
2017-04-24     2017-05-08      33
2017-04-03     2017-04-24      32

Desired result

dataframe A     
date          values    id
2017-05-16      x       34 
2017-04-12      Y       32
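To reproduce the example, the two frames can be built as below. This is a minimal sketch that assumes the date strings should be parsed with `pd.to_datetime`. The IntervalIndex-based answer further down needs datetime64 columns rather than plain strings.

```python
import pandas as pd

# Example data from the question; pd.to_datetime turns the strings into
# datetime64 values so they can serve as interval endpoints and be compared.
df_a = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-16", "2017-04-12"]),
    "values": ["x", "Y"],
})
df_b = pd.DataFrame({
    "start": pd.to_datetime(["2017-05-08", "2017-04-24", "2017-04-03"]),
    "end": pd.to_datetime(["2017-05-18", "2017-05-08", "2017-04-24"]),
    "id": [34, 33, 32],
})

print(df_a.dtypes)
print(df_b.dtypes)
```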

I have looked into pd.cut, which doesn't seem to do what I want, and writing a loop to iterate over the dataframe with multiple conditions seems inefficient.
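For reference, the multi-condition check does not need an explicit Python loop: NumPy broadcasting can compare every date with every range in one step. This sketch uses a different technique from the answer below. It assumes the ranges do not overlap and treats each one as half-open [start, end); if several ranges matched, the first would win.

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-16", "2017-04-12"]),
    "values": ["x", "Y"],
})
df_b = pd.DataFrame({
    "start": pd.to_datetime(["2017-05-08", "2017-04-24", "2017-04-03"]),
    "end": pd.to_datetime(["2017-05-18", "2017-05-08", "2017-04-24"]),
    "id": [34, 33, 32],
})

# Turn the dates into a column vector so that comparing against the 1-D
# start/end arrays broadcasts to a len(df_a) x len(df_b) boolean matrix.
dates = df_a["date"].to_numpy()[:, None]
inside = (dates >= df_b["start"].to_numpy()) & (dates < df_b["end"].to_numpy())

# For each row of df_a, take the id of the first matching range, or NaN if none matches.
has_match = inside.any(axis=1)
first_match = inside.argmax(axis=1)
df_a["id"] = np.where(has_match, df_b["id"].to_numpy()[first_match], np.nan)

print(df_a)
```

Because the boolean matrix has one cell per (date, range) pair, memory grows with len(df_a) * len(df_b). That is fine for small lookup tables but not for large ones.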

Recommended answer

Use an IntervalIndex, which is new in pandas 0.20.0. It still appears to be experimental, though, so other solutions may be more reliable.

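import pandas as pd

# Assumes df_a['date'], df_b['start'] and df_b['end'] already hold datetime64 values.
# Get the 'id' column indexed by the 'start'/'end' intervals (closed on the right by default).
s = pd.Series(df_b['id'].values, pd.IntervalIndex.from_arrays(df_b['start'], df_b['end']))

# Map based on the date of df_a.
df_a['id'] = df_a['date'].map(s)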
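A self-contained version of the snippet above, using the example data from the question. The frame construction is an assumption about how the data was loaded. The lookup itself is the same IntervalIndex-plus-map approach.

```python
import pandas as pd

df_a = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-16", "2017-04-12"]),
    "values": ["x", "Y"],
})
df_b = pd.DataFrame({
    "start": pd.to_datetime(["2017-05-08", "2017-04-24", "2017-04-03"]),
    "end": pd.to_datetime(["2017-05-18", "2017-05-08", "2017-04-24"]),
    "id": [34, 33, 32],
})

# One interval per row of df_b; from_arrays defaults to closed="right", i.e. (start, end].
s = pd.Series(df_b["id"].values,
              index=pd.IntervalIndex.from_arrays(df_b["start"], df_b["end"]))

# Mapping with a Series looks each date up in s's index, which matches
# the interval containing that date.
df_a["id"] = df_a["date"].map(s)
print(df_a)
```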

Resulting output:

        date values  id
0 2017-05-16      x  34
1 2017-04-12      Y  32
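Two details are worth checking before relying on this: which side of each interval is closed, and what happens to dates outside every range. This sketch uses the `closed` argument of `IntervalIndex.from_arrays` and three hypothetical test dates: one on a boundary, one inside a range, and one outside all of them.

```python
import pandas as pd

df_b = pd.DataFrame({
    "start": pd.to_datetime(["2017-05-08", "2017-04-24", "2017-04-03"]),
    "end": pd.to_datetime(["2017-05-18", "2017-05-08", "2017-04-24"]),
    "id": [34, 33, 32],
})

# Hypothetical dates: exactly on a boundary, inside a range, outside every range.
dates = pd.Series(pd.to_datetime(["2017-05-08", "2017-04-12", "2017-06-01"]))

results = {}
for closed in ("right", "left"):
    intervals = pd.IntervalIndex.from_arrays(df_b["start"], df_b["end"], closed=closed)
    ids = pd.Series(df_b["id"].values, index=intervals)
    # Dates not covered by any interval come back as NaN, so the result is float.
    results[closed] = dates.map(ids)
    print(closed, results[closed].tolist())
```

With `closed="right"` the boundary date 2017-05-08 maps to 33, because it closes the range that ends on that day. With `closed="left"` it maps to 34. The date 2017-06-01 gets NaN either way. Avoid `closed="both"` here: adjacent ranges would then overlap, and a date on a shared boundary would no longer map to a single id.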

Alternatively, if you don't mind altering the index of df_b, you can convert it directly to an IntervalIndex:

# Create an IntervalIndex on df_b.
df_b = df_b.set_index(['start', 'end'])
df_b.index = pd.IntervalIndex.from_tuples(df_b.index)

# Map based on the date of df_a.
df_a['id'] = df_a['date'].map(df_b['id'])
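A runnable version of this second variant, using the question's sample data. The final lines make the trade-off visible: by default `set_index` moves `start` and `end` out of the columns, so df_b keeps only `id`, indexed by intervals.

```python
import pandas as pd

df_a = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-16", "2017-04-12"]),
    "values": ["x", "Y"],
})
df_b = pd.DataFrame({
    "start": pd.to_datetime(["2017-05-08", "2017-04-24", "2017-04-03"]),
    "end": pd.to_datetime(["2017-05-18", "2017-05-08", "2017-04-24"]),
    "id": [34, 33, 32],
})

# Move start/end into a MultiIndex, then rebuild it as an IntervalIndex
# from the (start, end) tuples (closed="right" by default).
df_b = df_b.set_index(["start", "end"])
df_b.index = pd.IntervalIndex.from_tuples(df_b.index)

# df_b["id"] is now a Series indexed by intervals, usable directly with map.
df_a["id"] = df_a["date"].map(df_b["id"])

print(df_b.columns.tolist())
print(df_a)
```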
