Assessing whether the datetime in each row of one df falls within a datetime range in another df

Problem description

I am new to Python and need some help with a question about datetime handling.

I have df_a, which has a column titled time, and I am trying to create a new column id in this df_a.

I want the id column to be determined by whether the time falls within one of the time ranges in df_b, given by the columns "date" and "date_new". For example, the first row of df_b has a "date" of "2019-01-07 20:52:41" and a "date_new" of "2019-01-07 21:07:41" (a 15-minute interval). I would like the index of that row to appear as my id in df_a when the time is "2019-01-07 20:56:30" (i.e. id=0), and so on for all the rows in df_a.

This question is similar, but I cannot figure out how to make it work with mine:

python assign value to pandas df if falls between range of dates in another df

When I run the following, I keep getting an error:

s = pd.Series(df_b['id'].values, pd.IntervalIndex.from_arrays(df_b['date'], df_b['date_new']))
df_a['id'] = df_a['time'].map(s)



ValueError: cannot handle non-unique indices

One caveat is that the ranges in df_b are not always unique: some of the intervals cover the same periods of time. In those cases it is fine to use the id of the first time period in df_b that the time falls in. Additionally, there are over 200 rows in df_b and 2000 in df_a, so it would take too long to define each time period by hand in a for-loop style format, unless there is an easier way to do it than defining each one. Thank you in advance for all of your help! If this could use any clarification, please let me know!

df_a

time                    id
2019-01-07 22:02:56     NaN
2019-01-07 21:57:12     NaN
2019-01-08 09:35:30     NaN


df_b

date                    date_new               id
2019-01-07 21:50:56    2019-01-07 22:05:56     0
2019-01-08 09:30:30    2019-01-08 09:45:30     1

Expected Result

df_a     
time                    id
2019-01-07 22:02:56     0
2019-01-07 21:57:12     0
2019-01-08 09:35:30     1


Recommended answer

Let me rephrase your problem. For each row in dataframe df_a, you want to check whether its value in df_a['time'] lies in the interval given by the values in columns df_b['date'] and df_b['date_new']. If so, set the value in df_a["id"] to the value in the corresponding df_b["id"].

If this is your question, here is a (very rough) solution:

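# For each row of df_a, scan df_b in order and take the id of the
# first interval [date, date_new] that contains the row's time.
for ia, ra in df_a.iterrows():
    for ib, rb in df_b.iterrows():
        if rb["date"] <= ra["time"] <= rb["date_new"]:
            df_a.loc[ia, "id"] = rb["id"]
            break  # first match wins, so overlapping intervals are fine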
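
The nested loop performs len(df_a) × len(df_b) comparisons in Python. For frames of roughly 2000 and 200 rows, the same check could probably be vectorized with NumPy broadcasting. The following is a minimal sketch, not part of the original answer. It assumes the columns are already datetime64, that both interval endpoints are inclusive (as in the loop), and that the first matching row of df_b should win. The sample frames reuse the question's data plus one hypothetical overlapping interval (id 2) and one hypothetical time with no match.

```python
import pandas as pd

# Sample data from the question, plus hypothetical extra rows (see comments).
df_a = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-01-07 22:02:56",
        "2019-01-07 21:57:12",
        "2019-01-08 09:35:30",
        "2019-01-09 00:00:00",  # hypothetical: falls in no interval
    ])
})
df_b = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-07 21:50:56",
        "2019-01-08 09:30:30",
        "2019-01-07 21:55:00",  # hypothetical: overlaps the first interval
    ]),
    "date_new": pd.to_datetime([
        "2019-01-07 22:05:56",
        "2019-01-08 09:45:30",
        "2019-01-07 22:10:00",
    ]),
    "id": [0, 1, 2],
})

# Column vector of times against row vectors of interval bounds.
times = df_a["time"].to_numpy()[:, None]
start = df_b["date"].to_numpy()
end = df_b["date_new"].to_numpy()

# inside[i, j] is True when time i lies in interval j (both ends inclusive).
inside = (times >= start) & (times <= end)

# argmax returns the position of the first True in each row; rows with no
# True also return 0, so they are masked out afterwards.
first_match = inside.argmax(axis=1)
has_match = inside.any(axis=1)

ids = pd.Series(df_b["id"].to_numpy()[first_match], index=df_a.index)
df_a["id"] = ids.where(has_match)  # NaN where no interval contains the time

print(df_a)
```

Rows with no containing interval get NaN, so the id column becomes float whenever at least one time is unmatched. The boolean matrix has len(df_a) × len(df_b) entries, which is small at the sizes mentioned in the question but would grow quickly for much larger frames.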
