Unstacking shift data (start and end time) into hourly data

Question

I have a df as follows, which shows when a person started a shift, when they ended it, the number of hours worked, and the date worked.

  Business_Date  Number         PayTimeStart           PayTimeEnd  Hours
0    2019-05-24       1  2019-05-24 11:00:00  2019-05-24 12:15:00   1.25
1    2019-05-24       2  2019-05-24 12:30:00  2019-05-24 13:30:00   1.00
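To experiment with this data, the table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the column names are exactly as printed above and that the timestamps arrive as text; the `Hours` column is recomputed from the timestamps rather than typed in, as a consistency check.

```python
import pandas as pd

# Sample data copied from the table above; column names follow that table.
df = pd.DataFrame({
    "Business_Date": ["2019-05-24", "2019-05-24"],
    "Number": [1, 2],
    "PayTimeStart": ["2019-05-24 11:00:00", "2019-05-24 12:30:00"],
    "PayTimeEnd": ["2019-05-24 12:15:00", "2019-05-24 13:30:00"],
})

# Parse the text columns into datetime64 so that datetime arithmetic works.
for col in ["Business_Date", "PayTimeStart", "PayTimeEnd"]:
    df[col] = pd.to_datetime(df[col])

# Recompute Hours from the timestamps (shift length in hours).
df["Hours"] = (df["PayTimeEnd"] - df["PayTimeStart"]).dt.total_seconds() / 3600
print(df)
```

If the real data already has datetime columns, the `pd.to_datetime` step is unnecessary.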

Now what I'm trying to do is break this into an hourly format, so that I know how many hours were worked in each clock hour, for example between 11:00 and 12:00.

So, in my head, for the first shift above, I want to put the 1 hour worked between 11:00 and 12:00 into the 11:00 bin, and the remaining 0.25 into the next bin, 12:00.

So I would end up with something like this (the 12:00 bin holds 0.25 from the first shift plus 0.5 from the second):

  Business Date   Time  Hour
0    2019-05-24  11:00  1.00
1    2019-05-24  12:00  0.75
2    2019-05-24  13:00  0.50

Recommended answer

One idea is to work with minutes: first expand every shift into one timestamp per worked minute, using a flattened list comprehension to build a Series; then group by date and hour, count the minutes in each group with GroupBy.size, and finally divide by 60 to get hours:

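import pandas as pd

# PayTimeStart / PayTimeEnd must be datetime64 columns (use pd.to_datetime if needed).
# Build one timestamp per worked minute; subtracting 60 s from the end keeps
# the end minute itself out of the range, because date_range is inclusive.
s = pd.Series([z for x, y in zip(df['PayTimeStart'],
                                 df['PayTimeEnd'] - pd.Timedelta(60, unit='s'))
                 for z in pd.date_range(x, y, freq='min')])

# Count minutes per (date, hour) and convert the counts to hours.
df = (s.groupby([s.dt.date.rename('Business Date'), s.dt.hour.rename('Time')])
       .size()
       .div(60)
       .reset_index(name='Hour'))
print(df)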
  Business Date  Time  Hour
0    2019-05-24    11  1.00
1    2019-05-24    12  0.75
2    2019-05-24    13  0.50
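The per-minute expansion assumes that shifts start and end on whole minutes and creates one row per worked minute. As an alternative that is not part of the original answer, the overlap between each shift and each clock hour can be computed directly. The sketch below is an assumption-laden illustration: `hourly_hours` is a hypothetical helper name, and the column names are taken from the question's table.

```python
import pandas as pd


def hourly_hours(df, start_col="PayTimeStart", end_col="PayTimeEnd"):
    """Split each shift at clock-hour boundaries and sum hours per (date, hour)."""
    rows = []
    for start, end in zip(df[start_col], df[end_col]):
        # First clock-hour bin that the shift touches.
        bin_start = start.floor("h")
        while bin_start < end:
            bin_end = bin_start + pd.Timedelta(hours=1)
            # Overlap between the shift and this one-hour bin.
            overlap = min(end, bin_end) - max(start, bin_start)
            rows.append((bin_start.date(), bin_start.hour,
                         overlap.total_seconds() / 3600))
            bin_start = bin_end
    out = pd.DataFrame(rows, columns=["Business Date", "Time", "Hour"])
    # Several shifts can land in the same bin, so sum them.
    return out.groupby(["Business Date", "Time"], as_index=False)["Hour"].sum()


# Sample shifts from the question.
shifts = pd.DataFrame({
    "PayTimeStart": pd.to_datetime(["2019-05-24 11:00:00", "2019-05-24 12:30:00"]),
    "PayTimeEnd": pd.to_datetime(["2019-05-24 12:15:00", "2019-05-24 13:30:00"]),
})
result = hourly_hours(shifts)
print(result)
```

On the sample data this produces the same three bins as the per-minute approach. The trade-off is an explicit Python loop over shifts and hours instead of a row per minute, which may or may not be faster depending on shift lengths.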

If you also need to group by a location or ID, carry that column along with each minute:

# df is the original shift table here (not the hourly result from above).
# Same idea, but every minute is tagged with the shift's Location.
df1 = pd.DataFrame([(z, w) for x, y, w in zip(df['PayTimeStart'],
                                              df['PayTimeEnd'] - pd.Timedelta(60, unit='s'),
                                              df['Location'])
                    for z in pd.date_range(x, y, freq='min')],
                   columns=['Date', 'Location'])

df = (df1.groupby([df1['Date'].dt.date.rename('Business Date'),
                   df1['Date'].dt.hour.rename('Time'),
                   df1['Location']])
         .size()
         .div(60)
         .reset_index(name='Hour'))
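The question's sample has no Location column, but it does have an ID-like column, Number. As a hedged illustration of the "or ID" case, the sketch below applies the same per-minute technique while grouping by Number; the variable names `minutes` and `hourly` are chosen here for readability and are not from the original answer.

```python
import pandas as pd

# Sample shifts from the question, including the Number column as the ID.
df = pd.DataFrame({
    "Number": [1, 2],
    "PayTimeStart": pd.to_datetime(["2019-05-24 11:00:00", "2019-05-24 12:30:00"]),
    "PayTimeEnd": pd.to_datetime(["2019-05-24 12:15:00", "2019-05-24 13:30:00"]),
})

# One row per worked minute, tagged with the shift's Number.
minutes = pd.DataFrame(
    [(z, n)
     for x, y, n in zip(df["PayTimeStart"],
                        df["PayTimeEnd"] - pd.Timedelta(60, unit="s"),
                        df["Number"])
     for z in pd.date_range(x, y, freq="min")],
    columns=["Date", "Number"],
)

# Count minutes per (date, hour, Number) and convert to hours.
hourly = (minutes.groupby([minutes["Date"].dt.date.rename("Business Date"),
                           minutes["Date"].dt.hour.rename("Time"),
                           minutes["Number"]])
                 .size()
                 .div(60)
                 .reset_index(name="Hour"))
print(hourly)
```

Because the ID is part of the group key, the 12:00 hour is no longer merged across shifts: it appears once for Number 1 (0.25) and once for Number 2 (0.5).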
