Dividing time intervals with multiple index into hourly buckets in Python

Question

Here is the code for the sample data set I have:

import pandas as pd

data = {
    'ID': [4, 4, 4, 4, 22, 22, 23, 25, 29],
    'Zone': [32, 34, 21, 34, 27, 29, 32, 75, 9],
    'checkin_datetime': ['04-01-2019 13:07', '04-01-2019 13:09', '04-01-2019 14:06',
                         '04-01-2019 14:55', '04-01-2019 20:23', '04-01-2019 21:38',
                         '04-01-2019 21:38', '04-01-2019 23:22', '04-02-2019 01:00'],
    'checkout_datetime': ['04-01-2019 13:09', '04-01-2019 13:12', '04-01-2019 14:07',
                          '04-01-2019 15:06', '04-01-2019 21:32', '04-01-2019 21:42',
                          '04-01-2019 21:45', '04-02-2019 00:23', '04-02-2019 06:15'],
}

df = pd.DataFrame(data, columns=['ID', 'Zone', 'checkin_datetime', 'checkout_datetime'])

# The strings are month-first (MM-DD-YYYY), which pd.to_datetime assumes by default
df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])

Using this data set, I am trying to create the following data set:

                Checked_in_hour    ID    Zone    checked_in_minutes
                01-04-2019 13:00    4    32        2
                01-04-2019 13:00    4    34        3
                01-04-2019 14:00    4    21        1
                01-04-2019 14:00    4    34        5
                01-04-2019 15:00    4    34        6
                01-04-2019 20:00    22    27       37
                01-04-2019 20:00    22    27       8
                01-04-2019 20:00    22    27       37
                01-04-2019 21:00    22    29       4
                01-04-2019 21:00    23    32       7
                01-04-2019 23:00    25    75       38
                02-04-2019 00:00    25    75       24
                02-04-2019 01:00    29    9        60
                02-04-2019 02:00    29    9        60
                02-04-2019 03:00    29    9        60
                02-04-2019 04:00    29    9        60
                02-04-2019 05:00    29    9        60
                02-04-2019 06:00    29    9        16

Here, checked_in_minutes is the part of the time between checkin_datetime and checkout_datetime that falls within each hour, and the result is grouped by hour and Zone.
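To make the bucketing rule concrete, here is a minimal sketch for a single interval. The helper name `split_interval` is hypothetical (it is not part of pandas or of the question), and the sketch assumes minutes are counted over the half-open interval from check-in up to, but not including, checkout.

```python
import pandas as pd

def split_interval(start, end):
    """Return (hour_start, minutes) pairs for the part of [start, end) in each hour."""
    pieces = []
    hour = start.floor("h")  # start of the hour that contains the check-in
    while hour < end:
        next_hour = hour + pd.Timedelta(hours=1)
        # Overlap of [start, end) with the bucket [hour, next_hour)
        overlap = min(end, next_hour) - max(start, hour)
        pieces.append((hour, int(overlap.total_seconds() // 60)))
        hour = next_hour
    return pieces

pieces = split_interval(pd.Timestamp("2019-04-01 14:55"),
                        pd.Timestamp("2019-04-01 15:06"))
for hour, minutes in pieces:
    print(hour, minutes)
```

For the 14:55 to 15:06 stay in Zone 34, this gives 5 minutes in the 14:00 bucket and 6 minutes in the 15:00 bucket, matching the corresponding rows of the table.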

This is the code I have so far. It calculates this at the Checked_in_hour level; I still need to add the Zone variable.

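# working logic: one row per minute between the first check-in and the last checkout
df2 = pd.DataFrame(
    # pd.date_range replaces the DatetimeIndex(start=..., end=...) constructor,
    # which newer pandas versions no longer accept
    index=pd.date_range(
        start=df['checkin_datetime'].min(),
        end=df['checkout_datetime'].max(),
        freq='min'),
    columns=['is_checked_in', 'ID'], data=0)

for _, row in df.iterrows():
    # .loc avoids chained assignment; label slices include both endpoints
    df2.loc[row['checkin_datetime']:row['checkout_datetime'], 'is_checked_in'] = 1
    df2.loc[row['checkin_datetime']:row['checkout_datetime'], 'ID'] = row['ID']

df3 = df2.resample('h').aggregate({'is_checked_in': 'sum', 'ID': 'max'})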
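One way the minute-level idea above might carry ID and Zone along is to build one row per checked-in minute for each record and then group by hour, ID, and Zone. This is only a sketch: the names `per_minute` and `hourly` are illustrative, the two rows are taken from the sample data and rewritten in ISO format, and the checkout minute itself is assumed not to count.

```python
import pandas as pd

# Two rows from the sample data (April 1, 2019), written in ISO format
df = pd.DataFrame({
    "ID": [4, 22],
    "Zone": [34, 29],
    "checkin_datetime": pd.to_datetime(["2019-04-01 14:55", "2019-04-01 21:38"]),
    "checkout_datetime": pd.to_datetime(["2019-04-01 15:06", "2019-04-01 21:42"]),
})

one_minute = pd.Timedelta(minutes=1)

# One row per checked-in minute; the checkout minute itself is excluded
per_minute = pd.concat(
    [
        pd.DataFrame({
            "ID": row.ID,
            "Zone": row.Zone,
            "minute": pd.date_range(row.checkin_datetime,
                                    row.checkout_datetime - one_minute,
                                    freq="min"),
        })
        for row in df.itertuples(index=False)
    ],
    ignore_index=True,
)

# Count the minutes that fall in each (hour, ID, Zone) group
per_minute["Checked_in_hour"] = per_minute["minute"].dt.floor("h")
hourly = (
    per_minute.groupby(["Checked_in_hour", "ID", "Zone"])
    .size()
    .reset_index(name="checked_in_minutes")
)
print(hourly)
```

Expanding every interval to minute rows is simple to reason about but can use a lot of memory when stays are long or the data set is large.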

Recommended answer

Not sure if this is efficient, but it should work.

import pandas as pd
from datetime import timedelta

def group_into_hourly_buckets(df):
    grouped_data = []
    for _, row in df.iterrows():
        start_time = row['checkin_datetime']
        # total_seconds() keeps whole days; .seconds would silently drop them
        dur = int((row['checkout_datetime'] - start_time).total_seconds() // 60)
        first_hour = start_time.floor('h')
        hours_ = 0
        while dur > 0:
            time_spent_in_window = min(dur, 60)
            if hours_ == 0:
                # Minutes left in the first hour; this is 60 when the check-in is
                # exactly on the hour (ceil('H') - start_time would give 0 there)
                minutes_left = 60 - int((start_time - first_hour).total_seconds() // 60)
                time_spent_in_window = min(time_spent_in_window, minutes_left)
            grouped_data.append({
                'Checked_in_hour': first_hour + timedelta(hours=hours_),
                'ID': row['ID'],
                'Zone': row['Zone'],
                'checked_in_minutes': time_spent_in_window,
            })
            dur -= time_spent_in_window
            hours_ += 1
    return pd.DataFrame(grouped_data)
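If the row-by-row loop turns out to be too slow, a vectorized variant along the following lines might help. It is a sketch under stated assumptions rather than part of the original answer: `hourly_buckets` is a hypothetical name, the three rows are taken from the sample data and rewritten in ISO format, and minutes are counted over half-open intervals, so a result can differ by a minute from a count that includes the checkout minute.

```python
import pandas as pd

# Three rows from the sample data, written in ISO format
df = pd.DataFrame({
    "ID": [4, 25, 29],
    "Zone": [34, 75, 9],
    "checkin_datetime": pd.to_datetime(
        ["2019-04-01 14:55", "2019-04-01 23:22", "2019-04-02 01:00"]),
    "checkout_datetime": pd.to_datetime(
        ["2019-04-01 15:06", "2019-04-02 00:23", "2019-04-02 06:15"]),
})

def hourly_buckets(frame):
    out = frame.copy()
    # For each row, list every hour start from the check-in hour to the checkout
    out["Checked_in_hour"] = [
        list(pd.date_range(start.floor("h"), end, freq="h"))
        for start, end in zip(out["checkin_datetime"], out["checkout_datetime"])
    ]
    # One row per (record, hour) pair
    out = out.explode("Checked_in_hour").reset_index(drop=True)
    out["Checked_in_hour"] = pd.to_datetime(out["Checked_in_hour"])
    hour_end = out["Checked_in_hour"] + pd.Timedelta(hours=1)
    # Clip each stay to its hour: from max(check-in, hour) to min(checkout, hour_end)
    lo = out["checkin_datetime"].where(
        out["checkin_datetime"] > out["Checked_in_hour"], out["Checked_in_hour"])
    hi = out["checkout_datetime"].where(
        out["checkout_datetime"] < hour_end, hour_end)
    out["checked_in_minutes"] = ((hi - lo).dt.total_seconds() // 60).astype(int)
    # A checkout exactly on the hour produces an empty last bucket; drop it
    out = out[out["checked_in_minutes"] > 0]
    cols = ["Checked_in_hour", "ID", "Zone", "checked_in_minutes"]
    return out[cols].reset_index(drop=True)

result = hourly_buckets(df)
print(result)
```

The Python-level work here is limited to building the list of hour starts per row; the clipping and minute arithmetic run on whole columns.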
