Invert time on a given date in a dataframe


Problem description


For a dataframe containing start and end times, I would like to "invert" its times for a given date. There is certainly a "brute force" way to do it by looping through the dataframe with a lot of if conditions, but I wonder if there is a more elegant way, e.g. with backfill/forward fill.


Think of it as a dataframe where working hours are represented by rows with start and end times, and what I ultimately want is the free time on each date. The dataframe is sorted by monotonically increasing start time and is split at date changes, so a shift starting at e.g. 2019-04-04 22:00 and ending at 2019-04-05 04:00 is represented by two rows: 2019-04-04 22:00 to 2019-04-05 00:00, and 2019-04-05 00:00 to 2019-04-05 04:00. This should make the problem easier.
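The midnight cut-off described above can be produced with a small helper. This is a minimal sketch, assuming intervals are half-open [start, end) pairs of pd.Timestamp; split_at_midnight is a hypothetical name, not a pandas function.

```python
import pandas as pd


def split_at_midnight(start: pd.Timestamp, end: pd.Timestamp) -> pd.DataFrame:
    """Hypothetical helper: cut the interval [start, end) at every midnight it crosses."""
    pieces = []
    while start < end:
        # normalize() drops the time of day, giving 00:00 of the same date
        next_midnight = start.normalize() + pd.Timedelta(days=1)
        piece_end = min(end, next_midnight)
        pieces.append({'date': start.date(), 'start': start, 'end': piece_end})
        start = piece_end
    return pd.DataFrame(pieces, columns=['date', 'start', 'end'])


# The example from the question: 22:00 on 2019-04-04 until 04:00 on 2019-04-05
rows = split_at_midnight(pd.Timestamp(2019, 4, 4, 22), pd.Timestamp(2019, 4, 5, 4))
print(rows)
```

Applied to each original shift and concatenated, this yields rows in the per-date form the question assumes.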

Sample code:

import pandas as pd
import datetime

df = pd.DataFrame({'date': [datetime.date(2019, 4, 4), datetime.date(2019, 4, 5), datetime.date(2019, 4, 5)],
                   'start': [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0), pd.Timestamp(2019, 4, 5, 14)],
                   'end': [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4), pd.Timestamp(2019, 4, 5, 18)]})

So, starting from this:

date         start                 end
2019-04-04   2019-04-04 10:00:00   2019-04-04 16:00:00
2019-04-05   2019-04-05 00:00:00   2019-04-05 04:00:00
2019-04-05   2019-04-05 14:00:00   2019-04-05 18:00:00


I would expect the result to be a dataframe like this:

date         start                 end
2019-04-04   2019-04-04 00:00:00   2019-04-04 10:00:00
2019-04-04   2019-04-04 16:00:00   2019-04-05 00:00:00
2019-04-05   2019-04-05 04:00:00   2019-04-05 14:00:00
2019-04-05   2019-04-05 18:00:00   2019-04-06 00:00:00
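One way to sanity-check what "inverting" means here: for every date, the busy hours in the input plus the free hours in the expected output should add up to 24. A small sketch using the question's own numbers; hours_per_date is a hypothetical helper introduced only for this check.

```python
import datetime as dt

import pandas as pd

# Busy intervals from the question
busy = pd.DataFrame({'date': [dt.date(2019, 4, 4), dt.date(2019, 4, 5), dt.date(2019, 4, 5)],
                     'start': pd.to_datetime(['2019-04-04 10:00', '2019-04-05 00:00', '2019-04-05 14:00']),
                     'end': pd.to_datetime(['2019-04-04 16:00', '2019-04-05 04:00', '2019-04-05 18:00'])})

# Expected free intervals from the question
free = pd.DataFrame({'date': [dt.date(2019, 4, 4), dt.date(2019, 4, 4), dt.date(2019, 4, 5), dt.date(2019, 4, 5)],
                     'start': pd.to_datetime(['2019-04-04 00:00', '2019-04-04 16:00', '2019-04-05 04:00', '2019-04-05 18:00']),
                     'end': pd.to_datetime(['2019-04-04 10:00', '2019-04-05 00:00', '2019-04-05 14:00', '2019-04-06 00:00'])})


def hours_per_date(intervals: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: total interval length in hours, grouped by the 'date' column."""
    durations = (intervals['end'] - intervals['start']).dt.total_seconds() / 3600
    return durations.groupby(intervals['date']).sum()


# Series addition aligns on the date index
total = hours_per_date(busy) + hours_per_date(free)
print(total)
```

For 2019-04-04 this is 6 busy hours plus 10 + 8 free hours; for 2019-04-05 it is 4 + 4 busy plus 10 + 6 free, so both dates come to 24.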

Thanks a lot for your help.

Recommended answer


You can do this simply enough with shift(). The tricky part is the last row, and I'm still working out how best to reconstruct it.


I gave the last row my best shot, but it ends up being a clumsy mess, so I'd be happy for any feedback on it. In principle, shift() makes this very easy. You could obviously just drop start and end before adding the last row; I chose to show how to do it without losing data.

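import pandas as pd
import datetime as dt

df = pd.DataFrame({'date': [dt.date(2019, 4, 4), dt.date(2019, 4, 5), dt.date(2019, 4, 5)],
                   'start': [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0), pd.Timestamp(2019, 4, 5, 14)],
                   'end': [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4), pd.Timestamp(2019, 4, 5, 18)]})

df = df[['date', 'start', 'end']]

saved_shift_ending = df['end'].iloc[-1]  # we want end of last shift
saved_end_date = df['date'].iloc[-1]     # we also want the date value

# Midnight of the first date and midnight after the last date, as Timestamps
# so they match the datetime64 columns
start_date = pd.Timestamp(df['date'].min())
end_date = pd.Timestamp(df['date'].max() + dt.timedelta(days=1))

# Each free interval runs from the end of the previous shift to the start of this one
df['other_start'] = df['end'].shift(1)
df['other_end'] = df['start']

df.loc[0, 'other_start'] = start_date  # The first value is NaT after shift

# Keep the date as a date object and use NaT (not NaN) for the missing times
last_row = pd.DataFrame([[saved_end_date,
                          pd.NaT,
                          pd.NaT,
                          saved_shift_ending,
                          end_date]],
                        columns=['date', 'start', 'end', 'other_start',
                                 'other_end'])

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat([df, last_row], ignore_index=True)

# Label each free interval with the date on which it starts
# (otherwise the 16:00-00:00 gap would be labelled 2019-04-05)
df['date'] = df['other_start'].dt.date

df.drop(['start', 'end'], axis=1, inplace=True)
print(df)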
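As for tidying up the last row: one alternative, offered as a sketch rather than the answer's own method, is to do the shift per date with groupby, so each day's first gap starts at its own midnight and its last shift is closed off at the following midnight. It assumes, as the question states, that intervals are already split at midnight; free_time is a hypothetical name, and dates with no rows at all produce no free interval.

```python
import datetime as dt

import pandas as pd


def free_time(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: invert per-day busy intervals into per-day free intervals."""
    df = df.sort_values('start')
    midnight = pd.to_datetime(df['date'])          # 00:00 of each row's date
    next_midnight = midnight + pd.Timedelta(days=1)
    by_day = df.groupby('date')

    # Gap before each shift: from the previous shift's end on the same day, else midnight
    before = pd.DataFrame({'date': df['date'],
                           'start': by_day['end'].shift(1).fillna(midnight),
                           'end': df['start']})

    # Gap after the last shift of each day, up to the following midnight
    is_last = ~df['date'].duplicated(keep='last')
    after = pd.DataFrame({'date': df['date'][is_last],
                          'start': df['end'][is_last],
                          'end': next_midnight[is_last]})

    out = pd.concat([before, after])
    out = out[out['start'] < out['end']]           # drop zero-length gaps (shift starting at 00:00)
    return out.sort_values('start').reset_index(drop=True)


df = pd.DataFrame({'date': [dt.date(2019, 4, 4), dt.date(2019, 4, 5), dt.date(2019, 4, 5)],
                   'start': [pd.Timestamp(2019, 4, 4, 10), pd.Timestamp(2019, 4, 5, 0), pd.Timestamp(2019, 4, 5, 14)],
                   'end': [pd.Timestamp(2019, 4, 4, 16), pd.Timestamp(2019, 4, 5, 4), pd.Timestamp(2019, 4, 5, 18)]})

result = free_time(df)
print(result)
```

Grouping by date means no row has to be built by hand: every day gets the same treatment, and the shift that starts at 00:00 on 2019-04-05 simply produces an empty gap that the filter removes.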
