Pandas: break datetime intervals by day

Views: 58 · Posted: 2020/5/24 2:46:34 · Tags: python, pandas

Question

I have a DataFrame with datetime intervals, like this one:


   id            start_date              end_date
1   1   2016-10-01 00:00:00   2016-10-01 03:00:00
2   1   2016-10-03 05:30:00   2016-10-03 06:30:00
3   2   2016-10-03 23:30:00   2016-10-04 01:00:00  # This line should be split
4   1   2016-10-04 05:00:00   2016-10-04 06:00:00
5   2   2016-10-04 05:50:00   2016-10-04 06:00:00
6   1   2016-10-05 18:50:00   2016-10-06 02:00:00  # This one too
....

I'd like to split the intervals that cover more than one day, so that each row falls within a single day:


     id            start_date              end_date
1     1   2016-10-01 00:00:00   2016-10-01 03:00:00
2     1   2016-10-03 05:30:00   2016-10-03 06:30:00
3     2   2016-10-03 23:30:00   2016-10-03 23:59:59 # Split
4     2   2016-10-04 00:00:00   2016-10-04 01:00:00 # Split
5     1   2016-10-04 05:00:00   2016-10-04 06:00:00
6     2   2016-10-04 05:50:00   2016-10-04 06:00:00
7     1   2016-10-05 18:50:00   2016-10-05 23:59:59 # Split
8     1   2016-10-06 00:00:00   2016-10-06 02:00:00 # Split
....

Recommended answer

You can use the .dt accessor to build a Boolean index of the rows that need updating, and then adjust those rows accordingly:

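import pandas as pd

# Get the rows to split: intervals whose start and end fall on different days.
split_rows = (df['start_date'].dt.date != df['end_date'].dt.date)

# Get the new rows to append, moving start_date to midnight of the next day.
# .dt.normalize() keeps the datetime64 dtype (unlike .dt.date, which yields Python date objects).
new_rows = df[split_rows].copy()
new_rows['start_date'] = new_rows['start_date'].dt.normalize() + pd.Timedelta(days=1)

# Adjust the end_date of the existing rows to 23:59:59 on the start day.
df.loc[split_rows, 'end_date'] = df.loc[split_rows, 'start_date'].dt.normalize() + pd.Timedelta(days=1, seconds=-1)

# Append the new rows to the existing dataframe.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
# A stable sort keeps each original row ahead of the new row that shares its index.
df = pd.concat([df, new_rows]).sort_index(kind='mergesort').reset_index(drop=True)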
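
As a rough end-to-end check, the single-day split can be run against the sample data from the question. This is only a sketch: the helper name `split_once` is made up for illustration, the sample frame is retyped from the question's listing, and it uses `pd.concat` because `DataFrame.append` was removed in pandas 2.0. It assumes both columns are already `datetime64` and that no interval spans more than two calendar days.

```python
import pandas as pd

# Sample data retyped from the question (index 1..6 mirrors the question's listing).
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 1, 2, 1],
        "start_date": pd.to_datetime([
            "2016-10-01 00:00:00", "2016-10-03 05:30:00", "2016-10-03 23:30:00",
            "2016-10-04 05:00:00", "2016-10-04 05:50:00", "2016-10-05 18:50:00",
        ]),
        "end_date": pd.to_datetime([
            "2016-10-01 03:00:00", "2016-10-03 06:30:00", "2016-10-04 01:00:00",
            "2016-10-04 06:00:00", "2016-10-04 06:00:00", "2016-10-06 02:00:00",
        ]),
    },
    index=range(1, 7),
)


def split_once(frame):
    """Split each interval that crosses midnight into two rows (hypothetical helper)."""
    frame = frame.copy()
    crosses = frame["start_date"].dt.normalize() != frame["end_date"].dt.normalize()

    # Second half: starts at midnight of the day after start_date.
    tail = frame[crosses].copy()
    tail["start_date"] = tail["start_date"].dt.normalize() + pd.Timedelta(days=1)

    # First half: ends one second before that midnight.
    frame.loc[crosses, "end_date"] = (
        frame.loc[crosses, "start_date"].dt.normalize()
        + pd.Timedelta(days=1)
        - pd.Timedelta(seconds=1)
    )

    # A stable sort keeps each first half ahead of the second half sharing its index.
    return pd.concat([frame, tail]).sort_index(kind="mergesort").reset_index(drop=True)


result = split_once(df)
print(result)
```

The stable sort matters here: both halves of a split interval carry the same index label until `reset_index` runs, so an unstable sort could in principle swap them.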

The process above assumes that start_date and end_date are at most one calendar day apart. If multi-day spans are possible, you can wrap the same process in a while loop:

import pandas as pd

# Get the rows to split.
split_rows = (df['start_date'].dt.date != df['end_date'].dt.date)

while split_rows.any():
    # Get the new rows, moving start_date to midnight of the next day.
    new_rows = df[split_rows].copy()
    new_rows['start_date'] = new_rows['start_date'].dt.normalize() + pd.Timedelta(days=1)

    # Adjust the end_date of the existing rows to 23:59:59 on the start day.
    df.loc[split_rows, 'end_date'] = df.loc[split_rows, 'start_date'].dt.normalize() + pd.Timedelta(days=1, seconds=-1)

    # Append the new rows to the existing dataframe.
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    # A stable sort keeps each original row ahead of the new row that shares its index.
    df = pd.concat([df, new_rows]).sort_index(kind='mergesort').reset_index(drop=True)

    # Get new rows to split (if the start_date to end_date span is more than 1 day).
    split_rows = (df['start_date'].dt.date != df['end_date'].dt.date)
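
To see the loop handle a multi-day span, here is a minimal sketch that wraps it in a function and runs it on one interval covering three calendar days. The helper name `split_by_day` and the sample interval are made up for illustration and do not come from the question. The sketch assumes `end_date` is never earlier than `start_date`; otherwise the loop would not terminate.

```python
import pandas as pd


def split_by_day(frame):
    """Repeat the one-day split until no interval crosses midnight (hypothetical helper)."""
    frame = frame.copy()
    crosses = frame["start_date"].dt.normalize() != frame["end_date"].dt.normalize()

    while crosses.any():
        # Remainder of each crossing interval, starting at the next midnight.
        tail = frame[crosses].copy()
        next_midnight = tail["start_date"].dt.normalize() + pd.Timedelta(days=1)
        tail["start_date"] = next_midnight

        # Trim the existing rows so they end one second before that midnight.
        frame.loc[crosses, "end_date"] = next_midnight - pd.Timedelta(seconds=1)

        # Stable sort: each trimmed row stays ahead of the remainder sharing its index.
        frame = (
            pd.concat([frame, tail])
            .sort_index(kind="mergesort")
            .reset_index(drop=True)
        )

        # A remainder may still cross midnight, so test again.
        crosses = frame["start_date"].dt.normalize() != frame["end_date"].dt.normalize()

    return frame


# Made-up interval spanning three calendar days.
spans = pd.DataFrame({
    "id": [7],
    "start_date": pd.to_datetime(["2016-10-10 22:00:00"]),
    "end_date": pd.to_datetime(["2016-10-12 03:00:00"]),
})

pieces = split_by_day(spans)
print(pieces)
```

Each pass peels one day off the front of every crossing interval, so the number of iterations equals the longest span measured in day boundaries crossed.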

Resulting output for the sample data:

   id          start_date            end_date
0   1 2016-10-01 00:00:00 2016-10-01 03:00:00
1   1 2016-10-03 05:30:00 2016-10-03 06:30:00
2   2 2016-10-03 23:30:00 2016-10-03 23:59:59
3   2 2016-10-04 00:00:00 2016-10-04 01:00:00
4   1 2016-10-04 05:00:00 2016-10-04 06:00:00
5   2 2016-10-04 05:50:00 2016-10-04 06:00:00
6   1 2016-10-05 18:50:00 2016-10-05 23:59:59
7   1 2016-10-06 00:00:00 2016-10-06 02:00:00
