Pandas datetime week not as expected

Question

When working with Pandas datetimes, I'm trying to group data by the week and year. However, I have noticed some years where the last day of the year ends up grouped with the first week of the same year.

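import pandas as pd

# One row per day from 2016 through 2020
day_df = pd.DataFrame(index=pd.date_range('2016-01-01', '2020-12-31'))

# Note: DatetimeIndex.week was deprecated in pandas 1.1 and removed in 2.0;
# on newer versions use day_df.index.isocalendar().week instead.
for (week, year), subset in day_df.groupby([day_df.index.week, day_df.index.year]):
    if week == 1:
        print('Week:', week, subset.index.min(), subset.index.max())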

Week: 1 2016-01-04 00:00:00 2016-01-10 00:00:00
Week: 1 2017-01-02 00:00:00 2017-01-08 00:00:00
Week: 1 2018-01-01 00:00:00 2018-12-31 00:00:00
Week: 1 2019-01-01 00:00:00 2019-12-31 00:00:00
Week: 1 2020-01-01 00:00:00 2020-01-05 00:00:00

For 2018 and 2019, the first day of the year ends up grouped with the final day of the year! Is this behavior expected? Why would the final day of the year be week 1?

I've gotten the result I want with a basic if statement, but this week behavior seems like it could lead to issues because it's unexpected.

This does what I intended with the grouping:

for (week, year), subset in day_df.groupby([day_df.index.week, day_df.index.year]):
    # Prevent first week of year from including final days of same year
    if set(subset.index.month.unique()) == set([1, 12]):
        subset = subset.loc[subset.index.month == 1]
    if week == 1:
        print('Week:', week, subset.index.min(), subset.index.max())

Week: 1 2016-01-04 00:00:00 2016-01-10 00:00:00
Week: 1 2017-01-02 00:00:00 2017-01-08 00:00:00
Week: 1 2018-01-01 00:00:00 2018-01-07 00:00:00
Week: 1 2019-01-01 00:00:00 2019-01-06 00:00:00
Week: 1 2020-01-01 00:00:00 2020-01-05 00:00:00

Recommended answer

The answer lies in the fact that .week is a week ordinal. The .week attribute is minimally defined in the docs as:

DatetimeIndex.week

The week ordinal of the year

The week ordinal is formally known as the ISO week date. Further notes about it in Python can be found under date.isocalendar() in the Python 3.7.3 datetime docs. For a general explanation of how the week ordinal works, see the Wikipedia article on the ISO week date.
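The ISO rule can be checked directly with the standard library. The following is a minimal sketch using only date.isocalendar(); the dates are chosen for illustration because they sit on year boundaries.

```python
from datetime import date

# isocalendar() returns (ISO year, ISO week number, ISO weekday), Monday = 1.
iso_year, iso_week, iso_weekday = date(2018, 12, 31).isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2019 1 1

# The ISO year can differ from the calendar year near a year boundary:
# January 1, 2016 still belongs to the last ISO week of 2015.
early_year, early_week, _ = date(2016, 1, 1).isocalendar()
print(early_year, early_week)  # 2015 53
```

In both cases the week number only makes sense when paired with the ISO year, not the calendar year.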

The week ordinals for 2019 can be found on EpochConverter.com, which clearly shows week 1 of 2019 starting on December 31, 2018.

If we look at week 1 for 2019, we can see that Dec 31, 2018 is its first day. Because that date's calendar year is still 2018, grouping by .week and .year places it in the (week 1, 2018) group alongside January 1-7, 2018. So it does in fact meet the criteria of your filter for week 1.

Below we filter the end of 2018 and the beginning of 2019 to see what .week is doing.

day_df["ordinal"] = day_df.index.week
day_df["day_of_week"] = day_df.index.weekday
print(day_df.loc["2018-12-28":"2019-01-08"])



             ordinal  day_of_week
2018-12-28       52            4
2018-12-29       52            5
2018-12-30       52            6
2018-12-31        1            0
2019-01-01        1            1
2019-01-02        1            2
2019-01-03        1            3
2019-01-04        1            4
2019-01-05        1            5
2019-01-06        1            6
2019-01-07        2            0
2019-01-08        2            1

You will need to add a month criterion to ensure the date is in January, as you discovered in your question above. This works too:

for (week, month, year), subset in day_df.groupby(
    [day_df.index.week, day_df.index.month, day_df.index.year]
):
    # Keep only the part of week 1 that falls in January
    if week == 1 and month == 1:
        print("Week:", week, subset.index.min(), subset.index.max())
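As an alternative to the month filter, one could group by the ISO year instead of the calendar year, so that each ISO week stays in one piece. This is a sketch, not part of the original answer; it assumes pandas 1.1 or later, where DatetimeIndex.isocalendar() is available, and the names iso and first_weeks are illustrative.

```python
import pandas as pd

day_df = pd.DataFrame(index=pd.date_range("2016-01-01", "2020-12-31"))

# isocalendar() returns a DataFrame with "year", "week" and "day" columns,
# aligned to the original index (assumes pandas >= 1.1).
iso = day_df.index.isocalendar()

# Collect the first and last date of ISO week 1 for each ISO year.
first_weeks = {}
for (iso_year, iso_week), subset in day_df.groupby([iso["year"], iso["week"]]):
    if iso_week == 1:
        first_weeks[int(iso_year)] = (subset.index.min(), subset.index.max())

for iso_year, (start, end) in first_weeks.items():
    print("Week 1 of ISO year", iso_year, start.date(), end.date())
```

With this grouping, week 1 of 2019 runs from 2018-12-31 to 2019-01-06. Unlike the month filter, it keeps the late-December days in the week instead of discarding them, so which approach fits depends on whether weeks may cross the calendar year boundary.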

If you wish to have the first week start on the same day, use pandas.Period.strftime() with the %U directive.

%U is defined as:

Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Sunday are considered to be in week 0.
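The definition above can be checked with the standard library before applying it to a DataFrame. This is a small sketch; sunday_week is a hypothetical helper introduced only for illustration.

```python
from datetime import date

def sunday_week(d):
    """Return the %U week number (Sunday-based) as a zero-padded string."""
    return d.strftime("%U")

# Compare the Sunday-based %U number with the ISO week number around New Year.
for d in (date(2018, 12, 30), date(2018, 12, 31), date(2019, 1, 1), date(2019, 1, 6)):
    print(d, "%U =", sunday_week(d), "ISO week =", d.isocalendar()[1])
```

Note that %U never crosses the calendar year: December 31, 2018 is week 52 of 2018, and January 1, 2019 falls in week 00 because it precedes the first Sunday of 2019.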

For your dataframe, this would look like:

day_df['date'] = day_df.index
day_df["day_name"] = day_df['date'].dt.day_name()
day_df['str_from_time'] = day_df['date'].apply(lambda x: x.strftime("%U"))
print(day_df.loc["2018-12-28":"2019-01-08", ['ordinal', 'str_from_time', 'day_of_week', 'day_name']])

            ordinal str_from_time  day_of_week   day_name
2018-12-28       52            51            4     Friday
2018-12-29       52            51            5   Saturday
2018-12-30       52            52            6     Sunday
2018-12-31        1            52            0     Monday
2019-01-01        1            00            1    Tuesday
2019-01-02        1            00            2  Wednesday
2019-01-03        1            00            3   Thursday
2019-01-04        1            00            4     Friday
2019-01-05        1            00            5   Saturday
2019-01-06        1            01            6     Sunday
2019-01-07        2            01            0     Monday
2019-01-08        2            01            1    Tuesday
