Pandas datetime index cumulative week

Problem description

I have a DataFrame with a DatetimeIndex.

>>> df.head() 
Out[6]: 
                                1
2004-01-02 09:00:00+11:00  0.7519
2004-01-02 10:00:00+11:00  0.7520
2004-01-02 12:00:00+11:00  0.7515
2004-01-02 13:00:00+11:00  0.7502
2004-01-02 14:00:00+11:00  0.7519

I want to keep a running count of weeks. I don't know in advance whether some days are missing, so I can't simply divide the row number by 7.

If I do the following, I get the weeks within a year:

df['temp']= df.index.week
df[df.index.year==2005].head()
Out[20]: 
                                1  temp
2005-01-03 10:00:00+11:00  0.7829     1
2005-01-03 11:00:00+11:00  0.7815     1
2005-01-03 12:00:00+11:00  0.7814     1
2005-01-03 13:00:00+11:00  0.7797     1
2005-01-03 14:00:00+11:00  0.7731     1

The problem is that the week number ends at 52 and starts again at 1 the next year. I thought I could group by year and week number to get a cumulative week count, but the same week can fall into two years, e.g.

>>> df[df.index.year==2008].resample('d').tail()
Out[30]: 
                                  1  temp
2008-12-27 00:00:00+11:00  0.683678    52
2008-12-28 00:00:00+11:00       NaN   NaN
2008-12-29 00:00:00+11:00  0.689414     1
2008-12-30 00:00:00+11:00  0.690654     1
2008-12-31 00:00:00+11:00  0.691058     1

>>> df[df.index.year==2009].resample('d').head()
Out[29]: 
                                  1  temp
2009-01-01 00:00:00+11:00  0.695833     1
2009-01-02 00:00:00+11:00  0.697680     1
2009-01-03 00:00:00+11:00  0.705733     1
2009-01-04 00:00:00+11:00       NaN   NaN
2009-01-05 00:00:00+11:00  0.711436     2

Is there a way to keep track of cumulative weeks?

Solution

Ken Wei's solution is incomplete because pandas assigns a week that starts in one year but falls mostly in the next to week 1 of the following year. Combining the calendar year with that week number therefore splits such a week across two keys, as the example below shows:

              weekIndex  weekNum
<DTYYYYMMDD>                    
2001-12-28       200152       52
2001-12-31       200101        1
2002-01-02       200201        1
2002-01-03       200201        1

As you can see, one week has been duplicated: it appears as both 200101 and 200201.
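The duplication above can be reproduced with the standard library alone. This is a minimal sketch that assumes weekIndex in the table was built as calendar year times 100 plus week number; the helper names calendar_key and iso_key are hypothetical and only serve the illustration. Pairing the week number with the ISO year instead of the calendar year yields a single key per week.

```python
from datetime import date

def calendar_key(d):
    """Calendar year combined with the ISO week number (ambiguous at year ends)."""
    return d.year * 100 + d.isocalendar()[1]

def iso_key(d):
    """ISO year combined with the ISO week number (one key per week)."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year * 100 + iso_week

# The same dates as in the table above.
days = [date(2001, 12, 28), date(2001, 12, 31), date(2002, 1, 2), date(2002, 1, 3)]
for d in days:
    print(d, calendar_key(d), iso_key(d))
```

With iso_key, 2001-12-31 and 2002-01-02 share one key because both belong to ISO week 1 of 2002.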

As a solution, I suggest a loop that builds a list, which is easily converted to a pandas DataFrame:

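import pandas as pd

# DatetimeIndex.week was removed in pandas 2.0; isocalendar().week replaces it.
df['weekNum'] = df.index.isocalendar().week

last_x = 0
numerator = 0
cumWeek = []

# Increment the counter every time the week number changes between rows.
for x in df['weekNum']:
    if x != last_x:
        numerator += 1
    cumWeek.append(numerator)
    last_x = x

cumWeek = pd.DataFrame(cumWeek, columns=['cumWeek'], index=df.index)
df = pd.concat([df, cumWeek], axis=1)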

cumWeek on its own holds the desired output.
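The same change-counting idea can also be written without an explicit loop. The sketch below is an alternative, not part of the original answer: it uses a small made-up frame (the column name value and the timezone-naive dates are assumptions) and marks a new week wherever the week number differs from the previous row, then takes a running sum of those marks.

```python
import pandas as pd

# Hypothetical sample data around a year boundary, with some days missing.
idx = pd.DatetimeIndex([
    "2008-12-27 09:00", "2008-12-29 09:00", "2008-12-31 09:00",
    "2009-01-01 09:00", "2009-01-02 09:00", "2009-01-05 09:00",
])
df = pd.DataFrame({"value": [0.68, 0.69, 0.69, 0.70, 0.70, 0.71]}, index=idx)

# ISO week number of each row, converted to plain integers.
week = df.index.isocalendar().week.astype("int64")

# A row starts a new week when its week number differs from the previous row's;
# the running total of those changes is the cumulative week count.
df["cumWeek"] = (week != week.shift()).cumsum()

print(df)
```

Like the loop, this assumes the index is sorted, and it counts only weeks that actually have rows: a week with no data at all does not advance the counter.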
