Pandas datetime index cumulative week
Problem description
I have a DataFrame with a DatetimeIndex.
>>> df.head()
Out[6]:
1
2004-01-02 09:00:00+11:00 0.7519
2004-01-02 10:00:00+11:00 0.7520
2004-01-02 12:00:00+11:00 0.7515
2004-01-02 13:00:00+11:00 0.7502
2004-01-02 14:00:00+11:00 0.7519
I want to keep track of the week count. I don't know upfront whether some days might be missing, so I can't simply divide the entry number by 7.
If I do the following, I get the weeks within a year:
df['temp']= df.index.week
df[df.index.year==2005].head()
Out[20]:
1 temp
2005-01-03 10:00:00+11:00 0.7829 1
2005-01-03 11:00:00+11:00 0.7815 1
2005-01-03 12:00:00+11:00 0.7814 1
2005-01-03 13:00:00+11:00 0.7797 1
2005-01-03 14:00:00+11:00 0.7731 1
The problem with this is that weeks end at 52 and start again at 1 for the next year. I thought I could group by year and week number to get a cumulative week count, but the same week can fall into two years, e.g.:
>>> df[df.index.year==2008].resample('d').tail()
Out[30]:
1 temp
2008-12-27 00:00:00+11:00 0.683678 52
2008-12-28 00:00:00+11:00 NaN NaN
2008-12-29 00:00:00+11:00 0.689414 1
2008-12-30 00:00:00+11:00 0.690654 1
2008-12-31 00:00:00+11:00 0.691058 1
>>> df[df.index.year==2009].resample('d').head()
Out[29]:
1 temp
2009-01-01 00:00:00+11:00 0.695833 1
2009-01-02 00:00:00+11:00 0.697680 1
2009-01-03 00:00:00+11:00 0.705733 1
2009-01-04 00:00:00+11:00 NaN NaN
2009-01-05 00:00:00+11:00 0.711436 2
Is there a way to keep track of cumulative weeks?
Ken Wei's solution is incomplete because pandas assigns week number 1 to a week that starts in the previous year but has most of its days in the next year, as you can see in the example below:
weekIndex weekNum
<DTYYYYMMDD>
2001-12-28 200152 52
2001-12-31 200101 1
2002-01-02 200201 1
2002-01-03 200201 1
As you can see, one week has been duplicated: 2001-12-31 is labelled 200101, which is also the label of the first week of January 2001.
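The collision can be reproduced on a handful of dates. The sketch below assumes that weekIndex in the table was built as calendar year * 100 + week number (an assumption; the answer does not show how it was computed) and compares it with a label built from the ISO year returned by `DatetimeIndex.isocalendar()`, which is available in pandas 1.1 and later. The date list and the names `naive` and `iso_label` are made up for illustration.

```python
import pandas as pd

# Illustrative dates: one from early January 2001 plus the four from the table
idx = pd.to_datetime(
    ["2001-01-02", "2001-12-28", "2001-12-31", "2002-01-02", "2002-01-03"]
)

# isocalendar() returns a DataFrame with ISO "year", "week" and "day" columns
iso = idx.isocalendar()
week = iso["week"].astype(int).to_numpy()
iso_year = iso["year"].astype(int).to_numpy()

# Calendar year + ISO week: 2001-12-31 gets the same label as 2001-01-02
naive = idx.year.to_numpy() * 100 + week

# ISO year + ISO week: 2001-12-31 is grouped with the first days of 2002
iso_label = iso_year * 100 + week

print(pd.DataFrame({"naive": naive, "iso_label": iso_label}, index=idx))
```

Grouping by the ISO year and ISO week together keeps a week that straddles New Year in a single group.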
As a solution, I suggest using a loop that builds a list, which is easily converted to a pandas DataFrame:
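import pandas as pd

# On pandas >= 2.0, DatetimeIndex.week no longer exists;
# use df.index.isocalendar().week instead.
df['weekNum'] = df.index.week
last_x = 0
numerator = 0
cumWeek = list()
for x in df['weekNum']:
    # Increment the counter each time the week number differs from the previous row
    if x != last_x:
        numerator += 1
    cumWeek.append(numerator)
    last_x = x
# Turn the list into a one-column DataFrame aligned with df and attach it
cumWeek = pd.DataFrame(cumWeek, columns=['cumWeek'], index=df.index)
df = pd.concat([df, cumWeek], axis=1)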
cumWeek on its own holds the desired output.
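For larger frames, the same counter can be computed without an explicit Python loop. This is a minimal sketch rather than the answer author's code: it assumes pandas 1.1 or later (for `isocalendar()`), and the sample timestamps and the `value` column are hypothetical stand-ins for the data in the question.

```python
import pandas as pd

# Hypothetical sample spanning a year boundary, with some days missing
idx = pd.to_datetime([
    "2008-12-27 10:00", "2008-12-29 10:00", "2008-12-29 11:00",
    "2008-12-31 10:00", "2009-01-02 10:00", "2009-01-05 10:00",
])
df = pd.DataFrame({"value": [0.68, 0.69, 0.69, 0.69, 0.70, 0.71]}, index=idx)

# ISO week number of every row, as plain integers
week = df.index.isocalendar()["week"].astype(int)

# True wherever the week number differs from the previous row
# (the first row compares against NaN, so it is always True)
changed = week != week.shift()

# Running total of changes = cumulative week counter starting at 1
df["cumWeek"] = changed.cumsum().to_numpy()

print(df)
```

Like the loop, this only counts weeks that actually have rows; a week with no data at all does not advance the counter.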