pandas rolling window & datetime indexes: What does `offset` mean?

Question

The rolling window function pandas.DataFrame.rolling of pandas 0.22 takes a window argument that is described as follows:

window : int, or offset

Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.

If it's an offset then this will be the time period of each window. Each window will be variable-sized, based on the observations included in the time period. This is only valid for datetimelike indexes. This is new in 0.19.0.

What actually is an offset in this context?

Recommended answer

In a nutshell, if you use an offset like "2D" (2 days), pandas will use the datetime info in the index (if available), potentially accounting for any missing rows or irregular frequencies. But if you use a simple int like 2, then pandas will treat the index as a simple integer index [0,1,2,...] and ignore any datetime info in the index.

A simple example should make this clear:

import pandas as pd

df = pd.DataFrame({'x': range(4)},
    index=pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-04', '2018-01-05']))

            x
2018-01-01  0
2018-01-02  1
2018-01-04  2
2018-01-05  3

Note that (1) the index is a datetime index, but (2) it is missing '2018-01-03'. So if you use a plain integer like 2, rolling will just look at the current row and the one before it, regardless of the datetime values (in a sense it behaves like iloc[i-1:i+1], where i is the position of the current row):

df.rolling(2).count()

              x
2018-01-01  1.0
2018-01-02  2.0
2018-01-04  2.0
2018-01-05  2.0
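
As a rough check of the positional interpretation, the sketch below rebuilds the counts by slicing with iloc and compares them with rolling. The names `window`, `manual` and `rolled` are illustrative only, and `min_periods=1` is passed explicitly on the assumption that the default for integer windows may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": range(4)},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-04", "2018-01-05"]),
)

window = 2  # illustrative integer window size

# Count the rows in each positional slice; max(0, ...) clips the window
# at the start of the frame, where fewer than `window` rows exist.
manual = [
    len(df.iloc[max(0, i - window + 1): i + 1])
    for i in range(len(df))
]

# min_periods=1 is set explicitly so that partial windows are reported.
rolled = df["x"].rolling(window, min_periods=1).count()

print(manual)
print(rolled.tolist())
```

Both lists agree even though 2018-01-03 is missing, since only row positions are used.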

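Conversely, if you use an offset of 2 days ('2D'), rolling will use the actual datetime values and account for any irregularities in the datetime index. The window ending at each timestamp covers the two days before it and, by default, excludes the left endpoint, so for 2018-01-04 the row at 2018-01-02 falls just outside the window and only one observation is counted: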

df.rolling('2D').count()

              x
2018-01-01  1.0
2018-01-02  2.0
2018-01-04  1.0
2018-01-05  2.0
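
One way to see where these counts come from is to rebuild each window as an explicit time interval. The sketch below assumes the default `closed='right'` behaviour for offset windows, i.e. the interval (t - 2 days, t]; `span` and `manual` are illustrative names, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": range(4)},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-04", "2018-01-05"]),
)

span = pd.Timedelta("2D")  # same length as the '2D' offset

# For each timestamp t, count index values inside (t - span, t]:
# the left endpoint is excluded, the right endpoint is included.
manual = [
    int(((df.index > t - span) & (df.index <= t)).sum())
    for t in df.index
]

rolled = df["x"].rolling("2D").count()

print(manual)
print(rolled.tolist())
```

The third count is 1 because 2018-01-02 sits exactly on the excluded left edge of the window that ends at 2018-01-04.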

Also note, you need the index to be sorted in ascending order when using a date offset, but it doesn't matter when using a simple integer (since you're just ignoring the index anyway).
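
A minimal sketch of the ordering requirement, using a deliberately shuffled index (the frame `unsorted` is made up for illustration). It assumes pandas signals the problem with a ValueError; the exact message, and whether a descending index is also accepted, may vary by version.

```python
import pandas as pd

# Illustrative frame whose datetime index is neither ascending nor descending.
unsorted = pd.DataFrame(
    {"x": [2, 0, 1]},
    index=pd.to_datetime(["2018-01-04", "2018-01-01", "2018-01-02"]),
)

# An integer window ignores the index, so the row order is used as given.
by_position = unsorted["x"].rolling(2, min_periods=1).count()

# An offset window needs an ordered index and is expected to fail here.
try:
    unsorted.rolling("2D").count()
    raised = False
except ValueError as err:
    raised = True
    print("offset window rejected:", err)

# Sorting the index first makes the offset window usable.
fixed = unsorted.sort_index().rolling("2D").count()
print(fixed)
```

Calling sort_index() before rolling is usually the simplest fix when the data arrive out of order.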
