Pandas rolling gives NaN

Question

I'm looking at the tutorials on window functions, but I don't quite understand why the following code produces NaNs.

If I understand correctly, the code creates a rolling window of size 2. Why do the first, fourth, and fifth rows have NaN? At first, I thought it's because adding NaN with another number would produce NaN, but then I'm not sure why the second row wouldn't be NaN.

dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}, 
                   index=pd.date_range('20130101 09:00:00', periods=5, freq='s'))


In [58]: dft.rolling(2).sum()
Out[58]: 
                       B
2013-01-01 09:00:00  NaN
2013-01-01 09:00:01  1.0
2013-01-01 09:00:02  3.0
2013-01-01 09:00:03  NaN
2013-01-01 09:00:04  NaN

Answer

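The first thing to notice is that, by default, rolling needs n valid (non-NaN) observations in each window, where n is the window size: the current row plus the n-1 rows before it. For an integer window, min_periods defaults to the window size, and any window with fewer valid observations returns NaN. That is what happens at the first row: it has no prior row, so its window holds only one observation. The fourth and fifth rows are NaN because one of the two values in each window is NaN, which leaves only one valid observation.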
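
To make that rule visible, here is a minimal sketch that counts the non-NaN values in each window and compares the count with the default threshold of 2. The names valid_counts, default_sum and check are only illustrative and are not part of the original question.

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {'B': [0, 1, 2, np.nan, 4]},
    index=pd.date_range('20130101 09:00:00', periods=5, freq='s'),
)

# Number of non-NaN values in each window of size 2.
# The 0/1 mask itself contains no NaN, so min_periods=1 here only lets
# the first (one-row) window report its count instead of NaN.
valid_counts = dft['B'].notna().astype(int).rolling(2, min_periods=1).sum()

# Default behaviour: min_periods equals the window size (2).
default_sum = dft['B'].rolling(2).sum()

# Side-by-side view: the sum is non-NaN exactly where the window
# holds at least 2 valid observations.
check = pd.DataFrame({
    'valid_counts': valid_counts,
    'default_sum': default_sum,
    'enough_data': valid_counts >= 2,
})
print(check)
```

Rows where enough_data is False are exactly the rows where the default rolling sum is NaN.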

If you would like to avoid returning NaN, you can pass min_periods=1 to rolling, which lowers the minimum number of valid observations required in a window from 2 to 1:

>>> dft.rolling(2, min_periods=1).sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:01  1.0
2013-01-01 09:00:02  3.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:04  4.0
