ValueError: index must be monotonic when applying rolling("2H").mean()

Problem description

I have the following DataFrame df:

                   TIME     DELAY
0   2016-01-01 06:30:00     0
1   2016-01-01 14:10:00     2
2   2016-01-01 07:05:00     2
3   2016-01-01 11:00:00     1
4   2016-01-01 10:40:00     0
5   2016-01-01 08:10:00     7
6   2016-01-01 11:35:00     2
7   2016-01-02 13:50:00     2
8   2016-01-02 14:50:00     4
9   2016-01-02 14:05:00     1

Note that the rows are not sorted by the TIME column.

For each row, I want to know the average delay over the preceding 2 hours. To do this, I ran the following code:

df.index = pd.DatetimeIndex(df["TIME"])
df["DELAY_LAST2HOURS"] = df["DELAY"].rolling("2H").mean()

However, I got this error:

ValueError: index must be monotonic

How can I do this correctly?

Recommended answer

The problem is that the DatetimeIndex is not sorted. A time-based rolling window requires a monotonic index, so sort it first with DataFrame.sort_index:

df.index = pd.DatetimeIndex(df["TIME"])
df = df.sort_index()
df["DELAY_LAST2HOURS"] = df["DELAY"].rolling("2H").mean()
print(df)
                                    TIME  DELAY  DELAY_LAST2HOURS
TIME                                                             
2016-01-01 06:30:00  2016-01-01 06:30:00      0          0.000000
2016-01-01 07:05:00  2016-01-01 07:05:00      2          1.000000
2016-01-01 08:10:00  2016-01-01 08:10:00      7          3.000000
2016-01-01 10:40:00  2016-01-01 10:40:00      0          0.000000
2016-01-01 11:00:00  2016-01-01 11:00:00      1          0.500000
2016-01-01 11:35:00  2016-01-01 11:35:00      2          1.000000
2016-01-01 14:10:00  2016-01-01 14:10:00      2          2.000000
2016-01-02 13:50:00  2016-01-02 13:50:00      2          2.000000
2016-01-02 14:05:00  2016-01-02 14:05:00      1          1.500000
2016-01-02 14:50:00  2016-01-02 14:50:00      4          2.333333
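
To make the cause easier to see, here is a minimal self-contained sketch that rebuilds the sample data from the question, checks whether the index is monotonic, and confirms that sorting removes the error. The variable names (unsorted_ok, raised, sorted_df, result) are only illustrative. It also assumes the lowercase "2h" alias, because newer pandas releases deprecate the uppercase "H"; on an older installation "2H" as in the original code may be what you need.

```python
import pandas as pd

# Sample data copied from the question; the rows are not in time order.
df = pd.DataFrame({
    "TIME": [
        "2016-01-01 06:30:00", "2016-01-01 14:10:00", "2016-01-01 07:05:00",
        "2016-01-01 11:00:00", "2016-01-01 10:40:00", "2016-01-01 08:10:00",
        "2016-01-01 11:35:00", "2016-01-02 13:50:00", "2016-01-02 14:50:00",
        "2016-01-02 14:05:00",
    ],
    "DELAY": [0, 2, 2, 1, 0, 7, 2, 2, 4, 1],
})
df.index = pd.to_datetime(df["TIME"])

# A time-based window needs a monotonic index; check before rolling.
unsorted_ok = df.index.is_monotonic_increasing

# On the unsorted index the rolling call is expected to raise ValueError.
try:
    df["DELAY"].rolling("2h").mean()
    raised = False
except ValueError:
    raised = True

# After sorting, the same call succeeds.
sorted_df = df.sort_index()
sorted_ok = sorted_df.index.is_monotonic_increasing
result = sorted_df["DELAY"].rolling("2h").mean()

print(unsorted_ok, raised, sorted_ok)
print(result)
```

Checking is_monotonic_increasing before calling rolling is a cheap way to decide whether a sort is needed at all.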

If you do not need to keep TIME as a separate column, the whole thing can be written as:

df["TIME"] = pd.to_datetime(df["TIME"])

df = df.set_index('TIME').sort_index()
df["DELAY_LAST2HOURS"] = df["DELAY"].rolling("2H").mean()
print(df)
                     DELAY  DELAY_LAST2HOURS
TIME                                        
2016-01-01 06:30:00      0          0.000000
2016-01-01 07:05:00      2          1.000000
2016-01-01 08:10:00      7          3.000000
2016-01-01 10:40:00      0          0.000000
2016-01-01 11:00:00      1          0.500000
2016-01-01 11:35:00      2          1.000000
2016-01-01 14:10:00      2          2.000000
2016-01-02 13:50:00      2          2.000000
2016-01-02 14:05:00      1          1.500000
2016-01-02 14:50:00      4          2.333333

Alternatively, sort by TIME first and then set it as the index:

df["TIME"] = pd.to_datetime(df["TIME"])
df = df.sort_values('TIME').set_index('TIME')

df["DELAY_LAST2HOURS"] = df["DELAY"].rolling("2H").mean()
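
The solutions above leave the DataFrame sorted by time. If the original row order matters, one possible variation (not part of the original answer) is to keep the integer index, roll over the TIME column through the on parameter of DataFrame.rolling, and then sort by the index again. The sketch below assumes the default integer index from the question and uses the lowercase "2h" alias; the names ordered and restored are only illustrative.

```python
import pandas as pd

# Sample data copied from the question, with its default integer index.
df = pd.DataFrame({
    "TIME": [
        "2016-01-01 06:30:00", "2016-01-01 14:10:00", "2016-01-01 07:05:00",
        "2016-01-01 11:00:00", "2016-01-01 10:40:00", "2016-01-01 08:10:00",
        "2016-01-01 11:35:00", "2016-01-02 13:50:00", "2016-01-02 14:50:00",
        "2016-01-02 14:05:00",
    ],
    "DELAY": [0, 2, 2, 1, 0, 7, 2, 2, 4, 1],
})
df["TIME"] = pd.to_datetime(df["TIME"])

# The column passed via `on` must also be monotonic, so sort by it first.
ordered = df.sort_values("TIME")

# Roll over the TIME column instead of the index; the integer labels are kept.
ordered["DELAY_LAST2HOURS"] = ordered.rolling("2h", on="TIME")["DELAY"].mean()

# Sorting by the integer index restores the original row order.
restored = ordered.sort_index()
print(restored)
```

This only restores the order cleanly when the original index is itself ordered, as the default integer index is here.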
